Hi,
Could you please tell me which of the three timer processes ("timer", "slow timer" or "timer nh") is responsible for cleaning up the internal usrloc cache?
Looks like every now and then the cleanup of the internal location cache starts to fail. The funny thing is that expired locations are removed from the MySQL backend, but not from the internal cache. We're running kamailio 3.1.5; are there any known issues that were fixed since that version?
In the meantime we're trying to pin the issue down, but maybe someone has a clue...
Andreas
Got the cause of the issue.
What happens is that there's an AOR which registers every 120 seconds. For some reason, the location entry is in the usrloc cache, but not in the db. usrloc now tries an "update" query in the db, because it still assumes that the entry is there, which obviously fails.
If you remove the entry from usrloc (kamctl ul rm <aor>), then on the next re-registration it's inserted both into the cache and into the db.
I'm wondering how they could get out of sync, and how we could improve this. Maybe use "replace into" instead of "update", at least for MySQL? Suggestions?
Andreas
On 12/22/2011 05:12 PM, Andreas Granig wrote:
Could you please tell me which of the three timer processes ("timer", "slow timer" or "timer nh") is responsible for cleaning up the internal usrloc cache?
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
Hello,
On 12/22/11 6:19 PM, Andreas Granig wrote:
Got the cause of the issue.
What happens is that there's an AOR which registers every 120 seconds. For some reason, the location entry is in the usrloc cache, but not in the db. usrloc now tries an "update" query in the db, because it still assumes that the entry is there, which obviously fails.
If you remove the entry from usrloc (kamctl ul rm <aor>), then on the next re-registration it's inserted both into the cache and into the db.
I'm wondering how they could get out of sync, and how we could improve this. Maybe use "replace into" instead of "update", at least for MySQL? Suggestions?
Is the timer interval parameter of usrloc higher than 120 sec?
http://kamailio.org/docs/modules/3.2.x/modules_k/usrloc.html#id2494575
IIRC, there should be a flag marking whether the record is in the db or not, and based on that usrloc does an insert or an update; maybe something gets lost there. If you run 'kamctl ul show __aor__', what are the values of the flags fields?
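The state-driven insert-or-update decision described above can be sketched as follows. This is a minimal Python model under stated assumptions, not usrloc's actual C code: SQLite stands in for MySQL, the `location` table is cut down to three hypothetical columns, and the `CState` values only loosely mirror usrloc's contact states (CS_SYNC is reported later in this thread; the other names are assumptions). It also reproduces the reported symptom: if the row is deleted behind usrloc's back, the UPDATE matches zero rows and the contact is still marked as synced.

```python
import sqlite3
from dataclasses import dataclass
from enum import Enum


class CState(Enum):
    # Loosely modeled on usrloc's contact states; names are assumptions.
    NEW = "new"      # only in memory, never written to the db
    SYNC = "sync"    # memory and db are believed to match
    DIRTY = "dirty"  # changed in memory, db row needs an update


@dataclass
class Contact:
    aor: str
    contact: str
    expires: int
    state: CState = CState.NEW


def open_db() -> sqlite3.Connection:
    # Hypothetical, heavily simplified location table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE location (aor TEXT, contact TEXT, expires INTEGER)")
    return conn


def flush(conn: sqlite3.Connection, c: Contact) -> int:
    """Write a contact back according to its state; return the rows affected."""
    if c.state is CState.NEW:
        cur = conn.execute(
            "INSERT INTO location (aor, contact, expires) VALUES (?, ?, ?)",
            (c.aor, c.contact, c.expires),
        )
    elif c.state is CState.DIRTY:
        cur = conn.execute(
            "UPDATE location SET expires = ? WHERE aor = ? AND contact = ?",
            (c.expires, c.aor, c.contact),
        )
    else:
        return 0  # already in sync, nothing to write
    # The contact is marked synced even when the UPDATE matched no row:
    # this is where memory and db silently drift apart.
    c.state = CState.SYNC
    return cur.rowcount
```

For example, insert a contact with `flush()`, delete the row directly in the db, mark the contact DIRTY and flush again: the second call returns 0, nothing is written back, and the contact still reports SYNC.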
Another option, perhaps more portable but with two db hits, is: update, and if that fails, insert. Considering that these should be corner cases, performance is probably not affected much. A blended version is even better: if the db driver supports replace, do a replace instead of an update (I don't know whether replace is faster or slower than update).
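The update-then-insert fallback and the blended replace variant could look like the sketch below. Again this is a hypothetical Python/SQLite model: the table, the columns and the `driver_has_replace` switch are assumptions, and SQLite's `REPLACE INTO` (an alias for `INSERT OR REPLACE`) stands in for MySQL's `REPLACE INTO`. REPLACE only works if the table has a unique key to collide on, so the sketch declares one.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified location table. The unique key is only needed
    # by the REPLACE path; plain UPDATE/INSERT works without it.
    conn.execute(
        "CREATE TABLE location (aor TEXT, contact TEXT, expires INTEGER,"
        " UNIQUE (aor, contact))"
    )
    return conn


def save_update_then_insert(conn, aor: str, contact: str, expires: int) -> str:
    """Portable variant: UPDATE first, INSERT only if no row matched."""
    cur = conn.execute(
        "UPDATE location SET expires = ? WHERE aor = ? AND contact = ?",
        (expires, aor, contact),
    )
    if cur.rowcount > 0:
        return "updated"
    # Corner case: the row is missing in the db, costing a second db hit.
    conn.execute(
        "INSERT INTO location (aor, contact, expires) VALUES (?, ?, ?)",
        (aor, contact, expires),
    )
    return "inserted"


def save_blended(conn, aor: str, contact: str, expires: int,
                 driver_has_replace: bool) -> str:
    """Blended variant: one REPLACE when the driver supports it, else fall back."""
    if driver_has_replace:
        conn.execute(
            "REPLACE INTO location (aor, contact, expires) VALUES (?, ?, ?)",
            (aor, contact, expires),
        )
        return "replaced"
    return save_update_then_insert(conn, aor, contact, expires)
```

One caveat for a real MySQL backend: by default MySQL reports rows actually changed, not rows matched, so an UPDATE that rewrites identical values returns 0 affected rows unless the client connects with the CLIENT_FOUND_ROWS flag. A rows-affected check has to take that into account, or it would trigger a spurious INSERT.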
Cheers, Daniel
The replace solution will mask the real issue. The flag in usrloc should switch between update and insert, and that is where the real fix belongs.
Regards, Ovidiu Sas
-- VoIP Embedded, Inc. http://www.voipembedded.com
On Thu, Dec 22, 2011 at 12:58 PM, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
A blended version is even better: if the db driver supports replace, do a replace instead of an update (I don't know whether replace is faster or slower than update).
On 12/22/11 7:40 PM, Ovidiu Sas wrote:
The replace solution will mask the real issue. The flag in usrloc should switch between update and insert, and that is where the real fix belongs.
Right, that has to be done, but there are some cases when the db can become inconsistent due to database unavailability, and then some tricks are needed at the db layer, for example:
- the db is unavailable, the phone unregisters, and the contact is deleted from memory but not from the database
- the phone registers again, usrloc tries an insert, which fails; in this case it should do an update if the insert fails (or a replace)
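The db-unavailable case goes the other way round: memory has forgotten the contact, but a stale row is still in the db, so the INSERT collides. A minimal sketch of "insert, and update if the insert fails", assuming a unique key on (aor, contact) that makes the collision detectable (hypothetical; without such a constraint the INSERT would simply add a duplicate row). SQLite again stands in for the real backend, and all table and column names are illustrative.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Assumption: a unique key on (aor, contact) so a stale row makes the
    # INSERT fail instead of silently creating a duplicate.
    conn.execute(
        "CREATE TABLE location (aor TEXT, contact TEXT, expires INTEGER,"
        " UNIQUE (aor, contact))"
    )
    return conn


def save_insert_then_update(conn, aor: str, contact: str, expires: int) -> str:
    """Insert a contact that memory believes is new; on a key clash, update."""
    try:
        conn.execute(
            "INSERT INTO location (aor, contact, expires) VALUES (?, ?, ?)",
            (aor, contact, expires),
        )
        return "inserted"
    except sqlite3.IntegrityError:
        # A leftover row exists, e.g. from an unregister while the db was down.
        conn.execute(
            "UPDATE location SET expires = ? WHERE aor = ? AND contact = ?",
            (expires, aor, contact),
        )
        return "updated"
```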
The other way around could happen when records in the db are mistakenly deleted or changed, which should not happen, but Murphy says otherwise.
Cheers, Daniel
Right, that has to be done, but there are some cases when the db can become inconsistent due to database unavailability, and then some tricks are needed at the db layer, for example:
- the db is unavailable, the phone unregisters, and the contact is deleted from memory but not from the database
- the phone registers again, usrloc tries an insert, which fails; in this case it should do an update if the insert fails (or a replace)
If the phone registers again, it should be a brand new entry (different Call-ID, CSeq and so on). The leftover entry in the db will be cleaned up on a server restart.
The other way around could happen when records in the db are mistakenly deleted or changed, which should not happen, but Murphy says otherwise.
If someone is messing with the db, kamailio shouldn't try to correct admin mistakes.
I think that the replace solution should be a last resort. If implemented, it should be configurable. I would rather see the original issue surface than have it masked and trigger other issues later on that would be more difficult to debug.
Regards, Ovidiu Sas
On 12/22/11 8:02 PM, Ovidiu Sas wrote:
Right, that has to be done, but there are some cases when the db can become inconsistent due to database unavailability, and then some tricks are needed at the db layer, for example:
- the db is unavailable, the phone unregisters, and the contact is deleted from memory but not from the database
- the phone registers again, usrloc tries an insert, which fails; in this case it should do an update if the insert fails (or a replace)
If the phone registers again, it should be a brand new entry (different Call-ID, CSeq and so on). The leftover entry in the db will be cleaned up on a server restart.
According to the SIP specs, the key is the contact address, so operations are done on (aor, contact). In kamailio you can configure matching on the Call-ID (plus Path) as well, but it is not the default since it breaks the specs.
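The difference between the two matching styles can be shown with a tiny key function. This is an illustrative Python sketch only; the mode names are hypothetical and do not correspond to kamailio's actual parameter values. With contact-only matching, a re-registration from the same contact with a new Call-ID maps to the same binding as the leftover one; with Call-ID matching it becomes a new binding.

```python
from typing import NamedTuple


class Binding(NamedTuple):
    aor: str
    contact: str
    callid: str


# Hypothetical mode names, for illustration only.
MATCH_CONTACT = "contact"
MATCH_CONTACT_CALLID = "contact+callid"


def match_key(b: Binding, mode: str) -> tuple:
    """Return the key used to decide whether two bindings are the same one."""
    if mode == MATCH_CONTACT:
        return (b.aor, b.contact)  # spec behaviour: (aor, contact)
    if mode == MATCH_CONTACT_CALLID:
        return (b.aor, b.contact, b.callid)  # non-default, Call-ID aware
    raise ValueError(f"unknown matching mode: {mode}")
```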
The other way around could happen when records in the db are mistakenly deleted or changed, which should not happen, but Murphy says otherwise.
If someone is messing with the db, kamailio shouldn't try to correct admin mistakes.
It was just an example; I haven't gone through all the corner cases that can arise from human errors or from errors of this or a different application. There were a couple of similar reports in the past, related to insert/update, update/insert, delete/update conflicts and so on, so I proposed going for a portable solution, not one that is valid for a single db driver only. Whether it is configurable or not is a separate question from this specific topic.
I think that the replace solution should be a last resort. If implemented, it should be configurable. I would rather see the original issue surface than have it masked and trigger other issues later on that would be more difficult to debug.
I think the goal is to have coherent, consistent and persistent records in the location table, so that lookups are always valid and restarts don't lose records. In neither case do you get feedback implicitly: it is either a log message or a malfunction. It is probably better to get a log message to investigate while everything keeps working than the latter.
Cheers, Daniel
-- Daniel-Constantin Mierla -- http://www.asipto.com http://linkedin.com/in/miconda -- http://twitter.com/miconda
Hi,
On 12/23/2011 12:28 AM, Daniel-Constantin Mierla wrote:
If someone is messing with the db, kamailio shouldn't try to correct admin mistakes.
It was just an example; I haven't gone through all the corner cases that can arise from human errors or from errors of this or a different application. There were a couple of similar reports in the past, related to insert/update, update/insert, delete/update conflicts and so on, so I proposed going for a portable solution, not one that is valid for a single db driver only. Whether it is configurable or not is a separate question from this specific topic.
Of course you can never rule out admin errors or application errors, but anyway I would strongly favor a more resilient approach.
I've tried to reproduce the most obvious scenario: registering a subscriber, deleting it from the underlying db table, then re-registering. The expiry value in the cache is refreshed, but it's never written back to the db table. There are most likely other, more subtle scenarios, and our customers confirmed that they didn't mess with the db manually. The state was always CS_SYNC and Flags was 0.
I don't know the details of the srdb layer, but it's probably possible to find a way to return the "rows affected" after an update in order to know whether to try an insert afterwards. That would be possible with MySQL; not sure about pgsql, oracle, dbtext etc. We'll take a look at how we could tackle that.
I don't think that a log message would help very much, because kamailio won't know about the missing entry in the db (unless you evaluate the result of the update), at least in this particular case.
Andreas
On 12/23/2011 01:39 PM, Andreas Granig wrote:
I don't know the details of the srdb layer, but probably it's possible to find a way to return the "rows affected" after an update in order to know whether to try an insert afterwards. Would be possible with mysql,
Shouldn't the db layer and driver be smart enough to do INSERT ... ON DUPLICATE KEY UPDATE, at least where it's supported?
stefan
On 12/23/2011 08:18 PM, Stefan Sayer wrote:
Shouldn't the db layer and driver be smart enough to do INSERT ... ON DUPLICATE KEY UPDATE, at least where it's supported?
My fear is that such a "first insert, then update" policy will affect performance. It can also create noise in the log on some db backends.
On 12/23/11 8:43 PM, Andrew Pogrebennyk wrote:
On 12/23/2011 08:18 PM, Stefan Sayer wrote:
Shouldn't the db layer and driver be smart enough to do INSERT ... ON DUPLICATE KEY UPDATE, at least where it's supported?
My fear is that such a "first insert, then update" policy will affect performance. It can also create noise in the log on some db backends.
This one is also a bit tricky to do, as it would require changing the database table definition depending on the matching mode of registrar/usrloc. By the RFC, within an AOR the contact address should be the key for location records. Since lots of phones are behind NAT, and many users have the same setup for home and work phones, kamailio can be configured to also match on the Call-ID and the Path stack. Because of that, these checks are done inside the modules, and there is no constraint at the SQL level of the location table.
The approach suggested in this discussion would require adding proper unique keys depending on the configuration in kamailio.cfg.
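To illustrate why an upsert needs a unique key tied to the matching mode, here is a hypothetical Python/SQLite sketch. The column names and mode names are assumptions, and SQLite's `INSERT OR REPLACE` stands in for MySQL's `INSERT ... ON DUPLICATE KEY UPDATE`; note the two are not identical, since REPLACE deletes the conflicting row and inserts a new one, while MySQL's form updates the existing row in place. The same two registrations collapse into one row under contact-only matching but stay as two rows when the Call-ID is part of the key.

```python
import sqlite3

# Hypothetical mapping from matching mode to the columns a unique key needs.
UNIQUE_COLUMNS = {
    "contact": ("aor", "contact"),
    "contact+callid": ("aor", "contact", "callid"),
}


def open_db(mode: str) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE location (aor TEXT, contact TEXT, callid TEXT, expires INTEGER)"
    )
    # The constraint has to follow the configured matching mode; column names
    # come from the fixed mapping above, never from user input.
    cols = ", ".join(UNIQUE_COLUMNS[mode])
    conn.execute(f"CREATE UNIQUE INDEX location_match ON location ({cols})")
    return conn


def upsert(conn, aor: str, contact: str, callid: str, expires: int) -> None:
    # Relies entirely on the unique index created in open_db().
    conn.execute(
        "INSERT OR REPLACE INTO location (aor, contact, callid, expires)"
        " VALUES (?, ?, ?, ?)",
        (aor, contact, callid, expires),
    )


def row_count(conn) -> int:
    return conn.execute("SELECT COUNT(*) FROM location").fetchone()[0]
```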
Cheers, Daniel
Hello,
On 12/23/11 1:39 PM, Andreas Granig wrote:
Of course you can never rule out admin errors or application errors, but anyway I would strongly favor a more resilient approach.
I encountered several cases where things can go bad at the database layer while doing cross replication with t_replicate(). Finding a solution would be a good thing. These situations should be corner cases, though, so the solution must not impact the normal mode of operation.
I've tried to reproduce the most obvious scenario: registering a subscriber, deleting it from the underlying db table, then re-registering. The expiry value in the cache is refreshed, but it's never written back to the db table. There are most likely other, more subtle scenarios, and our customers confirmed that they didn't mess with the db manually. The state was always CS_SYNC and Flags was 0.
I don't know the details of the srdb layer, but it's probably possible to find a way to return the "rows affected" after an update in order to know whether to try an insert afterwards. That would be possible with MySQL; not sure about pgsql, oracle, dbtext etc. We'll take a look at how we could tackle that.
MySQL has affected rows, and it is exported by the db_mysql module; not sure about the other drivers.
I don't think that a log message would help very much, because kamailio won't know about the missing entry in the db (unless you evaluate the result of the update), at least in this particular case.
Yes, kamailio does not know, I know; that seems to be what we are trying to solve: first how to make it aware, and then how to take some action (like writing a log message).
Perhaps the best option for the moment is to detect the db API capabilities at startup and use replace/affected_rows and so on when they are available, in the most efficient order; a module parameter can give admins the power to override the auto-detection.
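The startup auto-detection with an admin override might be structured like this. It is a Python sketch of the idea only: the capability names, the strategy names and the preference order (a single REPLACE first, then UPDATE plus INSERT driven by affected rows, then the current plain UPDATE) are all assumptions, since the thread leaves open whether replace is actually faster than update.

```python
from typing import Optional

# Hypothetical capability names; the real db API exposes its own flags.
CAP_REPLACE = "replace"
CAP_AFFECTED_ROWS = "affected_rows"

# Assumed preference order, most preferred first, with the capabilities each
# strategy needs. "update_only" needs nothing, so it is always usable.
STRATEGIES = (
    ("replace", {CAP_REPLACE}),
    ("update_then_insert", {CAP_AFFECTED_ROWS}),
    ("update_only", set()),
)


def choose_strategy(driver_caps: set, override: Optional[str] = None) -> str:
    """Pick a write strategy at startup; a usable admin override wins."""
    usable = [name for name, needed in STRATEGIES if needed <= driver_caps]
    if override is not None:
        if override not in usable:
            raise ValueError(f"strategy {override!r} not supported by this driver")
        return override
    return usable[0]
```

In this shape the override plays the role of the module parameter Daniel mentions: an admin who measures REPLACE as slower on their backend could force "update_then_insert" even when the driver supports both.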
Cheers, Daniel