Module: sip-router
Branch: master
Commit: ae86ca3611398ce365ac4a1776ff0c7e95476bbe
URL: http://git.sip-router.org/cgi-bin/gitweb.cgi/sip-router/?a=commit;h=ae86ca36...
Author: Anca Vamanu anca.vamanu@1and1.ro
Committer: Anca Vamanu anca.vamanu@1and1.ro
Date: Wed Feb 15 13:39:55 2012 +0200
modules_k/presence Fixed DB Storage Modes
- removed db_mode and fallback2db parameters and added two new parameters: subs_db_mode and publ_cache
- fixed and extended the storage modes for subscriptions: Memory Only, Write Through, Write Back, DB Only
- publ_cache parameter offers the possibility to disable publish cache
- some other fixes:
  - delete subscription only for 481 or 408 reply for Notify
  - call child_init also for main process (no shutdown DB flush was being performed)
---
modules_k/presence/README | 190 ++++++-----
modules_k/presence/bind_presence.c | 4 +-
modules_k/presence/bind_presence.h | 4 +-
modules_k/presence/doc/presence_admin.xml | 127 +++++---
modules_k/presence/doc/presence_devel.xml | 2 +-
modules_k/presence/event_list.c | 2 +-
modules_k/presence/event_list.h | 2 +-
modules_k/presence/hash.c | 25 +--
modules_k/presence/hash.h | 2 +-
modules_k/presence/notify.c | 253 ++++++---------
modules_k/presence/notify.h | 2 +-
modules_k/presence/presence.c | 108 +++----
modules_k/presence/presence.h | 19 +-
modules_k/presence/presentity.c | 30 +--
modules_k/presence/presentity.h | 2 +-
modules_k/presence/publish.c | 52 ++--
modules_k/presence/publish.h | 2 +-
modules_k/presence/subscribe.c | 495 +++++++++++++++++++----------
modules_k/presence/subscribe.h | 7 +-
modules_k/presence/utils_func.c | 2 +-
modules_k/presence/utils_func.h | 2 +-
modules_k/pua/hash.h | 1 +
22 files changed, 736 insertions(+), 597 deletions(-)
Diff: http://git.sip-router.org/cgi-bin/gitweb.cgi/sip-router/?a=commitdiff;h=ae86...
Hi Anca!
From the README:
subs_db_mode (int): 3 - DB-Only scheme. No memory cache is kept, all operations being directly performed with the database. The timer deletes all expired subscriptions from database. The mode is useful if you configure more servers sharing the same DB without any replication at SIP level. The mode may be slower due the high number of DB operation.
You mention replication at SIP level - I wonder how you can replicate subscriptions on SIP level?
regards Klaus
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
Hi Klaus,
From kamailio you could call t_replicate() to send the Subscribe or Publish to another destination. But this is not enough. First, you must make sure that this destination is not allowed to send out the reply for the Subscribe or the Notifies. Then, you should replicate the 200 OK replies for Notifies received from the clients to that destination, so that the destination correctly updates its local CSeq. Currently there is no way in kamailio to replicate replies. I worked on a solution for a distributed presence server with replication like this, but some custom patches were needed to make it possible.
Regards, Anca
Hello,
On 2/20/12 10:40 AM, Anca Vamanu wrote:
From kamailio you could call t_replicate() to send the Subscribe or Publish to another destination. But this is not enough. First, you must make sure that this destination is not allowed to send out the reply for the Subscribe or the Notifies. Then, you should replicate the 200 OK replies for Notifies received from the clients to that destination, so that the destination correctly updates its local CSeq. Currently there is no way in kamailio to replicate replies.
send() (or send_*proto*()) from the core can mirror practically any kind of SIP message, so it should be enough for replies.
Cheers, Daniel
Hi Anca, that sounds like an awful hack ;-)
Just to be sure that I really understand the hack - for example presence server p1 and backup presence server p2.
p1 receives SUBSCRIBE: local handle_subscribe() and t_replicate() to p2
p1 sends back 200 OK and stores dialog.
p2 sends back 200 OK to p1, but using a different to-tag, thus creating another dialog. As a result, NOTIFYs by p2 have a different Via branch-id and a different from-tag. So I wonder how failover to the backup node should work, and how the backup node will accept a faked NOTIFY response if the branch parameter and from-tag are different?
thanks Klaus
Hi Klaus,
It's not quite the way you described it. In the scenario I described, p2 was just a hot standby, sending nothing outside while in backup mode. In an active-active setup, it is a lot harder to achieve.
First, as Daniel suggested, send() is the best option to use when replicating - forwarding the message as it was received. Then, p2 should work in a synchronous way - generating the same tag and Via branch as p1. For the Via branch this is easy to achieve with the *syn_branch* global parameter. For the tag, some coding needs to be done.
p2 should not send anything out; it should be blocked at the network layer or from inside the application. However, it will be under the impression that it has sent out the Notifies, so it has to receive the replies for them. This is why p1 must also replicate the replies for Notifies.
Regards, Anca
I actually also meant the hot-standby scenario. So all you had to do was to change the to-tag generation to be identical on both sides? Still a hack but it will work ;-)
Of course this hack only achieves HA, but does not improve scalability. The new modes will allow multiple presence servers in an active-active setup, but move the bottleneck to the DB. Maybe some caching can be used, depending on the dispatching configuration of the load balancer. I guess I have to think some more about that.
regards Klaus
Hi Anca!
I wonder what happens if I build a server cluster (p1 and p2) with subs_db_mode=3 and publ_cache=0.
Then a subscription to user1 is received by p1 and stored in the DB. Then user1 sends a PUBLISH via p2, which is stored in the DB.
Who will send the NOTIFY for the subscription? p1 or p2? I guess p2 as p1 has no idea about the publish - correct?
thanks Klaus
Hi Klaus,
The machine that received the Publish, in this case p2, will send the Notify. Looking at the code now, I see that when sending out this Notify, p2 will actually put the contact of p1 (the initial contact) in the Contact header. I don't think there should be any problem with this (unless someone compares Via and Contact, but probably nobody does). This way the re-Subscribes will always get to the same machine. I now see an opportunity to improve performance because of this behavior: a mode that reads from memory for Subscribes, but always reads from the DB for Publishes.
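The behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: two nodes share one database, the table layout is heavily simplified, and the function names, the example.com URIs and the in-memory SQLite database are illustrative stand-ins, not the module's actual code.

```python
import sqlite3

# Shared database standing in for the common DB of the cluster (assumption).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE active_watchers (presentity_uri TEXT, watcher TEXT, local_contact TEXT)"
)
db.execute("CREATE TABLE presentity (presentity_uri TEXT, etag TEXT, body TEXT)")


def handle_subscribe(node, presentity_uri, watcher):
    """Store the dialog together with the contact of the node that accepted it."""
    db.execute(
        "INSERT INTO active_watchers VALUES (?, ?, ?)",
        (presentity_uri, watcher, f"sip:{node}.example.com"),
    )


def handle_publish(node, presentity_uri, etag, body):
    """Store the publication and build a NOTIFY for every watcher in the shared DB."""
    db.execute("INSERT INTO presentity VALUES (?, ?, ?)", (presentity_uri, etag, body))
    rows = db.execute(
        "SELECT watcher, local_contact FROM active_watchers WHERE presentity_uri = ?",
        (presentity_uri,),
    ).fetchall()
    # The NOTIFY leaves from the publishing node but carries the stored contact.
    return [
        {"sent_by": node, "to": watcher, "contact": contact, "body": body}
        for watcher, contact in rows
    ]


handle_subscribe("p1", "sip:user1@example.com", "sip:watcher@example.com")
notifies = handle_publish("p2", "sip:user1@example.com", "etag-1", "<presence/>")
```

Here the single NOTIFY is sent by p2 while its Contact still points at p1, which is why a later re-Subscribe would be routed back to p1.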
Regards, Anca
Makes sense.
What if publ_cache=1? What really happens on an incoming PUBLISH?
I have not read the code but I suspect that with subs_db_mode=3 and publ_cache=0 the module does (without permissions checking):
incoming PUBLISH without e-tag: INSERT INTO presentity ...
incoming PUBLISH with e-tag: UPDATE presentity ... WHERE etag='..' AND ...
incoming SUBSCRIBE without to-tag: INSERT INTO active_watchers ...; SELECT * FROM presentity WHERE ...
incoming SUBSCRIBE with to-tag: UPDATE active_watchers ...; SELECT * FROM presentity WHERE ...
So what changes if publ_cache=1?
thanks Klaus
On 02/20/2012 01:47 PM, Klaus Darilion wrote:
I have not read the code but I suspect with subs_db_mode=3 and publ_cache=0 the module does (without permissions checking):
incoming PUBLISH without e-tag: INSERT INTO presentity .....
incoming PUBLISH with e-tag:
Add here a query to verify that the publish record exists: SELECT * FROM presentity WHERE etag=..
UPDATE presentity ... WHERE etag='..' AND ......
incoming SUBSCRIBE without to-tag: INSERT INTO active_watchers ...; SELECT * FROM presentity WHERE ....
incoming SUBSCRIBE with to-tag:
Add here the query to verify that the subscribe dialog exists: SELECT * FROM active_watchers WHERE ..
UPDATE active_watchers ...; SELECT * FROM presentity WHERE ....
And then you have all the operations that are performed with subs_db_mode=3 and publ_cache=0.
So what changes if publ_cache=1?
If publ_cache=1, the performance improvement comes when a Subscribe arrives. Before querying the presentity table, the publish cache is checked to see whether there are any known active publications for that presentity. This way the query on the presentity table is made only when there is something to retrieve from it.
This could be extended to help with the read query when a Publish with e-tag arrives. Currently the etag is not stored in the cache, only the presentity URI.
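The cache check described above might be sketched as follows. This is an illustration under assumptions, not the module's implementation: the cache is modelled as a plain set of presentity URIs, expiry and removal are omitted, and `handle_publish`, `bodies_for_subscribe` and the query counter are hypothetical names.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE presentity (presentity_uri TEXT, etag TEXT, body TEXT)")

publ_cache = set()      # hypothetical cache: presentity URIs only, no etags
presentity_queries = 0  # counts SELECTs on presentity, for illustration


def handle_publish(presentity_uri, etag, body):
    """Store the publication in the DB and remember the URI in the cache."""
    db.execute("INSERT INTO presentity VALUES (?, ?, ?)", (presentity_uri, etag, body))
    publ_cache.add(presentity_uri)


def bodies_for_subscribe(presentity_uri, use_cache=True):
    """Return the published bodies needed for the initial NOTIFY."""
    global presentity_queries
    if use_cache and presentity_uri not in publ_cache:
        return []  # nothing published for this URI: skip the DB round trip
    presentity_queries += 1
    rows = db.execute(
        "SELECT body FROM presentity WHERE presentity_uri = ?", (presentity_uri,)
    ).fetchall()
    return [body for (body,) in rows]


handle_publish("sip:user1@example.com", "etag-1", "<presence/>")
hit = bodies_for_subscribe("sip:user1@example.com")
miss = bodies_for_subscribe("sip:user2@example.com")
```

Only the Subscribe for user1 reaches the presentity table; the one for user2 is answered from the cache alone.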
Regards, Anca
On 20.02.2012 13:31, Anca Vamanu wrote:
incoming PUBLISH with e-tag:
Add here a query to verify that the publish record exists: SELECT * FROM presentity WHERE etag=..
UPDATE presentity ... WHERE etag='..' AND ......
Is the SELECT really necessary? If only the UPDATE is done and it reports 0 affected rows, then respond with 412.
incoming SUBSCRIBE without to-tag: INSERT INTO active_watchers ...; SELECT * FROM presentity WHERE ....
incoming SUBSCRIBE with to-tag:
Add here the query to verify that the subscribe dialog exists: SELECT * FROM active_watchers WHERE ..
UPDATE active_watchers ...; SELECT * FROM presentity WHERE ....
Same here. With subs_db_mode=3 the SELECT could be avoided, sending back 481 in case of 0 affected rows.
Thus, during normal operation the SELECT query could be avoided, which saves at least 1 RTT between PS and DB. On the other hand, during "abnormal" operation (lots of 481 responses), the UPDATE query may take longer than the SELECT query.
Maybe this is an area where noSQL DBs could speed up look-ups and updates.
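The UPDATE-only idea can be shown with a minimal sketch. It assumes a simplified presentity table and uses Python's sqlite3 module in place of the real DB layer; `refresh_publish` is a hypothetical helper, not a function of the presence module.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE presentity (presentity_uri TEXT, etag TEXT, expires INTEGER)")
db.execute("INSERT INTO presentity VALUES ('sip:user1@example.com', 'etag-1', 3600)")


def refresh_publish(presentity_uri, etag, expires):
    """Refresh a publication with a single UPDATE and return the SIP status code."""
    cur = db.execute(
        "UPDATE presentity SET expires = ? WHERE presentity_uri = ? AND etag = ?",
        (expires, presentity_uri, etag),
    )
    # rowcount is the number of rows the UPDATE touched; 0 means no such etag.
    return 200 if cur.rowcount > 0 else 412


known = refresh_publish("sip:user1@example.com", "etag-1", 1800)
unknown = refresh_publish("sip:user1@example.com", "etag-9", 1800)
```

One caveat when carrying this over to MySQL: by default it reports the rows actually changed rather than the rows matched, unless the client connects with the CLIENT_FOUND_ROWS flag, so an UPDATE that rewrites identical values can report 0 even though the record exists.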
And then you have all the operations that are performed with subs_db_mode=3 and publ_cache=0.
So what changes if publ_cache=1?
If publ_cache=1, the performance improvement comes when a Subscribe arrives. Before querying the presentity table, the publish cache is checked to see whether there are any known active publications for that presentity. This way the query on the presentity table is made only when there is something to retrieve from it.
This could be extended to help with the read query when a Publish with e-tag arrives. Currently the etag is not stored in the cache, only the presentity URI.
OK, I understand it now.
Thanks Klaus
On 02/20/2012 03:26 PM, Klaus Darilion wrote:
Is the SELECT really necessary? If only the UPDATE is done and it reports 0 affected rows, then respond with 412.
If only it were like this :). I asked about this behavior a long time ago. Unfortunately the DB API of kamailio does not report '0 rows affected' when an update is performed. It only says whether the update was successful or not, with no indication of whether it actually found a match. And actually I don't see this possibility in the mysql library either:
http://dev.mysql.com/doc/refman/5.0/en/mysql-real-query.html
Regards, Anca
It should work with: http://dev.mysql.com/doc/refman/5.0/en/mysql-affected-rows.html
klaus
Right. A great new feature idea - add affected_rows() function in the DB API and for the backends that have this defined, use it instead of query before update. Probably there are more modules that could benefit from this.
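The proposal (use affected rows where the backend offers it, otherwise keep the query before the update) could look roughly like the sketch below. Everything here is hypothetical: the `Backend` class, its `has_affected_rows` flag and the simplified active_watchers columns are stand-ins for illustration, not the srdb1 API.

```python
import sqlite3


class Backend:
    """Hypothetical DB wrapper; `has_affected_rows` mimics an optional API call."""

    def __init__(self, has_affected_rows):
        self.has_affected_rows = has_affected_rows
        self.statements = 0  # statements issued through execute(), for illustration
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE active_watchers (callid TEXT, to_tag TEXT, expires INTEGER)"
        )
        self.db.execute("INSERT INTO active_watchers VALUES ('call-1', 'tag-1', 3600)")

    def execute(self, sql, params):
        self.statements += 1
        return self.db.execute(sql, params)


def refresh_subscription(backend, callid, to_tag, expires):
    """Return 200 if the dialog was refreshed, 481 if it does not exist."""
    update = "UPDATE active_watchers SET expires = ? WHERE callid = ? AND to_tag = ?"
    if backend.has_affected_rows:
        cur = backend.execute(update, (expires, callid, to_tag))
        return 200 if cur.rowcount > 0 else 481
    # Fallback for backends without affected rows: query before update.
    select = "SELECT 1 FROM active_watchers WHERE callid = ? AND to_tag = ?"
    if backend.execute(select, (callid, to_tag)).fetchone() is None:
        return 481
    backend.execute(update, (expires, callid, to_tag))
    return 200


fast, slow = Backend(True), Backend(False)
results = [
    refresh_subscription(b, "call-1", tag, 600)
    for b in (fast, slow)
    for tag in ("tag-1", "tag-x")
]
```

Both backends give the same answers, but the one with affected rows needs a single statement per request.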
Anca
On Monday 20 February 2012 15:18:27 Anca Vamanu wrote:
Right. A great new feature idea - add affected_rows() function in the DB API and for the backends that have this defined, use it instead of query before update. Probably there are more modules that could benefit from this.
It already exists in srdb1 and is implemented for MySQL. See commits 26f2169a0aef931b17b35e64a2aca580d82b6b1a and 588d1ffbb7b6e5074e3dbb6950b2149544ea1521
On 02/20/2012 07:44 PM, Alex Hermann wrote:
It already exists in srdb1 and is implemented for MySQL. See commits 26f2169a0aef931b17b35e64a2aca580d82b6b1a and 588d1ffbb7b6e5074e3dbb6950b2149544ea1521
Ah, great, thank you! I will update the presence code to make use of the affected_rows function.
Regards, Anca
Hi,
I just tried this updated presence module and ran into some problems.
The scenario I use is presence and RLS. The client SUBSCRIBEs to a single resource list on the RLS and to presence.winfo on the PS. The client PUBLISHes to the PS.
With the new code I get two error messages a lot:
Feb 29 11:31:18 pd-laptop-linux ./kamailio[22566]: ERROR: presence [subscribe.c:799]: wrong status
Feb 29 11:32:39 pd-laptop-linux ./kamailio[22566]: ERROR: presence [subscribe.c:1314]: wrong sequence number received: 4 - stored: 10
I also get segmentation faults when I change presence state or log out. Here is a fragment of the back-trace:
#0 0x00007f0447191438 in get_subs_db (pres_uri=0x7fff0ce09ea0, event=0x7f0443f15728, sender=<optimized out>, s_array=0x7fff0ce09bf0, n=0x7fff0ce09c00) at notify.c:1120
#1 0x00007f044719375f in get_subs_dialog (pres_uri=0x7fff0ce09ea0, event=0x7f0443f15728, sender=0x0) at notify.c:1238
#2 0x00007f04471994ce in query_db_notify (pres_uri=0x7fff0ce09ea0, event=0x7f0443f15728, watcher_subs=0x0) at notify.c:1353
#3 0x00007f04471b024e in update_subscription (msg=0x7f044d36ec38, subs=0x7fff0ce09ea0, to_tag_gen=<optimized out>, sent_reply=0x7fff0ce09ff0) at subscribe.c:452
#4 0x00007f04471be82a in handle_subscribe (msg=0x7f044d36ec38, str1=<optimized out>, str2=<optimized out>) at subscribe.c:805
#5 0x00000000004e50c5 in do_action (h=0x7fff0ce0c950, a=<optimized out>, msg=0x7f044d36ec38) at action.c:1116
I rolled back to the previous version of presence and it all works correctly as before.
Thanks,
Peter
Hi Peter,
Which version exactly did you use? There was a bug in the first version that I fixed last week. Also, can you please tell me which db mode you are using?
Regards, Anca
Hi,
I used the latest version (taken from git on Thursday evening - so after the last change).
I was using db mode 3 (DB only) and publ_cache set to 0.
Thanks,
Peter
Hi Peter,
Is this really the line where it is crashing?
1120: s.to_domain.len= strlen(s.to_domain.s);
Just to make sure we are speaking about the same sources, and because I find it hard to believe it could crash there.
Thanks, Anca
On 03/01/2012 12:39 PM, Peter Dunkley wrote:
Hi,
I used the latest version (taken from git on Thursday evening - so after the last change).
I was using db mode 3 (DB only) and publ_cache set to 0.
Thanks,
Peter
Hi,
I've been doing more investigation myself. It seems that even with the old presence module I can get it to crash here if the records in the DB are "broken" enough. So I think it was just a coincidence that I first spotted the crash with the new code. I suspect that with the two errors coming out of the new presence code (and the state things are in when they come out) it is just easier to get the DB records broken in the right way.
The error messages are actually the bigger issue as they are stopping presence and RLS working together.
Thanks,
Peter
Hi Peter,
So the cause was bad data in the db? Was that data corrupted by an external application, or did presence module insert bad data? I would really like to investigate this problem thoroughly to be sure there isn't a bug in the new code.
Regards, Anca
Hi,
For the bad data... I am not sure how I got it into that state. It happened to me today with the old presence code when I was changing my kamailio.cfg and things weren't being cleared out of the DB. I do not think this is a bug in the new presence module. I think it is in the current code as well. I have two theories:
1. There is a problem with the code in or around the loop in get_subs_db() so if you end up with more than one matching record you get a crash.
2. The db result in get_subs_db() is too big and maybe db_fetch_query() should be used.
The two error messages that were coming out (before, and separate from, the crash) point to an error in either the new presence module or RLS, and I think it is possible for records to get left in active_watchers between client sessions when those errors occur. If that is what is happening, then the crash in the new code could be happening for the same reason it happens in the old code. That is, there is a problem when there are multiple results from the query in get_subs_db().
Peter
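Theory 2 amounts to reading the result in bounded batches instead of holding one large result set. The sketch below shows only the shape of such a loop against a mock data source. `mock_fetch()` and `count_rows_batched()` are hypothetical stand-ins and do not reproduce the signature of db_fetch_query():

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BATCH 64

/* Mock "result": total rows available and how many were handed out. */
typedef struct {
    int total;
    int cursor;
} mock_result_t;

/* Hand out up to max_rows row indexes; returns the number fetched,
 * or 0 when the result is exhausted. */
static int mock_fetch(mock_result_t *r, int *rows, int max_rows)
{
    int n = 0;
    while (n < max_rows && r->cursor < r->total)
        rows[n++] = r->cursor++;
    return n;
}

/* Walk the whole result while holding at most `batch` rows at once.
 * Returns the number of rows processed, or -1 on a bad batch size. */
static int count_rows_batched(mock_result_t *r, int batch)
{
    int rows[MAX_BATCH];
    int seen = 0;
    int n;

    assert(r != NULL);
    if (batch <= 0 || batch > MAX_BATCH)
        return -1;
    while ((n = mock_fetch(r, rows, batch)) > 0)
        seen += n; /* a real loop would build one subscription per row here */
    return seen;
}
```

The point of the pattern is that memory use is bounded by the batch size rather than by the number of matching rows.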
-- Peter Dunkley Technical Director Crocodile RCS Ltd
Hi Anca!
I just wondered if race conditions are handled properly (DB-only mode)
E.g. there are 2 PUBLISH for a certain presentity (or a PUBLISH and reSUBSCRIBE) received almost at the same time. Processing of these PUBLISH can happen in different processes (or servers), thus when generating the NOTIFY I think it can happen that both processes try to UPDATE the cseq and e.g. use the wrong cseq when sending the NOTIFYs. Reading the cseq and increasing the cseq should happen in a transaction with a row lock.
Another approach would be to have a dedicated NOTIFYer process which takes care of sending NOTIFYs. On reSUBSCRIBE or PUBLISH a presentity gets marked for updates (e.g. in a separate table) and the NOTIFYer polls this table and sends the NOTIFYs in proper order.
What do you think about that?
regards Klaus
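The second idea above (mark a presentity for update, then let one NOTIFYer drain the marks in order) could look roughly like the following in-memory sketch. The names `notify_queue_t`, `nq_mark()` and `nq_next()` are hypothetical, and a real implementation would keep the marks in a shared table or shared memory rather than in a process-local array:

```c
#include <assert.h>
#include <string.h>

#define NQ_CAP 64
#define NQ_URI_MAX 128

/* FIFO of presentity URIs waiting for a NOTIFY (hypothetical). */
typedef struct {
    char uri[NQ_CAP][NQ_URI_MAX];
    int head;  /* index of the oldest entry */
    int count; /* number of queued entries */
} notify_queue_t;

static void nq_init(notify_queue_t *q)
{
    q->head = 0;
    q->count = 0;
}

/* Mark a presentity as needing a NOTIFY.
 * Returns 1 if newly queued, 0 if it was already pending,
 * -1 if the queue is full or the URI is too long. */
static int nq_mark(notify_queue_t *q, const char *uri)
{
    assert(q != NULL && uri != NULL);
    if (strlen(uri) >= NQ_URI_MAX)
        return -1;
    for (int i = 0; i < q->count; i++) {
        if (strcmp(q->uri[(q->head + i) % NQ_CAP], uri) == 0)
            return 0; /* already pending: keep its original position */
    }
    if (q->count == NQ_CAP)
        return -1;
    strcpy(q->uri[(q->head + q->count) % NQ_CAP], uri);
    q->count++;
    return 1;
}

/* Pop the oldest pending presentity into out (size NQ_URI_MAX).
 * Returns 1 if one was returned, 0 if the queue is empty. */
static int nq_next(notify_queue_t *q, char *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return 0;
    strcpy(out, q->uri[q->head]);
    q->head = (q->head + 1) % NQ_CAP;
    q->count--;
    return 1;
}
```

In this sketch the PUBLISH and reSUBSCRIBE handlers would only call nq_mark(), and the single NOTIFYer would call nq_next() in a loop, so the cseq for a given dialog is read and incremented in one place only.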
Hi Klaus,
I thought about these race conditions also.
The problem is a bit more complicated because with UDP we cannot actually guarantee that requests will be received by the destination in the order they were sent, even if they are sent by the same process. For example, in some cases the Notify is received before the 200 OK for the Subscribe, even though both are generated by the same process. So I don't know if there is much point in implementing more rigorous synchronization, considering this.
Of course, for TCP it would make more sense, and the 100% sure solution is indeed the second one that you proposed: having a specialized process to send out the Notifies. But this is the most complicated, as we might actually need a pool of processes so that we do not get blocked because, for example, one destination is unreachable. And then we would need to divide the Notifies per destination to ensure that one process receives all the Notifies for a certain destination.
The first solution that you propose is not that difficult to implement: have an array of locks and take the corresponding lock when updating a certain dialog (hashing on the callid, for example), and do the update immediately after the search, both under the lock. This would not ensure 100% synchronization, but it would be better than now.
Regards, Anca
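The lock-array idea can be sketched as below. This is an illustration under stated assumptions, not the module's code: it uses POSIX threads and process-local mutexes, whereas the server is multi-process and would need locks in shared memory, and `dialog_t`, `callid_slot()` and `next_notify_cseq()` are hypothetical names. It also only covers one server; with several servers sharing a DB, the lock would have to live in the database.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define LOCK_SLOTS 16
#define WORKER_ITERS 10000

/* One mutex per slot; dialogs hashing to the same slot share a lock. */
static pthread_mutex_t dlg_locks[LOCK_SLOTS];

/* Hypothetical stand-in for a stored subscription dialog. */
typedef struct {
    const char *callid;
    unsigned int cseq; /* last CSeq used for a NOTIFY */
} dialog_t;

static void dlg_locks_init(void)
{
    for (int i = 0; i < LOCK_SLOTS; i++)
        pthread_mutex_init(&dlg_locks[i], NULL);
}

/* Simple string hash (djb2) reduced to a lock slot. */
static unsigned int callid_slot(const char *callid)
{
    unsigned long h = 5381;
    for (; *callid; callid++)
        h = h * 33 + (unsigned char)*callid;
    return (unsigned int)(h % LOCK_SLOTS);
}

/* Read and increment the CSeq as one step under the slot lock,
 * returning the value this caller should use in its NOTIFY. */
static unsigned int next_notify_cseq(dialog_t *d)
{
    assert(d != NULL && d->callid != NULL);
    pthread_mutex_t *lk = &dlg_locks[callid_slot(d->callid)];
    pthread_mutex_lock(lk);
    unsigned int cseq = ++d->cseq; /* "search" and "update" together */
    pthread_mutex_unlock(lk);
    return cseq;
}

/* Worker used to exercise the lock from several threads at once. */
static void *cseq_worker(void *arg)
{
    for (int i = 0; i < WORKER_ITERS; i++)
        next_notify_cseq((dialog_t *)arg);
    return NULL;
}
```

Because every caller gets its value from next_notify_cseq(), two handlers working on the same dialog cannot both send a NOTIFY with the same cseq, although delivery order on the wire is still not guaranteed.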
Indeed, it is even more complex :-)
So, depending on the Event type - e.g. presence allows sending NOTIFY without waiting for the response, but xcap-diff does not allow it - we would have to queue notifications and then have a dedicated NOTIFYer who performs the notifications. I do not think that a single NOTIFYer can be blocked, as UDP is always non-blocking and TCP is now non-blocking too.
regards Klaus