when testing presence/rls, i'm getting a pua error when presence server handles subscribe:
Sep 8 20:36:12 siika /usr/sbin/pres-serv[22932]: INFO: Handling SUBSCRIBE sip:test@test.fi Sep 8 20:36:12 siika /usr/sbin/pres-serv[22932]: INFO: Routing locally generated NOTIFY to sip:rls@127.0.0.1:5082;transport=tcp Sep 8 20:36:12 siika /usr/sbin/pres-serv[22932]: INFO: presence [notify.c:1581]: NOTIFY sip:jh@test.fi via sip:rls@127.0.0.1:5082;transport=tcp on behalf of sip:test@test.fi for event presence Sep 8 20:36:12 siika /usr/sbin/pres-serv[22931]: INFO: <core> [tcp_main.c:2787]: quick connect for 0x7f7114cfa490 Sep 8 20:36:12 siika /usr/sbin/pres-serv[22933]: ERROR: pua [send_subscribe.c:689]: Could not convert temporary dialog into a dialog
what does pua error message 'Could not convert temporary dialog into a dialog' mean? when does it happen?
-- juha
In order to cope with the scenario when a NOTIFY is received on a new dialog before the 2XX to the SUBSCRIBE, pua creates a temporary dialog (for the NOTIFY to match against) when it sends the SUBSCRIBE. This temporary dialog is converted to a full dialog when the 2XX is received (information such as to-tag is filled in).
This error occurs when a matching temporary dialog can't be found in the hash table (or DB if in DB only mode).
The temporary dialog record cannot be created until after the SUBSCRIBE has been sent (the call-id and from-tag are generated by the TM module). This means, on a single server with RLS back-end subscriptions over the local loopback, it is theoretically possible for the SUBSCRIBE to be sent and the 2XX generated, sent, received, and processed before the temporary record is created.
This is much more likely to happen when using DB only mode than when using the in-memory hash (in fact, it was added and tested before the DB only mode changes and I never observed it happening with the in-memory hash).
If you are using DB only mode there is a fix for this. The new (start|end|abort)_transaction() and query_lock() DB APIs are used in pua to prevent this problem. However, these DB APIs are only supported by db_postgres at the moment.
Regards,
Peter
when testing presence/rls, i'm getting a pua error when presence server handles subscribe:
Sep 8 20:36:12 siika /usr/sbin/pres-serv[22932]: INFO: Handling SUBSCRIBE sip:test@test.fi Sep 8 20:36:12 siika /usr/sbin/pres-serv[22932]: INFO: Routing locally generated NOTIFY to sip:rls@127.0.0.1:5082;transport=tcp Sep 8 20:36:12 siika /usr/sbin/pres-serv[22932]: INFO: presence [notify.c:1581]: NOTIFY sip:jh@test.fi via sip:rls@127.0.0.1:5082;transport=tcp on behalf of sip:test@test.fi for event presence Sep 8 20:36:12 siika /usr/sbin/pres-serv[22931]: INFO: <core> [tcp_main.c:2787]: quick connect for 0x7f7114cfa490 Sep 8 20:36:12 siika /usr/sbin/pres-serv[22933]: ERROR: pua [send_subscribe.c:689]: Could not convert temporary dialog into a dialog
what does pua error message 'Could not convert temporary dialog into a dialog' mean? when does it happen?
-- juha
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
peter,
thanks for the explanation. i was using in rls db_mode=0 when i saw the error message. i then switched to db_mode=2 for rls just in case you had fixed something there and i haven't seen the error since then although i'm using mysql where your fix should not help.
The temporary dialog record cannot be created until after the SUBSCRIBE has been sent (the call-id and from-tag are generated by the TM module). This means, on a single server with RLS back-end subscriptions over the local loopback, it is theoretically possible for the SUBSCRIBE to be sent and the 2XX generated, sent, received, and processed before the temporary record is created.
i have exactly the above situation, i.e., backend subscriptions are send over loopback interface.
i'll keep on watching if i see the error again with db_mode=2. it would be nice to get your DB API enhancement implemented for mysql too.
-- juha
Hi,
I only use PostgreSQL, so I'd have no way to test changes made to db_mysql (which is why I only added the APIs to PostgreSQL). That said, I don't believe it will be hard to do and most (if not all) of the PostgreSQL code should be re-usable (although some of the locking stuff might be PostgreSQL specific - but even then there should be MySQL equivalents). If someone wants to do work on MySQL this I'd be happy to help out if I can.
The DB locking stuff (or rather its use in the presence modules - the locking API itself is fine I think) is still something of a work in progress. At the moment the locking is very strict everywhere to ensure that there is no chance of conflicts between processes and servers. It can probably be loosened off a bit in some areas. Loosening up the locking would increase performance, but it is a bit of balancing act finding the minimum set of locking needed to prevent problems.
The db_mode that affects this error should be the PUA one, not the RLS one.
If the problem is occurring in db_mode=0 in PUA there might be a way to fix it by moving the hash table locking around. At the moment the hash table is locked in the insert_htable() function (called sometime after the SUBSCRIBE is sent - that is before the call to tmb.t_request_outside() in send_subscribe()). If the hash table was locked before the SUBSCRIBE is sent and then unlocked after the insertion it wouldn't be possible for the 2XX to be processed (and the dialog converted) before the dialog record exists. The insert_htable() function would need tweaking (perhaps a second argument) so that it can be run without obtaining a lock itself in this case.
The downside to this is that locking the hash table for longer may reduce performance - but if you are seeing this problem perhaps it is necessary?
Regards,
Peter
peter,
thanks for the explanation. i was using in rls db_mode=0 when i saw the error message. i then switched to db_mode=2 for rls just in case you had fixed something there and i haven't seen the error since then although i'm using mysql where your fix should not help.
The temporary dialog record cannot be created until after the SUBSCRIBE has been sent (the call-id and from-tag are generated by the TM module). This means, on a single server with RLS back-end subscriptions over the local loopback, it is theoretically possible for the SUBSCRIBE to be sent and the 2XX generated, sent, received, and processed before the temporary record is created.
i have exactly the above situation, i.e., backend subscriptions are send over loopback interface.
i'll keep on watching if i see the error again with db_mode=2. it would be nice to get your DB API enhancement implemented for mysql too.
-- juha
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
peter,
i analyzed the pua error situation more closely. it happens when rls has db_mode=0 and does not happen when rls has db_mode=2. presence and pua both do not have db_mode set.
the test scenario is such where rls list sip:jh-buddies@test.fi consist of two entries: sip:test@test.fi and sip:foo@test.fi. the former is offline and the latter is online and has published its presence state.
the case where rls db_mode=0 goes like this.
sip:jh@test.fi rls subscribes sip:jh-buddies@test.fi, which sip proxy routes to rls/presence server, which handles the subscribe and (according to wireshark) replies to it with 200 ok:
Sep 9 13:36:59 siika /usr/sbin/sip-proxy[7472]: INFO: Routing SUBSCRIBE sip:jh-buddies@test.fi to presence server Sep 9 13:36:59 siika /usr/sbin/pres-serv[7313]: INFO: Handling SUBSCRIBE sip:jh-buddies@test.fi
then presence/rls server goes on and generates notify on this subscribe to sip:jh@test.fi, which sip proxy forwards to subscriber:
Sep 9 13:36:59 siika /usr/sbin/pres-serv[7313]: INFO: Routing locally generated NOTIFY sip:edxjbtay@192.98.103.10:58563;alias=192.98.103.10~59572~2;transport=tcp to sip:jh@test.fi from sip:jh-buddies@test.fi Sep 9 13:36:59 siika /usr/sbin/sip-proxy[7458]: INFO: Routing in-dialog NOTIFY sip:edxjbtay@192.98.103.10:58563;transport=tcp from sip:jh-buddies@test.fi to sip:192.98.103.10:59572;transport=tcp
at this point, presence server has NOT received 200 ok to this notify from subscriber sip:jh@test.fi yet.
then presence/rls server generates subscribe over loopback interface to one of the entries (sip:test@test.fi) on jh-buddies list, which presence/rls server handles, replies with 202 ok and generates notify to itself over loopback interface:
Sep 9 13:36:59 siika /usr/sbin/pres-serv[7313]: INFO: <core> [tcp_main.c:2787]: quick connect for 0x7ffaa38faed0 Sep 9 13:36:59 siika /usr/sbin/pres-serv[7313]: INFO: Routing locally generated SUBSCRIBE sip:test@test.fi to sip:test@test.fi from sip:jh@test.fi Sep 9 13:36:59 siika /usr/sbin/pres-serv[7316]: INFO: Handling SUBSCRIBE sip:test@test.fi Sep 9 13:36:59 siika /usr/sbin/pres-serv[7316]: INFO: Routing locally generated NOTIFY sip:rls@127.0.0.1:5082;transport=tcp to sip:jh@test.fi from sip:test@test.fi Sep 9 13:36:59 siika /usr/sbin/pres-serv[7316]: INFO: presence [notify.c:1581]: NOTIFY sip:jh@test.fi via sip:rls@127.0.0.1:5082;transport=tcp on behalf of sip:test@test.fi for event presence Sep 9 13:36:59 siika /usr/sbin/pres-serv[7313]: INFO: <core> [tcp_main.c:2787]: quick connect for 0x7ffaa391c890 Sep 9 13:36:59 siika /usr/sbin/pres-serv[7316]: INFO: Handling in-dialog NOTIFY sip:rls@127.0.0.1:5082;transport=tcp from sip:test@test.fi to sip:jh@test.fi
at this point, 200 ok to this notify has NOT been generated yet.
then wireshark shows that 200 ok arrives from sip:jh@test.fi to the notify that presence/rls server sent to it before the above requests. after that the error is reported to syslog:
Sep 9 13:36:59 siika /usr/sbin/pres-serv[7311]: ERROR: pua [send_subscribe.c:689]: Could not convert temporary dialog into a dialog
i'm not sure how this matches your explanation, because i don't see any notify received on a new dialog before 2XX to subscribe.
the pcap file is enclosed. i'll describe in next email, how things work when rls db_mode=2.
-- juha
peter,
when rls db_mode=2, the test described in previous email works fine and very differently as in the case when db_mode=0.
the difference is that after presence/rls server receives from sip:jh@test.fi subscribe on sip:jh-buddies@test.fi, it does NOT generate notify to it before it has subscribed both members of the list (sip:test@test.fi and sip:foo@test.fi) and gotten notify from both of them:
Sep 9 13:41:42 siika /usr/sbin/sip-proxy[7469]: INFO: SUBSCRIBE sip:jh-buddies@test.fi by jh@test.fi as sip:jh@test.fi from <192.98.103.10> is authorized Sep 9 13:41:42 siika /usr/sbin/sip-proxy[7469]: INFO: Routing SUBSCRIBE sip:jh-buddies@test.fi to presence server Sep 9 13:41:42 siika /usr/sbin/pres-serv[7692]: INFO: Handling SUBSCRIBE sip:jh-buddies@test.fi Sep 9 13:41:42 siika /usr/sbin/pres-serv[7692]: INFO: Routing locally generated SUBSCRIBE sip:test@test.fi to sip:test@test.fi from sip:jh@test.fi Sep 9 13:41:42 siika /usr/sbin/pres-serv[7692]: INFO: <core> [tcp_main.c:2787]: quick connect for 0x7f091036c340 Sep 9 13:41:42 siika /usr/sbin/pres-serv[7692]: INFO: Routing locally generated SUBSCRIBE sip:foo@test.fi to sip:foo@test.fi from sip:jh@test.fi Sep 9 13:41:42 siika /usr/sbin/pres-serv[7693]: INFO: Handling SUBSCRIBE sip:test@test.fi Sep 9 13:41:42 siika /usr/sbin/pres-serv[7693]: INFO: Routing locally generated NOTIFY sip:rls@127.0.0.1:5082;transport=tcp to sip:jh@test.fi from sip:test@test.fi Sep 9 13:41:42 siika /usr/sbin/pres-serv[7693]: INFO: presence [notify.c:1581]: NOTIFY sip:jh@test.fi via sip:rls@127.0.0.1:5082;transport=tcp on behalf of sip:test@test.fi for event presence Sep 9 13:41:42 siika /usr/sbin/pres-serv[7693]: INFO: Handling SUBSCRIBE sip:foo@test.fi Sep 9 13:41:42 siika /usr/sbin/pres-serv[7693]: INFO: Routing locally generated NOTIFY sip:rls@127.0.0.1:5082;transport=tcp to sip:jh@test.fi from sip:foo@test.fi Sep 9 13:41:42 siika /usr/sbin/pres-serv[7693]: INFO: presence [notify.c:1581]: NOTIFY sip:jh@test.fi via sip:rls@127.0.0.1:5082;transport=tcp on behalf of sip:foo@test.fi for event presence Sep 9 13:41:42 siika /usr/sbin/pres-serv[7693]: INFO: Handling in-dialog NOTIFY sip:rls@127.0.0.1:5082;transport=tcp from sip:test@test.fi to sip:jh@test.fi Sep 9 13:41:42 siika /usr/sbin/pres-serv[7693]: INFO: Handling in-dialog NOTIFY sip:rls@127.0.0.1:5082;transport=tcp from sip:foo@test.fi to sip:jh@test.fi Sep 9 13:41:44 siika /usr/sbin/pres-serv[7690]: INFO: Routing locally generated NOTIFY sip:rbjfhkmc@192.98.103.10:40359;alias=192.98.103.10~41792~2;transport=tcp to sip:jh@test.fi from sip:jh-buddies@test.fi Sep 9 13:41:44 siika /usr/sbin/pres-serv[7690]: INFO: <core> [tcp_main.c:2787]: quick connect for 0x7f09103bad18 Sep 9 13:41:44 siika /usr/sbin/sip-proxy[7475]: INFO: Routing in-dialog NOTIFY sip:rbjfhkmc@192.98.103.10:40359;transport=tcp from sip:jh-buddies@test.fi to sip:192.98.103.10:41792;transport=tcp
the notify to sip:jh@test.fi thus contains real presence status of both of the list members.
why the behavior changes depending on rls db_mode? pcap of this db_mode=2 test is enclosed.
-- juha
Hi Juha,
The temporary dialogs are in the pua table, not the rls tables. The rls db_mode parameter has no direct affect on the pua module.
However, this kind of issue is timing related. So by changing the rls db_mode you have changed the timing on your system and the problem has gone away - for now.
Regards,
Peter
peter,
when rls db_mode=2, the test described in previous email works fine and very differently as in the case when db_mode=0.
the difference is that after presence/rls server receives from sip:jh@test.fi subscribe on sip:jh-buddies@test.fi, it does NOT generate notify to it before it has subscribed both members of the list (sip:test@test.fi and sip:foo@test.fi) and gotten notify from both of them:
Sep 9 13:41:42 siika /usr/sbin/sip-proxy[7469]: INFO: SUBSCRIBE sip:jh-buddies@test.fi by jh@test.fi as sip:jh@test.fi from <192.98.103.10> is authorized Sep 9 13:41:42 siika /usr/sbin/sip-proxy[7469]: INFO: Routing SUBSCRIBE sip:jh-buddies@test.fi to presence server Sep 9 13:41:42 siika /usr/sbin/pres-serv[7692]: INFO: Handling SUBSCRIBE sip:jh-buddies@test.fi Sep 9 13:41:42 siika /usr/sbin/pres-serv[7692]: INFO: Routing locally generated SUBSCRIBE sip:test@test.fi to sip:test@test.fi from sip:jh@test.fi Sep 9 13:41:42 siika /usr/sbin/pres-serv[7692]: INFO: <core> [tcp_main.c:2787]: quick connect for 0x7f091036c340 Sep 9 13:41:42 siika /usr/sbin/pres-serv[7692]: INFO: Routing locally generated SUBSCRIBE sip:foo@test.fi to sip:foo@test.fi from sip:jh@test.fi Sep 9 13:41:42 siika /usr/sbin/pres-serv[7693]: INFO: Handling SUBSCRIBE sip:test@test.fi Sep 9 13:41:42 siika /usr/sbin/pres-serv[7693]: INFO: Routing locally generated NOTIFY sip:rls@127.0.0.1:5082;transport=tcp to sip:jh@test.fi from sip:test@test.fi Sep 9 13:41:42 siika /usr/sbin/pres-serv[7693]: INFO: presence [notify.c:1581]: NOTIFY sip:jh@test.fi via sip:rls@127.0.0.1:5082;transport=tcp on behalf of sip:test@test.fi for event presence Sep 9 13:41:42 siika /usr/sbin/pres-serv[7693]: INFO: Handling SUBSCRIBE sip:foo@test.fi Sep 9 13:41:42 siika /usr/sbin/pres-serv[7693]: INFO: Routing locally generated NOTIFY sip:rls@127.0.0.1:5082;transport=tcp to sip:jh@test.fi from sip:foo@test.fi Sep 9 13:41:42 siika /usr/sbin/pres-serv[7693]: INFO: presence [notify.c:1581]: NOTIFY sip:jh@test.fi via sip:rls@127.0.0.1:5082;transport=tcp on behalf of sip:foo@test.fi for event presence Sep 9 13:41:42 siika /usr/sbin/pres-serv[7693]: INFO: Handling in-dialog NOTIFY sip:rls@127.0.0.1:5082;transport=tcp from sip:test@test.fi to sip:jh@test.fi Sep 9 13:41:42 siika /usr/sbin/pres-serv[7693]: INFO: Handling in-dialog NOTIFY sip:rls@127.0.0.1:5082;transport=tcp from sip:foo@test.fi to sip:jh@test.fi Sep 9 13:41:44 siika /usr/sbin/pres-serv[7690]: INFO: Routing locally generated NOTIFY sip:rbjfhkmc@192.98.103.10:40359;alias=192.98.103.10~41792~2;transport=tcp to sip:jh@test.fi from sip:jh-buddies@test.fi Sep 9 13:41:44 siika /usr/sbin/pres-serv[7690]: INFO: <core> [tcp_main.c:2787]: quick connect for 0x7f09103bad18 Sep 9 13:41:44 siika /usr/sbin/sip-proxy[7475]: INFO: Routing in-dialog NOTIFY sip:rbjfhkmc@192.98.103.10:40359;transport=tcp from sip:jh-buddies@test.fi to sip:192.98.103.10:41792;transport=tcp
the notify to sip:jh@test.fi thus contains real presence status of both of the list members.
why the behavior changes depending on rls db_mode? pcap of this db_mode=2 test is enclosed.
-- juha
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
Peter Dunkley writes:
However, this kind of issue is timing related. So by changing the rls db_mode you have changed the timing on your system and the problem has gone away - for now.
peter,
in case of rls db_mode=0, rls server sends notify to subscriber 3 ms after it has sent 200 ok to subscribe request.
in case of rls db_mode=2, rls server starts to send subscribe requests to list members about 100 ms after it has sent sent 200 ok to subscribe request and waits until it has received all notify requests from members before it sends notify to subscriber.
i have hard time understanding, why in db_mode=0 rls server sends notify to subscriber so fast and does not wait for notify requests from list members.
anyway, the pua error that happens when rls db_mode=0, is very serious. after the pua error message, i have noticed that presence server gets totally stuck and stops processing or receiving any new request.
-- juha
Hi,
When RLS is in DB only mode it uses separate notifier processes. This means that NOTIFYs are never sent immediately in DB only mode, but may be in non DB only mode. The use of separate notifier processes fixes a number of race conditions (but not the one you are seeing), spreads the NOTIFY generation load evenly over time, and makes the RLS module behaviour closer to that from the RFC. However, the implementation of the notifier processes is such that DB only mode is a pre-requisite.
When RLS receives a SUBSCRIBE and it is not in DB only mode it will send an immediate full-state NOTIFY containing the list of contacts and any presentity information present for this dialog in the rls_presentity table. When this is an initial SUBSCRIBE there will be nothing relating to the dialog in the rls_presentities table. This means that the first NOTIFY sent from RLS will just be a full-state NOTIFY containing a list of the contacts. The NOTIFY requests containing aggregated presentities from the contacts will start to come out (on timed intervals) once NOTIFYs have been received back from presence.
When RLS receives a SUBSCRIBE and it is in DB only mode (and the notifier parameters are at the default values), RLS will send a full-state NOTIFY within five seconds. This full-state NOTIFY will contain the list of contacts and any presentity information present for this dialog in the rls_presentity table. Because this NOTIFY will be sent up to five seconds after the SUBSCRIBE was received there is a good possibility that some (or all) back-end subscriptions will have been created and notified. This means that the first NOTIFY sent from RLS may just be a full-state NOTIFY containing a list of the contacts, but it may also be a full-state NOTIFY containing a list of contacts and some (or all) of the contacts presentities.
However, the "temporary dialog" error relates only to back-end traffic between RLS and presence (the pua database table or hash table). The DB mode parameter in RLS relates only to front-end traffic between the user-agent and RLS (the rls_watchers database table or hash table). So changing the DB mode of RLS has no direct effect on how pua uses its database table or hash table - but it may change the timing (additional processes are started, the order things happen changes, and so on).
Regards,
Peter
Peter Dunkley writes:
However, this kind of issue is timing related. So by changing the rls db_mode you have changed the timing on your system and the problem has gone away - for now.
peter,
in case of rls db_mode=0, rls server sends notify to subscriber 3 ms after it has sent 200 ok to subscribe request.
in case of rls db_mode=2, rls server starts to send subscribe requests to list members about 100 ms after it has sent sent 200 ok to subscribe request and waits until it has received all notify requests from members before it sends notify to subscriber.
i have hard time understanding, why in db_mode=0 rls server sends notify to subscriber so fast and does not wait for notify requests from list members.
anyway, the pua error that happens when rls db_mode=0, is very serious. after the pua error message, i have noticed that presence server gets totally stuck and stops processing or receiving any new request.
-- juha
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
peter,
mysql has two table locking modes: READ, which seems to correspond to postgre EXCLUSIVE lock, and WRITE, which seems to correspond to postgre ACCESS EXCLUSIVE lock. the difference seems to be that in postgre table locking can only be done inside transaction whereas in mysql table can be locked also outside of transaction.
i didn't find in the sources a description of current kamailio db api. lib/srdb1/README claims that it is in doc directory, but i have not found it. perhaps someone more familiar with kamailio db api could check if also table locking could be added to db_mysql module or if the transaction difference is a show stopper.
-- juha