Folks,
I've been having a bit of a battle with a concurrency issue.
If we have a reasonable number of contacts in an RLS resource list (around 50 does it on my test server), we see the following error message between 2 and 6 times whenever the client logs in:

ERROR: rls [resource_notify.c:663]: no presence dialog record for non-TERMINATED state uri pres_uri = sip:0033@lab8.croc.internal watcher_uri = sip:ernie@lab8.croc.internal

(I've extended the debug output here to include the URIs, so I can see what is not being found.)
It is not always the same URIs that go missing, nor is it always the same number of faults.
On investigation this turns out to be a race condition.
subs_cback_func (pua/send_subscribe.c) locks the presentity hash table and inserts a dialog entry when it receives a 200 to the SUBSCRIBE. rls_handle_notify (rls/resource_notify.c) calls pua_get_record_id (pua/hash.c get_record_id()), which also locks the presentity hash table and looks up the dialog.
It seems that in some cases the NOTIFY is getting the lock before the 200 to the SUBSCRIBE. Thus the NOTIFY handler is looking for the dialog before the 200 handler has inserted it.
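To make the ordering problem concrete, here is a minimal sketch in C. It is not the pua or rls code: the table layout and the names dialog_table, handle_200_ok, handle_notify and notify_finds_dialog are all hypothetical, and the two handlers are replayed one after the other rather than run in real worker processes. It only shows that the lock makes each access safe but does nothing to decide which handler gets there first.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_DIALOGS 8

/* Hypothetical stand-in for the pua presentity hash table. */
struct dialog_table {
    pthread_mutex_t lock;
    const char *pres_uri[MAX_DIALOGS];
    size_t count;
};

static void table_init(struct dialog_table *t)
{
    pthread_mutex_init(&t->lock, NULL);
    t->count = 0;
}

/* Models the 200 OK handler: take the lock, insert the dialog.
 * Returns 0 on success, -1 if the table is full. */
static int handle_200_ok(struct dialog_table *t, const char *uri)
{
    int rc = -1;

    assert(uri != NULL);
    pthread_mutex_lock(&t->lock);
    if (t->count < MAX_DIALOGS) {
        t->pres_uri[t->count++] = uri;
        rc = 0;
    }
    pthread_mutex_unlock(&t->lock);
    return rc;
}

/* Models the NOTIFY handler: take the same lock, look the dialog up.
 * Returns 1 if the dialog is found, 0 otherwise. */
static int handle_notify(struct dialog_table *t, const char *uri)
{
    int found = 0;
    size_t i;

    assert(uri != NULL);
    pthread_mutex_lock(&t->lock);
    for (i = 0; i < t->count; i++) {
        if (strcmp(t->pres_uri[i], uri) == 0) {
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return found;
}

/* Replays one of the two possible orderings and reports whether the
 * NOTIFY handler found the dialog record. */
int notify_finds_dialog(int notify_first)
{
    struct dialog_table t;
    const char *uri = "sip:0033@lab8.croc.internal";
    int found;

    table_init(&t);
    if (notify_first) {
        found = handle_notify(&t, uri);   /* lookup runs before the insert */
        handle_200_ok(&t, uri);
    } else {
        handle_200_ok(&t, uri);
        found = handle_notify(&t, uri);   /* lookup runs after the insert */
    }
    pthread_mutex_destroy(&t.lock);
    return found;
}
```

Calling notify_finds_dialog(0) models the expected order (200 first) and the lookup succeeds; notify_finds_dialog(1) models the faulty order and the lookup comes back empty, which corresponds to the "no presence dialog record" error above.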
I attempted to insert a dialog entry in the hash table on sending the SUBSCRIBE; unfortunately, this did not cure the problem.
Has anyone any suggestions for the cleanest and easiest method to ensure that the 200 is handled before the NOTIFY?
Andy Miller
Crocodile RCS