On 1/30/12 6:35 PM, Peter Dunkley wrote:

Any retransmission will cause the problem, so anyone using UDP over the Internet to a Kamailio presence server where there is occasional packet-loss will see it.  It was just first noticed here under heavy load.

By creating a new transaction and absorbing the retransmissions, do you mean calling t_newtran()/t_release() when the SUBSCRIBE is received?

Yes, as in the default config: call t_newtran() before handling the SUBSCRIBE. t_check_trans() is used to figure out if there is already a transaction for that request, and it absorbs the request if it is a retransmission.

I am not sure t_release() is explicitly needed anymore. Andrei did some work in this area a long time ago, iirc, but it is harmless if used, so it is still in the default config.
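
To illustrate the idea of absorbing retransmissions at the transaction layer, here is a toy model in C. It is not the tm module's code: the table, the toy_newtran() name, and the use of a single string key (for example the Via branch) are assumptions made for this sketch, and real transaction matching in tm follows RFC 3261 and is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model only. A transaction is keyed by one hypothetical string,
 * e.g. the Via branch parameter of the request. */
#define MAX_TRANS 64
#define KEY_LEN   128

struct trans_table {
    char keys[MAX_TRANS][KEY_LEN];
    int  count;
};

/* Returns true if the request opens a new transaction and should be
 * processed. Returns false if it matches an existing transaction (a
 * retransmission to absorb) or if the table cannot track it; in both
 * cases the caller should not run the SUBSCRIBE handling again. */
bool toy_newtran(struct trans_table *t, const char *key)
{
    assert(t != NULL && key != NULL);

    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->keys[i], key) == 0)
            return false;   /* same transaction seen before: absorb */
    }
    if (t->count == MAX_TRANS || strlen(key) >= KEY_LEN)
        return false;       /* table full or key too long */

    strcpy(t->keys[t->count++], key);
    return true;
}
```

With a zero-initialised table, the first call for a given key returns true and any repeat of the same key returns false, so the presence handling (and its database work) would run only once per transaction.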


If so I didn't think of that.  It'd make sense to do that too.  I think the presence module should cope with retransmissions (especially as we need it to cope in a multi-server environment with load-balancers/fail-over and a shared database).  But if using t_newtran()/t_release() will handle retransmissions in the normal case it should help reduce the load on the database.



On Mon, 2012-01-30 at 18:26 +0100, Daniel-Constantin Mierla wrote:

It can be held for the next minor release so it gets more testing, if you feel that is better (we have to include something there as well :-) ). From the commit log I understood that this usually happens under heavy RLS load with retransmissions. Doesn't it help to create the transaction and absorb the retransmissions with tm?


On 1/30/12 6:19 PM, Peter Dunkley wrote:

I believe that this bug also affects the 3.2 branch, but the change is quite big and with the next release of 3.2 due tomorrow I thought it best to hold off "cherry-pick"ing it until after the release.  That is, unless anyone else thinks it should go in there?



On Mon, 2012-01-30 at 18:16 +0100, Peter Dunkley wrote:
Module: sip-router
Branch: master
Commit: e6a50c5c0957a5ad3e08e57ede5be775a41ac24f
URL:    http://git.sip-router.org/cgi-bin/gitweb.cgi/sip-router/?a=commit;h=e6a50c5c0957a5ad3e08e57ede5be775a41ac24f

Author: pd <peter.dunkley@crocodile-rcs.com>
Committer: pd <peter.dunkley@crocodile-rcs.com>
Date:   Mon Jan 30 17:06:42 2012 +0000

modules_k/presence: Improved handling of retransmitted SUBSCRIBE requests

- handle_subscribe() doesn't handle retransmitted SUBSCRIBEs properly. This was
  noticed with back-end SUBSCRIBEs from RLS under heavy load (also tried TCP
  here but under load this caused a different set of problems with buffer
  sizes and buffers taking too long to process).
- Although this was originally observed with RLS back-end SUBSCRIBEs it
  appears to be a general issue when UDP is used.
- There were two main problems:
  1) On an un-SUBSCRIBE the record in the hash-table or DB will be removed.  If
     the un-SUBSCRIBE is retransmitted there is no record to be found and
     handle_subscribe() fails.
  2) After fixing 1, and on re-SUBSCRIBE, remote CSeq's with lower than
     expected values cause failures.  This can also happen when there are
- The fix was to catch both these cases and treat them as a special class of
  error.  In these two cases and when the protocol is UDP, a correct-looking
  2XX response is sent, but no further processing (database updates, sending
  NOTIFY, and so on) is performed on the SUBSCRIBE request.
- Also modified the query in get_database_info() to just use Call-ID, To-tag,
  and From-tag for dialog matching (so it duplicates the query from
  get_stored_info()) as the query that was there looked a little aggressive.
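
The two special cases described in the commit message can be sketched as a small classification function. This is not the code from the commit: the type names, classify_subscribe(), and the "CSeq not greater than the stored value" test are assumptions for illustration, and the function is assumed to be called only for in-dialog SUBSCRIBEs (those that should match an existing record).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sub_action {
    SUB_PROCESS,        /* normal handling: update storage, send NOTIFY */
    SUB_REPLY_2XX_ONLY, /* looks like a retransmission: reply 2XX, do no more */
    SUB_ERROR           /* treat as a genuine failure */
};

/* Hypothetical view of a stored subscription dialog. */
struct dialog_record {
    unsigned int last_remote_cseq;
};

/* Hypothetical view of the fields of an in-dialog SUBSCRIBE that matter here. */
struct sub_request {
    unsigned int cseq;
    unsigned int expires;   /* 0 means un-SUBSCRIBE */
    bool         is_udp;
};

/* rec is NULL when no record was found in the hash table or DB. */
enum sub_action classify_subscribe(const struct dialog_record *rec,
                                   const struct sub_request *req)
{
    assert(req != NULL);

    if (rec == NULL) {
        /* Case 1: the record was already removed by an earlier un-SUBSCRIBE. */
        if (req->expires == 0 && req->is_udp)
            return SUB_REPLY_2XX_ONLY;
        return SUB_ERROR;
    }
    if (req->cseq <= rec->last_remote_cseq) {
        /* Case 2: remote CSeq lower than expected. */
        return req->is_udp ? SUB_REPLY_2XX_ONLY : SUB_ERROR;
    }
    return SUB_PROCESS;
}
```

Keeping the decision separate from the reply and storage code makes it explicit that the 2XX-only path is taken solely over UDP, as the commit message states, while the same conditions over other transports remain errors.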

Daniel-Constantin Mierla -- http://www.asipto.com
http://linkedin.com/in/miconda -- http://twitter.com/miconda