New subject: Kamailio 5.0 - Better way of dealing with stateless CANCELs

14 Mar 2016


Currently (AFAIK), restarting Kamailio in the middle of production call
processing is basically "safe". Apart from the few seconds of
unavailability while it restarts (which just results in retransmissions
until it is back), most things will happen "correctly" after the restart
even though TM state has been lost:
(1) Initial requests will be routed as initial requests always were;
(2) In-dialog requests will be loose-routed as sequential requests 
always were;
(3) Replies to open transactions will fall back to stateless routing but 
will be delivered correctly to their destinations based on SIP 
fundamentals (i.e. Via).
(4) rtpproxy & rtpengine control messages are grouped by Call-ID, so 
also stateless. If the proper destroy/remove functions are not called 
from failure_route[] due to lack of TM state, it's not so bad; rtpproxy 
& rtpengine will see an RTP timeout after a while and expire the 
bindings on their own.
If dialog state is used, it will be lost, but assuming one is willing to
live with that, it's okay. I don't know whether any work has been done
on a persistence layer for dialog that can be re-read completely on
startup, or whether it actually works (does it?). Either way, losing
dialog state is a relatively small price to pay if it's important to
integrate a change into production in the middle of the day.
The one exception is CANCEL handling. CANCEL is a special animal, since 
it's a hop-by-hop (branch-level) request, so CANCELs sent from a caller 
apply to the 'caller -> Kamailio' branch. Kamailio generates separate 
CANCELs endogenously for one or more 'Kamailio -> gateway' branches.
Stateful CANCEL handling with TM is implemented using t_check_trans() or 
t_relay_cancel(). For example, in the stock config[1]:

    # CANCEL processing
    if (is_method("CANCEL")) {
        if (t_check_trans()) {
            route(RELAY);
        }
        exit;
    }

Or, as in our case, more folklorically:

    if (is_method("CANCEL")) {
        if (!t_relay_cancel()) {
            # Corresponding INVITE transaction found, but error
            # occurred.
            sl_send_reply("500", "Internal Server Error");
            exit;
        }
        # Corresponding INVITE transaction for CANCEL was not
        # found.
        exit;
    }

In both cases, the corresponding INVITE transaction must exist.
Unfortunately, there's no good alternative. According to RFC 3261 
Section 16.11 ("Stateless Proxy"):

    Stateless proxies MUST NOT perform special processing for
    CANCEL requests. They are processed by the above rules
    as any other requests.

So, in other words, route_logic(CANCEL) == route_logic(initial INVITE).
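
For the deterministic case, here is a minimal sketch of what that equivalence might look like in config terms. It assumes the initial INVITE was routed with the dispatcher module via ds_select_dst("1", "0"); set 1 and the hash-over-Call-ID algorithm are illustrative choices, not something taken from our setup. It also assumes the gateway set has not changed between the INVITE and the CANCEL, and it says nothing about whether the next hop will actually match the statelessly forwarded CANCEL to its INVITE:

```
# CANCEL processing with a stateless fallback (sketch, untested)
if (is_method("CANCEL")) {
    if (t_check_trans()) {
        # INVITE transaction is still known to TM: normal stateful path.
        route(RELAY);
        exit;
    }
    # No matching transaction (e.g. TM state lost in a restart).
    # Repeat the destination selection used for the INVITE.
    # Algorithm "0" hashes over Call-ID, so the same Call-ID should
    # resolve to the same gateway as long as set 1 is unchanged.
    if (ds_select_dst("1", "0")) {
        # Stateless forward to the destination URI set by dispatcher.
        forward();
    }
    exit;
}
```
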
Sometimes, this is possible - with considerable config logic labour - 
but other times the path taken by the CANCEL is not so deterministic, as 
for example with round-robin load balancing, random distribution, 
complex LCR, etc. One is basically left in these situations with the 
choice of implementing one's own Call-ID/branch => destination 
persistent state database of some kind, which is, to put it mildly, 
complicated and undesirable.
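
To give a rough idea of what such a Call-ID => destination store involves, here is a sketch using the htable module with database write-back. The table name "cdst", the DB URL and the "cancel_dst" DB table are placeholders. It keys on Call-ID alone, so it ignores parallel forking and failover to a second branch entirely, and it assumes the routing logic has already put the chosen gateway into $du. It is untested, and only as persistent as a clean shutdown allows:

```
loadmodule "htable.so"

# Placeholders: DB URL, hash table name "cdst", DB table "cancel_dst".
# dbtable: load the table contents from the database at startup.
# dbmode=1: write the table contents back to the database at shutdown.
# autoexpire=300: drop entries after 300 seconds.
modparam("htable", "db_url", "mysql://kamailio:kamailiorw@localhost/kamailio")
modparam("htable", "htable", "cdst=>size=12;autoexpire=300;dbtable=cancel_dst;dbmode=1;")

# In the initial INVITE path, after the destination has been chosen
# and before t_relay(): remember where this Call-ID was sent.
if (is_method("INVITE") && $du != $null) {
    $sht(cdst=>$ci) = $du;
}

# In the CANCEL path: fall back to the stored destination when TM
# no longer knows the INVITE transaction.
if (is_method("CANCEL")) {
    if (t_check_trans()) {
        route(RELAY);
        exit;
    }
    if ($sht(cdst=>$ci) != $null) {
        $du = $sht(cdst=>$ci);
        forward();
    }
    exit;
}
```

Even this much glosses over the hard parts (multiple branches per call, entries written between the last DB sync and a crash), which is rather the point.
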
Now, if the CANCEL is lost but the INVITE transaction receives a final
negative reply anyway, this will get back to the calling UAC, and it
will process it correctly. However, some calls get answered with 2xx.
Many UACs will behave reasonably in this situation: when they don't
receive a 200 OK for their CANCELs but later receive an answer, they
will go ahead and send the end-to-end ACK, then BYE the call. However,
they cannot be reliably counted upon to do this. Some simply drop the
INVITE transaction after their CANCEL has gone unreplied for a short
time, regardless of whether they have received a final negative reply
to the INVITE.
Is there a better way? Perhaps a feature can be devised by which
Kamailio keeps some kind of lightweight, restart-persistent map of
where to send the CANCELs? Or perhaps TM is due for a feature that
allows the shm transaction table to be dumped to disk and persisted
across restarts?
Comments welcome. Also, if I'm missing something, please let me know!
-- Alex
[1] https://github.com/kamailio/kamailio/blob/master/etc/kamailio.cfg#L466
-- 
Alex Balashov | Principal | Evariste Systems LLC
1447 Peachtree Street NE, Suite 700
Atlanta, GA 30309
United States

Tel: +1-800-250-5920 (toll-free) / +1-678-954-0671 (direct)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/