[OpenSER-Users] 200 OK retransmissions on missing ACK can cause subsequent calls to fail

Sean O'Donnell skodonnell at ureach.com
Tue Apr 22 18:17:52 CEST 2008


Hi all,

I’m using openser as a call distributor/proxy between a soft-switch/SBC and
voicemail platform.  I’m seeing a problem with openser in that it is sometimes
cancels an in-progress call (fr_inv_timer firing) because it didn’t match the
200/OK with the call.

After some investigation, I noticed that this was happening after a missing ACK
on a previous call caused the voicemail platform to retransmit 200/OK responses
beyond the TM wt_timer expiration, which in turn left several openser child
processes (those that received a 200 after wt_timer expiration) in a state such
that they might not properly match transactions on subsequent calls.  

My setup:
I have openser 1.2.0 operating on a linux box with two network interfaces, with
one interface (call it the outside interface) taking incoming calls from the
soft-switch, and the other (inside) connected to the VM platform.  I have
openser configured to use both interfaces (see config below) and the TM wt_timer
set to 5 seconds (default).  As this is a voicemail system, all of the call
traffic is inbound from the soft-switch.   Given the traffic flow, for the most
part the openser child processes servicing the inside interface are handling
responses (180,183,200) from the VM platform.

Call scenario:
When an INVITE arrives from the soft-switch, openser forwards it to the VM
platform.  The VM platform responds with a 180 and then a 200.  I've noticed
several instances where the soft-switch did not respond with an ACK.  This
caused the VM platform to retransmit the 200 several times over a 10 second
period.   These were absorbed correctly by openser for the duration of wt_timer.
 After the timer expired, however, each openser child process that received a
retransmitted 200 logged something like this:
 4(2715) DEBUG: t_reply_matching: hash 45870 label 727647196 branch 0
 4(2715) DEBUG: t_reply_matching: no matching transaction exists
 4(2715) DEBUG: t_reply_matching: failure to match a transaction
 4(2715) DEBUG: t_check: end=(nil)

When I look at the TM code, the static variable T in t_lookup.c is now NULL for
this child process.  

On a subsequent inbound call, the INVITE is passed to the VM correctly, and the
180 transaction matches (causing the fr_inv_timer to be armed).  If the 200 is
read by child proc 2715, I see: 
 4(2715) DEBUG: t_check: start=(nil)
 4(2715) DEBUG: t_check: T previously sought and not found

The 200 is forwarded back to the soft-switch, which responds with an ACK.  Both
end-points think the call is up, but since openser never matched the 200 with
the call, the fr_inv_timer fires and cancels the call.   Basically, child proc
2715 won’t match any transaction after this unless it happens to process a
request.

I think this problem is made worse by the fact that I’m using two network
interfaces, and that the openser children on the inside interface handle (for
the most part) only responses.  This problem was touched on here:
http://lists.openser.org/pipermail/users/2007-November/014188.html   but I
didn’t see any follow up.    Also, I’ve checked openser 1.2.3 and 1.3.1 for
fixes, but I don’t think this has been addressed.

I have a work around, I think, by upping the wt_timer to something like 15
seconds, but I was wondering if there is any scenario in which leaving T=NULL is
desirable.

Thanks in advance
Sean


==== CONFIG =====
#
# VMPROXY-PROXY - route between softswitch and media servers
#

# ----------- global configuration parameters ------------------------

check_via=yes   # (cmd. line: -v)
dns=no     # (cmd. line: -r)
rev_dns=no      # (cmd. line: -R)
children=8

# multi-homed (i.e. listen on two or more interfaces)
mhomed=1
alias=sipproxy1
alias=sipproxy1.vmproxy.com

port=5060
#inside address
listen=192.168.1.100
#outside address
listen=10.55.2.100

disable_dns_blacklist=yes

# ------------------ module loading ----------------------------------

# Uncomment this if you want to use SQL database
#loadmodule "/usr/local/openser/lib/openser/modules/mysql.so"
mpath="/usr/local/openser/lib/openser/modules"
loadmodule "sl.so"
loadmodule "tm.so"
loadmodule "rr.so"
loadmodule "maxfwd.so"
loadmodule "usrloc.so"
loadmodule "registrar.so"
loadmodule "exec.so"
loadmodule "xlog.so"
loadmodule "textops.so"
loadmodule "options.so"
loadmodule "mi_fifo.so"

# Uncomment this if you want digest authentication
# mysql.so must be loaded !
#loadmodule "/usr/local/openser/lib/openser/modules/auth.so"
#loadmodule "/usr/local/openser/lib/openser/modules/auth_db.so"

# ----------------- setting module-specific parameters ---------------

# -- usrloc params --
# local memory
#modparam("usrloc", "db_mode",   0)
# -- rr params --
# add value to ;lr param to make some broken UAs happy
modparam("rr", "enable_full_lr", 1)
# uncomment this for multiple record routes
#modparam("rr", "enable_double_rr", 1)

# -- tm params --
modparam("tm", "fr_timer", 5)
modparam("tm", "fr_inv_timer", 60)

# -- mi_fifo parms
modparam("mi_fifo", "fifo_name", "/tmp/openser_fifo_proxy")

# -------------------------  request routing logic -------------------

# main routing logic

route{

        # initial sanity checks -- messages with
        # max_forwards==0, or excessively long requests
        if (!mf_process_maxfwd_header("10")) {
                sl_send_reply("483","Too Many Hops");
                return;
        };
        if ( msg:len > max_len ) {
                sl_send_reply("513", "Message too big");
                return;
        };

        xlog("L_INFO","$Tf VMPROXY: <$rm> message From:<$fu> To:<$tu>
srcip:<$si>\n");

        #
        # we record-route all messages -- to make sure that
        # subsequent messages will go through our proxy; that's
        # particularly good if upstream and downstream entities
        # use different transport protocol
        #
        record_route();

        #
        # loose-route processing
        # messages that already have routes on them are proxied here
        #
        if (loose_route()) {
                t_relay();
                return;
        };

        #
        # message not for us, route it through
        #
        if ( !(uri==myself) ) {
                if (!t_relay()) {
                        sl_reply_error();
                };
                return;
        };

        xlog("L_DBG","$Tf VMPROXY: <$rm> for me\n");

        # handle options meant for me
        if( method == "OPTIONS" ) {
                options_reply();
                return;
        };

        if( method == "REGISTER" || method == "PUBLISH" || method == "SUBSCRIBE"
|| method == "MESSAGE" ) {
                sl_send_reply("405","Method Not Allowed");
                return;
        };


        # route INVITE based on source
        if( method == "INVITE" ) {
                exec_dset("/usr/local/vmproxy/bin/routeINVITE.sh");
                xlog("L_DBG","$Tf VMPROXY: <$rm> after routeINVITE: new
destination <$ru>\n");
                t_on_failure("1");
        };

        # route NOTIFY to softswitch
        if( method == "NOTIFY" ) {
                exec_dset("/usr/local/vmproxy/bin/routeNOTIFY.sh");
                xlog("L_DBG","$Tf VMPROXY: <$rm> after routeNOTIFY: new
destination <$ru>\n");
                t_on_failure("2");
        };

        #
        # NOTE: this logic is only executed if the request didn't have
        # a route on it (tsk,tsk).  If it did, it would have been handed via
        # loose route function above.
        if( method == "ACK"  || method == "BYE" || method == "CANCEL" || method
== "PRACK" ) {
                exec_dset("/usr/local/vmproxy/bin/routeACK.sh");
                xlog("L_DBG","$Tf VMPROXY: <$rm> after routeACK: new destination
<$ru>\n");
         };

        # routing script error?
        if( uri=="sip:err at 127.0.0.1" ) {
                sl_reply_error();
                return;
        };

        # forward to current uri
        if (!t_relay()) {
                sl_reply_error();
        };

}

#
# Initial INVITE route failed, retry
#
failure_route[1] {
        # endpoint canceled call
        if( t_was_cancelled()) {
                xlog("L_DBG", "$Tf VMPROXY: call <id:$ci> cancelled\n");
                return;
        }

        # don't retry on 40[0-7|5xx failures
        if( !t_check_status("40[0-7]|420|487|488|5[0-9][0-9]")) {
                xlog("L_ERR","$Tf VMPROXY: ERROR routing call <$rm> from <$fu>
to <$ru>\n");
                exec_dset("/usr/local/vmproxy/bin/routeINVITE2.sh");
                xlog("L_ERR","$Tf VMPROXY: Rerouting call <$rm> to <$ru>\n");
                append_branch();
                t_relay();
        }
}

#
# Initial NOTIFY route failed, retry
#
failure_route[2] {
        # endpoint canceled call
        if( t_was_cancelled()) {
                xlog("L_DBG", "$Tf VMPROXY: call <id:$ci> cancelled\n");
                return;
        }

        # don't retry on 40[0-7|5xx failures
        if( !t_check_status("40[0-7]|5[0-9][0-9]")) {
                xlog("L_ERR","$Tf VMPROXY: ERROR routing <$rm> from <$fu> to
<$ru>\n");
                exec_dset("/usr/local/vmproxy/bin/routeNOTIFY2.sh");
                xlog("L_ERR","$Tf VMPROXY: Rerouting <$rm> to <$ru>\n");
                append_branch();
                t_relay();
        }
}






More information about the sr-users mailing list