Hi all,
Im using openser as a call distributor/proxy between a soft-switch/SBC and voicemail platform. Im seeing a problem with openser in that it is sometimes cancels an in-progress call (fr_inv_timer firing) because it didnt match the 200/OK with the call.
After some investigation, I noticed that this was happening after a missing ACK on a previous call caused the voicemail platform to retransmit 200/OK responses beyond the TM wt_timer expiration, which in turn left several openser child processes (those that received a 200 after wt_timer expiration) in a state such that they might not properly match transactions on subsequent calls.
My setup: I have openser 1.2.0 operating on a linux box with two network interfaces, with one interface (call it the outside interface) taking incoming calls from the soft-switch, and the other (inside) connected to the VM platform. I have openser configured to use both interfaces (see config below) and the TM wt_timer set to 5 seconds (default). As this is a voicemail system, all of the call traffic is inbound from the soft-switch. Given the traffic flow, for the most part the openser child processes servicing the inside interface are handling responses (180,183,200) from the VM platform.
Call scenario: When an INVITE arrives from the soft-switch, openser forwards it to the VM platform. The VM platform responds with a 180 and then a 200. I've noticed several instances where the soft-switch did not respond with an ACK. This caused the VM platform to retransmit the 200 several times over a 10 second period. These were absorbed correctly by openser for the duration of wt_timer. After the timer expired, however, each openser child process that received a retransmitted 200 logged something like this: 4(2715) DEBUG: t_reply_matching: hash 45870 label 727647196 branch 0 4(2715) DEBUG: t_reply_matching: no matching transaction exists 4(2715) DEBUG: t_reply_matching: failure to match a transaction 4(2715) DEBUG: t_check: end=(nil)
When I look at the TM code, the static variable T in t_lookup.c is now NULL for this child process.
On a subsequent inbound call, the INVITE is passed to the VM correctly, and the 180 transaction matches (causing the fr_inv_timer to be armed). If the 200 is read by child proc 2715, I see: 4(2715) DEBUG: t_check: start=(nil) 4(2715) DEBUG: t_check: T previously sought and not found
The 200 is forwarded back to the soft-switch, which responds with an ACK. Both end-points think the call is up, but since openser never matched the 200 with the call, the fr_inv_timer fires and cancels the call. Basically, child proc 2715 wont match any transaction after this unless it happens to process a request.
I think this problem is made worse by the fact that Im using two network interfaces, and that the openser children on the inside interface handle (for the most part) only responses. This problem was touched on here: http://lists.openser.org/pipermail/users/2007-November/014188.html but I didnt see any follow up. Also, Ive checked openser 1.2.3 and 1.3.1 for fixes, but I dont think this has been addressed.
I have a work around, I think, by upping the wt_timer to something like 15 seconds, but I was wondering if there is any scenario in which leaving T=NULL is desirable.
Thanks in advance Sean
==== CONFIG ===== # # VMPROXY-PROXY - route between softswitch and media servers #
# ----------- global configuration parameters ------------------------
check_via=yes # (cmd. line: -v) dns=no # (cmd. line: -r) rev_dns=no # (cmd. line: -R) children=8
# multi-homed (i.e. listen on two or more interfaces) mhomed=1 alias=sipproxy1 alias=sipproxy1.vmproxy.com
port=5060 #inside address listen=192.168.1.100 #outside address listen=10.55.2.100
disable_dns_blacklist=yes
# ------------------ module loading ----------------------------------
# Uncomment this if you want to use SQL database #loadmodule "/usr/local/openser/lib/openser/modules/mysql.so" mpath="/usr/local/openser/lib/openser/modules" loadmodule "sl.so" loadmodule "tm.so" loadmodule "rr.so" loadmodule "maxfwd.so" loadmodule "usrloc.so" loadmodule "registrar.so" loadmodule "exec.so" loadmodule "xlog.so" loadmodule "textops.so" loadmodule "options.so" loadmodule "mi_fifo.so"
# Uncomment this if you want digest authentication # mysql.so must be loaded ! #loadmodule "/usr/local/openser/lib/openser/modules/auth.so" #loadmodule "/usr/local/openser/lib/openser/modules/auth_db.so"
# ----------------- setting module-specific parameters ---------------
# -- usrloc params -- # local memory #modparam("usrloc", "db_mode", 0) # -- rr params -- # add value to ;lr param to make some broken UAs happy modparam("rr", "enable_full_lr", 1) # uncomment this for multiple record routes #modparam("rr", "enable_double_rr", 1)
# -- tm params -- modparam("tm", "fr_timer", 5) modparam("tm", "fr_inv_timer", 60)
# -- mi_fifo parms modparam("mi_fifo", "fifo_name", "/tmp/openser_fifo_proxy")
# ------------------------- request routing logic -------------------
# main routing logic
route{
# initial sanity checks -- messages with # max_forwards==0, or excessively long requests if (!mf_process_maxfwd_header("10")) { sl_send_reply("483","Too Many Hops"); return; }; if ( msg:len > max_len ) { sl_send_reply("513", "Message too big"); return; };
xlog("L_INFO","$Tf VMPROXY: <$rm> message From:<$fu> To:<$tu> srcip:<$si>\n");
# # we record-route all messages -- to make sure that # subsequent messages will go through our proxy; that's # particularly good if upstream and downstream entities # use different transport protocol # record_route();
# # loose-route processing # messages that already have routes on them are proxied here # if (loose_route()) { t_relay(); return; };
# # message not for us, route it through # if ( !(uri==myself) ) { if (!t_relay()) { sl_reply_error(); }; return; };
xlog("L_DBG","$Tf VMPROXY: <$rm> for me\n");
# handle options meant for me if( method == "OPTIONS" ) { options_reply(); return; };
if( method == "REGISTER" || method == "PUBLISH" || method == "SUBSCRIBE" || method == "MESSAGE" ) { sl_send_reply("405","Method Not Allowed"); return; };
# route INVITE based on source if( method == "INVITE" ) { exec_dset("/usr/local/vmproxy/bin/routeINVITE.sh"); xlog("L_DBG","$Tf VMPROXY: <$rm> after routeINVITE: new destination <$ru>\n"); t_on_failure("1"); };
# route NOTIFY to softswitch if( method == "NOTIFY" ) { exec_dset("/usr/local/vmproxy/bin/routeNOTIFY.sh"); xlog("L_DBG","$Tf VMPROXY: <$rm> after routeNOTIFY: new destination <$ru>\n"); t_on_failure("2"); };
# # NOTE: this logic is only executed if the request didn't have # a route on it (tsk,tsk). If it did, it would have been handed via # loose route function above. if( method == "ACK" || method == "BYE" || method == "CANCEL" || method == "PRACK" ) { exec_dset("/usr/local/vmproxy/bin/routeACK.sh"); xlog("L_DBG","$Tf VMPROXY: <$rm> after routeACK: new destination <$ru>\n"); };
# routing script error? if( uri=="sip:err@127.0.0.1" ) { sl_reply_error(); return; };
# forward to current uri if (!t_relay()) { sl_reply_error(); };
}
# # Initial INVITE route failed, retry # failure_route[1] { # endpoint canceled call if( t_was_cancelled()) { xlog("L_DBG", "$Tf VMPROXY: call id:$ci cancelled\n"); return; }
# don't retry on 40[0-7|5xx failures if( !t_check_status("40[0-7]|420|487|488|5[0-9][0-9]")) { xlog("L_ERR","$Tf VMPROXY: ERROR routing call <$rm> from <$fu> to <$ru>\n"); exec_dset("/usr/local/vmproxy/bin/routeINVITE2.sh"); xlog("L_ERR","$Tf VMPROXY: Rerouting call <$rm> to <$ru>\n"); append_branch(); t_relay(); } }
# # Initial NOTIFY route failed, retry # failure_route[2] { # endpoint canceled call if( t_was_cancelled()) { xlog("L_DBG", "$Tf VMPROXY: call id:$ci cancelled\n"); return; }
# don't retry on 40[0-7|5xx failures if( !t_check_status("40[0-7]|5[0-9][0-9]")) { xlog("L_ERR","$Tf VMPROXY: ERROR routing <$rm> from <$fu> to <$ru>\n"); exec_dset("/usr/local/vmproxy/bin/routeNOTIFY2.sh"); xlog("L_ERR","$Tf VMPROXY: Rerouting <$rm> to <$ru>\n"); append_branch(); t_relay(); } }
Hi Sean,
Yes, t_check() sets T as NULL if no transaction is matched, but the reply_received() function (that calls t_check), if T was set to NULL will go to "not_found" label and set T to T_UNDEFINED.
Do you agree on this? if so, we can start working in adding some more debug logs to see where the problem is.
Regards, Bogdan
Sean O'Donnell wrote:
Hi all,
I’m using openser as a call distributor/proxy between a soft-switch/SBC and voicemail platform. I’m seeing a problem with openser in that it is sometimes cancels an in-progress call (fr_inv_timer firing) because it didn’t match the 200/OK with the call.
After some investigation, I noticed that this was happening after a missing ACK on a previous call caused the voicemail platform to retransmit 200/OK responses beyond the TM wt_timer expiration, which in turn left several openser child processes (those that received a 200 after wt_timer expiration) in a state such that they might not properly match transactions on subsequent calls.
My setup: I have openser 1.2.0 operating on a linux box with two network interfaces, with one interface (call it the outside interface) taking incoming calls from the soft-switch, and the other (inside) connected to the VM platform. I have openser configured to use both interfaces (see config below) and the TM wt_timer set to 5 seconds (default). As this is a voicemail system, all of the call traffic is inbound from the soft-switch. Given the traffic flow, for the most part the openser child processes servicing the inside interface are handling responses (180,183,200) from the VM platform.
Call scenario: When an INVITE arrives from the soft-switch, openser forwards it to the VM platform. The VM platform responds with a 180 and then a 200. I've noticed several instances where the soft-switch did not respond with an ACK. This caused the VM platform to retransmit the 200 several times over a 10 second period. These were absorbed correctly by openser for the duration of wt_timer. After the timer expired, however, each openser child process that received a retransmitted 200 logged something like this: 4(2715) DEBUG: t_reply_matching: hash 45870 label 727647196 branch 0 4(2715) DEBUG: t_reply_matching: no matching transaction exists 4(2715) DEBUG: t_reply_matching: failure to match a transaction 4(2715) DEBUG: t_check: end=(nil)
When I look at the TM code, the static variable T in t_lookup.c is now NULL for this child process.
On a subsequent inbound call, the INVITE is passed to the VM correctly, and the 180 transaction matches (causing the fr_inv_timer to be armed). If the 200 is read by child proc 2715, I see: 4(2715) DEBUG: t_check: start=(nil) 4(2715) DEBUG: t_check: T previously sought and not found
The 200 is forwarded back to the soft-switch, which responds with an ACK. Both end-points think the call is up, but since openser never matched the 200 with the call, the fr_inv_timer fires and cancels the call. Basically, child proc 2715 won’t match any transaction after this unless it happens to process a request.
I think this problem is made worse by the fact that I’m using two network interfaces, and that the openser children on the inside interface handle (for the most part) only responses. This problem was touched on here: http://lists.openser.org/pipermail/users/2007-November/014188.html but I didn’t see any follow up. Also, I’ve checked openser 1.2.3 and 1.3.1 for fixes, but I don’t think this has been addressed.
I have a work around, I think, by upping the wt_timer to something like 15 seconds, but I was wondering if there is any scenario in which leaving T=NULL is desirable.
Thanks in advance Sean