Jeroen van Bemmel wrote:
Frank,
RFC3261 specifically specifies that proxies should convert response code 503 into 500:
A proxy which receives a 503 (Service Unavailable) response SHOULD NOT forward it upstream unless it can determine that any subsequent requests it might proxy will also generate a 503. In other words, forwarding a 503 means that the proxy knows it cannot service any requests, not just the one for the Request-URI in the request which generated the 503. If the only response that was received is a 503, the proxy SHOULD generate a 500 response and forward that upstream.
So SER is behaving in accordance with the standard. There might be a configuration option to turn this off?
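The rule quoted above can be sketched as a small decision function. This is only an illustration of the RFC text, not SER's actual code; the function name and the `proxy_is_unavailable` flag are hypothetical.

```python
def status_to_forward(best_status: int, proxy_is_unavailable: bool = False) -> int:
    """Return the status code a proxy sends upstream for a final response.

    best_status: the final response code the proxy picked from its branches.
    proxy_is_unavailable: hypothetical flag, True only when the proxy has
    determined that any later request it proxies would also get a 503.
    """
    # A lone 503 is rewritten to 500 unless the proxy itself is unavailable.
    if best_status == 503 and not proxy_is_unavailable:
        return 500
    return best_status
```

With the default flag, `status_to_forward(503)` yields 500, which matches the behavior reported for SER in this thread.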
Well, yes and no. Thanks for pointing out this item in the RFC which I missed, but if the RFC action is honored, then SER should have emitted "500 Server Internal Error" which is in the RFC, and not the hybrid and made-up "500 Service Unavailable", which is not in the RFC. So, I think SER is wrong at least on that point.
(Personally I think SIP desperately needs at least 20 additional defined Response Codes so we can all quit using the existing not-entirely-inappropriate values to cover real-life situations, but right now the book says that's all the codes that we've all got to work with. When you see a dozen SS7 release codes all map to the same SIP Response Codes, you don't have nearly enough SIP Response Codes, but I digress.)
That said, our SER server knows the given condition sent from a paired PSTN switch is permanent, e.g. the SIP caller can't call this number via our network now, tomorrow or next week because of who they are or who their provider is (or what they failed to buy), so in this situation returning 503 all the way out of our network is correct behavior (as stated in the RFC), and doing so allows the upstream entity to click over to the next preferred carrier that might reach that destination.
We have found that many SIP providers simply blindly lob all calls at the cheapest carrier and if a given call bounces, repeat that action with the next-cheapest carrier, and the next until they finally resort to a Tandem that will take anything, but will also be the most expensive way to route the call. Nearly all require us to send them back a 503 (a few want us to send 502 back instead) to say back to them "not via here, try next door". Deplorable and this method certainly causes slow call set-up times, but that is what quite a few of these outfits are doing.
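The carrier-hunting behavior described above can be sketched roughly as follows. The function, the carrier names and the `failover_codes` parameter are assumptions made for illustration; real SBCs implement this in their own configuration, not in code like this.

```python
from typing import Callable, Iterable, List, Tuple


def route_call(
    carriers: Iterable[Tuple[str, float]],
    try_carrier: Callable[[str], int],
    failover_codes: frozenset = frozenset({503}),
) -> List[Tuple[str, int]]:
    """Try carriers from cheapest to most expensive.

    carriers: (name, cost) pairs.
    try_carrier: callable returning the final SIP status from that carrier.
    failover_codes: statuses that make the caller move on to the next carrier.
    Returns the list of (carrier, status) attempts that were made.
    """
    attempts = []
    for name, _cost in sorted(carriers, key=lambda c: c[1]):
        status = try_carrier(name)
        attempts.append((name, status))
        if status not in failover_codes:
            break  # any other final status ends the hunt
    return attempts
```

Under these assumptions, a carrier that answers 500 instead of 503 stops the hunt at the first attempt, which is why the 503 to 500 rewrite matters to such customers.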
So, our SER must send them back a 503 and not a 500 in this situation. If I explicitly state one in a reply rule, will that override this default behavior, or will some deeper part of SER veto even a sl_reply("503", "Service Unavailable"); added to the onreply_route? If so, where is this piece of source code, so that I may break this well-intended but undesired behavior?
Frank Durda IV wrote:
Well, yes and no. Thanks for pointing out this item in the RFC which I missed, but if the RFC action is honored, then SER should have emitted "500 Server Internal Error" which is in the RFC, and not the hybrid and made-up "500 Service Unavailable", which is not in the RFC. So, I think SER is wrong at least on that point.
Reason phrases are not writ in stone. You are free and even encouraged to replace the default phrases with more meaningful text if you have one.
(Personally I think SIP desperately needs at least 20 additional defined Response Codes so we can all quit using the existing not-entirely-inappropriate values to cover real-life situations, but right now the book says that's all the codes that we've all got to work with. When you see a dozen SS7 release codes all map to the same SIP Response Codes, you don't have nearly enough SIP Response Codes, but I digress.)
Feel free to write a draft and enjoy your time at IETF. Personally, I think there are more than enough status codes. Already people just seem to pick one without actually reading up on what it is supposed to mean. Like using 488 when an anonymous call is rejected.
If you desperately need to send around SS7 release codes, there is the Reason header.
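For reference, the Reason header (RFC 3326) carries a protocol name, a cause value and an optional quoted text. A minimal sketch of building one for a Q.850 cause follows; the helper name is hypothetical and it only formats a string, it does not attach anything to a SIP message.

```python
def q850_reason(cause: int, text: str = "") -> str:
    """Format a Reason header line carrying a Q.850 cause value."""
    if not 0 <= cause <= 127:
        raise ValueError("Q.850 cause values fit in 7 bits")
    value = f"Q.850;cause={cause}"
    if text:
        # The text parameter is a quoted-string, so escape backslash and quote.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        value += f';text="{escaped}"'
    return "Reason: " + value
```

For example, `q850_reason(47, "Resource unavailable")` gives `Reason: Q.850;cause=47;text="Resource unavailable"`.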
That said, our SER server knows the given condition sent from a paired PSTN switch is permanent, e.g. the SIP caller can't call this number via our network now, tomorrow or next week because of who they are or who their provider is (or what they failed to buy), so in this situation returning 503 all the way out of our network is correct behavior (as stated in the RFC), and doing so allows the upstream entity to click over to the next preferred carrier that might reach that destination.
I think it isn't. Neither 500 nor 503 is a status code for a permanent situation. The gateway should return a 4xx class code instead because actually this is a client error -- the client is not allowed to call this specific number. Sounds like a 403 to me.
So, our SER must send them back a 503 and not a 500 in this situation. If I explicitly state one in a reply rule, will that override this default behavior, or will some deeper part of SER veto even a sl_reply("503", "Service Unavailable"); added to the onreply_route?
That'll break horribly. Once you successfully called t_relay(), you can't use sl_* anymore.
There is t_reply() with the same parameters. I am not sure, though, if it really works, but you could give it a try.
Regards, Martin
On Sun, Oct 12, 2008 at 8:49 AM, Martin Hoffmann hn@nvnc.de wrote:
That said, our SER server knows the given condition sent from a paired PSTN switch is permanent, e.g. the SIP caller can't call this number via our network now, tomorrow or next week because of who they are or who their provider is (or what they failed to buy), so in this situation returning 503 all the way out of our network is correct behavior (as stated in the RFC), and doing so allows the upstream entity to click over to the next preferred carrier that might reach that destination.
I think it isn't. Neither 500 nor 503 is a status code for a permanent situation. The gateway should return a 4xx class code instead because actually this is a client error -- the client is not allowed to call this specific number. Sounds like a 403 to me.
I agree with Martin, a 4xx class code should be used instead.
IMO it sounds like an ISUP Cause value "2 no route to destination" which should be mapped to SIP "404 Not found" (RFC 3398).
Please, correct me if I'm wrong,
Victor Pascual Ávila wrote:
On Sun, Oct 12, 2008 at 8:49 AM, Martin Hoffmann hn@nvnc.de wrote:
That said, our SER server knows the given condition sent from a paired PSTN switch is permanent, e.g. the SIP caller can't call this number via our network now, tomorrow or next week because of who they are or who their provider is (or what they failed to buy), so in this situation returning 503 all the way out of our network is correct behavior (as stated in the RFC), and doing so allows the upstream entity to click over to the next preferred carrier that might reach that destination.
I think it isn't. Neither 500 nor 503 is a status code for a permanent situation. The gateway should return a 4xx class code instead because actually this is a client error -- the client is not allowed to call this specific number. Sounds like a 403 to me.
I agree with Martin, a 4xx class code should be used instead.
IMO it sounds like an ISUP Cause value "2 no route to destination" which should be mapped to SIP "404 Not found" (RFC 3398).
Please, correct me if I'm wrong,
No, you probably are not wrong, but you are trying to push a 50 story building that is already built six feet to the left just because you would like it to be there. This is just not going to happen.
I have tried to convince these companies (now numbering in the dozens that I have encountered this issue with) that they should use 404 in this situation, but they are consistently adamant, saying the SBC models they own only allow them to redirect calls on receipt of a 503 and will not do so for a 404 or any other more plausible code, although one reluctantly could handle 502 instead of 503. So if I want their business I am stuck catering to their demands, regardless of the SHOULDiness of the RFC on this point. Plus I've got sales people reminding me that the customer is always right, give them what they want, etc.
(Oh, I have worked as part of a team or contributor to RFCs that are in force today, and have submitted RFC-DRAFTs over the past 25 years, so I do know a tiny bit about that process. I also know that even if I came up with a RFC that obsoleted the existing SIP ones how unlikely it is to get anybody to accept an expansion of the SIP response code lexicon mainly because of how long the existing limited set has existed and how much embedded hardware (switches, SBCs, etc) exists that will only know the old codes. That's why 2821/2822 look so much like 821/822 and didn't address many issues that needed addressing.)
To be clear, even if the RFC said MUST on the current SER handling of 503, I would lose this argument and be forced to provide 503 from SER, or dump SER and replace it with some hardware that will (as apparently most of the turn-key boxes out there will happily do this). This is not what is right or wrong, but what the majority of the gear (that the clients already own) are using 503 for. I suggest taking religious concerns on this point to the makers of SBCs who didn't provide a far better way to do LCR.
Shoot, I've got a carrier who also wants us to reject calls they are sending to us if they aren't marked as having been dipped already. And we are to return them with what response code? Oh, with a 503 of course. Then they will pass that call on to someone who apparently charges less for database lookups than we do, and then maybe the same now-dipped call will come back to us, or go elsewhere.
Maybe the situation elsewhere in the world is different, but in the US I would estimate that currently at least 70% of the companies wanting to send us SIP want us to send back a 503 for at least one condition and no other value will do, due to limitations in the reaction choices available in their equipment, or perhaps this is mostly in-company policy they have standardized on even if their equipment could do the same thing in response to some other code. And almost all of those outfits are sending us some or 100% international calls.
As mentioned before, too few SIP codes, too many SS7 to SIP overloads in the SIP world. Example: 47 is the SS7 release when call gapping occurs, as in when too many people are trying to vote on American Idol or any other contest that hits the pre-defined limit for a given called party number. What does 47 usually turn into? 503, of course, and what do these carriers do? They promptly try this gapped call somewhere else, where it is also going to fail because the throttle timers likely have not expired yet. Arguably, that situation might be better served by having this become a 600 by the time it gets to SIP, but that is also something out of my control. Telcordia says a 47 should be produced, and the RFC says turn that into a 503. And SER turns that into 500. And here we are.
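To make the "many causes, one code" point concrete, here is a small sketch using a subset of the ISUP cause to SIP status mapping as I understand it from RFC 3398. The table is deliberately partial and should be checked against the RFC before being relied on; the helper name is hypothetical.

```python
# Partial ISUP cause -> SIP status table (subset, believed to follow RFC 3398).
ISUP_CAUSE_TO_SIP = {
    1: 404,   # unallocated number
    2: 404,   # no route to specified network
    3: 404,   # no route to destination
    17: 486,  # user busy
    21: 403,  # call rejected
    34: 503,  # no circuit available
    38: 503,  # network out of order
    41: 503,  # temporary failure
    42: 503,  # switching equipment congestion
    47: 503,  # resource unavailable
}


def causes_for_status(status: int) -> list:
    """List the ISUP causes in the table above that collapse into one SIP status."""
    return sorted(c for c, s in ISUP_CAUSE_TO_SIP.items() if s == status)
```

Even in this subset, five different release causes, including the call-gapping cause 47, end up as the same 503.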
Meanwhile, back to the original problem I have of making SER emit a 503 or having to dump SER: Someone else pointed out that sl_reply wasn't appropriate where I suggested I could use it, so I will try t_reply(). If that is still overridden by the SER internals as well (and I was really hoping someone here would save me time by saying if the internal rule could not be defeated via the config file but no one has come forward on that point), then I have no choice but to make another round of changes to the SER source code, or buy a turnkey box and throw SER away.
I would prefer to salvage SER and not have the demise of its use here occur over such a silly point, but I'm not getting much confidence here that SER is willing to work with what the rest of the world seems to be doing.
On Oct 12, 2008 at 12:18, Frank Durda IV frank.durda@hypercube-llc.com wrote:
[...]
Meanwhile, back to the original problem I have of making SER emit a 503 or having to dump SER: Someone else pointed out that sl_reply wasn't appropriate where I suggested I could use it, so I will try t_reply(). If that is still overridden by the SER internals as well (and I was really hoping someone here would save me time by saying if the internal rule could not be defeated via the config file but no one has come forward on that point), then I have no choice but to make another round of changes to the SER source code, or buy a turnkey box and throw SER away.
If you want to override the final reply sent by ser, you have to catch it in the failure route and send a new reply in its place (if you send a reply "by hand" in the failure route, its value won't be fixed by ser final reply processing code) e.g.:

    failure_route[0]{
        if (t_check_status("503")){
            t_reply("503", "Keeping 503 because I don't like the rfc");
            return;
        }
        /* ... */
    }

There are other ways to do it, but I think this is the shortest one.
Andrei
Andrei Pelinescu-Onciul wrote:
If you want to override the final reply sent by ser, you have to catch it in the failure route and send a new reply in its place (if you send a reply "by hand" in the failure route, its value won't be fixed by ser final reply processing code) e.g.:

    failure_route[0]{
        if (t_check_status("503")){
            t_reply("503", "Keeping 503 because I don't like the rfc");
            return;
        }
        /* ... */
    }

There are other ways to do it, but I think this is the shortest one.
Andrei
This suggestion does appear to override the reply and that's great. I found that I have to do a t_on_failure("FAILURE_ROUTE") on all INVITEs, not just those where the @to.tag=="" (as written in the sample ser.cfg file).
However, this alteration triggers another problem, as the logs report:
Oct 21 02:37:47 ser1 ser[19225]: WARNING: -_set_fr_timer- already added: 0x802527400 , tl=0x802527418!!!
Oct 21 02:37:47 ser1 ser[19225]: BUG: set_final_timer: start retr failed for 0x802527400
Neither of these appear to be things I have intentionally specified/set in ser.cfg, although perhaps they go by other names. Anybody have any idea what it is complaining about?
The pieces that might help troubleshoot this:
route[PSTN_FORWARD]
{
    log(1, "Route PSTN_FORWARD\n");
    # here you could decide whether this call needs a RTP relay or not

    # Set where packets should be sent. If you don't do this here,
    # OPTIONS and other messages will be transmitted back to the
    # outside interface and processed all over again, and you eventually
    # get a Too Many Hops error on every OPTIONS message.
    # Code to make this point in multiple directions would be needed
    # to implement two-way origination.

    rewritehost("10.131.0.2");    #Send to Telica

    xlog("L_ERR", "DDD method:<%rm> From URI:<%fu> From Tag:<%ft> Destination Set:<%ds> Request's R-URI:<%ru> To URI:<%tu> Received IP:<%Ri> Source IP:<%si> Source Port:<%sp> Call ID:<%ci> Host's Hostname:<%Hn> Hosts Domainname:<%Hd> Hosts FQDN <%Hf> Hosts IP <%Hi>");

    if (method == "BYE" || method == "CANCEL") {
        unforce_rtp_proxy();
    } else if (method == "INVITE") {

        ...Customer-specific sanity checks here, pass or reject, no modifications...

        # Force record routing if either party is behind NAT
        $record_route_nated = true;
        force_rtp_proxy("he","10.131.128.18");
    }

    t_on_reply("PSTN_REPLY");

    # if this is called from the failure route we need to open a new branch
    if (isflagset(FLAG_FAILUREROUTE)) {
        append_branch();
    }

    # if this is an initial INVITE (without a To-tag) we might try another
    # (forwarding or voicemail) target after receiving an error

#Use_next_for_503_fixup#    if (method=="INVITE" && @to.tag=="") {
    if (method=="INVITE") {
        xlog("L_ERR", "route[PSTN] arm FAILURE_ROUTE");
        t_on_failure("FAILURE_ROUTE");
    }

    # send it out now; use stateful forwarding as it works reliably
    # even for UDP2TCP
    if (!t_relay()) {
        sl_reply_error();
    }
    drop;
}

onreply_route["PSTN_REPLY"]
{
    xlog("L_ERR","In onreply_route %rs %rr");

    #xlog("L_ERR", "EEE method:<%rm> From URI:<%fu> From Tag:<%ft> Destination Set:<%ds> Request's R-URI:<%ru> To URI:<%tu> Received IP:<%Ri> Source IP:<%si> Source Port:<%sp> Call ID:<%ci> Host's Hostname:<%Hn> Hosts Domainname:<%Hd> Hosts FQDN <%Hf> Hosts IP <%Hi>");

    # Rewrite Contact in 200 OK if UAS is behind NAT
    if (@cseq.method == "INVITE") {
        xlog("L_ERR","onreply_route[NAT_MANGLE] R1 %rs %rr");
    } else {
        xlog("L_ERR","onreply_route[NAT_MANGLE] R2 %rs %rr");
    }

    # Apply RTP proxy if necessary, but only for INVITE transactions
    # and 183 or 2xx replies
    if (@cseq.method != "INVITE") return;

    if ((status =~ "(183)|2[0-9][0-9]") && search("^(Content-Type|c):.*application/sdp")) {
        # xlog("L_ERR","onreply_route[NAT_MANGLE] force rtp proxy %rs %rr K5");

        # Fix outbound c= value to reflect what client can reach
        route(OUTBOUND_RTP_FIXUP);
        # xlog("L_ERR","onreply_route[NAT_MANGLE] back from force_rtp_proxy %rs %rr K6");
    }
    return;
}

failure_route[FAILURE_ROUTE]
{
    log(1, "Route ROUTE FAILURE\n");
    if (t_check_status("503")) {
        t_reply("503", "Keeping 503 because I don't like the rfc");
        log(1, "DID 503 FIX\n");
        return;
    }

    # mark for the other routes that we are operating from here on from a
    # failure route
    setflag(FLAG_FAILUREROUTE);

    # if we received a busy and a busy target is set, forward it there
    # Note: again the forwarding target has to be a routeable URI
    if (t_check_status("486|600")) {
        if ($tu.fwd_busy_target) {
            route(FORWARD);
        }
        # alternatively you could forward the request to SEMS/voicemail here

    } else if (t_check_status("408|480")) {

        # if we received no answer and the noanswer target is set,
        # forward it there
        # Note: again the target has to be a routeable URI
        if ($tu.fwd_noanswer_target) {
            route(FORWARD);
        }

        # alternatively you could forward the request to SEMS/voicemail here

    }
}

Debugging log messages:
Oct 21 02:37:47 ser1 ser[19224]: Route PSTN_FORWARD
Oct 21 02:37:47 ser1 ser[19224]: DDD method:<INVITE> From URI:sip:9999999999@11.22.33.44 From Tag:<as5e682242> Destination Set:<Contact: sip:6666666666@10.131.0.2> Request's R-URI:sip:6666666666@10.131.0.2 To URI:sip:6666666666@222.222.222.222 Received IP:<10.181.90.6> Source IP:<11.22.33.44> Source Port:<5060> Call ID:5624efc06bde6e38543ebd53687a5b2b@11.22.33.44 Host's Hostname:<ser1> Hosts Domainname:<the.test.box.notvalid> Hosts FQDN <the.test.box.notvalid> Hosts IP <>
Oct 21 02:37:47 ser1 ser[19224]: route[PSTN] arm FAILURE_ROUTE
Oct 21 02:37:47 ser1 ser[19225]: In onreply_route 100 Trying
Oct 21 02:37:47 ser1 ser[19225]: onreply_route[NAT_MANGLE] R1 100 Trying
Oct 21 02:37:47 ser1 ser[19225]: In onreply_route 503 Service Unavailable
Oct 21 02:37:47 ser1 ser[19225]: onreply_route[NAT_MANGLE] R1 503 Service Unavailable
Oct 21 02:37:47 ser1 ser[19225]: Route ROUTE FAILURE
Oct 21 02:37:47 ser1 ser[19225]: DID 503 FIX
Oct 21 02:37:47 ser1 ser[19225]: WARNING: -_set_fr_timer- already added: 0x802527400 , tl=0x802527418!!!
Oct 21 02:37:47 ser1 ser[19225]: BUG: set_final_timer: start retr failed for 0x802527400
Oct 21 02:37:47 ser1 ser[19224]: MAIN ROUTE WITH method:<ACK> From URI:sip:9999999999@11.22.33.44 From Tag:<as5e682242> Destination Set:<<null>> Request's R-URI:sip:6666666666@222.222.222.222 To URI:sip:6666666666@222.222.222.222 Received IP:<10.181.90.6> Source IP:<11.22.33.44> Source Port:<5060> Call ID:5624efc06bde6e38543ebd53687a5b2b@11.22.33.44 Host's Hostname:<ser1> Hosts Domainname:<the.test.box.notvalid> Hosts FQDN <the.test.box.notvalid> Hosts IP <>
...
Thanks in advance for any sage advice on what is going on here.
On Oct 20, 2008 at 22:03, Frank Durda IV frank.durda@hypercube-llc.com wrote:
Andrei Pelinescu-Onciul wrote:
If you want to override the final reply sent by ser, you have to catch it in the failure route and send a new reply in its place (if you send a reply "by hand" in the failure route, its value won't be fixed by ser final reply processing code) e.g.:

    failure_route[0]{
        if (t_check_status("503")){
            t_reply("503", "Keeping 503 because I don't like the rfc");
            return;
        }
        /* ... */
    }

There are other ways to do it, but I think this is the shortest one.
Andrei
This suggestion does appear to override the reply and that's great. I found that I have to do a t_on_failure("FAILURE_ROUTE") on all INVITEs, not just those where the @to.tag=="" (as written in the sample ser.cfg file).
However, this alteration triggers another problem, as the logs report:
Oct 21 02:37:47 ser1 ser[19225]: WARNING: -_set_fr_timer- already added: 0x802527400 , tl=0x802527418!!!
Oct 21 02:37:47 ser1 ser[19225]: BUG: set_final_timer: start retr failed for 0x802527400
Neither of these appear to be things I have intentionally specified/set in ser.cfg, although perhaps they go by other names. Anybody have any idea what it is complaining about?
Just ignore them, it's perfectly safe in this case (just some bug catching code that outlived its usefulness :-)). For a more detailed discussion see: http://bugs.iptel.org/browse/SER-302.
I'll try to fix it if I can find an easy non-intrusive way.
Andrei
[...]
Just for closure on this item, the fix used was:
*** t_reply.c.STOCK Tue Nov 14 18:11:06 2006
--- t_reply.c Tue Oct 21 16:38:25 2008
***************
*** 1315,1320 ****
--- 1315,1323 ----
              }
          } else {
              relayed_code=relayed_msg->REPLY_STATUS;
+ #ifdef RFCMUST503
+ /* This bit of nonsense changes the RFC handling from SHOULD to MUST,
+    or in our case, from "no problem" to "broken"*/
              if (relayed_code==503){
                  /* replace a final 503 with a 500:
                   * generate a "FAKE" reply and a new to_tag (for easier
***************
*** 1331,1337 ****
                  buf=build_res_buf_from_sip_req(500, error_text(relayed_code),
                                      to_tag, t->uas.request, &res_len, &bm);
                  relayed_code=500;
!             }else if (tm_aggregate_auth &&
                          (relayed_code==401 || relayed_code==407) &&
                          (auth_reply_count(t, p_msg)>1)){
                  /* aggregate 401 & 407 www & proxy authenticate headers in
--- 1334,1342 ----
                  buf=build_res_buf_from_sip_req(500, error_text(relayed_code),
                                      to_tag, t->uas.request, &res_len, &bm);
                  relayed_code=500;
!             }else
! #endif /*RFCMUST503*/
!             if (tm_aggregate_auth &&
                          (relayed_code==401 || relayed_code==407) &&
                          (auth_reply_count(t, p_msg)>1)){
                  /* aggregate 401 & 407 ww

I won't dwell on any religious issues raised here, because the fact of the matter is I now have a dozen telcos using everything from Sonus to Asterisk boxes and all of them want it to work this way, i.e. for 503 to pass unaltered. Follow the money, and the money wants it to work this way.
Thanks for the assistance of those who pointed me to the right neighborhood to alter.
P. S. The idea of doing this via ser.cfg and t_on_failure by just ignoring those set_final_timer messages, was a bad one. Something leaks memory, so that could not be used for more than a few hours.
On Friday, 10 October 2008, Frank Durda IV wrote:
(Personally I think SIP desperately needs at least 20 additional defined Response Codes so we can all quit using the existing not-entirely-inappropriate values to cover real-life situations, but right now the book says that's all the codes that we've all got to work with. When you see a dozen SS7 release codes all map to the same SIP Response Codes, you don't have nearly enough SIP Response Codes, but I digress.)
There is a painful draft about the usage of 503 for overload cases: http://tools.ietf.org/html/draft-ietf-sipping-overload-reqs-05