[Serdev] and another RTPPROXY outage

Andres andres at telesip.net
Wed Jun 29 22:09:54 UTC 2005


Hi Maxim,

We implemented rtpproxy with UDP sockets about 2 months ago according to 
Jan's suggestion.  Today we had another outage, but this time at least 
it only affected users needing the RTPPROXY (SER continued to operate 
normally, which is great progress).

The symptom is the same as before: the Recv-Q buffer maxes out and the 
process apparently locks up.  We simply see a whole bunch of lines 
like this:

# netstat -pau
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
udp   255496      0 sitges.telesip.ne:48140     *:*                                     974/rtpproxy
udp        0      0 sitges.telesip.ne:48141     *:*                                     974/rtpproxy
udp   255496      0 sitges.telesip.ne:48142     *:*                                     974/rtpproxy
udp        0      0 sitges.telesip.ne:48143     *:*                                     974/rtpproxy
udp   141488      0 sitges.telesip.ne:48332     *:*                                     974/rtpproxy
udp   255496      0 sitges.telesip.ne:48334     *:*                                     974/rtpproxy

In addition, the syslog fills up with lines like:
Jun 29 16:07:37 sitges /usr/local/sbin/ser[32267]: ERROR: send_rtpp_command: timeout waiting reply from a RTP proxy
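
Since the outage shows up as a large Recv-Q on rtpproxy's UDP sockets, a 
small check on the netstat output might catch it early.  The following is 
only a minimal sketch: it assumes netstat prints each socket on one line 
in the column order shown above, and the function name and the 100000-byte 
threshold are hypothetical choices, not anything rtpproxy or SER provides.

```python
# Hypothetical threshold in bytes; tune it to the socket buffer size in use.
RECVQ_THRESHOLD = 100_000


def stuck_sockets(netstat_text, program="rtpproxy", threshold=RECVQ_THRESHOLD):
    """Return (local_address, recv_q) pairs for UDP sockets owned by
    `program` whose Recv-Q is above `threshold`."""
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign PID/Program
        # (the State column is empty for unconnected UDP sockets).
        if len(fields) < 6 or fields[0] != "udp":
            continue
        if not fields[1].isdigit():
            continue
        recv_q = int(fields[1])
        owner = fields[-1]  # e.g. "974/rtpproxy"
        if owner.endswith("/" + program) and recv_q > threshold:
            hits.append((fields[3], recv_q))
    return hits


# Sample lines taken from the netstat output in this message.
SAMPLE = """\
udp   255496      0 sitges.telesip.ne:48140     *:*      974/rtpproxy
udp        0      0 sitges.telesip.ne:48141     *:*      974/rtpproxy
udp   141488      0 sitges.telesip.ne:48332     *:*      974/rtpproxy
"""

if __name__ == "__main__":
    for address, recv_q in stuck_sockets(SAMPLE):
        print(address, recv_q)
```

Run from cron against live "netstat -pau" output, something like this could 
raise an alert (or trigger a restart) before callers notice, though it only 
detects the symptom and says nothing about the cause.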


Killing the rtpproxy process and starting it again fixes the issue.  I 
still do not see how we can reproduce it, since almost 2 months passed 
since the last occurrence.  Do you have a version that can give more 
debug info?  We would be glad to test it out in order to find a solution 
to this.

Our version is
# ./rtpproxy -v
Basic version: 20040107
Extension 20050322: Support for multiple RTP streams and MOH

Thanks,
Andres.


Andres wrote:

> Maxim Sobolev wrote:
>
>> Andres,
>>
>> What you are reporting is very strange indeed. Unfortunately your 
>> report doesn't contain enough information to identify the source of 
>> the problem. Is the problem reproducible?
>
>
> Hi Maxim,
>
> The problem is not reproducible but it has happened to someone else 
> before.  Take a look at:
> http://lists.iptel.org/pipermail/serusers/2005-April/017970.html
>
> The recommendation from Jan was to switch to UDP sockets.  We will be 
> implementing that after a few tests so hopefully if rtpproxy blocks in 
> the future, it will not take down SER with it.
>
> Thanks,
> Andres
>
>>
>> -Maxim
>>
>> Andres wrote:
>>
>>> Hi,
>>>
>>> After more than 2 years of flawlessly processing millions of calls 
>>> and going through versions 0.8.10 all the way to 0.9.1, we had our 
>>> first major SER outage yesterday.  One of our SER boxes stopped 
>>> responding completely to any SIP messages (running on 0.9.1 from 
>>> about 3 weeks ago).  We stopped and started SER multiple times, ran 
>>> sniffer traces, turned on maximum debugging and all we could see was 
>>> that SER did not respond to anything.  Not even "serctl moni" seemed 
>>> to work.
>>>
>>> We finally ran the command "netstat -a -u -p", and saw about a dozen 
>>> rtpproxy sockets that had exhausted the receive or send buffers 
>>> (columns Recv-Q, or Send-Q).  After we killed and restarted 
>>> rtpproxy, everything went back to normal.
>>>
>>> The questions now are:
>>> 1.  Why would a problem in rtpproxy completely lock up SER?  Even 
>>> after stopping and starting SER multiple times it was still blocked.
>>> 2.  Why would rtpproxy lock up in the first place and exhaust the 
>>> network buffers?  This is all UDP traffic so it's not like the other 
>>> side was slow at sending ACKs or something.  UDP traffic should be 
>>> received and sent out on the fly by rtpproxy.  (Network interface is 
>>> 100Mbps full-duplex and it never went down or showed any problems).
>>>
>>> The box is running Red Hat ES3.0 on dual 3.6 Xeons and 2GB of 
>>> memory.  We can confirm that no more than 20-25 calls were running 
>>> via rtpproxy on it at the time of the incident and that no more than 
>>> 2-3% of our total calls are handled by rtpproxy. If anybody can 
>>> share any insights on what could be the cause to something like 
>>> this, we would greatly appreciate it.
>>>
>>> Thanks,
>>> Andres.
>>> TeleSIP Network Admin
>>>
>>>
>>>
>>
>

-- 
Andres
Network Admin
http://www.telesip.net




