[Serusers] SER Children Misbehaving

Andres andres at telesip.net
Wed Nov 23 16:15:59 CET 2005


Hi Greger,

For this same reason we switched to a UDP interface about 6 months ago.
But as you can see, we still managed to get 4 children locked up.  It
was still able to handle millions of transactions since the last lockup (2
months ago), but we are still clueless about why this is happening.  We
have rtpproxy running again, in the foreground, writing all messages to
a file.  Hopefully this will provide enough info to get to the bottom of
this.
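
For anyone following along, the idea behind the UDP interface can be
sketched roughly like this: a datagram request with a receive timeout and
a bounded number of retries fails one call instead of blocking the worker.
This is only an illustration in Python, not SER's actual code; the helper
name send_command, the payload, and the timeout/retry defaults are all
assumptions.

```python
import socket

def send_command(addr, payload, timeout=1.0, retries=3):
    """Send one datagram and wait for a reply; return None if none arrives.

    Hypothetical helper: the name and defaults are illustrative assumptions.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)  # bound every recvfrom() call
    try:
        for _ in range(retries):
            sock.sendto(payload, addr)
            try:
                reply, _peer = sock.recvfrom(4096)
                return reply
            except (socket.timeout, ConnectionError):
                continue  # no usable answer in time; resend
        return None  # give up: fail this call, keep the worker alive
    finally:
        sock.close()
```

With a peer that never answers, the call returns None after at most
roughly timeout * retries seconds, so the caller can reject that one
request and move on.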

Thanks,

-- 
Andres
Network Admin
http://www.telesip.net

Greger V. Teigre wrote:

> Andres,
> I believe I remember this issue being brought up a while back. If I 
> remember correctly, SER children locked up when communicating with a 
> locked rtpproxy over the socket interface. The "solution" was to use UDP 
> over loopback to communicate, as this would fail that specific call 
> but not lock the SER process.
> g-)
>
> ----- Original Message ----- From: "Andres" <andres at telesip.net>
> To: <serusers at lists.iptel.org>
> Sent: Tuesday, November 22, 2005 10:19 PM
> Subject: [Serusers] SER Children Misbehaving
>
>
>> Today we had an incident where SER (0.9.4) children drained all the 
>> CPUs of one of our servers.
>> top showed:
>>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
>> 17925 root      25   0  5644 5644  3888 R    25.5  0.2   6:26   1 ser
>> 17929 root      25   0  5672 5672  3880 R    24.7  0.2   6:48   0 ser
>> 17928 root      25   0  5688 5688  3872 R    24.3  0.2   6:25   1 ser
>> 17933 root      25   0  4540 4540  3740 R    22.8  0.2   6:00   0 ser
>>
>> And ..
>> # ps -Al | grep ser
>> 1 S     0 17901     1  0  85   0    - 14200 pause  ?        00:00:00 ser
>> 1 S     0 17916 17901  0  75   0    - 14200 pipe_w ?        00:00:00 ser
>> 1 S     0 17917 17901  0  75   0    - 14418 schedu ?        00:00:22 ser
>> 1 S     0 17918 17901  0  75   0    - 14422 schedu ?        00:00:23 ser
>> 1 S     0 17919 17901  0  75   0    - 14423 schedu ?        00:00:24 ser
>> 1 S     0 17920 17901  0  75   0    - 14447 schedu ?        00:00:22 ser
>> 1 S     0 17921 17901  0  75   0    - 14421 schedu ?        00:00:22 ser
>> 1 S     0 17922 17901  0  75   0    - 14424 schedu ?        00:00:22 ser
>> 1 S     0 17923 17901  0  75   0    - 14428 schedu ?        00:00:21 ser
>> 1 S     0 17924 17901  0  75   0    - 14424 schedu ?        00:00:22 ser
>> 1 R     0 17925 17901  0  85   0    - 14448 -      ?        00:06:22 ser
>> 1 S     0 17926 17901  0  75   0    - 14457 schedu ?        00:00:49 ser
>> 1 S     0 17927 17901  0  75   0    - 14453 schedu ?        00:00:50 ser
>> 1 R     0 17928 17901  0  85   0    - 14477 -      ?        00:06:20 ser
>> 1 R     0 17929 17901  0  85   0    - 14455 -      ?        00:06:44 ser
>> 1 S     0 17930 17901  0  75   0    - 14452 schedu ?        00:00:50 ser
>> 1 S     0 17931 17901  0  75   0    - 14448 schedu ?        00:00:50 ser
>> 1 S     0 17932 17901  0  76   0    - 14448 schedu ?        00:00:49 ser
>> 1 R     0 17933 17901  0  85   0    - 14235 -      ?        00:05:55 ser
>>
>> As you can see, it looks like 4 children dropped out of the 
>> scheduler.  The only suspicious thing is that RTPProxy became 
>> non-responsive around that time.  At least that's the only thing the 
>> log shows:
>> Nov 22 15:56:17  /usr/local/sbin/ser[17931]: ERROR: send_rtpp_command: timeout waiting reply from a RTP proxy
>>
>> Any idea why these 4 children dropped out?  Any hints on how to 
>> troubleshoot this?
>>
>> Thanks,
>>
>> -- 
>> Andres
>> Network Admin
>> http://www.telesip.net
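
As a troubleshooting aid, the ps -Al listing in the original report can be
scanned automatically for children that are stuck in the runnable state.
The sketch below is only an illustration: the function name find_runnable
is hypothetical, and it assumes the 14-column layout shown in this thread
(F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD).

```python
def find_runnable(ps_lines, command="ser"):
    """Return (pid, cpu_time) for processes in state R with the given command.

    Assumes the `ps -Al` column order seen in this thread:
    F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
    """
    hits = []
    for line in ps_lines:
        fields = line.split()
        if len(fields) < 14:
            continue  # not a full process row
        state, pid, cpu_time, cmd = fields[1], fields[3], fields[12], fields[13]
        if cmd == command and state == "R":
            hits.append((int(pid), cpu_time))
    return hits

sample = [
    "1 S     0 17924 17901  0  75   0    - 14424 schedu ?        00:00:22 ser",
    "1 R     0 17925 17901  0  85   0    - 14448 -      ?        00:06:22 ser",
]
print(find_runnable(sample))  # [(17925, '00:06:22')]
```

A process in state R is not necessarily misbehaving, since a healthy
worker is briefly runnable too. It becomes suspicious when the same PIDs
stay in R across repeated samples and their TIME column is far ahead of
their siblings, as with the four children in the report.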




