[Serusers] SER Children Misbehaving

Greger V. Teigre greger at teigre.com
Thu Nov 24 10:25:30 CET 2005


Ah, of course you knew about that :-)
Well, then I cannot really see why the SER children locked up.  Do you have 
any SER logs right before locking up? You are sure it's related to a 
non-responding rtpproxy?
I remember that Maxim was not able to reproduce the rtpproxy lockup, so 
that's one hard bug to nail down. Hopefully you will get something useful 
out of the debugging.
g-)
----- Original Message ----- 
From: "Andres" <andres at telesip.net>
To: "Greger V. Teigre" <greger at teigre.com>
Cc: <serusers at lists.iptel.org>
Sent: Wednesday, November 23, 2005 4:15 PM
Subject: Re: [Serusers] SER Children Misbehaving


> Hi Greger,
>
> For this same reason we switched to a UDP interface about 6 months ago. 
> But as you can see we still managed to get 4 Children locked up.  It still 
> was able to do millions of transaction since the last lockup (2 months 
> ago) but we are still clueless about why this is happening.  We have 
> rtpproxy running again in the foreground outputting all messages to a 
> file.  Hopefully this will provide enough info to get to the bottom of 
> this.
>
> Thanks,
>
> -- 
> Andres
> Network Admin
> http://www.telesip.net
>
> Greger V. Teigre wrote:
>
>> Andres,
>> I believe I remember this issue being brought up a while back. If I 
>> remember correctly, ser children locked up when communicating to a locked 
>> rtpproxy over socket interface. The "solution" was to use udp over 
>> loopback to communicate as this would fail that specific call, but not 
>> lock the ser process.
>> g-)
>>
>> ----- Original Message ----- From: "Andres" <andres at telesip.net>
>> To: <serusers at lists.iptel.org>
>> Sent: Tuesday, November 22, 2005 10:19 PM
>> Subject: [Serusers] SER Children Misbehaving
>>
>>
>>> Today we had an incident where SER (0.9.4) children drained all the CPUs 
>>> of one of our servers.
>>> Top Showed:
>>>  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
>>> 17925 root      25   0  5644 5644  3888 R    25.5  0.2   6:26   1 ser
>>> 17929 root      25   0  5672 5672  3880 R    24.7  0.2   6:48   0 ser
>>> 17928 root      25   0  5688 5688  3872 R    24.3  0.2   6:25   1 ser
>>> 17933 root      25   0  4540 4540  3740 R    22.8  0.2   6:00   0 ser
>>>
>>> And ..
>>> # ps -Al | grep ser
>>> 1 S     0 17901     1  0  85   0    - 14200 pause  ?        00:00:00 ser
>>> 1 S     0 17916 17901  0  75   0    - 14200 pipe_w ?        00:00:00 ser
>>> 1 S     0 17917 17901  0  75   0    - 14418 schedu ?        00:00:22 ser
>>> 1 S     0 17918 17901  0  75   0    - 14422 schedu ?        00:00:23 ser
>>> 1 S     0 17919 17901  0  75   0    - 14423 schedu ?        00:00:24 ser
>>> 1 S     0 17920 17901  0  75   0    - 14447 schedu ?        00:00:22 ser
>>> 1 S     0 17921 17901  0  75   0    - 14421 schedu ?        00:00:22 ser
>>> 1 S     0 17922 17901  0  75   0    - 14424 schedu ?        00:00:22 ser
>>> 1 S     0 17923 17901  0  75   0    - 14428 schedu ?        00:00:21 ser
>>> 1 S     0 17924 17901  0  75   0    - 14424 schedu ?        00:00:22 ser
>>> 1 R     0 17925 17901  0  85   0    - 14448 -      ?        00:06:22 ser
>>> 1 S     0 17926 17901  0  75   0    - 14457 schedu ?        00:00:49 ser
>>> 1 S     0 17927 17901  0  75   0    - 14453 schedu ?        00:00:50 ser
>>> 1 R     0 17928 17901  0  85   0    - 14477 -      ?        00:06:20 ser
>>> 1 R     0 17929 17901  0  85   0    - 14455 -      ?        00:06:44 ser
>>> 1 S     0 17930 17901  0  75   0    - 14452 schedu ?        00:00:50 ser
>>> 1 S     0 17931 17901  0  75   0    - 14448 schedu ?        00:00:50 ser
>>> 1 S     0 17932 17901  0  76   0    - 14448 schedu ?        00:00:49 ser
>>> 1 R     0 17933 17901  0  85   0    - 14235 -      ?        00:05:55 ser
>>>
>>> As you can see it looks like 4 children dropped out of the scheduler. 
>>> The only thing suspicious is that RTPProxy became non-responsive around 
>>> that time.  At least thats the only thing the log shows:
>>> Nov 22 15:56:17  /usr/local/sbin/ser[17931]: ERROR: send_rtpp_command: 
>>> timeout waiting reply from a RTP proxy
>>>
>>> Any idea why these 4 children dropped out?  Any hints on how to 
>>> troubleshoot this?
>>>
>>> Thanks,
>>>
>>> -- 
>>> Andres
>>> Network Admin
>>> http://www.telesip.net
>>>
>>>
>>> _______________________________________________
>>> Serusers mailing list
>>> serusers at lists.iptel.org
>>> http://lists.iptel.org/mailman/listinfo/serusers
>>>
>>>
>>
>>
>
> 




More information about the sr-users mailing list