[Serusers] SER Children Misbehaving

Andres andres at telesip.net
Tue Nov 22 22:19:51 CET 2005


Today we had an incident where SER (0.9.4) child processes saturated all
the CPUs of one of our servers.

top showed:
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
17925 root      25   0  5644 5644  3888 R    25.5  0.2   6:26   1 ser
17929 root      25   0  5672 5672  3880 R    24.7  0.2   6:48   0 ser
17928 root      25   0  5688 5688  3872 R    24.3  0.2   6:25   1 ser
17933 root      25   0  4540 4540  3740 R    22.8  0.2   6:00   0 ser

And:
# ps -Al | grep ser
1 S     0 17901     1  0  85   0    - 14200 pause  ?        00:00:00 ser
1 S     0 17916 17901  0  75   0    - 14200 pipe_w ?        00:00:00 ser
1 S     0 17917 17901  0  75   0    - 14418 schedu ?        00:00:22 ser
1 S     0 17918 17901  0  75   0    - 14422 schedu ?        00:00:23 ser
1 S     0 17919 17901  0  75   0    - 14423 schedu ?        00:00:24 ser
1 S     0 17920 17901  0  75   0    - 14447 schedu ?        00:00:22 ser
1 S     0 17921 17901  0  75   0    - 14421 schedu ?        00:00:22 ser
1 S     0 17922 17901  0  75   0    - 14424 schedu ?        00:00:22 ser
1 S     0 17923 17901  0  75   0    - 14428 schedu ?        00:00:21 ser
1 S     0 17924 17901  0  75   0    - 14424 schedu ?        00:00:22 ser
1 R     0 17925 17901  0  85   0    - 14448 -      ?        00:06:22 ser
1 S     0 17926 17901  0  75   0    - 14457 schedu ?        00:00:49 ser
1 S     0 17927 17901  0  75   0    - 14453 schedu ?        00:00:50 ser
1 R     0 17928 17901  0  85   0    - 14477 -      ?        00:06:20 ser
1 R     0 17929 17901  0  85   0    - 14455 -      ?        00:06:44 ser
1 S     0 17930 17901  0  75   0    - 14452 schedu ?        00:00:50 ser
1 S     0 17931 17901  0  75   0    - 14448 schedu ?        00:00:50 ser
1 S     0 17932 17901  0  76   0    - 14448 schedu ?        00:00:49 ser
1 R     0 17933 17901  0  85   0    - 14235 -      ?        00:05:55 ser

As you can see, it looks like 4 children dropped out of the scheduler:
PIDs 17925, 17928, 17929 and 17933 are in state R with no wait channel,
while the other children are sleeping. The only suspicious thing is that
RTPProxy became unresponsive around that time. At least that's the only
thing the log shows:
Nov 22 15:56:17  /usr/local/sbin/ser[17931]: ERROR: send_rtpp_command: timeout waiting reply from a RTP proxy
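
For anyone wanting to spot the same pattern quickly, here is a minimal sketch that filters `ps -Al` output for ser processes in state R. The function name runnable_children is made up for illustration, and the column positions are assumed from the listing above (state in the 2nd column, PID in the 4th, TIME next to last); other ps versions may lay the columns out differently.

```python
# Sketch: flag ser children that are runnable (state R) in `ps -Al` output.
# Assumed column layout, taken from the listing in this message:
#   F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD

def runnable_children(ps_text, command="ser"):
    """Return (pid, cpu_time) for each `command` process in state R."""
    hits = []
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and other commands.
        if len(fields) < 14 or fields[-1] != command:
            continue
        if fields[1] == "R":
            hits.append((int(fields[3]), fields[-2]))
    return hits


# A few lines copied from the ps listing above.
sample = """\
1 S     0 17924 17901  0  75   0    - 14424 schedu ?        00:00:22 ser
1 R     0 17925 17901  0  85   0    - 14448 -      ?        00:06:22 ser
1 S     0 17926 17901  0  75   0    - 14457 schedu ?        00:00:49 ser
1 R     0 17928 17901  0  85   0    - 14477 -      ?        00:06:20 ser
"""

for pid, cpu_time in runnable_children(sample):
    print(pid, cpu_time)
```

On the sample lines this picks out PIDs 17925 and 17928, the same processes that top reports at around 25% CPU each.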

Any idea why these 4 children dropped out?  Any hints on how to 
troubleshoot this?

Thanks,

-- 
Andres
Network Admin
http://www.telesip.net




