Hi Greger,
For this same reason we switched to a UDP interface about 6 months ago.
But as you can see, we still managed to get 4 children locked up. SER
was still able to handle millions of transactions since the last lockup
(2 months ago), but we are still clueless about why this is happening. We
have rtpproxy running again in the foreground outputting all messages to
a file. Hopefully this will provide enough info to get to the bottom of
this.
Thanks,
--
Andres
Network Admin
Greger V. Teigre wrote:
Andres,
I believe I remember this issue being brought up a while back. If I
remember correctly, ser children locked up when communicating with a
locked rtpproxy over the socket interface. The "solution" was to use UDP
over loopback for that communication, as a timeout would then fail that
specific call but not lock the ser process.
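
To illustrate the idea (this is not SER's actual code, which is written in C), here is a minimal Python sketch of a UDP request with a bounded wait. The function name `udp_request`, the payload, and the timeout and retry values are all hypothetical. The point is only that a peer which never answers costs the caller a timeout, not a hang.

```python
import socket


def udp_request(address, payload, timeout=1.0, retries=3):
    """Send payload over UDP and return the reply bytes.

    Returns None if no reply arrives after all retries, so the caller
    can fail this one request instead of blocking forever.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Bound every blocking call on this socket.
        sock.settimeout(timeout)
        for _ in range(retries):
            try:
                sock.sendto(payload, address)
                reply, _peer = sock.recvfrom(4096)
                return reply
            except OSError:
                # Covers the timeout as well as errors such as an
                # unreachable port; try again until retries run out.
                continue
    return None
```

A caller would treat `None` as "this call failed" and carry on, for example `udp_request(("127.0.0.1", 22222), b"ping")`, where the port and the payload are placeholders and not rtpproxy's real control protocol.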
g-)
----- Original Message -----
From: "Andres" <andres(a)telesip.net>
To: <serusers(a)lists.iptel.org>
Sent: Tuesday, November 22, 2005 10:19 PM
Subject: [Serusers] SER Children Misbehaving
Today we had an incident where SER (0.9.4) children drained all the
CPUs of one of our servers.
top showed:
  PID USER PRI NI SIZE  RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
17925 root  25  0 5644 5644  3888 R    25.5  0.2 6:26   1 ser
17929 root  25  0 5672 5672  3880 R    24.7  0.2 6:48   0 ser
17928 root  25  0 5688 5688  3872 R    24.3  0.2 6:25   1 ser
17933 root  25  0 4540 4540  3740 R    22.8  0.2 6:00   0 ser
And ..
# ps -Al | grep ser
1 S 0 17901 1 0 85 0 - 14200 pause ? 00:00:00 ser
1 S 0 17916 17901 0 75 0 - 14200 pipe_w ? 00:00:00 ser
1 S 0 17917 17901 0 75 0 - 14418 schedu ? 00:00:22 ser
1 S 0 17918 17901 0 75 0 - 14422 schedu ? 00:00:23 ser
1 S 0 17919 17901 0 75 0 - 14423 schedu ? 00:00:24 ser
1 S 0 17920 17901 0 75 0 - 14447 schedu ? 00:00:22 ser
1 S 0 17921 17901 0 75 0 - 14421 schedu ? 00:00:22 ser
1 S 0 17922 17901 0 75 0 - 14424 schedu ? 00:00:22 ser
1 S 0 17923 17901 0 75 0 - 14428 schedu ? 00:00:21 ser
1 S 0 17924 17901 0 75 0 - 14424 schedu ? 00:00:22 ser
1 R 0 17925 17901 0 85 0 - 14448 - ? 00:06:22 ser
1 S 0 17926 17901 0 75 0 - 14457 schedu ? 00:00:49 ser
1 S 0 17927 17901 0 75 0 - 14453 schedu ? 00:00:50 ser
1 R 0 17928 17901 0 85 0 - 14477 - ? 00:06:20 ser
1 R 0 17929 17901 0 85 0 - 14455 - ? 00:06:44 ser
1 S 0 17930 17901 0 75 0 - 14452 schedu ? 00:00:50 ser
1 S 0 17931 17901 0 75 0 - 14448 schedu ? 00:00:50 ser
1 S 0 17932 17901 0 76 0 - 14448 schedu ? 00:00:49 ser
1 R 0 17933 17901 0 85 0 - 14235 - ? 00:05:55 ser
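
On a box with many children, the ones stuck in state R can be picked out of a listing like the one above with a few lines of script. This is only a rough sketch: the helper name `running_children` is made up, and it assumes the 14-column `ps -Al` layout shown here with TIME in plain HH:MM:SS form.

```python
def running_children(ps_output, command="ser"):
    """Return (pid, cpu_seconds) for processes in state R, busiest first."""
    found = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Expected columns: F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
        if len(fields) != 14 or fields[13] != command:
            continue
        state, pid, cpu_time = fields[1], fields[3], fields[12]
        if state != "R":
            continue
        hours, minutes, seconds = (int(part) for part in cpu_time.split(":"))
        found.append((int(pid), hours * 3600 + minutes * 60 + seconds))
    return sorted(found, key=lambda item: item[1], reverse=True)


sample = """\
1 S 0 17924 17901 0 75 0 - 14424 schedu ? 00:00:22 ser
1 R 0 17925 17901 0 85 0 - 14448 - ? 00:06:22 ser
1 R 0 17929 17901 0 85 0 - 14455 - ? 00:06:44 ser
"""
print(running_children(sample))  # [(17929, 404), (17925, 382)]
```

Comparing two such snapshots taken a few seconds apart would show whether the CPU time of those PIDs keeps climbing while the others stay flat.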
As you can see, it looks like 4 children dropped out of the
scheduler. The only suspicious thing is that RTPProxy became
non-responsive around that time. At least that's the only thing the
log shows:
Nov 22 15:56:17 /usr/local/sbin/ser[17931]: ERROR: send_rtpp_command: timeout waiting reply from a RTP proxy
Any idea why these 4 children dropped out? Any hints on how to
troubleshoot this?
Thanks,
--
Andres
Network Admin
http://www.telesip.net