We have multiple Kamailio servers, each with 4 CPU cores and 16 GB of RAM.
We use Kamailio + rtpproxy as an outbound SIP proxy. There are usually many thousands of concurrent SIP sessions on each server. Every so often, Kamailio simply stops serving requests and starts returning 5xx replies. At that point we usually reload the Kamailio daemon and things return to normal. As we have a lot of servers, doing this manually is a pain in the neck. So we installed Homer/sipcapture on a separate server, and from there we periodically scan for 500 messages within a given time interval. If there are any, we reload Kamailio.
I know it's a hell of an inefficient way to monitor.
So I'm wondering whether there is a more rational way to detect problems pre-emptively. Is there any FIFO/MI command we could run to find out whether the Kamailio instance is hung?
Cheers Arif
Hello Arif,
This sounds a lot like you've got a pathological database query or other source of I/O wait which periodically deadlocks or hangs for an inordinate amount of time, blocking one of Kamailio's worker processes.
Such a condition can have knock-on effects which overwhelm other workers, since delay invites retransmissions.
If this hypothesis is meritorious (i.e. you have nontrivial database or other synchronous I/O-bound interactions), the best thing to do, perhaps, is to audit your external dependency services. For instance, you can turn on the slow query log in your database and see if an exceptionally slow query or deadlock is detected. That sort of thing would be "low-hanging fruit" here.
Failing that, you can run 'netstat --inet -n -l' and check the Recv-Q column for Kamailio's listening sockets. Under normal conditions, this should be near zero, though it may burst ephemerally under high load. If all of Kamailio's workers stop responding, however, the queue won't drain; it will just keep growing.
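
If you want to automate the Recv-Q check rather than eyeball it, something along these lines might serve as a starting point. It is only a rough sketch in Python: the function names, the port (5060) and the threshold (10000) are my own assumptions, not anything Kamailio provides, and it assumes the usual Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, ...).

```python
import subprocess


def parse_recv_queues(netstat_output):
    """Return (proto, local_address, recv_q) for each socket line.

    Assumes the net-tools layout: Proto Recv-Q Send-Q Local-Address ...
    Header lines and anything that is not a tcp/udp row are skipped.
    """
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith(("tcp", "udp")):
            continue
        if not fields[1].isdigit():
            continue
        rows.append((fields[0], fields[3], int(fields[1])))
    return rows


def backed_up_sockets(netstat_output, port=5060, threshold=10000):
    """Sockets bound to `port` whose Recv-Q exceeds `threshold`.

    Both defaults are illustrative assumptions; tune them to your traffic.
    """
    suffix = ":%d" % port
    return [
        row for row in parse_recv_queues(netstat_output)
        if row[1].endswith(suffix) and row[2] > threshold
    ]


def read_netstat():
    """Run the same command as above and return its text output."""
    result = subprocess.run(
        ["netstat", "--inet", "-n", "-l"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

You would call backed_up_sockets(read_netstat()) from cron or your monitoring agent. Since the queue can burst briefly under load, it is probably wise to alert only when several consecutive samples exceed the threshold, not on a single reading.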
-- Alex
-- Alex Balashov | Principal | Evariste Systems LLC 303 Perimeter Center North, Suite 300 Atlanta, GA 30346 United States
Tel: +1-800-250-5920 (toll-free) / +1-678-954-0671 (direct) Web: http://www.evaristesys.com/, http://www.csrpswitch.com/
Sent from my BlackBerry.
Hello,
To observe whether there are actions in the config that are slow, enable the latency-related parameters:
http://www.kamailio.org/wiki/cookbooks/4.3.x/core#latency_limit_action
Then watch the syslog and see if you get messages reporting slow actions.
Cheers, Daniel
On 16/11/15 08:33, Alex Balashov wrote:
[...]
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users