We have a Kamailio 4.2 machine with a custom config file that we've been running for the better part of two years without issue. Suddenly we've run into a problem where Kamailio just stops responding to any INVITEs and stops logging entirely. The processes keep running but do nothing. This is a full production machine doing about 200 CPS, so the hangs naturally cause problems. The load is barely measurable, but turning logging up to debug would surely kill it. I have the memlog from the restart after a hang if that helps, but I presume it is too large for this list. We have increased shared memory to 1536 MB and pkg memory to 32 MB, but that didn't help, and in any case the shmem stats look OK to us.
We use TOPOH and do get lots of log entries like:
/usr/sbin/kamailio[17865]: ERROR: <core> [parser/parse_via.c:2048]: parse_via(): ERROR: parse_via: bad char <▒> on state 100
/usr/sbin/kamailio[17865]: ERROR: <core> [parser/parse_via.c:2738]: parse_via(): ERROR: parse_via on: <▒#010d▒▒_▒#0117Jt▒#015#012X>
/usr/sbin/kamailio[17865]: ERROR: <core> [parser/parse_via.c:2744]: parse_via(): ERROR: parse_via: via parse error
/usr/sbin/kamailio[17865]: ERROR: topoh [th_msg.c:438]: th_unmask_via(): cannot find cookie in via2
/usr/sbin/kamailio[17865]: ERROR: topoh [th_mask.c:165]: th_mask_decode(): invalid input string"642525435_130081688@xxx.xxx.xxx.xxx"
/usr/sbin/kamailio[17865]: ERROR: topoh [th_msg.c:484]: th_unmask_callid(): cannot decode callid
and
CRITICAL: sl [../../forward.h:279]: msg_send(): unknown proto 0
However, I believe these are not new and do not seem to affect overall call processing. Anyone have any idea what to look for or how to figure this out?
shmem:fragments = 4019
shmem:free_size = 1538943936
shmem:max_used_size = 84349120
shmem:real_used_size = 71668800
shmem:total_size = 1610612736
shmem:used_size = 54855384
Peter,
The natural question that would arise is whether your SIP worker threads are waiting on any external I/O, such as database queries.
When this event occurs, you'll want to take a look at your RecvQ in netstat, e.g.
# netstat --inet -n -l | grep 5060
It should be 0 or substantially 0 under normal conditions. If the SIP worker threads are "stuck" waiting on something and unable to cope with the incoming packet load, you'll see the number increase.
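If you want to log that value over time rather than eyeball it, a minimal sketch along these lines could parse the netstat output. It assumes the usual Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, ...). The function name `recvq_for_port` and the `SAMPLE` text, including its addresses and queue size, are made up for illustration and are not taken from your machine.

```python
def recvq_for_port(netstat_output, port=5060):
    """Return (proto, local_address, recv_q) for each socket bound to `port`.

    Assumes net-tools style rows:
    proto recv-q send-q local-address foreign-address [state]
    """
    results = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner and header lines: real rows have a numeric Recv-Q column.
        if len(fields) < 5 or not fields[1].isdigit():
            continue
        proto, recv_q, local = fields[0], int(fields[1]), fields[3]
        # The port is whatever follows the last ':' in the local address.
        if local.rsplit(":", 1)[-1] == str(port):
            results.append((proto, local, recv_q))
    return results


# Hypothetical output, shaped like `netstat --inet -n -l`.
SAMPLE = """\
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:5060            0.0.0.0:*               LISTEN
udp   213248      0 192.0.2.10:5060         0.0.0.0:*
udp        0      0 0.0.0.0:123             0.0.0.0:*
"""

for proto, local, recv_q in recvq_for_port(SAMPLE):
    print(proto, local, recv_q)
```

In practice you would feed it the real command output (for example via `subprocess.run`) from cron or a loop, and alert when the Recv-Q of the SIP socket stays well above zero.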
For more information on the topic overall, see my article on Kamailio concurrency:
http://blog.csrpswitch.com/tuning-kamailio-for-high-throughput-and-performan...
-- Alex