Hi!
I just stumbled into a hard-to-find problem with Kamailio.
The symptom was that Kamailio simply did not read a lot of incoming messages, so they
never got a response. We saw the traffic coming in on the interface with tcpdump, but not
in the Kamailio logs. Restarting Kamailio solved the problem for a while, then it started
again.
I noticed there was something related to the DB, so for testing I disabled all DB-based
functions, but I missed siptrace, which connects and operates in the background based on
the config settings alone, with nothing in the routing script.
I finally traced the problem to the siptrace module. The DB table had grown way too big
(due to attacks) and adding new records to it took way too much time. During that time,
with siptrace waiting for confirmation, the process was blocked and packets were dropped.
Kamailio still handled calls, but a lot of calls were dropped.
I will solve this shortly by setting up a smaller Kamailio in parallel that logs to the
database, instead of running siptrace with DB on a production server.
But so that others don’t end up in this very hard-to-diagnose situation, is there
something that can be done to the siptrace code so it doesn’t execute in the same
process that listens for incoming SIP messages? It would be a huge improvement to just
block a background process, so that only siptrace stops working, instead of blocking a
listening process and dropping traffic.
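
To illustrate the idea (this is not Kamailio code, just a minimal Python sketch of the
pattern I have in mind): the listener hands trace records to a bounded queue and never
waits for the database. A separate worker does the slow writes. If the queue fills up,
trace records are dropped and counted, but SIP handling goes on. The names TraceWriter,
submit and store are made up for this example.

```python
import queue
import threading


class TraceWriter:
    """Hypothetical background writer; only illustrates the pattern."""

    def __init__(self, store, maxsize=1000):
        # store: a callable that performs the (possibly slow) DB insert.
        self._store = store
        # Bounded queue, so a stuck DB cannot eat all memory.
        self._queue = queue.Queue(maxsize=maxsize)
        self.dropped = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, record):
        """Called from the listener. Never blocks on the database."""
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            # Lose the trace record, not the SIP message.
            self.dropped += 1
            return False

    def _run(self):
        # Only this worker thread ever waits for the slow store.
        while True:
            record = self._queue.get()
            if record is None:  # shutdown marker
                break
            self._store(record)

    def close(self):
        """Flush what is queued and stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

With something like this, a slow or oversized table only shows up as a growing dropped
counter for traces, which is also a lot easier to spot than silently lost SIP packets.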
I don’t think I can write that code. Sorry. Otherwise it would have been a high priority,
because these problems in production drove me almost insane. After a restart, everything
works fine. After a few hours the first packets are lost, while calls for some customers
keep getting through all of the time.
Sorry for a long post, but I want this in the archives to help others.
Regards,
/O