[SR-Users] Timer child process losing MySQL connections

Henning Westerholt hw at kamailio.org
Sun Jun 17 20:08:50 CEST 2018


On Friday, 15 June 2018, 12:32:25 CEST, Tobias Lindgren wrote:
> Having an issue with MySQL db connections being dropped in a system running
> 4.4.7.
> 
> We're using the db_mysql and db_cluster modules to set up a cluster connecting two
> different DB servers. We have two cluster connections, one for acc and one
> for "other queries". One DB (A) is on the same network, another DB (B) is
> on another network. The default DB connection is for the remote server B.
> Auto reconnect is enabled.
> 
> The specific issue seen is that the "timer" child process loses/drops both
> connections to DB A and B. Looking at the output from lsof when this
> happens, the connections to A and B are usually not dropped at the same
> time. Sometimes the connections stay up for ~24h, sometimes for
> 10 minutes, but normally the problem recurs every 6 hours or so. We're
> seeing this problem on two Kamailio servers, both handling a fairly high
> volume of calls.
> 
> None of the other Kamailio child processes seems to get their connections
> dropped, only the "timer" process. To solve this we need to restart
> Kamailio.
> 
> Recently I've added the timer.so module to run a simple query on each cluster
> connection every 10 seconds.
> 
> This is an example output from when the problem appears and connections are
> dropped:
> 
> Jun 15 09:39:12 /usr/sbin/kamailio[10439]: ERROR: db_mysql [km_dbase.c:128]: db_mysql_submit_query(): driver error on query: Can't connect to MySQL server on 'xxx' (4) (2003)
> Jun 15 09:39:12 /usr/sbin/kamailio[10439]: ERROR: <core> [db_query.c:181]: db_do_raw_query(): error while submitting query
> Jun 15 09:39:12 /usr/sbin/kamailio[10439]: ERROR: db_mysql [km_dbase.c:128]:
> [looking for ideas..]

Hello Tobias,

The timer process is obviously the one that is not doing any "heavy work" during SIP
message processing. It's mostly concerned with cleanup and maintenance tasks, e.g. usrloc
user deletion, if you use this. Does the simple timer query every 10s change the behavior
for you?

I recently observed a similar issue (with a different code base). After a long debugging
session we found that the firewall at the network border had some issues and was
terminating these long-running TCP sessions. Changing the respective timeout fixed the
issue. As you see this issue roughly every 6h, it could have a similar cause.

If there is no firewall or other network element between the hosts, I would try to reproduce
this with e.g. a simple Python script that connects to the DB and sleeps periodically. You
could even just start a second Kamailio with a limited number of children and attach to it
with strace or a debugger.
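
A minimal sketch of such a reproduction script could look like the following. It is only an illustration: the function name probe_idle_connection and its parameters are made up for this example, and it accepts any Python DB-API connection factory. sqlite3 is used here only so the sketch runs without a MySQL server. To test the real path you would pass a factory for your MySQL driver instead.

```python
import sqlite3
import time


def probe_idle_connection(connect, idle_seconds, rounds, sleep=time.sleep):
    """Open ONE connection, then alternate between idling and a trivial query.

    `connect` is any zero-argument callable returning a DB-API connection
    (hypothetical helper, not part of Kamailio or any MySQL driver).
    Returns a list of (round_number, error_or_None) tuples.
    """
    conn = connect()
    results = []
    try:
        for n in range(1, rounds + 1):
            sleep(idle_seconds)  # leave the connection idle, like a quiet timer process
            try:
                cur = conn.cursor()
                cur.execute("SELECT 1")
                cur.fetchall()
                cur.close()
                results.append((n, None))
            except Exception as exc:  # error classes differ between drivers
                results.append((n, repr(exc)))
    finally:
        try:
            conn.close()
        except Exception:
            pass  # the connection may already be gone
    return results


if __name__ == "__main__":
    # sqlite3 is a stand-in so the sketch is runnable anywhere.
    for n, err in probe_idle_connection(lambda: sqlite3.connect(":memory:"), 0.1, 3):
        print(time.strftime("%H:%M:%S"), "round", n, "ok" if err is None else err)
```

Assuming the PyMySQL package is installed, the factory could be something like lambda: pymysql.connect(host="db-b.example", user="probe", password="secret", database="kamailio"), where the host and credentials are placeholders. Running it from the Kamailio host with increasing idle_seconds values (for example 60, 600, 3600) should show at which idle time, if any, the session gets cut somewhere on the path.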

Best regards,

Henning

-- 
If you like my work in the Kamailio project, it would be great if you could consider 
supporting me on Patreon: https://www.patreon.com/henningw