Hi Henning,
So I went with your suggestion to start a new Kamailio with just a few children. Got the
same issue after running for some time (again, with the timer process):
Jun 18 10:19:36 /usr/sbin/kamailio[25582]: DEBUG: db_cluster [dbcl_api.c:370]: db_cluster_raw_query(): executing db cluster raw query command
Jun 18 10:19:36 /usr/sbin/kamailio[25582]: DEBUG: db_cluster [dbcl_api.c:371]: db_cluster_raw_query(): serial operation - cluster [k1] (9/0)
Jun 18 10:19:36 /usr/sbin/kamailio[25582]: ERROR: db_mysql [km_dbase.c:128]: db_mysql_submit_query(): driver error on query: Lost connection to MySQL server at 'waiting for initial communication packet', system error: 4 (2013)
Jun 18 10:19:36 /usr/sbin/kamailio[25582]: ERROR: <core> [db_query.c:181]: db_do_raw_query(): error while submitting query
Jun 18 10:19:36 /usr/sbin/kamailio[25582]: DEBUG: db_cluster [dbcl_api.c:371]: db_cluster_raw_query(): serial operation - failre on cluster [k1] (9/0)
Jun 18 10:19:36 /usr/sbin/kamailio[25582]: DEBUG: db_cluster [dbcl_api.c:371]: db_cluster_raw_query(): serial operation - cluster [k1] (1/0)
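For reference, the numeric codes in these messages can be decoded with a few lines of Python. This is only a sketch, assuming the host is Linux and that the "system error" number printed by the MySQL client library is the OS errno; the helper name `describe` and the small lookup table are made up for illustration, and the table only covers the two client error codes that appear in this thread.

```python
import errno
import os

# MySQL client-library error codes seen in this thread (not a complete list).
MYSQL_CLIENT_ERRORS = {
    2003: "CR_CONN_HOST_ERROR",  # "Can't connect to MySQL server on ..."
    2013: "CR_SERVER_LOST",      # "Lost connection to MySQL server ..."
}


def describe(system_errno, mysql_code):
    """Return a one-line description of an OS errno / MySQL client code pair."""
    name = errno.errorcode.get(system_errno, "unknown")
    text = os.strerror(system_errno)  # wording depends on the platform
    mysql_name = MYSQL_CLIENT_ERRORS.get(mysql_code, "unknown")
    return f"errno {system_errno} = {name} ({text}); MySQL {mysql_code} = {mysql_name}"


if __name__ == "__main__":
    print(describe(4, 2013))
    print(describe(4, 2003))
```

On Linux, errno 4 is EINTR, i.e. a system call was interrupted by a signal while the client was connecting.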
I'm not really sure how to debug this further beyond this point. Any suggestions?
Kind regards,
/Tobias
________________________________
From: sr-users <sr-users-bounces(a)lists.kamailio.org> on behalf of Tobias Lindgren <the_fx(a)hotmail.com>
Sent: Sunday, June 17, 2018 9:13 PM
To: Henning Westerholt; sr-users(a)lists.kamailio.org
Subject: Re: [SR-Users] Timer child process loosing MySQL connections
Hi Henning,
I understand that the timer process normally has fairly little to do. I've been
using strace to verify that the 10s queries are being made on both of my
connections in the timer process. However, I'm still seeing the same issue with connections
being dropped, and I'm not convinced the behaviour has changed either.
Something that confuses me is that the MySQL connection code in Kamailio seems to
support auto reconnect, and it is enabled. If this were a firewall issue, I
would expect to see other child processes losing their connections as well. So my guess
is that they actually do lose them, and if so, they seem to be able to reconnect
properly (I will try to confirm that tomorrow). But the timer process cannot, for some
reason, which is very strange.
There is a firewall involved, and I've been trying to find something timeout-related
there, but that is also confusing, as the connections can drop after 10 minutes or
10 hours. I'll revisit that thought, though. Also, your idea of starting a new Kamailio
with a limited config is a very good one; I'll give that a try.
Thanks!
/Tobias
________________________________
From: Henning Westerholt <hw(a)kamailio.org>
Sent: Sunday, June 17, 2018 8:08 PM
To: sr-users(a)lists.kamailio.org
Cc: Tobias Lindgren
Subject: Re: [SR-Users] Timer child process loosing MySQL connections
On Friday, 15 June 2018, 12:32:25 CEST, Tobias Lindgren wrote:
Having an issue with MySQL db connections being dropped in a system running 4.4.7.
We're using the db_mysql and db_cluster modules to set up a cluster connecting two
different DB servers. We have two cluster connections, one for acc and one for "other
queries". One DB (A) is on the same network, another DB (B) is on another network. The
default DB connection is for the remote server B. Auto reconnect is enabled.
The specific issue seen is that the "timer" child process loses/drops both connections,
to DB A and to DB B. Looking at the output from lsof when this happens, the connections
to A and B usually do not drop at the same time. Sometimes the connections stay up for
~24h, sometimes for 10 minutes, but normally the problem recurs every 6 hours or so.
We're seeing this problem on two Kamailio servers, both handling a fairly high volume of
calls.
None of the other Kamailio child processes seem to get their connections dropped, only
the "timer" process. To solve this we need to restart Kamailio.
Lately I've added the timer.so module to make a simple query on each cluster connection
every 10 seconds.
This is an example output from when the problem appears and connections are dropped:
Jun 15 09:39:12 /usr/sbin/kamailio[10439]: ERROR: db_mysql [km_dbase.c:128]: db_mysql_submit_query(): driver error on query: Can't connect to MySQL server on 'xxx' (4) (2003)
Jun 15 09:39:12 /usr/sbin/kamailio[10439]: ERROR: <core> [db_query.c:181]: db_do_raw_query(): error while submitting query
Jun 15 09:39:12 /usr/sbin/kamailio[10439]: ERROR: db_mysql [km_dbase.c:128]:
[looking for ideas..]
Hello Tobias,
the timer process is obviously the one that is not doing any "heavy work" during
SIP message processing. It's mostly concerned with cleanup and maintenance tasks, e.g.
usrloc user deletion, if you use this. Does the simple query every 10s change the
behavior for you?
I recently observed a similar issue (with a different code base). After long debugging,
we found that the firewall at the network border had some issues and was dropping these
long-running TCP sessions. Changing the respective timeout fixed the issue. As you see
this issue every 6h, it could be a similar problem.
If there is no firewall or other network element between the hosts, I would try to
reproduce this with e.g. a simple Python script that connects to the DB and sleeps
periodically. You could even just start a second Kamailio with a limited children count
and attach to it with strace or a debugger.
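A minimal sketch of such a script could look like the following. It is an illustration only, not tested against your setup: the function name `probe` and its parameters are made up here, and it takes any callable that returns a Python DB-API 2.0 connection so that the MySQL driver (and the credentials) stay outside the loop.

```python
import time
from datetime import datetime


def probe(connect, interval=10.0, rounds=None, sleep=time.sleep):
    """Hold ONE connection open and run a trivial query every `interval` seconds.

    `connect` is a zero-argument callable returning a DB-API 2.0 connection.
    Returns a list of (timestamp, ok, detail) tuples and stops at the first
    failure, so the last entry shows when the connection went away.
    """
    results = []
    conn = connect()
    count = 0
    try:
        while rounds is None or count < rounds:
            count += 1
            stamp = datetime.now().isoformat(timespec="seconds")
            try:
                cur = conn.cursor()
                cur.execute("SELECT 1")
                cur.fetchall()
                cur.close()
                results.append((stamp, True, "ok"))
            except Exception as exc:  # driver-specific errors end up here
                results.append((stamp, False, repr(exc)))
                print(f"{stamp} query {count} failed: {exc!r}")
                break
            # The connection is idle during this sleep; that is the point.
            sleep(interval)
    finally:
        try:
            conn.close()
        except Exception:
            pass  # the connection may already be dead
    return results
```

Assuming the PyMySQL driver is installed, it could be started with something like `probe(lambda: pymysql.connect(host="...", user="...", password="...", database="..."), interval=600)`. Running it with different intervals, from the same host as Kamailio, should show whether an idle connection is cut after a fixed time.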
Best regards,
Henning
--
If you like my work in the Kamailio project, it would be great if you could consider
supporting me on Patreon:
https://www.patreon.com/henningw