Running Kamailio 4.4.7:97f308 with heavy app_perl usage, usrloc (db_mode 3) and not much else.
After an upgrade from 4.1, Kamailio periodically dies with a message like this:
Sep 25 20:54:28 switch /sbin/kamailio[29771]: CRITICAL: <core> [pass_fd.c:277]: receive_fd(): EOF on 9
Also happens on 4.2.x.
It happens almost exactly every 30-31 minutes, which suggests that some periodic background operation elsewhere on this operator's system is causing it, but I haven't been able to find one.
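One way to confirm the interval is to measure it from the log itself. The following is a minimal sketch, not something I have run against this system: the function name `eof_intervals` and the `year` parameter are made up for illustration, and it assumes classic syslog timestamps (no year, English month abbreviations) like the line quoted above.

```python
import re
from datetime import datetime

# Matches syslog lines such as:
# Sep 25 20:54:28 switch /sbin/kamailio[29771]: CRITICAL: ... receive_fd(): EOF on 9
EOF_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*receive_fd\(\): EOF on \d+"
)


def eof_intervals(lines, year=2000):
    """Return the minutes between consecutive receive_fd() EOF log entries.

    Syslog timestamps carry no year, so `year` is an assumed placeholder;
    it only matters if the log spans a year boundary.
    """
    times = []
    for line in lines:
        m = EOF_LINE.match(line)
        if not m:
            continue  # ignore unrelated log lines
        stamp = " ".join(m.group("stamp").split())  # normalize "Sep  5" padding
        times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
```

Feeding it the lines of the syslog file (the path depends on the system) gives a list of gaps in minutes; a tight cluster around 30 would point to a timer or cron-like job rather than traffic.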
Anyway, the PID in the log message is that of the TCP main marshalling process, i.e.
# kamcmd ps | grep -i tcp
15716 tcp receiver (generic) child=0
15717 tcp receiver (generic) child=1
15718 tcp receiver (generic) child=2
15719 tcp receiver (generic) child=3
15720 tcp receiver (generic) child=4
15721 tcp receiver (generic) child=5
15722 tcp receiver (generic) child=6
15723 tcp receiver (generic) child=7
15724 tcp receiver (generic) child=8
15725 tcp receiver (generic) child=9
15726 tcp receiver (generic) child=10
15727 tcp receiver (generic) child=11
15728 tcp receiver (generic) child=12
15729 tcp receiver (generic) child=13
15730 tcp receiver (generic) child=14
15731 tcp receiver (generic) child=15
15732 tcp main process
In this case, that would be 15732.
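For anyone wanting to script the comparison between the PID in the log line and the TCP main process, here is a small sketch. It assumes the `kamcmd ps` output format shown above (PID, then description); `tcp_main_pid` is a hypothetical helper name.

```python
def tcp_main_pid(ps_output):
    """Return the PID of the 'tcp main process' line in kamcmd ps output.

    Expects lines of the form '<pid> <description>' and returns None
    if no TCP main process is listed.
    """
    for line in ps_output.splitlines():
        parts = line.split(None, 1)  # split into PID and description
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        if parts[1].strip().lower().startswith("tcp main"):
            return int(parts[0])
    return None
```

The text passed in would be whatever `kamcmd ps` prints, captured for example with `subprocess.run(["kamcmd", "ps"], capture_output=True, text=True).stdout`.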
I assume this is because one of the TCP receiver processes dies, but I haven't been able to find any evidence of that. This is a high-volume system, so I can't reduce the worker pool too much, but I tried reducing the number of child processes per listener from 16 to 3 and attaching GDB to each one. They all die normally upon receipt of SIGTERM:
Program received signal SIGTERM, Terminated.
0x00002b1517a436f3 in __epoll_wait_nocancel () from /lib64/libc.so.6
Yet it is the TCP main (distributor) process that logs the EOF in receive_fd().
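To make clear what I understand the message to mean: as far as I can tell, an EOF while receiving a descriptor over a Unix socket means the peer's end of that socket was closed, whether by an explicit close or by the peer process exiting. The sketch below is not Kamailio code; it reproduces the condition with Python's `socket.send_fds` / `socket.recv_fds` (Python 3.9+, Unix only), and `recv_one_fd` and `demo` are names invented for the illustration.

```python
import os
import socket


def recv_one_fd(sock):
    """Receive one passed descriptor; return None if the peer closed (EOF)."""
    msg, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    if not msg and not fds:
        return None  # zero-length read: the other end of the socket is gone
    return fds[0]


def demo():
    # One socket pair stands in for the link between the main process
    # and a single receiver child.
    main_end, child_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()

    # Normal case: the peer passes a descriptor along with one data byte.
    socket.send_fds(child_end, [b"x"], [r])
    got = recv_one_fd(main_end)

    # Failure case: the peer's end is closed, so the next receive sees EOF.
    child_end.close()
    eof = recv_one_fd(main_end)

    for fd in (got, r, w):
        os.close(fd)
    main_end.close()
    return got is not None, eof is None
```

If that reading is right, the EOF only says that some peer's socket end went away; it does not by itself say which process closed it or why.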
Because it's not a crash per se, I don't have a core dump or a way of grabbing the state of the program at the exact moment of the crash. All the processes seem to exit normally.
I have read some past issues that mention this, but their ultimate causes don't seem to be relevant here (e.g. no dialog usage). Moreover, the commits made to address this issue in other forms are present in the latest 4.4.x.
Because of the high traffic volume, running with a higher debug verbosity level and other fairly obvious approaches (e.g. no forking) aren't practical at all.
Any suggestions welcome!