Running `Kamailio 4.4.7:97f308` with heavy `app_perl` usage, `usrloc` (`db_mode` 3) and
not much else.
After upgrading from 4.1, Kamailio periodically dies with:
```
Sep 25 20:54:28 switch /sbin/kamailio[29771]: CRITICAL: <core> [pass_fd.c:277]: receive_fd(): EOF on 9
```
This also happens on 4.2.x.
It happens every 30 to 31 minutes, almost on the dot. That suggests some background operation
elsewhere on this operator's system is triggering it, but I haven't been able to find it.
Anyway, the PID in that log line belongs to the TCP main (marshalling) process. For example:
```
# kamcmd ps | grep -i tcp
15716 tcp receiver (generic) child=0
15717 tcp receiver (generic) child=1
15718 tcp receiver (generic) child=2
15719 tcp receiver (generic) child=3
15720 tcp receiver (generic) child=4
15721 tcp receiver (generic) child=5
15722 tcp receiver (generic) child=6
15723 tcp receiver (generic) child=7
15724 tcp receiver (generic) child=8
15725 tcp receiver (generic) child=9
15726 tcp receiver (generic) child=10
15727 tcp receiver (generic) child=11
15728 tcp receiver (generic) child=12
15729 tcp receiver (generic) child=13
15730 tcp receiver (generic) child=14
15731 tcp receiver (generic) child=15
15732 tcp main process
```
In this case, that would be `15732`.
I assume this happens because one of the TCP receiver processes dies, but I haven't found
any evidence of that. This is a high-volume system, so I can't shrink the worker pool much,
but I did reduce the number of child processes per listener from 16 to 3 and attached GDB
to each one. They all exit normally upon receiving `SIGTERM`:
```
Program received signal SIGTERM, Terminated.
0x00002b1517a436f3 in __epoll_wait_nocancel () from /lib64/libc.so.6
```
Yet it is the TCP main (distributor) process that reports the EOF in `receive_fd()`.
Because it isn't a crash per se, I have no core dump and no way to capture the program's
state at the moment it fails. All the processes appear to exit normally.
I have read some past issues that mention this error, but their root causes don't seem
to apply here (e.g. I don't use `dialog`). Moreover, the commits that addressed other
variants of this issue are already present in the latest 4.4.x.
Because of the high traffic volume, raising the debug verbosity level or trying other
fairly obvious ideas (e.g. running without forking) isn't practical at all.
Any suggestions welcome!
https://github.com/kamailio/kamailio/issues/2075