[sr-dev] [kamailio/kamailio] EOF in pass_fd.c:receive_fd() (#2075)

Alex Balashov notifications at github.com
Thu Sep 26 03:22:06 CEST 2019


Running `Kamailio 4.4.7:97f308` with heavy `app_perl` usage, `usrloc` (`db_mode` 3) and not much else. 

After an upgrade from 4.1, Kamailio periodically dies with a log message like this:

```
Sep 25 20:54:28 switch /sbin/kamailio[29771]: CRITICAL: <core> [pass_fd.c:277]: receive_fd(): EOF on 9
```

Also happens on 4.2.x.

It happens every 30-31 minutes, regularly enough to suggest that some periodic background operation elsewhere on this operator's system is causing it, but I haven't been able to find it.
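
To pin the period down more precisely, the gaps between occurrences can be computed straight from syslog. This is only a minimal sketch: it assumes the syslog timestamp format shown in the log line above, and the function name `eof_intervals` and the `year` parameter are made up for illustration.

```python
from datetime import datetime

def eof_intervals(lines, year=2019):
    """Return the gaps, in minutes, between successive receive_fd() EOF log lines."""
    stamps = []
    for line in lines:
        if "receive_fd(): EOF" not in line:
            continue
        # Syslog timestamps ("Sep 25 20:54:28") carry no year, so one is supplied.
        # Note: %b matches month names in the current locale (English assumed here).
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        stamps.append(stamp)
    # Pair each occurrence with the next one and convert the difference to minutes.
    return [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]
```

Feeding it the relevant syslog lines gives one number per consecutive pair of occurrences, which makes it easier to line the period up against cron jobs or other timers on the host.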

Anyway, the PID logging this message is that of the TCP main marshalling process, i.e.

```
# kamcmd ps | grep -i tcp
15716	tcp receiver (generic) child=0
15717	tcp receiver (generic) child=1
15718	tcp receiver (generic) child=2
15719	tcp receiver (generic) child=3
15720	tcp receiver (generic) child=4
15721	tcp receiver (generic) child=5
15722	tcp receiver (generic) child=6
15723	tcp receiver (generic) child=7
15724	tcp receiver (generic) child=8
15725	tcp receiver (generic) child=9
15726	tcp receiver (generic) child=10
15727	tcp receiver (generic) child=11
15728	tcp receiver (generic) child=12
15729	tcp receiver (generic) child=13
15730	tcp receiver (generic) child=14
15731	tcp receiver (generic) child=15
15732	tcp main process
```

In this case, that would be `15732`. 

I assume this is because one of the TCP receiver processes dies, but I haven't been able to find any evidence of that. This is a high-volume system, so I can't shrink the worker pool too much, but I tried reducing the number of child processes per listener from 16 to 3 and attaching GDB to each one. They all exit normally upon receipt of `SIGTERM`:

```
Program received signal SIGTERM, Terminated.
0x00002b1517a436f3 in __epoll_wait_nocancel () from /lib64/libc.so.6
```

Yet it is the TCP main process that logs the EOF in `receive_fd()`.
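
For context, a zero-byte read on a Unix socket is what an orderly close by the peer looks like, which is presumably the condition `receive_fd()` reports as EOF. The following is a minimal Python sketch of that condition only, not Kamailio's code; the function name and the variable names are hypothetical, and `close()` stands in for the other process exiting or closing its end.

```python
import socket

def recv_after_peer_close():
    """Return what recvmsg() yields on a Unix socket pair after the peer end closes."""
    main_end, child_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # Stand-in for the other process exiting or closing its end of the socket.
        child_end.close()
        # Ask for 4 data bytes plus room for one passed file descriptor.
        data, ancdata, _flags, _addr = main_end.recvmsg(4, socket.CMSG_LEN(4))
        return data, ancdata
    finally:
        main_end.close()
```

The call returns immediately with empty data and no ancillary data, so the reader sees EOF even though nothing on its own side failed.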

Because it's not a crash per se, I don't have a core dump or a way of grabbing the state of the program at the exact moment of failure. All the processes seem to exit normally.

I have read some past issues that mention this, but their ultimate causes don't seem to be relevant here (e.g. no `dialog` usage). Moreover, the commits made to address this issue in other forms are present in the latest 4.4.x.

Because of the high traffic volume, fairly obvious ideas such as running at a higher debug verbosity level or without forking aren't practical at all.

Any suggestions welcome!

https://github.com/kamailio/kamailio/issues/2075