Hello,
Do you know what could cause this issue? Do you need more information?
### Description
I had an issue with TLS connections on a Kamailio instance.
User agents were not able to register to Kamailio over TLS.
With the following setup: UDP => Kamailio => TLS, Kamailio was able to route from UDP to TLS but not from TLS to UDP.
### Troubleshooting
#### Log Messages
Here are some unusual log messages from the time of the issue:
```
ERROR: <core> [core/tcp_main.c:3504]: send_fd_queue_run(): send_fd failed on socket 27 , queue entry 0, retries 17, connection 0x7f913e1d7fd8, tcp socket 134, errno=11 (Resource temporarily unavailable)
ERROR: <core> [core/tcp_main.c:3504]: send_fd_queue_run(): send_fd failed on socket 23 , queue entry 1, retries 13, connection 0x7f913e1a0eb0, tcp socket 105, errno=11 (Resource temporarily unavailable)
ERROR: <core> [core/tcp_main.c:3504]: send_fd_queue_run(): send_fd failed on socket 25 , queue entry 2, retries 11, connection 0x7f913e1ada08, tcp socket 137, errno=11 (Resource temporarily unavailable)
ERROR: <core> [core/tcp_main.c:3504]: send_fd_queue_run(): send_fd failed on socket 27 , queue entry 3, retries 9, connection 0x7f913e1c2cf0, tcp socket 170, errno=11 (Resource temporarily unavailable)
ERROR: <core> [core/tcp_main.c:3504]: send_fd_queue_run(): send_fd failed on socket 23 , queue entry 0, retries 7, connection 0x7f913e1cb480, tcp socket 281, errno=11 (Resource temporarily unavailable)
ERROR: <core> [core/tcp_main.c:3504]: send_fd_queue_run(): send_fd failed on socket 25 , queue entry 1, retries 5, connection 0x7f913e1e4b30, tcp socket 398, errno=11 (Resource temporarily unavailable)
ERROR: <core> [core/tcp_main.c:3504]: send_fd_queue_run(): send_fd failed on socket 27 , queue entry 2, retries 3, connection 0x7f913e1e8ef8, tcp socket 406, errno=11 (Resource temporarily unavailable)
CRITICAL: <core> [core/tcp_main.c:4216]: send2child(): tcp child 1, socket 23: queue full, 304 requests queued (total handled 425)
CRITICAL: <core> [core/tcp_main.c:4216]: send2child(): tcp child 2, socket 25: queue full, 304 requests queued (total handled 4940)
CRITICAL: <core> [core/tcp_main.c:4216]: send2child(): tcp child 3, socket 27: queue full, 304 requests queued (total handled 310)
CRITICAL: <core> [core/tcp_main.c:4216]: send2child(): tcp child 1, socket 23: queue full, 305 requests queued (total handled 426)
CRITICAL: <core> [core/tcp_main.c:4216]: send2child(): tcp child 2, socket 25: queue full, 305 requests queued (total handled 4941)
CRITICAL: <core> [core/tcp_main.c:4216]: send2child(): tcp child 3, socket 27: queue full, 305 requests queued (total handled 311)
```

Just before the CRITICAL log, there is this comment in the sources: `/* FIXME: remove after debugging */`
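In case it helps when comparing workers, here is a minimal Python sketch that tallies the `queue full` lines per TCP child. In the excerpt above, child 2 has handled roughly ten times more requests than children 1 and 3, although all three queues are full. The function name `summarize_queue_full` and the `SAMPLE_LOG` excerpt are only illustrative; the regular expression is written against the log lines quoted above and may need adjusting for other Kamailio versions.

```python
import re

# Matches the send2child() "queue full" lines as they appear in the log above.
QUEUE_FULL = re.compile(
    r"send2child\(\): tcp child (\d+), socket (\d+): queue full, "
    r"(\d+) requests queued \(total handled (\d+)\)"
)

def summarize_queue_full(log_text):
    """Return per-child stats: socket, largest queue size seen,
    largest 'total handled' counter seen, and number of matching lines."""
    summary = {}
    for match in QUEUE_FULL.finditer(log_text):
        child, sock, queued, handled = (int(g) for g in match.groups())
        entry = summary.setdefault(
            child,
            {"socket": sock, "max_queued": 0, "total_handled": 0, "hits": 0},
        )
        entry["max_queued"] = max(entry["max_queued"], queued)
        entry["total_handled"] = max(entry["total_handled"], handled)
        entry["hits"] += 1
    return summary

# Excerpt copied from the log above (illustrative input only).
SAMPLE_LOG = """
CRITICAL: <core> [core/tcp_main.c:4216]: send2child(): tcp child 1, socket 23: queue full, 304 requests queued (total handled 425)
CRITICAL: <core> [core/tcp_main.c:4216]: send2child(): tcp child 2, socket 25: queue full, 304 requests queued (total handled 4940)
CRITICAL: <core> [core/tcp_main.c:4216]: send2child(): tcp child 3, socket 27: queue full, 304 requests queued (total handled 310)
CRITICAL: <core> [core/tcp_main.c:4216]: send2child(): tcp child 1, socket 23: queue full, 305 requests queued (total handled 426)
CRITICAL: <core> [core/tcp_main.c:4216]: send2child(): tcp child 2, socket 25: queue full, 305 requests queued (total handled 4941)
CRITICAL: <core> [core/tcp_main.c:4216]: send2child(): tcp child 3, socket 27: queue full, 305 requests queued (total handled 311)
"""

for child, stats in sorted(summarize_queue_full(SAMPLE_LOG).items()):
    print(child, stats)
```

Running it against a full log file (for example `summarize_queue_full(open(path).read())`, with `path` pointing at your own log) should show whether one worker is consistently ahead of or behind the others.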
### Additional Information
```
version: kamailio 5.3.5 (x86_64/linux) 9e70e8
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 9e70e8
compiled on 03:49:09 Jun 24 2020 with /usr/bin/gcc 5.4.0
```
Looks like the TCP workers were too busy. Can you get the output of `kamctl trap` when this happens?
Strange: at the time of the issue, Kamailio was using about 0% CPU (it never goes over a few %), and server load and network usage were very low. I'll do a `kamctl trap` next time.
Maybe they were blocked in some operation (e.g., an SQL query). A `kamctl trap` will show what each Kamailio process is doing at that moment.
Any update on this one?
The issue occurred a few times before I enabled kamctl trap. It has not occurred since... I'm still waiting. Feel free to close if you want; I can reopen when I have more information.
Another possibility is a deadlock. I've had a few in the cdp module, and I'm currently testing a fix for them. Maybe you can also get a gdb trace once the issue happens.
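For reference, grabbing a gdb trace from every process could be sketched roughly as below. This is not the actual `kamctl trap` implementation, only an approximation of the same idea. It assumes `gdb` and `pgrep` are installed, that the processes are named `kamailio`, and that you have enough privileges to attach. The function names and the output file name are hypothetical.

```python
import subprocess

def gdb_backtrace_cmd(pid):
    """Build a gdb command that attaches to pid, prints a full backtrace, and exits."""
    return ["gdb", "-batch", "-p", str(pid), "-ex", "bt full"]

def kamailio_pids(name="kamailio"):
    """List PIDs of processes named exactly `name` (assumes pgrep is available)."""
    result = subprocess.run(["pgrep", "-x", name], capture_output=True, text=True)
    return [int(pid) for pid in result.stdout.split()]

def dump_backtraces(out_path="kamailio-backtraces.txt"):
    """Write one backtrace per kamailio process to out_path (hypothetical file name)."""
    with open(out_path, "w") as out:
        for pid in kamailio_pids():
            out.write(f"---- pid {pid} ----\n")
            trace = subprocess.run(
                gdb_backtrace_cmd(pid), capture_output=True, text=True
            )
            out.write(trace.stdout)
```

Calling `dump_backtraces()` while the queues are full should show where each TCP worker is stuck. Attaching gdb pauses each process briefly, so it is best run only while the problem is actually occurring.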
Closed #2392.