schneuwlym created an issue (kamailio/kamailio#4185)
### Description
This issue is a follow up of the discussion in the [mailing list](https://lists.kamailio.org/mailman3/hyperkitty/list/sr-users@lists.kamailio....)
We are using Kamailio 5.8.4 in a yocto scarthgap based image. We noticed on multiple nodes, that suddenly phones lost their registrations and the log showed messages like `send_fd(): sendmsg failed sending 55 on 52: Too many references: cannot splice (109)`.
After debugging we guess that the `send_fd` messages are just a symptom, not the cause. Generating a backtrace using `kamctl trap` shows that all TCP receiver are waiting for a `pthread_rwlock_...` lock.
Our deployments, where we randomly see the issue, are very small; 10-30 registrations.
I've tested kamailio version 5.8.4, 5.8.5, 5.8.6 and 6.0.1 and I was able to reproduce the issue with all versions.
I've also tested version 5.5.4, but with an older yocto (dunfell) which used openssl 1.1.1w, **which did not block!**
Yocto Scarthgap uses OpenSSL 3.2.3 and we also updated to version 3.4.1, which didn't show any difference.
### Troubleshooting
#### Reproduction
Meanwhile we found a way to reproduce the issue within seconds. It requires a bit of brute force, but it locks kamailio reliable.
We are generating 3 parallel SIP REGISTER streams with sipexer to 3 different accounts. Following an example of one stream: ``` watch -n 0.1 "./sipexer -register -vl 3 -co -com -ex 3600 -fu "2" -au "2" -ap "test" -tk <path to key>.key -tc <path to certificate>.cer -ti -sd -ru sip:<domain> -cu sip:2@<domain> tls:<fqdn>:5061;sleep 0.1;./sipexer -register -vl 3 -co -com -ex 0 -fu "2" -au "2" -ap "test" -tk <path to key>.key -tc <path to cert>.cer -ti -sd -ru sip:<domain> -cu sip:2@<domain> tls:<fqdn>:5061" ```
Our dialplan is quite complex, but I was also able to reproduce the issue with the default [dialplan](https://raw.githubusercontent.com/kamailio/kamailio/refs/heads/5.8/etc/kamai...) with the following changes: ``` 169a170,171
#!define WITH_TLS
205a208,219
alias=udp:<domain>:5060 alias=udp:127.0.0.1:5060 alias=udp:10.0.6.1:5060 alias=tls:<domain> alias=tls:10.0.6.1:5061 alias=tls:10.165.42.76:5061 alias=tls:<domain>:5061 alias=udp:<domain> listen=udp:127.0.0.1:5060 listen=tls:10.0.6.1:5061 listen=udp:10.0.6.1:5060 listen=tls:10.165.42.76:5061
498c512 < modparam("tls", "config", "/usr/local/etc/kamailio/tls.cfg") ---
modparam("tls", "config", "/etc/kamailio/kamailio-tls.cfg")
```
``` # cat /etc/kamailio/kamailio-tls.cfg [server:default] method = TLSv1+ verify_certificate = yes require_certificate = yes private_key = "<path to key>.key" certificate = "<path to cert>.crt" ca_list = "<path to ca>.pem"
[client:default] method = TLSv1+ verify_certificate = yes require_certificate = yes private_key = "<path to key>.key" certificate = "<path to cert>.crt" ca_list = "<path to ca>.pem"
```
#### Debugging Data
Following the backtrace of one tcp listener process. The full backtrace can be found [here](https://github.com/user-attachments/files/19430313/gdb_kamailio_20250321_154...)
``` ---start 4608 ----------------------------------------------------- [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/libthread_db.so.1". __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7fc855758a0c, expected=expected@entry=3, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>, cancel=cancel@entry=false) at futex-internal.c:103
warning: 103 futex-internal.c: No such file or directory #0 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7fc855758a0c, expected=expected@entry=3, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>, cancel=cancel@entry=false) at futex-internal.c:103 err = -512 clockbit = 256 op = <optimized out> #1 0x00007fc85a59429c in __GI___futex_abstimed_wait64 (futex_word=futex_word@entry=0x7fc855758a0c, expected=expected@entry=3, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at futex-internal.c:128 No locals. #2 0x00007fc85a59d830 in __pthread_rwlock_wrlock_full64 (abstime=0x0, clockid=0, rwlock=0x7fc855758a00) at /usr/src/debug/glibc/2.39+git/nptl/pthread_rwlock_common.c:730 private = <optimized out> wf = <optimized out> err = <optimized out> prefer_writer = false may_share_futex_used_flag = true wpf = <optimized out> ready = <optimized out> r = <optimized out> may_share_futex_used_flag = <optimized out> r = <optimized out> done = <optimized out> wpf = <optimized out> ready = <optimized out> __value = <optimized out> prefer_writer = <optimized out> private = <optimized out> wf = <optimized out> err = <optimized out> w = <optimized out> w = <optimized out> private = <optimized out> err = <optimized out> w = <optimized out> wf = <optimized out> wf = <optimized out> __value = <optimized out> #3 ___pthread_rwlock_wrlock (rwlock=0x7fc855758a00) at pthread_rwlock_wrlock.c:26 result = <optimized out> #4 0x00007fc859820b49 in CRYPTO_THREAD_write_lock (lock=<optimized out>) at /usr/src/debug/openssl/3.2.3/crypto/threads_pthread.c:108 No locals. #5 0x00007fc8599bd5da in ossl_prov_drbg_generate (drbg=0x7fc855710040, out=0x7fc856471980 "", outlen=48, strength=256, prediction_resistance=0, adin=0x7ffeec8b67e0 "@", adinlen=8) at /usr/src/debug/openssl/3.2.3/providers/implementations/rands/drbg.c:645 fork_id = <optimized out> reseed_required = 0 ret = 0 __func__ = "ossl_prov_drbg_generate" err = <optimized out> now = <optimized out> #6 0x00007fc8599bd9ca in ossl_drbg_get_seed (vdrbg=<optimized out>, pout=0x7ffeec8b6880, entropy=<optimized out>, min_len=<optimized out>, max_len=<optimized out>, prediction_resistance=0, adin=0x7ffeec8b6838 "\220'\037V\310\177", adin_len=8) at /usr/src/debug/openssl/3.2.3/providers/implementations/rands/drbg.c:175 drbg = 0x7fc855710040 bytes_needed = 48 buffer = 0x7fc856471980 "" __func__ = "ossl_drbg_get_seed" #7 0x00007fc8599bc907 in get_entropy (drbg=<optimized out>, drbg@entry=0x7fc8561f2790, pout=pout@entry=0x7ffeec8b6880, entropy=<optimized out>, min_len=48, max_len=48, prediction_resistance=prediction_resistance@entry=0) at /usr/src/debug/openssl/3.2.3/providers/implementations/rands/drbg.c:245 bytes = <optimized out> p_str = 256 __func__ = "get_entropy" #8 0x00007fc8599bd26c in ossl_prov_drbg_reseed_unlocked (drbg=drbg@entry=0x7fc8561f2790, prediction_resistance=prediction_resistance@entry=0, ent=ent@entry=0x0, ent_len=ent_len@entry=0, adin=adin@entry=0x0, adinlen=adinlen@entry=0) at /usr/src/debug/openssl/3.2.3/providers/implementations/rands/drbg.c:573 entropy = 0x0 entropylen = 0 __func__ = "ossl_prov_drbg_reseed_unlocked" #9 0x00007fc8599bd68e in ossl_prov_drbg_generate (adinlen=0, adin=0x0, prediction_resistance=0, strength=<optimized out>, outlen=32, out=0x7fc8564717c8 "", drbg=0x7fc8561f2790) at /usr/src/debug/openssl/3.2.3/providers/implementations/rands/drbg.c:697 fork_id = <optimized out> reseed_required = <optimized out> ret = <optimized out> fork_id = <optimized out> reseed_required = <optimized out> ret = <optimized out> __func__ = "ossl_prov_drbg_generate" err = <optimized out> now = <optimized out> #10 ossl_prov_drbg_generate (drbg=0x7fc8561f2790, out=0x7fc8564717c8 "", outlen=32, strength=<optimized out>, prediction_resistance=0, adin=0x0, adinlen=0) at /usr/src/debug/openssl/3.2.3/providers/implementations/rands/drbg.c:634 fork_id = <optimized out> reseed_required = 0 ret = 0 __func__ = "ossl_prov_drbg_generate" err = <optimized out> now = <optimized out> #11 0x00007fc8597e75c6 in evp_rand_generate_locked (ctx=ctx@entry=0x7fc8561f26f0, out=out@entry=0x7fc8564717c8 "", outlen=outlen@entry=32, strength=strength@entry=0, prediction_resistance=<optimized out>, prediction_resistance@entry=0, addin=0x0, addin_len=0) at /usr/src/debug/openssl/3.2.3/crypto/evp/evp_rand.c:570 chunk = 32 max_request = 65536 params = {{key = 0x7fc8599e9521 "max_request", data_type = 2, data = 0x7ffeec8b6968, data_size = 8, return_size = 8}, {key = 0x0, data_type = 0, data = 0x0, data_size = 0, return_size = 0}} __func__ = "evp_rand_generate_locked" #12 0x00007fc8597e840b in EVP_RAND_generate (ctx=0x7fc8561f26f0, out=out@entry=0x7fc8564717c8 "", outlen=outlen@entry=32, strength=strength@entry=0, prediction_resistance=prediction_resistance@entry=0, addin=addin@entry=0x0, addin_len=0) at /usr/src/debug/openssl/3.2.3/crypto/evp/evp_rand.c:592 res = <optimized out> #13 0x00007fc8598f9282 in RAND_bytes_ex (ctx=warning: could not convert 'ossl_lib_ctx_st' from the host encoding (ANSI_X3.4-1968) to UTF-32. This normally should not happen, please file a bug report. 0x0, buf=0x7fc8564717c8 "", num=32, strength=0) at /usr/src/debug/openssl/3.2.3/crypto/rand/rand_lib.c:380 rand = <optimized out> meth = <optimized out> __func__ = "RAND_bytes_ex" #14 0x00007fc859bdb62c in def_generate_session_id (ssl=ssl@entry=0x7fc856436e50, id=id@entry=0x7fc8564717c8 "", id_len=id_len@entry=0x7ffeec8b6ae4) at /usr/src/debug/openssl/3.2.3/ssl/ssl_sess.c:326 retry = 0 #15 0x00007fc859bdc132 in ssl_generate_session_id (s=s@entry=0x7fc856436e50, ss=ss@entry=0x7fc856471570) at /usr/src/debug/openssl/3.2.3/ssl/ssl_sess.c:409 tmp = 32 cb = 0x7fc859bdb600 <def_generate_session_id> ssl = <optimized out> __func__ = "ssl_generate_session_id" #16 0x00007fc859bdc31e in ssl_get_new_session (s=s@entry=0x7fc856436e50, session=session@entry=1) at /usr/src/debug/openssl/3.2.3/ssl/ssl_sess.c:464 ss = 0x7fc856471570 __func__ = "ssl_get_new_session" #17 0x00007fc859c3f169 in tls_early_post_process_client_hello (s=0x7fc856436e50) at /usr/src/debug/openssl/3.2.3/ssl/statem/statem_srvr.c:1898 i = <optimized out> loop = <optimized out> sctx = 0x7fc855d24280 j = <optimized out> dgrd = DOWNGRADE_NONE protverr = <optimized out> c = <optimized out> clienthello = 0x7fc856470a90 al = 80 id = <optimized out> comp = 0x0 ciphers = 0x7fc856471330 scsvs = 0x7fc8564713c0 ssl = 0x7fc856436e50 j = <optimized out> i = <optimized out> al = <optimized out> protverr = <optimized out> loop = <optimized out> id = <optimized out> comp = <optimized out> c = <optimized out> ciphers = <optimized out> scsvs = <optimized out> clienthello = <optimized out> dgrd = <optimized out> sctx = <optimized out> ssl = <optimized out> __func__ = "tls_early_post_process_client_hello" err = <optimized out> cipher = <optimized out> trc_out = <optimized out> pos = <optimized out> pref_cipher = <optimized out> master_key_length = <optimized out> m = <optimized out> comp_id = <optimized out> k = <optimized out> m = <optimized out> nn = <optimized out> v = <optimized out> done = <optimized out> o = <optimized out> #18 tls_post_process_client_hello (s=0x7fc856436e50, wst=<optimized out>) at /usr/src/debug/openssl/3.2.3/ssl/statem/statem_srvr.c:2295 rv = <optimized out> cipher = <optimized out> ssl = 0x7fc856436e50 err = <optimized out> __func__ = "tls_post_process_client_hello" #19 0x00007fc859c2bc1b in read_state_machine (s=0x7fc856436e50) at /usr/src/debug/openssl/3.2.3/ssl/statem/statem.c:712 mt = 1 pkt = {curr = 0x7fc856466125 "", remaining = 0} max_message_size = 0x7fc859c3e3e0 <ossl_statem_server_max_message_size> ret = <optimized out> process_message = 0x7fc859c42400 <ossl_statem_server_process_message> post_process_message = 0x7fc859c414f0 <ossl_statem_server_post_process_message> st = 0x7fc856436ee0 len = 273 transition = 0x7fc859c3d330 <ossl_statem_server_read_transition> cb = 0x7fc859c91d90 <sr_ssl_ctx_info_callback> ssl = 0x7fc856436e50 st = <optimized out> ret = <optimized out> mt = <optimized out> len = <optimized out> transition = <optimized out> pkt = <optimized out> process_message = <optimized out> post_process_message = <optimized out> max_message_size = <optimized out> cb = <optimized out> ssl = <optimized out> __func__ = "read_state_machine" #20 state_machine (s=0x7fc856436e50, server=1) at /usr/src/debug/openssl/3.2.3/ssl/statem/statem.c:478 buf = 0x0 cb = 0x7fc859c91d90 <sr_ssl_ctx_info_callback> st = 0x7fc856436ee0 ret = -1 ssret = <optimized out> ssl = 0x7fc856436e50 __func__ = "state_machine" #21 0x00007fc859cbf036 in tls_accept (c=c@entry=0x7fc856461be0, error=error@entry=0x7ffeec8b6dd8) at tls_server.c:457 ret = <optimized out> ssl = 0x7fc856436e50 cert = <optimized out> tls_c = 0x7fc8564171b0 tls_log = <optimized out> __func__ = "tls_accept" ```
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
``` version: kamailio 5.8.4 (x86_64/linux) f1d83d flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, MEM_JOIN_FREE, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_SEND_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB poll method support: poll, epoll_lt, epoll_et, sigio_rt, select. id: f1d83d compiled on 10:25:58 Nov 12 2024 with x86_64-ruag-linux-gcc 13.3.0
```
* **Operating System**:
It is a yocto scarthgap based image (Version 5.0.6 LTS)
``` Linux ttel 6.6.54-yocto-standard #1 SMP PREEMPT_DYNAMIC Tue Oct 8 03:03:40 UTC 2024 x86_64 GNU/Linux ```