Thank you, I've already changed the value, but that didn't solve my issue with the blocking Kamailio.
I did some tests with an older Yocto image which contains Kamailio 5.5.4 and OpenSSL 1.1.1w. With this combination I was not able to reproduce the deadlock.
Am Fr., 21. März 2025 um 16:49 Uhr schrieb Harald Kapper hk@kapper.net:
Hi
fyi – in case you need this:
systemctl edit kamailio.service
add:
### Editing /etc/systemd/system/kamailio.service.d/override.conf
### Anything between here and the comment below will become the contents of the drop-in file
[Service]
LimitNOFILE=40960
### Edits below this comment will be discarded
(or whatever number you’d like)
saving saves the override for systemd into (like) /etc/systemd/system/kamailio.service.d/override.conf
Afterwards you have to restart the service (systemctl restart kamailio.service)
And you should be fine.
Regards, hk
*Von:* Henning Westerholt via sr-users sr-users@lists.kamailio.org *Gesendet:* Donnerstag, 20. März 2025 14:25 *An:* Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org *Cc:* Mathias Schneuwly mathias@schneuwlys.ch; Henning Westerholt < hw@gilawa.com> *Betreff:* [SR-Users] Re: send_fd(): sendmsg failed sending 55 on 52: Too many references: cannot splice (109)
Hello Mathias,
yes, you can set the open files e.g. in the systemd unit file for Kamailio.
Cheers,
Henning
*From:* Mathias Schneuwly via sr-users sr-users@lists.kamailio.org *Sent:* Donnerstag, 20. März 2025 13:43 *To:* sr-users@lists.kamailio.org *Cc:* Mathias Schneuwly mathias@schneuwlys.ch *Subject:* [SR-Users] Re: send_fd(): sendmsg failed sending 55 on 52: Too many references: cannot splice (109)
I'll also give a try with 6.0.1 later.
Meanwhile, I've tested 5.8.6 and 6.0.1 and both show the same issue.
Am Do., 20. März 2025 um 10:21 Uhr schrieb Mathias Schneuwly < mathias@schneuwlys.ch>:
Hi Henning
Maybe also check the actual limits by “cat /proc/$kamailio-pid/limits”
Not 100% if its related, but if you are looking to scale this would be for sure required.
The limit was
Max open files 1024 524288 files
and I changed it now to
Max open files 65536 65536 files
Are there some recommendations for the limit? Would it make sense to change that in the provided systemd file if you say that 1024 is not that much?
But anyway, it fails with 1024 and with 65536.
I've also updated to 5.8.5 and it shows the same issue. I'll also give a try with 6.0.1 later.
About the tls_thread_mode – you should use mode 2 if you are running
with openssl 3.x, otherwise you probably will observe random crashes.
We are running OpenSSL 3.2.3 and tls_thread_mode is set to 2. But it still blocks.
Best regards
Mathias
Am Do., 20. März 2025 um 08:42 Uhr schrieb Henning Westerholt < hw@gilawa.com>:
Hello Mathias,
1024 open files are not that much, you should increase it for the kamailio user. Maybe also check the actual limits by “cat /proc/$kamailio-pid/limits”
Not 100% if its related, but if you are looking to scale this would be for sure required.
About the tls_thread_mode – you should use mode 2 if you are running with openssl 3.x, otherwise you probably will observe random crashes.
If you are seeing the mentioned problem with 15 phones, this is clearly a problem and should be further investigated.
Cheers,
Henning
*From:* Mathias Schneuwly via sr-users sr-users@lists.kamailio.org *Sent:* Donnerstag, 20. März 2025 08:07 *To:* sr-users@lists.kamailio.org *Cc:* Mathias Schneuwly mathias@schneuwlys.ch *Subject:* [SR-Users] Re: send_fd(): sendmsg failed sending 55 on 52: Too many references: cannot splice (109)
Hi
Thanks for the reply. We are using the following limits and I guess they are the default values
real-time non-blocking time (microseconds, -R) unlimited core file size (blocks, -c) unlimited data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 15666 max locked memory (kbytes, -l) 8192 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 15666 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited
But to me, it looks more like a deadlock somewhere around SSL_accept. According to the stack trace generated with "kamctl trap", all 8 TCP receiver processes are waiting in "#1 0x00007fa14ca04830 in pthread_rwlock_wrlock () from /usr/lib/libc.so.6". I guess the message "Too many references: cannot splice (109)" is just the consequence as the main process is no longer able to dispatch jobs to the child workers.
Meanwhile, I'm able to reproduce the issue. It requires a bit of brute force, but it deadlocks reliably after a few seconds.
I'm flooding Kamailio with register messages using sipexer. I'm running the following command 3 times in parallel and always after a few seconds, Kamailio is completely blocked.
watch -n 0.1 "./sipexer -register -vl 3 -co -com -ex 3600 -fu "2" -au "2" -ap "test" -tk <path to key>.key -tc <path to cert>.cer -ti -sd -ru sip:<domain> -cu sip:2@<domain> tls:<fqdn>:5061;sleep 0.1;./sipexer -register -vl 3 -co -com -ex 0 -fu "2" -au "2" -ap "test" -tk <path to key>.key -tc <path to cert>.cer -ti -sd -ru sip:<domain> -cu sip:2@<domain> tls:<fqdn>:5061"
I played a bit with the parameters "modparam("tls", "init_mode", 1)" and "tls_threads_mode = 2", but it didn't seem to change anything.
Please have in mind that we usually do not stress Kamailio that much. We have seen the issue a couple of times with around 5 to 15 SIP phones registered. But in this usual setup, the problem does not occur reliably.
Best regards
Mathias
Am Di., 18. März 2025 um 21:53 Uhr schrieb Henning Westerholt < hw@gilawa.com>:
Hello,
check the ulimit settings for the kamailio user and general resource usage on that system. In development branch there is also a feature present to print the actual ulimits on startup.
Cheers,
Henning
*From:* Mathias Schneuwly via sr-users sr-users@lists.kamailio.org *Sent:* Montag, 17. März 2025 13:41 *To:* sr-users@lists.kamailio.org *Cc:* Mathias Schneuwly mathias@schneuwlys.ch *Subject:* [SR-Users] send_fd(): sendmsg failed sending 55 on 52: Too many references: cannot splice (109)
Hi
We are using kamailio 5.8.4.
Recently, we noticed, that after a few days, we see the following messages from Kamailio `2025 Mar 17 10:22:11 ttel CRITICAL: <core> [core/pass_fd.c:193]: send_fd(): sendmsg failed sending 55 on 52: Too many references: cannot splice (109)`.
When this happens, devices are no longer able to register. At the moment we have no idea what might be wrong.
Unfortunately, the internet does not reveal much information regarding this issue. In one issue related to Kamailio ( https://lists.kamailio.org/mailman3/hyperkitty/list/sr-users@lists.kamailio....), I found the command `kamctl trap` and when I execute that, I see one issue, which might be related, but I'm just guessing...
---start 1782236 ----------------------------------------------------- [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/libthread_db.so.1". 0x00007fa14c9fb1c6 in ?? () from /usr/lib/libc.so.6 $1 = 26 $2 = {pid = 1782236, unix_sock = 36, idx = 0, status = 1, rank = 17, desc = "tcp receiver (generic) child=0", '\000' <repeats 97 times>} #0 0x00007fa14c9fb1c6 in ?? () from /usr/lib/libc.so.6 No symbol table info available. #1 0x00007fa14ca04830 in pthread_rwlock_wrlock () from /usr/lib/libc.so.6 No symbol table info available. #2 0x00007fa14ac20b49 in CRYPTO_THREAD_write_lock () from /usr/lib/libcrypto.so.3 No symbol table info available. #3 0x00007fa14adbd5da in ?? () from /usr/lib/libcrypto.so.3 No symbol table info available. #4 0x00007fa14adbd9ca in ?? () from /usr/lib/libcrypto.so.3 No symbol table info available. #5 0x00007fa14adbc907 in ?? () from /usr/lib/libcrypto.so.3 No symbol table info available. #6 0x00007fa14adbd26c in ?? () from /usr/lib/libcrypto.so.3 No symbol table info available. #7 0x00007fa14adbd68e in ?? () from /usr/lib/libcrypto.so.3 No symbol table info available. #8 0x00007fa14abe75c6 in ?? () from /usr/lib/libcrypto.so.3 No symbol table info available. #9 0x00007fa14abe840b in EVP_RAND_generate () from /usr/lib/libcrypto.so.3 No symbol table info available. #10 0x00007fa14acf9282 in RAND_bytes_ex () from /usr/lib/libcrypto.so.3 No symbol table info available. #11 0x00007fa14b06cce8 in ?? () from /usr/lib/libssl.so.3 No symbol table info available. #12 0x00007fa14b05bdf7 in ?? () from /usr/lib/libssl.so.3 No symbol table info available. #13 0x00007fa14b0ef036 in tls_accept (c=c@entry=0x7fa1475c0dc0, error=error@entry=0x7ffebd9443b8) at tls_server.c:457 ret = <optimized out> ssl = warning: could not convert 'ssl_st' from the host encoding (ANSI_X3.4-1968) to UTF-32. This normally should not happen, please file a bug report. 0x7fa147668bd0 cert = <optimized out> tls_c = 0x7fa14768f490 tls_log = <optimized out> __func__ = "tls_accept" #14 0x00007fa14b0f7540 in tls_h_read_f (c=0x7fa1475c0dc0, flags=0x7ffebd9647d0) at tls_server.c:1153 r = 0x7fa1475c0ee8 bytes_free = 16383 bytes_read = <optimized out> read_size = <optimized out> ssl_error = 0 ssl_read = 0 ssl = 0x7fa147668bd0 rd_buf = "\026\003\003\004\016\v\000\004\n\000\004\a\000\004\0040\202\004\0000\202\002\350\240\003\002\001\002\002\002\026\0270\r\006\t*\206H\206\367\r\001\001\v\005\0000b1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003U\004\b\f\004Bern1\r0\v\006\003U\004\a\f\004Bern1\r0\v\006\003U\004\n\f\004RUAG1\0200\016\006\003U\004\v\f\aDefence1\0240\022\006\003U\004\003\f\versaimfs.ch0\036\027\r230302163519Z\027\r291230163519Z0`1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003"... wr_buf = "\026\003\003\000A\002\000\000=\003\003\371\204\326\361\262-9\300\344gV\2316\373\235D\314\225u8\v}\215\254DOWNGRD\001\000\3000\000\000\025\377\001\000\001\000\000\v\000\004\003\000\001\002\000#\000\000\000\027\000\000\026\003\003\a\267\v\000\a\263\000\a\260\000\004\0040\202\004\0000\202\002\350\240\003\002\001\002\002\002\r>0\r\006\t*\206H\206\367\r\001\001\v\005\0000b1\v0\t\006\003U\004\006\023\002CH1\r0\v\006\003U\004\b\f\004Bern1\r0\v\006\003U\004\a\f\004Bern1\r0\v\006\003U\004\n\f\004RUAG1\0200\016\006\003U\004\v\f\aDefence1\0240\022\006"... rd = {buf = 0x7ffebd944480 "\026\003\003\004\016\v", pos = 1405, used = 1405, size = 65536} wr = {buf = 0x7ffebd954480 "\026\003\003", pos = 0, used = 0, size = 65536} tls_c = 0x7fa14768f490 enc_rd_buf = <optimized out> n = 0 flush_flags = 0 err_src = <optimized out> ip_buf = '\000' <repeats 63 times> x = <optimized out> tls_dbg = <optimized out> __func__ = "tls_h_read_f" redo_read = <optimized out> continue_ssl_read = <optimized out> __llevel = <optimized out> __kld = <optimized out> __llevel = <optimized out> __kld = <optimized out> #15 0x000055d2519871b4 in tcp_read_headers (c=c@entry=0x7fa1475c0dc0, read_flags=read_flags@entry=0x7ffebd9647d0) at core/tcp_read.c:445 bytes = <optimized out> remaining = <optimized out> p = <optimized out> r = 0x7fa1475c0ee8 mc = <optimized out> body_len = <optimized out> tvnow = {tv_sec = 1742100857, tv_usec = 944923} tvdiff = <optimized out> mfline = <optimized out> mtransid = <optimized out> __func__ = "tcp_read_headers" #16 0x000055d251989cef in tcp_read_req (con=0x7fa1475c0dc0, bytes_read=bytes_read@entry=0x7ffebd9647c8, read_flags=read_flags@entry=0x7ffebd9647d0) at core/tcp_read.c:1509 bytes = <optimized out> total_bytes = 0 resp = 1 size = <optimized out> req = 0x7fa1475c0ee8 dst = {send_sock = 0x55d251a91020, to = {s = {sa_family = 52024, sa_data = "\257Q\322U\000\000\002\000\000\000\376\177\000"}, sin = {sin_family = 52024, sin_port = 20911, sin_addr = {s_addr = 21970}, sin_zero = "\002\000\000\000\376\177\000"}, sin6 = {sin6_family = 52024, sin6_port = 20911, sin6_flowinfo = 21970, sin6_addr = {__in6_u = {__u6_addr8 = "\002\000\000\000\376\177\000\000\030\000\000\000\000\000\000", __u6_addr16 = {2, 0, 32766, 0, 24, 0, 0, 0}, __u6_addr32 = {2, 32766, 24, 0}}}, sin6_scope_id = 0}, sas = {ss_family = 52024, __ss_padding = "\257Q\322U\000\000\002\000\000\000\376\177\000\000\030", '\000' <repeats 23 times>, "\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\262+\226Q\001 \000\000\200,5L\241\177\000\000 \020\251Q\322U\000\000 \020\251Q\322U\000\000\200\303\257Q\322U\000\00009SG\241\177\000\000\330G\226\275\376\177\000\000\b\000\000\000\000\000\000", __ss_align = 20}}, id = 1, send_flags = {f = 1, blst_imask = 0}, proto = 14 '\016', proto_pad0 = 0 '\000', proto_pad1 = 0} c = <optimized out> ret = <optimized out> again = <optimized out> __func__ = "tcp_read_req" #17 0x000055d25198ebec in handle_io (fm=fm@entry=0x7fa14c352c80, events=events@entry=1, idx=idx@entry=-1) at core/tcp_read.c:1906 ret = 8 n = 0 read_flags = RD_CONN_SHORT_READ con = 0x7fa1475c0dc0 s = 14 resp = <optimized out> t = <optimized out> ee = 0x0 __func__ = "handle_io" error = <optimized out> #18 0x000055d25199469b in io_wait_loop_epoll (repeat=repeat@entry=0, t=2, h=0x55d251c49d60 <io_w>) at core/io_wait.h:1075 n = 1 r = 0 fm = 0x7fa14c352c80 revents = 1 __func__ = "io_wait_loop_epoll" #19 0x000055d251994e87 in tcp_receive_loop (unix_sock=<optimized out>) at core/tcp_read.c:2026 __func__ = "tcp_receive_loop" #20 0x000055d25197e72c in tcp_init_children (woneinit=woneinit@entry=0x7ffebd964d1c) at core/tcp_main.c:5391 r = <optimized out> i = <optimized out> reader_fd_1 = 39 pid = <optimized out> si_desc = "tcp receiver (generic)\000\000`M\226\275\376\177\000\000\001\000\000\000\000\000\000\000\210\035\223Q\322U\000\000 \020\251Q\322U\000\000\220\315\257Q\322U\000\000\220\242/L\241\177\000\000\025\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000%\020\251Q\322U\000\000HU\257Q\322U\000\000e\005\000\000\000\000\000\000 \020\251Q\322U\000\000\3750\235Q\322U\000" si = <optimized out> __func__ = "tcp_init_children" error = <optimized out> #21 0x000055d2517c6c7e in main_loop () at main.c:1950 i = <optimized out> pid = <optimized out> si = <optimized out> si_desc = "udp receiver child=7 sock=10.0.90.1:5060\000\000\000\000\322U\000\000?\342\017K\241\177\000\000\022\000\000\000\377\377\377\377\200\006\265L\241\177\000\000\253m\237L\241\177\000\000\361|\003\000\000\000\000\000\t\000\000\000\000\000\000\000Sat Mar 15 05:25:32 2025\n\000\264L\241\177\000" nrprocs = <optimized out> woneinit = 1 __func__ = "main_loop" error = <optimized out> #22 0x000055d2517bc151 in main (argc=<optimized out>, argv=<optimized out>) at main.c:3256 cfg_stream = <optimized out> c = <optimized out> r = <optimized out> tmp = 0x7ffebd966e7f "" tmp_len = 32673 port = 5060 proto = 0 aproto = 0 ahost = 0x0 aport = 0 options = 0x55d251aa74b8 ":f:cm:M:dVIhEeb:B:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:" ret = -1 seed = 3209119183 rfd = <optimized out> debug_save = <optimized out> debug_flag = <optimized out> dont_fork_cnt = 0 n_lst = <optimized out> p = <optimized out> st = {st_dev = 24, st_ino = 46237, st_nlink = 2, st_mode = 16888, st_uid = 899, st_gid = 899, __pad0 = 0, st_rdev = 0, st_size = 40, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1742012732, tv_nsec = 217328715}, st_mtim = {tv_sec = 1742012732, tv_nsec = 217328715}, st_ctim = {tv_sec = 1742012732, tv_nsec = 217328715}, __glibc_reserved = {0, 0, 0}} l1 = <optimized out> tbuf = '\000' <repeats 32 times>, "\030\202\237\275\376\177\000\000@\002\000\000@\003\000\000\001", '\000' <repeats 23 times>, "@\003\000\000@\003\000\000@\003\000\000@\003\000\000@ \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@ \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@ \003\000\000@\003\000\000@\003\000\000@\003\000\000@\003\000\000@ \003\000\000@\003\000\000@\003", '\000' <repeats 11 times>, "\001", '\000' <repeats 118 times>... option_index = 12 long_options = {{name = 0x55d251a9230b "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x55d251a937a4 "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x55d251a9826a "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 0x55d251a92310 "subst", has_arg = 1, flag = 0x0, val = 1025}, {name = 0x55d251a92316 "substdef", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x55d251a9231f "substdefs", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x55d251a92329 "server-id", has_arg = 1, flag = 0x0, val = 1028}, {name = 0x55d251a92333 "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, {name = 0x55d251a9233e "modparam", has_arg = 1, flag = 0x0, val = 1030}, {name = 0x55d251a92347 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, {name = 0x55d251a938c1 "debug", has_arg = 1, flag = 0x0, val = 1032}, {name = 0x55d251a92352 "cfg-print", has_arg = 0, flag = 0x0, val = 1033}, {name = 0x55d251a9235c "atexit", has_arg = 1, flag = 0x0, val = 1034}, {name = 0x55d251a92363 "all-errors", has_arg = 0, flag = 0x0, val = 1035}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}} __func__ = "main" [Inferior 1 (process 1782236) detached] ---end 1782236 -------------------------------------------------------
Does anybody have an idea what this could be? I never saw that with previous versions, and now we can see it on several instances, but we are yet unable to reproduce.
Not sure if this might be related, but we also see many messages like `Mar 17 13:39:03 ttel /usr/sbin/kamailio[2882127]: CRITICAL: <core> [core/tcp_main.c:5558]: tcp_timer_check_connections(): message processing timeout on connection id: 402 (state: 0) - closing`
Best regards
Mathias