Hi kamailios,
I have a strange situation with Kamailio v5.2.1 (stable). After a day or so, Kamailio stops processing incoming SIP traffic over TCP. The incoming TCP packets are ACKed by the OS (Debian 9, kernel 4.18.0-15-generic), but Kamailio shows no sign of processing the SIP traffic arriving over TCP. No logs, nothing. Meanwhile, traffic over UDP works fine.
When I look at the output of "netstat -ntp", I see that the Recv-Q keeps growing, e.g.:
Proto Recv-Q Send-Q Local Address       Foreign Address        State       PID/Program name
tcp     4566      0 172.17.217.12:5060  xxx.xxx.xxx.xxx:57252  ESTABLISHED 31347/kamailio
After a Kamailio restart, everything works fine again for about a day. We have maybe 10-20 devices online via TCP and a low call volume (1-2 calls per minute). The only TCP setting we have is "tcp_delayed_ack=no".
How could we debug this situation? Again, there are no errors and no warnings in the log. Just nothing.
Kristijan
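A growing Recv-Q like the one reported above can be spotted automatically. The following is only a minimal sketch, assuming the column layout of "netstat -ntp" shown in the message; the function name `stuck_sockets` and the second sample row are made up for illustration.

```python
def stuck_sockets(netstat_output, program="kamailio", min_recvq=1):
    """Return TCP sockets owned by `program` whose Recv-Q is >= min_recvq.

    Each result is (recv_q, local, remote, state, owner), largest queue first.
    """
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: tcp 4566 0 local remote STATE pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        recv_q = int(fields[1])
        owner = fields[6]
        if recv_q >= min_recvq and ("/" + program) in owner:
            hits.append((recv_q, fields[3], fields[4], fields[5], owner))
    return sorted(hits, reverse=True)


# First data row is taken from the report; the second row is hypothetical.
sample = """\
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 4566 0 172.17.217.12:5060 xxx.xxx.xxx.xxx:57252 ESTABLISHED 31347/kamailio
tcp 0 0 172.17.217.12:5060 xxx.xxx.xxx.xxx:40000 ESTABLISHED 31347/kamailio
"""

for row in stuck_sockets(sample):
    print(row)
```

Run periodically against real "netstat -ntp" output, a check like this would show when the queue starts to grow, which may help narrow down the time window to look at in the logs.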
Hi,
I experienced something similar on Debian Stretch, whereas on Debian Jessie it worked fine. We use TLS and I suspected it had something to do with the SSL libraries, but I never had the chance to find out. But maybe my problem has nothing to do with what you just described.
Jurijs
Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
When I attach strace to the Kamailio process that is bound to the TCP port, I sporadically get this:
[], 46, 5000) = 0
epoll_wait(17, [{EPOLLIN, {u32=2692971064, u64=139924137540152}}], 46, 5000) = 1
accept(14, {sa_family=AF_INET, sin_port=htons(59766), sin_addr=inet_addr("xxx.xx.xxx.xxx")}, [28->16]) = 275
fcntl(275, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(275, F_SETFL, O_RDWR|O_NONBLOCK) = 0
epoll_ctl(17, EPOLL_CTL_ADD, 275, {EPOLLIN|EPOLLRDHUP, {u32=2692977328, u64=139924137546416}}) = 0
epoll_wait(17, [{EPOLLIN, {u32=2692977328, u64=139924137546416}}], 47, 5000) = 1
epoll_ctl(17, EPOLL_CTL_DEL, 275, 0x7ffdae44ee4c) = 0
recvmsg(53, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(56, 0x7ffdae44ed90, 16, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
sendmsg(56, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\210ku\230B\177\0\0", iov_len=8}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[275]}], msg_controllen=20, msg_flags=0}, 0) = 8
epoll_wait(17,
But that's all; there is no further processing by Kamailio.
Hi,
Just to add that in my case, after some time with a lot of TLS clients (100k+), I ended up with many TCP connections in the CLOSE_WAIT state. Once the number of connections in CLOSE_WAIT exceeded about 1k, Kamailio stopped receiving traffic via TLS, while UDP kept working fine at the same time. From my point of view, it looked like there was an issue somewhere on the Linux side, because Kamailio never got anything... At least this is what I remember... I still plan to work on it someday. :) And if I find out, I'll let you know.
Jurijs
Hello,
I believe this issue is related: https://github.com/kamailio/kamailio/issues/1172. We encountered the problem before, and the solution is to link kamailio-tls-modules against libssl1.0.X instead of libssl1.1.
I think you need to increase LimitNOFILE in the systemd unit file (see https://www.freedesktop.org/software/systemd/man/systemd.exec.html):
[Service]
LimitNOFILE=999999
.....
Sergey
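Before raising LimitNOFILE, it may be worth checking which limit the running process actually has. On Linux this is exposed in /proc/<pid>/limits. The sketch below parses the "Max open files" row of that file; the function name `open_files_limit` and the sample values are hypothetical, and whether the descriptor limit plays any role in the reported problem is an open question at this point in the discussion.

```python
def open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are ints, or the string 'unlimited' when the kernel reports that.
    Returns None if the row is not present.
    """
    def convert(value):
        return int(value) if value.isdigit() else value

    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return convert(soft), convert(hard)
    return None


# Hypothetical excerpt in the format used by /proc/<pid>/limits.
sample_limits = """\
Limit                     Soft Limit           Hard Limit           Units
Max processes             63445                63445                processes
Max open files            1024                 4096                 files
"""

print(open_files_limit(sample_limits))

# On a live system one would read the real file instead, for example:
#   with open("/proc/31347/limits") as fh:   # PID taken from netstat output
#       print(open_files_limit(fh.read()))
```

Comparing the soft limit with the number of entries in /proc/<pid>/fd would indicate whether the process is anywhere near exhausting its descriptors.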
First of all, thanks for the feedback. I have now prepared our system to run with debug=3. I hope to see more then.
Hi, with full debug I see this in the log for every incoming TCP SIP request:
Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp receiver, connection passed to the least busy one (105)
Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker 2 27(17937) for activity on [tls:172.17.217.10:5061], 0x7fdaeda8f928
So the Kamailio TCP main process is working and receives TCP traffic, but the TCP workers are somehow busy.
When I attach strace to the TCP worker, I do not see any activity. Just:
futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL
and nothing else, even when I see the TCP main process select this worker process.
Kristijan
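The "no free tcp receiver" message quoted above can be tracked over time to see when the workers stop becoming free. This is a minimal sketch that assumes the log wording shown in the message; the helper name `no_free_receiver_values` is made up, and the number in parentheses is simply extracted as reported, without interpreting it further.

```python
import re

# Matches the debug message emitted by send2child() as quoted in the log excerpt.
NO_FREE = re.compile(
    r"no free tcp receiver, connection passed to the least busy one \((\d+)\)"
)


def no_free_receiver_values(log_text):
    """Return the parenthesised number from every 'no free tcp receiver' line."""
    return [int(m.group(1)) for m in NO_FREE.finditer(log_text)]


sample_log = (
    "Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: <core> "
    "[core/tcp_main.c:3871]: send2child(): WARNING: no free tcp receiver, "
    "connection passed to the least busy one (105)\n"
    "Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: <core> "
    "[core/tcp_main.c:3875]: send2child(): selected tcp worker 2 27(17937) for "
    "activity on [tls:172.17.217.10:5061], 0x7fdaeda8f928\n"
)

values = no_free_receiver_values(sample_log)
print(len(values), "occurrence(s), values:", values)
```

If the reported value only ever increases, that would be consistent with workers that never return to the idle state, as opposed to workers that are merely slow.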
When I attach gdb to one of the TCP workers, I see this:
(gdb) bt
#0 0x00007fdaf4d14470 in futex_wait (private=<optimized out>, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1 futex_wait_simple (private=<optimized out>, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135
#2 __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at pthread_rwlock_wrlock.c:67
#3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#4 0x00007fdaf08e1c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#5 0x00007fdaf08a6f69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#6 0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#7 0x00007fdaf0c31144 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#8 0x00007fdaf0c2bddb in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#9 0x00007fdaf0c22858 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#10 0x00007fdaf0c1af61 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98, error=0x7ffffe2a2df0) at tls_server.c:422
#12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98, flags=0x7ffffe2c318c) at tls_server.c:1116
#13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98, read_flags=0x7ffffe2c318c) at core/tcp_read.c:469
#14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98, bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at core/tcp_read.c:1496
#15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1, idx=-1) at core/tcp_read.c:1862
#16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0 <io_w>, t=2, repeat=0) at core/io_wait.h:1065
#17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at core/tcp_read.c:1974
#18 0x0000556ead4c8e24 in tcp_init_children () at core/tcp_main.c:4853
#19 0x0000556ead3c352a in main_loop () at main.c:1735
#20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at main.c:2675
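The backtrace above shows the worker sleeping in futex_wait beneath CRYPTO_THREAD_write_lock, which was reached from SSL_do_handshake. When several workers have to be inspected, the same pattern can be checked mechanically. This is a rough sketch that assumes the gdb "bt" text format shown above; the helper names are invented for illustration, and the sample is an abbreviated excerpt of the reported trace.

```python
import re

# A frame starts with "#N", optionally followed by "0x... in ", then the function name.
FRAME = re.compile(r"#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?([^\s(]+) \(")


def frame_functions(backtrace):
    """Return function names in frame order (innermost first)."""
    return [name for _, name in FRAME.findall(backtrace)]


def waiting_on_openssl_lock(backtrace):
    """True if the innermost frame is a futex wait reached through
    CRYPTO_THREAD_write_lock during SSL_do_handshake."""
    funcs = frame_functions(backtrace)
    if not funcs or not funcs[0].startswith("futex_wait"):
        return False
    if "CRYPTO_THREAD_write_lock" not in funcs or "SSL_do_handshake" not in funcs:
        return False
    # The lock call must be nested inside (closer to frame 0 than) the handshake.
    return funcs.index("CRYPTO_THREAD_write_lock") < funcs.index("SSL_do_handshake")


# Abbreviated excerpt of the backtrace from the message.
sample_bt = """\
#0 0x00007fdaf4d14470 in futex_wait (private=<optimized out>, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1 futex_wait_simple (private=<optimized out>, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135
#2 __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at pthread_rwlock_wrlock.c:67
#3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#4 0x00007fdaf08e1c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#10 0x00007fdaf0c1af61 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98, error=0x7ffffe2a2df0) at tls_server.c:422
"""

print(frame_functions(sample_bt))
print(waiting_on_openssl_lock(sample_bt))
```

A worker that matches this pattern on repeated samples taken some time apart is probably blocked rather than busy, which would fit the absence of any syscall activity under strace.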
Can you get the file written by `kamctl trap`? It should contain the backtrace for all Kamailio processes. You need the latest Kamailio 5.2.
Also, get the output of: kamctl ps
Cheers, Daniel
Hi Daniel,
For testing, I have now set "tcp_children=1", and so far the issue has not occurred since. So I have nothing to provide from "kamctl trap" yet.
"kamctl ps" shows these two processes handling TCP:
...
}, {
  "IDX": 25,
  "PID": 71929,
  "DSC": "tcp receiver (generic) child=0"
}, {
  "IDX": 26,
  "PID": 71933,
  "DSC": "tcp main process"
}
...
OK, but then I was surprised to see a TCP connection on a UDP receiver child:
netstat -ntp |grep 5061
...
tcp 0 0 172.17.217.10:5061 195.70.114.125:18252 ESTABLISHED 71895/kamailio
...
And PID 71895 is:
}, {
  "IDX": 3,
  "PID": 71895,
  "DSC": "udp receiver child=2 sock=127.0.0.1:5060"
}, {
And if I look into it via "lsof -p 71895" (the UDP receiver child):
...
kamailio 71895 kamailio 14u sock 0,9 0t0 8856085 protocol: TCP
kamailio 71895 kamailio 15u sock 0,9 0t0 8886886 protocol: TCP
kamailio 71895 kamailio 16u sock 0,9 0t0 8854886 protocol: TCP
kamailio 71895 kamailio 17u sock 0,9 0t0 8828915 protocol: TCP
kamailio 71895 kamailio 18u unix 0x000000005f73cb91 0t0 1680314 type=DGRAM
kamailio 71895 kamailio 19u IPv4 1846523 0t0 TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
kamailio 71895 kamailio 20u sock 0,9 0t0 8887192 protocol: TCP
kamailio 71895 kamailio 21u sock 0,9 0t0 8813634 protocol: TCP
kamailio 71895 kamailio 22u unix 0x00000000c19bd102 0t0 1681407 type=STREAM
kamailio 71895 kamailio 23u sock 0,9 0t0 8850488 protocol: TCP
...
Not only the ESTABLISHED TCP session, but also these empty "protocol: TCP" sockets. What are they doing there in the UDP receiver? Is that how it's supposed to be?
Kristijan
Hello,
setting tcp_children=1 is not a good option for scalability: in practice you set Kamailio to process a single TCP message at a time, and on high traffic that won't work well.
Maybe try setting tcp_children to 2 or 4; that should make a possible race appear faster.
Regarding the PID: if it is an outgoing connection, it can be created by any worker process, including a UDP worker, if that was the one that received the SIP message over UDP and sent it out via TCP.
Cheers, Daniel
On 18.03.19 10:09, Kristijan Vrban wrote:
Hi Daniel,
for testing, I have now set "tcp_children=1", and so far the issue has not occurred since. So there is no output from "kamctl trap" to provide yet.
"kamctl ps" shows these two processes handling TCP:
...
}, {
"IDX": 25,
"PID": 71929,
"DSC": "tcp receiver (generic) child=0"
}, {
"IDX": 26,
"PID": 71933,
"DSC": "tcp main process"
}
...
OK, but then I was surprised to see a TCP connection on a UDP receiver child:
netstat -ntp |grep 5061
...
tcp 0 0 172.17.217.10:5061 195.70.114.125:18252 ESTABLISHED 71895/kamailio
...
And PID 71895 is:
}, {
"IDX": 3,
"PID": 71895,
"DSC": "udp receiver child=2 sock=127.0.0.1:5060"
}, {
And if I look into it via "lsof -p 71895" (the UDP receiver child):
...
kamailio 71895 kamailio 14u sock 0,9 0t0 8856085 protocol: TCP
kamailio 71895 kamailio 15u sock 0,9 0t0 8886886 protocol: TCP
kamailio 71895 kamailio 16u sock 0,9 0t0 8854886 protocol: TCP
kamailio 71895 kamailio 17u sock 0,9 0t0 8828915 protocol: TCP
kamailio 71895 kamailio 18u unix 0x000000005f73cb91 0t0 1680314 type=DGRAM
kamailio 71895 kamailio 19u IPv4 1846523 0t0 TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
kamailio 71895 kamailio 20u sock 0,9 0t0 8887192 protocol: TCP
kamailio 71895 kamailio 21u sock 0,9 0t0 8813634 protocol: TCP
kamailio 71895 kamailio 22u unix 0x00000000c19bd102 0t0 1681407 type=STREAM
kamailio 71895 kamailio 23u sock 0,9 0t0 8850488 protocol: TCP
...
Not only the ESTABLISHED TCP session, but also these empty "protocol: TCP" sockets. What are they doing there in the UDP receiver? Is that how it's supposed to be?
Kristijan
On Thu, Mar 14, 2019 at 14:48, Daniel-Constantin Mierla miconda@gmail.com wrote:
Can you get the file written by `kamctl trap`? It should have the backtrace for all Kamailio processes. You need the latest Kamailio 5.2.
Also, get the output of: kamctl ps
Cheers, Daniel
So I had the situation again. But this time, incoming UDP was affected. Kamailio was sending out OPTIONS (via the dispatcher module) to a group of Asterisk machines, but the 200 OK replies to the OPTIONS were not processed, so the dispatcher module set all Asterisk machines to inactive, even though they replied 200 OK.
Attached is the output of kamctl trap taken during the situation. I hope there is something useful in it, because after "kamctl trap" it was working again without a Kamailio restart.
Best Kristijan
Hello,
based on the trap output I think I could figure out what happened there.
You have tcp_children set to a very low value (1 or so). That is not the actual problem, though. The problem is that the connection to the upstream (the device/app sending the request) was closed after the request was received, and routing of the reply gets stuck as follows:
- a reply is received and has to be forwarded
- the connection was lost, so Kamailio tries to establish a new one, but it takes time until that fails because the upstream is behind NAT or similar, based on the Via header:
Via: SIP/2.0/TLS 10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
- the reply is retransmitted and gets to another worker, which tries to forward it again, but discovers that a connection structure for that destination already exists (created by the previous reply worker) and now waits for the connection to be released (or better said, for the mutex on the write buffer to be unlocked)
- as the second reply waits, there can be other retransmissions of the reply ending up in other workers, which also get stuck waiting for the mutex of the connection write buffer
The solution here is to use set_reply_no_connect(); you can put it first in the request_route block. I think this would be a good addition to the default configuration file as well. IMO, the SIP server should not connect for sending replies, and the same should apply to requests that go behind NAT.
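As a rough illustration of the suggestion above, here is a minimal kamailio.cfg sketch. It is not the poster's actual configuration: the tcp_children value of 4 is only one of the example values mentioned earlier in the thread, and the placeholder comment stands in for whatever routing logic the real file contains.

```
# kamailio.cfg fragment (sketch only, not a complete configuration)

# Number of TCP worker processes; 4 is an example value, not a recommendation.
tcp_children=4

request_route {
    # For replies to this request: do not open a new TCP/TLS connection
    # if the connection the request arrived on is gone. The reply is then
    # not sent, instead of a worker waiting on a connect attempt.
    set_reply_no_connect();

    # ... existing routing logic continues here (assumed, unchanged) ...
}
```

Placing the call at the top of request_route means it applies to every request before any other routing decision is made.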
Cheers, Daniel
Hi List,
I want to share that I also ran into this issue last week with my Kamailio 5.2.2.
As far as I was able to see, SIP applications were able to "connect()" over TCP, but my logs were not reporting any of the SIP messages received over TCP.
I have pike right before an xlog that shows every incoming request. However, I suspect the issue was not related to the pike module: the log did not show an unusual amount of blocked traffic.
I'm almost sure I did not reach any ulimit restrictions. I have many TCP and UDP children. The server was not under high load. Nothing unusual.
I'm running the default build for Debian Stretch from here: http://deb.kamailio.org/kamailio52 stretch
Unfortunately, I was under some pressure to restart the service, so I was not able to dig deeper into the issue.
If I'm correct, I will certainly improve things a lot by using "set_reply_no_connect()". I have added it and restarted! (Thanks, Daniel, for this tip!)
I have been looking at the issue reported here: "Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting." https://github.com/kamailio/kamailio/issues/1172
I have to say that I do have libssl1.1, and I do get a crash when I restart my Kamailio (even when I simply restart after a configuration modification):
Mar 21 18:28:50 sip kamailio[19222]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 21 18:28:50 sip kamailio[19175]: ERROR: ctl [ctl.c:390]: mod_destroy(): ERROR: ctl: could not delete unix socket /var/run/kamailio/kamailio_ctl: Permission denied (13)
[... one minute with nothing ...]
Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs [jsonrpcs_fifo.c:599]: jsonrpc_fifo_destroy(): FIFO stat failed: Permission denied
Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs [jsonrpcs_sock.c:516]: jsonrpc_dgram_destroy(): socket stat failed: Permission denied
Mar 21 18:29:50 sip kamailio[19175]: CRITICAL: <core> [main.c:662]: sig_alarm_abort(): shutdown timeout triggered, dying...
As issue 1172 is closed, should I expect Kamailio to still have trouble with libssl1.1?
I just restarted my service again (to see whether it restarts better after only 30 minutes instead of a week):
Mar 21 19:07:30 sip kamailio[28737]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 21 19:07:31 sip kamailio[28671]: ERROR: ctl [ctl.c:390]: mod_destroy(): ERROR: ctl: could not delete unix socket /var/run/kamailio/kamailio_ctl: Permission denied (13)
[... one minute with nothing ...]
Mar 21 19:08:30 sip kamailio[28671]: CRITICAL: <core> [main.c:662]: sig_alarm_abort(): shutdown timeout triggered, dying...
Still not able to restart in a clean way! Thanks! Regards, Aymeric
Le mer. 20 mars 2019 à 15:08, Daniel-Constantin Mierla miconda@gmail.com a écrit :
Hello,
based on the trap output I think I could figure out what happened there.
You have tcp_children to very low value (1 or so), the problem is not actually that one, but the fact that the connection to upstream (the device/app sending the request) was closed after receiving the request and routing of the reply gets stuck in the way of:
- a reply is received and has to be forwarded
- connection was lost, so Kamailio tries to establish a new one, but
takes time till fails because the upstream is behind nat or so based on the via header:
Via: SIP/2.0/TLS 10.1.0.4:10002 ;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
- the reply is retransmitted and gets to another worker, which tries
to forward it again, but discovers a connection structure for that destination exists (created by previous reply worker) and now waits for the connection to be released (or better said, for the mutex on writing buffer to be unlocked)
- as the second reply waits, there can be other retransmissions of the
reply ending up in other workers stuck on waiting for the mutex of the connection write buffer
The solution here is to use set_reply_no_connect() -- you can put it first in request_route block. I think this would be a good addition to the default configuration file as well, IMO, the sip server should not connect for sending replies and should do it also for requests that go behind nat.
Cheers, Daniel
On 19.03.19 10:53, Kristijan Vrban wrote:
So i had again the situation. But this time, incoming udp was affected. Kamailio was sending out OPTIONS (via dispatcher module) to a group of asterisk machines but the 200 OK reply to the OPTIONS where not processed, so the dispatcher module set all asterisk to inactive, even though they replied 200 OK
Attached the output of kamctl trap during the situation. Hope there is any useful in it. Because after "kamctl trap" it was working again without kamailio restart.
Best Kristijan
Am Mo., 18. März 2019 um 12:27 Uhr schrieb Daniel-Constantin Mierla miconda@gmail.com:
Hello,
setting tcp_children=1 is not a god option for scallability, practically you set kamailio to process a single tcp message at one time, on high traffic, that won't work well.
Maybe try to set tcp_children to 2 or 4, that should make an eventual race appear faster.
Regarding the pid, if it is an outgoing connection, then it can be created by any worker process, including a UDP worker, if that was the one receiving the sip message over udp and sends it out via tcp.
Cheers, Daniel
On 18.03.19 10:09, Kristijan Vrban wrote:
Hi Daniel,
for testing, i now had set: "tcp_children=1" and so far this issue did
not occur
ever since. So now value to provide for "kamctl trap" yet.
"kamctl ps" show this two process to handle tcp:
... }, { "IDX": 25, "PID": 71929, "DSC": "tcp receiver (generic) child=0" }, { "IDX": 26, "PID": 71933, "DSC": "tcp main process" } ...
Ok, but then is was wondering to see a TCP connection on a udp
receiver child:
netstat -ntp |grep 5061
... tcp 0 0 172.17.217.10:5061 195.70.114.125:18252 ESTABLISHED 71895/kamailio ...
An pid 71895 is:
}, { "IDX": 3, "PID": 71895, "DSC": "udp receiver child=2 sock=127.0.0.1:5060" }, {
And if i look into it via "lsof -p 71895" (the udp receiver child)
... kamailio 71895 kamailio 14u sock 0,9 0t0 8856085 protocol: TCP kamailio 71895 kamailio 15u sock 0,9 0t0 8886886 protocol: TCP kamailio 71895 kamailio 16u sock 0,9 0t0 8854886 protocol: TCP kamailio 71895 kamailio 17u sock 0,9 0t0 8828915 protocol: TCP kamailio 71895 kamailio 18u unix 0x000000005f73cb91 0t0 1680314 type=DGRAM kamailio 71895 kamailio 19u IPv4 1846523 0t0 TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED) kamailio 71895 kamailio 20u sock 0,9 0t0 8887192 protocol: TCP kamailio 71895 kamailio 21u sock 0,9 0t0 8813634 protocol: TCP kamailio 71895 kamailio 22u unix 0x00000000c19bd102 0t0 1681407 type=STREAM kamailio 71895 kamailio 23u sock 0,9 0t0 8850488 protocol: TCP ...
Not only the ESTABLISHED TCP session. But also this empty sockets "protocol: TCP" What are they doing there in the udp receiver? Is that how it's
supposed to be?
Kristijan
Am Do., 14. März 2019 um 14:48 Uhr schrieb Daniel-Constantin Mierla miconda@gmail.com:
Can you get file written by `kamctl trap`? It should have the
backtrace
for all kamailio processes. You need latest kamailio 5.2.
Also, get the output for: kamctl ps
Cheers, Daniel
On 14.03.19 13:52, Kristijan Vrban wrote:
When I attach via gdb to one of the TCP workers, I see this:
(gdb) bt
#0 0x00007fdaf4d14470 in futex_wait (private=<optimized out>, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1 futex_wait_simple (private=<optimized out>, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135
#2 __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at pthread_rwlock_wrlock.c:67
#3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#4 0x00007fdaf08e1c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#5 0x00007fdaf08a6f69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#6 0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#7 0x00007fdaf0c31144 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#8 0x00007fdaf0c2bddb in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#9 0x00007fdaf0c22858 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#10 0x00007fdaf0c1af61 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98, error=0x7ffffe2a2df0) at tls_server.c:422
#12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98, flags=0x7ffffe2c318c) at tls_server.c:1116
#13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98, read_flags=0x7ffffe2c318c) at core/tcp_read.c:469
#14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98, bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at core/tcp_read.c:1496
#15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1, idx=-1) at core/tcp_read.c:1862
#16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0 <io_w>, t=2, repeat=0) at core/io_wait.h:1065
#17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at core/tcp_read.c:1974
#18 0x0000556ead4c8e24 in tcp_init_children () at core/tcp_main.c:4853
#19 0x0000556ead3c352a in main_loop () at main.c:1735
#20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at main.c:2675
On Thu, 14 Mar 2019 at 13:41, Kristijan Vrban vrban.lkml@gmail.com wrote:
Hi, with full debug I see this in the log for every incoming TCP SIP request:
Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp receiver, connection passed to the least busy one (105)
Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker 2 27(17937) for activity on [tls:172.17.217.10:5061], 0x7fdaeda8f928
So the Kamailio TCP main process is working and has received the TCP traffic, but the TCP workers are somehow busy.
When I attach via strace to the TCP worker, I do not see any activity. Just:
futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL
and nothing more, even when I see the main TCP process choose this worker process.
Kristijan
On Wed, 27 Feb 2019 at 15:14, Kristijan Vrban vrban.lkml@gmail.com wrote:
First of all, thanks for the feedback. I have now prepared our system to run with debug=3. I hope to see more then.
On Wed, 27 Feb 2019 at 11:53, Kristijan Vrban vrban.lkml@gmail.com wrote:
Hi kamailios,
i have a creepy situation with v5.2.1 stable Kamilio. After a day or so, Kamailio stop to process incoming SIP traffic via TCP. The incoming TCP network packages get TCP-ACK from the OS (Debian 9, 4.18.0-15-generic-Linux) but Kamailio does not show any processing for the SIP-Traffic incoming via TCP. No logs, nothing. While traffic via UDP is working just totally fine.
When i look via command "netstat -ntp" is see, that the Recv-Q get bigger and bigger. e.g.:
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 4566 0 172.17.217.12:5060 xxx.xxx.xxx.xxx:57252 ESTABLISHED 31347/kamailio
After Kamailio restart, all is working fine again for a day. We have maybe 10-20 devices online via TCP and low call volume (1-2 call per minute). The only settings for tcp we have is "tcp_delayed_ack=no"
How to could we debug this situation? Again, no error, no warings in the log. Just nothing.
Kristijan
_______________________________________________
Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
-- Daniel-Constantin Mierla -- www.asipto.com www.twitter.com/miconda -- www.linkedin.com/in/miconda Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA
-- www.asipto.com
Are you facing this issue with pure TCP traffic, or are there actually TLS connections as well?
What are the values of the core parameters related to the TCP connect and TCP send timeouts?
As for the restart taking long, see the exit_timeout parameter:
* https://www.kamailio.org/wiki/cookbooks/5.2.x/core#exit_timeout
As for TLS with libssl1.1/libcrypto1.1, I think I have discovered what the issue is. Version 1.1 uses its own internal locking functions and does not expose any API to set them from outside. Before, Kamailio initialized the library telling it to use Kamailio locks, giving one lock per connection. As far as I could tell from some gdb traces I received, with libssl 1.1 the same internal lock is used when attempting to connect to different addresses as well as when trying to write to different connections. If one operation is slow for whatever reason, the others wait for the lock to be released by the slow operation. I am digging into the source code of libssl1.1 to figure out a solution; it can still take a bit, because I am travelling for several days without much spare time.
Among the tunings would be lower timeouts for connect and send, and not attempting to connect unless you are sure the target expects new connections (e.g., connecting is fine when sending to a gateway/SIP server accepting traffic via TLS, but do not do it for requests routed via lookup(location), as the registration uses a connection with an ephemeral source port and trying to connect back to it will fail). If it is still a major issue for whatever reason, using a version compiled with libssl1.0 would be the way to go.
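As a rough illustration of the timeout tuning described above: these are core parameters set in the global section of kamailio.cfg. The parameter names (tcp_connect_timeout, tcp_send_timeout, exit_timeout) come from the core cookbook; the values are assumptions chosen only for illustration, not recommendations made in this thread.

```cfg
# kamailio.cfg, global parameters (illustrative values, adjust to your network)

# seconds to wait for an outgoing TCP/TLS connect attempt before giving up
tcp_connect_timeout=3

# seconds to wait for an existing connection to become writable before giving up
tcp_send_timeout=3

# seconds Kamailio waits for its shutdown procedures before aborting
exit_timeout=20
```

Lower values make a worker give up sooner on an unreachable peer, at the cost of failing earlier on peers that are slow but reachable.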
Cheers, Daniel
On 21.03.19 19:17, Aymeric Moizard wrote:
Hi List,
I want to share that I also ran into this issue last week with my kamailio 5.2.2.
As far as I was able to see, SIP applications were able to "connect()" over TCP, but my logs were not reporting any of the SIP messages received over TCP.
I have a pike check right before an xlog showing every incoming request. However, I suspect the issue was not related to the pike module. The log did not show an unusual amount of blocked traffic.
I'm almost sure I haven't reached any ulimit restrictions. I have many TCP and UDP children. The server was not under high load. Nothing unusual.
I'm running the default build for debian stretch from here: http://deb.kamailio.org/kamailio52 stretch
And unfortunately, I was under some pressure to restart the service, so I was not able to dig deeper into the issue.
If I'm correct, I will certainly improve things a lot by using "set_reply_no_connect()". I have added it and restarted! (Thanks Daniel for this tip!)
I have been looking at issue reported here: "Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting." https://github.com/kamailio/kamailio/issues/1172
I have to say that I do have libssl1.1, and I do get a crash when I restart my kamailio (even when I simply restart after a configuration modification).
Mar 21 18:28:50 sip kamailio[19222]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 21 18:28:50 sip kamailio[19175]: ERROR: ctl [ctl.c:390]: mod_destroy(): ERROR: ctl: could not delete unix socket /var/run/kamailio/kamailio_ctl: Permission denied (13)
[... one minute with nothing ...]
Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs [jsonrpcs_fifo.c:599]: jsonrpc_fifo_destroy(): FIFO stat failed: Permission denied
Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs [jsonrpcs_sock.c:516]: jsonrpc_dgram_destroy(): socket stat failed: Permission denied
Mar 21 18:29:50 sip kamailio[19175]: CRITICAL: <core> [main.c:662]: sig_alarm_abort(): shutdown timeout triggered, dying...
As issue 1172 is closed, should I expect kamailio to still have trouble with libssl1.1?
I just restarted my service again (to see if it restarts better after only 30 minutes instead of a week):
Mar 21 19:07:30 sip kamailio[28737]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 21 19:07:31 sip kamailio[28671]: ERROR: ctl [ctl.c:390]: mod_destroy(): ERROR: ctl: could not delete unix socket /var/run/kamailio/kamailio_ctl: Permission denied (13)
[... one minute with nothing ...]
Mar 21 19:08:30 sip kamailio[28671]: CRITICAL: <core> [main.c:662]: sig_alarm_abort(): shutdown timeout triggered, dying...
Still not able to restart in a clean way! Thanks! Regards, Aymeric
On Wed, 20 Mar 2019 at 15:08, Daniel-Constantin Mierla miconda@gmail.com wrote:
Hello,
based on the trap output I think I could figure out what happened there.
You have tcp_children set to a very low value (1 or so). The problem is not actually that one, but the fact that the connection to upstream (the device/app sending the request) was closed after receiving the request, and routing of the reply gets stuck in the following way:
- a reply is received and has to be forwarded
- the connection was lost, so Kamailio tries to establish a new one, but it takes time until it fails, because the upstream is behind NAT or so, based on the Via header:
Via: SIP/2.0/TLS 10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
- the reply is retransmitted and gets to another worker, which tries to forward it again, but discovers that a connection structure for that destination exists (created by the previous reply worker) and now waits for the connection to be released (or better said, for the mutex on the write buffer to be unlocked)
- as the second reply waits, there can be other retransmissions of the reply ending up in other workers, stuck waiting for the mutex of the connection write buffer
The solution here is to use set_reply_no_connect() -- you can put it first in the request_route block. I think this would be a good addition to the default configuration file as well. IMO, the SIP server should not connect for sending replies, and likewise should not connect for requests that go behind NAT.
Cheers, Daniel
On 19.03.19 10:53, Kristijan Vrban wrote:
So I had the situation again. But this time, incoming UDP was affected. Kamailio was sending out OPTIONS (via the dispatcher module) to a group of Asterisk machines, but the 200 OK replies to the OPTIONS were not processed, so the dispatcher module set all Asterisk machines to inactive, even though they replied 200 OK.
Attached is the output of kamctl trap during the situation. I hope there is something useful in it, because after "kamctl trap" it was working again without a Kamailio restart.
-- Antisip - http://www.antisip.com
Hi Daniel,
Thanks for the tips.
My traffic does include TLS as well.
For TCP settings:
tcp_connection_lifetime=3600
tcp_async=yes
tcp_rd_buf_size=16384
tcp_accept_no_cl=yes
tcp_max_connections=50000
tcp_connect_timeout=7
For TLS:
enable_tls=yes
tls_max_connections=50000
I have been using "set_forward_no_connect();" after lookup(location) for a long time.
This week I added "set_reply_no_connect();" in case it helps to avoid the issue.
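A minimal sketch of how these two flags might sit in a routing block. It assumes the registrar, tm and sl modules are loaded (for lookup(), t_relay() and sl_send_reply()); the surrounding route logic is hypothetical and heavily reduced, not the configuration used by anyone in this thread.

```cfg
request_route {
    # never open a new TCP/TLS connection just to deliver a reply;
    # replies go out only on a connection that still exists
    set_reply_no_connect();

    # ... sanity checks, authentication, etc. (omitted in this sketch) ...

    if (!lookup("location")) {
        sl_send_reply("404", "Not Found");
        exit;
    }

    # registered clients connected from an ephemeral source port,
    # so only reuse the connection they opened themselves
    set_forward_no_connect();

    t_relay();
}
```

With both flags set, a message whose connection has gone away fails quickly instead of keeping a worker busy in a connect attempt that is unlikely to succeed.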
If the issue occurs again, I will try to get something via "kamctl trap".
In order to get a core dump (on restart timeout?), I have added this to my kamailio.service:
WorkingDirectory=/var/run/kamailio
LimitCORE=infinity
I also have DUMP_CORE=yes in /etc/default/kamailio and disable_core_dump=no in my kamailio.cfg.
However, I do not see any core dumps when restarting kamailio, even when I see "sig_alarm_abort(): shutdown timeout triggered, dying...".
Am I supposed to get a core dump in such a case?
Tks a lot! Aymeric
>
> On Thu, 14 Mar 2019 at 13:41, Kristijan Vrban vrban.lkml@gmail.com wrote:
>> Hi, with full debug I see this in the log for every incoming TCP SIP request:
>>
>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp receiver, connection passed to the least busy one (105)
>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker 2 27(17937) for activity on [tls:172.17.217.10:5061], 0x7fdaeda8f928
>>
>> So the Kamailio TCP process is working and has received the TCP traffic, but the tcp workers are somehow busy.
>>
>> When I attach via strace to the TCP worker, I do not see any activity. Just:
>>
>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL
>>
>> and nothing else, even when I see the tcp main process choose this worker process.
>>
>> Kristijan
>>
>> On Wed, 27 Feb 2019 at 15:14, Kristijan Vrban vrban.lkml@gmail.com wrote:
>>> First of all, thanks for the feedback. I have now prepared our system to run with debug=3. I hope to see more then.
>>>
>>> On Wed, 27 Feb 2019 at 11:53, Kristijan Vrban vrban.lkml@gmail.com wrote:
>>>> Hi kamailios,
>>>>
>>>> I have a creepy situation with Kamailio v5.2.1 stable. After a day or so, Kamailio stops processing incoming SIP traffic via TCP. The incoming TCP packets get a TCP ACK from the OS (Debian 9, 4.18.0-15-generic Linux), but Kamailio does not show any processing for the SIP traffic arriving via TCP. No logs, nothing. Meanwhile, traffic via UDP works totally fine.
>>>>
>>>> When I look via "netstat -ntp", I see that the Recv-Q gets bigger and bigger, e.g.:
>>>>
>>>> Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
>>>> tcp 4566 0 172.17.217.12:5060 xxx.xxx.xxx.xxx:57252 ESTABLISHED 31347/kamailio
>>>>
>>>> After a Kamailio restart, everything works fine again for a day. We have maybe 10-20 devices online via TCP and low call volume (1-2 calls per minute). The only tcp setting we have is "tcp_delayed_ack=no".
>>>>
>>>> How could we debug this situation? Again, no errors, no warnings in the log. Just nothing.
>>>>
>>>> Kristijan
> _______________________________________________
> Kamailio (SER) - Users Mailing List
> sr-users@lists.kamailio.org
> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
Daniel-Constantin Mierla -- www.asipto.com www.twitter.com/miconda -- www.linkedin.com/in/miconda Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA
-- www.asipto.com
-- Antisip - http://www.antisip.com
Hi Aymeric,
Are you sure the issue is with TCP and not strictly related to TLS? I highly suggest you compile with ssl1.0 and give it a try...
If you want to read how I got to that conclusion: https://github.com/kamailio/kamailio/issues/1172
Hope it helps! Joel.
On Fri, Mar 22, 2019 at 11:58 AM Aymeric Moizard amoizard@gmail.com wrote:
Hi Daniel,
Thanks for the tips.
My traffic does include TLS as well.
For TCP settings:
tcp_connection_lifetime=3600
tcp_async=yes
tcp_rd_buf_size=16384
tcp_accept_no_cl=yes
tcp_max_connections=50000
tcp_connect_timeout=7
For TLS:
enable_tls=yes
tls_max_connections=50000
I have been using "set_forward_no_connect();" after lookup(location) for a long time.
This week I added "set_reply_no_connect();" in case it helps to avoid the issue.
If the issue occurs again, I will try to get something via "kamctl trap".
In order to get a core dump (on restart timeout?), I have added this to my kamailio.service:
WorkingDirectory=/var/run/kamailio
LimitCORE=infinity
I also have DUMP_CORE=yes in /etc/default/kamailio and disable_core_dump=no in my kamailio.cfg.
However, I do not see any core dumps when restarting kamailio, even when I see "sig_alarm_abort(): shutdown timeout triggered, dying"...
Am I supposed to get a core dump in such a case?
Thanks a lot! Aymeric
On Fri, 22 Mar 2019 at 14:19, Daniel-Constantin Mierla miconda@gmail.com wrote:
Do you have pure tcp traffic when facing this issue, or are there actually tls connections?
What are the values of the core parameters related to tcp connect and tcp send timeouts?
As for the restart taking long, see the exit_timeout parameter:
As for tls with libssl1.1/libcrypto1.1, I think I have discovered what the issue is. With v1.1 they use their own internal locking functions and do not expose any API to set them from outside. Before, Kamailio initialized the library telling it to use Kamailio locks, giving one lock per connection. As far as I could tell from some gdb traces I received, with libssl 1.1 the same internal lock is used when attempting to connect to different addresses as well as when trying to write to different connections. If one operation is slow for whatever reason, the others wait for the lock to be released by the slow operation. I am digging into the libssl1.1 source code to figure out a solution; it can still take a bit because I am travelling for several days without much spare time.
Among the tunings would be lower timeouts for connect and send, and not attempting to connect unless you are sure the target expects new connections (e.g., when sending to a gateway or SIP server that accepts traffic via tls). Do not connect even for requests routed via lookup(location): the registration uses a connection with an ephemeral source port, and trying to connect back to it will fail. If it is still a major issue for whatever reason, using a version compiled with libssl1.0 would be the way to go.
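As a rough illustration of the "lower timeouts" tuning mentioned above, the two core parameters involved could be set in kamailio.cfg as in the sketch below. The values are assumptions chosen only for the example (both are in seconds); the thread does not recommend specific numbers.

```cfg
# Illustrative values only, in seconds; tune them for your own network.
# Abort an outgoing TCP/TLS connect attempt sooner than the default.
tcp_connect_timeout=3
# Give up earlier on a connection that cannot be written to.
tcp_send_timeout=3
```

Shorter timeouts do not remove the shared-lock contention described above; they can only shorten how long one slow operation keeps the other workers waiting.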
Cheers, Daniel
On 21.03.19 19:17, Aymeric Moizard wrote:
Hi List,
I want to share that I also hit this issue last week with my kamailio 5.2.2.
As far as I could see, SIP applications were able to "connect()" via TCP, but my logs were not reporting any of the SIP messages received over TCP.
I have pike right before an xlog that shows every incoming request. However, I suspect the issue was not related to the pike module: the log did not show an unusual amount of blocked traffic.
I'm almost sure I haven't reached any ulimit restrictions. I have many TCP and UDP children. The server was not under high load. Nothing unusual.
I'm running the default build for Debian stretch from here: http://deb.kamailio.org/kamailio52 stretch
Unfortunately, I was under some pressure to restart the service, so I was not able to dig deeper into the issue.
If I'm correct, using "set_reply_no_connect()" should improve things considerably. I have added it and restarted! (Thanks, Daniel, for this tip!)
I have been looking at the issue reported here: "Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting." https://github.com/kamailio/kamailio/issues/1172
I have to say that I do have libssl1.1, and I do get a crash when I restart my kamailio (even when I simply restart after a configuration change):
Mar 21 18:28:50 sip kamailio[19222]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 21 18:28:50 sip kamailio[19175]: ERROR: ctl [ctl.c:390]: mod_destroy(): ERROR: ctl: could not delete unix socket /var/run/kamailio/kamailio_ctl: Permission denied (13)
[... one minute with nothing ...]
Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs [jsonrpcs_fifo.c:599]: jsonrpc_fifo_destroy(): FIFO stat failed: Permission denied
Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs [jsonrpcs_sock.c:516]: jsonrpc_dgram_destroy(): socket stat failed: Permission denied
Mar 21 18:29:50 sip kamailio[19175]: CRITICAL: <core> [main.c:662]: sig_alarm_abort(): shutdown timeout triggered, dying...
As issue 1172 is closed, should I expect kamailio to still have trouble with libssl1.1?
I just restarted my service again (to see whether it restarts more cleanly after only 30 minutes instead of a week):
Mar 21 19:07:30 sip kamailio[28737]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 21 19:07:31 sip kamailio[28671]: ERROR: ctl [ctl.c:390]: mod_destroy(): ERROR: ctl: could not delete unix socket /var/run/kamailio/kamailio_ctl: Permission denied (13)
[... one minute with nothing ...]
Mar 21 19:08:30 sip kamailio[28671]: CRITICAL: <core> [main.c:662]: sig_alarm_abort(): shutdown timeout triggered, dying...
Still not able to restart in a clean way!
Thanks! Regards, Aymeric
On Wed, 20 Mar 2019 at 15:08, Daniel-Constantin Mierla miconda@gmail.com wrote:
Hello,
based on the trap output, I think I could figure out what happened there.
You have tcp_children set to a very low value (1 or so). The problem is not actually that one, but the fact that the connection to the upstream (the device/app sending the request) was closed after the request was received, and the routing of the reply gets stuck as follows:
- a reply is received and has to be forwarded
- the connection was lost, so Kamailio tries to establish a new one, but it takes time until that fails, because the upstream is behind NAT or similar, judging by the Via header:
Via: SIP/2.0/TLS 10.1.0.4:10002 ;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
- the reply is retransmitted and gets to another worker, which tries to forward it again, but discovers that a connection structure for that destination already exists (created by the previous reply worker) and now waits for the connection to be released (or, better said, for the mutex on the write buffer to be unlocked)
- while the second reply waits, there can be other retransmissions of the reply ending up in other workers, which get stuck waiting for the mutex of the connection write buffer
The solution here is to use set_reply_no_connect(); you can put it first in the request_route block. I think this would be a good addition to the default configuration file as well. IMO, the SIP server should not open new connections for sending replies, and should not do so for requests that go behind NAT either.
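A minimal sketch of how this advice might look in kamailio.cfg is shown below. It is an illustration, not the default configuration: it assumes the registrar, sl and tm modules are loaded, and the routing logic around the two no-connect calls is hypothetical and heavily abbreviated.

```cfg
request_route {
    # Placed first, as suggested above: replies for this transaction are
    # sent only over an existing connection, never a newly opened one.
    set_reply_no_connect();

    # ... sanity checks, authentication, etc. (omitted in this sketch) ...

    # Assumption: registered users are located via the registrar module.
    if (!lookup("location")) {
        sl_send_reply("404", "Not Found");
        exit;
    }

    # The contact is probably behind NAT on an ephemeral port, so do not
    # try to open a new connection towards it.
    set_forward_no_connect();

    # Assumption: stateful relaying via the tm module.
    t_relay();
}
```

With both calls in place, a message whose connection has gone away fails quickly instead of holding a worker in a connect attempt that is likely to time out.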
Cheers, Daniel
On 19.03.19 10:53, Kristijan Vrban wrote:
So I had the situation again. But this time, incoming udp was affected. Kamailio was sending out OPTIONS (via the dispatcher module) to a group of Asterisk machines, but the 200 OK replies to the OPTIONS were not processed, so the dispatcher module set all Asterisk machines to inactive, even though they replied 200 OK.
Attached is the output of kamctl trap taken during the situation. I hope there is something useful in it, because after "kamctl trap" it was working again without a Kamailio restart.
Best, Kristijan
On Mon, 18 Mar 2019 at 12:27, Daniel-Constantin Mierla miconda@gmail.com wrote:
Hello,
setting tcp_children=1 is not a good option for scalability: in practice you set Kamailio to process a single tcp message at a time, and under high traffic that won't work well.
Maybe try setting tcp_children to 2 or 4; that should make a potential race appear faster.
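For reference, the suggested change is a single core parameter in kamailio.cfg. The value 4 below is just one of the two values mentioned above, picked for the example.

```cfg
# Number of TCP worker processes; 1 serializes all TCP/TLS processing.
tcp_children=4
```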
Regarding the pid, if it is an outgoing connection, then it can be created by any worker process, including a UDP worker, if that was the one receiving the sip message over udp and sends it out via tcp.
Cheers, Daniel
On 18.03.19 10:09, Kristijan Vrban wrote:
Hi Daniel,
for testing, I have now set "tcp_children=1", and so far this issue has not occurred since. So there is nothing to provide from "kamctl trap" yet.
"kamctl ps" shows these two processes handling tcp:
...
}, {
"IDX": 25,
"PID": 71929,
"DSC": "tcp receiver (generic) child=0"
}, {
"IDX": 26,
"PID": 71933,
"DSC": "tcp main process"
}
...
OK, but then I was surprised to see a TCP connection on a udp receiver child:
netstat -ntp |grep 5061
...
tcp 0 0 172.17.217.10:5061 195.70.114.125:18252 ESTABLISHED 71895/kamailio
...
And pid 71895 is:
}, {
"IDX": 3,
"PID": 71895,
"DSC": "udp receiver child=2 sock=127.0.0.1:5060"
}, {
And if I look into it via "lsof -p 71895" (the udp receiver child):
...
kamailio 71895 kamailio 14u sock 0,9 0t0 8856085 protocol: TCP
kamailio 71895 kamailio 15u sock 0,9 0t0 8886886 protocol: TCP
kamailio 71895 kamailio 16u sock 0,9 0t0 8854886 protocol: TCP
kamailio 71895 kamailio 17u sock 0,9 0t0 8828915 protocol: TCP
kamailio 71895 kamailio 18u unix 0x000000005f73cb91 0t0 1680314 type=DGRAM
kamailio 71895 kamailio 19u IPv4 1846523 0t0 TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
kamailio 71895 kamailio 20u sock 0,9 0t0 8887192 protocol: TCP
kamailio 71895 kamailio 21u sock 0,9 0t0 8813634 protocol: TCP
kamailio 71895 kamailio 22u unix 0x00000000c19bd102 0t0 1681407 type=STREAM
kamailio 71895 kamailio 23u sock 0,9 0t0 8850488 protocol: TCP
...
Besides the ESTABLISHED TCP session, there are also these empty "protocol: TCP" sockets. What are they doing in the udp receiver? Is that how it is supposed to be?
Kristijan
To make it easier for you and not have to go through the whole thread, if you want the TL;DR start here: https://github.com/kamailio/kamailio/issues/1172#issuecomment-312634272
On Fri, Mar 22, 2019 at 1:19 PM Joel Serrano joel@textplus.com wrote:
Hi Aymeric,
Are you sure the issue is with TCP and not strictly related to TLS? I highly suggest you compile with ssl1.0 and give it a try...
If you want to read how I got to that conclusion: https://github.com/kamailio/kamailio/issues/1172
Hope it helps! Joel.
On 18.03.19 10:09, Kristijan Vrban wrote: > Hi Daniel, > > for testing, i now had set: "tcp_children=1" and so far this issue
did not occur
> ever since. So now value to provide for "kamctl trap" yet. > > "kamctl ps" show this two process to handle tcp: > > ... > }, { > "IDX": 25, > "PID": 71929, > "DSC": "tcp receiver (generic) child=0" > }, { > "IDX": 26, > "PID": 71933, > "DSC": "tcp main process" > } > ... > > > Ok, but then is was wondering to see a TCP connection on a udp
receiver child:
> > > netstat -ntp |grep 5061 > > ... > tcp 0 0 172.17.217.10:5061 195.70.114.125:18252 > ESTABLISHED 71895/kamailio > ... > > An pid 71895 is: > > }, { > "IDX": 3, > "PID": 71895, > "DSC": "udp receiver child=2 sock=127.0.0.1:5060" > }, { > > > > And if i look into it via "lsof -p 71895" (the udp receiver child) > > ... > kamailio 71895 kamailio 14u sock 0,9 0t0 > 8856085 protocol: TCP > kamailio 71895 kamailio 15u sock 0,9 0t0 > 8886886 protocol: TCP > kamailio 71895 kamailio 16u sock 0,9 0t0 > 8854886 protocol: TCP > kamailio 71895 kamailio 17u sock 0,9 0t0 > 8828915 protocol: TCP > kamailio 71895 kamailio 18u unix 0x000000005f73cb91 0t0 > 1680314 type=DGRAM > kamailio 71895 kamailio 19u IPv4 1846523 0t0 > TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED) > kamailio 71895 kamailio 20u sock 0,9 0t0 > 8887192 protocol: TCP > kamailio 71895 kamailio 21u sock 0,9 0t0 > 8813634 protocol: TCP > kamailio 71895 kamailio 22u unix 0x00000000c19bd102 0t0 > 1681407 type=STREAM > kamailio 71895 kamailio 23u sock 0,9 0t0 > 8850488 protocol: TCP > ... > > Not only the ESTABLISHED TCP session. But also this empty sockets > "protocol: TCP" > What are they doing there in the udp receiver? Is that how it's
supposed to be?
Kristijan
On Thu, Mar 14, 2019 at 14:48, Daniel-Constantin Mierla miconda@gmail.com wrote:
Can you get the file written by `kamctl trap`? It should have the backtrace for all Kamailio processes. You need the latest Kamailio 5.2.
Also, get the output of: kamctl ps
Cheers, Daniel
On 14.03.19 13:52, Kristijan Vrban wrote:
When I attach via gdb to one of the TCP workers, I see this:
(gdb) bt
#0 0x00007fdaf4d14470 in futex_wait (private=<optimized out>, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1 futex_wait_simple (private=<optimized out>, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135
#2 __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at pthread_rwlock_wrlock.c:67
#3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#4 0x00007fdaf08e1c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#5 0x00007fdaf08a6f69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#6 0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#7 0x00007fdaf0c31144 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#8 0x00007fdaf0c2bddb in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#9 0x00007fdaf0c22858 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#10 0x00007fdaf0c1af61 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98, error=0x7ffffe2a2df0) at tls_server.c:422
#12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98, flags=0x7ffffe2c318c) at tls_server.c:1116
#13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98, read_flags=0x7ffffe2c318c) at core/tcp_read.c:469
#14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98, bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at core/tcp_read.c:1496
#15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1, idx=-1) at core/tcp_read.c:1862
#16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0 <io_w>, t=2, repeat=0) at core/io_wait.h:1065
#17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at core/tcp_read.c:1974
#18 0x0000556ead4c8e24 in tcp_init_children () at core/tcp_main.c:4853
#19 0x0000556ead3c352a in main_loop () at main.c:1735
#20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at main.c:2675
On Thu, Mar 14, 2019 at 13:41, Kristijan Vrban vrban.lkml@gmail.com wrote:
Hi, with full debug I see this in the log for every incoming TCP SIP request:
Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp receiver, connection passed to the least busy one (105)
Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker 2 27(17937) for activity on [tls:172.17.217.10:5061], 0x7fdaeda8f928
So the Kamailio TCP process is working and receives TCP traffic, but the TCP workers are somehow busy.
When I attach via strace to the TCP worker, I do not see any activity. Just:
futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL
and nothing more, even when I see the main TCP process choose this worker process.
Kristijan
On Wed, Feb 27, 2019 at 15:14, Kristijan Vrban vrban.lkml@gmail.com wrote:
First of all, thanks for the feedback. I have now prepared our system to run with debug=3. I hope to see more then.
_______________________________________________
Kamailio (SER) - Users Mailing List
sr-users@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
-- Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com
-- Antisip - http://www.antisip.com
Hi Joel,
My issue was that no TCP traffic was working at all, TLS included. I guess it could be related to the libssl 1.1 issue.
Tks! Aymeric
On Fri, Mar 22, 2019 at 21:21, Joel Serrano joel@textplus.com wrote:
Hi Aymeric,
Are you sure the issue is with TCP and not strictly related to TLS? I highly suggest you compile with ssl1.0 and give it a try...
If you want to read how I got to that conclusion: https://github.com/kamailio/kamailio/issues/1172
Hope it helps! Joel.
On Fri, Mar 22, 2019 at 11:58 AM Aymeric Moizard amoizard@gmail.com wrote:
Hi Daniel,
Tks for the tips.
My traffic does include TLS as well.
For TCP settings:
tcp_connection_lifetime=3600
tcp_async=yes
tcp_rd_buf_size=16384
tcp_accept_no_cl=yes
tcp_max_connections=50000
tcp_connect_timeout=7
For TLS:
enable_tls=yes
tls_max_connections=50000
I have been using "set_forward_no_connect();" after lookup(location) for a long time.
This week I added "set_reply_no_connect();" in case it helps avoid the issue.
If the issue occurs again, I will try to get something via "kamctl trap".
In order to get a core dump (on restart timeout?), I have added this to my kamailio.service:
WorkingDirectory=/var/run/kamailio
LimitCORE=infinity
I have also DUMP_CORE=yes in /etc/default/kamailio and disable_core_dump=no in my kamailio.cfg
However, I'm not able to see any core dumps when restarting kamailio even when I see "sig_alarm_abort(): shutdown timeout triggered, dying"...
Am I supposed to get a core dump in such case?
Tks a lot! Aymeric
On Fri, Mar 22, 2019 at 14:19, Daniel-Constantin Mierla miconda@gmail.com wrote:
Do you face this issue with pure TCP traffic, or are there actually TLS connections involved?
What are the values for core parameters related to tcp connect and tcp send timeouts?
As for restart taking long, see exit_timeout parameter:
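A minimal sketch of how that parameter might be set, assuming it goes in the global section of kamailio.cfg. The value of 20 seconds is only an illustrative assumption and is not a recommendation made in this thread.

```cfg
# Global section of kamailio.cfg.
# Maximum time, in seconds, to wait for the shutdown procedures to
# complete before the remaining processes are killed.
# 20 is an illustrative value (assumption), not taken from the thread.
exit_timeout=20
```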
As for TLS with libssl1.1/libcrypto1.1, I think I have discovered what the issue is. With v1.1 they use their own internal locking functions and do not expose any API to set them from outside. Before, Kamailio initialized the library telling it to use Kamailio locks, giving one lock per connection. As far as I could tell from some gdb traces I received, with libssl 1.1 the same internal lock is used when attempting to connect to different addresses as well as when trying to write to different connections. If one operation is slow for whatever reason, the others wait for the lock to be released by the slow operation. I am digging into the source code of libssl1.1 to figure out a solution; it can still take a bit because I am travelling for several days without much spare time.
Among the tunings would be lower timeouts for connect and send, and not attempting to connect unless you are sure the target expects new connections (e.g., sending to a gateway/SIP server accepting traffic via TLS is fine, but don't do it for requests routed via lookup(location), as the registration uses a connection with an ephemeral source port and trying to connect back to it will fail). If it is still a major issue for whatever reason, using a version compiled with libssl1.0 would be the way to go.
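The tunings described above could look roughly like the following sketch. The timeout values, the route name LOCATION, and the 404 handling are illustrative assumptions; the fragment also assumes the registrar and sl modules are loaded.

```cfg
# Global section of kamailio.cfg: illustrative values (assumptions).
# Seconds to wait for an outgoing TCP connect to complete.
tcp_connect_timeout=3
# Seconds to wait on a connection that is not available for writing.
tcp_send_timeout=3

# Routing block for requests to registered users (hypothetical name).
route[LOCATION] {
    if (!lookup("location")) {
        sl_send_reply("404", "Not Found");
        exit;
    }
    # Forward only over an already existing connection, i.e. the one
    # created by the registration; do not try to connect back to the
    # ephemeral source port of the device.
    set_forward_no_connect();
}
```

Requests towards gateways or SIP servers that accept new TLS connections would go through a different route without set_forward_no_connect().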
Cheers, Daniel
On 21.03.19 19:17, Aymeric Moizard wrote:
Hi List,
I want to share that I also met this issue last week with my kamailio 5.2.2.
As far as I was able to see, SIP applications were able to "connect()" over TCP, but my logs weren't reporting any of the SIP messages received over TCP.
I have a pike check right before an xlog that shows every incoming request. However, I suspect the issue was not related to the pike module. The log didn't show an unusual amount of blocked traffic.
I'm almost sure I haven't reached any ulimit restrictions. I have many TCP and UDP children. The server was not under high load. Nothing unusual.
I'm running the default build for Debian stretch from here: http://deb.kamailio.org/kamailio52 stretch
Unfortunately, I was under some pressure to restart the service, so I was not able to dig deeper into the issue.
If I'm correct, I will certainly improve things a lot by using "set_reply_no_connect()". I have added it and restarted! (Tks Daniel for this tip!)
I have been looking at issue reported here: "Kamailio 5.0.2 suddenly stops processing traffic, then generates a core when restarting." https://github.com/kamailio/kamailio/issues/1172
I have to say that I do have libssl1.1, and I do get a crash when I restart my Kamailio (even when I simply restart after a configuration change).
Mar 21 18:28:50 sip kamailio[19222]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 21 18:28:50 sip kamailio[19175]: ERROR: ctl [ctl.c:390]: mod_destroy(): ERROR: ctl: could not delete unix socket /var/run/kamailio/kamailio_ctl: Permission denied (13)
[... one minute with nothing ...]
Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs [jsonrpcs_fifo.c:599]: jsonrpc_fifo_destroy(): FIFO stat failed: Permission denied
Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs [jsonrpcs_sock.c:516]: jsonrpc_dgram_destroy(): socket stat failed: Permission denied
Mar 21 18:29:50 sip kamailio[19175]: CRITICAL: <core> [main.c:662]: sig_alarm_abort(): shutdown timeout triggered, dying...
As issue 1172 is closed, should I still expect Kamailio to have trouble with libssl1.1?
I just restarted my service again (to see whether it restarts more cleanly after only 30 minutes instead of a week):
Mar 21 19:07:30 sip kamailio[28737]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 21 19:07:31 sip kamailio[28671]: ERROR: ctl [ctl.c:390]: mod_destroy(): ERROR: ctl: could not delete unix socket /var/run/kamailio/kamailio_ctl: Permission denied (13)
[... one minute with nothing ...]
Mar 21 19:08:30 sip kamailio[28671]: CRITICAL: <core> [main.c:662]: sig_alarm_abort(): shutdown timeout triggered, dying...
Still not able to restart in a clean way!
Tks! Regards Aymeric
On Wed, Mar 20, 2019 at 15:08, Daniel-Constantin Mierla miconda@gmail.com wrote:
Hello,
based on the trap output I think I could figure out what happened there.
You have tcp_children set to a very low value (1 or so). The problem is not actually that one, but the fact that the connection to upstream (the device/app sending the request) was closed after the request was received, and routing of the reply gets stuck as follows:
- a reply is received and has to be forwarded
- the connection was lost, so Kamailio tries to establish a new one, but it takes time until this fails because the upstream is behind NAT or so, based on the Via header:
Via: SIP/2.0/TLS 10.1.0.4:10002 ;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
- the reply is retransmitted and gets to another worker, which tries to forward it again, but discovers that a connection structure for that destination exists (created by the previous reply worker) and now waits for the connection to be released (or better said, for the mutex on the write buffer to be unlocked)
- as the second reply waits, there can be other retransmissions of the reply ending up in other workers, stuck waiting for the mutex of the connection write buffer
The solution here is to use set_reply_no_connect() -- you can put it first in the request_route block. I think this would be a good addition to the default configuration file as well. IMO, the SIP server should not connect for sending replies, and should likewise not connect for requests that go behind NAT.
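A minimal sketch of where that call would go, assuming an otherwise unchanged kamailio.cfg; the rest of the routing logic is elided and not part of the suggestion.

```cfg
request_route {
    # Do not open a new TCP/TLS connection just to send a reply for
    # this request; replies are sent only over an existing connection.
    set_reply_no_connect();

    # ... rest of the routing logic (elided) ...
}
```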
Cheers, Daniel
On 19.03.19 10:53, Kristijan Vrban wrote:
So I had the situation again. But this time, incoming UDP was affected. Kamailio was sending out OPTIONS (via the dispatcher module) to a group of Asterisk machines, but the 200 OK replies to the OPTIONS were not processed, so the dispatcher module set all Asterisk machines to inactive, even though they replied 200 OK.
Attached is the output of kamctl trap during the situation. I hope there is something useful in it, because after "kamctl trap" it was working again without a Kamailio restart.
Best Kristijan
On Mon, Mar 18, 2019 at 12:27, Daniel-Constantin Mierla miconda@gmail.com wrote:
Hello,
setting tcp_children=1 is not a good option for scalability. Practically, you set Kamailio to process a single TCP message at a time; on high traffic, that won't work well.
Maybe try to set tcp_children to 2 or 4; that should make a potential race appear faster.
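As a sketch, the suggested change is a single core parameter in the global section of kamailio.cfg. The value 4 is simply one of the two values mentioned above.

```cfg
# Global section of kamailio.cfg.
# Number of TCP worker processes (was 1 during the test).
tcp_children=4
```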
Regarding the pid, if it is an outgoing connection, then it can be created by any worker process, including a UDP worker, if that was the one receiving the SIP message over UDP and sending it out via TCP.
Cheers, Daniel
Yes I agree there... I meant the trigger. I have a feeling your issue is with TLS, and when it happens it affects the rest...
Give it a try and let us know ;)
On Sat, Mar 23, 2019 at 12:05 Aymeric Moizard amoizard@gmail.com wrote:
supposed to be?
>> >> Kristijan >> >> Am Do., 14. März 2019 um 14:48 Uhr schrieb Daniel-Constantin Mierla >> miconda@gmail.com: >>> Can you get file written by `kamctl trap`? It should have the
backtrace
>>> for all kamailio processes. You need latest kamailio 5.2. >>> >>> Also, get the output for: kamctl ps >>> >>> Cheers, >>> Daniel >>> >>> On 14.03.19 13:52, Kristijan Vrban wrote: >>>> When i attach via gdb to one of the tcp worker, i see this: >>>> >>>> (gdb) bt >>>> #0 0x00007fdaf4d14470 in futex_wait (private=<optimized out>, >>>> expected=1, futex_word=0x7fdaeca92f8c) at >>>> ../sysdeps/unix/sysv/linux/futex-internal.h:61 >>>> #1 futex_wait_simple (private=<optimized out>, expected=1, >>>> futex_word=0x7fdaeca92f8c) at
../sysdeps/nptl/futex-internal.h:135
>>>> #2 __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at >>>> pthread_rwlock_wrlock.c:67 >>>> #3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from >>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 >>>> #4 0x00007fdaf08e1c08 in ?? () from
/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>> #5 0x00007fdaf08a6f69 in ?? () from
/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>> #6 0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from >>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 >>>> #7 0x00007fdaf0c31144 in ?? () from
/usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>> #8 0x00007fdaf0c2bddb in ?? () from
/usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>> #9 0x00007fdaf0c22858 in ?? () from
/usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>> #10 0x00007fdaf0c1af61 in SSL_do_handshake () from >>>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1 >>>> #11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98, >>>> error=0x7ffffe2a2df0) at tls_server.c:422 >>>> #12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98, >>>> flags=0x7ffffe2c318c) at tls_server.c:1116 >>>> #13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98, >>>> read_flags=0x7ffffe2c318c) at core/tcp_read.c:469 >>>> #14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98, >>>> bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at >>>> core/tcp_read.c:1496 >>>> #15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1, >>>> idx=-1) at core/tcp_read.c:1862 >>>> #16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0
<io_w>,
>>>> t=2, repeat=0) at core/io_wait.h:1065 >>>> #17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at >>>> core/tcp_read.c:1974 >>>> #18 0x0000556ead4c8e24 in tcp_init_children () at
core/tcp_main.c:4853
>>>> #19 0x0000556ead3c352a in main_loop () at main.c:1735 >>>> #20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at
main.c:2675
>>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> Am Do., 14. März 2019 um 13:41 Uhr schrieb Kristijan Vrban >>>> vrban.lkml@gmail.com: >>>>> Hi, with full debug is see this in log for every incoming TCP
SIP request:
>>>>> >>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]:
DEBUG:
>>>>> <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free
tcp
>>>>> receiver, connection passed to the least busy one (105) >>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]:
DEBUG:
>>>>> <core> [core/tcp_main.c:3875]: send2child(): selected tcp
worker 2
>>>>> 27(17937) for activity on [tls:172.17.217.10:5061],
0x7fdaeda8f928
>>>>> >>>>> So the Kamailio TCP process is working, and received TCP
traffic. But
>>>>> the tcp workers are somehow busy. >>>>> >>>>> When i attach via strace to the TCP worker, i do not see any
activity. Just:
>>>>> >>>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL >>>>> >>>>> and nothing, even when i see the main tcp process choose this
worker process.
>>>>> >>>>> Kristijan >>>>> >>>>> Am Mi., 27. Feb. 2019 um 15:14 Uhr schrieb Kristijan Vrban >>>>> vrban.lkml@gmail.com: >>>>>> first of all thanks for the feedback. i prepared our system
now to run
>>>>>> with debug=3 >>>>>> I hope to see more then then. >>>>>> >>>>>> Am Mi., 27. Feb. 2019 um 11:53 Uhr schrieb Kristijan Vrban >>>>>> vrban.lkml@gmail.com: >>>>>>> Hi kamailios, >>>>>>> >>>>>>> i have a creepy situation with v5.2.1 stable Kamilio. After a
day or
>>>>>>> so, Kamailio stop to process incoming SIP traffic via TCP. The >>>>>>> incoming TCP network packages get TCP-ACK from the OS (Debian
9,
>>>>>>> 4.18.0-15-generic-Linux) but Kamailio does not show any
processing for
>>>>>>> the SIP-Traffic incoming via TCP. No logs, nothing. While
traffic via
>>>>>>> UDP is working just totally fine. >>>>>>> >>>>>>> When i look via command "netstat -ntp" is see, that the
Recv-Q get
>>>>>>> bigger and bigger. e.g.: >>>>>>> >>>>>>> Proto Recv-Q Send-Q Local Address Foreign Address State
PID/Program
>>>>>>> name tcp 4566 0 172.17.217.12:5060 xxx.xxx.xxx.xxx:57252
ESTABLISHED
>>>>>>> 31347/kamailio >>>>>>> >>>>>>> After Kamailio restart, all is working fine again for a day.
We have
>>>>>>> maybe 10-20 devices online via TCP and low call volume (1-2
call per
>>>>>>> minute). The only settings for tcp we have is
"tcp_delayed_ack=no"
>>>>>>> >>>>>>> How to could we debug this situation? Again, no error, no
warings in
>>>>>>> the log. Just nothing. >>>>>>> >>>>>>> Kristijan >>>> _______________________________________________ >>>> Kamailio (SER) - Users Mailing List >>>> sr-users@lists.kamailio.org >>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users >>> -- >>> Daniel-Constantin Mierla -- www.asipto.com >>> www.twitter.com/miconda -- www.linkedin.com/in/miconda >>> Kamailio World Conference - May 6-8, 2019 --
www.kamailioworld.com
>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC,
USA -- www.asipto.com
>>> >> _______________________________________________ >> Kamailio (SER) - Users Mailing List >> sr-users@lists.kamailio.org >> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users > -- > Daniel-Constantin Mierla -- www.asipto.com > www.twitter.com/miconda -- www.linkedin.com/in/miconda > Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com > Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC,
USA -- www.asipto.com
>
-- Daniel-Constantin Mierla -- www.asipto.com www.twitter.com/miconda -- www.linkedin.com/in/miconda Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com
Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
> The solution here is to use set_reply_no_connect()
I implemented it. Now the issue has shifted to:
ERROR: <core> [core/tcp_main.c:3959]: handle_new_connect(): maximum number of connections exceeded: 2048/2048
But not a single TCP connection is active between Kamailio and any device. Could it be that the counter for the maximum number of connections now has an issue of its own?
Kristijan
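For context, the 2048/2048 in that log line matches the default of the core parameter tcp_max_connections; TLS connections are additionally capped by tls_max_connections, which cannot exceed it. A minimal sketch of where these limits live in kamailio.cfg follows. The values are illustrative assumptions only, and raising them would merely postpone the error if connection slots are not being released, not fix it.

```
# kamailio.cfg, global parameters section (illustrative values, not a recommendation)

# upper limit for all TCP-based connections (default: 2048)
tcp_max_connections=4096

# upper limit for TLS connections; must not exceed tcp_max_connections
tls_max_connections=4096
```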
Have you run a test with tools such as sipp, or did this happen after a while, with the usual phones registering?
Can you list the TCP connections and see whether they show up?
kamctl rpc core.tcp_list
Cheers, Daniel
I have seen similar behaviour in two cases: 1) when using the perl module with a Perl app in the Kamailio config; 2) when using the http_client module and the upstream HTTP server returned an error message of about 64 KB.
You can check your config for calls to external servers. I think this may be related.
Sergey
пн, 25 мар. 2019 г. в 16:28, Daniel-Constantin Mierla miconda@gmail.com:
Have you done a test with tools such as sipp, or was this happening after a while, with usual phones registering?
Can you list the tcp connections and see if they are listed?
kamctl tcp core.tcp_list
Cheers, Daniel
On 25.03.19 08:03, Kristijan Vrban wrote:
The solution here is to use set_reply_no_connect()
implemented it. Now the issue has shifted to:
ERROR: <core> [core/tcp_main.c:3959]: handle_new_connect(): maximum number of connections exceeded: 2048/2048
But not a single TCP connection is active between Kamailio and any device. Seems this counter for maximum number of connections now has an issue?
Kristijan
Am Mi., 20. März 2019 um 15:07 Uhr schrieb Daniel-Constantin Mierla miconda@gmail.com:
Hello,
Based on the trap output, I think I could figure out what happened there.
You have tcp_children set to a very low value (1 or so). That is not the actual problem, though. The problem is that the connection to the upstream (the device/app that sent the request) was closed after the request was received, and routing of the reply gets stuck as follows:
- a reply is received and has to be forwarded
- the connection was lost, so Kamailio tries to establish a new one, but it takes time until that fails, because the upstream is behind NAT or similar, judging by the Via header:
Via: SIP/2.0/TLS 10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
- the reply is retransmitted and gets to another worker, which tries to forward it again, but discovers that a connection structure for that destination exists (created by the previous reply worker) and now waits for the connection to be released (or, better said, for the mutex on the write buffer to be unlocked)
- while the second reply waits, there can be other retransmissions of the reply ending up in other workers, which get stuck waiting for the mutex of the connection write buffer
The solution here is to use set_reply_no_connect(); you can put it first in the request_route block. I think this would be a good addition to the default configuration file as well. IMO, the SIP server should not connect for sending replies, and the same should apply to requests that go behind NAT.
Cheers, Daniel
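The pile-up described in the list above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Kamailio code: the function name `blocked_workers`, the fixed retransmission interval, and the single shared write lock are hypothetical simplifications chosen for illustration.

```python
def blocked_workers(num_workers, connect_timeout, retransmit_interval,
                    no_connect=False):
    """Toy model: count workers tied up by one reply whose connection is gone.

    Hypothetical simplifications (not taken from Kamailio's source):
    - the reply first arrives at t=0 and is retransmitted at a fixed interval
    - the first worker holds the connection write lock for `connect_timeout`
      seconds while its connect attempt is pending
    - every retransmission arriving before that occupies one more worker,
      which blocks on the same lock
    - with no_connect=True the reply is dropped right away, so nobody blocks
    """
    if no_connect:
        return 0
    busy = 1  # the worker that started the connect attempt
    t = retransmit_interval
    while t < connect_timeout and busy < num_workers:
        busy += 1  # this retransmission lands on a free worker and blocks
        t += retransmit_interval
    return busy
```

In this model, a low worker count is exhausted by a single unreachable destination as soon as enough retransmissions arrive before the connect attempt gives up, while the no-connect variant never ties up a worker.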
On 19.03.19 10:53, Kristijan Vrban wrote:
So I had the situation again. But this time, incoming UDP was affected. Kamailio was sending out OPTIONS (via the dispatcher module) to a group of Asterisk machines, but the 200 OK replies to the OPTIONS were not processed, so the dispatcher module set all Asterisk machines to inactive, even though they replied with 200 OK.
Attached is the output of kamctl trap taken during the situation. I hope there is something useful in it, because after "kamctl trap" it was working again without a Kamailio restart.
Best Kristijan
On Mon, 18 Mar 2019 at 12:27, Daniel-Constantin Mierla miconda@gmail.com wrote:
Hello,
Setting tcp_children=1 is not a good option for scalability; practically, you set Kamailio to process a single TCP message at a time, and on high traffic that won't work well.
Maybe try setting tcp_children to 2 or 4; that should make a possible race appear faster.
Regarding the PID: if it is an outgoing connection, it can be created by any worker process, including a UDP worker, if that was the one that received the SIP message over UDP and sends it out via TCP.
Cheers, Daniel
On 18.03.19 10:09, Kristijan Vrban wrote:
Hi Daniel,
For testing, I have now set "tcp_children=1", and so far this issue has not occurred since. So there is nothing to provide from "kamctl trap" yet.
"kamctl ps" shows these two processes handling TCP:
...
}, {
"IDX": 25,
"PID": 71929,
"DSC": "tcp receiver (generic) child=0"
}, {
"IDX": 26,
"PID": 71933,
"DSC": "tcp main process"
}
...
OK, but then I was surprised to see a TCP connection on a UDP receiver child:
netstat -ntp |grep 5061
...
tcp 0 0 172.17.217.10:5061 195.70.114.125:18252 ESTABLISHED 71895/kamailio
...
And PID 71895 is:
}, {
"IDX": 3,
"PID": 71895,
"DSC": "udp receiver child=2 sock=127.0.0.1:5060"
}, {
And if I look into it via "lsof -p 71895" (the UDP receiver child):
...
kamailio 71895 kamailio 14u sock 0,9 0t0 8856085 protocol: TCP
kamailio 71895 kamailio 15u sock 0,9 0t0 8886886 protocol: TCP
kamailio 71895 kamailio 16u sock 0,9 0t0 8854886 protocol: TCP
kamailio 71895 kamailio 17u sock 0,9 0t0 8828915 protocol: TCP
kamailio 71895 kamailio 18u unix 0x000000005f73cb91 0t0 1680314 type=DGRAM
kamailio 71895 kamailio 19u IPv4 1846523 0t0 TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
kamailio 71895 kamailio 20u sock 0,9 0t0 8887192 protocol: TCP
kamailio 71895 kamailio 21u sock 0,9 0t0 8813634 protocol: TCP
kamailio 71895 kamailio 22u unix 0x00000000c19bd102 0t0 1681407 type=STREAM
kamailio 71895 kamailio 23u sock 0,9 0t0 8850488 protocol: TCP
...
Not only the ESTABLISHED TCP session, but also these empty "protocol: TCP" sockets. What are they doing there in the UDP receiver? Is that how it's supposed to be?
Kristijan
On Thu, 14 Mar 2019 at 14:48, Daniel-Constantin Mierla miconda@gmail.com wrote:
> Can you get the file written by `kamctl trap`? It should have the backtrace
> for all kamailio processes. You need the latest kamailio 5.2.
>
> Also, get the output for: kamctl ps
>
> Cheers,
> Daniel
>
> On 14.03.19 13:52, Kristijan Vrban wrote:
>> When I attach via gdb to one of the TCP workers, I see this:
>>
>> (gdb) bt
>> #0 0x00007fdaf4d14470 in futex_wait (private=<optimized out>, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
>> #1 futex_wait_simple (private=<optimized out>, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135
>> #2 __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at pthread_rwlock_wrlock.c:67
>> #3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>> #4 0x00007fdaf08e1c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>> #5 0x00007fdaf08a6f69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>> #6 0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>> #7 0x00007fdaf0c31144 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>> #8 0x00007fdaf0c2bddb in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>> #9 0x00007fdaf0c22858 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>> #10 0x00007fdaf0c1af61 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>> #11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98, error=0x7ffffe2a2df0) at tls_server.c:422
>> #12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98, flags=0x7ffffe2c318c) at tls_server.c:1116
>> #13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98, read_flags=0x7ffffe2c318c) at core/tcp_read.c:469
>> #14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98, bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at core/tcp_read.c:1496
>> #15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1, idx=-1) at core/tcp_read.c:1862
>> #16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0 <io_w>, t=2, repeat=0) at core/io_wait.h:1065
>> #17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at core/tcp_read.c:1974
>> #18 0x0000556ead4c8e24 in tcp_init_children () at core/tcp_main.c:4853
>> #19 0x0000556ead3c352a in main_loop () at main.c:1735
>> #20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at main.c:2675
>>
>> On Thu, 14 Mar 2019 at 13:41, Kristijan Vrban vrban.lkml@gmail.com wrote:
>>> Hi, with full debug I see this in the log for every incoming TCP SIP request:
>>>
>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp receiver, connection passed to the least busy one (105)
>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker 2 27(17937) for activity on [tls:172.17.217.10:5061], 0x7fdaeda8f928
>>>
>>> So the Kamailio TCP main process is working and receives TCP traffic. But
>>> the TCP workers are somehow busy.
>>>
>>> When I attach via strace to the TCP worker, I do not see any activity. Just:
>>>
>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL
>>>
>>> and nothing more, even when I see the TCP main process choose this worker process.
>>>
>>> Kristijan
>>>
>>> On Wed, 27 Feb 2019 at 15:14, Kristijan Vrban vrban.lkml@gmail.com wrote:
>>>> First of all, thanks for the feedback. I have now prepared our system to run
>>>> with debug=3.
>>>> I hope to see more then.
-- Daniel-Constantin Mierla -- www.asipto.com www.twitter.com/miconda -- www.linkedin.com/in/miconda Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com
Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
> Have you done a test with tools such as sipp, or was this happening after a while, with usual phones registering?
The usual variety of devices registering via TLS. But I cannot exclude that some devices are misbehaving.
> Can you list the tcp connections and see if they are listed? kamctl tcp core.tcp_list
Is the kex module needed for that? Then I can deliver it next time. But when I run "lsof -u kamailio |grep TCP" I get a long list of more than 2000 lines with:
...
kamailio 37561 kamailio 2105u sock 0,9 0t0 27856287 protocol: TCP
kamailio 37561 kamailio 2106u sock 0,9 0t0 27856305 protocol: TCP
kamailio 37561 kamailio 2107u sock 0,9 0t0 27856306 protocol: TCP
kamailio 37561 kamailio 2108u sock 0,9 0t0 27856914 protocol: TCP
...
So over time Kamailio created a lot of sockets in the TCP domain which are not bound to any port (e.g. via connect(2), listen(2) or bind(2)), until we reach the maximum number of 2048 connections.
Best Kristijan
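Counting such endpoint-less sockets per process can be scripted. The following is a minimal sketch, assuming lsof output in the column layout quoted above (COMMAND, PID, USER, FD, TYPE, ..., NAME); the helper name `count_unbound_tcp` and the regular expression are illustrative and not part of Kamailio or lsof.

```python
import re
from collections import Counter

# Matches lsof lines such as:
#   kamailio 37561 kamailio 2105u sock 0,9 0t0 27856287 protocol: TCP
# i.e. TYPE is "sock" and the NAME column is only "protocol: TCP",
# so the socket has no local or remote endpoint attached.
UNBOUND_TCP = re.compile(
    r"^\S+\s+(\d+)\s+\S+\s+\S+\s+sock\s+.*protocol: TCP\s*$"
)

def count_unbound_tcp(lsof_text):
    """Return a Counter mapping PID -> number of endpoint-less TCP sockets."""
    counts = Counter()
    for line in lsof_text.splitlines():
        match = UNBOUND_TCP.match(line.strip())
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

The text passed in would be the output of a command such as "lsof -u kamailio"; comparing the per-PID counts over time shows which process accumulates the sockets.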
Attached also the output of kamctl trap
Am Di., 26. März 2019 um 08:42 Uhr schrieb Kristijan Vrban vrban.lkml@gmail.com:
Have you done a test with tools such as sipp, or was this happening after a while, with usual phones registering?
Usual variety of devices registering via TLS. But i can not exclude that some devices displaying behavioural problems.
Can you list the tcp connections and see if they are listed? kamctl tcp core.tcp_list
Need Kex module for that? So i can deliver next time. But when i do "lsof -u kamailio |grep TCP" i get a long list of more then 2000 lines with:
... kamailio 37561 kamailio 2105u sock 0,9 0t0 27856287 protocol: TCP kamailio 37561 kamailio 2106u sock 0,9 0t0 27856305 protocol: TCP kamailio 37561 kamailio 2107u sock 0,9 0t0 27856306 protocol: TCP kamailio 37561 kamailio 2108u sock 0,9 0t0 27856914 protocol: TCP ...
So about the time Kamailio created a lot of socket in the TCP domain, but which are not bound to any port (eg via connect(2) or listen(2) or bind(2)) Until we get to the maximum number of 2048 connections.
Best Kristijan
Am Mo., 25. März 2019 um 14:27 Uhr schrieb Daniel-Constantin Mierla miconda@gmail.com:
Have you done a test with tools such as sipp, or was this happening after a while, with usual phones registering?
Can you list the tcp connections and see if they are listed?
kamctl tcp core.tcp_list
Cheers, Daniel
On 25.03.19 08:03, Kristijan Vrban wrote:
The solution here is to use set_reply_no_connect()
implemented it. Now the issue has shifted to:
ERROR: <core> [core/tcp_main.c:3959]: handle_new_connect(): maximum number of connections exceeded: 2048/2048
But not a single TCP connection is active between Kamailio and any device. Seems this counter for maximum number of connections now has an issue?
Kristijan
Am Mi., 20. März 2019 um 15:07 Uhr schrieb Daniel-Constantin Mierla miconda@gmail.com:
Hello,
based on the trap output I think I could figure out what happened there.
You have tcp_children to very low value (1 or so), the problem is not actually that one, but the fact that the connection to upstream (the device/app sending the request) was closed after receiving the request and routing of the reply gets stuck in the way of:
- a reply is received and has to be forwarded
- connection was lost, so Kamailio tries to establish a new one, but
takes time till fails because the upstream is behind nat or so based on the via header:
Via: SIP/2.0/TLS 10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
- the reply is retransmitted and gets to another worker, which tries
to forward it again, but discovers a connection structure for that destination exists (created by previous reply worker) and now waits for the connection to be released (or better said, for the mutex on writing buffer to be unlocked)
- as the second reply waits, there can be other retransmissions of the
reply ending up in other workers stuck on waiting for the mutex of the connection write buffer
The solution here is to use set_reply_no_connect() -- you can put it first in request_route block. I think this would be a good addition to the default configuration file as well, IMO, the sip server should not connect for sending replies and should do it also for requests that go behind nat.
Cheers, Daniel
On 19.03.19 10:53, Kristijan Vrban wrote:
So i had again the situation. But this time, incoming udp was affected. Kamailio was sending out OPTIONS (via dispatcher module) to a group of asterisk machines but the 200 OK reply to the OPTIONS where not processed, so the dispatcher module set all asterisk to inactive, even though they replied 200 OK
Attached the output of kamctl trap during the situation. Hope there is any useful in it. Because after "kamctl trap" it was working again without kamailio restart.
Best Kristijan
Am Mo., 18. März 2019 um 12:27 Uhr schrieb Daniel-Constantin Mierla miconda@gmail.com:
Hello,
setting tcp_children=1 is not a god option for scallability, practically you set kamailio to process a single tcp message at one time, on high traffic, that won't work well.
Maybe try to set tcp_children to 2 or 4, that should make an eventual race appear faster.
Regarding the pid, if it is an outgoing connection, then it can be created by any worker process, including a UDP worker, if that was the one receiving the sip message over udp and sends it out via tcp.
Cheers, Daniel
On 18.03.19 10:09, Kristijan Vrban wrote: > Hi Daniel, > > for testing, i now had set: "tcp_children=1" and so far this issue did not occur > ever since. So now value to provide for "kamctl trap" yet. > > "kamctl ps" show this two process to handle tcp: > > ... > }, { > "IDX": 25, > "PID": 71929, > "DSC": "tcp receiver (generic) child=0" > }, { > "IDX": 26, > "PID": 71933, > "DSC": "tcp main process" > } > ... > > > Ok, but then is was wondering to see a TCP connection on a udp receiver child: > > > netstat -ntp |grep 5061 > > ... > tcp 0 0 172.17.217.10:5061 195.70.114.125:18252 > ESTABLISHED 71895/kamailio > ... > > An pid 71895 is: > > }, { > "IDX": 3, > "PID": 71895, > "DSC": "udp receiver child=2 sock=127.0.0.1:5060" > }, { > > > > And if i look into it via "lsof -p 71895" (the udp receiver child) > > ... > kamailio 71895 kamailio 14u sock 0,9 0t0 > 8856085 protocol: TCP > kamailio 71895 kamailio 15u sock 0,9 0t0 > 8886886 protocol: TCP > kamailio 71895 kamailio 16u sock 0,9 0t0 > 8854886 protocol: TCP > kamailio 71895 kamailio 17u sock 0,9 0t0 > 8828915 protocol: TCP > kamailio 71895 kamailio 18u unix 0x000000005f73cb91 0t0 > 1680314 type=DGRAM > kamailio 71895 kamailio 19u IPv4 1846523 0t0 > TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED) > kamailio 71895 kamailio 20u sock 0,9 0t0 > 8887192 protocol: TCP > kamailio 71895 kamailio 21u sock 0,9 0t0 > 8813634 protocol: TCP > kamailio 71895 kamailio 22u unix 0x00000000c19bd102 0t0 > 1681407 type=STREAM > kamailio 71895 kamailio 23u sock 0,9 0t0 > 8850488 protocol: TCP > ... > > Not only the ESTABLISHED TCP session. But also this empty sockets > "protocol: TCP" > What are they doing there in the udp receiver? Is that how it's supposed to be? > > Kristijan > > Am Do., 14. März 2019 um 14:48 Uhr schrieb Daniel-Constantin Mierla > miconda@gmail.com: >> Can you get file written by `kamctl trap`? It should have the backtrace >> for all kamailio processes. You need latest kamailio 5.2. 
>> >> Also, get the output for: kamctl ps >> >> Cheers, >> Daniel >> >> On 14.03.19 13:52, Kristijan Vrban wrote: >>> When i attach via gdb to one of the tcp worker, i see this: >>> >>> (gdb) bt >>> #0 0x00007fdaf4d14470 in futex_wait (private=<optimized out>, >>> expected=1, futex_word=0x7fdaeca92f8c) at >>> ../sysdeps/unix/sysv/linux/futex-internal.h:61 >>> #1 futex_wait_simple (private=<optimized out>, expected=1, >>> futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135 >>> #2 __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at >>> pthread_rwlock_wrlock.c:67 >>> #3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from >>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 >>> #4 0x00007fdaf08e1c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 >>> #5 0x00007fdaf08a6f69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 >>> #6 0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from >>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 >>> #7 0x00007fdaf0c31144 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1 >>> #8 0x00007fdaf0c2bddb in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1 >>> #9 0x00007fdaf0c22858 in ?? 
Daniel-Constantin Mierla -- www.asipto.com www.twitter.com/miconda -- www.linkedin.com/in/miconda Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com
Here is one more kamctl trap file, this time from a run where set_reply_no_connect() was set.
On Tue, Mar 26, 2019 at 08:53 Kristijan Vrban vrban.lkml@gmail.com wrote:
Also attached is the output of kamctl trap.
On Tue, Mar 26, 2019 at 08:42 Kristijan Vrban vrban.lkml@gmail.com wrote:
> Have you done a test with tools such as sipp, or was this happening after a while, with usual phones registering?
The usual variety of devices registering via TLS. But I cannot rule out that some devices are misbehaving.
> Can you list the tcp connections and see if they are listed? kamctl rpc core.tcp_list
Do I need the kex module for that? Then I can provide it next time. But when I run "lsof -u kamailio | grep TCP", I get a list of more than 2000 lines like:
...
kamailio 37561 kamailio 2105u sock 0,9 0t0 27856287 protocol: TCP
kamailio 37561 kamailio 2106u sock 0,9 0t0 27856305 protocol: TCP
kamailio 37561 kamailio 2107u sock 0,9 0t0 27856306 protocol: TCP
kamailio 37561 kamailio 2108u sock 0,9 0t0 27856914 protocol: TCP
...
So over time Kamailio has created a lot of sockets in the TCP domain that are not bound to any port (e.g. via connect(2), listen(2) or bind(2)), until we reach the maximum of 2048 connections.
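The count described above can be reproduced with a small script. This is only a sketch: the function name count_unbound_tcp_sockets is made up for illustration, and it assumes the default lsof column layout seen in the output above (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME).

```python
from collections import Counter

# Sample lines copied from the lsof output quoted in this thread.
SAMPLE = """\
kamailio 37561 kamailio 2105u sock 0,9 0t0 27856287 protocol: TCP
kamailio 37561 kamailio 2106u sock 0,9 0t0 27856305 protocol: TCP
kamailio 37561 kamailio 2107u sock 0,9 0t0 27856306 protocol: TCP
kamailio 37561 kamailio 2108u sock 0,9 0t0 27856914 protocol: TCP
"""


def count_unbound_tcp_sockets(lsof_output):
    """Count lsof entries of TYPE 'sock' with NAME 'protocol: TCP', per PID.

    Hypothetical helper: established or listening sockets show up with
    TYPE IPv4/IPv6 instead, so they are not counted here.
    """
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        if (
            len(fields) >= 10
            and fields[4] == "sock"
            and fields[-2:] == ["protocol:", "TCP"]
        ):
            counts[int(fields[1])] += 1
    return counts


if __name__ == "__main__":
    for pid, n in count_unbound_tcp_sockets(SAMPLE).items():
        print(f"pid {pid}: {n} unbound TCP sockets")
```

Feeding it the real output of "lsof -u kamailio" and comparing the per-PID totals against the 2048 limit from the error message should show which process is holding the sockets.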
Best Kristijan
On Mon, Mar 25, 2019 at 14:27 Daniel-Constantin Mierla miconda@gmail.com wrote:
Have you done a test with tools such as sipp, or was this happening after a while, with usual phones registering?
Can you list the tcp connections and see if they are listed?
kamctl rpc core.tcp_list
Cheers, Daniel
On 25.03.19 08:03, Kristijan Vrban wrote:
> The solution here is to use set_reply_no_connect()
I implemented it. Now the issue has shifted to:
ERROR: <core> [core/tcp_main.c:3959]: handle_new_connect(): maximum number of connections exceeded: 2048/2048
But not a single TCP connection is active between Kamailio and any device. It seems the counter for the maximum number of connections now has an issue?
Kristijan
On Wed, Mar 20, 2019 at 15:07 Daniel-Constantin Mierla miconda@gmail.com wrote:
Hello,
based on the trap output I think I could figure out what happened there.
You have tcp_children set to a very low value (1 or so). That is not the actual problem, though. The problem is that the connection to the upstream (the device/app sending the request) was closed after the request was received, and routing of the reply gets stuck as follows:
- a reply is received and has to be forwarded
- the connection was lost, so Kamailio tries to establish a new one, but it takes time until that fails, because the upstream is behind NAT or so, based on the Via header:
Via: SIP/2.0/TLS 10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
- the reply is retransmitted and gets to another worker, which tries to forward it again, but discovers that a connection structure for that destination already exists (created by the previous reply worker) and now waits for the connection to be released (or, better said, for the mutex on the write buffer to be unlocked)
- while the second reply waits, other retransmissions of the reply can end up in other workers, which then get stuck waiting for the mutex of the connection write buffer
The solution here is to use set_reply_no_connect(); you can put it first in the request_route block. I think this would be a good addition to the default configuration file as well. IMO, the SIP server should not connect for sending replies, and the same should apply to requests that go behind NAT.
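The pile-up described in the steps above can be illustrated with a toy model. This is a sketch under stated assumptions, not Kamailio's actual scheduling: it assumes the reply is retransmitted on the RFC 3261 pattern for 2xx replies to INVITE (interval starts at T1 = 0.5 s, doubles, capped at T2 = 4 s), that every retransmission lands on a free worker, and that the connect attempt takes a hypothetical 10 s to fail. The function names are made up for illustration.

```python
def retransmit_times(t1=0.5, t2=4.0, limit=32.0):
    """Times (seconds after the first reply) at which the reply is resent.

    Assumes the interval starts at t1, doubles each time, is capped at t2,
    and that retransmissions stop after `limit` seconds (64 * T1).
    """
    times, interval, now = [], t1, 0.0
    while now + interval <= limit:
        now += interval
        times.append(now)
        interval = min(interval * 2, t2)
    return times


def stuck_workers(num_workers, connect_timeout, t1=0.5, t2=4.0):
    """Toy model: how many workers are blocked when connect() finally fails.

    The first worker handles the initial reply and holds the write-buffer
    mutex while its connect attempt is pending. Each retransmission that
    arrives before the timeout occupies one more worker on that mutex.
    """
    early = [t for t in retransmit_times(t1, t2) if t < connect_timeout]
    return min(num_workers, 1 + len(early))


if __name__ == "__main__":
    for workers in (1, 4, 8):
        stuck = stuck_workers(workers, connect_timeout=10.0)
        print(f"{workers} workers -> {stuck} stuck")
```

Under these assumptions a single unreachable destination is enough to occupy every worker of a small pool for the whole connect timeout, which fits the symptom of no incoming traffic being processed.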
Cheers, Daniel
On 19.03.19 10:53, Kristijan Vrban wrote:
So I had the situation again. But this time, incoming UDP was affected. Kamailio was sending out OPTIONS (via the dispatcher module) to a group of Asterisk machines, but the 200 OK replies to the OPTIONS were not processed, so the dispatcher module set all Asterisk machines to inactive, even though they replied with 200 OK.
Attached is the output of kamctl trap during the situation. I hope there is something useful in it, because after "kamctl trap" it was working again without a Kamailio restart.
Best Kristijan
Just curious, did you get to compile with OpenSSL 1.0 and test?
> Just curious, did you get to compile with OpenSSL 1.0 and test?
I just compiled with OpenSSL 1.0. Going to test now.
() from /usr/lib/x86_64-linux-gnu/libssl.so.1.1 >>>>>> #10 0x00007fdaf0c1af61 in SSL_do_handshake () from >>>>>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1 >>>>>> #11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98, >>>>>> error=0x7ffffe2a2df0) at tls_server.c:422 >>>>>> #12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98, >>>>>> flags=0x7ffffe2c318c) at tls_server.c:1116 >>>>>> #13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98, >>>>>> read_flags=0x7ffffe2c318c) at core/tcp_read.c:469 >>>>>> #14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98, >>>>>> bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at >>>>>> core/tcp_read.c:1496 >>>>>> #15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1, >>>>>> idx=-1) at core/tcp_read.c:1862 >>>>>> #16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0 <io_w>, >>>>>> t=2, repeat=0) at core/io_wait.h:1065 >>>>>> #17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at >>>>>> core/tcp_read.c:1974 >>>>>> #18 0x0000556ead4c8e24 in tcp_init_children () at core/tcp_main.c:4853 >>>>>> #19 0x0000556ead3c352a in main_loop () at main.c:1735 >>>>>> #20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at main.c:2675 >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Am Do., 14. März 2019 um 13:41 Uhr schrieb Kristijan Vrban >>>>>> vrban.lkml@gmail.com: >>>>>>> Hi, with full debug is see this in log for every incoming TCP SIP request: >>>>>>> >>>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: >>>>>>> <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp >>>>>>> receiver, connection passed to the least busy one (105) >>>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: >>>>>>> <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker 2 >>>>>>> 27(17937) for activity on [tls:172.17.217.10:5061], 0x7fdaeda8f928 >>>>>>> >>>>>>> So the Kamailio TCP process is working, and received TCP traffic. 
But >>>>>>> the tcp workers are somehow busy. >>>>>>> >>>>>>> When i attach via strace to the TCP worker, i do not see any activity. Just: >>>>>>> >>>>>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL >>>>>>> >>>>>>> and nothing, even when i see the main tcp process choose this worker process. >>>>>>> >>>>>>> Kristijan >>>>>>> >>>>>>> Am Mi., 27. Feb. 2019 um 15:14 Uhr schrieb Kristijan Vrban >>>>>>> vrban.lkml@gmail.com: >>>>>>>> first of all thanks for the feedback. i prepared our system now to run >>>>>>>> with debug=3 >>>>>>>> I hope to see more then then. >>>>>>>> >>>>>>>> Am Mi., 27. Feb. 2019 um 11:53 Uhr schrieb Kristijan Vrban >>>>>>>> vrban.lkml@gmail.com: >>>>>>>>> Hi kamailios, >>>>>>>>> >>>>>>>>> i have a creepy situation with v5.2.1 stable Kamilio. After a day or >>>>>>>>> so, Kamailio stop to process incoming SIP traffic via TCP. The >>>>>>>>> incoming TCP network packages get TCP-ACK from the OS (Debian 9, >>>>>>>>> 4.18.0-15-generic-Linux) but Kamailio does not show any processing for >>>>>>>>> the SIP-Traffic incoming via TCP. No logs, nothing. While traffic via >>>>>>>>> UDP is working just totally fine. >>>>>>>>> >>>>>>>>> When i look via command "netstat -ntp" is see, that the Recv-Q get >>>>>>>>> bigger and bigger. e.g.: >>>>>>>>> >>>>>>>>> Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program >>>>>>>>> name tcp 4566 0 172.17.217.12:5060 xxx.xxx.xxx.xxx:57252 ESTABLISHED >>>>>>>>> 31347/kamailio >>>>>>>>> >>>>>>>>> After Kamailio restart, all is working fine again for a day. We have >>>>>>>>> maybe 10-20 devices online via TCP and low call volume (1-2 call per >>>>>>>>> minute). The only settings for tcp we have is "tcp_delayed_ack=no" >>>>>>>>> >>>>>>>>> How to could we debug this situation? Again, no error, no warings in >>>>>>>>> the log. Just nothing. 
>>>>>>>>> >>>>>>>>> Kristijan >>>>>> _______________________________________________ >>>>>> Kamailio (SER) - Users Mailing List >>>>>> sr-users@lists.kamailio.org >>>>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users >>>>> -- >>>>> Daniel-Constantin Mierla -- www.asipto.com >>>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda >>>>> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com >>>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com >>>>> >>>> _______________________________________________ >>>> Kamailio (SER) - Users Mailing List >>>> sr-users@lists.kamailio.org >>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users >>> -- >>> Daniel-Constantin Mierla -- www.asipto.com >>> www.twitter.com/miconda -- www.linkedin.com/in/miconda >>> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com >>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com >>> > -- > Daniel-Constantin Mierla -- www.asipto.com > www.twitter.com/miconda -- www.linkedin.com/in/miconda > Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com > Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com >
On 3/26/19 3:52 PM, Kristijan Vrban wrote:
>> Just curious, did you get to compile with OpenSSL 1.0 and test?
> Just compiled with OpenSSL 1.0. Going to test now.
Kristijan, any new occurrences since you recompiled Kamailio with OpenSSL 1.0?
Regards, Andrew
Hi Andrew,
yes, with OpenSSL 1.0.2 Kamailio has now been up and running for five days. Looks good so far.
Kristijan
Hello,
An update on this issue: I spent some time looking at the libssl/libcrypto libraries, and the problem may be the type of mutexes they use internally starting with v1.1, namely pthread mutexes. These are not process-shared, whereas Kamailio is a multi-process application that works with the same TLS connection from multiple processes.
Today I wrote to the OpenSSL mailing list and am now waiting to see if I get any hints from there.
Cheers, Daniel
Hi,
(X-posted to sr-dev as this is getting into the nitty gritty)
As a short-term workaround, I've been experimenting with a preloaded library that hijacks the pthread mutex calls and forces them to provide process-shared mutexes. As far as I can tell this works, and the only cost is the minuscule performance impact of using slower process-shared mutexes everywhere, even where they aren't required.
The code for the preloaded library itself is very short and simple: https://gist.github.com/rfuchs/1bb7348b6acbe37e557d94c2f69a1498
As a more complete patch that integrates it into the build system (probably badly): https://gist.github.com/rfuchs/b240ffe87938a45e6f2a4cf53fe29f17
Finally it requires adding it to the startup script, for example in a systemd service file as:
Environment='LD_PRELOAD=/usr/lib/x86_64-linux-gnu/kamailio/openssl_mutex_shared/openssl_mutex_shared.so'
(that's with a hard coded path which isn't optimal of course).
I don't consider this a proper fix, only a hacky workaround, but it might be a solution for the very near future. I'm throwing it out there in case other people have been working on similar approaches or have comments about this.
Cheers
Hello,
thanks for working on this and providing the workaround solution with preloaded symbols.
I would suggest that you push it to master, to make it easier to deploy and test. There is one change I would make: install openssl_mutex_shared via the Makefile of the tls module, not from the main Makefile. The ctl module does the same for kamcmd; in practice it is just a matter of setting a variable there. ctl has:
MOD_INSTALL_UTILS=../../../utils/kamcmd
Another one (I can look into it as well if it proves not to be straightforward): place this inside the tls module, e.g. src/modules/tls/utils/openssl_mutex_shared; the module's Makefile needs the path to the folder.
If we then get a couple of people testing and validating the use of process-shared mutexes, we can follow up with the OpenSSL developers about adding an option for it.
Cheers, Daniel
Hi Daniel,
I hope you are well. Do you have any updates on this issue? Did you get any response on the OpenSSL mailing list? Thank you!
With kind regards,
Jurijs
Hello,
for the deadlock issue with libssl 1.1, a workaround with a preloaded library was made available quite some time ago:
https://github.com/kamailio/kamailio/tree/master/src/modules/tls/utils/opens...
Recently that code was integrated into the core, so the preloaded library is not needed if you run 5.1.9 or the latest branch 5.2 (to be released as 5.2.5, probably soon), as well as branch 5.3 or master.
However, a few days ago a crash was reported inside the pseudo-random number generator (PRNG) of libssl 1.1, which seems to be caused by the changes in libssl 1.1 to take a thread-safety-only approach. A patch was pushed two days ago, which seemed to fix it; see: https://github.com/kamailio/kamailio/issues/2077
More work is expected there in the next few days to experiment with PRNG variants.
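To make the version thresholds in this message easier to apply, here is a minimal sketch of a check for whether a given Kamailio version still needs the preloaded library. The function name is hypothetical, and the cut-offs (5.1.9, 5.2.5, and anything from 5.3 on) are taken only from the message above; 5.2.5 was not yet released at the time of writing, so treat that boundary as an assumption.

```python
def needs_preload_workaround(version):
    """True if, per the thresholds quoted above, the core fix is not yet included."""
    parts = [int(p) for p in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # allow short forms such as "5.3"
    major, minor, patch = parts
    if (major, minor) == (5, 1):
        return patch < 9   # fix stated for 5.1.9
    if (major, minor) == (5, 2):
        return patch < 5   # fix announced for 5.2.5 (assumed boundary)
    # Older branches never got the fix; 5.3 and later include it.
    return (major, minor) < (5, 1)


for v in ("5.1.8", "5.2.1", "5.2.5", "5.3.1"):
    print(v, needs_preload_workaround(v))
```

This only encodes what the message states about the deadlock workaround; it says nothing about the separate PRNG crash mentioned in the same message.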
Cheers, Daniel
Daniel,
Got the same issue on 5.3.1 with OpenSSL 1.1 on Debian 9. After 3 working days of tests (about 30-50 WSS clients), we suddenly got a lot of connections stuck in CLOSE_WAIT state. Kamailio called sig_alarm_abort() when we tried to reboot.
Thanks, Andrey
Hi All,
I was debugging a TCP issue (most probably; I may start a separate thread for that question).
I was trying to get some info for TCP and TLS.
I typed: $> sudo kamctl rpc tls.list
And waited for a while... until I realized that my User-Agent, connected over TCP, was not able to register any more. I think the RPC command triggered something wrong.
The device can successfully connect and send the REGISTER over the established TCP connection, but the REGISTER does not appear in the logs any more, and I don't see any TCP traffic any more. So the behavior is the same as I had before: TCP and TLS are both not working, while UDP is still working fine.
kamctl does not work any more, so kamctl trap does not work either.
I was able to run, manually, for (all?) Kamailio processes:
gdb /usr/sbin/kamailio 16500 -batch --eval-command="bt full" >> kamailio-trap-tcp-down.txt
I'm temporarily putting the backtrace here: https://sip.antisip.com/kamailio-trap-tcp-down.txt
You can see one process stuck on the JSON command "tls_list", and many others waiting on CRYPTO_THREAD_write_lock.
This might be related to: https://github.com/openssl/openssl/issues/5376
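Since kamctl trap was unavailable here, the manual per-process gdb call above can be scripted. This is a minimal sketch that only builds the command lines, using the same gdb options as in the message; the function name is hypothetical and the PIDs in the usage lines are placeholders that would have to come from `ps` on the affected host.

```python
import shlex


def gdb_backtrace_commands(pids, binary="/usr/sbin/kamailio"):
    """Build one batch-mode gdb argv per PID; nothing is executed here."""
    return [
        ["gdb", binary, str(pid), "-batch", "--eval-command=bt full"]
        for pid in pids
    ]


# Placeholder PIDs for illustration only.
for argv in gdb_backtrace_commands([16493, 16500]):
    print(shlex.join(argv))
```

Each argv list can be passed to `subprocess.run` with stdout appended to a file, which reproduces the `>> kamailio-trap-tcp-down.txt` redirection used above. Attaching gdb pauses the process while the backtrace is taken, so this is only something to do on an instance that is already stuck.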
SIDE NOTE: right before I typed the last gdb command for the last process, Kamailio crashed. This was around 5 minutes after the deadlock started.
Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core> [core/tcp_main.c:2561]: tcpconn_do_send(): failed to send on 0x7ff8dfc2fdc8 (91.121.30.149:5061-> 62.210.97.21:49351): Broken pipe (32)
Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core> [core/tcp_read.c:1505]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0x7ff8dfc2fdc8 r: 0x7ff8dfc2fe48 (-1)
Mar 26 14:47:11 sip kamailio[16493]: WARNING: <core> [core/tcp_read.c:1848]: handle_io(): F_TCPCONN connection marked as bad: 0x7ff8dfa6a408 id 846 refcnt 3
Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core> [main.c:755]: handle_sigs(): child process 16374 exited by a signal 11
Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core> [main.c:758]: handle_sigs(): core was not generated
Mar 26 14:47:11 sip kamailio[16371]: INFO: <core> [main.c:781]: handle_sigs(): terminating due to SIGCHLD
Mar 26 14:47:11 sip kamailio[16493]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 26 14:47:11 sip kamailio[16500]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 26 14:47:11 sip kamailio[16479]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Unfortunately, even though I did my best to set up my service to generate a core on crash, I still get "core was not generated" (Debian Stretch).
Thanks for reading! Regards, Aymeric
On Tue, Mar 26, 2019 at 14:11, Kristijan Vrban vrban.lkml@gmail.com wrote:
And again one more kamctl trap file, from a run where set_reply_no_connect was set.
On Tue, Mar 26, 2019 at 08:53, Kristijan Vrban vrban.lkml@gmail.com wrote:
Attached also the output of kamctl trap.
On Tue, Mar 26, 2019 at 08:42, Kristijan Vrban vrban.lkml@gmail.com wrote:
> Have you done a test with tools such as sipp, or was this happening after a while, with usual phones registering?
The usual variety of devices registering via TLS. But I cannot exclude that some devices are misbehaving.
> Can you list the tcp connections and see if they are listed? kamctl tcp core.tcp_list
Do I need the kex module for that? Then I can deliver it next time. But when I run "lsof -u kamailio |grep TCP" I get a long list of more than 2000 lines like:
...
kamailio 37561 kamailio 2105u sock 0,9 0t0 27856287 protocol: TCP
kamailio 37561 kamailio 2106u sock 0,9 0t0 27856305 protocol: TCP
kamailio 37561 kamailio 2107u sock 0,9 0t0 27856306 protocol: TCP
kamailio 37561 kamailio 2108u sock 0,9 0t0 27856914 protocol: TCP
...
So over time Kamailio created a lot of sockets in the TCP domain that are not bound to any port (e.g. via connect(2), listen(2) or bind(2)), until we reach the maximum of 2048 connections.
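To track how these unbound sockets accumulate per process, the lsof listing can be tallied. This is a minimal sketch, assuming lsof rows in the layout shown above (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME); the function name is hypothetical.

```python
from collections import Counter


def count_unbound_tcp_sockets(lsof_output):
    """Count, per PID, lsof rows of TYPE 'sock' whose NAME is 'protocol: TCP'."""
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # A matching row has at least 10 whitespace-separated fields,
        # 'sock' in the TYPE column, and ends with 'protocol: TCP'.
        if (
            len(fields) >= 10
            and fields[4] == "sock"
            and fields[-2:] == ["protocol:", "TCP"]
        ):
            counts[int(fields[1])] += 1
    return dict(counts)


# Rows copied from the thread: four unbound sockets and one real connection.
SAMPLE = """\
kamailio 37561 kamailio 2105u sock 0,9 0t0 27856287 protocol: TCP
kamailio 37561 kamailio 2106u sock 0,9 0t0 27856305 protocol: TCP
kamailio 37561 kamailio 2107u sock 0,9 0t0 27856306 protocol: TCP
kamailio 37561 kamailio 2108u sock 0,9 0t0 27856914 protocol: TCP
kamailio 71895 kamailio 19u IPv4 1846523 0t0 TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
"""

print(count_unbound_tcp_sockets(SAMPLE))
```

Comparing the totals over time against the 2048 limit from the error message would show how quickly the leak approaches the point where new connections are refused.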
Best Kristijan
On Mon, Mar 25, 2019 at 14:27, Daniel-Constantin Mierla miconda@gmail.com wrote:
Have you done a test with tools such as sipp, or was this happening after a while, with usual phones registering?
Can you list the tcp connections and see if they are listed?
kamctl tcp core.tcp_list
Cheers, Daniel
On 25.03.19 08:03, Kristijan Vrban wrote:
> The solution here is to use set_reply_no_connect()
Implemented it. Now the issue has shifted to:
ERROR: <core> [core/tcp_main.c:3959]: handle_new_connect(): maximum number of connections exceeded: 2048/2048
But not a single TCP connection is active between Kamailio and any device. It seems the counter for the maximum number of connections now has an issue?
Kristijan
(Debian 9,
>>>>>>>> 4.18.0-15-generic-Linux) but Kamailio does not show any
processing for
>>>>>>>> the SIP-Traffic incoming via TCP. No logs, nothing. While
traffic via
>>>>>>>> UDP is working just totally fine. >>>>>>>> >>>>>>>> When i look via command "netstat -ntp" is see, that the
Recv-Q get
>>>>>>>> bigger and bigger. e.g.: >>>>>>>> >>>>>>>> Proto Recv-Q Send-Q Local Address Foreign Address State
PID/Program
>>>>>>>> name tcp 4566 0 172.17.217.12:5060 xxx.xxx.xxx.xxx:57252
ESTABLISHED
>>>>>>>> 31347/kamailio >>>>>>>> >>>>>>>> After Kamailio restart, all is working fine again for a
day. We have
>>>>>>>> maybe 10-20 devices online via TCP and low call volume
(1-2 call per
>>>>>>>> minute). The only settings for tcp we have is
"tcp_delayed_ack=no"
>>>>>>>> >>>>>>>> How to could we debug this situation? Again, no error, no
warings in
>>>>>>>> the log. Just nothing. >>>>>>>> >>>>>>>> Kristijan >>>>> _______________________________________________ >>>>> Kamailio (SER) - Users Mailing List >>>>> sr-users@lists.kamailio.org >>>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users >>>> -- >>>> Daniel-Constantin Mierla -- www.asipto.com >>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda >>>> Kamailio World Conference - May 6-8, 2019 --
www.kamailioworld.com
>>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington,
DC, USA -- www.asipto.com
>>>> >>> _______________________________________________ >>> Kamailio (SER) - Users Mailing List >>> sr-users@lists.kamailio.org >>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users >> -- >> Daniel-Constantin Mierla -- www.asipto.com >> www.twitter.com/miconda -- www.linkedin.com/in/miconda >> Kamailio World Conference - May 6-8, 2019 --
www.kamailioworld.com
>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington,
DC, USA -- www.asipto.com
>>
Hello,
yep, locking there is expected: listing the TLS connections has to wait until no other process is changing the internal TLS connection structures. So it is a side effect of libssl/libcrypto getting stuck, with the other processes waiting for it to move on. I am giving the Kamailio training in the USA these days, so the trip and the daily schedule have not allowed me to look further at the libssl/libcrypto code to find a solution. It is high on my priority list, and I will look at it as soon as I get time in the next days.
Cheers, Daniel
On 26.03.19 15:55, Aymeric Moizard wrote:
Hi All,
I was debugging a TCP issue (I will most probably start a separate thread for that question).
I was trying to get some info for TCP and TLS.
I typed: $> sudo kamctl rpc tls.list
And waited for a while... until I realized that my User-Agent, connected over TCP, was no longer able to register. I think the RPC command triggered something wrong.
The device can successfully "connect" and send the REGISTER over the established TCP connection. The REGISTER does not appear in the logs any more, and I don't see any TCP traffic any more. So the behavior is the same as I had before: TCP and TLS are both not working, and UDP is still working fine.
kamctl does not work any more, so kamctl trap does not work either.
I was able to run this manually for (all?) kamailio threads:
gdb /usr/sbin/kamailio 16500 -batch --eval-command="bt full" >> kamailio-trap-tcp-down.txt
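When kamctl trap is unavailable, the manual gdb step above can be repeated for every Kamailio PID. What follows is a minimal sketch, not part of the original mail: the helper name dump_backtraces and the GDB override variable are hypothetical, and only the gdb invocation itself is taken from the command quoted above.

```shell
# Hypothetical helper (not from the thread): append a "bt full" backtrace
# of each given PID to one output file, using the same gdb invocation as
# the manual command above.
# GDB may be overridden (e.g. GDB=echo) to preview the commands instead.
dump_backtraces() {
    bin=$1   # path to the kamailio binary, e.g. /usr/sbin/kamailio
    out=$2   # file the backtraces are appended to
    shift 2
    for pid in "$@"; do
        echo "---- PID $pid ----" >>"$out"
        "${GDB:-gdb}" "$bin" "$pid" -batch --eval-command="bt full" >>"$out" 2>&1
    done
}
```

A possible call, assuming pgrep is installed and the processes are named kamailio, would be: dump_backtraces /usr/sbin/kamailio kamailio-trap-tcp-down.txt $(pgrep -x kamailio)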
I'm temporarily putting the backtrace I have here: https://sip.antisip.com/kamailio-trap-tcp-down.txt
You can see a thread stuck on the JSON command "tls_list", and many others waiting on CRYPTO_THREAD_write_lock. It might be related to: https://github.com/openssl/openssl/issues/5376
SIDE NOTE: Right before I typed the gdb command for the last thread, kamailio crashed. This was around 5 minutes after the deadlock started.
Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core> [core/tcp_main.c:2561]: tcpconn_do_send(): failed to send on 0x7ff8dfc2fdc8 (91.121.30.149:5061->62.210.97.21:49351): Broken pipe (32)
Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core> [core/tcp_read.c:1505]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0x7ff8dfc2fdc8 r: 0x7ff8dfc2fe48 (-1)
Mar 26 14:47:11 sip kamailio[16493]: WARNING: <core> [core/tcp_read.c:1848]: handle_io(): F_TCPCONN connection marked as bad: 0x7ff8dfa6a408 id 846 refcnt 3
Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core> [main.c:755]: handle_sigs(): child process 16374 exited by a signal 11
Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core> [main.c:758]: handle_sigs(): core was not generated
Mar 26 14:47:11 sip kamailio[16371]: INFO: <core> [main.c:781]: handle_sigs(): terminating due to SIGCHLD
Mar 26 14:47:11 sip kamailio[16493]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 26 14:47:11 sip kamailio[16500]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 26 14:47:11 sip kamailio[16479]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Unfortunately, even though I did my best to set up my service to generate a core on crash, I still get "core was not generated" (Debian Stretch).
Thanks for reading!
Regards
Aymeric
On Tue, 26 Mar 2019 at 14:11, Kristijan Vrban <vrban.lkml@gmail.com> wrote:
And again one more kamctl trap file, taken while set_reply_no_connect was set.
On Tue, 26 Mar 2019 at 08:53, Kristijan Vrban <vrban.lkml@gmail.com> wrote:
> Attached also the output of kamctl trap
>
> On Tue, 26 Mar 2019 at 08:42, Kristijan Vrban <vrban.lkml@gmail.com> wrote:
>>> Have you done a test with tools such as sipp, or was this happening
>>> after a while, with usual phones registering?
>>
>> The usual variety of devices registering via TLS. But I cannot exclude
>> that some devices are misbehaving.
>>
>>> Can you list the tcp connections and see if they are listed?
>>> kamctl tcp core.tcp_list
>>
>> Is the kex module needed for that? Then I can deliver it next time. But when I run
>> "lsof -u kamailio |grep TCP"
>> I get a long list of more than 2000 lines like:
>>
>> ...
>> kamailio 37561 kamailio 2105u sock 0,9 0t0 27856287 protocol: TCP
>> kamailio 37561 kamailio 2106u sock 0,9 0t0 27856305 protocol: TCP
>> kamailio 37561 kamailio 2107u sock 0,9 0t0 27856306 protocol: TCP
>> kamailio 37561 kamailio 2108u sock 0,9 0t0 27856914 protocol: TCP
>> ...
>>
>> So over time Kamailio created a lot of sockets in the TCP domain
>> which are not bound to any port (e.g. via connect(2), listen(2) or
>> bind(2)), until we reach the maximum number of 2048 connections.
>>
>> Best
>> Kristijan
>>
>> On Mon, 25 Mar 2019 at 14:27, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
>>> Have you done a test with tools such as sipp, or was this happening
>>> after a while, with usual phones registering?
>>>
>>> Can you list the tcp connections and see if they are listed?
>>>
>>> kamctl tcp core.tcp_list
>>>
>>> Cheers,
>>> Daniel
>>>
>>> On 25.03.19 08:03, Kristijan Vrban wrote:
>>>>> The solution here is to use set_reply_no_connect()
>>>> I implemented it. Now the issue has shifted to:
>>>>
>>>> ERROR: <core> [core/tcp_main.c:3959]: handle_new_connect(): maximum
>>>> number of connections exceeded: 2048/2048
>>>>
>>>> But not a single TCP connection is active between Kamailio and any
>>>> device. It seems the counter for the maximum number of connections
>>>> now has an issue?
>>>>
>>>> Kristijan
>>>>
>>>> On Wed, 20 Mar 2019 at 15:07, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
>>>>> Hello,
>>>>>
>>>>> based on the trap output I think I could figure out what happened there.
>>>>>
>>>>> You have tcp_children set to a very low value (1 or so). The problem is
>>>>> not actually that one, but the fact that the connection to upstream (the
>>>>> device/app sending the request) was closed after receiving the request,
>>>>> and routing of the reply gets stuck in the following way:
>>>>>
>>>>> - a reply is received and has to be forwarded
>>>>> - the connection was lost, so Kamailio tries to establish a new one, but
>>>>> it takes time until that fails, because the upstream is behind NAT or
>>>>> so, based on the Via header:
>>>>>
>>>>> Via: SIP/2.0/TLS
>>>>> 10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
>>>>>
>>>>> - the reply is retransmitted and gets to another worker, which tries
>>>>> to forward it again, but discovers that a connection structure for that
>>>>> destination exists (created by the previous reply worker) and now waits
>>>>> for the connection to be released (or better said, for the mutex on the
>>>>> write buffer to be unlocked)
>>>>>
>>>>> - as the second reply waits, there can be other retransmissions of the
>>>>> reply ending up in other workers, stuck waiting for the mutex of the
>>>>> connection write buffer
>>>>>
>>>>> The solution here is to use set_reply_no_connect() -- you can put it
>>>>> first in the request_route block. I think this would be a good addition
>>>>> to the default configuration file as well. IMO, the SIP server should
>>>>> not connect for sending replies, and should do the same for requests
>>>>> that go behind NAT.
>>>>>
>>>>> Cheers,
>>>>> Daniel
>>>>>
>>>>> On 19.03.19 10:53, Kristijan Vrban wrote:
>>>>>> So I had the situation again. But this time, incoming UDP was
>>>>>> affected. Kamailio was sending out OPTIONS (via the dispatcher module)
>>>>>> to a group of Asterisk machines, but the 200 OK replies to the OPTIONS
>>>>>> were not processed, so the dispatcher module set all Asterisk machines
>>>>>> to inactive, even though they replied 200 OK.
>>>>>>
>>>>>> Attached is the output of kamctl trap during the situation. I hope
>>>>>> there is something useful in it, because after "kamctl trap" it was
>>>>>> working again without a Kamailio restart.
>>>>>>
>>>>>> Best
>>>>>> Kristijan
>>>>>>
>>>>>> On Mon, 18 Mar 2019 at 12:27, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> setting tcp_children=1 is not a good option for scalability;
>>>>>>> practically you set Kamailio to process a single TCP message at a
>>>>>>> time, and on high traffic that won't work well.
>>>>>>>
>>>>>>> Maybe try to set tcp_children to 2 or 4, that should make an eventual
>>>>>>> race appear faster.
>>>>>>>
>>>>>>> Regarding the PID: if it is an outgoing connection, then it can be
>>>>>>> created by any worker process, including a UDP worker, if that was
>>>>>>> the one receiving the SIP message over UDP and sending it out via TCP.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Daniel
>>>>>>>
>>>>>>> On 18.03.19 10:09, Kristijan Vrban wrote:
>>>>>>>> Hi Daniel,
>>>>>>>>
>>>>>>>> for testing, I have now set "tcp_children=1", and so far this issue
>>>>>>>> has not occurred since. So no value to provide from "kamctl trap" yet.
>>>>>>>>
>>>>>>>> "kamctl ps" shows these two processes handling TCP:
>>>>>>>>
>>>>>>>> ...
>>>>>>>> }, {
>>>>>>>> "IDX": 25,
>>>>>>>> "PID": 71929,
>>>>>>>> "DSC": "tcp receiver (generic) child=0"
>>>>>>>> }, {
>>>>>>>> "IDX": 26,
>>>>>>>> "PID": 71933,
>>>>>>>> "DSC": "tcp main process"
>>>>>>>> }
>>>>>>>> ...
>>>>>>>>
>>>>>>>> OK, but then I was surprised to see a TCP connection on a UDP receiver child:
>>>>>>>>
>>>>>>>> netstat -ntp |grep 5061
>>>>>>>>
>>>>>>>> ...
>>>>>>>> tcp 0 0 172.17.217.10:5061 195.70.114.125:18252 ESTABLISHED 71895/kamailio
>>>>>>>> ...
>>>>>>>>
>>>>>>>> And PID 71895 is:
>>>>>>>>
>>>>>>>> }, {
>>>>>>>> "IDX": 3,
>>>>>>>> "PID": 71895,
>>>>>>>> "DSC": "udp receiver child=2 sock=127.0.0.1:5060"
>>>>>>>> }, {
>>>>>>>>
>>>>>>>> And if I look into it via "lsof -p 71895" (the UDP receiver child):
>>>>>>>>
>>>>>>>> ...
>>>>>>>> kamailio 71895 kamailio 14u sock 0,9 0t0 8856085 protocol: TCP
>>>>>>>> kamailio 71895 kamailio 15u sock 0,9 0t0 8886886 protocol: TCP
>>>>>>>> kamailio 71895 kamailio 16u sock 0,9 0t0 8854886 protocol: TCP
>>>>>>>> kamailio 71895 kamailio 17u sock 0,9 0t0 8828915 protocol: TCP
>>>>>>>> kamailio 71895 kamailio 18u unix 0x000000005f73cb91 0t0 1680314 type=DGRAM
>>>>>>>> kamailio 71895 kamailio 19u IPv4 1846523 0t0 TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
>>>>>>>> kamailio 71895 kamailio 20u sock 0,9 0t0 8887192 protocol: TCP
>>>>>>>> kamailio 71895 kamailio 21u sock 0,9 0t0 8813634 protocol: TCP
>>>>>>>> kamailio 71895 kamailio 22u unix 0x00000000c19bd102 0t0 1681407 type=STREAM
>>>>>>>> kamailio 71895 kamailio 23u sock 0,9 0t0 8850488 protocol: TCP
>>>>>>>> ...
>>>>>>>>
>>>>>>>> Not only the ESTABLISHED TCP session, but also these empty sockets
>>>>>>>> marked "protocol: TCP". What are they doing there in the UDP
>>>>>>>> receiver? Is that how it's supposed to be?
>>>>>>>>
>>>>>>>> Kristijan
>>>>>>>>
>>>>>>>> On Thu, 14 Mar 2019 at 14:48, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
>>>>>>>>> Can you get the file written by `kamctl trap`? It should have the
>>>>>>>>> backtrace for all kamailio processes. You need latest kamailio 5.2.
>>>>>>>>>
>>>>>>>>> Also, get the output for: kamctl ps
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Daniel
>>>>>>>>>
>>>>>>>>> On 14.03.19 13:52, Kristijan Vrban wrote:
>>>>>>>>>> When I attach via gdb to one of the TCP workers, I see this:
>>>>>>>>>>
>>>>>>>>>> (gdb) bt
>>>>>>>>>> #0 0x00007fdaf4d14470 in futex_wait (private=<optimized out>, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
>>>>>>>>>> #1 futex_wait_simple (private=<optimized out>, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135
>>>>>>>>>> #2 __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at pthread_rwlock_wrlock.c:67
>>>>>>>>>> #3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>>>>>>>> #4 0x00007fdaf08e1c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>>>>>>>> #5 0x00007fdaf08a6f69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>>>>>>>> #6 0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>>>>>>>> #7 0x00007fdaf0c31144 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>>>>>>>> #8 0x00007fdaf0c2bddb in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>>>>>>>> #9 0x00007fdaf0c22858 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>>>>>>>> #10 0x00007fdaf0c1af61 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>>>>>>>> #11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98, error=0x7ffffe2a2df0) at tls_server.c:422
>>>>>>>>>> #12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98, flags=0x7ffffe2c318c) at tls_server.c:1116
>>>>>>>>>> #13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98, read_flags=0x7ffffe2c318c) at core/tcp_read.c:469
>>>>>>>>>> #14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98, bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at core/tcp_read.c:1496
>>>>>>>>>> #15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1, idx=-1) at core/tcp_read.c:1862
>>>>>>>>>> #16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0 <io_w>, t=2, repeat=0) at core/io_wait.h:1065
>>>>>>>>>> #17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at core/tcp_read.c:1974
>>>>>>>>>> #18 0x0000556ead4c8e24 in tcp_init_children () at core/tcp_main.c:4853
>>>>>>>>>> #19 0x0000556ead3c352a in main_loop () at main.c:1735
>>>>>>>>>> #20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at main.c:2675
>>>>>>>>>>
>>>>>>>>>> On Thu, 14 Mar 2019 at 13:41, Kristijan Vrban <vrban.lkml@gmail.com> wrote:
>>>>>>>>>>> Hi, with full debug I see this in the log for every incoming TCP SIP request:
>>>>>>>>>>>
>>>>>>>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp receiver, connection passed to the least busy one (105)
>>>>>>>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker 2 27(17937) for activity on [tls:172.17.217.10:5061], 0x7fdaeda8f928
>>>>>>>>>>>
>>>>>>>>>>> So the Kamailio TCP main process is working and receives TCP
>>>>>>>>>>> traffic. But the TCP workers are somehow busy.
>>>>>>>>>>>
>>>>>>>>>>> When I attach via strace to the TCP worker, I do not see any activity. Just:
>>>>>>>>>>>
>>>>>>>>>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL
>>>>>>>>>>>
>>>>>>>>>>> and nothing more, even when I see the main TCP process choose this worker process.
>>>>>>>>>>>
>>>>>>>>>>> Kristijan
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 27 Feb 2019 at 15:14, Kristijan Vrban <vrban.lkml@gmail.com> wrote:
>>>>>>>>>>>> First of all, thanks for the feedback. I have now prepared our
>>>>>>>>>>>> system to run with debug=3. I hope to see more then.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, 27 Feb 2019 at 11:53, Kristijan Vrban <vrban.lkml@gmail.com> wrote:
>>>>>>>>>>>>> Hi kamailios,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have a creepy situation with v5.2.1 stable Kamailio. After a
>>>>>>>>>>>>> day or so, Kamailio stops processing incoming SIP traffic via
>>>>>>>>>>>>> TCP. The incoming TCP packets get a TCP ACK from the OS (Debian
>>>>>>>>>>>>> 9, 4.18.0-15-generic-Linux), but Kamailio does not show any
>>>>>>>>>>>>> processing for the SIP traffic incoming via TCP. No logs,
>>>>>>>>>>>>> nothing. Meanwhile, traffic via UDP is working totally fine.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When I look via the command "netstat -ntp" I see that the
>>>>>>>>>>>>> Recv-Q gets bigger and bigger, e.g.:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
>>>>>>>>>>>>> tcp 4566 0 172.17.217.12:5060 xxx.xxx.xxx.xxx:57252 ESTABLISHED 31347/kamailio
>>>>>>>>>>>>>
>>>>>>>>>>>>> After a Kamailio restart, all is working fine again for a day.
>>>>>>>>>>>>> We have maybe 10-20 devices online via TCP and low call volume
>>>>>>>>>>>>> (1-2 calls per minute). The only TCP setting we have is
>>>>>>>>>>>>> "tcp_delayed_ack=no"
>>>>>>>>>>>>>
>>>>>>>>>>>>> How could we debug this situation? Again, no errors, no
>>>>>>>>>>>>> warnings in the log. Just nothing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kristijan
-- Antisip - http://www.antisip.com
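The socket build-up described in the quoted thread (many "sock ... protocol: TCP" entries in lsof until the 2048-connection limit is hit) can be summarized per process. This is only a sketch under the assumption that lsof prints the same column layout as in the samples quoted above, with the PID in the second column; the helper name count_unbound_tcp is hypothetical.

```shell
# Hypothetical helper: read lsof output on stdin and print "PID COUNT" for
# every process holding sockets that lsof reports only as "protocol: TCP"
# (no local/remote address), sorted with the largest count first.
count_unbound_tcp() {
    awk '/protocol: TCP/ { n[$2]++ }              # $2 is the PID column
         END { for (p in n) print p, n[p] }' |
        sort -k2,2nr
}
```

It could be fed with the command already used in the thread, for example: lsof -u kamailio | count_unbound_tcp. A count that keeps growing towards the configured connection limit would match the symptom reported above.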
Hello Aymeric,
would you be able to test with tls module compiled against libssl 1.1 and using the pre-loaded shared object workaround?
* https://github.com/kamailio/kamailio/tree/master/src/modules/tls/utils/opens...
You should be able to use it with any version, no need to test with kamailio master branch.
Just clone the master branch, then:
cd src/modules/tls/utils/openssl_mutex_shared
make
Then pre-load openssl_mutex_shared.so, either directly from that folder or from any location you copy it to, before starting your version of Kamailio.
The README.md in the folder has some more details.
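For illustration, pre-loading a shared object on Linux is normally done through the LD_PRELOAD environment variable of the dynamic linker. The wrapper below is a sketch under that assumption, not text from the README: the function name run_with_shim and every path in the usage note are hypothetical, and the README.md remains the authoritative reference.

```shell
# Hypothetical wrapper: run a command with the given shared object added
# to LD_PRELOAD, after checking that the file is readable.
run_with_shim() {
    shim=$1   # path to the built openssl_mutex_shared.so (assumed location)
    shift
    if [ ! -r "$shim" ]; then
        echo "shim not readable: $shim" >&2
        return 1
    fi
    # Prepend the shim and keep any LD_PRELOAD value that is already set.
    LD_PRELOAD="$shim${LD_PRELOAD:+:$LD_PRELOAD}" "$@"
}
```

An example call, with assumed install paths, might be: run_with_shim /usr/local/lib/openssl_mutex_shared.so /usr/sbin/kamailio -f /etc/kamailio/kamailio.cfg. When Kamailio is started by an init system instead, the variable would have to be set in that service's environment.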
I would like to have some validation that it works fine before raising this topic with the libssl project, to ask for a way to initialize the locks with the process-shared option.
Thanks, Daniel
> > > >>>>> kamailio 71895 kamailio 14u sock 0,9 0t0 > > > >>>>> 8856085 protocol: TCP > > > >>>>> kamailio 71895 kamailio 15u sock 0,9 0t0 > > > >>>>> 8886886 protocol: TCP > > > >>>>> kamailio 71895 kamailio 16u sock 0,9 0t0 > > > >>>>> 8854886 protocol: TCP > > > >>>>> kamailio 71895 kamailio 17u sock 0,9 0t0 > > > >>>>> 8828915 protocol: TCP > > > >>>>> kamailio 71895 kamailio 18u unix 0x000000005f73cb91 0t0 > > > >>>>> 1680314 type=DGRAM > > > >>>>> kamailio 71895 kamailio 19u IPv4 1846523 0t0 > > > >>>>> TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED) > > > >>>>> kamailio 71895 kamailio 20u sock 0,9 0t0 > > > >>>>> 8887192 protocol: TCP > > > >>>>> kamailio 71895 kamailio 21u sock 0,9 0t0 > > > >>>>> 8813634 protocol: TCP > > > >>>>> kamailio 71895 kamailio 22u unix 0x00000000c19bd102 0t0 > > > >>>>> 1681407 type=STREAM > > > >>>>> kamailio 71895 kamailio 23u sock 0,9 0t0 > > > >>>>> 8850488 protocol: TCP > > > >>>>> ... > > > >>>>> > > > >>>>> Not only the ESTABLISHED TCP session. But also this empty sockets > > > >>>>> "protocol: TCP" > > > >>>>> What are they doing there in the udp receiver? Is that how it's supposed to be? > > > >>>>> > > > >>>>> Kristijan > > > >>>>> > > > >>>>> Am Do., 14. März 2019 um 14:48 Uhr schrieb Daniel-Constantin Mierla > > > >>>>> <miconda@gmail.com <mailto:miconda@gmail.com>>: > > > >>>>>> Can you get file written by `kamctl trap`? It should have the backtrace > > > >>>>>> for all kamailio processes. You need latest kamailio 5.2. 
> > > >>>>>> > > > >>>>>> Also, get the output for: kamctl ps > > > >>>>>> > > > >>>>>> Cheers, > > > >>>>>> Daniel > > > >>>>>> > > > >>>>>> On 14.03.19 13:52, Kristijan Vrban wrote: > > > >>>>>>> When i attach via gdb to one of the tcp worker, i see this: > > > >>>>>>> > > > >>>>>>> (gdb) bt > > > >>>>>>> #0 0x00007fdaf4d14470 in futex_wait (private=<optimized out>, > > > >>>>>>> expected=1, futex_word=0x7fdaeca92f8c) at > > > >>>>>>> ../sysdeps/unix/sysv/linux/futex-internal.h:61 > > > >>>>>>> #1 futex_wait_simple (private=<optimized out>, expected=1, > > > >>>>>>> futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135 > > > >>>>>>> #2 __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at > > > >>>>>>> pthread_rwlock_wrlock.c:67 > > > >>>>>>> #3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from > > > >>>>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 > > > >>>>>>> #4 0x00007fdaf08e1c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 > > > >>>>>>> #5 0x00007fdaf08a6f69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 > > > >>>>>>> #6 0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from > > > >>>>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 > > > >>>>>>> #7 0x00007fdaf0c31144 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1 > > > >>>>>>> #8 0x00007fdaf0c2bddb in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1 > > > >>>>>>> #9 0x00007fdaf0c22858 in ?? 
() from /usr/lib/x86_64-linux-gnu/libssl.so.1.1 > > > >>>>>>> #10 0x00007fdaf0c1af61 in SSL_do_handshake () from > > > >>>>>>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1 > > > >>>>>>> #11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98, > > > >>>>>>> error=0x7ffffe2a2df0) at tls_server.c:422 > > > >>>>>>> #12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98, > > > >>>>>>> flags=0x7ffffe2c318c) at tls_server.c:1116 > > > >>>>>>> #13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98, > > > >>>>>>> read_flags=0x7ffffe2c318c) at core/tcp_read.c:469 > > > >>>>>>> #14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98, > > > >>>>>>> bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at > > > >>>>>>> core/tcp_read.c:1496 > > > >>>>>>> #15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1, > > > >>>>>>> idx=-1) at core/tcp_read.c:1862 > > > >>>>>>> #16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0 <io_w>, > > > >>>>>>> t=2, repeat=0) at core/io_wait.h:1065 > > > >>>>>>> #17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at > > > >>>>>>> core/tcp_read.c:1974 > > > >>>>>>> #18 0x0000556ead4c8e24 in tcp_init_children () at core/tcp_main.c:4853 > > > >>>>>>> #19 0x0000556ead3c352a in main_loop () at main.c:1735 > > > >>>>>>> #20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at main.c:2675 > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> Am Do., 14. 
März 2019 um 13:41 Uhr schrieb Kristijan Vrban > > > >>>>>>> <vrban.lkml@gmail.com <mailto:vrban.lkml@gmail.com>>: > > > >>>>>>>> Hi, with full debug is see this in log for every incoming TCP SIP request: > > > >>>>>>>> > > > >>>>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: > > > >>>>>>>> <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp > > > >>>>>>>> receiver, connection passed to the least busy one (105) > > > >>>>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: > > > >>>>>>>> <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker 2 > > > >>>>>>>> 27(17937) for activity on [tls:172.17.217.10:5061 <http://172.17.217.10:5061>], 0x7fdaeda8f928 > > > >>>>>>>> > > > >>>>>>>> So the Kamailio TCP process is working, and received TCP traffic. But > > > >>>>>>>> the tcp workers are somehow busy. > > > >>>>>>>> > > > >>>>>>>> When i attach via strace to the TCP worker, i do not see any activity. Just: > > > >>>>>>>> > > > >>>>>>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL > > > >>>>>>>> > > > >>>>>>>> and nothing, even when i see the main tcp process choose this worker process. > > > >>>>>>>> > > > >>>>>>>> Kristijan > > > >>>>>>>> > > > >>>>>>>> Am Mi., 27. Feb. 2019 um 15:14 Uhr schrieb Kristijan Vrban > > > >>>>>>>> <vrban.lkml@gmail.com <mailto:vrban.lkml@gmail.com>>: > > > >>>>>>>>> first of all thanks for the feedback. i prepared our system now to run > > > >>>>>>>>> with debug=3 > > > >>>>>>>>> I hope to see more then then. > > > >>>>>>>>> > > > >>>>>>>>> Am Mi., 27. Feb. 2019 um 11:53 Uhr schrieb Kristijan Vrban > > > >>>>>>>>> <vrban.lkml@gmail.com <mailto:vrban.lkml@gmail.com>>: > > > >>>>>>>>>> Hi kamailios, > > > >>>>>>>>>> > > > >>>>>>>>>> i have a creepy situation with v5.2.1 stable Kamilio. After a day or > > > >>>>>>>>>> so, Kamailio stop to process incoming SIP traffic via TCP. 
The > > > >>>>>>>>>> incoming TCP network packages get TCP-ACK from the OS (Debian 9, > > > >>>>>>>>>> 4.18.0-15-generic-Linux) but Kamailio does not show any processing for > > > >>>>>>>>>> the SIP-Traffic incoming via TCP. No logs, nothing. While traffic via > > > >>>>>>>>>> UDP is working just totally fine. > > > >>>>>>>>>> > > > >>>>>>>>>> When i look via command "netstat -ntp" is see, that the Recv-Q get > > > >>>>>>>>>> bigger and bigger. e.g.: > > > >>>>>>>>>> > > > >>>>>>>>>> Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program > > > >>>>>>>>>> name tcp 4566 0 172.17.217.12:5060 <http://172.17.217.12:5060> xxx.xxx.xxx.xxx:57252 ESTABLISHED > > > >>>>>>>>>> 31347/kamailio > > > >>>>>>>>>> > > > >>>>>>>>>> After Kamailio restart, all is working fine again for a day. We have > > > >>>>>>>>>> maybe 10-20 devices online via TCP and low call volume (1-2 call per > > > >>>>>>>>>> minute). The only settings for tcp we have is "tcp_delayed_ack=no" > > > >>>>>>>>>> > > > >>>>>>>>>> How to could we debug this situation? Again, no error, no warings in > > > >>>>>>>>>> the log. Just nothing. 
> > > >>>>>>>>>> > > > >>>>>>>>>> Kristijan > > > >>>>>>> _______________________________________________ > > > >>>>>>> Kamailio (SER) - Users Mailing List > > > >>>>>>> sr-users@lists.kamailio.org <mailto:sr-users@lists.kamailio.org> > > > >>>>>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users > > > >>>>>> -- > > > >>>>>> Daniel-Constantin Mierla -- www.asipto.com <http://www.asipto.com> > > > >>>>>> www.twitter.com/miconda <http://www.twitter.com/miconda> -- www.linkedin.com/in/miconda <http://www.linkedin.com/in/miconda> > > > >>>>>> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com <http://www.kamailioworld.com> > > > >>>>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com <http://www.asipto.com> > > > >>>>>> > > > >>>>> _______________________________________________ > > > >>>>> Kamailio (SER) - Users Mailing List > > > >>>>> sr-users@lists.kamailio.org <mailto:sr-users@lists.kamailio.org> > > > >>>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users > > > >>>> -- > > > >>>> Daniel-Constantin Mierla -- www.asipto.com <http://www.asipto.com> > > > >>>> www.twitter.com/miconda <http://www.twitter.com/miconda> -- www.linkedin.com/in/miconda <http://www.linkedin.com/in/miconda> > > > >>>> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com <http://www.kamailioworld.com> > > > >>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com <http://www.asipto.com> > > > >>>> > > > >> -- > > > >> Daniel-Constantin Mierla -- www.asipto.com <http://www.asipto.com> > > > >> www.twitter.com/miconda <http://www.twitter.com/miconda> -- www.linkedin.com/in/miconda <http://www.linkedin.com/in/miconda> > > > >> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com <http://www.kamailioworld.com> > > > >> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com <http://www.asipto.com> > > > >> > > > -- > > > 
Daniel-Constantin Mierla -- www.asipto.com <http://www.asipto.com> > > > www.twitter.com/miconda <http://www.twitter.com/miconda> -- www.linkedin.com/in/miconda <http://www.linkedin.com/in/miconda> > > > Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com <http://www.kamailioworld.com> > > > Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com <http://www.asipto.com> > > > _______________________________________________ Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org <mailto:sr-users@lists.kamailio.org> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
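The lsof observation quoted in this thread (thousands of "sock ... protocol: TCP" rows until the 2048 connection limit is hit) can be tracked over time with a small parser. This is only a sketch: the function name is hypothetical, and it assumes the default lsof column order (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME) seen in the listings above.

```python
from collections import Counter


def count_unbound_tcp_sockets(lsof_output):
    """Count, per PID, the lsof rows of TYPE 'sock' whose NAME is 'protocol: TCP'.

    These are the rows described in the thread as TCP sockets that are not
    bound to any port. Established connections (TYPE IPv4) and unix sockets
    are ignored.
    """
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Assumed columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME...
        if len(fields) < 9:
            continue  # skips "..." separators and blank lines
        if fields[4] == "sock" and line.rstrip().endswith("protocol: TCP"):
            counts[int(fields[1])] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "kamailio 71895 kamailio 17u sock 0,9 0t0 8828915 protocol: TCP\n"
        "kamailio 71895 kamailio 19u IPv4 1846523 0t0 "
        "TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)\n"
        "kamailio 71895 kamailio 20u sock 0,9 0t0 8887192 protocol: TCP\n"
    )
    for pid, n in count_unbound_tcp_sockets(sample).items():
        print(pid, n)
```

Feeding it the output of "lsof -u kamailio" at intervals would show whether the per-process count keeps growing towards the limit reported by handle_new_connect().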
Hi Daniel,
I have received your request and have added it to my TODO list.
Unfortunately, I don't have much time at the moment. I will certainly do it later, but I cannot commit to a date.
Also, I would really like to understand how to reproduce the issue (I think I have hit it only once or twice this year).
Otherwise, I will have no way to make sure that the workaround works.
Thanks, Aymeric
On Mon, 15 Apr 2019 at 09:06, Daniel-Constantin Mierla miconda@gmail.com wrote:
Hello Aymeric,
would you be able to test with the tls module compiled against libssl 1.1 and using the pre-loaded shared object workaround?
https://github.com/kamailio/kamailio/tree/master/src/modules/tls/utils/openssl_mutex_shared
You should be able to use it with any version; there is no need to test with the kamailio master branch.
Just clone the master branch, then:
cd src/modules/tls/utils/openssl_mutex_shared
make
Then, either from that folder or after copying openssl_mutex_shared.so to a location of your choice, pre-load it before starting your version of Kamailio.
The README.md in the folder has some more details.
I would like to have some validation that it works fine before raising this topic with the libssl project, to ask for a way to init the locks with the process-shared option.
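As a rough illustration of the pre-load step, the sketch below builds the environment and command line for starting Kamailio with the shared object in LD_PRELOAD (the standard dynamic-linker variable on Linux). The install path of the .so, the config path and the helper names are assumptions for the example, not taken from the README.md.

```python
import os
import shlex


def preload_env(so_path, base_env=None):
    """Return a copy of the environment with so_path prepended to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The dynamic linker accepts a colon-separated list of objects.
    env["LD_PRELOAD"] = so_path if not existing else so_path + ":" + existing
    return env


def shell_line(so_path, command):
    """Render the equivalent one-line shell invocation, for copy and paste."""
    quoted = " ".join(shlex.quote(arg) for arg in command)
    return "LD_PRELOAD=" + shlex.quote(so_path) + " " + quoted


if __name__ == "__main__":
    # Hypothetical paths: adjust to where the .so and the config really live.
    so = "/usr/local/lib/openssl_mutex_shared.so"
    cmd = ["/usr/sbin/kamailio", "-f", "/etc/kamailio/kamailio.cfg"]
    print(shell_line(so, cmd))
    # To actually start it: subprocess.run(cmd, env=preload_env(so))
```

If Kamailio is started by an init system rather than by hand, the variable has to be set in that service's environment instead.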
Thanks, Daniel
On 26.03.19 16:18, Daniel-Constantin Mierla wrote:
Hello,
Yep, locking there is expected: listing the TLS connections waits until no other process is changing the content of the internal TLS connection structures. So it is a side effect of libssl/libcrypto getting stuck and the other processes waiting for it to move on. I am giving the Kamailio training in the USA these days, so the trip and the daily schedule have not allowed me to look further at the libssl/libcrypto code in order to find a solution. It is high on my priority list, and I will get to it as time allows over the next days.
Cheers, Daniel
On 26.03.19 15:55, Aymeric Moizard wrote:
Hi All,
I was debugging a TCP issue (I will most probably start a separate thread for that question).
I was trying to get some info on TCP and TLS.
I typed: $> sudo kamctl rpc tls.list
I waited for a while, until I realized that my User-Agent, connected over TCP, was not able to register any more. I think the rpc command triggered something wrong.
The device can still connect and send the REGISTER over the established TCP connection, but the REGISTER does not appear in the logs any more, and I don't see any TCP traffic being processed. So the behavior is the same as I had before: TCP and TLS are both not working, while UDP is still working fine.
kamctl does not work any more, so kamctl trap does not work either.
I was able to run this manually for (all?) kamailio processes:
gdb /usr/sbin/kamailio 16500 -batch --eval-command="bt full" >> kamailio-trap-tcp-down.txt
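When kamctl itself hangs, the manual gdb step above can be generated for a list of PIDs instead of being typed once per process. A minimal sketch, assuming the same binary path and output file as in the command above; the helper names are hypothetical, and the PIDs in the usage example are the ones that appear in the log excerpt of this message.

```python
import shlex


def gdb_backtrace_command(pid, binary="/usr/sbin/kamailio"):
    """Argument list for a non-interactive full backtrace of one process."""
    return ["gdb", binary, str(pid), "-batch", "--eval-command=bt full"]


def trap_script(pids, outfile="kamailio-trap-tcp-down.txt",
                binary="/usr/sbin/kamailio"):
    """Shell lines that append the backtrace of each PID to outfile."""
    lines = []
    for pid in pids:
        cmd = " ".join(shlex.quote(a) for a in gdb_backtrace_command(pid, binary))
        lines.append(cmd + " >> " + shlex.quote(outfile))
    return lines


if __name__ == "__main__":
    # Dry run: only prints the commands, it does not attach to anything.
    for line in trap_script([16371, 16374, 16479, 16493, 16500]):
        print(line)
```

The printed lines can be reviewed and then pasted into a root shell, which keeps the attach step under manual control.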
I'm temporarily putting the backtrace I have here: https://sip.antisip.com/kamailio-trap-tcp-down.txt
You can see one of them stuck on the JSON command "tls_list", and many others waiting on CRYPTO_THREAD_write_lock. This might be related to: https://github.com/openssl/openssl/issues/5376
SIDE NOTE: Right before I typed the last gdb command, for the last process, kamailio crashed. This was around 5 minutes after the deadlock started.
Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core> [core/tcp_main.c:2561]: tcpconn_do_send(): failed to send on 0x7ff8dfc2fdc8 (91.121.30.149:5061-> 62.210.97.21:49351): Broken pipe (32)
Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core> [core/tcp_read.c:1505]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0x7ff8dfc2fdc8 r: 0x7ff8dfc2fe48 (-1)
Mar 26 14:47:11 sip kamailio[16493]: WARNING: <core> [core/tcp_read.c:1848]: handle_io(): F_TCPCONN connection marked as bad: 0x7ff8dfa6a408 id 846 refcnt 3
Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core> [main.c:755]: handle_sigs(): child process 16374 exited by a signal 11
Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core> [main.c:758]: handle_sigs(): core was not generated
Mar 26 14:47:11 sip kamailio[16371]: INFO: <core> [main.c:781]: handle_sigs(): terminating due to SIGCHLD
Mar 26 14:47:11 sip kamailio[16493]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 26 14:47:11 sip kamailio[16500]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 26 14:47:11 sip kamailio[16479]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Unfortunately, even though I did my best to set up my service to generate a core on crash, I still get "core was not generated" (Debian Stretch).
Thanks for reading!
Regards, Aymeric
Hello,
I think one possibility to reproduce the issue would be to create a scenario where the same connection is wanted by several processes at the same time, while the first process that gets the lock on it needs a bit more time to execute. Not sure how to create the case, maybe something like:
- enable onsend_route for replies and, when getting a 200 OK reply to an INVITE there, do a sleep(); because the ACK is not coming fast enough, the 200 OK should be retransmitted by the callee, another Kamailio process will get it and will try to send it over the same connection
- run with not many tcp workers, maybe like tcp_children=4
- do several calls and see how it goes
Again, not sure it covers the case properly, but it is something to test, because the backtraces I got showed attempts to use the same connection.
Otherwise, just run it with traffic for a long time, possibly with two Kamailio instances connected via TLS, so that a single connection is used for all traffic between them, which makes it likely that many processes try to use it.
Cheers, Daniel
On 01.05.19 22:26, Aymeric Moizard wrote:
Hi Daniel,
I have received your request and have added it to my TODO list...
Unfortunately, I don't have much time currently. I will certainly do it later, but cannot give any time frame for it.
Also, I would really like to understand how to "generate" the issue. (I think I had the issue only once or twice this year...)
Otherwise, I will have no way to make sure the workaround works...
Thanks, Aymeric
-- Antisip - http://www.antisip.com
Hi all!
We have used the workaround with the pre-loaded library and so far it seems to have fixed the problem that my colleague Kristijan Vrban reported. At least we did not have a single failure within the last week, whereas before the issue occurred about once every two days. It would be nice if this became part of the next Kamailio version.
With best regards
Florian Floimair Innovation - Software-Development
COMMEND INTERNATIONAL GMBH A-5020 Salzburg, Saalachstraße 51 http://www.commend.com/
Security and Communication by Commend
FN 178618z | LG Salzburg
From: sr-users <sr-users-bounces@lists.kamailio.org> on behalf of Daniel-Constantin Mierla <miconda@gmail.com>
Reply-To: "miconda@gmail.com" <miconda@gmail.com>, "Kamailio (SER) - Users Mailing List" <sr-users@lists.kamailio.org>
Date: Monday, 15 April 2019 at 09:07
To: Aymeric Moizard <amoizard@gmail.com>, "Kamailio (SER) - Users Mailing List" <sr-users@lists.kamailio.org>
Subject: Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.
Hello Aymeric,
would you be able to test with tls module compiled against libssl 1.1 and using the pre-loaded shared object workaround?
* https://github.com/kamailio/kamailio/tree/master/src/modules/tls/utils/openssl_mutex_shared
You should be able to use it with any version, no need to test with kamailio master branch.
Just clone the master branch, then:
cd src/modules/tls/utils/openssl_mutex_shared
make
Either use it from there or copy openssl_mutex_shared.so to a location of your choice, then pre-load it before starting your version of Kamailio.
The README.md in the folder has some more details.
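The build and pre-load steps described above could be scripted roughly as follows. This is only a sketch under assumptions: the checkout directory (SRC_DIR), the path of the .so passed to preload_cmd and the Kamailio command line are made up for illustration, and LD_PRELOAD is assumed to be the pre-load mechanism meant here (the README.md is the authoritative reference).

```shell
#!/bin/sh
# Sketch only: SRC_DIR and the paths used below are assumptions, not from the thread.
SRC_DIR="${SRC_DIR:-$HOME/src/kamailio}"
UTIL_DIR="src/modules/tls/utils/openssl_mutex_shared"

# Clone the master branch and build the helper library in its own folder.
build_mutex_shared() {
    git clone --depth 1 https://github.com/kamailio/kamailio.git "$SRC_DIR" &&
        make -C "$SRC_DIR/$UTIL_DIR"
}

# Print the command line that starts Kamailio with the library pre-loaded.
# $1 = path to openssl_mutex_shared.so, remaining args = Kamailio command line.
preload_cmd() {
    so_path="$1"
    shift
    printf 'LD_PRELOAD=%s %s\n' "$so_path" "$*"
}
```

For example, `preload_cmd /opt/lib/openssl_mutex_shared.so /usr/sbin/kamailio -f /etc/kamailio/kamailio.cfg` only prints the start command, so it can be reviewed before being put into an init script or service unit.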
I would like to have some validation that it works fine before approaching the libssl project about allowing the locks to be initialized with the process-shared option.
Thanks, Daniel
On 26.03.19 16:18, Daniel-Constantin Mierla wrote:
Hello,
yep, locking there is expected, as listing the tls connections waits until no other process is changing the content of the internal tls connection structures. So it is a side effect of libssl/libcrypto getting stuck and the other processes waiting for it to move on. I have the Kamailio training in the USA these days, so the trip and the daily schedule didn't allow me to look more at the libssl/libcrypto code in order to find a solution here. It is a high priority on my list, as soon as I get time during the next days.
Cheers, Daniel
On 26.03.19 15:55, Aymeric Moizard wrote:
Hi All,
I was debugging a TCP issue (most probably, I may start a thread for this question).
I was trying to get some info for TCP and TLS.
I typed: $> sudo kamctl rpc tls.list
And waited for a while... until I realized that my User-Agent, connected via TCP, was not able to register any more. I think the rpc command has triggered something wrong.
The device can successfully connect and send the REGISTER over the established TCP connection. The REGISTER does not appear in the logs any more, and I don't see any TCP traffic any more. So the behavior is the same as I had before: TCP and TLS are both not working and UDP is still working fine.
kamctl does not work any more... so kamctl trap does not work either...
I have been able to type, manually, for (all?) kamailio threads:
gdb /usr/sbin/kamailio 16500 -batch --eval-command="bt full" >> kamailio-trap-tcp-down.txt
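When kamctl trap is unavailable, the manual per-process gdb call above can be generated for a list of PIDs. The following is a minimal sketch, not what was actually run: gdb_bt_cmds is a hypothetical helper name, and the binary path and output file default to the ones from the command above.

```shell
#!/bin/sh
# Sketch: print one gdb backtrace command per PID read from stdin.
# $1 = kamailio binary (default as in the message), $2 = output file.
gdb_bt_cmds() {
    bin="${1:-/usr/sbin/kamailio}"
    out="${2:-kamailio-trap-tcp-down.txt}"
    while read -r pid; do
        # skip blank lines and anything that is not a plain number
        case "$pid" in
            ''|*[!0-9]*) continue ;;
        esac
        printf 'gdb %s %s -batch --eval-command="bt full" >> %s\n' "$bin" "$pid" "$out"
    done
}

# Usage (assumes pgrep is installed and gdb is allowed to attach):
#   pgrep kamailio | gdb_bt_cmds | sh
```

Printing the commands first, instead of running gdb directly, makes it possible to check the PID list before attaching to a production instance.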
I'm temporarily putting the backtrace I have here: https://sip.antisip.com/kamailio-trap-tcp-down.txt
You can see a thread stuck on the json command "tls_list", and many others waiting on CRYPTO_THREAD_write_lock. This might be related to: https://github.com/openssl/openssl/issues/5376
SIDE NOTE: Right before I typed the gdb command for the last thread, kamailio crashed. This was around 5 minutes after the deadlock started.
Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core> [core/tcp_main.c:2561]: tcpconn_do_send(): failed to send on 0x7ff8dfc2fdc8 (91.121.30.149:5061->62.210.97.21:49351): Broken pipe (32)
Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core> [core/tcp_read.c:1505]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0x7ff8dfc2fdc8 r: 0x7ff8dfc2fe48 (-1)
Mar 26 14:47:11 sip kamailio[16493]: WARNING: <core> [core/tcp_read.c:1848]: handle_io(): F_TCPCONN connection marked as bad: 0x7ff8dfa6a408 id 846 refcnt 3
Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core> [main.c:755]: handle_sigs(): child process 16374 exited by a signal 11
Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core> [main.c:758]: handle_sigs(): core was not generated
Mar 26 14:47:11 sip kamailio[16371]: INFO: <core> [main.c:781]: handle_sigs(): terminating due to SIGCHLD
Mar 26 14:47:11 sip kamailio[16493]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 26 14:47:11 sip kamailio[16500]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 26 14:47:11 sip kamailio[16479]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Unfortunately, even though I did my best to set up my service to generate a core on crash, I still get "core was not generated"... (Debian Stretch)
Thanks for reading! Regards, Aymeric
On Tue, Mar 26, 2019 at 14:11, Kristijan Vrban <vrban.lkml@gmail.com> wrote:
And again one more kamctl trap file, where set_reply_no_connect was set.
On Tue, Mar 26, 2019 at 08:53, Kristijan Vrban <vrban.lkml@gmail.com> wrote:
Attached also the output of kamctl trap
On Tue, Mar 26, 2019 at 08:42, Kristijan Vrban <vrban.lkml@gmail.com> wrote:
> Have you done a test with tools such as sipp, or was this happening after a while, with usual phones registering?
The usual variety of devices registering via TLS. But I cannot exclude that some devices are misbehaving.
> Can you list the tcp connections and see if they are listed?
> kamctl tcp core.tcp_list
Do I need the kex module for that? Then I can deliver it next time. But when I do "lsof -u kamailio |grep TCP" I get a long list of more than 2000 lines with:
...
kamailio 37561 kamailio 2105u sock 0,9 0t0 27856287 protocol: TCP
kamailio 37561 kamailio 2106u sock 0,9 0t0 27856305 protocol: TCP
kamailio 37561 kamailio 2107u sock 0,9 0t0 27856306 protocol: TCP
kamailio 37561 kamailio 2108u sock 0,9 0t0 27856914 protocol: TCP
...
So over time Kamailio created a lot of sockets in the TCP domain which are not bound to any port (e.g. via connect(2), listen(2) or bind(2)), until we reach the maximum number of 2048 connections.
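To watch how the number of such unbound sockets grows over time, the lsof output can be filtered and counted. This is a minimal sketch: count_unbound_tcp is a hypothetical helper, and it assumes the default lsof column layout shown above (TYPE in the fifth column, NAME ending in "protocol: TCP").

```shell
#!/bin/sh
# Sketch: count lsof lines describing a TCP socket without an address,
# i.e. TYPE column "sock" and a NAME of "protocol: TCP".
count_unbound_tcp() {
    awk '$5 == "sock" && $(NF-1) == "protocol:" && $NF == "TCP" { n++ } END { print n + 0 }'
}

# Usage: lsof -u kamailio | count_unbound_tcp
```

Comparing this count against the configured connection limit (2048 in the error message above) would show how close the instance is to refusing new connections.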
Best Kristijan
On Mon, Mar 25, 2019 at 14:27, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
Have you done a test with tools such as sipp, or was this happening after a while, with usual phones registering?
Can you list the tcp connections and see if they are listed?
kamctl tcp core.tcp_list
Cheers, Daniel
On 25.03.19 08:03, Kristijan Vrban wrote:
> The solution here is to use set_reply_no_connect()
I implemented it. Now the issue has shifted to:
ERROR: <core> [core/tcp_main.c:3959]: handle_new_connect(): maximum number of connections exceeded: 2048/2048
But not a single TCP connection is active between Kamailio and any device. It seems the counter for the maximum number of connections now has an issue?
Kristijan
On Wed, 20 Mar 2019 at 15:07, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
Hello,
based on the trap output I think I could figure out what happened there.
You have tcp_children set to a very low value (1 or so). That is not the actual problem, though. The problem is that the connection to the upstream (the device/app sending the request) was closed after the request was received, and routing of the reply gets stuck as follows:
- a reply is received and has to be forwarded
- the connection was lost, so Kamailio tries to establish a new one, but it takes time until that fails, because the upstream is behind NAT or similar, judging by the Via header:
  Via: SIP/2.0/TLS 10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
- the reply is retransmitted and gets to another worker, which tries to forward it again, but discovers that a connection structure for that destination already exists (created by the previous reply worker) and now waits for the connection to be released (or, better said, for the mutex on the write buffer to be unlocked)
- while the second reply waits, other retransmissions of the reply can end up in other workers, which also get stuck waiting for the mutex of the connection write buffer
The solution here is to use set_reply_no_connect(); you can put it first in the request_route block. I think this would be a good addition to the default configuration file as well. IMO, the SIP server should not connect for sending replies, and should do the same for requests that go behind NAT.
Cheers, Daniel
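To check a diagnosis like this against a trap file, it can help to count how many backtrace lines mention a given lock function. The following is a rough sketch, assuming the file is plain gdb backtrace text. count_frames is a hypothetical helper, and the symbol to search for is something you pick after reading the trace (futex_wait and CRYPTO_THREAD_write_lock are two names that appear in backtraces posted in this thread). It counts matching lines, not processes, so one process can contribute more than one hit.

```shell
# Print the number of lines in a backtrace file that mention a symbol.
# $1 = path to the trap/backtrace file, $2 = symbol name (fixed string)
count_frames() {
    grep -c -F -- "$2" "$1"
}
```

For example, "count_frames trap.txt CRYPTO_THREAD_write_lock" (with trap.txt standing in for whatever file kamctl trap wrote) gives a first impression of how many workers are parked in the same place.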
On 19.03.19 10:53, Kristijan Vrban wrote:
So I had the situation again. But this time, incoming UDP was affected. Kamailio was sending out OPTIONS (via the dispatcher module) to a group of Asterisk machines, but the 200 OK replies to the OPTIONS were not processed, so the dispatcher module set all Asterisk machines to inactive, even though they replied 200 OK.
Attached is the output of kamctl trap during the situation. I hope there is something useful in it, because after "kamctl trap" it was working again without a Kamailio restart.
Best Kristijan
On Mon, 18 Mar 2019 at 12:27, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
> Hello,
>
> setting tcp_children=1 is not a good option for scalability; practically
> you set kamailio to process a single tcp message at one time. On high
> traffic, that won't work well.
>
> Maybe try to set tcp_children to 2 or 4, that should make an eventual
> race appear faster.
>
> Regarding the pid, if it is an outgoing connection, then it can be
> created by any worker process, including a UDP worker, if that was the
> one receiving the sip message over udp and sends it out via tcp.
>
> Cheers,
> Daniel
>
> On 18.03.19 10:09, Kristijan Vrban wrote:
>> Hi Daniel,
>>
>> for testing, I have now set "tcp_children=1", and so far this issue has not
>> occurred since. So nothing to provide from "kamctl trap" yet.
>>
>> "kamctl ps" shows these two processes handling tcp:
>>
>> ...
>> }, {
>> "IDX": 25,
>> "PID": 71929,
>> "DSC": "tcp receiver (generic) child=0"
>> }, {
>> "IDX": 26,
>> "PID": 71933,
>> "DSC": "tcp main process"
>> }
>> ...
>>
>> Ok, but then I was surprised to see a TCP connection on a udp receiver child:
>>
>> netstat -ntp |grep 5061
>>
>> ...
>> tcp 0 0 172.17.217.10:5061 195.70.114.125:18252 ESTABLISHED 71895/kamailio
>> ...
>>
>> And pid 71895 is:
>>
>> }, {
>> "IDX": 3,
>> "PID": 71895,
>> "DSC": "udp receiver child=2 sock=127.0.0.1:5060"
>> }, {
>>
>> And if I look into it via "lsof -p 71895" (the udp receiver child):
>>
>> ...
>> kamailio 71895 kamailio 14u sock 0,9 0t0 8856085 protocol: TCP
>> kamailio 71895 kamailio 15u sock 0,9 0t0 8886886 protocol: TCP
>> kamailio 71895 kamailio 16u sock 0,9 0t0 8854886 protocol: TCP
>> kamailio 71895 kamailio 17u sock 0,9 0t0 8828915 protocol: TCP
>> kamailio 71895 kamailio 18u unix 0x000000005f73cb91 0t0 1680314 type=DGRAM
>> kamailio 71895 kamailio 19u IPv4 1846523 0t0 TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
>> kamailio 71895 kamailio 20u sock 0,9 0t0 8887192 protocol: TCP
>> kamailio 71895 kamailio 21u sock 0,9 0t0 8813634 protocol: TCP
>> kamailio 71895 kamailio 22u unix 0x00000000c19bd102 0t0 1681407 type=STREAM
>> kamailio 71895 kamailio 23u sock 0,9 0t0 8850488 protocol: TCP
>> ...
>>
>> So not only the ESTABLISHED TCP session, but also these empty sockets
>> "protocol: TCP".
>> What are they doing there in the udp receiver? Is that how it's supposed to be?
>>
>> Kristijan
>>
>> On Thu, 14 Mar 2019 at 14:48, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
>>> Can you get the file written by `kamctl trap`? It should have the backtrace
>>> for all kamailio processes. You need latest kamailio 5.2.
>>>
>>> Also, get the output for: kamctl ps
>>>
>>> Cheers,
>>> Daniel
>>>
>>> On 14.03.19 13:52, Kristijan Vrban wrote:
>>>> When I attach via gdb to one of the tcp workers, I see this:
>>>>
>>>> (gdb) bt
>>>> #0 0x00007fdaf4d14470 in futex_wait (private=<optimized out>, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
>>>> #1 futex_wait_simple (private=<optimized out>, expected=1, futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135
>>>> #2 __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at pthread_rwlock_wrlock.c:67
>>>> #3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>> #4 0x00007fdaf08e1c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>> #5 0x00007fdaf08a6f69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>> #6 0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>> #7 0x00007fdaf0c31144 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>> #8 0x00007fdaf0c2bddb in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>> #9 0x00007fdaf0c22858 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>> #10 0x00007fdaf0c1af61 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>> #11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98, error=0x7ffffe2a2df0) at tls_server.c:422
>>>> #12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98, flags=0x7ffffe2c318c) at tls_server.c:1116
>>>> #13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98, read_flags=0x7ffffe2c318c) at core/tcp_read.c:469
>>>> #14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98, bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at core/tcp_read.c:1496
>>>> #15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1, idx=-1) at core/tcp_read.c:1862
>>>> #16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0 <io_w>, t=2, repeat=0) at core/io_wait.h:1065
>>>> #17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at core/tcp_read.c:1974
>>>> #18 0x0000556ead4c8e24 in tcp_init_children () at core/tcp_main.c:4853
>>>> #19 0x0000556ead3c352a in main_loop () at main.c:1735
>>>> #20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at main.c:2675
>>>>
>>>> On Thu, 14 Mar 2019 at 13:41, Kristijan Vrban <vrban.lkml@gmail.com> wrote:
>>>>> Hi, with full debug I see this in the log for every incoming TCP SIP request:
>>>>>
>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp receiver, connection passed to the least busy one (105)
>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker 2 27(17937) for activity on [tls:172.17.217.10:5061], 0x7fdaeda8f928
>>>>>
>>>>> So the Kamailio TCP main process is working and receives TCP traffic. But
>>>>> the tcp workers are somehow busy.
>>>>>
>>>>> When I attach via strace to the TCP worker, I do not see any activity. Just:
>>>>>
>>>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL
>>>>>
>>>>> and nothing more, even when I see the main tcp process choose this worker process.
>>>>>
>>>>> Kristijan
>>>>>
>>>>> On Wed, 27 Feb 2019 at 15:14, Kristijan Vrban <vrban.lkml@gmail.com> wrote:
>>>>>> First of all, thanks for the feedback. I have now prepared our system to run
>>>>>> with debug=3.
>>>>>> I hope to see more then.
>>>>>>
>>>>>> On Wed, 27 Feb 2019 at 11:53, Kristijan Vrban <vrban.lkml@gmail.com> wrote:
>>>>>>> Hi kamailios,
>>>>>>>
>>>>>>> i have a creepy situation with v5.2.1 stable Kamailio. After a day or
>>>>>>> so, Kamailio stops processing incoming SIP traffic via TCP. The
>>>>>>> incoming TCP packets get a TCP ACK from the OS (Debian 9,
>>>>>>> 4.18.0-15-generic-Linux), but Kamailio does not show any processing for
>>>>>>> the SIP traffic incoming via TCP. No logs, nothing. Meanwhile, traffic via
>>>>>>> UDP is working totally fine.
>>>>>>>
>>>>>>> When I look via the command "netstat -ntp" I see that the Recv-Q gets
>>>>>>> bigger and bigger, e.g.:
>>>>>>>
>>>>>>> Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
>>>>>>> tcp 4566 0 172.17.217.12:5060 xxx.xxx.xxx.xxx:57252 ESTABLISHED 31347/kamailio
>>>>>>>
>>>>>>> After a Kamailio restart, all is working fine again for a day. We have
>>>>>>> maybe 10-20 devices online via TCP and low call volume (1-2 calls per
>>>>>>> minute). The only setting for tcp we have is "tcp_delayed_ack=no".
>>>>>>>
>>>>>>> How could we debug this situation? Again, no errors, no warnings in
>>>>>>> the log. Just nothing.
>>>>>>>
>>>>>>> Kristijan
Hello,
thanks for the feedback! It is good to know that it works well so far for you. I don't see any reason not to make pre-loading the library part of the next release.
Just to let everyone know: for now, the built packages are pinned to link against libssl 1.0.x.
Soon, I will approach the openssl project in order to find a proper long-term solution.
Cheers, Daniel
On 13.05.19 10:48, Floimair Florian wrote:
Hi all!
We have used the work-around with the pre-loaded library and so far this seems to have fixed our problem (that my colleague Kristijan Vrban reported).
At least we did not have a single failure within the last week, whereas before the issue happened about once every 2 days.
Would be nice if this would be part of the next Kamailio version.
With best regards
Florian Floimair
Innovation - Software-Development
COMMEND INTERNATIONAL GMBH
A-5020 Salzburg, Saalachstraße 51
http://www.commend.com
Security and Communication by Commend
FN 178618z | LG Salzburg
From: sr-users <sr-users-bounces@lists.kamailio.org> on behalf of Daniel-Constantin Mierla <miconda@gmail.com>
Reply-To: "miconda@gmail.com" <miconda@gmail.com>, "Kamailio (SER) - Users Mailing List" <sr-users@lists.kamailio.org>
Date: Monday, 15 April 2019 at 09:07
To: Aymeric Moizard <amoizard@gmail.com>, "Kamailio (SER) - Users Mailing List" <sr-users@lists.kamailio.org>
Subject: Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.
Hello Aymeric,
would you be able to test with tls module compiled against libssl 1.1 and using the pre-loaded shared object workaround?
* https://github.com/kamailio/kamailio/tree/master/src/modules/tls/utils/openssl_mutex_shared
You should be able to use it with any version, no need to test with kamailio master branch.
Just clone the master branch, then:
cd src/modules/tls/utils/openssl_mutex_shared
make
Either from there or copy openssl_mutex_shared.so to a location you want, then pre-load it before starting your version of Kamailio.
The README.md in the folder has some more details.
I would like to have some validation that it works fine before raising this topic with the libssl project, so that the locks can be initialized with the process-shared option.
Thanks, Daniel
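For readers unfamiliar with pre-loading: on Linux this is normally done through the dynamic linker's LD_PRELOAD variable. The sketch below is an illustration under assumptions, not a copy of the README. The install path and config path in the comments are placeholders, and has_preload is a hypothetical helper that inspects a /proc/<pid>/maps file to confirm the object was actually mapped into a running process.

```shell
# Start Kamailio with the workaround pre-loaded (placeholder paths):
#   LD_PRELOAD=/usr/local/lib/openssl_mutex_shared.so \
#       kamailio -f /etc/kamailio/kamailio.cfg
#
# Return success if the given maps file lists the shared object.
# $1 = maps file, e.g. /proc/12345/maps
has_preload() {
    grep -q -F 'openssl_mutex_shared.so' "$1"
}
```

After a restart, something like "has_preload /proc/<pid>/maps && echo loaded" for one of the Kamailio worker PIDs would show whether the variable reached the process, which is easy to get wrong when the service is started through an init script or systemd unit.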
On 26.03.19 16:18, Daniel-Constantin Mierla wrote:
Hello,

yep, locking there is expected, as listing the tls connections waits until no other process is changing the content of the internal tls connection structures. So it is a side effect of libssl/libcrypto getting stuck and the other processes waiting for it to move on.

I have the Kamailio training in USA these days, so the trip and schedule of the day didn't allow me to look more at the libssl/libcrypto code in order to find a solution here. It is a high priority on my list, as soon as I get time during the next days.

Cheers,
Daniel

On 26.03.19 15:55, Aymeric Moizard wrote:

Hi All,

I was debugging a TCP issue (most probably, I may start a thread for this question). I was trying to get some info for TCP and TLS. I typed:

$> sudo kamctl rpc tls.list

And waited for a while... until I realized that my User-Agent, connected with TCP, was not able to register any more.

I think the rpc command has introduced something wrong. The device can successfully "connect" and send the REGISTER over the established TCP connection. The REGISTER does not appear in the logs any more, and I don't see any traffic for TCP any more.

So the behavior is the same as I had before: TCP and TLS are both not working and UDP is still working fine.

kamctl does not work any more, so kamctl trap does not work either. I was able to type, manually, for (all?) kamailio threads:

gdb /usr/sbin/kamailio 16500 -batch --eval-command="bt full" >> kamailio-trap-tcp-down.txt

I'm temporarily putting the backtrace here: https://sip.antisip.com/kamailio-trap-tcp-down.txt

You can see a thread stuck on the json command line "tls_list", and many others waiting on CRYPTO_THREAD_write_lock. This might be related to: https://github.com/openssl/openssl/issues/5376

SIDE NOTE: Right before I typed the last gdb command for the last thread, kamailio crashed. This was around 5 minutes after the deadlock started.

Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core> [core/tcp_main.c:2561]: tcpconn_do_send(): failed to send on 0x7ff8dfc2fdc8 (91.121.30.149:5061->62.210.97.21:49351): Broken pipe (32)
Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core> [core/tcp_read.c:1505]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0x7ff8dfc2fdc8 r: 0x7ff8dfc2fe48 (-1)
Mar 26 14:47:11 sip kamailio[16493]: WARNING: <core> [core/tcp_read.c:1848]: handle_io(): F_TCPCONN connection marked as bad: 0x7ff8dfa6a408 id 846 refcnt 3
Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core> [main.c:755]: handle_sigs(): child process 16374 exited by a signal 11
Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core> [main.c:758]: handle_sigs(): core was not generated
Mar 26 14:47:11 sip kamailio[16371]: INFO: <core> [main.c:781]: handle_sigs(): terminating due to SIGCHLD
Mar 26 14:47:11 sip kamailio[16493]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 26 14:47:11 sip kamailio[16500]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 26 14:47:11 sip kamailio[16479]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received

Unfortunately, even though I did my best to set up my service to generate a core on crash, I still get "core was not generated" (debian stretch).

Thanks for reading!
Regards
Aymeric
> > > >>>>> > > > >>>>> An pid 71895 is: > > > >>>>> > > > >>>>> }, { > > > >>>>> "IDX": 3, > > > >>>>> "PID": 71895, > > > >>>>> "DSC": "udp receiver child=2 sock=127.0.0.1:5060 <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2F127.0.0.1%3A5060&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526520221&sdata=8r55f9MZ2gaw%2B2MA1LY1IfbnWkZDLHdr%2FSwRu7hwnvQ%3D&reserved=0>" > > > >>>>> }, { > > > >>>>> > > > >>>>> > > > >>>>> > > > >>>>> And if i look into it via "lsof -p 71895" (the udp receiver child) > > > >>>>> > > > >>>>> ... > > > >>>>> kamailio 71895 kamailio 14u sock 0,9 0t0 > > > >>>>> 8856085 protocol: TCP > > > >>>>> kamailio 71895 kamailio 15u sock 0,9 0t0 > > > >>>>> 8886886 protocol: TCP > > > >>>>> kamailio 71895 kamailio 16u sock 0,9 0t0 > > > >>>>> 8854886 protocol: TCP > > > >>>>> kamailio 71895 kamailio 17u sock 0,9 0t0 > > > >>>>> 8828915 protocol: TCP > > > >>>>> kamailio 71895 kamailio 18u unix 0x000000005f73cb91 0t0 > > > >>>>> 1680314 type=DGRAM > > > >>>>> kamailio 71895 kamailio 19u IPv4 1846523 0t0 > > > >>>>> TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED) > > > >>>>> kamailio 71895 kamailio 20u sock 0,9 0t0 > > > >>>>> 8887192 protocol: TCP > > > >>>>> kamailio 71895 kamailio 21u sock 0,9 0t0 > > > >>>>> 8813634 protocol: TCP > > > >>>>> kamailio 71895 kamailio 22u unix 0x00000000c19bd102 0t0 > > > >>>>> 1681407 type=STREAM > > > >>>>> kamailio 71895 kamailio 23u sock 0,9 0t0 > > > >>>>> 8850488 protocol: TCP > > > >>>>> ... > > > >>>>> > > > >>>>> Not only the ESTABLISHED TCP session. But also this empty sockets > > > >>>>> "protocol: TCP" > > > >>>>> What are they doing there in the udp receiver? Is that how it's supposed to be? > > > >>>>> > > > >>>>> Kristijan > > > >>>>> > > > >>>>> Am Do., 14. 
März 2019 um 14:48 Uhr schrieb Daniel-Constantin Mierla > > > >>>>> <miconda@gmail.com <mailto:miconda@gmail.com>>: > > > >>>>>> Can you get file written by `kamctl trap`? It should have the backtrace > > > >>>>>> for all kamailio processes. You need latest kamailio 5.2. > > > >>>>>> > > > >>>>>> Also, get the output for: kamctl ps > > > >>>>>> > > > >>>>>> Cheers, > > > >>>>>> Daniel > > > >>>>>> > > > >>>>>> On 14.03.19 13:52, Kristijan Vrban wrote: > > > >>>>>>> When i attach via gdb to one of the tcp worker, i see this: > > > >>>>>>> > > > >>>>>>> (gdb) bt > > > >>>>>>> #0 0x00007fdaf4d14470 in futex_wait (private=<optimized out>, > > > >>>>>>> expected=1, futex_word=0x7fdaeca92f8c) at > > > >>>>>>> ../sysdeps/unix/sysv/linux/futex-internal.h:61 > > > >>>>>>> #1 futex_wait_simple (private=<optimized out>, expected=1, > > > >>>>>>> futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135 > > > >>>>>>> #2 __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at > > > >>>>>>> pthread_rwlock_wrlock.c:67 > > > >>>>>>> #3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from > > > >>>>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 > > > >>>>>>> #4 0x00007fdaf08e1c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 > > > >>>>>>> #5 0x00007fdaf08a6f69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 > > > >>>>>>> #6 0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from > > > >>>>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 > > > >>>>>>> #7 0x00007fdaf0c31144 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1 > > > >>>>>>> #8 0x00007fdaf0c2bddb in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1 > > > >>>>>>> #9 0x00007fdaf0c22858 in ?? 
() from /usr/lib/x86_64-linux-gnu/libssl.so.1.1 > > > >>>>>>> #10 0x00007fdaf0c1af61 in SSL_do_handshake () from > > > >>>>>>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1 > > > >>>>>>> #11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98, > > > >>>>>>> error=0x7ffffe2a2df0) at tls_server.c:422 > > > >>>>>>> #12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98, > > > >>>>>>> flags=0x7ffffe2c318c) at tls_server.c:1116 > > > >>>>>>> #13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98, > > > >>>>>>> read_flags=0x7ffffe2c318c) at core/tcp_read.c:469 > > > >>>>>>> #14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98, > > > >>>>>>> bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at > > > >>>>>>> core/tcp_read.c:1496 > > > >>>>>>> #15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1, > > > >>>>>>> idx=-1) at core/tcp_read.c:1862 > > > >>>>>>> #16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0 <io_w>, > > > >>>>>>> t=2, repeat=0) at core/io_wait.h:1065 > > > >>>>>>> #17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at > > > >>>>>>> core/tcp_read.c:1974 > > > >>>>>>> #18 0x0000556ead4c8e24 in tcp_init_children () at core/tcp_main.c:4853 > > > >>>>>>> #19 0x0000556ead3c352a in main_loop () at main.c:1735 > > > >>>>>>> #20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at main.c:2675 > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> Am Do., 14. 
März 2019 um 13:41 Uhr schrieb Kristijan Vrban > > > >>>>>>> <vrban.lkml@gmail.com <mailto:vrban.lkml@gmail.com>>: > > > >>>>>>>> Hi, with full debug is see this in log for every incoming TCP SIP request: > > > >>>>>>>> > > > >>>>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: > > > >>>>>>>> <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp > > > >>>>>>>> receiver, connection passed to the least busy one (105) > > > >>>>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG: > > > >>>>>>>> <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker 2 > > > >>>>>>>> 27(17937) for activity on [tls:172.17.217.10:5061 <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2F172.17.217.10%3A5061&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526520221&sdata=ad7Txe5RLrnj0CMbQZORXvAdU0NpbrCjP5RzNrbxJdU%3D&reserved=0>], 0x7fdaeda8f928 > > > >>>>>>>> > > > >>>>>>>> So the Kamailio TCP process is working, and received TCP traffic. But > > > >>>>>>>> the tcp workers are somehow busy. > > > >>>>>>>> > > > >>>>>>>> When i attach via strace to the TCP worker, i do not see any activity. Just: > > > >>>>>>>> > > > >>>>>>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL > > > >>>>>>>> > > > >>>>>>>> and nothing, even when i see the main tcp process choose this worker process. > > > >>>>>>>> > > > >>>>>>>> Kristijan > > > >>>>>>>> > > > >>>>>>>> Am Mi., 27. Feb. 2019 um 15:14 Uhr schrieb Kristijan Vrban > > > >>>>>>>> <vrban.lkml@gmail.com <mailto:vrban.lkml@gmail.com>>: > > > >>>>>>>>> first of all thanks for the feedback. i prepared our system now to run > > > >>>>>>>>> with debug=3 > > > >>>>>>>>> I hope to see more then then. > > > >>>>>>>>> > > > >>>>>>>>> Am Mi., 27. Feb. 
2019 um 11:53 Uhr schrieb Kristijan Vrban > > > >>>>>>>>> <vrban.lkml@gmail.com <mailto:vrban.lkml@gmail.com>>: > > > >>>>>>>>>> Hi kamailios, > > > >>>>>>>>>> > > > >>>>>>>>>> i have a creepy situation with v5.2.1 stable Kamilio. After a day or > > > >>>>>>>>>> so, Kamailio stop to process incoming SIP traffic via TCP. The > > > >>>>>>>>>> incoming TCP network packages get TCP-ACK from the OS (Debian 9, > > > >>>>>>>>>> 4.18.0-15-generic-Linux) but Kamailio does not show any processing for > > > >>>>>>>>>> the SIP-Traffic incoming via TCP. No logs, nothing. While traffic via > > > >>>>>>>>>> UDP is working just totally fine. > > > >>>>>>>>>> > > > >>>>>>>>>> When i look via command "netstat -ntp" is see, that the Recv-Q get > > > >>>>>>>>>> bigger and bigger. e.g.: > > > >>>>>>>>>> > > > >>>>>>>>>> Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program > > > >>>>>>>>>> name tcp 4566 0 172.17.217.12:5060 <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2F172.17.217.12%3A5060&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526530221&sdata=TlooUxH53u7tlp54rf3FIyKisAusK00CtjbPKUpXQy8%3D&reserved=0> xxx.xxx.xxx.xxx:57252 ESTABLISHED > > > >>>>>>>>>> 31347/kamailio > > > >>>>>>>>>> > > > >>>>>>>>>> After Kamailio restart, all is working fine again for a day. We have > > > >>>>>>>>>> maybe 10-20 devices online via TCP and low call volume (1-2 call per > > > >>>>>>>>>> minute). The only settings for tcp we have is "tcp_delayed_ack=no" > > > >>>>>>>>>> > > > >>>>>>>>>> How to could we debug this situation? Again, no error, no warings in > > > >>>>>>>>>> the log. Just nothing. 
> > > >>>>>>>>>> > > > >>>>>>>>>> Kristijan > > > >>>>>>> _______________________________________________ > > > >>>>>>> Kamailio (SER) - Users Mailing List > > > >>>>>>> sr-users@lists.kamailio.org <mailto:sr-users@lists.kamailio.org> > > > >>>>>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users <https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.kamailio.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsr-users&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526530221&sdata=CfDOAJ2wQlJDTfTfwj4Ba0BIT74gZJCiS4XNLW%2F1Dog%3D&reserved=0> > > > >>>>>> -- > > > >>>>>> Daniel-Constantin Mierla -- www.asipto.com <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.asipto.com&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526540229&sdata=JZ6S53yHAYHGO%2BiQ1IRPYwlaR6H8QeIRiBxjiqLhAqc%3D&reserved=0> > > > >>>>>> www.twitter.com/miconda <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.twitter.com%2Fmiconda&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526550233&sdata=abtGRZNNELUIh4VJhWIPMjMgRDg4fLT%2F%2B28i2l1IXdE%3D&reserved=0> -- www.linkedin.com/in/miconda <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.linkedin.com%2Fin%2Fmiconda&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526550233&sdata=yiFMASKu6gBkuFqifQsZjR%2F%2Fjbxr2z7reCKIrrpNK1s%3D&reserved=0> > > > >>>>>> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com 
<https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.kamailioworld.com&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526560246&sdata=IqTpYULtc2fymVugRCXgOn3FKigN2eKGjIb2cTYrD0k%3D&reserved=0> > > > >>>>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.asipto.com&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526570254&sdata=mSGruspS5Vddgo9VHb2%2FjNvXg28Indn%2FPBXwofox31g%3D&reserved=0> > > > >>>>>> > > > >>>>> _______________________________________________ > > > >>>>> Kamailio (SER) - Users Mailing List > > > >>>>> sr-users@lists.kamailio.org <mailto:sr-users@lists.kamailio.org> > > > >>>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users <https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.kamailio.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsr-users&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526570254&sdata=39EnwRbLjbVt5fhJC1qOAxbeOEUlsXirOysqS25zW70%3D&reserved=0> > > > >>>> -- > > > >>>> Daniel-Constantin Mierla -- www.asipto.com <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.asipto.com&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526580262&sdata=IKSvu7HKThYWiDReXfGd3YOUN%2FkSrCIIA0ZphemHNrk%3D&reserved=0> > > > >>>> www.twitter.com/miconda <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.twitter.com%2Fmiconda&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526590271&sdata=mNjRxHhnBtOTuWYH0SAjchWFLrJ9ZMEohj8WRf9Q%2B4E%3D&reserved=0> -- 
www.linkedin.com/in/miconda <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.linkedin.com%2Fin%2Fmiconda&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526590271&sdata=FDEKg%2BZZ0rtVhp3N8KE%2FO6Os19S4gcuNFnkowh6Cg4Y%3D&reserved=0> > > > >>>> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.kamailioworld.com&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526610292&sdata=pRIyX6ukA3fD9uSltzI9oNQS9guiQoLvako%2FYgOpomk%3D&reserved=0> > > > >>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.asipto.com&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526610292&sdata=VgI9SYl%2FirYVa2j%2FgqxTJ%2F8SlbEafUeyDg9Ej4jXw8s%3D&reserved=0> > > > >>>> > > > >> -- > > > >> Daniel-Constantin Mierla -- www.asipto.com <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.asipto.com&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526620296&sdata=wdG9sK2JFMtALBtBpiOdOOrCT2N5rcFU5vTIVgyTkmA%3D&reserved=0> > > > >> www.twitter.com/miconda <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.twitter.com%2Fmiconda&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526630309&sdata=4XzLDUOq%2FoYdSWdJ2ZZh7sHPFY45w19dvan1m3%2FOgpY%3D&reserved=0> -- www.linkedin.com/in/miconda 
<https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.linkedin.com%2Fin%2Fmiconda&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526630309&sdata=QsMOMqcG422FyAKD87NNbDcTAnVELlxwlTsk9qhQY3E%3D&reserved=0> > > > >> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.kamailioworld.com&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526640317&sdata=f5hJyISlS%2FyhFIbYtOAwGevmaTSuyInBl5QcIIQWxJk%3D&reserved=0> > > > >> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.asipto.com&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526650321&sdata=lMfoFBzZ7RV5ZLl9c1phXnxYHrz6%2F78MLZq3ftuiW84%3D&reserved=0> > > > >> > > > -- > > > Daniel-Constantin Mierla -- www.asipto.com <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.asipto.com&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526650321&sdata=lMfoFBzZ7RV5ZLl9c1phXnxYHrz6%2F78MLZq3ftuiW84%3D&reserved=0> > > > www.twitter.com/miconda <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.twitter.com%2Fmiconda&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526660329&sdata=BgWlmOacvqNaaoay9DB%2B6ZLgvwJUycf9CGup97yzS5g%3D&reserved=0> -- www.linkedin.com/in/miconda 
<https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.linkedin.com%2Fin%2Fmiconda&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526660329&sdata=ozEAUiWg7FTMwGYijeingJv11ygyhYo4W3GBm3ZJMuM%3D&reserved=0> > > > Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.kamailioworld.com&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526670338&sdata=F9zjXHrdLEB6%2FB4tmQvPqwyQnFM9ZjI8MKdSnA9nyTg%3D&reserved=0> > > > Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.asipto.com&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526680346&sdata=biJ1rDk5WXly4BwxPwIDOHZcIxHlX3cL%2BciU4zSna1A%3D&reserved=0> > > > _______________________________________________ Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org <mailto:sr-users@lists.kamailio.org> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users <https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.kamailio.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsr-users&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526680346&sdata=RqR%2BxmUOrHlvfRlJv43ZH8LOkd4ZCMu6Xn59E7aK4Ew%3D&reserved=0> -- http://sip.antisip.com/am48.pngAntisip - http://www.antisip.com <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.antisip.com&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526690359&sdata=%2F2zOiWbiz2yAKOz4oQKPPiaT%2BE3HoYVETeU1Ktr4PL8%3D&reserved=0> -- Daniel-Constantin Mierla -- 
www.asipto.com <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.asipto.com&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526690359&sdata=wYyFvDHHTyQ4715bAnRuSSsCM1f6IvjappwbujOFcKo%3D&reserved=0> www.twitter.com/miconda <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.twitter.com%2Fmiconda&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526700363&sdata=VqBde6P5Pa4b5%2Bcz0mrUfaQYXwnjK0lPcWe8ZkA5WX8%3D&reserved=0> -- www.linkedin.com/in/miconda <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.linkedin.com%2Fin%2Fmiconda&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526700363&sdata=Lhst%2FwKY6WwaZzYCZSWiKfr%2Bk02JSPP1V8PXw5pLJZs%3D&reserved=0> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.kamailioworld.com&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526710376&sdata=ENTJ2mcgqVDyCPSmNnewFjiqbJs%2BZpIihYnXr20rWmw%3D&reserved=0> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com <https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.asipto.com&data=02%7C01%7Cf.floimair%40commend.com%7C4008d49af1b347abe20308d6c1710532%7C13b1ddb756454e7fbe663171548559da%7C0%7C0%7C636909088526720380&sdata=YmCXbGb3X7gNpKgm7TQJJH8sFSC7dlPxgVS0woqphTs%3D&reserved=0>
-- Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
Hi!
I haven't tried the workaround yet: I'm first trying to confirm that I have the same issue, or to figure out how to force it to happen.
I started checking the server again today, beginning with this command: $> sudo kamcmd tls.list
In my previous report, the above ended in a deadlock. Today it finally completed, but only after 5 minutes. (I suspect 5 minutes is abnormal.)
During the long-running command:
-> UDP was working
-> TCP was not: the TCP connection got ESTABLISHED, but the SIP message was never answered (this is the behavior I had before).
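One way to spot this state from the shell is to look for TCP connections owned by kamailio whose kernel receive queue is not being drained. This is only a minimal sketch, assuming the `netstat -ntp` column layout quoted at the start of this thread (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name); the helper name `stuck_tcp` is made up for illustration:

```shell
# Print kamailio TCP connections with unread bytes in the kernel receive queue.
# Reads `netstat -ntp` output on stdin; header lines are skipped because
# their first field does not start with "tcp".
# Output columns: Recv-Q, local address, foreign address.
stuck_tcp() {
    awk '$1 ~ /^tcp/ && $2 + 0 > 0 && $7 ~ /\/kamailio/ { print $2, $4, $5 }'
}

# Usage (root is needed to see the PID/Program column):
#   netstat -ntp | stuck_tcp
```

A non-zero Recv-Q for a moment is normal; the symptom described here is a value that keeps growing across repeated runs.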
At the same time, during the deadlock, I took a trap with "sudo kamctl trap":
-> one thread is in "tls_list" (tls_rpc.c:154)
-> one thread is in tcpconn_get (core/tcp_main.c:1449), called from tcp_send (core/tcp_main.c:1716), and seems to be sending a 484 Address Incomplete on a TLS connection
-> 2 threads are in CRYPTO_THREAD_write_lock, in a backtrace showing "SSL_do_handshake/tls_accept"
Suddenly, "sudo kamcmd tls.list" completed, and my TCP agent then received 4 answers from kamailio for the last 4 REGISTER requests it had sent.
I have a network capture for my TCP agent, and a trap showing 2 threads waiting on "CRYPTO_THREAD_write_lock".
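To compare traps taken at different times, it can help to count how many backtrace frames sit in a given function. A rough sketch, assuming the trap file contains plain gdb backtraces with frames of the form `#3 0x... in CRYPTO_THREAD_write_lock () from ...`; the helper name `count_frames` and the file name in the usage line are placeholders:

```shell
# Count backtrace frames in a trap/gdb dump that sit in a given function.
# $1: function name, $2: dump file.
# Matches the literal "in <function> (" text that gdb prints for a frame.
count_frames() {
    grep -c -F "in $1 (" "$2"
}

# Usage:
#   count_frames CRYPTO_THREAD_write_lock trap-file.txt
#   count_frames SSL_do_handshake trap-file.txt
```

Each process normally contributes at most one such frame, so the count approximates the number of processes blocked there.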
Conclusion: this case showed that the lock was held for a VERY long time, but also that it was TEMPORARY...
Side note: from my understanding of the multi-fork/openssl issue, I would expect the deadlock to happen very soon after a kamailio restart?
Do you expect the preload workaround to help with this kind of behavior? Or do you consider my issue to be a different one?
Because there is no "real" dead-lock, I don't understand why "my" issue would be related to libssl1.1...
My gdb trap and network capture are available by private exchange if you need them! (Please ask me by direct email.)
Tks Aymeric
On Mon, 13 May 2019 at 12:48, Daniel-Constantin Mierla miconda@gmail.com wrote:
Hello,
thanks for the feedback! It is good to know that it works well so far for you. I don't see any reason not to ship the preload library as part of the next release.
Just to let everyone know, for now, the built packages are pinned to link against libssl 1.0.x.
Soon, I will approach the openssl project in order to find a proper long-term solution.
Cheers, Daniel
On 13.05.19 10:48, Floimair Florian wrote:
Hi all!
We have used the work-around with the pre-loaded library and so far this seems to have fixed our problem (that my colleague Kristijan Vrban reported).
At least we did not have a single failure within the last week, whereas before the issue happened about once every 2 days.
It would be nice if this were part of the next Kamailio version.
With best regards
*Florian Floimair *Innovation - Software-Development
*COMMEND INTERNATIONAL GMBH *A-5020 Salzburg, Saalachstraße 51 http://www.commend.com
*Security and Communication by Commend *FN 178618z | LG Salzburg
*From: *sr-users sr-users-bounces@lists.kamailio.org on behalf of Daniel-Constantin Mierla miconda@gmail.com
*Reply-To: *"miconda@gmail.com" miconda@gmail.com, "Kamailio (SER) - Users Mailing List" sr-users@lists.kamailio.org
*Date: *Monday, 15 April 2019 at 09:07
*To: *Aymeric Moizard amoizard@gmail.com, "Kamailio (SER) - Users Mailing List" sr-users@lists.kamailio.org
*Subject: *Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.
Hello Aymeric,
would you be able to test with tls module compiled against libssl 1.1 and using the pre-loaded shared object workaround?
https://github.com/kamailio/kamailio/tree/master/src/modules/tls/utils/openssl_mutex_shared
You should be able to use it with any version, no need to test with kamailio master branch.
Just clone the master branch, then:
cd src/modules/tls/utils/openssl_mutex_shared
make
Use openssl_mutex_shared.so either from there or copy it to a location of your choice, then pre-load it when starting your version of Kamailio.
The README.md in the folder has some more details.
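As a rough illustration of the pre-load step, assuming "pre-load" here means the dynamic linker's `LD_PRELOAD` variable (the README remains the authoritative source): the helper name `preload_value` and all paths below are assumptions, not taken from the README.

```shell
# Build an LD_PRELOAD value that puts the given shared object first and
# keeps anything that was already being preloaded.
# $1: path to the shared object to preload.
preload_value() {
    if [ -n "${LD_PRELOAD:-}" ]; then
        printf '%s:%s\n' "$1" "$LD_PRELOAD"
    else
        printf '%s\n' "$1"
    fi
}

# Usage (paths are assumptions, adjust to your installation):
#   LD_PRELOAD="$(preload_value /usr/local/lib/openssl_mutex_shared.so)" \
#       /usr/sbin/kamailio -f /etc/kamailio/kamailio.cfg
```

Setting the variable only on the kamailio command line keeps the preload from leaking into other programs started from the same shell.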
I would like to have some validation that it works fine before approaching the libssl project about this topic, to ask for a way to init the locks with the process-shared option.
Thanks, Daniel
On 26.03.19 16:18, Daniel-Constantin Mierla wrote:
Hello,
yep, locking there is expected, as listing the tls connections waits until no other process is changing the content of the internal tls connection structures. So it is a side effect of libssl/libcrypto getting stuck and the other processes waiting for it to move on. I have the Kamailio training in the USA these days, so the trip and the schedule of the day didn't allow me to look more at the libssl/libcrypto code in order to find a solution here. It is a high priority on my list, as soon as I get time during the next days.
Cheers, Daniel
On 26.03.19 15:55, Aymeric Moizard wrote:
Hi All,
I was debugging a TCP issue (I will most probably start a separate thread for that question).
I was trying to get some info for TCP and TLS.
I typed:
$> sudo kamctl rpc tls.list
And waited for a while.... until... I realized that my User-Agent, connected over TCP, was not able to register any more. I think the rpc command triggered something wrong.
The device can successfully "connect" and send the REGISTER over the established TCP connection. The REGISTER does not appear in the logs any more, and I don't see any TCP traffic being processed any more. So the behavior is the same as I had before: TCP and TLS are both not working and UDP is still working fine.
kamctl does not work any more... so kamctl trap does not work either...
I have been able to type.. manually... for (all?) kamailio threads:
gdb /usr/sbin/kamailio 16500 -batch --eval-command="bt full" >> kamailio-trap-tcp-down.txt
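When kamctl itself hangs, the manual gdb step above can be looped over every kamailio process. A minimal sketch built around the same gdb invocation; the function name `dump_backtraces` and the `GDB` / `KAMAILIO_BIN` override variables are inventions for this example:

```shell
# Append a full backtrace of each given PID to one output file.
# $1: output file, remaining arguments: PIDs to attach to.
# GDB and KAMAILIO_BIN can be overridden; the defaults are assumptions.
dump_backtraces() {
    out=$1
    shift
    for pid in "$@"; do
        # Separator line so the per-process dumps can be told apart.
        printf '===== PID %s =====\n' "$pid" >> "$out"
        "${GDB:-gdb}" "${KAMAILIO_BIN:-/usr/sbin/kamailio}" "$pid" -batch \
            --eval-command="bt full" >> "$out" 2>&1
    done
}

# Usage (as root):
#   dump_backtraces kamailio-trap-tcp-down.txt $(pgrep -x kamailio)
```

Attaching stops each process briefly, so on a busy server the dump is best taken only while the problem is actually occurring.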
I'm temporarily putting the backtrace I have here:
https://sip.antisip.com/kamailio-trap-tcp-down.txt
You can see a thread stuck on the json command line: "tls_list"
And many others waiting on CRYPTO_THREAD_write_lock
This might be related to: https://github.com/openssl/openssl/issues/5376
SIDE NOTE:
Right before I typed the last gdb command for the last thread, kamailio crashed. This was around 5 minutes after the deadlock started.
Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core> [core/tcp_main.c:2561]: tcpconn_do_send(): failed to send on 0x7ff8dfc2fdc8 (91.121.30.149:5061->62.210.97.21:49351): Broken pipe (32)
Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core> [core/tcp_read.c:1505]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0x7ff8dfc2fdc8 r: 0x7ff8dfc2fe48 (-1)
Mar 26 14:47:11 sip kamailio[16493]: WARNING: <core> [core/tcp_read.c:1848]: handle_io(): F_TCPCONN connection marked as bad: 0x7ff8dfa6a408 id 846 refcnt 3
Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core> [main.c:755]: handle_sigs(): child process 16374 exited by a signal 11
Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core> [main.c:758]: handle_sigs(): core was not generated
Mar 26 14:47:11 sip kamailio[16371]: INFO: <core> [main.c:781]: handle_sigs(): terminating due to SIGCHLD
Mar 26 14:47:11 sip kamailio[16493]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 26 14:47:11 sip kamailio[16500]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Mar 26 14:47:11 sip kamailio[16479]: INFO: <core> [main.c:836]: sig_usr(): signal 15 received
Unfortunately, even though I did my best to set up my service to generate a core on crash, I still get "core was not generated" (Debian stretch).
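For the "core was not generated" message, a few generic Linux settings are worth checking on the running process. The sketch below only prints them; the helper names are hypothetical, and none of these settings is confirmed as the cause in this case. It assumes a Linux /proc filesystem.

```shell
# Soft core-size limit, parsed from a /proc/<pid>/limits style file.
# A value of 0 means the kernel will not write a core file.
core_soft_limit() {
    awk '/^Max core file size/ { print $5 }' "$1"
}

# Print the settings that commonly prevent a core dump for one PID.
check_core_setup() {
    pid=$1
    echo "soft core limit: $(core_soft_limit "/proc/$pid/limits")"
    # A relative core_pattern is resolved against the working directory,
    # which must then be writable by the user the process runs as.
    echo "core_pattern:    $(cat /proc/sys/kernel/core_pattern)"
    echo "working dir:     $(readlink "/proc/$pid/cwd")"
    # 0 disables core dumps for processes that changed their credentials.
    echo "suid_dumpable:   $(cat /proc/sys/fs/suid_dumpable)"
}
```

For example, check_core_setup 16371 would inspect the main process from the log above while it is still running.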
Tks for reading!
Regards
Aymeric
Hello,
this kind of behaviour, blocking for a long time and then moving on, is a symptom of the same issue. One of the observed behaviours was that attaching with gdb and then detaching makes the code run further, and that is what kamctl trap does. I haven't looked deeper, but my guess is that some signals are sent during the gdb operations.
It would be good if you could test with the workaround and see the results. There was already a report that the issue had not reappeared after a rather long running time.
Cheers, Daniel
On 17.05.19 16:03, Aymeric Moizard wrote:
Hi!
I haven't used the workaround yet: I'm focusing on trying to make sure I have the same issue or trying to figure out how to force it to happen.
I started checking the server again today, beginning with this command: $> sudo kamcmd tls.list
In my previous report, this command ended in a deadlock. Today it finally completed, but only after 5 minutes (which I suspect is abnormal).
During the long-running command:
-> UDP was working
-> TCP was not: the TCP connection was ESTABLISHED, but the SIP message was not answered (this is the behavior I had before)
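To see whether the kernel is still accepting data that Kamailio never reads (the growing Recv-Q reported earlier in the thread), the "netstat -nt" output can be filtered for the SIP port. This is a rough sketch; stuck_tcp is a hypothetical helper name, and the column positions assume the usual Linux netstat layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State).

```shell
# Read "netstat -nt" output on stdin and print "peer recv-q" for every TCP
# connection on the given local port whose Recv-Q exceeds the threshold.
stuck_tcp() {
    awk -v port="$1" -v min="$2" '
        $1 ~ /^tcp/ && $4 ~ (":" port "$") && $2 + 0 > min + 0 { print $5, $2 }
    '
}
```

For example: netstat -nt | stuck_tcp 5060 0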
At the same time (during the deadlock), I took a trap with "sudo kamctl trap":
-> one thread is in "tls_list" (tls_rpc.c:154)
-> one thread is in tcpconn_get (core/tcp_main.c:1449), called from tcp_send (core/tcp_main.c:1716), and seems to be sending a 484 Address Incomplete on a TLS connection
-> 2 threads are in CRYPTO_THREAD_write_lock, with a backtrace showing "SSL_do_handshake/tls_accept"
Suddenly, "sudo kamcmd tls.list" completed, and then my TCP agent received 4 answers from kamailio for the last 4 REGISTERs sent.
I have a network capture for my TCP agent. I have a trap showing 2 threads waiting on "CRYPTO_THREAD_write_lock".
Conclusion: this case showed that the lock was held for a VERY long time, but also that it was TEMPORARY.
Side note: from my understanding of the multi-fork/openssl issue, I would expect the deadlock to happen very soon after a kamailio restart?
Do you expect the preload workaround to help with this behavior? Or do you consider that my issue is different?
Because there is no "real" deadlock, I don't understand why "my" issue would be related to libssl1.1...
My gdb trap and network capture are available privately if you need them (please ask me by direct email).
Tks Aymeric
On Mon, 13 May 2019 at 12:48, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
Hello,
thanks for the feedback! It is good to know that it works well for you so far. I don't see any reason not to make the preload library part of the next release.
Just to let everyone know: for now, the built packages are pinned to link against libssl 1.0.x. Soon I will approach the openssl project in order to find a proper long-term solution.
Cheers, Daniel
On 13.05.19 10:48, Floimair Florian wrote:
Hi all!
We have used the workaround with the pre-loaded library, and so far this seems to have fixed our problem (the one my colleague Kristijan Vrban reported). At least we did not have a single failure within the last week, whereas before the issue happened about once every 2 days.
It would be nice if this were part of the next Kamailio version.
With best regards
Florian Floimair
Innovation - Software-Development
COMMEND INTERNATIONAL GMBH
A-5020 Salzburg, Saalachstraße 51
http://www.commend.com
Security and Communication by Commend
FN 178618z | LG Salzburg
From: sr-users <sr-users-bounces@lists.kamailio.org> on behalf of Daniel-Constantin Mierla <miconda@gmail.com>
Reply-To: "miconda@gmail.com" <miconda@gmail.com>, "Kamailio (SER) - Users Mailing List" <sr-users@lists.kamailio.org>
Date: Monday, 15 April 2019 at 09:07
To: Aymeric Moizard <amoizard@gmail.com>, "Kamailio (SER) - Users Mailing List" <sr-users@lists.kamailio.org>
Subject: Re: [SR-Users] Kamailio stop to process incoming SIP traffic via TCP.
Hello Aymeric,
would you be able to test with the tls module compiled against libssl 1.1 and using the pre-loaded shared object workaround?
https://github.com/kamailio/kamailio/tree/master/src/modules/tls/utils/openssl_mutex_shared
You should be able to use it with any version, no need to test with kamailio master branch.
Just clone the master branch, then:
cd src/modules/tls/utils/openssl_mutex_shared
make
Either use openssl_mutex_shared.so from there or copy it to a location of your choice, then pre-load it before starting your version of Kamailio. The README.md in the folder has some more details.
I would like to have some validation that it works fine before approaching the libssl project about allowing the locks to be initialized with the process-shared option.
Thanks, Daniel
On 26.03.19 16:18, Daniel-Constantin Mierla wrote:
Hello,
yep, locking there is expected, as listing the tls connections waits until no other process is changing the content of the internal tls connection structures. So it is a side effect of libssl/libcrypto getting stuck and the other processes waiting for it to move on.
I have the Kamailio training in the USA these days, so the trip and the daily schedule didn't allow me to look further at the libssl/libcrypto code in order to find a solution here. It is high priority on my list, as soon as I get time during the next days.
Cheers, Daniel
On 26.03.19 15:55, Aymeric Moizard wrote:
Hi All,
I was debugging a TCP issue (most probably, I may start a thread for this question). I was trying to get some info for TCP and TLS. I typed:
$> sudo kamctl rpc tls.list
And waited for a while... until I realized that my User-Agent, connected over TCP, was not able to register any more. I think the rpc command has introduced something wrong.
-- Antisip - http://www.antisip.com
Tks Daniel,
I have installed the workaround.
lsof seems to indicate that I have installed and pre-loaded openssl_mutex_shared.so correctly.
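As an alternative to lsof, the pre-load can be checked through /proc on Linux. The sketch below is illustrative only: the install path and the helper name is_preloaded are assumptions, and the start command in the comment must be adapted to however Kamailio is launched on your system.

```shell
# Hypothetical install location of the library built from
# src/modules/tls/utils/openssl_mutex_shared; adjust to your system.
PRELOAD_LIB=/usr/local/lib/openssl_mutex_shared.so

# Succeeds if the library (second argument) appears in a
# /proc/<pid>/maps style file (first argument).
is_preloaded() {
    grep -qF "$(basename "$2")" "$1"
}

# Pre-loading happens through the dynamic linker's LD_PRELOAD variable, e.g.:
#   LD_PRELOAD="$PRELOAD_LIB" /usr/sbin/kamailio <your usual options>
#
# Afterwards, every Kamailio process should have the library mapped:
#   for pid in $(pgrep -x kamailio); do
#       is_preloaded "/proc/$pid/maps" "$PRELOAD_LIB" || echo "$pid: NOT preloaded"
#   done
```

Checking every process, not just the main one, matters here because the locks in question are shared between the forked processes.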
I will let you know if I see the issue again.
Tks! Aymeric
On Mon, 20 May 2019 at 09:49, Daniel-Constantin Mierla miconda@gmail.com wrote:
Hello,
this kind of behaviour, blocking for a long time and then moving on, is a symptom of the same issue. One of the observed behaviours was that attaching with gdb and then detaching makes the code run further, and that is what kamctl trap does. I haven't looked deeper, but my guess is that some signals are sent during the gdb operations.
It would be good if you could test with the workaround and see the results. There was already a report that the issue had not reappeared after a rather long running time.
Cheers, Daniel