[SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

Aymeric Moizard amoizard at gmail.com
Thu Mar 21 19:17:20 CET 2019


Hi List,

I want to share that I also met this issue last week with my kamailio 5.2.2.

As far as I was able to see, SIP application were able to "connect()"
with TCP, but my logs wasn't reporting any of the SIP message received
with TCP.

I have an pike right before an xlog showing every incoming request. However
I suspect the issue was not related to pike module. The log didn't showed
unusual
number of blocked traffic.

I'm almost sure I haven't reached any ulimit restrictions.
I have many TCP, UDP childreen...
Server was not under high load
Nothing unusual.

I'm running the default build for debian stretch from here:
   http://deb.kamailio.org/kamailio52 stretch

And unfortunatly, I had some tiny pressure to restart the service so I was
not able to get deeper into the issue.

If I'm correct, I will certainly improve much things by using
"set_reply_no_connect()".
I have added it and restarted!
(Tks Daniel for this tip!)

I have been looking at issue reported here:
"Kamailio 5.0.2 suddenly stops processing traffic, then generates a core
when restarting."
https://github.com/kamailio/kamailio/issues/1172

I have to say that I do have libssl1.1.
And I do have crash when I restart my kamailio. (even when I simply restart
after a configuration modification)

Mar 21 18:28:50 sip kamailio[19222]: INFO: <core> [main.c:836]: sig_usr():
signal 15 received
Mar 21 18:28:50 sip kamailio[19175]: ERROR: ctl [ctl.c:390]: mod_destroy():
ERROR: ctl: could not delete unix socket /var/run/kamailio/kamailio_ctl:
Permission denied (13)

[... one minute without nothing...]

Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs [jsonrpcs_fifo.c:599]:
jsonrpc_fifo_destroy(): FIFO stat failed: Permission denied
Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs [jsonrpcs_sock.c:516]:
jsonrpc_dgram_destroy(): socket stat failed: Permission denied
Mar 21 18:29:50 sip kamailio[19175]: CRITICAL: <core> [main.c:662]:
sig_alarm_abort(): shutdown timeout triggered, dying...

As the 1172 issue is closed, should I expect kamailio to still have trouble
with libssl1.1?

I just restarted again my service (to see if it restart better after 30
minutes only instead of a week)

Mar 21 19:07:30 sip kamailio[28737]: INFO: <core> [main.c:836]: sig_usr():
signal 15 received
Mar 21 19:07:31 sip kamailio[28671]: ERROR: ctl [ctl.c:390]: mod_destroy():
ERROR: ctl: could not delete unix socket /var/run/kamailio/kamailio_ctl:
Permission denied (13)

[... one minute without nothing...]

Mar 21 19:08:30 sip kamailio[28671]: CRITICAL: <core> [main.c:662]:
sig_alarm_abort(): shutdown timeout triggered, dying...

Still not able to restart in a clean way!
Tks!
Regards
Aymeric


Le mer. 20 mars 2019 à 15:08, Daniel-Constantin Mierla <miconda at gmail.com>
a écrit :

> Hello,
>
> based on the trap output I think I could figure out what happened there.
>
> You have tcp_children to very low value (1 or so), the problem is not
> actually that one, but the fact that the connection to upstream (the
> device/app sending the request) was closed after receiving the request
> and routing of the reply gets stuck in the way of:
>
>   - a reply is received and has to be forwarded
>   - connection was lost, so Kamailio tries to establish a new one, but
> takes time till fails because the upstream is behind nat or so based on
> the via header:
>
> Via: SIP/2.0/TLS
> 10.1.0.4:10002
> ;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
>
>   - the reply is retransmitted and gets to another worker, which tries
> to forward it again, but discovers a connection structure for that
> destination exists (created by previous reply worker) and now waits for
> the connection to be released (or better said, for the mutex on writing
> buffer to be unlocked)
>
>   - as the second reply waits, there can be other retransmissions of the
> reply ending up in other workers stuck on waiting for the mutex of the
> connection write buffer
>
> The solution here is to use set_reply_no_connect() -- you can put it
> first in request_route block. I think this would be a good addition to
> the default configuration file as well, IMO, the sip server should not
> connect for sending replies and should do it also for requests that go
> behind nat.
>
> Cheers,
> Daniel
>
> On 19.03.19 10:53, Kristijan Vrban wrote:
> > So i had again the situation. But this time, incoming udp was
> > affected. Kamailio was sending out OPTIONS (via dispatcher module) to
> > a group of asterisk machines
> > but the 200 OK reply to the OPTIONS where not processed, so the
> > dispatcher module set all asterisk to inactive, even though they
> > replied 200 OK
> >
> > Attached the output of kamctl trap during the situation. Hope there is
> > any useful in it. Because after "kamctl trap" it was working again
> > without kamailio restart.
> >
> > Best
> > Kristijan
> >
> > Am Mo., 18. März 2019 um 12:27 Uhr schrieb Daniel-Constantin Mierla
> > <miconda at gmail.com>:
> >> Hello,
> >>
> >> setting tcp_children=1 is not a god option for scallability, practically
> >> you set kamailio to process a single tcp message at one time, on high
> >> traffic, that won't work well.
> >>
> >> Maybe try to set tcp_children to 2 or 4, that should make an eventual
> >> race appear faster.
> >>
> >> Regarding the pid, if it is an outgoing connection, then it can be
> >> created by any worker process, including a UDP worker, if that was the
> >> one receiving the sip message over udp and sends it out via tcp.
> >>
> >> Cheers,
> >> Daniel
> >>
> >> On 18.03.19 10:09, Kristijan Vrban wrote:
> >>> Hi Daniel,
> >>>
> >>> for testing, i now had set: "tcp_children=1" and so far this issue did
> not occur
> >>> ever since. So now value to provide for "kamctl trap" yet.
> >>>
> >>> "kamctl ps" show this two process to handle tcp:
> >>>
> >>> ...
> >>>     }, {
> >>>       "IDX":  25,
> >>>       "PID":  71929,
> >>>       "DSC":  "tcp receiver (generic) child=0"
> >>>     }, {
> >>>       "IDX":  26,
> >>>       "PID":  71933,
> >>>       "DSC":  "tcp main process"
> >>>     }
> >>> ...
> >>>
> >>>
> >>> Ok, but then is was wondering to see a TCP connection on a udp
> receiver child:
> >>>
> >>>
> >>> netstat -ntp |grep 5061
> >>>
> >>> ...
> >>> tcp        0      0 172.17.217.10:5061      195.70.114.125:18252
> >>> ESTABLISHED 71895/kamailio
> >>> ...
> >>>
> >>> An pid 71895 is:
> >>>
> >>> }, {
> >>>       "IDX":  3,
> >>>       "PID":  71895,
> >>>       "DSC":  "udp receiver child=2 sock=127.0.0.1:5060"
> >>>     }, {
> >>>
> >>>
> >>>
> >>> And if i look into it via "lsof -p 71895" (the udp receiver child)
> >>>
> >>> ...
> >>> kamailio 71895 kamailio   14u  sock                0,9      0t0
> >>> 8856085 protocol: TCP
> >>> kamailio 71895 kamailio   15u  sock                0,9      0t0
> >>> 8886886 protocol: TCP
> >>> kamailio 71895 kamailio   16u  sock                0,9      0t0
> >>> 8854886 protocol: TCP
> >>> kamailio 71895 kamailio   17u  sock                0,9      0t0
> >>> 8828915 protocol: TCP
> >>> kamailio 71895 kamailio   18u  unix 0x000000005f73cb91      0t0
> >>> 1680314 type=DGRAM
> >>> kamailio 71895 kamailio   19u  IPv4            1846523      0t0
> >>> TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
> >>> kamailio 71895 kamailio   20u  sock                0,9      0t0
> >>> 8887192 protocol: TCP
> >>> kamailio 71895 kamailio   21u  sock                0,9      0t0
> >>> 8813634 protocol: TCP
> >>> kamailio 71895 kamailio   22u  unix 0x00000000c19bd102      0t0
> >>> 1681407 type=STREAM
> >>> kamailio 71895 kamailio   23u  sock                0,9      0t0
> >>> 8850488 protocol: TCP
> >>> ...
> >>>
> >>> Not only the ESTABLISHED TCP session. But also this empty sockets
> >>> "protocol: TCP"
> >>> What are they doing there in the udp receiver? Is that how it's
> supposed to be?
> >>>
> >>> Kristijan
> >>>
> >>> Am Do., 14. März 2019 um 14:48 Uhr schrieb Daniel-Constantin Mierla
> >>> <miconda at gmail.com>:
> >>>> Can you get file written by `kamctl trap`? It should have the
> backtrace
> >>>> for all kamailio processes. You need latest kamailio 5.2.
> >>>>
> >>>> Also, get the output for: kamctl ps
> >>>>
> >>>> Cheers,
> >>>> Daniel
> >>>>
> >>>> On 14.03.19 13:52, Kristijan Vrban wrote:
> >>>>> When i attach via gdb to one of the tcp worker, i see this:
> >>>>>
> >>>>> (gdb) bt
> >>>>> #0  0x00007fdaf4d14470 in futex_wait (private=<optimized out>,
> >>>>> expected=1, futex_word=0x7fdaeca92f8c) at
> >>>>> ../sysdeps/unix/sysv/linux/futex-internal.h:61
> >>>>> #1  futex_wait_simple (private=<optimized out>, expected=1,
> >>>>> futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135
> >>>>> #2  __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at
> >>>>> pthread_rwlock_wrlock.c:67
> >>>>> #3  0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from
> >>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> >>>>> #4  0x00007fdaf08e1c08 in ?? () from
> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> >>>>> #5  0x00007fdaf08a6f69 in ?? () from
> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> >>>>> #6  0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from
> >>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> >>>>> #7  0x00007fdaf0c31144 in ?? () from
> /usr/lib/x86_64-linux-gnu/libssl.so.1.1
> >>>>> #8  0x00007fdaf0c2bddb in ?? () from
> /usr/lib/x86_64-linux-gnu/libssl.so.1.1
> >>>>> #9  0x00007fdaf0c22858 in ?? () from
> /usr/lib/x86_64-linux-gnu/libssl.so.1.1
> >>>>> #10 0x00007fdaf0c1af61 in SSL_do_handshake () from
> >>>>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1
> >>>>> #11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98,
> >>>>> error=0x7ffffe2a2df0) at tls_server.c:422
> >>>>> #12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98,
> >>>>> flags=0x7ffffe2c318c) at tls_server.c:1116
> >>>>> #13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98,
> >>>>> read_flags=0x7ffffe2c318c) at core/tcp_read.c:469
> >>>>> #14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98,
> >>>>> bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at
> >>>>> core/tcp_read.c:1496
> >>>>> #15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1,
> >>>>> idx=-1) at core/tcp_read.c:1862
> >>>>> #16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0
> <io_w>,
> >>>>> t=2, repeat=0) at core/io_wait.h:1065
> >>>>> #17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at
> >>>>> core/tcp_read.c:1974
> >>>>> #18 0x0000556ead4c8e24 in tcp_init_children () at
> core/tcp_main.c:4853
> >>>>> #19 0x0000556ead3c352a in main_loop () at main.c:1735
> >>>>> #20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at
> main.c:2675
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> Am Do., 14. März 2019 um 13:41 Uhr schrieb Kristijan Vrban
> >>>>> <vrban.lkml at gmail.com>:
> >>>>>> Hi, with full debug is see this in log for every incoming TCP SIP
> request:
> >>>>>>
> >>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG:
> >>>>>> <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp
> >>>>>> receiver, connection passed to the least busy one (105)
> >>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG:
> >>>>>> <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker 2
> >>>>>> 27(17937) for activity on [tls:172.17.217.10:5061], 0x7fdaeda8f928
> >>>>>>
> >>>>>> So the Kamailio TCP process is working, and received TCP traffic.
> But
> >>>>>> the tcp workers are somehow busy.
> >>>>>>
> >>>>>> When i attach via strace to the TCP worker, i do not see any
> activity. Just:
> >>>>>>
> >>>>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL
> >>>>>>
> >>>>>> and nothing, even when i see the main tcp process choose this
> worker process.
> >>>>>>
> >>>>>> Kristijan
> >>>>>>
> >>>>>> Am Mi., 27. Feb. 2019 um 15:14 Uhr schrieb Kristijan Vrban
> >>>>>> <vrban.lkml at gmail.com>:
> >>>>>>> first of all thanks for the feedback. i prepared our system now to
> run
> >>>>>>> with debug=3
> >>>>>>> I hope to see more then then.
> >>>>>>>
> >>>>>>> Am Mi., 27. Feb. 2019 um 11:53 Uhr schrieb Kristijan Vrban
> >>>>>>> <vrban.lkml at gmail.com>:
> >>>>>>>> Hi kamailios,
> >>>>>>>>
> >>>>>>>> i have a creepy situation with v5.2.1 stable Kamilio. After a day
> or
> >>>>>>>> so, Kamailio stop to process incoming SIP traffic via TCP. The
> >>>>>>>> incoming TCP network packages get TCP-ACK from the OS (Debian 9,
> >>>>>>>> 4.18.0-15-generic-Linux) but Kamailio does not show any
> processing for
> >>>>>>>> the SIP-Traffic incoming via TCP. No logs, nothing. While traffic
> via
> >>>>>>>> UDP is working just totally fine.
> >>>>>>>>
> >>>>>>>> When i look via command "netstat -ntp" is see, that the Recv-Q get
> >>>>>>>> bigger and bigger. e.g.:
> >>>>>>>>
> >>>>>>>> Proto Recv-Q Send-Q Local Address Foreign Address State
> PID/Program
> >>>>>>>> name tcp 4566 0 172.17.217.12:5060 xxx.xxx.xxx.xxx:57252
> ESTABLISHED
> >>>>>>>> 31347/kamailio
> >>>>>>>>
> >>>>>>>> After Kamailio restart, all is working fine again for a day. We
> have
> >>>>>>>> maybe 10-20 devices online via TCP and low call volume (1-2 call
> per
> >>>>>>>> minute). The only settings for tcp we have is "tcp_delayed_ack=no"
> >>>>>>>>
> >>>>>>>> How to could we debug this situation? Again, no error, no warings
> in
> >>>>>>>> the log. Just nothing.
> >>>>>>>>
> >>>>>>>> Kristijan
> >>>>> _______________________________________________
> >>>>> Kamailio (SER) - Users Mailing List
> >>>>> sr-users at lists.kamailio.org
> >>>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
> >>>> --
> >>>> Daniel-Constantin Mierla -- www.asipto.com
> >>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
> >>>> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
> >>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA
> -- www.asipto.com
> >>>>
> >>> _______________________________________________
> >>> Kamailio (SER) - Users Mailing List
> >>> sr-users at lists.kamailio.org
> >>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
> >> --
> >> Daniel-Constantin Mierla -- www.asipto.com
> >> www.twitter.com/miconda -- www.linkedin.com/in/miconda
> >> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
> >> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA --
> www.asipto.com
> >>
> --
> Daniel-Constantin Mierla -- www.asipto.com
> www.twitter.com/miconda -- www.linkedin.com/in/miconda
> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA --
> www.asipto.com
>
>
> _______________________________________________
> Kamailio (SER) - Users Mailing List
> sr-users at lists.kamailio.org
> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>


-- 
Antisip - http://www.antisip.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.kamailio.org/pipermail/sr-users/attachments/20190321/878520d3/attachment.html>


More information about the sr-users mailing list