[SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

Joel Serrano joel at textplus.com
Fri Mar 22 21:19:40 CET 2019


Hi Aymeric,

Are you sure the issue is with TCP and not strictly related to TLS? I
highly suggest you compile with ssl1.0 and give it a try...

If you want to read how I got to that conclusion:
https://github.com/kamailio/kamailio/issues/1172

Hope it helps!
Joel.







On Fri, Mar 22, 2019 at 11:58 AM Aymeric Moizard <amoizard at gmail.com> wrote:

> Hi Daniel,
>
> Tks for the tips.
>
> My traffic does include TLS as well.
>
> For TCP settings:
>
> tcp_connection_lifetime=3600
> tcp_async=yes
> tcp_rd_buf_size=16384
> tcp_accept_no_cl=yes
> tcp_max_connections=50000
> tcp_connect_timeout=7
>
> For TLS:
> enable_tls=yes
> tls_max_connections=50000
>
> I'm using "set_forward_no_connect();" after lookup(location) since a long
> time.
>
> I have added this week "set_reply_no_connect();" in case it will help to
> avoid the issue.
>
> If the issue occurs, I will try to get something via "kamctrl trap".
>
> In order to get a coredump (on restart timeout?) I have added this in
> my kamailio.service
>
> WorkingDirectory=/var/run/kamailio
> LimitCORE=infinity
>
> I have also DUMP_CORE=yes in /etc/default/kamailio
> and disable_core_dump=no in my kamailio.cfg
>
> However, I'm not able to see any core dumps when restarting kamailio
> even when I see "sig_alarm_abort(): shutdown timeout triggered, dying"...
>
> Am I supposed to get a core dump in such case?
>
> Tks a lot!
> Aymeric
>
> Le ven. 22 mars 2019 à 14:19, Daniel-Constantin Mierla <miconda at gmail.com>
> a écrit :
>
>> Do you have pure tcp traffic and facing this issue, or there are actually
>> tls connections?
>>
>> What are the values for core parameters related to tcp connect and tcp
>> send timeouts?
>>
>> As for restart taking long, see exit_timeout parameter:
>>
>>   * https://www.kamailio.org/wiki/cookbooks/5.2.x/core#exit_timeout
>>
>> As for tls with libssl1.1/libcrypto1.1, I think I discover what the issue
>> is. With v1.1 they use their own internal locking functions, not exposing
>> any api to set them from outside. Before, kamailio was initializing the
>> libray telling to use Kamailio locks, giving one lock per connection. As i
>> could get from some gdb traces I received, with libssl 1.1, the same
>> internal lock is used for when attempting to connect to different addresses
>> as well as when trying to write to different connections. If one operation
>> is slow for what so ever reason, the others are waiting for the lock to be
>> lifted by the slow operation. I am digging in the source code of libssl1.1
>> to figure out a solution, it can still take a bit because I am travelling
>> for several days with no much spare time.
>>
>> Among the tunnings would be lower timeouts to connect and send, do not
>> attempt to connect unless you are sure the target expects new connections
>> (e.g., sending to a gateway/sip server accepting traffic via tls, but don't
>> do it even for the requests routed via lookup(location) as the registration
>> is using a connection with an ephemeral source port and trying to connect
>> back to it will fail). If still a major issue for what so ever reason,
>> using a version compiled with libssl1.0 would be something to go for it.
>>
>> Cheers,
>> Daniel
>> On 21.03.19 19:17, Aymeric Moizard wrote:
>>
>> Hi List,
>>
>> I want to share that I also met this issue last week with my kamailio
>> 5.2.2.
>>
>> As far as I was able to see, SIP application were able to "connect()"
>> with TCP, but my logs wasn't reporting any of the SIP message received
>> with TCP.
>>
>> I have an pike right before an xlog showing every incoming request.
>> However
>> I suspect the issue was not related to pike module. The log didn't showed
>> unusual
>> number of blocked traffic.
>>
>> I'm almost sure I haven't reached any ulimit restrictions.
>> I have many TCP, UDP childreen...
>> Server was not under high load
>> Nothing unusual.
>>
>> I'm running the default build for debian stretch from here:
>>    http://deb.kamailio.org/kamailio52 stretch
>>
>> And unfortunatly, I had some tiny pressure to restart the service so I was
>> not able to get deeper into the issue.
>>
>> If I'm correct, I will certainly improve much things by using
>> "set_reply_no_connect()".
>> I have added it and restarted!
>> (Tks Daniel for this tip!)
>>
>> I have been looking at issue reported here:
>> "Kamailio 5.0.2 suddenly stops processing traffic, then generates a core
>> when restarting."
>> https://github.com/kamailio/kamailio/issues/1172
>>
>> I have to say that I do have libssl1.1.
>> And I do have crash when I restart my kamailio. (even when I simply
>> restart after a configuration modification)
>>
>> Mar 21 18:28:50 sip kamailio[19222]: INFO: <core> [main.c:836]:
>> sig_usr(): signal 15 received
>> Mar 21 18:28:50 sip kamailio[19175]: ERROR: ctl [ctl.c:390]:
>> mod_destroy(): ERROR: ctl: could not delete unix socket
>> /var/run/kamailio/kamailio_ctl: Permission denied (13)
>>
>> [... one minute without nothing...]
>>
>> Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs
>> [jsonrpcs_fifo.c:599]: jsonrpc_fifo_destroy(): FIFO stat failed: Permission
>> denied
>> Mar 21 18:29:42 sip kamailio[19175]: ERROR: jsonrpcs
>> [jsonrpcs_sock.c:516]: jsonrpc_dgram_destroy(): socket stat failed:
>> Permission denied
>> Mar 21 18:29:50 sip kamailio[19175]: CRITICAL: <core> [main.c:662]:
>> sig_alarm_abort(): shutdown timeout triggered, dying...
>>
>> As the 1172 issue is closed, should I expect kamailio to still have
>> trouble with libssl1.1?
>>
>> I just restarted again my service (to see if it restart better after 30
>> minutes only instead of a week)
>>
>> Mar 21 19:07:30 sip kamailio[28737]: INFO: <core> [main.c:836]:
>> sig_usr(): signal 15 received
>> Mar 21 19:07:31 sip kamailio[28671]: ERROR: ctl [ctl.c:390]:
>> mod_destroy(): ERROR: ctl: could not delete unix socket
>> /var/run/kamailio/kamailio_ctl: Permission denied (13)
>>
>> [... one minute without nothing...]
>>
>> Mar 21 19:08:30 sip kamailio[28671]: CRITICAL: <core> [main.c:662]:
>> sig_alarm_abort(): shutdown timeout triggered, dying...
>>
>> Still not able to restart in a clean way!
>> Tks!
>> Regards
>> Aymeric
>>
>>
>> Le mer. 20 mars 2019 à 15:08, Daniel-Constantin Mierla <miconda at gmail.com>
>> a écrit :
>>
>>> Hello,
>>>
>>> based on the trap output I think I could figure out what happened there.
>>>
>>> You have tcp_children to very low value (1 or so), the problem is not
>>> actually that one, but the fact that the connection to upstream (the
>>> device/app sending the request) was closed after receiving the request
>>> and routing of the reply gets stuck in the way of:
>>>
>>>   - a reply is received and has to be forwarded
>>>   - connection was lost, so Kamailio tries to establish a new one, but
>>> takes time till fails because the upstream is behind nat or so based on
>>> the via header:
>>>
>>> Via: SIP/2.0/TLS
>>> 10.1.0.4:10002
>>> ;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
>>>
>>>   - the reply is retransmitted and gets to another worker, which tries
>>> to forward it again, but discovers a connection structure for that
>>> destination exists (created by previous reply worker) and now waits for
>>> the connection to be released (or better said, for the mutex on writing
>>> buffer to be unlocked)
>>>
>>>   - as the second reply waits, there can be other retransmissions of the
>>> reply ending up in other workers stuck on waiting for the mutex of the
>>> connection write buffer
>>>
>>> The solution here is to use set_reply_no_connect() -- you can put it
>>> first in request_route block. I think this would be a good addition to
>>> the default configuration file as well, IMO, the sip server should not
>>> connect for sending replies and should do it also for requests that go
>>> behind nat.
>>>
>>> Cheers,
>>> Daniel
>>>
>>> On 19.03.19 10:53, Kristijan Vrban wrote:
>>> > So i had again the situation. But this time, incoming udp was
>>> > affected. Kamailio was sending out OPTIONS (via dispatcher module) to
>>> > a group of asterisk machines
>>> > but the 200 OK reply to the OPTIONS where not processed, so the
>>> > dispatcher module set all asterisk to inactive, even though they
>>> > replied 200 OK
>>> >
>>> > Attached the output of kamctl trap during the situation. Hope there is
>>> > any useful in it. Because after "kamctl trap" it was working again
>>> > without kamailio restart.
>>> >
>>> > Best
>>> > Kristijan
>>> >
>>> > Am Mo., 18. März 2019 um 12:27 Uhr schrieb Daniel-Constantin Mierla
>>> > <miconda at gmail.com>:
>>> >> Hello,
>>> >>
>>> >> setting tcp_children=1 is not a god option for scallability,
>>> practically
>>> >> you set kamailio to process a single tcp message at one time, on high
>>> >> traffic, that won't work well.
>>> >>
>>> >> Maybe try to set tcp_children to 2 or 4, that should make an eventual
>>> >> race appear faster.
>>> >>
>>> >> Regarding the pid, if it is an outgoing connection, then it can be
>>> >> created by any worker process, including a UDP worker, if that was the
>>> >> one receiving the sip message over udp and sends it out via tcp.
>>> >>
>>> >> Cheers,
>>> >> Daniel
>>> >>
>>> >> On 18.03.19 10:09, Kristijan Vrban wrote:
>>> >>> Hi Daniel,
>>> >>>
>>> >>> for testing, i now had set: "tcp_children=1" and so far this issue
>>> did not occur
>>> >>> ever since. So now value to provide for "kamctl trap" yet.
>>> >>>
>>> >>> "kamctl ps" show this two process to handle tcp:
>>> >>>
>>> >>> ...
>>> >>>     }, {
>>> >>>       "IDX":  25,
>>> >>>       "PID":  71929,
>>> >>>       "DSC":  "tcp receiver (generic) child=0"
>>> >>>     }, {
>>> >>>       "IDX":  26,
>>> >>>       "PID":  71933,
>>> >>>       "DSC":  "tcp main process"
>>> >>>     }
>>> >>> ...
>>> >>>
>>> >>>
>>> >>> Ok, but then is was wondering to see a TCP connection on a udp
>>> receiver child:
>>> >>>
>>> >>>
>>> >>> netstat -ntp |grep 5061
>>> >>>
>>> >>> ...
>>> >>> tcp        0      0 172.17.217.10:5061      195.70.114.125:18252
>>> >>> ESTABLISHED 71895/kamailio
>>> >>> ...
>>> >>>
>>> >>> An pid 71895 is:
>>> >>>
>>> >>> }, {
>>> >>>       "IDX":  3,
>>> >>>       "PID":  71895,
>>> >>>       "DSC":  "udp receiver child=2 sock=127.0.0.1:5060"
>>> >>>     }, {
>>> >>>
>>> >>>
>>> >>>
>>> >>> And if i look into it via "lsof -p 71895" (the udp receiver child)
>>> >>>
>>> >>> ...
>>> >>> kamailio 71895 kamailio   14u  sock                0,9      0t0
>>> >>> 8856085 protocol: TCP
>>> >>> kamailio 71895 kamailio   15u  sock                0,9      0t0
>>> >>> 8886886 protocol: TCP
>>> >>> kamailio 71895 kamailio   16u  sock                0,9      0t0
>>> >>> 8854886 protocol: TCP
>>> >>> kamailio 71895 kamailio   17u  sock                0,9      0t0
>>> >>> 8828915 protocol: TCP
>>> >>> kamailio 71895 kamailio   18u  unix 0x000000005f73cb91      0t0
>>> >>> 1680314 type=DGRAM
>>> >>> kamailio 71895 kamailio   19u  IPv4            1846523      0t0
>>> >>> TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
>>> >>> kamailio 71895 kamailio   20u  sock                0,9      0t0
>>> >>> 8887192 protocol: TCP
>>> >>> kamailio 71895 kamailio   21u  sock                0,9      0t0
>>> >>> 8813634 protocol: TCP
>>> >>> kamailio 71895 kamailio   22u  unix 0x00000000c19bd102      0t0
>>> >>> 1681407 type=STREAM
>>> >>> kamailio 71895 kamailio   23u  sock                0,9      0t0
>>> >>> 8850488 protocol: TCP
>>> >>> ...
>>> >>>
>>> >>> Not only the ESTABLISHED TCP session. But also this empty sockets
>>> >>> "protocol: TCP"
>>> >>> What are they doing there in the udp receiver? Is that how it's
>>> supposed to be?
>>> >>>
>>> >>> Kristijan
>>> >>>
>>> >>> Am Do., 14. März 2019 um 14:48 Uhr schrieb Daniel-Constantin Mierla
>>> >>> <miconda at gmail.com>:
>>> >>>> Can you get file written by `kamctl trap`? It should have the
>>> backtrace
>>> >>>> for all kamailio processes. You need latest kamailio 5.2.
>>> >>>>
>>> >>>> Also, get the output for: kamctl ps
>>> >>>>
>>> >>>> Cheers,
>>> >>>> Daniel
>>> >>>>
>>> >>>> On 14.03.19 13:52, Kristijan Vrban wrote:
>>> >>>>> When i attach via gdb to one of the tcp worker, i see this:
>>> >>>>>
>>> >>>>> (gdb) bt
>>> >>>>> #0  0x00007fdaf4d14470 in futex_wait (private=<optimized out>,
>>> >>>>> expected=1, futex_word=0x7fdaeca92f8c) at
>>> >>>>> ../sysdeps/unix/sysv/linux/futex-internal.h:61
>>> >>>>> #1  futex_wait_simple (private=<optimized out>, expected=1,
>>> >>>>> futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135
>>> >>>>> #2  __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at
>>> >>>>> pthread_rwlock_wrlock.c:67
>>> >>>>> #3  0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from
>>> >>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>> >>>>> #4  0x00007fdaf08e1c08 in ?? () from
>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>> >>>>> #5  0x00007fdaf08a6f69 in ?? () from
>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>> >>>>> #6  0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from
>>> >>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>> >>>>> #7  0x00007fdaf0c31144 in ?? () from
>>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>> >>>>> #8  0x00007fdaf0c2bddb in ?? () from
>>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>> >>>>> #9  0x00007fdaf0c22858 in ?? () from
>>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>> >>>>> #10 0x00007fdaf0c1af61 in SSL_do_handshake () from
>>> >>>>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>> >>>>> #11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98,
>>> >>>>> error=0x7ffffe2a2df0) at tls_server.c:422
>>> >>>>> #12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98,
>>> >>>>> flags=0x7ffffe2c318c) at tls_server.c:1116
>>> >>>>> #13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98,
>>> >>>>> read_flags=0x7ffffe2c318c) at core/tcp_read.c:469
>>> >>>>> #14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98,
>>> >>>>> bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at
>>> >>>>> core/tcp_read.c:1496
>>> >>>>> #15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1,
>>> >>>>> idx=-1) at core/tcp_read.c:1862
>>> >>>>> #16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0
>>> <io_w>,
>>> >>>>> t=2, repeat=0) at core/io_wait.h:1065
>>> >>>>> #17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at
>>> >>>>> core/tcp_read.c:1974
>>> >>>>> #18 0x0000556ead4c8e24 in tcp_init_children () at
>>> core/tcp_main.c:4853
>>> >>>>> #19 0x0000556ead3c352a in main_loop () at main.c:1735
>>> >>>>> #20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at
>>> main.c:2675
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> Am Do., 14. März 2019 um 13:41 Uhr schrieb Kristijan Vrban
>>> >>>>> <vrban.lkml at gmail.com>:
>>> >>>>>> Hi, with full debug is see this in log for every incoming TCP SIP
>>> request:
>>> >>>>>>
>>> >>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG:
>>> >>>>>> <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp
>>> >>>>>> receiver, connection passed to the least busy one (105)
>>> >>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG:
>>> >>>>>> <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker 2
>>> >>>>>> 27(17937) for activity on [tls:172.17.217.10:5061],
>>> 0x7fdaeda8f928
>>> >>>>>>
>>> >>>>>> So the Kamailio TCP process is working, and received TCP traffic.
>>> But
>>> >>>>>> the tcp workers are somehow busy.
>>> >>>>>>
>>> >>>>>> When i attach via strace to the TCP worker, i do not see any
>>> activity. Just:
>>> >>>>>>
>>> >>>>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL
>>> >>>>>>
>>> >>>>>> and nothing, even when i see the main tcp process choose this
>>> worker process.
>>> >>>>>>
>>> >>>>>> Kristijan
>>> >>>>>>
>>> >>>>>> Am Mi., 27. Feb. 2019 um 15:14 Uhr schrieb Kristijan Vrban
>>> >>>>>> <vrban.lkml at gmail.com>:
>>> >>>>>>> first of all thanks for the feedback. i prepared our system now
>>> to run
>>> >>>>>>> with debug=3
>>> >>>>>>> I hope to see more then then.
>>> >>>>>>>
>>> >>>>>>> Am Mi., 27. Feb. 2019 um 11:53 Uhr schrieb Kristijan Vrban
>>> >>>>>>> <vrban.lkml at gmail.com>:
>>> >>>>>>>> Hi kamailios,
>>> >>>>>>>>
>>> >>>>>>>> i have a creepy situation with v5.2.1 stable Kamilio. After a
>>> day or
>>> >>>>>>>> so, Kamailio stop to process incoming SIP traffic via TCP. The
>>> >>>>>>>> incoming TCP network packages get TCP-ACK from the OS (Debian 9,
>>> >>>>>>>> 4.18.0-15-generic-Linux) but Kamailio does not show any
>>> processing for
>>> >>>>>>>> the SIP-Traffic incoming via TCP. No logs, nothing. While
>>> traffic via
>>> >>>>>>>> UDP is working just totally fine.
>>> >>>>>>>>
>>> >>>>>>>> When i look via command "netstat -ntp" is see, that the Recv-Q
>>> get
>>> >>>>>>>> bigger and bigger. e.g.:
>>> >>>>>>>>
>>> >>>>>>>> Proto Recv-Q Send-Q Local Address Foreign Address State
>>> PID/Program
>>> >>>>>>>> name tcp 4566 0 172.17.217.12:5060 xxx.xxx.xxx.xxx:57252
>>> ESTABLISHED
>>> >>>>>>>> 31347/kamailio
>>> >>>>>>>>
>>> >>>>>>>> After Kamailio restart, all is working fine again for a day. We
>>> have
>>> >>>>>>>> maybe 10-20 devices online via TCP and low call volume (1-2
>>> call per
>>> >>>>>>>> minute). The only settings for tcp we have is
>>> "tcp_delayed_ack=no"
>>> >>>>>>>>
>>> >>>>>>>> How to could we debug this situation? Again, no error, no
>>> warings in
>>> >>>>>>>> the log. Just nothing.
>>> >>>>>>>>
>>> >>>>>>>> Kristijan
>>> >>>>> _______________________________________________
>>> >>>>> Kamailio (SER) - Users Mailing List
>>> >>>>> sr-users at lists.kamailio.org
>>> >>>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>>> >>>> --
>>> >>>> Daniel-Constantin Mierla -- www.asipto.com
>>> >>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>> >>>> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
>>> >>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC,
>>> USA -- www.asipto.com
>>> >>>>
>>> >>> _______________________________________________
>>> >>> Kamailio (SER) - Users Mailing List
>>> >>> sr-users at lists.kamailio.org
>>> >>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>>> >> --
>>> >> Daniel-Constantin Mierla -- www.asipto.com
>>> >> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>> >> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
>>> >> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA
>>> -- www.asipto.com
>>> >>
>>> --
>>> Daniel-Constantin Mierla -- www.asipto.com
>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA --
>>> www.asipto.com
>>>
>>>
>>> _______________________________________________
>>> Kamailio (SER) - Users Mailing List
>>> sr-users at lists.kamailio.org
>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>>>
>>
>>
>> --
>> Antisip - http://www.antisip.com
>>
>> --
>> Daniel-Constantin Mierla -- www.asipto.comwww.twitter.com/miconda -- www.linkedin.com/in/miconda
>> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com
>>
>>
>
> --
> Antisip - http://www.antisip.com
> _______________________________________________
> Kamailio (SER) - Users Mailing List
> sr-users at lists.kamailio.org
> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.kamailio.org/pipermail/sr-users/attachments/20190322/5e3956e1/attachment.html>


More information about the sr-users mailing list