[SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

Daniel-Constantin Mierla miconda at gmail.com
Mon Mar 25 14:27:33 CET 2019


Have you done a test with tools such as sipp, or was this happening
after a while, with usual phones registering?

Can you list the tcp connections and see if they are listed?

kamctl tcp core.tcp_list

Cheers,
Daniel

On 25.03.19 08:03, Kristijan Vrban wrote:
>> The solution here is to use set_reply_no_connect()
> implemented it. Now the issue has shifted to:
>
> ERROR: <core> [core/tcp_main.c:3959]: handle_new_connect(): maximum
> number of connections exceeded: 2048/2048
>
> But not a single TCP connection is active between Kamailio and any
> device. Seems this counter for maximum number of connections
> now has an issue?
>
> Kristijan
>
> Am Mi., 20. März 2019 um 15:07 Uhr schrieb Daniel-Constantin Mierla
> <miconda at gmail.com>:
>> Hello,
>>
>> based on the trap output I think I could figure out what happened there.
>>
>> You have tcp_children to very low value (1 or so), the problem is not
>> actually that one, but the fact that the connection to upstream (the
>> device/app sending the request) was closed after receiving the request
>> and routing of the reply gets stuck in the way of:
>>
>>   - a reply is received and has to be forwarded
>>   - connection was lost, so Kamailio tries to establish a new one, but
>> takes time till fails because the upstream is behind nat or so based on
>> the via header:
>>
>> Via: SIP/2.0/TLS
>> 10.1.0.4:10002;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
>>
>>   - the reply is retransmitted and gets to another worker, which tries
>> to forward it again, but discovers a connection structure for that
>> destination exists (created by previous reply worker) and now waits for
>> the connection to be released (or better said, for the mutex on writing
>> buffer to be unlocked)
>>
>>   - as the second reply waits, there can be other retransmissions of the
>> reply ending up in other workers stuck on waiting for the mutex of the
>> connection write buffer
>>
>> The solution here is to use set_reply_no_connect() -- you can put it
>> first in request_route block. I think this would be a good addition to
>> the default configuration file as well, IMO, the sip server should not
>> connect for sending replies and should do it also for requests that go
>> behind nat.
>>
>> Cheers,
>> Daniel
>>
>> On 19.03.19 10:53, Kristijan Vrban wrote:
>>> So i had again the situation. But this time, incoming udp was
>>> affected. Kamailio was sending out OPTIONS (via dispatcher module) to
>>> a group of asterisk machines
>>> but the 200 OK reply to the OPTIONS where not processed, so the
>>> dispatcher module set all asterisk to inactive, even though they
>>> replied 200 OK
>>>
>>> Attached the output of kamctl trap during the situation. Hope there is
>>> any useful in it. Because after "kamctl trap" it was working again
>>> without kamailio restart.
>>>
>>> Best
>>> Kristijan
>>>
>>> Am Mo., 18. März 2019 um 12:27 Uhr schrieb Daniel-Constantin Mierla
>>> <miconda at gmail.com>:
>>>> Hello,
>>>>
>>>> setting tcp_children=1 is not a god option for scallability, practically
>>>> you set kamailio to process a single tcp message at one time, on high
>>>> traffic, that won't work well.
>>>>
>>>> Maybe try to set tcp_children to 2 or 4, that should make an eventual
>>>> race appear faster.
>>>>
>>>> Regarding the pid, if it is an outgoing connection, then it can be
>>>> created by any worker process, including a UDP worker, if that was the
>>>> one receiving the sip message over udp and sends it out via tcp.
>>>>
>>>> Cheers,
>>>> Daniel
>>>>
>>>> On 18.03.19 10:09, Kristijan Vrban wrote:
>>>>> Hi Daniel,
>>>>>
>>>>> for testing, i now had set: "tcp_children=1" and so far this issue did not occur
>>>>> ever since. So now value to provide for "kamctl trap" yet.
>>>>>
>>>>> "kamctl ps" show this two process to handle tcp:
>>>>>
>>>>> ...
>>>>>     }, {
>>>>>       "IDX":  25,
>>>>>       "PID":  71929,
>>>>>       "DSC":  "tcp receiver (generic) child=0"
>>>>>     }, {
>>>>>       "IDX":  26,
>>>>>       "PID":  71933,
>>>>>       "DSC":  "tcp main process"
>>>>>     }
>>>>> ...
>>>>>
>>>>>
>>>>> Ok, but then is was wondering to see a TCP connection on a udp receiver child:
>>>>>
>>>>>
>>>>> netstat -ntp |grep 5061
>>>>>
>>>>> ...
>>>>> tcp        0      0 172.17.217.10:5061      195.70.114.125:18252
>>>>> ESTABLISHED 71895/kamailio
>>>>> ...
>>>>>
>>>>> An pid 71895 is:
>>>>>
>>>>> }, {
>>>>>       "IDX":  3,
>>>>>       "PID":  71895,
>>>>>       "DSC":  "udp receiver child=2 sock=127.0.0.1:5060"
>>>>>     }, {
>>>>>
>>>>>
>>>>>
>>>>> And if i look into it via "lsof -p 71895" (the udp receiver child)
>>>>>
>>>>> ...
>>>>> kamailio 71895 kamailio   14u  sock                0,9      0t0
>>>>> 8856085 protocol: TCP
>>>>> kamailio 71895 kamailio   15u  sock                0,9      0t0
>>>>> 8886886 protocol: TCP
>>>>> kamailio 71895 kamailio   16u  sock                0,9      0t0
>>>>> 8854886 protocol: TCP
>>>>> kamailio 71895 kamailio   17u  sock                0,9      0t0
>>>>> 8828915 protocol: TCP
>>>>> kamailio 71895 kamailio   18u  unix 0x000000005f73cb91      0t0
>>>>> 1680314 type=DGRAM
>>>>> kamailio 71895 kamailio   19u  IPv4            1846523      0t0
>>>>> TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
>>>>> kamailio 71895 kamailio   20u  sock                0,9      0t0
>>>>> 8887192 protocol: TCP
>>>>> kamailio 71895 kamailio   21u  sock                0,9      0t0
>>>>> 8813634 protocol: TCP
>>>>> kamailio 71895 kamailio   22u  unix 0x00000000c19bd102      0t0
>>>>> 1681407 type=STREAM
>>>>> kamailio 71895 kamailio   23u  sock                0,9      0t0
>>>>> 8850488 protocol: TCP
>>>>> ...
>>>>>
>>>>> Not only the ESTABLISHED TCP session. But also this empty sockets
>>>>> "protocol: TCP"
>>>>> What are they doing there in the udp receiver? Is that how it's supposed to be?
>>>>>
>>>>> Kristijan
>>>>>
>>>>> Am Do., 14. März 2019 um 14:48 Uhr schrieb Daniel-Constantin Mierla
>>>>> <miconda at gmail.com>:
>>>>>> Can you get file written by `kamctl trap`? It should have the backtrace
>>>>>> for all kamailio processes. You need latest kamailio 5.2.
>>>>>>
>>>>>> Also, get the output for: kamctl ps
>>>>>>
>>>>>> Cheers,
>>>>>> Daniel
>>>>>>
>>>>>> On 14.03.19 13:52, Kristijan Vrban wrote:
>>>>>>> When i attach via gdb to one of the tcp worker, i see this:
>>>>>>>
>>>>>>> (gdb) bt
>>>>>>> #0  0x00007fdaf4d14470 in futex_wait (private=<optimized out>,
>>>>>>> expected=1, futex_word=0x7fdaeca92f8c) at
>>>>>>> ../sysdeps/unix/sysv/linux/futex-internal.h:61
>>>>>>> #1  futex_wait_simple (private=<optimized out>, expected=1,
>>>>>>> futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135
>>>>>>> #2  __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at
>>>>>>> pthread_rwlock_wrlock.c:67
>>>>>>> #3  0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from
>>>>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>>>>> #4  0x00007fdaf08e1c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>>>>> #5  0x00007fdaf08a6f69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>>>>> #6  0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from
>>>>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
>>>>>>> #7  0x00007fdaf0c31144 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>>>>> #8  0x00007fdaf0c2bddb in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>>>>> #9  0x00007fdaf0c22858 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>>>>> #10 0x00007fdaf0c1af61 in SSL_do_handshake () from
>>>>>>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1
>>>>>>> #11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98,
>>>>>>> error=0x7ffffe2a2df0) at tls_server.c:422
>>>>>>> #12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98,
>>>>>>> flags=0x7ffffe2c318c) at tls_server.c:1116
>>>>>>> #13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98,
>>>>>>> read_flags=0x7ffffe2c318c) at core/tcp_read.c:469
>>>>>>> #14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98,
>>>>>>> bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at
>>>>>>> core/tcp_read.c:1496
>>>>>>> #15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1,
>>>>>>> idx=-1) at core/tcp_read.c:1862
>>>>>>> #16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0 <io_w>,
>>>>>>> t=2, repeat=0) at core/io_wait.h:1065
>>>>>>> #17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at
>>>>>>> core/tcp_read.c:1974
>>>>>>> #18 0x0000556ead4c8e24 in tcp_init_children () at core/tcp_main.c:4853
>>>>>>> #19 0x0000556ead3c352a in main_loop () at main.c:1735
>>>>>>> #20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at main.c:2675
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Am Do., 14. März 2019 um 13:41 Uhr schrieb Kristijan Vrban
>>>>>>> <vrban.lkml at gmail.com>:
>>>>>>>> Hi, with full debug is see this in log for every incoming TCP SIP request:
>>>>>>>>
>>>>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG:
>>>>>>>> <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp
>>>>>>>> receiver, connection passed to the least busy one (105)
>>>>>>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG:
>>>>>>>> <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker 2
>>>>>>>> 27(17937) for activity on [tls:172.17.217.10:5061], 0x7fdaeda8f928
>>>>>>>>
>>>>>>>> So the Kamailio TCP process is working, and received TCP traffic. But
>>>>>>>> the tcp workers are somehow busy.
>>>>>>>>
>>>>>>>> When i attach via strace to the TCP worker, i do not see any activity. Just:
>>>>>>>>
>>>>>>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL
>>>>>>>>
>>>>>>>> and nothing, even when i see the main tcp process choose this worker process.
>>>>>>>>
>>>>>>>> Kristijan
>>>>>>>>
>>>>>>>> Am Mi., 27. Feb. 2019 um 15:14 Uhr schrieb Kristijan Vrban
>>>>>>>> <vrban.lkml at gmail.com>:
>>>>>>>>> first of all thanks for the feedback. i prepared our system now to run
>>>>>>>>> with debug=3
>>>>>>>>> I hope to see more then then.
>>>>>>>>>
>>>>>>>>> Am Mi., 27. Feb. 2019 um 11:53 Uhr schrieb Kristijan Vrban
>>>>>>>>> <vrban.lkml at gmail.com>:
>>>>>>>>>> Hi kamailios,
>>>>>>>>>>
>>>>>>>>>> i have a creepy situation with v5.2.1 stable Kamilio. After a day or
>>>>>>>>>> so, Kamailio stop to process incoming SIP traffic via TCP. The
>>>>>>>>>> incoming TCP network packages get TCP-ACK from the OS (Debian 9,
>>>>>>>>>> 4.18.0-15-generic-Linux) but Kamailio does not show any processing for
>>>>>>>>>> the SIP-Traffic incoming via TCP. No logs, nothing. While traffic via
>>>>>>>>>> UDP is working just totally fine.
>>>>>>>>>>
>>>>>>>>>> When i look via command "netstat -ntp" is see, that the Recv-Q get
>>>>>>>>>> bigger and bigger. e.g.:
>>>>>>>>>>
>>>>>>>>>> Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program
>>>>>>>>>> name tcp 4566 0 172.17.217.12:5060 xxx.xxx.xxx.xxx:57252 ESTABLISHED
>>>>>>>>>> 31347/kamailio
>>>>>>>>>>
>>>>>>>>>> After Kamailio restart, all is working fine again for a day. We have
>>>>>>>>>> maybe 10-20 devices online via TCP and low call volume (1-2 call per
>>>>>>>>>> minute). The only settings for tcp we have is "tcp_delayed_ack=no"
>>>>>>>>>>
>>>>>>>>>> How to could we debug this situation? Again, no error, no warings in
>>>>>>>>>> the log. Just nothing.
>>>>>>>>>>
>>>>>>>>>> Kristijan
>>>>>>> _______________________________________________
>>>>>>> Kamailio (SER) - Users Mailing List
>>>>>>> sr-users at lists.kamailio.org
>>>>>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>>>>>> --
>>>>>> Daniel-Constantin Mierla -- www.asipto.com
>>>>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>>>>> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
>>>>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com
>>>>>>
>>>>> _______________________________________________
>>>>> Kamailio (SER) - Users Mailing List
>>>>> sr-users at lists.kamailio.org
>>>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>>>> --
>>>> Daniel-Constantin Mierla -- www.asipto.com
>>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>>> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
>>>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com
>>>>
>> --
>> Daniel-Constantin Mierla -- www.asipto.com
>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
>> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com
>>
-- 
Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com




More information about the sr-users mailing list