[SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

Kristijan Vrban vrban.lkml at gmail.com
Tue Mar 19 10:53:09 CET 2019


So i had again the situation. But this time, incoming udp was
affected. Kamailio was sending out OPTIONS (via dispatcher module) to
a group of asterisk machines
but the 200 OK reply to the OPTIONS where not processed, so the
dispatcher module set all asterisk to inactive, even though they
replied 200 OK

Attached the output of kamctl trap during the situation. Hope there is
any useful in it. Because after "kamctl trap" it was working again
without kamailio restart.

Best
Kristijan

Am Mo., 18. März 2019 um 12:27 Uhr schrieb Daniel-Constantin Mierla
<miconda at gmail.com>:
>
> Hello,
>
> setting tcp_children=1 is not a god option for scallability, practically
> you set kamailio to process a single tcp message at one time, on high
> traffic, that won't work well.
>
> Maybe try to set tcp_children to 2 or 4, that should make an eventual
> race appear faster.
>
> Regarding the pid, if it is an outgoing connection, then it can be
> created by any worker process, including a UDP worker, if that was the
> one receiving the sip message over udp and sends it out via tcp.
>
> Cheers,
> Daniel
>
> On 18.03.19 10:09, Kristijan Vrban wrote:
> > Hi Daniel,
> >
> > for testing, i now had set: "tcp_children=1" and so far this issue did not occur
> > ever since. So now value to provide for "kamctl trap" yet.
> >
> > "kamctl ps" show this two process to handle tcp:
> >
> > ...
> >     }, {
> >       "IDX":  25,
> >       "PID":  71929,
> >       "DSC":  "tcp receiver (generic) child=0"
> >     }, {
> >       "IDX":  26,
> >       "PID":  71933,
> >       "DSC":  "tcp main process"
> >     }
> > ...
> >
> >
> > Ok, but then is was wondering to see a TCP connection on a udp receiver child:
> >
> >
> > netstat -ntp |grep 5061
> >
> > ...
> > tcp        0      0 172.17.217.10:5061      195.70.114.125:18252
> > ESTABLISHED 71895/kamailio
> > ...
> >
> > An pid 71895 is:
> >
> > }, {
> >       "IDX":  3,
> >       "PID":  71895,
> >       "DSC":  "udp receiver child=2 sock=127.0.0.1:5060"
> >     }, {
> >
> >
> >
> > And if i look into it via "lsof -p 71895" (the udp receiver child)
> >
> > ...
> > kamailio 71895 kamailio   14u  sock                0,9      0t0
> > 8856085 protocol: TCP
> > kamailio 71895 kamailio   15u  sock                0,9      0t0
> > 8886886 protocol: TCP
> > kamailio 71895 kamailio   16u  sock                0,9      0t0
> > 8854886 protocol: TCP
> > kamailio 71895 kamailio   17u  sock                0,9      0t0
> > 8828915 protocol: TCP
> > kamailio 71895 kamailio   18u  unix 0x000000005f73cb91      0t0
> > 1680314 type=DGRAM
> > kamailio 71895 kamailio   19u  IPv4            1846523      0t0
> > TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
> > kamailio 71895 kamailio   20u  sock                0,9      0t0
> > 8887192 protocol: TCP
> > kamailio 71895 kamailio   21u  sock                0,9      0t0
> > 8813634 protocol: TCP
> > kamailio 71895 kamailio   22u  unix 0x00000000c19bd102      0t0
> > 1681407 type=STREAM
> > kamailio 71895 kamailio   23u  sock                0,9      0t0
> > 8850488 protocol: TCP
> > ...
> >
> > Not only the ESTABLISHED TCP session. But also this empty sockets
> > "protocol: TCP"
> > What are they doing there in the udp receiver? Is that how it's supposed to be?
> >
> > Kristijan
> >
> > Am Do., 14. März 2019 um 14:48 Uhr schrieb Daniel-Constantin Mierla
> > <miconda at gmail.com>:
> >> Can you get file written by `kamctl trap`? It should have the backtrace
> >> for all kamailio processes. You need latest kamailio 5.2.
> >>
> >> Also, get the output for: kamctl ps
> >>
> >> Cheers,
> >> Daniel
> >>
> >> On 14.03.19 13:52, Kristijan Vrban wrote:
> >>> When i attach via gdb to one of the tcp worker, i see this:
> >>>
> >>> (gdb) bt
> >>> #0  0x00007fdaf4d14470 in futex_wait (private=<optimized out>,
> >>> expected=1, futex_word=0x7fdaeca92f8c) at
> >>> ../sysdeps/unix/sysv/linux/futex-internal.h:61
> >>> #1  futex_wait_simple (private=<optimized out>, expected=1,
> >>> futex_word=0x7fdaeca92f8c) at ../sysdeps/nptl/futex-internal.h:135
> >>> #2  __pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at
> >>> pthread_rwlock_wrlock.c:67
> >>> #3  0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock () from
> >>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> >>> #4  0x00007fdaf08e1c08 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> >>> #5  0x00007fdaf08a6f69 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> >>> #6  0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from
> >>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> >>> #7  0x00007fdaf0c31144 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
> >>> #8  0x00007fdaf0c2bddb in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
> >>> #9  0x00007fdaf0c22858 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
> >>> #10 0x00007fdaf0c1af61 in SSL_do_handshake () from
> >>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1
> >>> #11 0x00007fdaf0e8d31b in tls_accept (c=0x7fdaed26fa98,
> >>> error=0x7ffffe2a2df0) at tls_server.c:422
> >>> #12 0x00007fdaf0e96a1b in tls_read_f (c=0x7fdaed26fa98,
> >>> flags=0x7ffffe2c318c) at tls_server.c:1116
> >>> #13 0x0000556ead5e7c46 in tcp_read_headers (c=0x7fdaed26fa98,
> >>> read_flags=0x7ffffe2c318c) at core/tcp_read.c:469
> >>> #14 0x0000556ead5ef9cb in tcp_read_req (con=0x7fdaed26fa98,
> >>> bytes_read=0x7ffffe2c3184, read_flags=0x7ffffe2c318c) at
> >>> core/tcp_read.c:1496
> >>> #15 0x0000556ead5f575f in handle_io (fm=0x7fdaf597aa98, events=1,
> >>> idx=-1) at core/tcp_read.c:1862
> >>> #16 0x0000556ead5e2053 in io_wait_loop_epoll (h=0x556eadaaeec0 <io_w>,
> >>> t=2, repeat=0) at core/io_wait.h:1065
> >>> #17 0x0000556ead5f6b35 in tcp_receive_loop (unix_sock=49) at
> >>> core/tcp_read.c:1974
> >>> #18 0x0000556ead4c8e24 in tcp_init_children () at core/tcp_main.c:4853
> >>> #19 0x0000556ead3c352a in main_loop () at main.c:1735
> >>> #20 0x0000556ead3ca5f8 in main (argc=13, argv=0x7ffffe2c3828) at main.c:2675
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> Am Do., 14. März 2019 um 13:41 Uhr schrieb Kristijan Vrban
> >>> <vrban.lkml at gmail.com>:
> >>>> Hi, with full debug is see this in log for every incoming TCP SIP request:
> >>>>
> >>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG:
> >>>> <core> [core/tcp_main.c:3871]: send2child(): WARNING: no free tcp
> >>>> receiver, connection passed to the least busy one (105)
> >>>> Mar 14 12:10:15 kamailio-preview /usr/sbin/kamailio[17940]: DEBUG:
> >>>> <core> [core/tcp_main.c:3875]: send2child(): selected tcp worker 2
> >>>> 27(17937) for activity on [tls:172.17.217.10:5061], 0x7fdaeda8f928
> >>>>
> >>>> So the Kamailio TCP process is working, and received TCP traffic. But
> >>>> the tcp workers are somehow busy.
> >>>>
> >>>> When i attach via strace to the TCP worker, i do not see any activity. Just:
> >>>>
> >>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1, NULL
> >>>>
> >>>> and nothing, even when i see the main tcp process choose this worker process.
> >>>>
> >>>> Kristijan
> >>>>
> >>>> Am Mi., 27. Feb. 2019 um 15:14 Uhr schrieb Kristijan Vrban
> >>>> <vrban.lkml at gmail.com>:
> >>>>> first of all thanks for the feedback. i prepared our system now to run
> >>>>> with debug=3
> >>>>> I hope to see more then then.
> >>>>>
> >>>>> Am Mi., 27. Feb. 2019 um 11:53 Uhr schrieb Kristijan Vrban
> >>>>> <vrban.lkml at gmail.com>:
> >>>>>> Hi kamailios,
> >>>>>>
> >>>>>> i have a creepy situation with v5.2.1 stable Kamilio. After a day or
> >>>>>> so, Kamailio stop to process incoming SIP traffic via TCP. The
> >>>>>> incoming TCP network packages get TCP-ACK from the OS (Debian 9,
> >>>>>> 4.18.0-15-generic-Linux) but Kamailio does not show any processing for
> >>>>>> the SIP-Traffic incoming via TCP. No logs, nothing. While traffic via
> >>>>>> UDP is working just totally fine.
> >>>>>>
> >>>>>> When i look via command "netstat -ntp" is see, that the Recv-Q get
> >>>>>> bigger and bigger. e.g.:
> >>>>>>
> >>>>>> Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program
> >>>>>> name tcp 4566 0 172.17.217.12:5060 xxx.xxx.xxx.xxx:57252 ESTABLISHED
> >>>>>> 31347/kamailio
> >>>>>>
> >>>>>> After Kamailio restart, all is working fine again for a day. We have
> >>>>>> maybe 10-20 devices online via TCP and low call volume (1-2 call per
> >>>>>> minute). The only settings for tcp we have is "tcp_delayed_ack=no"
> >>>>>>
> >>>>>> How to could we debug this situation? Again, no error, no warings in
> >>>>>> the log. Just nothing.
> >>>>>>
> >>>>>> Kristijan
> >>> _______________________________________________
> >>> Kamailio (SER) - Users Mailing List
> >>> sr-users at lists.kamailio.org
> >>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
> >> --
> >> Daniel-Constantin Mierla -- www.asipto.com
> >> www.twitter.com/miconda -- www.linkedin.com/in/miconda
> >> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
> >> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com
> >>
> > _______________________________________________
> > Kamailio (SER) - Users Mailing List
> > sr-users at lists.kamailio.org
> > https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>
> --
> Daniel-Constantin Mierla -- www.asipto.com
> www.twitter.com/miconda -- www.linkedin.com/in/miconda
> Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
> Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gdb_kamailio_20190319_072411
Type: application/octet-stream
Size: 141063 bytes
Desc: not available
URL: <http://lists.kamailio.org/pipermail/sr-users/attachments/20190319/80efe21e/attachment.obj>


More information about the sr-users mailing list