Hi All,
I was debugging a TCP issue (most probably, I may start a thread for this
question).
I was trying to get some info for TCP and TLS.
I typed:
$> sudo kamctl rpc tls.list
And waited for a while.... until... I realized that my User-Agent,
connected with TCP was not able to register any more. I think the rpc
command has introduced something wrong.
The device can successfully "connect", send the REGISTER over the
established TCP connection. The REGISTER do not appear in the logs any
more, I don't see any traffic for TCP any more. So the behavior is the same
as I had before: TCP and TLS are both not working and UDP is still working
fine.
kamctl do not work any more... so kamctl trap do not work...
I have been able to type.. manually... for (all?) kamailio threads:
gdb /usr/sbin/kamailio 16500 -batch --eval-command="bt full" >>
kamailio-trap-tcp-down.txt
I'm temporarly puting the backtrace I have here:
You can see a thread stuck on the json command line: "tls_list"
And many other waiting on CRYPTO_THREAD_write_lock
? might be related to:
SIDE NOTE:
Right before I was typing the last gdb command for the last thread, kamailio
has crashed: This was around 5 minutes after the dead lock started.
Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core> [core/tcp_main.c:2561]:
tcpconn_do_send(): failed to send on 0x7ff8dfc2fdc8 (91.121.30.149:5061->
62.210.97.21:49351): Broken pipe (32)
Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core> [core/tcp_read.c:1505]:
tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0x7ff8dfc2fdc8 r:
0x7ff8dfc2fe48 (-1)
Mar 26 14:47:11 sip kamailio[16493]: WARNING: <core>
[core/tcp_read.c:1848]: handle_io(): F_TCPCONN connection marked as bad:
0x7ff8dfa6a408 id 846 refcnt 3
Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core> [main.c:755]:
handle_sigs(): child process 16374 exited by a signal 11
Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core> [main.c:758]:
handle_sigs(): core was not generated
Mar 26 14:47:11 sip kamailio[16371]: INFO: <core> [main.c:781]:
handle_sigs(): terminating due to SIGCHLD
Mar 26 14:47:11 sip kamailio[16493]: INFO: <core> [main.c:836]: sig_usr():
signal 15 received
Mar 26 14:47:11 sip kamailio[16500]: INFO: <core> [main.c:836]: sig_usr():
signal 15 received
Mar 26 14:47:11 sip kamailio[16479]: INFO: <core> [main.c:836]: sig_usr():
signal 15 received
Unfortunalty, even if I did my best to setup my service to generate a core
on crash, I still have "core was not generated".... (debian stretch)
Tks for reading!
Regards
Aymeric
Le mar. 26 mars 2019 à 14:11, Kristijan Vrban <vrban.lkml(a)gmail.com> a
écrit :
And again one more kamctl trap file where
set_reply_no_connect was set.
Am Di., 26. März 2019 um 08:53 Uhr schrieb Kristijan Vrban
<vrban.lkml(a)gmail.com>om>:
Attached also the output of kamctl trap
Am Di., 26. März 2019 um 08:42 Uhr schrieb Kristijan Vrban
<vrban.lkml(a)gmail.com>om>:
>
> > Have you done a test with tools such as sipp, or was this happening
> > after a while, with usual phones registering?
>
> Usual variety of devices registering via TLS. But i can not exclude
> that some devices displaying behavioural problems.
>
> > Can you list the tcp connections and see if they are listed?
> > kamctl tcp core.tcp_list
>
> Need Kex module for that? So i can deliver next time. But when i do
> "lsof -u kamailio |grep TCP"
> i get a long list of more then 2000 lines with:
>
> ...
> kamailio 37561 kamailio 2105u sock 0,9 0t0
> 27856287 protocol: TCP
> kamailio 37561 kamailio 2106u sock 0,9 0t0
> 27856305 protocol: TCP
> kamailio 37561 kamailio 2107u sock 0,9 0t0
> 27856306 protocol: TCP
> kamailio 37561 kamailio 2108u sock 0,9 0t0
> 27856914 protocol: TCP
> ...
>
> So about the time Kamailio created a lot of socket in the TCP domain,
> but which are not bound to any port (eg via connect(2) or listen(2) or
> bind(2))
> Until we get to the maximum number of 2048 connections.
>
> Best
> Kristijan
>
> Am Mo., 25. März 2019 um 14:27 Uhr schrieb Daniel-Constantin Mierla
> <miconda(a)gmail.com>om>:
> >
> > Have you done a test with tools such as sipp, or was this happening
> > after a while, with usual phones registering?
> >
> > Can you list the tcp connections and see if they are listed?
> >
> > kamctl tcp core.tcp_list
> >
> > Cheers,
> > Daniel
> >
> > On 25.03.19 08:03, Kristijan Vrban wrote:
> > >> The solution here is to use set_reply_no_connect()
> > > implemented it. Now the issue has shifted to:
> >
> > > ERROR:
<core> [core/tcp_main.c:3959]: handle_new_connect(): maximum
> > > number of connections exceeded: 2048/2048
> >
> > > But not a single
TCP connection is active between Kamailio and any
> > > device. Seems this counter for maximum number of connections
> > > now has an issue?
> >
> > > Kristijan
> >
> > > Am Mi., 20. März
2019 um 15:07 Uhr schrieb Daniel-Constantin Mierla
> > > <miconda(a)gmail.com>om>:
> > >> Hello,
> > >>
> > >> based on the trap output I think I could figure out what happened
there.
> > >>
> > >> You have tcp_children to very low value (1 or so), the problem is
not
> > >> actually that one, but the
fact that the connection to upstream
(the
> > >> device/app sending the
request) was closed after receiving the
request
> > >> and routing of the reply gets
stuck in the way of:
> > >>
> > >> - a reply is received and has to be forwarded
> > >> - connection was lost, so Kamailio tries to establish a new
one, but
> > >> takes time till fails because
the upstream is behind nat or so
based on
> > >> the via header:
> > >>
> > >> Via: SIP/2.0/TLS
> > >> 10.1.0.4:10002
;rport=55229;received=13.94.188.218;branch=z9hG4bK-3336-7f2927bfd703ae907348edff3611bfc9
> > >>
> > >> - the reply is retransmitted and gets to another worker, which
tries
> > >> to forward it again, but
discovers a connection structure for that
> > >> destination exists (created by previous reply worker) and now
waits for
> > >> the connection to be released
(or better said, for the mutex on
writing
> > >> buffer to be unlocked)
> > >>
> > >> - as the second reply waits, there can be other retransmissions
of the
> > >> reply ending up in other
workers stuck on waiting for the mutex
of the
> > >> connection write buffer
> > >>
> > >> The solution here is to use set_reply_no_connect() -- you can put
it
> > >> first in request_route block.
I think this would be a good
addition to
> > >> the default configuration file
as well, IMO, the sip server
should not
> > >> connect for sending replies
and should do it also for requests
that go
> > >> behind nat.
> > >>
> > >> Cheers,
> > >> Daniel
> > >>
> > >> On 19.03.19 10:53, Kristijan Vrban wrote:
> > >>> So i had again the situation. But this time, incoming udp was
> > >>> affected. Kamailio was sending out OPTIONS (via dispatcher
module) to
> > >>> a group of asterisk
machines
> > >>> but the 200 OK reply to the OPTIONS where not processed, so the
> > >>> dispatcher module set all asterisk to inactive, even though they
> > >>> replied 200 OK
> > >>>
> > >>> Attached the output of kamctl trap during the situation. Hope
there is
> > >>> any useful in it. Because
after "kamctl trap" it was working
again
> > >>> without kamailio restart.
> > >>>
> > >>> Best
> > >>> Kristijan
> > >>>
> > >>> Am Mo., 18. März 2019 um 12:27 Uhr schrieb Daniel-Constantin
Mierla
> > >>>
<miconda(a)gmail.com>om>:
> > >>>> Hello,
> > >>>>
> > >>>> setting tcp_children=1 is not a god option for scallability,
practically
> > >>>> you set kamailio to
process a single tcp message at one time,
on high
> > >>>> traffic, that
won't work well.
> > >>>>
> > >>>> Maybe try to set tcp_children to 2 or 4, that should make an
eventual
> > >>>> race appear faster.
> > >>>>
> > >>>> Regarding the pid, if it is an outgoing connection, then it
can
be
> > >>>> created by any worker
process, including a UDP worker, if that
was the
> > >>>> one receiving the sip
message over udp and sends it out via tcp.
> > >>>>
> > >>>> Cheers,
> > >>>> Daniel
> > >>>>
> > >>>> On 18.03.19 10:09, Kristijan Vrban wrote:
> > >>>>> Hi Daniel,
> > >>>>>
> > >>>>> for testing, i now had set: "tcp_children=1" and
so far this
issue did not occur
> > >>>>> ever since. So now
value to provide for "kamctl trap" yet.
> > >>>>>
> > >>>>> "kamctl ps" show this two process to handle
tcp:
> > >>>>>
> > >>>>> ...
> > >>>>> }, {
> > >>>>> "IDX": 25,
> > >>>>> "PID": 71929,
> > >>>>> "DSC": "tcp receiver (generic)
child=0"
> > >>>>> }, {
> > >>>>> "IDX": 26,
> > >>>>> "PID": 71933,
> > >>>>> "DSC": "tcp main process"
> > >>>>> }
> > >>>>> ...
> > >>>>>
> > >>>>>
> > >>>>> Ok, but then is was wondering to see a TCP connection on a
udp
receiver child:
> > >>>>>
> > >>>>>
> > >>>>> netstat -ntp |grep 5061
> > >>>>>
> > >>>>> ...
> > >>>>> tcp 0 0 172.17.217.10:5061
195.70.114.125:18252
> > >>>>> ESTABLISHED
71895/kamailio
> > >>>>> ...
> > >>>>>
> > >>>>> An pid 71895 is:
> > >>>>>
> > >>>>> }, {
> > >>>>> "IDX": 3,
> > >>>>> "PID": 71895,
> > >>>>> "DSC": "udp receiver child=2
sock=127.0.0.1:5060"
> > >>>>> }, {
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> And if i look into it via "lsof -p 71895" (the
udp receiver
child)
> > >>>>>
> > >>>>> ...
> > >>>>> kamailio 71895 kamailio 14u sock 0,9
0t0
> > >>>>> 8856085 protocol: TCP
> > >>>>> kamailio 71895 kamailio 15u sock 0,9
0t0
> > >>>>> 8886886 protocol: TCP
> > >>>>> kamailio 71895 kamailio 16u sock 0,9
0t0
> > >>>>> 8854886 protocol: TCP
> > >>>>> kamailio 71895 kamailio 17u sock 0,9
0t0
> > >>>>> 8828915 protocol: TCP
> > >>>>> kamailio 71895 kamailio 18u unix 0x000000005f73cb91
0t0
> > >>>>> 1680314 type=DGRAM
> > >>>>> kamailio 71895 kamailio 19u IPv4 1846523
0t0
> > >>>>> TCP kamailio-preview:sip-tls->XXX:18252 (ESTABLISHED)
> > >>>>> kamailio 71895 kamailio 20u sock 0,9
0t0
> > >>>>> 8887192 protocol: TCP
> > >>>>> kamailio 71895 kamailio 21u sock 0,9
0t0
> > >>>>> 8813634 protocol: TCP
> > >>>>> kamailio 71895 kamailio 22u unix 0x00000000c19bd102
0t0
> > >>>>> 1681407 type=STREAM
> > >>>>> kamailio 71895 kamailio 23u sock 0,9
0t0
> > >>>>> 8850488 protocol: TCP
> > >>>>> ...
> > >>>>>
> > >>>>> Not only the ESTABLISHED TCP session. But also this empty
sockets
> > >>>>> "protocol:
TCP"
> > >>>>> What are they doing there in the udp receiver? Is that how
it's supposed to be?
> > >>>>>
> > >>>>> Kristijan
> > >>>>>
> > >>>>> Am Do., 14. März 2019 um 14:48 Uhr schrieb
Daniel-Constantin
Mierla
> > >>>>>
<miconda(a)gmail.com>om>:
> > >>>>>> Can you get file written by `kamctl trap`? It should
have the
backtrace
> > >>>>>> for all
kamailio processes. You need latest kamailio 5.2.
> > >>>>>>
> > >>>>>> Also, get the output for: kamctl ps
> > >>>>>>
> > >>>>>> Cheers,
> > >>>>>> Daniel
> > >>>>>>
> > >>>>>> On 14.03.19 13:52, Kristijan Vrban wrote:
> > >>>>>>> When i attach via gdb to one of the tcp worker, i
see this:
> > >>>>>>>
> > >>>>>>> (gdb) bt
> > >>>>>>> #0 0x00007fdaf4d14470 in futex_wait
(private=<optimized
out>,
> > >>>>>>>
expected=1, futex_word=0x7fdaeca92f8c) at
> > >>>>>>> ../sysdeps/unix/sysv/linux/futex-internal.h:61
> > >>>>>>> #1 futex_wait_simple (private=<optimized
out>, expected=1,
> > >>>>>>> futex_word=0x7fdaeca92f8c) at
../sysdeps/nptl/futex-internal.h:135
> > >>>>>>> #2
__pthread_rwlock_wrlock_slow (rwlock=0x7fdaeca92f80) at
> > >>>>>>> pthread_rwlock_wrlock.c:67
> > >>>>>>> #3 0x00007fdaf0912ee9 in CRYPTO_THREAD_write_lock
() from
> > >>>>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> > >>>>>>> #4 0x00007fdaf08e1c08 in ?? () from
/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> > >>>>>>> #5
0x00007fdaf08a6f69 in ?? () from
/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> > >>>>>>> #6
0x00007fdaf08b36c7 in EVP_CIPHER_CTX_ctrl () from
> > >>>>>>> /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> > >>>>>>> #7 0x00007fdaf0c31144 in ?? () from
/usr/lib/x86_64-linux-gnu/libssl.so.1.1
> > >>>>>>> #8
0x00007fdaf0c2bddb in ?? () from
/usr/lib/x86_64-linux-gnu/libssl.so.1.1
> > >>>>>>> #9
0x00007fdaf0c22858 in ?? () from
/usr/lib/x86_64-linux-gnu/libssl.so.1.1
> > >>>>>>> #10
0x00007fdaf0c1af61 in SSL_do_handshake () from
> > >>>>>>> /usr/lib/x86_64-linux-gnu/libssl.so.1.1
> > >>>>>>> #11 0x00007fdaf0e8d31b in tls_accept
(c=0x7fdaed26fa98,
> > >>>>>>> error=0x7ffffe2a2df0) at tls_server.c:422
> > >>>>>>> #12 0x00007fdaf0e96a1b in tls_read_f
(c=0x7fdaed26fa98,
> > >>>>>>> flags=0x7ffffe2c318c) at tls_server.c:1116
> > >>>>>>> #13 0x0000556ead5e7c46 in tcp_read_headers
(c=0x7fdaed26fa98,
> > >>>>>>> read_flags=0x7ffffe2c318c) at core/tcp_read.c:469
> > >>>>>>> #14 0x0000556ead5ef9cb in tcp_read_req
(con=0x7fdaed26fa98,
> > >>>>>>> bytes_read=0x7ffffe2c3184,
read_flags=0x7ffffe2c318c) at
> > >>>>>>> core/tcp_read.c:1496
> > >>>>>>> #15 0x0000556ead5f575f in handle_io
(fm=0x7fdaf597aa98,
events=1,
> > >>>>>>> idx=-1) at
core/tcp_read.c:1862
> > >>>>>>> #16 0x0000556ead5e2053 in io_wait_loop_epoll
(h=0x556eadaaeec0 <io_w>,
> > >>>>>>> t=2,
repeat=0) at core/io_wait.h:1065
> > >>>>>>> #17 0x0000556ead5f6b35 in tcp_receive_loop
(unix_sock=49) at
> > >>>>>>> core/tcp_read.c:1974
> > >>>>>>> #18 0x0000556ead4c8e24 in tcp_init_children () at
core/tcp_main.c:4853
> > >>>>>>> #19
0x0000556ead3c352a in main_loop () at main.c:1735
> > >>>>>>> #20 0x0000556ead3ca5f8 in main (argc=13,
argv=0x7ffffe2c3828) at main.c:2675
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Am Do., 14. März 2019 um 13:41 Uhr schrieb
Kristijan Vrban
> > >>>>>>> <vrban.lkml(a)gmail.com>om>:
> > >>>>>>>> Hi, with full debug is see this in log for
every incoming
TCP SIP request:
> > >>>>>>>>
> > >>>>>>>> Mar 14 12:10:15 kamailio-preview
/usr/sbin/kamailio[17940]:
DEBUG:
> > >>>>>>>>
<core> [core/tcp_main.c:3871]: send2child(): WARNING: no
free tcp
> > >>>>>>>>
receiver, connection passed to the least busy one (105)
> > >>>>>>>> Mar 14 12:10:15 kamailio-preview
/usr/sbin/kamailio[17940]:
DEBUG:
> > >>>>>>>>
<core> [core/tcp_main.c:3875]: send2child(): selected tcp
worker 2
> > >>>>>>>>
27(17937) for activity on [tls:172.17.217.10:5061],
0x7fdaeda8f928
> > >>>>>>>>
> > >>>>>>>> So the Kamailio TCP process is working, and
received TCP
traffic. But
> > >>>>>>>> the
tcp workers are somehow busy.
> > >>>>>>>>
> > >>>>>>>> When i attach via strace to the TCP worker, i
do not see
any activity. Just:
> > >>>>>>>>
> > >>>>>>>> futex(0x7fdaeca92f8c, FUTEX_WAIT_PRIVATE, 1,
NULL
> > >>>>>>>>
> > >>>>>>>> and nothing, even when i see the main tcp
process choose
this worker process.
> > >>>>>>>>
> > >>>>>>>> Kristijan
> > >>>>>>>>
> > >>>>>>>> Am Mi., 27. Feb. 2019 um 15:14 Uhr schrieb
Kristijan Vrban
> > >>>>>>>> <vrban.lkml(a)gmail.com>om>:
> > >>>>>>>>> first of all thanks for the feedback. i
prepared our
system now to run
> > >>>>>>>>>
with debug=3
> > >>>>>>>>> I hope to see more then then.
> > >>>>>>>>>
> > >>>>>>>>> Am Mi., 27. Feb. 2019 um 11:53 Uhr schrieb
Kristijan Vrban
> > >>>>>>>>> <vrban.lkml(a)gmail.com>om>:
> > >>>>>>>>>> Hi kamailios,
> > >>>>>>>>>>
> > >>>>>>>>>> i have a creepy situation with v5.2.1
stable Kamilio.
After a day or
> >
>>>>>>>>>> so, Kamailio stop to process incoming SIP traffic
via
TCP. The
> >
>>>>>>>>>> incoming TCP network packages get TCP-ACK from
the OS
(Debian 9,
> >
>>>>>>>>>> 4.18.0-15-generic-Linux) but Kamailio does not
show any
processing for
> >
>>>>>>>>>> the SIP-Traffic incoming via TCP. No logs,
nothing. While
traffic via
> >
>>>>>>>>>> UDP is working just totally fine.
> > >>>>>>>>>>
> > >>>>>>>>>> When i look via command "netstat
-ntp" is see, that the
Recv-Q get
> >
>>>>>>>>>> bigger and bigger. e.g.:
> > >>>>>>>>>>
> > >>>>>>>>>> Proto Recv-Q Send-Q Local Address
Foreign Address State
PID/Program
> >
>>>>>>>>>> name tcp 4566 0 172.17.217.12:5060
xxx.xxx.xxx.xxx:57252
ESTABLISHED
> >
>>>>>>>>>> 31347/kamailio
> > >>>>>>>>>>
> > >>>>>>>>>> After Kamailio restart, all is working
fine again for a
day. We have
> >
>>>>>>>>>> maybe 10-20 devices online via TCP and low call
volume
(1-2 call per
> >
>>>>>>>>>> minute). The only settings for tcp we have is
"tcp_delayed_ack=no"
> >
>>>>>>>>>>
> > >>>>>>>>>> How to could we debug this situation?
Again, no error, no
warings in
> >
>>>>>>>>>> the log. Just nothing.
> > >>>>>>>>>>
> > >>>>>>>>>> Kristijan
> > >>>>>>> _______________________________________________
> > >>>>>>> Kamailio (SER) - Users Mailing List
> > >>>>>>> sr-users(a)lists.kamailio.org
> > >>>>>>>
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
> > >>>>>> --
> > >>>>>> Daniel-Constantin Mierla --
www.asipto.com
> > >>>>>>
www.twitter.com/miconda --
www.linkedin.com/in/miconda
> > >>>>>> Kamailio World Conference - May 6-8, 2019 --
www.kamailioworld.com
> > >>>>>> Kamailio
Advanced Training - Mar 25-27, 2019, in Washington,
DC, USA --
www.asipto.com
> > >>>>>>
> > >>>>> _______________________________________________
> > >>>>> Kamailio (SER) - Users Mailing List
> > >>>>> sr-users(a)lists.kamailio.org
> > >>>>>
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
> > >>>> --
> > >>>> Daniel-Constantin Mierla --
www.asipto.com
> > >>>>
www.twitter.com/miconda --
www.linkedin.com/in/miconda
> > >>>> Kamailio World Conference - May 6-8, 2019 --
www.kamailioworld.com
> > >>>> Kamailio Advanced
Training - Mar 25-27, 2019, in Washington,
DC, USA --
www.asipto.com
> > >>>>
> > >> --
> > >> Daniel-Constantin Mierla --
www.asipto.com
> > >>
www.twitter.com/miconda --
www.linkedin.com/in/miconda
> > >> Kamailio World Conference - May 6-8, 2019 --
www.kamailioworld.com
> > >> Kamailio Advanced Training -
Mar 25-27, 2019, in Washington, DC,
USA --
www.asipto.com
> > >>
> > --
> > Daniel-Constantin Mierla --
www.asipto.com
> >
www.twitter.com/miconda --
www.linkedin.com/in/miconda
> > Kamailio World Conference - May 6-8, 2019 --
www.kamailioworld.com
> > Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA
--
www.asipto.com
> >
_______________________________________________
Kamailio (SER) - Users Mailing List
sr-users(a)lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users