[SR-Users] Kamailio stop to process incoming SIP traffic via TCP.

Daniel-Constantin Mierla miconda at gmail.com
Mon May 20 09:49:18 CEST 2019


Hello,

this kind of behaviour, blocking for a long time and then moving on, is a
symptom of the same issue. One of the observed behaviours was that
attaching with gdb and then detaching makes the code run further; that is
what kamctl trap does. I haven't looked deeper, but my guess is that some
signals are sent during the gdb operations.
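
For reference, the attach/backtrace/detach cycle can be reproduced by hand
with something like the sketch below. It is only an illustration: the
function name trap_all, the output file name and the use of pgrep are
assumptions, and the gdb invocation is the one quoted further down in this
thread. kamctl trap may differ in its details.

```shell
#!/bin/sh
# Hypothetical helper: write a full backtrace of every running kamailio
# process into one file. gdb in batch mode attaches, runs the command,
# then detaches and exits.
KAM_BIN=${KAM_BIN:-/usr/sbin/kamailio}   # binary path as used in this thread
OUT=${OUT:-kamailio-trap.txt}            # output file name is an assumption

trap_all() {
    # pgrep -x matches the exact process name
    for pid in $(pgrep -x kamailio); do
        echo "---- pid $pid ----" >> "$OUT"
        gdb "$KAM_BIN" "$pid" -batch --eval-command="bt full" >> "$OUT" 2>&1
    done
}
```

Calling trap_all as root should leave one backtrace per process in the
output file, which can then be searched for CRYPTO_THREAD_write_lock.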

It would be good if you could test with the workaround and report the
results. There was already a report that, with it, the issue was not seen
after a rather long running time.
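
A minimal sketch of building and pre-loading the library is below. The
directory and the make step are the ones given earlier in this thread;
using LD_PRELOAD as the pre-load mechanism, the install path in PRELOAD_SO
and both function names are assumptions for illustration, so check the
README.md in that folder for the exact instructions.

```shell
#!/bin/sh
# Sketch only: PRELOAD_SO and the function names are assumptions.
PRELOAD_SO=${PRELOAD_SO:-/usr/local/lib/openssl_mutex_shared.so}

build_preload_lib() {
    # Run in a subshell so the caller's working directory is unchanged
    (
        git clone --depth 1 https://github.com/kamailio/kamailio.git &&
        cd kamailio/src/modules/tls/utils/openssl_mutex_shared &&
        make
    )
}

start_with_preload() {
    # Refuse to start if the shared object is not where we expect it
    if [ ! -r "$PRELOAD_SO" ]; then
        echo "missing $PRELOAD_SO, build and copy it first" >&2
        return 1
    fi
    # Run the given command with the library pre-loaded
    LD_PRELOAD="$PRELOAD_SO" "$@"
}
```

Usage would then be along the lines of
"start_with_preload /usr/sbin/kamailio -f /etc/kamailio/kamailio.cfg",
with the config path adjusted to your installation.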

Cheers,
Daniel

On 17.05.19 16:03, Aymeric Moizard wrote:
> Hi!
>
> I haven't used the workaround yet: I'm focusing on trying to make sure
> I have the same issue
> or trying to figure out how to force it to happen.
>
> I started checking the server again today, beginning with this
> command:
>  $> sudo kamcmd tls.list
>
> In my previous description, the above was a deadlock. Today, it
> finally completed, but only
> after 5 minutes. (I suspect 5 minutes is abnormal.)
>
> During the long running command:
> -> UDP was working
> -> TCP was not: 
> -> The TCP connection is being ESTABLISHED, but the SIP message was
> not replied.
>     (this was the behavior I had before)
>
> At the same time (during the deadlock), I took a trap with "sudo kamctl trap":
> -> one thread is on "tls_list" (tls_rpc.c:154)
> -> one thread is on tcpconn_get (core/tcp_main.c:1449) called
> from tcp_send (core/tcp_main.c:1716)
>     and seems to be sending a 484 Address Incomplete on a TLS connection
> -> 2 threads are on CRYPTO_THREAD_write_lock on a backtrace showing
> "SSL_do_handshake/tls_accept"
>
> Suddenly, "sudo kamcmd tls.list" completed, and then, my TCP Agent
> received
> 4 answers from kamailio for the last 4 REGISTER sent.
>
> I have a network capture for my TCP agent.
> I have a trap showing 2 threads waiting on "CRYPTO_THREAD_write_lock".
>
> Conclusion:
> The use-case showed that the lock was VERY long.
> The use-case showed that the lock was TEMPORARY...
>
> Side-note: From my understanding of the multi-fork/openssl issue, I
> would expect
> the deadlock to happen very soon after a kamailio restart?
>
> Do you expect the preload workaround to work in such behavior?
> Or do you consider that my issue is different?
>
> Because there is no "real" dead-lock, I don't understand why "my"
> issue would be related to libssl1.1...
>
> My gdb trap and network capture are available privately if you
> need them! (please ask me by direct email)
>
> Tks
> Aymeric
>
> On Mon, 13 May 2019 at 12:48, Daniel-Constantin Mierla
> <miconda at gmail.com> wrote:
>
>     Hello,
>
>     thanks for the feedback! It is good to know that it works well so
>     far for you. I don't see any reason not to have the library
>     preloaded as part of the next release.
>
>     Just to let everyone know, for now, the built packages are pinned
>     to link against libssl 1.0.x.
>
>     Soon, I will approach the openssl project in order to find a
>     proper long-term solution.
>
>     Cheers,
>     Daniel
>
>     On 13.05.19 10:48, Floimair Florian wrote:
>>
>>     Hi all!
>>
>>      
>>
>>     We have used the work-around with the pre-loaded library and so
>>     far this seems to have fixed our problem (that my colleague
>>     Kristijan Vrban reported).
>>
>>     At least we did not have a single failure within the last week,
>>     whereas before the issue happened about once every 2 days.
>>
>>     It would be nice if this became part of the next Kamailio version.
>>
>>     With best regards
>>
>>
>>     Florian Floimair
>>     Innovation - Software-Development
>>
>>     COMMEND INTERNATIONAL GMBH
>>     A-5020 Salzburg, Saalachstraße 51
>>     http://www.commend.com
>>
>>     Security and Communication by Commend
>>
>>     FN 178618z | LG Salzburg
>>
>>      
>>
>>     From: sr-users <sr-users-bounces at lists.kamailio.org> on behalf of
>>     Daniel-Constantin Mierla <miconda at gmail.com>
>>     Reply-To: "miconda at gmail.com" <miconda at gmail.com>,
>>     "Kamailio (SER) - Users Mailing List" <sr-users at lists.kamailio.org>
>>     Date: Monday, 15 April 2019 at 09:07
>>     To: Aymeric Moizard <amoizard at gmail.com>, "Kamailio (SER) - Users
>>     Mailing List" <sr-users at lists.kamailio.org>
>>     Subject: Re: [SR-Users] Kamailio stop to process incoming SIP
>>     traffic via TCP.
>>
>>      
>>
>>     Hello Aymeric,
>>
>>     would you be able to test with tls module compiled against libssl
>>     1.1 and using the pre-loaded shared object workaround?
>>
>>       * https://github.com/kamailio/kamailio/tree/master/src/modules/tls/utils/openssl_mutex_shared
>>
>>     You should be able to use it with any version, no need to test
>>     with kamailio master branch.
>>
>>     Just clone the master branch, then:
>>
>>     cd src/modules/tls/utils/openssl_mutex_shared
>>
>>     make
>>
>>     Either use it from there or copy openssl_mutex_shared.so to a location
>>     of your choice, then pre-load it before starting your version of Kamailio.
>>
>>     The README.md in the folder has some more details.
>>
>>     I would like to have some validation that it works fine before
>>     approaching the libssl project about allowing the locks to be
>>     initialised with the process-shared option.
>>
>>     Thanks,
>>     Daniel
>>
>>     On 26.03.19 16:18, Daniel-Constantin Mierla wrote:
>>
>>         Hello,
>>
>>         yep, locking there is expected, as listing the tls
>>         connections waits until no other process is changing the content
>>         of the internal tls connection structures. So it is a side effect
>>         of libssl/libcrypto getting stuck and the other processes
>>         waiting for it to move on. I have the Kamailio training in
>>         the USA these days, so the trip and the daily schedule haven't
>>         allowed me to look further at the libssl/libcrypto code in order
>>         to find a solution here. It is a high priority on my list, as soon
>>         as I get time during the next days.
>>
>>         Cheers,
>>         Daniel
>>
>>         On 26.03.19 15:55, Aymeric Moizard wrote:
>>
>>             Hi All,
>>
>>              
>>
>>             I was debugging a TCP issue (most probably, I may start a
>>             thread for this question).
>>
>>              
>>
>>             I was trying to get some info for TCP and TLS.
>>
>>              
>>
>>             I typed:
>>
>>             $> sudo kamctl rpc tls.list
>>
>>              
>>
>>             And waited for a while.... until... I realized that my
>>             User-Agent, connected with TCP was not able to register
>>             any more. I think the rpc command has introduced
>>             something wrong.
>>
>>              
>>
>>             The device can successfully "connect" and send the REGISTER
>>             over the established TCP connection. The REGISTER does not
>>             appear in the logs any more, and I don't see any traffic for
>>             TCP any more. So the behavior is the same as I had
>>             before: TCP and TLS are both not working and UDP is still
>>             working fine.
>>
>>              
>>
>>             kamctl does not work any more... so kamctl trap does not work either...
>>
>>              
>>
>>             I was able to run the following manually for (all?)
>>             kamailio threads:
>>
>>              
>>
>>             gdb /usr/sbin/kamailio 16500 -batch --eval-command="bt
>>             full" >> kamailio-trap-tcp-down.txt
>>
>>              
>>
>>             I'm temporarily putting the backtrace I have here:
>>
>>             https://sip.antisip.com/kamailio-trap-tcp-down.txt
>>
>>              
>>
>>             You can see a thread stuck on the json command line:
>>             "tls_list"
>>
>>             And many other waiting on CRYPTO_THREAD_write_lock
>>
>>             Might this be related to:
>>             https://github.com/openssl/openssl/issues/5376
>>
>>             SIDE NOTE:
>>
>>             Right before I typed the last gdb command for the
>>             last thread, kamailio
>>             crashed. This was around 5 minutes after the deadlock
>>             started.
>>
>>              
>>
>>             Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core>
>>             [core/tcp_main.c:2561]: tcpconn_do_send(): failed to send
>>             on 0x7ff8dfc2fdc8 (91.121.30.149:5061->62.210.97.21:49351):
>>             Broken pipe (32)
>>
>>             Mar 26 14:47:11 sip kamailio[16493]: ERROR: <core>
>>             [core/tcp_read.c:1505]: tcp_read_req(): ERROR:
>>             tcp_read_req: error reading - c: 0x7ff8dfc2fdc8 r:
>>             0x7ff8dfc2fe48 (-1)
>>
>>             Mar 26 14:47:11 sip kamailio[16493]: WARNING: <core>
>>             [core/tcp_read.c:1848]: handle_io(): F_TCPCONN connection
>>             marked as bad: 0x7ff8dfa6a408 id 846 refcnt 3
>>
>>             Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core>
>>             [main.c:755]: handle_sigs(): child process 16374 exited
>>             by a signal 11
>>
>>             Mar 26 14:47:11 sip kamailio[16371]: ALERT: <core>
>>             [main.c:758]: handle_sigs(): core was not generated
>>
>>             Mar 26 14:47:11 sip kamailio[16371]: INFO: <core>
>>             [main.c:781]: handle_sigs(): terminating due to SIGCHLD
>>
>>             Mar 26 14:47:11 sip kamailio[16493]: INFO: <core>
>>             [main.c:836]: sig_usr(): signal 15 received
>>
>>             Mar 26 14:47:11 sip kamailio[16500]: INFO: <core>
>>             [main.c:836]: sig_usr(): signal 15 received
>>
>>             Mar 26 14:47:11 sip kamailio[16479]: INFO: <core>
>>             [main.c:836]: sig_usr(): signal 15 received
>>
>>              
>>
>>              
>>
>>             Unfortunately, even though I did my best to set up my service
>>             to generate a core on crash, I still get "core was not
>>             generated".... (debian stretch)
>>
>>              
>>
>>             Tks for reading!
>>
>>             Regards
>>
>>             Aymeric
>>
>
> -- 
> Antisip - http://www.antisip.com

-- 
Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda
