Tks Daniel,

I have installed the workaround.

lsof seems to indicate that I have installed and pre-loaded openssl_mutex_shared.so correctly.
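
For reference, a rough sketch of an equivalent check that does not need lsof: on Linux, the files mapped into a process are listed in /proc/<pid>/maps, so that file can be scanned for the library name. The function name and the sample lines below are illustrative assumptions; only openssl_mutex_shared.so comes from this thread.

```python
def mapped_paths(maps_text, needle):
    """Return the distinct mapped file paths containing `needle`.

    `maps_text` is the content of /proc/<pid>/maps (Linux).
    """
    found = []
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in fields[5] and fields[5] not in found:
            found.append(fields[5])
    return found


# Made-up sample in /proc/<pid>/maps format; all paths are hypothetical.
SAMPLE_MAPS = """\
7f1c2a000000-7f1c2a002000 r-xp 00000000 08:01 131 /usr/local/lib/openssl_mutex_shared.so
7f1c2a002000-7f1c2a003000 rw-p 00002000 08:01 131 /usr/local/lib/openssl_mutex_shared.so
7f1c2b000000-7f1c2b200000 r-xp 00000000 08:01 77 /usr/lib/libssl.so.1.1
7ffd1c000000-7ffd1c021000 rw-p 00000000 00:00 0 [stack]
"""

print(mapped_paths(SAMPLE_MAPS, "openssl_mutex_shared.so"))
```

Against a live process, the same function would be fed the content of /proc/<pid>/maps for one of the kamailio PIDs (read as root). An empty list would suggest the preload did not take effect for that process.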

I will let you know if I see the issue again.

Tks!
Aymeric

On Mon, 20 May 2019 at 09:49, Daniel-Constantin Mierla <miconda@gmail.com> wrote:

Hello,

This kind of behaviour, blocking for a long time and then moving on, is a symptom of the same issue. One of the observed behaviours was that attaching with gdb and then detaching makes the code run further, which is what kamctl trap does. I haven't looked deeper, but my guess is that some signals are sent during the gdb operations.

It would be good if you could test with the workaround and see the results. There was already a report that, with the workaround in place, the issue was not seen even after a rather long running time.

Cheers,
Daniel

On 17.05.19 16:03, Aymeric Moizard wrote:
Hi!

I haven't used the workaround yet: I'm focusing on trying to make sure I have the same issue
or trying to figure out how to force it to happen.

I started checking the server again today, beginning with this command:
 $> sudo kamcmd tls.list

In my previous description, the above command deadlocked. Today, it finally completed, but only
after 5 minutes. (I suspect 5 minutes is abnormal.)
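
To put a number on such stalls, a small wrapper can time a command and give up after a limit. This is only a sketch: the helper name and the timeout values are assumptions, and the demonstration runs a harmless Python one-liner, not kamcmd.

```python
import subprocess
import sys
import time


def time_command(argv, timeout_s):
    """Run `argv` and return (elapsed_seconds, finished).

    `finished` is False when the command was killed after `timeout_s` seconds.
    """
    start = time.monotonic()
    try:
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout_s)
        finished = True
    except subprocess.TimeoutExpired:
        finished = False
    return time.monotonic() - start, finished


# Harmless stand-in command so the sketch runs anywhere.
elapsed, finished = time_command([sys.executable, "-c", "pass"], 30)
print(f"finished={finished} elapsed={elapsed:.2f}s")
```

On the server, the argument list would presumably be ["sudo", "kamcmd", "tls.list"] with a generous limit such as 600 seconds. That variant is untested here and assumes sudo does not prompt for a password.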

During the long running command:
-> UDP was working
-> TCP was not:
-> The TCP connection was ESTABLISHED, but the SIP message got no reply.
    (this was the behavior I had before)

At the same time (during the deadlock), I took a trap with "sudo kamctl trap":
-> one thread is on "tls_list" (tls_rpc.c:154)
-> one thread is on tcpconn_get (core/tcp_main.c:1449) called from tcp_send (core/tcp_main.c:1716)
    and seems to be sending a 484 Address Incomplete on a TLS connection
-> 2 threads are on CRYPTO_THREAD_write_lock on a backtrace showing "SSL_do_handshake/tls_accept"

Suddenly, "sudo kamcmd tls.list" completed, and then my TCP agent received
4 answers from kamailio for the last 4 REGISTERs sent.

I have a network capture for my TCP agent.
I have a trap showing 2 threads waiting on "CRYPTO_THREAD_write_lock"
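
A minimal sketch of how such a trap could be summarised automatically, assuming the file contains ordinary gdb backtrace frames ("#0 ...", "#1 ..."). The exact layout written by kamctl trap is not reproduced here: the sample text is made up, with addresses, library paths and arguments as placeholders, and only the function names and source locations are taken from the description above.

```python
import re

# Matches gdb frame lines such as:
#   "#0  0x00007f... in func () from lib"   or   "#2  func (...) at file.c:1"
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)")


def split_backtraces(dump_text):
    """Group frame lines into backtraces; each '#0' frame starts a new one."""
    traces = []
    for line in dump_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if not m:
            continue  # headers, blank lines, local variables, ...
        if m.group(1) == "0" or not traces:
            traces.append([])
        traces[-1].append(m.group(2))
    return traces


def count_waiting(dump_text, func):
    """Count backtraces that contain a frame in function `func`."""
    return sum(1 for trace in split_backtraces(dump_text) if func in trace)


# Made-up sample; arguments are elided as (...).
SAMPLE_TRAP = """\
#0  tls_list (...) at tls_rpc.c:154

#0  tcpconn_get (...) at core/tcp_main.c:1449
#1  tcp_send (...) at core/tcp_main.c:1716

#0  0x00007f0000000001 in CRYPTO_THREAD_write_lock () from /usr/lib/libcrypto.so.1.1
#1  0x00007f0000000002 in SSL_do_handshake () from /usr/lib/libssl.so.1.1
#2  tls_accept (...)

#0  0x00007f0000000001 in CRYPTO_THREAD_write_lock () from /usr/lib/libcrypto.so.1.1
#1  0x00007f0000000002 in SSL_do_handshake () from /usr/lib/libssl.so.1.1
#2  tls_accept (...)
"""

print(count_waiting(SAMPLE_TRAP, "CRYPTO_THREAD_write_lock"))
```

Comparing this count across several traps taken a few seconds apart would show whether the same waiters stay blocked or whether the lock is changing hands.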

Conclusion:
The use-case showed that the lock was VERY long.
The use-case showed that the lock was TEMPORARY...

Side-note: From my understanding of the multi-fork/openssl issue, I would expect
the deadlock to happen very soon after a kamailio restart?

Do you expect the preload workaround to help with this behavior?
Or do you consider that my issue is different?

Because there is no "real" dead-lock, I don't understand why "my" issue would be related to libssl1.1...

My gdb trap and network capture are available privately if you need them! (Please ask me by direct email.)

Tks
Aymeric
