It looks like something is blocked in TLS processing. Process 14848 is doing:

#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
No locals.
#1  0x00007f4fdee39c50 in __GI___pthread_rwlock_unlock (rwlock=0x7f4dde916288) at pthread_rwlock_unlock.c:42
        futex_shared = 0
#2  0x00007f4fdf207f09 in CRYPTO_THREAD_unlock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#3  0x00007f4fdf1d6cb9 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#4  0x00007f4fdf5216a0 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#5  0x00007f4fdf5152cf in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#6  0x00007f4fdf50d9f1 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#7  0x00007f4fdf77d73c in tls_accept (c=0x7f4ddf75e2f0, error=0x7ffd3522dc90) at tls_server.c:411

Process 14847 and many others are doing:

#0  0x00007f4fdee39450 in futex_wait (private=<optimized out>, expected=1446580, futex_word=0x7f4dde916294) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
        __ret = -512
        err = <optimized out>
#1  futex_wait_simple (private=<optimized out>, expected=1446580, futex_word=0x7f4dde916294) at ../sysdeps/nptl/futex-internal.h:135
No locals.
#2  __pthread_rwlock_wrlock_slow (rwlock=0x7f4dde916288) at pthread_rwlock_wrlock.c:67
        waitval = 1446580
        result = 0
        futex_shared = <optimized out>
#3  0x00007f4fdf207ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#4  0x00007f4fdf1d6658 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#5  0x00007f4fdf51500f in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#6  0x00007f4fdf50d9f1 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#7  0x00007f4fdf77d73c in tls_accept (c=0x7f4ddf1993a8, error=0x7ffd3522dc90) at tls_server.c:411

Judging by the function names, one process is trying to unlock but is still waiting for something, while the others are trying to acquire the write lock on the same rwlock (0x7f4dde916288 in both traces).

The thing is that OpenSSL (libssl) 1.1.0 no longer uses the custom lock functions for its internal locking; it is thread safe implicitly. I pushed a patch to the master branch that disables the part where we set the lock/unlock functions for libssl, but that mainly avoids compile warnings; it doesn't change how libssl works internally for v1.1.0+.
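
As a rough illustration of that version cut-off (a sketch, not the actual patch): OpenSSL packs its version into a single number, and anything at or above 0x10100000 is 1.1.0 or newer. The helper name `needs_lock_callbacks` and the constant name `OPENSSL_1_1_0_BASE` are made up for this example; in real code the value would come from `OPENSSL_VERSION_NUMBER` in `<openssl/opensslv.h>`.

```c
/* Sketch only: the names below are hypothetical and are not taken
 * from the kamailio tls module. */

/* OpenSSL encodes its version as 0xMNNFFPPS (major, minor, fix, patch,
 * status). Every 1.1.0 build, pre-releases included, is >= 0x10100000. */
#define OPENSSL_1_1_0_BASE 0x10100000UL

/* Returns 1 if the application still has to install its own lock/unlock
 * callbacks (OpenSSL older than 1.1.0), 0 if libssl does its own locking. */
int needs_lock_callbacks(unsigned long version_number)
{
    return version_number < OPENSSL_1_1_0_BASE;
}
```

For example, `needs_lock_callbacks(0x1000215fUL)` (1.0.2u) gives 1, while `needs_lock_callbacks(0x1010000fUL)` (the 1.1.0 release) gives 0. The same comparison is usually done at compile time with `#if OPENSSL_VERSION_NUMBER < 0x10100000L`.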

If that unlock takes too long inside libssl/libcrypto, either there is a bug in the libs or the pointer to the lock (or the content of the lock structure) somehow got corrupted. For the first option, you would have to use a libssl older than 1.1.0. Maybe you can search the web and see whether others have reported locking issues or deadlocks with OpenSSL 1.1.0+.

The second option could also be the fault of kamailio, if there is a buffer overflow somewhere. But that should also show up as a bug with OpenSSL older than 1.1.0 (in that case our internal lock structure would be overwritten/corrupted), and nobody has reported any TLS-related deadlock or crash for a very long time, although TLS is used extensively.

I would first try an OpenSSL version older than 1.1.0 and see if the same situation occurs. If not, then it is likely related to the new locking system inside OpenSSL 1.1.0.

