It looks like something is blocked in TLS processing. Process 14848 is doing:
```
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
No locals.
#1 0x00007f4fdee39c50 in __GI___pthread_rwlock_unlock (rwlock=0x7f4dde916288) at pthread_rwlock_unlock.c:42
futex_shared = 0
#2 0x00007f4fdf207f09 in CRYPTO_THREAD_unlock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#3 0x00007f4fdf1d6cb9 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#4 0x00007f4fdf5216a0 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#5 0x00007f4fdf5152cf in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#6 0x00007f4fdf50d9f1 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#7 0x00007f4fdf77d73c in tls_accept (c=0x7f4ddf75e2f0, error=0x7ffd3522dc90) at tls_server.c:411
```
Process 14847, and many others, are doing:
```
#0 0x00007f4fdee39450 in futex_wait (private=<optimized out>, expected=1446580, futex_word=0x7f4dde916294) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
__ret = -512
err = <optimized out>
#1 futex_wait_simple (private=<optimized out>, expected=1446580, futex_word=0x7f4dde916294) at ../sysdeps/nptl/futex-internal.h:135
No locals.
#2 __pthread_rwlock_wrlock_slow (rwlock=0x7f4dde916288) at pthread_rwlock_wrlock.c:67
waitval = 1446580
result = 0
futex_shared = <optimized out>
#3 0x00007f4fdf207ee9 in CRYPTO_THREAD_write_lock () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#4 0x00007f4fdf1d6658 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
No symbol table info available.
#5 0x00007f4fdf51500f in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#6 0x00007f4fdf50d9f1 in SSL_do_handshake () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
No symbol table info available.
#7 0x00007f4fdf77d73c in tls_accept (c=0x7f4ddf1993a8, error=0x7ffd3522dc90) at tls_server.c:411
```
Judging by the function names, one process is trying to unlock but is itself stuck waiting for
something, while the others are trying to acquire the lock. All of them point to the same
rwlock (0x7f4dde916288).
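To make the waiting pattern concrete, here is a minimal pthreads sketch. It is not Kamailio or OpenSSL code, and `try_wrlock_for()`, `waiter()` and `contend()` are hypothetical names. While one side holds the write lock, every other writer stays blocked, which is what the `CRYPTO_THREAD_write_lock()` frames show. Using `pthread_rwlock_timedwrlock()` instead of a plain write lock turns "blocked forever" into a detectable `ETIMEDOUT`. The sketch uses threads in one process, whereas Kamailio workers are separate processes, so it only illustrates the shape of the problem.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: take the write lock, but give up after timeout_ms
 * instead of blocking forever. Returns 0 on success or an errno value
 * (ETIMEDOUT if someone still holds the lock). */
int try_wrlock_for(pthread_rwlock_t *lock, long timeout_ms)
{
    struct timespec deadline;

    /* timedwrlock expects an absolute CLOCK_REALTIME deadline */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    return pthread_rwlock_timedwrlock(lock, &deadline);
}

struct waiter_arg {
    pthread_rwlock_t *lock;
    int result;
};

/* Plays the role of 14847 and the others: a second writer contending for the lock. */
static void *waiter(void *p)
{
    struct waiter_arg *arg = p;

    arg->result = try_wrlock_for(arg->lock, 100);
    if (arg->result == 0)
        pthread_rwlock_unlock(arg->lock);
    return NULL;
}

/* Runs one contender, optionally while this thread holds the write lock
 * (like 14848 before its unlock returns), and reports the contender's result. */
int contend(pthread_rwlock_t *lock, int hold)
{
    pthread_t t;
    struct waiter_arg arg = { lock, -1 };

    if (hold)
        pthread_rwlock_wrlock(lock);
    if (pthread_create(&t, NULL, waiter, &arg) == 0)
        pthread_join(t, NULL);
    if (hold)
        pthread_rwlock_unlock(lock);
    return arg.result;
}
```

For example, on a lock set up with `pthread_rwlock_init(&lock, NULL)`, `contend(&lock, 1)` should return `ETIMEDOUT` after roughly 100 ms, and `contend(&lock, 0)` should return 0.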
The thing is that OpenSSL (libssl) 1.1.0 no longer uses the custom lock functions for its
internal state; it is implicitly thread safe. I pushed a patch to the master branch that
disables the part where we set the lock/unlock functions for libssl, but that mainly avoids
compile warnings; otherwise it doesn't affect how libssl works internally for v1.1.0+.
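For reference, the version boundary can be expressed as below. This is a sketch, not the actual patch: `struct ossl_version`, `ossl_decode()` and `needs_app_locking_callbacks()` are hypothetical names, and the version is passed in as a plain number so the code builds without OpenSSL headers. `0x10100000` is OpenSSL's packed encoding of 1.1.0, and from 1.1.0 on the old `CRYPTO_set_locking_callback()`-style setters are no-op compatibility macros, which is why disabling them does not change how libssl behaves.

```c
#include <stdbool.h>

/* OpenSSL packs its version as 0xMNNFFPPS: major, minor, fix, patch, status.
 * 1.1.0, the first release where application lock callbacks are ignored, is 0x10100000. */
#define OSSL_VERSION_1_1_0 0x10100000UL

struct ossl_version {
    unsigned major, minor, fix, patch;
};

/* Split a packed version number into its fields,
 * e.g. 0x1010006f -> 1.1.0 with patch 6 ('f'). */
struct ossl_version ossl_decode(unsigned long v)
{
    struct ossl_version out;

    out.major = (unsigned)((v >> 28) & 0xf);
    out.minor = (unsigned)((v >> 20) & 0xff);
    out.fix = (unsigned)((v >> 12) & 0xff);
    out.patch = (unsigned)((v >> 4) & 0xff);
    return out;
}

/* Hypothetical helper: application-supplied lock callbacks only have an effect
 * before 1.1.0. The usual compile-time form of the same check is
 *     #if OPENSSL_VERSION_NUMBER < 0x10100000L
 *         CRYPTO_set_locking_callback(...);
 *     #endif
 */
bool needs_app_locking_callbacks(unsigned long v)
{
    return v < OSSL_VERSION_1_1_0;
}
```

With these helpers, 1.0.2k (`0x100020bf`) still needs the callbacks, while 1.1.0f (`0x1010006f`) does not.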
If that unlock takes too long inside libssl/libcrypto, either there is a bug in the libraries
or the pointer to the lock (or the content of the lock structure) somehow got corrupted. For
the first option, you would have to use a libssl older than 1.1.0; maybe you can also search
the web to see whether others are reporting locking issues or deadlocks with OpenSSL 1.1.0+.
The second option could also be Kamailio's fault, if there is a buffer overflow somewhere.
But then it should also show up as a bug with OpenSSL older than 1.1.0 (in that case our
internal lock structure would be overwritten/corrupted), and nobody has reported any
TLS-related deadlock or crash for a very long time, although TLS has been used extensively
for a very long time.
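One way to test the corruption hypothesis for locks that Kamailio allocates itself (the pre-1.1.0 case above) is to surround each lock with guard words and check them on every lock operation. The sketch below assumes a wrapper type; `struct guarded_rwlock`, the `guarded_*()` functions and the magic value are hypothetical, not existing Kamailio APIs. It only catches overflows that run over a guard word, not stray writes that land directly in the lock, and it does not cover the locks OpenSSL 1.1.0+ allocates internally.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>

/* Arbitrary, hypothetical guard value. */
#define LOCK_GUARD_MAGIC 0xC0FFEE11DEADBEEFULL

/* Hypothetical wrapper: guard words on both sides of the lock, so an
 * overflow from a neighbouring buffer is likely to hit a guard first. */
struct guarded_rwlock {
    uint64_t head;
    pthread_rwlock_t lock;
    uint64_t tail;
};

int guarded_init(struct guarded_rwlock *g)
{
    g->head = LOCK_GUARD_MAGIC;
    g->tail = LOCK_GUARD_MAGIC;
    return pthread_rwlock_init(&g->lock, NULL);
}

/* Returns 1 if both guard words are intact, 0 if something overwrote them. */
int guarded_intact(const struct guarded_rwlock *g)
{
    return g->head == LOCK_GUARD_MAGIC && g->tail == LOCK_GUARD_MAGIC;
}

/* Check the guards before touching the lock; -1 (never a pthread error code)
 * means "corrupted", so the caller can log and abort instead of hanging. */
int guarded_wrlock(struct guarded_rwlock *g)
{
    if (!guarded_intact(g))
        return -1;
    return pthread_rwlock_wrlock(&g->lock);
}

int guarded_unlock(struct guarded_rwlock *g)
{
    if (!guarded_intact(g))
        return -1;
    return pthread_rwlock_unlock(&g->lock);
}
```

If such a check ever fired, it would point at an overflow on the Kamailio side rather than at the OpenSSL locking code.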
I would first try an OpenSSL version older than 1.1.0 and see if the same situation occurs.
If it does not, then it is likely related to the new locking system inside OpenSSL 1.1.0.
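When running that experiment, it is worth confirming which library is actually loaded at runtime, since the backtraces above show `libcrypto.so.1.1` and `libssl.so.1.1`. A minimal sketch, assuming the build-time value comes from `OPENSSL_VERSION_NUMBER` and the runtime value from `SSLeay()` (pre-1.1.0) or `OpenSSL_version_num()` (1.1.0+). `ossl_series()` and `running_pre_1_1_0()` are hypothetical helpers, and both values are passed in so the code builds without OpenSSL.

```c
#include <stdbool.h>

/* OpenSSL version numbers are 0xMNNFFPPS; shifting right by 20 keeps M and NN,
 * i.e. the release series (0x100 for 1.0.x, 0x101 for 1.1.x). */
static unsigned long ossl_series(unsigned long version)
{
    return version >> 20;
}

/* Hypothetical sanity check for the downgrade test: true only if the library
 * loaded at runtime belongs to the same series as the headers used at build
 * time, and that series is older than 1.1.0. */
bool running_pre_1_1_0(unsigned long built, unsigned long running)
{
    return ossl_series(built) == ossl_series(running)
        && running < 0x10100000UL;
}
```

For instance, a 1.0.2k build (`0x100020bf`) that ends up running against 1.1.0f (`0x1010006f`) would return false, so the test would not actually be exercising the older locking code.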
--
https://github.com/kamailio/kamailio/issues/1172#issuecomment-312634272