### Description

References: #3635
Early initialization of TLS with OpenSSL 3 in rank 0 results in the workers using shared state (pointers to the same shared memory location). Note: this is not related to shared memory allocation contention, as that is protected by a (multi-process) futex. Under heavy traffic the workers corrupt each other's state (race condition). This is only visible under heavy load, which is why #3635 appears intermittently.
Ping @miconda https://github.com/kamailio/kamailio/commit/1a9b0b63617afebcee2aecb3b2240d76... Since qm/fm are already protected by a multi-process futex, this commit is redundant (it puts a pthread mutex around the futex). I have been able to reproduce OpenSSL 3 crashes under heavy load with this commit applied.
An example of a shared object is the error state (`SSL_get_error`). It should be per-worker, but the log message

```
CRITICAL: <core> [core/mem/q_malloc.c:555]: qm_free(): BUG: freeing already freed pointer (0x7f1d7f9c77b0), called from tls: tls_init.c: ser_free(535), first free tls: tls_init.c: ser_free(535) - ignoring
```

shows that multiple workers are accessing the same object (in OpenSSL this is `ERR_STATE *`).
### Troubleshooting

N/A
#### Reproduction

* create heavy load from TLS clients
* observe shm error messages in the log
#### Debugging Data
Dump `ERR_STATE *` from two different processes and observe that the values are identical, meaning both workers are using the same struct.

```
# !!!!IMPORTANT!!!! 2 == OpenSSL thread local key for ERR_STATE *
(gdb) p err_thread_local
$3 = 2

# this is worker 1, process_no 7
Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/debugger.so...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f1df054e80a in epoll_wait (epfd=5, events=0x7f1db0504990, maxevents=1, timeout=2000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) p pthread_getspecific(2)
$1 = (void *) 0x7f1d700a65a0

# this is worker 2, process_no 8, rank 4
Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/stun.so...
Reading symbols from /usr/local/kamailio/lib64/kamailio/modules/debugger.so...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
--Type <RET> for more, q to quit, c to continue without paging--c
0x00007f1df054e80a in epoll_wait (epfd=5, events=0x7f1db0504990, maxevents=1, timeout=2000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) p pthread_getspecific(2)
$2 = (void *) 0x7f1d700a65a0
```

#### Log Messages

```
CRITICAL: <core> [core/mem/q_malloc.c:555]: qm_free(): BUG: freeing already freed pointer (0x7f1d7f9c77b0), called from tls: tls_init.c: ser_free(535), first free tls: tls_init.c: ser_free(535) - ignoring
```
#### SIP Traffic
N/A
### Possible Solutions
- avoid doing TLS initialization in rank 0. This requires cooperation from every module that uses OpenSSL 3 directly: if any OpenSSL-using module initializes OpenSSL state before `fork()`, all workers will inherit and reuse that global state.
### Additional Information
- OpenSSL 3 has both initialize-once and initialize-once-per-thread state. `ERR_STATE *` is an example of initialize-once-per-thread state.
- when Kamailio does OpenSSL initialization in rank 0, the workers inherit all "global" objects. If these objects are in shared memory, the worker processes contend for the same state, leading to corruption.
- due to the design of OpenSSL 3, a lot of this state (static variables, one-time initialization) cannot be reinitialized after `fork()`. Initialize-once-per-thread state could be reinitialized if the child were to spawn new threads.