...just don't do the TLS initialization in rank 0, right? If we need to touch all OpenSSL-using modules anyway, maybe this is an easier and less intrusive way?
One solution is to have each module declare a `mod_init_openssl()`, then have a helper function run `mod_init_openssl` in a transient thread:
```
pthread_create(... mod_init_openssl, )
// do OpenSSL stuff here
pthread_join(...)
```
Then this thread will disappear after `mod_init` in rank 0, so all the OpenSSL thread-local variables in rank 0 (thread#1) will be "clean".
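To make the idea concrete, here is a rough sketch of what such a helper could look like. The names `run_in_transient_thread`, `tls_state` and `current_thread_state` are made up for illustration, and `tls_state` is only a stand-in for whatever per-thread state OpenSSL keeps internally; a real `mod_init_openssl()` would call `SSL_CTX_new` and friends instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for OpenSSL's per-thread state (hypothetical, illustration only). */
static _Thread_local int tls_state = 0;

/* Hypothetical per-module hook; a real one would do the OpenSSL setup. */
int mod_init_openssl(void)
{
    tls_state = 1; /* dirties the thread-local copy of the calling thread only */
    return 0;
}

/* Carries the hook and its return code across the thread boundary. */
struct init_call {
    int (*fn)(void);
    int rc;
};

static void *init_trampoline(void *arg)
{
    struct init_call *c = arg;
    c->rc = c->fn();
    return NULL;
}

/* Run fn in a short-lived thread and wait for it to finish.
 * Returns fn's result, or -1 if the thread could not be created or joined. */
int run_in_transient_thread(int (*fn)(void))
{
    struct init_call c = { fn, -1 };
    pthread_t tid;

    assert(fn != NULL);
    if (pthread_create(&tid, NULL, init_trampoline, &c) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return c.rc;
}

/* Lets the caller inspect its own thread-local copy. */
int current_thread_state(void)
{
    return tls_state;
}
```

Calling `run_in_transient_thread(mod_init_openssl)` from thread#1 leaves thread#1's own `tls_state` untouched, because only the transient thread's copy was written and that copy goes away with the thread. Calling `mod_init_openssl()` directly would dirty thread#1's copy, which is what the workers would then inherit across `fork()`.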
BTW this study explains why even the OpenSSL 1.1.1 implementation is so odd: per-child replicated `SSL_CTX*`, and RNG replacement with `RAND_set_rand_method`. The root cause is the same: there are thread-local variables in rank 0 (thread#1) that are replicated in the workers, and after `fork()` OpenSSL doesn't properly reinitialize this state.
I have also gone back to look at the OpenSSL 1.1.1 implementation: by putting all initialization (`SSL_CTX_new`, `tls_fix_domains_cfg`, etc.) into a transient thread, none of the workarounds are necessary any more(!). In particular, the `tls_rand.c` stuff is not needed.
To be clear, the dlsym-pthreads stuff (`src/main.c`) is still needed to handle multi-process locks.