### Description
We have updated one system from a Ubuntu Focal 5.6.x container to a Ubuntu Jammy 5.7.2 container. In the last 24 hours, we saw three crashes. The logs of the last crash say: ``` Nov 09 20:39:34 sipproxy /usr/sbin/kamailio[4332]: CRITICAL: <core> [core/mem/q_malloc.c:519]: qm_free(): BUG: freeing already freed pointer (0x7f[4434056180](tel:4434056180)), called from tls: tls_init.c: ser_free> Nov 09 20:39:34 sipproxy /usr/sbin/kamailio[4331]: CRITICAL: <core> [core/mem/q_malloc.c:123]: qm_debug_check_frag(): BUG: qm: fragm. 0x7f[44340561](tel:44340561)d0 (address 0x7f[4434056208](tel:4434056208)) beginning overwritten > Nov 09 20:39:43 sipproxy /usr/sbin/kamailio[4273]: ALERT: <core> [main.c:776]: handle_sigs(): child process 4331 exited by a signal 6 Nov 09 20:39:43 sipproxy /usr/sbin/kamailio[4273]: ALERT: <core> [main.c:779]: handle_sigs(): core was generated ```
#### Debugging Data
Backtraces of the three crashes are attached.
[kamailio-crash1.log](https://github.com/kamailio/kamailio/files/13313115/kamailio-crash1.log) [kamailio-crash2.log](https://github.com/kamailio/kamailio/files/13313116/kamailio-crash2.log) [kamailio-crash3.log](https://github.com/kamailio/kamailio/files/13313117/kamailio-crash3.log)
### Additional Information
We are using the official Kamailio debian packages.
* **Kamailio Version** - output of `kamailio -v`
``` version: kamailio 5.7.2 (x86_64/linux) flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, MEM_JOIN_FREE, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB poll method support: poll, epoll_lt, epoll_et, sigio_rt, select. id: unknown compiled with gcc 11.4.0 ```
* **Operating System**:
<!-- Details about the operating system, the type: Linux (e.g.,: Debian 8.4, Ubuntu 16.04, CentOS 7.1, ...), MacOS, xBSD, Solaris, ...; Kernel details (output of `lsb_release -a` and `uname -a`) -->
``` root@sipproxy:/WORKSPACE/sipproxy-container# lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 22.04.3 LTS Release: 22.04 Codename: jammy root@sipproxy:/WORKSPACE/sipproxy-container# uname -a Linux sipproxy 5.15.0-87-generic #97-Ubuntu SMP Mon Oct 2 21:09:21 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux ```
After seeing the crashes on a system serving a lot of user requests, we see it happening on a system being connected to some carriers dealing with only outbound TLS connections, too.
Here are four more resolved core dumps.
[obp-crash1.log](https://github.com/kamailio/kamailio/files/13365195/obp-crash1.log) [obp-crash2.log](https://github.com/kamailio/kamailio/files/13365196/obp-crash2.log) [obp-crash3.log](https://github.com/kamailio/kamailio/files/13365197/obp-crash3.log) [obp-crash4.log](https://github.com/kamailio/kamailio/files/13365198/obp-crash4.log)
Is there any useful information in the core dumps which can help finding the problem?
I have similar crash with 5.7.2 also. My backtrace is the following:
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2 Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/. Find the GDB manual and other documentation resources online at: http://www.gnu.org/software/gdb/documentation/.
For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/local/sbin/kamailio... [New LWP 2136176] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `/usr/local/sbin/kamailio -P /var/run/kamailio/kamailio.pid -f /etc/kamailio/kam'. Program terminated with signal SIGABRT, Aborted. #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f005489d859 in __GI_abort () at abort.c:79 #2 0x00005586f8958fd8 in qm_debug_check_frag (qm=0x7f0033444000, f=0x7f0033d878b8, file=0x7f0053828be1 "tls: tls_init.c", line=405, efile=0x5586f8af4f99 "core/mem/q_malloc.c", eline=511) at core/mem/q_malloc.c:129 #3 0x00005586f895d7c1 in qm_free (qmp=0x7f0033444000, p=0x7f0033d878f0, file=0x7f0053828be1 "tls: tls_init.c", func=0x7f005382a410 <__func__.23589> "ser_free", line=405, mname=0x7f0053828b52 "tls") at core/mem/q_malloc.c:511 #4 0x00005586f8969410 in qm_shm_free (qmp=0x7f0033444000, p=0x7f0033d878f0, file=0x7f0053828be1 "tls: tls_init.c", func=0x7f005382a410 <__func__.23589> "ser_free", line=405, mname=0x7f0053828b52 "tls") at core/mem/q_malloc.c:1350 #5 0x00007f00537d0287 in ser_free (ptr=0x7f0033d878f0, fname=0x7f0053785398 "../ssl/record/ssl3_buffer.c", fline=176) at tls_init.c:405 #6 0x00007f0053741580 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.1 #7 0x00007f005373e687 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.1 #8 0x00007f005375669a in SSL_free () from /lib/x86_64-linux-gnu/libssl.so.1.1 #9 0x00007f0053805f7f in tls_h_tcpconn_clean_f (c=0x7f0033cf8be0) at tls_server.c:703 #10 0x00005586f86f183c in _tcpconn_free (c=0x7f0033cf8be0) at core/tcp_main.c:1583 #11 0x00005586f8708164 in tcpconn_put_destroy (tcpconn=0x7f0033cf8be0) at core/tcp_main.c:3336 #12 0x00005586f870e0c1 in handle_tcp_child (tcp_c=0x7f0053a6cd40, fd_i=-1) at core/tcp_main.c:3806 #13 0x00005586f871a7f9 in handle_io (fm=0x7f0053dd9528, ev=1, idx=-1) at core/tcp_main.c:4623 #14 0x00005586f86dc17b in io_wait_loop_epoll (h=0x5586f8b81dc0 <io_h>, t=5, repeat=0) at core/io_wait.h:1070 #15 0x00005586f871e28f in tcp_main_loop () at core/tcp_main.c:4908 #16 0x00005586f861e69f in main_loop () at main.c:1859 #17 0x00005586f8628e93 in main (argc=14, argv=0x7ffcd38c3938) at main.c:3086
With libssl 3.x, be sure you follow the notes at:
- https://www.kamailio.org/docs/modules/stable/modules/tls.html#tls.overview
As well as enable thread locking with the parameter:
- https://www.kamailio.org/docs/modules/stable/modules/tls.html#tls.p.init_mod...
In my case, libssl1.11.f is used.
@p-barabas then your case it is not the same as reported here, it is not the same version of os, but even for libssl 1.1 you have to go through those notes as in my previous comment.
I checked the notes: - --atexit=0 is set - tls module is the first loaded module in module.inc
TLS init_mode mode is not set, it is 0 by default. Should I try to set it to 1 or 2 in case of libssl1.1.1 also?
I have same issue with kamailio 5.7.3, openssl 3.0.11 on debian 12. I tried tls init_mode 1, 2 and 3 and there are no differences:
few times `freeing already freed pointer` errors in logs(it not causing crash immediately) ``` CRITICAL: <core> [core/mem/q_malloc.c:519]: qm_free(): BUG: freeing already freed pointer (0x7feb97ea8598), called from tls: tls_init.c: ser_free(323), first free tls: tls_init.c: ser_free(323) - ignoring ``` or ``` CRITICAL: <core> [core/mem/q_malloc.c:535]: qm_free(): BUG: freeing already freed pointer (0x7fa9d17f9a58), called from tls: tls_init.c: ser_free(412), first free tls: tls_init.c: ser_malloc(367) - ignoring ```
and after some time segfault somewhere in openssl:
``` #0 0x0000000000000000 in ?? () #1 0x00007f740a216975 in EVP_MAC_CTX_new (mac=mac@entry=0x7f73eacb4ec0) at ../crypto/evp/mac_lib.c:27 #2 0x00007f740a2ed12b in ossl_prov_macctx_load_from_params (macctx=macctx@entry=0x7f73ead6e100, params=params@entry=0x7fffcd233320, macname=<optimized out>, macname@entry=0x7f740a353aa6 "HMAC", ciphername=ciphername@entry=0x0, mdname=mdname@entry=0x0, libctx=libctx@entry=0x7f740a47e900 <default_context_int>) at ../providers/common/provider_util.c:318 #3 0x00007f740a31df2b in kdf_tls1_prf_set_ctx_params (vctx=vctx@entry=0x7f73ead6e0f8, params=params@entry=0x7fffcd233320) at ../providers/implementations/kdfs/tls1_prf.c:187 #4 0x00007f740a31e2eb in kdf_tls1_prf_derive (vctx=0x7f73ead6e0f8, key=0x7f73ead865c0 "", keylen=48, params=0x7fffcd233320) at ../providers/implementations/kdfs/tls1_prf.c:141 #5 0x00007f740a5c31e1 in tls1_PRF (s=s@entry=0x7f73ead88738, seed1=seed1@entry=0x7f740a604fff, seed1_len=seed1_len@entry=22, seed2=seed2@entry=0x7fffcd233500, seed2_len=48, seed3=seed3@entry=0x0, seed3_len=0, seed4=0x0, seed4_len=0, sec=0x7f73ead84ab8 "\327s\2030\215\025W\320U{\017\262\220ɾa\334\360X\352ocx\371\005)Q\347\274E_\023\300\300\300\300", slen=32, out=0x7f73ead865c0 "", olen=48, fatal=1, seed5_len=0, seed5=0x0) at ../ssl/t1_enc.c:72 #6 0x00007f740a5c444c in tls1_generate_master_secret (s=0x7f73ead88738, out=0x7f73ead865c0 "", p=0x7f73ead84ab8 "\327s\2030\215\025W\320U{\017\262\220ɾa\334\360X\352ocx\371\005)Q\347\274E_\023\300\300\300\300", len=32, secret_size=0x7f73ead86578) at ../ssl/t1_enc.c:657 #7 0x00007f740a5a9ef7 in ssl_generate_master_secret (s=0x7f73ead88738, pms=0x7f73ead84ab8 "\327s\2030\215\025W\320U{\017\262\220ɾa\334\360X\352ocx\371\005)Q\347\274E_\023\300\300\300\300", pmslen=32, free_pms=0) at ../ssl/s3_lib.c:4644 #8 0x00007f740a5aa68e in ssl_derive (s=s@entry=0x7f73ead88738, privkey=privkey@entry=0x7f73ead85820, pubkey=pubkey@entry=0x7f73ead57e20, gensecret=gensecret@entry=1) at ../ssl/s3_lib.c:4825 #9 0x00007f740a5fa911 in tls_process_cke_ecdhe (pkt=0x7fffcd233810, s=0x7f73ead88738) at ../ssl/statem/statem_srvr.c:3048 #10 tls_process_client_key_exchange (s=0x7f73ead88738, pkt=0x7fffcd233810) at ../ssl/statem/statem_srvr.c:3316 #11 0x00007f740a5e6672 in read_state_machine (s=0x7f73ead88738) at ../ssl/statem/statem.c:647 #12 state_machine (s=0x7f73ead88738, server=1) at ../ssl/statem/statem.c:442 #13 0x00007f740a66a956 in tls_accept (c=c@entry=0x7f73ead21458, error=error@entry=0x7fffcd2339f8) at ./src/modules/tls/tls_server.c:471 #14 0x00007f740a67320d in tls_h_read_f (c=c@entry=0x7f73ead21458, flags=flags@entry=0x7fffcd253e20) at ./src/modules/tls/tls_server.c:1173 #15 0x000055af78d7b9e6 in tcp_read_headers (c=c@entry=0x7f73ead21458, read_flags=read_flags@entry=0x7fffcd253e20) at core/tcp_read.c:445 #16 0x000055af78d7eae6 in tcp_read_req (con=0x7f73ead21458, bytes_read=bytes_read@entry=0x7fffcd253e18, read_flags=read_flags@entry=0x7fffcd253e20) at core/tcp_read.c:1508 #17 0x000055af78d83ca7 in handle_io (fm=fm@entry=0x7f740a80ca88, events=events@entry=1, idx=idx@entry=-1) at core/tcp_read.c:1912 #18 0x000055af78d89d5d in io_wait_loop_epoll (repeat=repeat@entry=0, t=<optimized out>, h=<optimized out>) at core/io_wait.h:1073 #19 0x000055af78d8a567 in tcp_receive_loop (unix_sock=<optimized out>) at core/tcp_read.c:2032 #20 0x000055af78d721b7 in tcp_init_children (woneinit=woneinit@entry=0x7fffcd25435c) at core/tcp_main.c:5364 #21 0x000055af78b7ffe0 in main_loop () at ./src/main.c:1936 #22 0x000055af78b714cc in main (argc=<optimized out>, argv=<optimized out>) at ./src/main.c:3212 ``` or
``` #0 0x00007f73ea7483e0 in ?? () #1 0x00007f740a216a53 in EVP_MAC_CTX_free (ctx=0x7f73eacecbf8) at ../crypto/evp/mac_lib.c:44 #2 0x00007f740a31e19d in tls1_prf_P_hash (ctx_init=0x7f73ead85d70, sec=sec@entry=0x7f73ead30ef8 "\226\273\177\2008\254}\363\034\351'H\250\032\177\225\274\b\264W\231\240\206Gп\036\032\036\347%鳀o\321\022q\361\2362\177\302Sk/ݒ", sec_len=sec_len@entry=48, seed=0x7f73ead6c360 "client finished=L\347\353\277f\2713\314-\372/GQ\037\206\300\305\336,\027\267\207Y\242ǔLO\2039\233\313/_Ͽ\334q\237\324\3310\025?\332oN", seed_len=seed_len@entry=63, out=out@entry=0x7f73ead73240 "", olen=12) at ../providers/implementations/kdfs/tls1_prf.c:314 #3 0x00007f740a31e48f in tls1_prf_alg (olen=12, out=0x7f73ead73240 "", seed_len=63, seed=0x7f73ead6c360 "client finished=L\347\353\277f\2713\314-\372/GQ\037\206\300\305\336,\027\267\207Y\242ǔLO\2039\233\313/_Ͽ\334q\237\324\3310\025?\332oN", slen=48, sec=0x7f73ead30ef8 "\226\273\177\2008\254}\363\034\351'H\250\032\177\225\274\b\264W\231\240\206Gп\036\032\036\347%鳀o\321\022q\361\2362\177\302Sk/ݒ", sha1ctx=0x0, mdctx=<optimized out>) at ../providers/implementations/kdfs/tls1_prf.c:407 #4 kdf_tls1_prf_derive (vctx=0x7f73ead6c338, key=0x7f73ead73240 "", keylen=12, params=<optimized out>) at ../providers/implementations/kdfs/tls1_prf.c:161 #5 0x00007f740a5c31e1 in tls1_PRF (s=s@entry=0x7f73ead72ff8, seed1=seed1@entry=0x7f740a5fd2e8, seed1_len=seed1_len@entry=15, seed2=seed2@entry=0x7fffcd233700, seed2_len=48, seed3=seed3@entry=0x0, seed3_len=0, seed4=0x0, seed4_len=0, sec=0x7f73ead83d50 "\226\273\177\2008\254}\363\034\351'H\250\032\177\225\274\b\264W\231\240\206Gп\036\032\036\347%鳀o\321\022q\361\2362\177\302Sk/ݒ", slen=48, out=0x7f73ead73240 "", olen=12, fatal=1, seed5_len=0, seed5=0x0) at ../ssl/t1_enc.c:72 #6 0x00007f740a5c4371 in tls1_final_finish_mac (s=0x7f73ead72ff8, str=0x7f740a5fd2e8 "client finished", slen=15, out=0x7f73ead73240 "") at ../ssl/t1_enc.c:627 #7 0x00007f740a5f1d2b in ssl3_take_mac (s=s@entry=0x7f73ead72ff8) at ../ssl/statem/statem_lib.c:716 #8 0x00007f740a5f2b30 in tls_get_message_body (s=s@entry=0x7f73ead72ff8, len=len@entry=0x7fffcd233808) at ../ssl/statem/statem_lib.c:1300 #9 0x00007f740a5e6636 in read_state_machine (s=0x7f73ead72ff8) at ../ssl/statem/statem.c:635 #10 state_machine (s=0x7f73ead72ff8, server=1) at ../ssl/statem/statem.c:442 #11 0x00007f740a66a956 in tls_accept (c=c@entry=0x7f73ead38970, error=error@entry=0x7fffcd2339f8) at ./src/modules/tls/tls_server.c:471 #12 0x00007f740a67320d in tls_h_read_f (c=c@entry=0x7f73ead38970, flags=flags@entry=0x7fffcd253e20) at ./src/modules/tls/tls_server.c:1173 #13 0x000055af78d7b9e6 in tcp_read_headers (c=c@entry=0x7f73ead38970, read_flags=read_flags@entry=0x7fffcd253e20) at core/tcp_read.c:445 #14 0x000055af78d7eae6 in tcp_read_req (con=0x7f73ead38970, bytes_read=bytes_read@entry=0x7fffcd253e18, read_flags=read_flags@entry=0x7fffcd253e20) at core/tcp_read.c:1508 #15 0x000055af78d83ca7 in handle_io (fm=fm@entry=0x7f740a80ca58, events=events@entry=1, idx=idx@entry=-1) at core/tcp_read.c:1912 #16 0x000055af78d89d5d in io_wait_loop_epoll (repeat=repeat@entry=0, t=<optimized out>, h=<optimized out>) at core/io_wait.h:1073 #17 0x000055af78d8a567 in tcp_receive_loop (unix_sock=<optimized out>) at core/tcp_read.c:2032 #18 0x000055af78d721b7 in tcp_init_children (woneinit=woneinit@entry=0x7fffcd25435c) at core/tcp_main.c:5364 #19 0x000055af78b7ffe0 in main_loop () at ./src/main.c:1936 #20 0x000055af78b714cc in main (argc=<optimized out>, argv=<optimized out>) at ./src/main.c:3212 ```
or ``` #0 0x00007f6ec591d6e1 in BN_num_bits (a=0x7f6ea658f068) at ../crypto/bn/bn_lib.c:199 #1 0x00007f6ec59d2ff8 in ecp_nistz256_windowed_mul (ctx=<optimized out>, num=<optimized out>, point=<optimized out>, scalar=<optimized out>, r=<optimized out>, group=<optimized out>) at ../crypto/ec/ecp_nistz256.c:651 #2 ecp_nistz256_points_mul (group=<optimized out>, r=<optimized out>, scalar=<optimized out>, num=<optimized out>, points=<optimized out>, scalars=<optimized out>, ctx=<optimized out>) at ../crypto/ec/ecp_nistz256.c:1152 #3 0x00007f6ec598fab4 in EC_POINT_mul (group=<optimized out>, r=r@entry=0x7f6ea65ecb88, g_scalar=g_scalar@entry=0x0, point=<optimized out>, p_scalar=<optimized out>, p_scalar@entry=0x7f6ea658f068, ctx=<optimized out>, ctx@entry=0x7f6ea65eca10) at ../crypto/ec/ec_lib.c:1143 #4 0x00007f6ec598bff1 in ossl_ec_key_public_check (eckey=eckey@entry=0x7f6ea65d0850, ctx=ctx@entry=0x7f6ea65eca10) at ../crypto/ec/ec_key.c:491 #5 0x00007f6ec5b224ce in ec_validate (keydata=0x7f6ea65d0850, selection=2, checktype=0) at ../providers/implementations/keymgmt/ec_kmgmt.c:966 #6 0x00007f6ec5a1e115 in try_provided_check (ctx=ctx@entry=0x7f6ea650d840, selection=selection@entry=2, checktype=checktype@entry=0) at ../crypto/evp/pmeth_check.c:44 #7 0x00007f6ec5a1e1a1 in evp_pkey_public_check_combined (ctx=ctx@entry=0x7f6ea650d840, checktype=checktype@entry=0) at ../crypto/evp/pmeth_check.c:57 #8 0x00007f6ec5a1e347 in EVP_PKEY_public_check (ctx=ctx@entry=0x7f6ea650d840) at ../crypto/evp/pmeth_check.c:83 #9 0x00007f6ec5a110df in EVP_PKEY_derive_set_peer_ex (ctx=ctx@entry=0x7f6ea65d1698, peer=peer@entry=0x7f6ea64db210, validate_peer=validate_peer@entry=1) at ../crypto/evp/exchange.c:402 #10 0x00007f6ec5a1141a in EVP_PKEY_derive_set_peer (ctx=ctx@entry=0x7f6ea65d1698, peer=peer@entry=0x7f6ea64db210) at ../crypto/evp/exchange.c:502 #11 0x00007f6ec5e4c4a8 in ssl_derive (s=s@entry=0x7f6ea64a8518, privkey=privkey@entry=0x7f6ea66a9528, pubkey=pubkey@entry=0x7f6ea64db210, gensecret=gensecret@entry=1) at ../ssl/s3_lib.c:4803 #12 0x00007f6ec5e9c911 in tls_process_cke_ecdhe (pkt=0x7fff7ab07990, s=0x7f6ea64a8518) at ../ssl/statem/statem_srvr.c:3048 #13 tls_process_client_key_exchange (s=0x7f6ea64a8518, pkt=0x7fff7ab07990) at ../ssl/statem/statem_srvr.c:3316 #14 0x00007f6ec5e88672 in read_state_machine (s=0x7f6ea64a8518) at ../ssl/statem/statem.c:647 #15 state_machine (s=0x7f6ea64a8518, server=1) at ../ssl/statem/statem.c:442 #16 0x00007f6ec5f0c956 in tls_accept (c=c@entry=0x7f6ea66ab558, error=error@entry=0x7fff7ab07b78) at ./src/modules/tls/tls_server.c:471 #17 0x00007f6ec5f1520d in tls_h_read_f (c=c@entry=0x7f6ea66ab558, flags=flags@entry=0x7fff7ab27fa0) at ./src/modules/tls/tls_server.c:1173 #18 0x0000555d90bf59e6 in tcp_read_headers (c=c@entry=0x7f6ea66ab558, read_flags=read_flags@entry=0x7fff7ab27fa0) at core/tcp_read.c:445 #19 0x0000555d90bf8ae6 in tcp_read_req (con=0x7f6ea66ab558, bytes_read=bytes_read@entry=0x7fff7ab27f98, read_flags=read_flags@entry=0x7fff7ab27fa0) at core/tcp_read.c:1508 #20 0x0000555d90bfdca7 in handle_io (fm=fm@entry=0x7f6ec60aea28, events=events@entry=1, idx=idx@entry=-1) at core/tcp_read.c:1912 #21 0x0000555d90c03d5d in io_wait_loop_epoll (repeat=repeat@entry=0, t=<optimized out>, h=<optimized out>) at core/io_wait.h:1073 #22 0x0000555d90c04567 in tcp_receive_loop (unix_sock=<optimized out>) at core/tcp_read.c:2032 #23 0x0000555d90bec1b7 in tcp_init_children (woneinit=woneinit@entry=0x7fff7ab284dc) at core/tcp_main.c:5364 #24 0x0000555d909f9fe0 in main_loop () at ./src/main.c:1936 #25 0x0000555d909eb4cc in main (argc=<optimized out>, argv=<optimized out>) at ./src/main.c:3212 ```
I think there is memory corruption related to `freeing already freed pointer` event.
I really had overlooked the init_mode parameter. Have set it now, it is deployed to one production system, waiting if it crashes or not.
i can (quite) reliably reproduce this bug on today's HEAD (commit 1ded1730c7fe81596d874fdc314dab64a7dab25b) on Ubuntu 22.04.3, compiled with OpenSSL 3.0.2, with `--atexit=0`, the tls module loaded first, and init mode 1.
In my case it happens when using the http_client module to make many concurrent requests to an external https server.
Using the tlsa module, i cannot reproduce the problem.
the logs show
``` 6(30) CRITICAL: {1Po7xZGepmYkCTNNsCw_ 1 1 SUBSCRIBE} <core> [core/mem/q_malloc.c:679]: qm_realloc(): BUG: trying to realloc an already freed pointer 0x7fa97ace8c00 , fragment 0x7fa97ace8bc0 -- aborting 123(154) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 27 0(1) ALERT: <core> [main.c:801]: handle_sigs(): child process 30 exited by a signal 6 0(1) ALERT: <core> [main.c:805]: handle_sigs(): core was generated 0(1) INFO: <core> [main.c:828]: handle_sigs(): terminating due to SIGCHLD ```
I have a backtrace to share if needed.
In the past I also got randomly conflicts between libcurl (which is behind the http_client/http_async_client modules) and libssl (as used/initialised by tls module), which resulted in writing `ruxc` to provide an alternative to http_client as well as the `tlsa` module to try to isolate the global ssl context. Those conflicts seemed to be specific to some combinations of library versions and OSes.
The tls and tlsa share the same code, only the build process is different, the first one links to shared libssl library and the second one embeds the static libssl library.
Similar there seems to be sometimes conflicts with libmysqlclient and libssl, there is a modparam for db_mysql to disable ssl option for mysql connection, or alternatively try to use tlsa.
So, indeed, a good suggestion is to try with tlsa module instead of tls.
I my case it doesn't look as conflict because I am not using mysql or http_client modules. My loaded modules:
``` loadmodule "tls.so" loadmodule "tm.so" loadmodule "tmx.so" loadmodule "sl.so" loadmodule "maxfwd.so" loadmodule "siputils.so" loadmodule "textops.so" loadmodule "ctl.so" loadmodule "kex.so" loadmodule "avpops.so" loadmodule "pv.so" loadmodule "htable.so" loadmodule "stun.so" loadmodule "outbound.so" loadmodule "rr.so" loadmodule "path.so" loadmodule "xlog.so" loadmodule "dispatcher.so" loadmodule "sipdump.so" loadmodule "ipops.so" loadmodule "corex.so" ```
if it helps, we also tried a custom python implementation using kemi to replace the http_client calls, obtaining the same segfault in the ssl code
Same here with `init_mode` set to `1` ``` #0 0x0000561af309da18 in qm_debug_check_frag (qm=qm@entry=0x7f881ca00000, f=f@entry=0x7f881e1926d0, file=file@entry=0x7f883cfdde1e "tls: tls_init.c", line=line@entry=422, eline=eline@entry=544, efile=0x561af31dc96c "core/mem/q_malloc.c") at core/mem/q_malloc.c:134 #1 0x0000561af309e929 in qm_free (qmp=<optimized out>, p=0x7f881e192710, file=0x7f883cfdde1e "tls: tls_init.c", func=0x7f883cfdf128 <__func__.6> "ser_free", line=422, mname=0x7f883cfd800a "tls") at core/mem/q_malloc.c:544 #2 0x0000561af30a3aff in qm_shm_free (qmp=<optimized out>, p=0x7f881e192710, file=0x7f883cfdde1e "tls: tls_init.c", func=0x7f883cfdf128 <__func__.6> "ser_free", line=422, mname=0x7f883cfd800a "tls") at core/mem/q_malloc.c:1525 #3 0x00007f883cfad4a7 in ser_free (ptr=0x7f881e192710, fname=<optimized out>, fline=<optimized out>) at ./src/modules/tls/tls_init.c:422 #4 0x00007f883cbe8aac in err_clear (deall=0, i=<optimized out>, es=0x7f881caaf2d0) at ../crypto/err/err_local.h:86 #5 ERR_pop_to_mark () at ../crypto/err/err.c:902 #6 0x00007f883cc2e2d2 in ossl_namemap_doall_names (namemap=<optimized out>, number=<optimized out>, fn=fn@entry=0x7f883cc19420 <find_ameth>, data=data@entry=0x7ffc4fda6490) at ../crypto/core_namemap.c:171 #7 0x00007f883cc0b79d in evp_names_do_all (prov=<optimized out>, number=<optimized out>, fn=fn@entry=0x7f883cc19420 <find_ameth>, data=data@entry=0x7ffc4fda6490) at ../crypto/evp/evp_fetch.c:689 #8 0x00007f883cc14663 in EVP_KEYMGMT_names_do_all (keymgmt=keymgmt@entry=0x7f881cd9e800, fn=fn@entry=0x7f883cc19420 <find_ameth>, data=data@entry=0x7ffc4fda6490) at ../crypto/evp/keymgmt_meth.c:302 #9 0x00007f883cc1ab58 in EVP_PKEY_set_type_by_keymgmt (pkey=pkey@entry=0x7f881e18d020, keymgmt=0x7f881cd9e800) at ../crypto/evp/p_lib.c:1642 #10 0x00007f883cc1be41 in EVP_PKEY_copy_parameters (to=to@entry=0x7f881e18d020, from=from@entry=0x7f881dfc2130) at ../crypto/evp/p_lib.c:155 #11 0x00007f883cf588e0 in tls_process_cke_ecdhe (pkt=0x7ffc4fda66a0, s=0x7f881e0bb340) at ../ssl/statem/statem_srvr.c:3037 #12 tls_process_client_key_exchange (s=0x7f881e0bb340, pkt=0x7ffc4fda66a0) at ../ssl/statem/statem_srvr.c:3316 #13 0x00007f883cf44672 in read_state_machine (s=0x7f881e0bb340) at ../ssl/statem/statem.c:647 #14 state_machine (s=0x7f881e0bb340, server=1) at ../ssl/statem/statem.c:442 #15 0x00007f883cfc8c36 in tls_accept (c=c@entry=0x7f881dffd5d0, error=error@entry=0x7ffc4fda6888) at ./src/modules/tls/tls_server.c:471 #16 0x00007f883cfd14ed in tls_h_read_f (c=c@entry=0x7f881dffd5d0, flags=flags@entry=0x7ffc4fdc6cb0) at ./src/modules/tls/tls_server.c:1173 #17 0x0000561af304aa36 in tcp_read_headers (c=c@entry=0x7f881dffd5d0, read_flags=read_flags@entry=0x7ffc4fdc6cb0) at core/tcp_read.c:445 #18 0x0000561af304db36 in tcp_read_req (con=0x7f881dffd5d0, bytes_read=bytes_read@entry=0x7ffc4fdc6ca8, read_flags=read_flags@entry=0x7ffc4fdc6cb0) at core/tcp_read.c:1509 #19 0x0000561af3052cf7 in handle_io (fm=fm@entry=0x7f883d68e7f8, events=events@entry=1, idx=idx@entry=-1) at core/tcp_read.c:1906 #20 0x0000561af3058dad in io_wait_loop_epoll (repeat=repeat@entry=0, t=<optimized out>, h=<optimized out>) at core/io_wait.h:1073 #21 0x0000561af30595b7 in tcp_receive_loop (unix_sock=<optimized out>) at core/tcp_read.c:2026 #22 0x0000561af3041207 in tcp_init_children (woneinit=woneinit@entry=0x7ffc4fdc71ec) at core/tcp_main.c:5360 #23 0x0000561af2e4d070 in main_loop () at ./src/main.c:1946 #24 0x0000561af2e40aac in main (argc=<optimized out>, argv=<optimized out>) at ./src/main.c:3236 ``` ``` (gdb) p ((struct qm_frag_end *)((char *)(f) + sizeof(struct qm_frag) + (f)->size)) $3 = (struct qm_frag_end *) 0x5c6bc564a435e6f7 (gdb) p *((struct qm_frag_end *)((char *)(f) + sizeof(struct qm_frag) + (f)->size)) Cannot access memory at address 0x5c6bc564a435e6f7 ``` As far as I can see, there's only `tls.so` using OpenSSL ``` % grep "^loadm" kamailio.cfg | cut -d" -f2 | while read mod ; do ldd /usr/lib/x86_64-linux-gnu/kamailio/modules/$mod | grep -qE 'crypto|ssl' && echo $mod ; done tls.so ```
Hello I have a similar problem The trace is different, but the accompanying log entry similarly reports twice freed memory
``modparam("tls", "init_mode", 1)``
``` kamailio -V version: kamailio 5.7.2 (x86_64/linux) 29caa1 flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, MEM_JOIN_FREE, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB poll method support: poll, epoll_lt, epoll_et, sigio_rt, select. id: 29caa1 compiled on 11:40:35 Nov 16 2023 with clang 9.0 ```
``` #0 0x00007f2c94af2db1 in RSA_size () from /var/lib/ums/lib64/libcrypto.so.3 #1 0x00007f2c94bab19b in rsa_sign () from /var/lib/ums/lib64/libcrypto.so.3 #2 0x00007f2c94bab990 in rsa_digest_sign_final () from /var/lib/ums/lib64/libcrypto.so.3 #3 0x00007f2c94a95306 in EVP_DigestSignFinal () from /var/lib/ums/lib64/libcrypto.so.3 #4 0x00007f2c94d57e53 in tls_construct_cert_verify () from /var/lib/ums/lib64/libssl.so.3 #5 0x00007f2c94d4df28 in state_machine () from /var/lib/ums/lib64/libssl.so.3 #6 0x00007f2c94dc1aca in tls_accept (c=0x7f2c152b4b78, error=0x7ffce6b81fb8) at tls_server.c:470 #7 0x00007f2c94dd3d24 in tls_h_read_f (c=0x7f2c152b4b78, flags=0x7ffce6b8443c) at tls_server.c:1175 #8 0x0000000000832538 in tcp_read_headers (c=0x7f2c152b4b78, read_flags=0x7ffce6b8443c) at core/tcp_read.c:441 #9 0x000000000083aa4d in tcp_read_req (con=0x7f2c152b4b78, bytes_read=0x7ffce6b84440, read_flags=0x7ffce6b8443c) at core/tcp_read.c:1469 #10 0x000000000085419b in handle_io (fm=0x7f2c9513b7a8, events=1, idx=-1) at core/tcp_read.c:1780 #11 0x0000000000850f70 in io_wait_loop_epoll (h=0xb99020 <io_w>, t=2, repeat=0) at core/io_wait.h:1070 #12 0x00000000008444ae in tcp_receive_loop (unix_sock=47) at core/tcp_read.c:1976 #13 0x0000000000600235 in tcp_init_children (woneinit=0x7ffce6b86a78) at core/tcp_main.c:5239 #14 0x000000000043b403 in main_loop () at main.c:1851 #15 0x000000000044a9f3 in main (argc=9, argv=0x7ffce6b88b68) at main.c:3086 ```
``` CRITICAL: <core> [core/mem/q_malloc.c:521]: qm_free(): BUG: freeing already freed pointer (0x7f2c152ce688), called from tls: tls_init.c: ser_free(405), first free tls: tls_init.c: ser_free(405) - ignoring ```
@6ecbl: can you install libssl/libcrypto debug symbols and retake the backtrace with gdb from the same core file? It should be a package available in your OS distro. More details are provided in that case that allow to track lines of the code that were executed from libssl and libcrypto.
Is anyone here that got a core file able to provide ssh access to the system to allow investigation of the corefile? It will be much easier to look at the structures and values related to tls. If yes, write me to miconda@gmail.com and I can provide ssh key.
Observation: most of these faults are in `tls_accept()` even before the `SSL *` object is shared and used by multiple workers (in the steady state) — this is strange as we normally associate OpenSSL “problems” with using `SSL *` in multiple processes.
It seems to be related to (1) error handling and/or (2) handshaking with asymmetric keys.
If anyone is in a position to try with PSK it would be an interesting data point (not sure if kamailio's `tls.so` can be used with PSK though...).
I have reproduced similar crashes with OpenSSL 3.0.x and most of them occur in `tls_accept()` in various places with both RSA/ECDSA keys.
For workarounds: you can try `tls_wolfssl` (disclaimer: I am the contributor of this module) or `tlsa/OpenSSL 1.1.1`. I don't recommend `tlsa/OpenSSL 3.x.x` as I can reproduce such crashes in that scenario. For 5.7.2/3 you would have to build these modules yourself.
Just a generic note, it seems that some projects like haproxy don't recomment openssl 3.x for production use anymore, due to similar issues and also multiple major performance regressions with a performance of 1/10 to 1/4 comparing to openssl 1.1.x [link](https://github.com/haproxy/wiki/wiki/SSL-Libraries-Support-Status#openssl)
Thanks for the note from haproxy about openssl 3.x, useful to know similar issues popped up elsewhere.
Meanwhile, as I investigated a bit more earlier today with the help of one of the reporters here, I reworked the modparam `init_mode=1` to switch to pthread multi-process lock, before it was targeting thread protection inside a process. The commit is now in both master and 5.7 branches, everyone here should try with these latest versions to see if it makes it better.
Observation: most of these faults are in `tls_accept()` even before the `SSL *` object is shared and used by multiple workers (in the steady state) — this is strange as we normally associate OpenSSL “problems” with using `SSL *` in multiple processes.
It seems to be related to (1) error handling and/or (2) handshaking with asymmetric keys.
If anyone is in a position to try with PSK it would be an interesting data point (not sure if kamailio's `tls.so` can be used with PSK though...).
I have reproduced similar crashes with OpenSSL 3.0.x and most of them occur in `tls_accept()` in various places with both RSA/ECDSA keys.
For workarounds: you can try `tls_wolfssl` (disclaimer: I am the contributor of this module) or `tlsa/OpenSSL 1.1.1`. I don't recommend `tlsa/OpenSSL 3.x.x` as I can reproduce such crashes in that scenario. For 5.7.2/3 you would have to build these modules yourself.
We are currently facing this issue aswell and what I can tell is that it happens with OpenSSL 1.1.1 too. Not tried tls_wolfssl.
For libssl 1.1.x is better to open a new issue and attach there the log messages, gdb back traces, kamailio version and OS details, it is not easy to track here what messages are per libssl version and os, this one should stay for libssl 3.x.
Just to give a feedback: A nightly build has been running on one of our affected systems for three weeks now, and we haven't had a crash since. So the fix seems to improve the behavior. Next step is to test it on a system with client devices on the other end, handling much more TLS connections and more frequent connects and disconnects.
@SipSeb - I think I have a better understanding of the cause for OpenSSL 3 errors now: see #3695. During your load testing try loading another module in the config that also uses OpenSSL (e.g. outbound.so). You need not use the functions of this extra module - just the loading of the module should accelerate the error condition.
If you config does not load any other OpenSSL-using module - then you should be safe.
The safety test is: for each of your workers, use gdb to attach and run the function
``` gdb> p ossl_err_get_state_int() # this shows a pointer to ERR_STATE * ```
If all these pointers are different - then you are safe. If the pointers are the same then there is a chance of memory corruption.
@space88man I just checked it on a test system with exactly one connected client, and all 4 TCP listeners as well as the TCP main process show this output: ``` (gdb) p ossl_err_get_state_int() No symbol "ossl_err_get_state_int" in current context. (gdb) ```
@space88man I just checked it on a test system with exactly one connected client, and all 4 TCP listeners as well as the TCP main process show this output:
(gdb) p ossl_err_get_state_int() No symbol "ossl_err_get_state_int" in current context. (gdb)
``` # alternatives - otherwise you might need to compile with a debug version of OpenSSL (gdb) p ERR_get_state() (gdb) p pthread_getspecific(err_thread_local)
Closing now with commits on master—reopen with separate issues for OpenSSL 1.1.1/OpenSSL
Open a separate issue if your config actually manages to init OpenSSL in rank 0(thread#1)—this is the main cause of intermittent crashes. A possible example would be `db_mysql` with an SSL connection (I'm not able to test this yet). In the issue do list your config (list of all the modules and loading sequence).
Closed #3635 as completed.
Fixes back-ported to 5.7 and available with 5.7.4 release.