### Description
We are using Kamailio 5.7.4 on Debian 12 (from http://deb.kamailio.org/kamailio57) with rtpengine as an edge proxy for our clients. The instance terminates SIP/TLS (with client certificates) and forwards the SIP traffic to internal systems.
After some days we get errors like this: `tls_complete_init(): tls: ssl bug #1491 workaround: not enough memory for safe operation: shm=7318616 threshold1=8912896`
At first we thought Kamailio simply did not have enough memory, so we doubled it.
But after some days the log message (and the user issues) occurred again.
So we monitored the shmmem statistics and found that used and max_used grow constantly until they reach the limit.
As mentioned, we are using client certificates, and therefore also the CRL feature. A systemd timer fetches the CRL every hour and runs 'kamcmd tls.reload' when finished.
Our tls.cfg looks like this:
```
[server:default]
method = TLSv1.2+
private_key = /etc/letsencrypt/live/hostname.de/privkey.pem
certificate = /etc/letsencrypt/live/hostname.de/fullchain.pem
ca_list = /etc/kamailio/ca_list.pem
ca_path = /etc/kamailio/ca_list.pem
crl = /etc/kamailio/combined.crl.pem
verify_certificate = yes
require_certificate = yes

[client:default]
verify_certificate = yes
require_certificate = yes
```
After some testing we found that every time `tls.reload` is executed, Kamailio consumes a bit more memory. Eventually all of the shared memory is consumed, which causes issues for our users.
See the following example:
```
[0][root@edgar-dev:~]# while true ; do /usr/sbin/kamcmd tls.reload ; /usr/sbin/kamcmd core.shmmem ; sleep 1 ; done
Ok. TLS configuration reloaded.
{
    total: 268435456
    free: 223001520
    used: 41352552
    real_used: 45433936
    max_used: 45445968
    fragments: 73
}
Ok. TLS configuration reloaded.
{
    total: 268435456
    free: 222377960
    used: 41975592
    real_used: 46057496
    max_used: 46069232
    fragments: 78
}
Ok. TLS configuration reloaded.
{
    total: 268435456
    free: 221748664
    used: 42604992
    real_used: 46686792
    max_used: 46698080
    fragments: 77
}
Ok. TLS configuration reloaded.
{
    total: 268435456
    free: 221110832
    used: 43242408
    real_used: 47324624
    max_used: 47335608
    fragments: 81
}
^C
[130][root@edgar-dev:~]#
```
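To quantify the growth per reload, the `used` values of consecutive samples can be subtracted from each other. The following is a minimal sketch, assuming the `core.shmmem` output has one `key: value` pair per line; `shm_used_deltas` is a hypothetical helper name, not part of Kamailio:

```sh
#!/bin/sh
# Hypothetical helper: read concatenated `kamcmd core.shmmem` output on stdin
# and print how much "used" grew between consecutive samples (in bytes).
shm_used_deltas() {
    # Match only the exact key "used:", not "real_used:" or "max_used:".
    awk '$1 == "used:" {
        if (seen) print $2 - prev
        prev = $2
        seen = 1
    }'
}

# Example with the first three "used" values from the loop above.
printf 'used: 41352552\nused: 41975592\nused: 42604992\n' | shm_used_deltas
```

For the samples above this works out to roughly 0.6 MB of additional shared memory per `tls.reload`.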
### Troubleshooting
#### Reproduction
Every time `tls.reload` is called, the memory consumption grows.
#### Debugging Data
``` If you let me know what would be interesting for tracking this down, i am happy to provide logs/debugging data! ```
#### Log Messages
``` If you let me know what would be interesting for tracking this down, i am happy to provide logs/debugging data! ```
#### SIP Traffic
SIP traffic does not seem to be relevant here.
### Possible Solutions
Calling `tls.reload` less often, or restarting Kamailio before the memory is exhausted ;)
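As a stopgap, the restart could be triggered by a check on free shared memory rather than by user reports. The following is a minimal sketch, assuming the `free:` line of the `kamcmd core.shmmem` output and using the `threshold1` value from the log message in the description as the limit; `shm_free_low` is a hypothetical helper, and the restart action itself is site-specific and left to the caller:

```sh
#!/bin/sh
# Hypothetical helper: read `kamcmd core.shmmem` output on stdin and succeed
# (exit 0) when the reported free shared memory is below the given limit.
shm_free_low() {
    limit=$1
    awk -v limit="$limit" '
        $1 == "free:" { free = $2; found = 1 }
        END { exit !(found && free + 0 < limit + 0) }
    '
}

# Assumption: 8912896 is the threshold1 value from the tls_complete_init()
# log line; 7318616 is the shm value from the same line, used as sample input.
if printf 'free: 7318616\n' | shm_free_low 8912896; then
    echo "shm low: schedule a restart"
fi
```

In practice the input would come from `/usr/sbin/kamcmd core.shmmem` instead of `printf`, and a limit well above `threshold1` leaves time to react before TLS connections start failing.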
### Additional Information
```
version: kamailio 5.7.4 (x86_64/linux)
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, MEM_JOIN_FREE, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown
compiled with gcc 12.2.0
```
* **Operating System**:
```
* Debian GNU/Linux 12 (bookworm)
* Linux edgar-dev 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux
```
I just realized that I forgot to mention: in addition to the logged error message, our clients start to get connection issues as well, so we have to restart Kamailio as soon as possible when that happens.
@denzs do you have a monitoring tool? Prometheus + Grafana graphs?
This part probably has to be reviewed. First, the TLS reload was initially designed to be done rather rarely, when the certificates expire. The CRL feature was also not much in use, at least in my experience so far; most deployments use server-side certificates only.
Furthermore, I am not sure whether old certificates can be cleared right away after the reload: existing connections are not closed, and there might still be references to their certificates.
Are you doing the reload only if the content of the CRL or certificate files has changed? Or is the reload done regardless?
@sergey-safarov yes we do :) 
@miconda at the moment we do the tls.reload unconditionally and quite frequently to ensure the CRLs are up to date. Of course we can check whether the CRL changed, but from my point of view that would only delay the necessary restart of Kamailio.
This screenshot is from our dev environment running:
```
while true ; do /usr/sbin/kamcmd tls.reload ; /usr/sbin/kamcmd tls.reload ; sleep 0.5 ; done
```
Watching the core.shmmem output in parallel looks like this:
```
Ok. TLS configuration reloaded.
{
    total: 268435456
    free: 1894256
    used: 262444424
    real_used: 266541200
    max_used: 266550968
    fragments: 85
}
error: 500 - Error while fixing TLS configuration (consult server log)
{
    total: 268435456
    free: 1208784
    used: 263491296
    real_used: 267226672
    max_used: 268435208
    fragments: 11749
}
Ok. TLS configuration reloaded.
{
    total: 268435456
    free: -9223372036854776
    used: 267589696
    real_used: 271686888
    max_used: 271696928
    fragments: 87
}
```
Could you compare it with a graph for our server for the last 60 days and about 25 WebRTC clients?
Here we use Kamailio 5.7.2 with a Let's Encrypt server certificate. The certificate is reloaded once every two months. We do not use a CRL. To avoid reloading certificates too often, we compare the currently used certificates with the latest ones using commands like:
```sh
# LECRTSDIR, LETARGETDIR, CHKUPDLOG and SYNCLOG are set elsewhere in the script.
# Dry run: list files that differ between the source and target directories.
rsync -l --recursive --info=name --dry-run "${LECRTSDIR}" "${LETARGETDIR}" >"${CHKUPDLOG}"

# Synchronizing certificates.
if [ ! -s "${CHKUPDLOG}" ]; then
    echo "Check updates. No changes required"
    rm -f "${CHKUPDLOG}"
else
    echo "Has new certificates. Start sync"
    rsync -azlcv --recursive --delete --info=name "${LECRTSDIR}" "${LETARGETDIR}" >"${SYNCLOG}"
    rm -f "${CHKUPDLOG}"
fi
```
The problem actually occurred after we added the CRL some weeks ago. Without the CRL there was no such behaviour. And of course there are many options to mitigate the issue, or rather to decrease its probability, by reloading less often and/or by checking whether the CRL changed at all.
Anyhow, I thought raising an issue makes sense, because from my point of view there is definitely a memory leak when using tls.reload in combination with a CRL.
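The "reload only when the CRL changed" mitigation mentioned above could look roughly like the sketch below. It assumes the timer job downloads the CRL to a temporary file first; `install_crl_if_changed` and the temporary path are hypothetical, and only `combined.crl.pem` and the `kamcmd` call are taken from this issue. It reduces how often the leak is triggered, it does not fix it:

```sh
#!/bin/sh
# Hypothetical helper: install a freshly fetched CRL only if its content
# differs from the CRL currently in use.
# Returns 0 if the file was replaced (reload needed), 1 if unchanged,
# 2 if copying failed.
install_crl_if_changed() {
    new=$1
    current=$2
    if [ -f "$current" ] && cmp -s "$new" "$current"; then
        return 1    # identical content: nothing to do
    fi
    # Copy next to the target, then rename, so a partially written
    # file is never visible under the final name.
    cp "$new" "$current.tmp" && mv "$current.tmp" "$current" || return 2
}

# Usage (the /tmp path is an assumption):
# install_crl_if_changed /tmp/fetched.crl.pem /etc/kamailio/combined.crl.pem \
#     && /usr/sbin/kamcmd tls.reload
```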
If it happens only with adding a CRL, it looks indeed like an issue in this code path. In the end using CRL is probably quite rare.
After some time debugging, I could reproduce this memory increase when using a CRL and tls.reload.
One possible issue according to memory statistics printed frequently while we have `while true ; do /usr/sbin/kamcmd tls.reload ; /usr/sbin/kamcmd tls.reload ; sleep 0.5 ; done` running is:
```
INFO: qm_sums: qm_sums(): count= 5288 size= 183440 bytes from tls: tls_init.c: ser_realloc(372)
INFO: qm_sums: qm_sums(): count= 17378 size= 1275712 bytes from tls: tls_init.c: ser_malloc(364)
---
INFO: qm_sums: qm_sums(): count= 5341 size= 242768 bytes from tls: tls_init.c: ser_realloc(372)
INFO: qm_sums: qm_sums(): count= 17325 size= 1381936 bytes from tls: tls_init.c: ser_malloc(364)
---
INFO: qm_sums: qm_sums(): count= 5331 size= 248544 bytes from tls: tls_init.c: ser_realloc(372)
INFO: qm_sums: qm_sums(): count= 17335 size= 1422112 bytes from tls: tls_init.c: ser_malloc(364)
---
INFO: qm_sums: qm_sums(): count= 5360 size= 290560 bytes from tls: tls_init.c: ser_realloc(372)
INFO: qm_sums: qm_sums(): count= 17306 size= 1466000 bytes from tls: tls_init.c: ser_malloc(364)
```
Memory here increases until we exhaust the shared memory max allocation and then tls.reload fails.
Some notes: When using tls.reload without a CRL, I didn't see any notable increase in memory usage. The above-noted allocations are steady around
```
count= 9415 size= 948432 bytes from tls: tls_init.c: ser_malloc(364)
count= 1011 size= 151408 bytes from tls: tls_init.c: ser_realloc(372)
```
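To compare such memory summaries over time, the size per allocation site can be tracked from the first to the last sample. The following is a minimal sketch, assuming log lines in the `qm_sums` format shown above (a `size=` token followed by the byte count, with the allocation site as the last field); `qm_sums_growth` is a hypothetical helper name:

```sh
#!/bin/sh
# Hypothetical helper: read qm_sums log lines on stdin and print, for each
# allocation site (last field), the difference in bytes between the last
# and the first size seen for that site.
qm_sums_growth() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "size=") {
                site = $NF
                if (!(site in first)) first[site] = $(i + 1)
                last[site] = $(i + 1)
            }
        }
    }
    END { for (s in last) print s, last[s] - first[s] }' | sort
}

# Example: first and last sample from the CRL test run above.
qm_sums_growth <<'EOF'
count= 5288 size= 183440 bytes from tls: tls_init.c: ser_realloc(372)
count= 17378 size= 1275712 bytes from tls: tls_init.c: ser_malloc(364)
count= 5360 size= 290560 bytes from tls: tls_init.c: ser_realloc(372)
count= 17306 size= 1466000 bytes from tls: tls_init.c: ser_malloc(364)
EOF
```

For these samples the `ser_malloc(364)` site grows by 190288 bytes and the `ser_realloc(372)` site by 107120 bytes, even though the allocation counts stay roughly constant.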
This issue is stale because it has been open 6 weeks with no activity. Remove stale label or comment or this will be closed in 2 weeks.
Although it is quite easy to monitor and work around this issue, I still think it is a valid bug :)
Just for reference, this was discussed on the developer list, thread: https://lists.kamailio.org/mailman3/hyperkitty/list/sr-dev@lists.kamailio.or...
Is there any news on, or are there plans for, merging the branch from xkaraman? :)
Hey @denzs,
It's been some time since I last checked this, sorry.
There was a discussion about introducing a parameter for this change. I will try to implement it asap, so i can create a PR for this and reinitiate the discussion!
Thanks for your patience, Xenofon
@xkaraman thank you so much! I did not want to rush you, I just wanted to prevent this issue from being auto-closed :)
Hey @denzs,
I have just created https://github.com/kamailio/kamailio/pull/3972 for this.
Can you maybe check whether kamailio still functions as intended (other than the tls.reload) with the new shared context stuff?
After applying the patch, set the new tls parameter `enable_shared_ctx` to 1 in the config file and you are good to go.
Any feedback is welcome!
@xkaraman thank you so much for taking care of this! :)
I tested your branch on our dev instance, the normal functions are doing fine so far :+1:
But switching `enable_shared_ctx` from 0 to 1 only seems to delay the memory leak:
The first 5 minutes are with `enable_shared_ctx=0` and the rest with `enable_shared_ctx=1`. During the last 5 minutes I stopped the tls.reload loop to see whether memory consumption would decrease again, but that is not the case.
Tested with: `while true ; do /usr/sbin/kamcmd tls.reload ; sleep 0.5 ; done`
Hey @denzs,
Thanks for testing this out.
As we discussed on the mailing list and as noted in the PR, this patch is indeed not adequate to fix the actual problem. I was trying to lower the memory usage and hoped that the increase would no longer be noticeable (clearly not the case, judging from your report).
The problem seems to be in `SSL_CTX_load_verify_locations` and its use in `load_crl()`. I will keep digging and see if there is something to be done to actually free the memory.
Just for reference, which OpenSSL version are you testing this with?
@xkaraman thanks for your feedback :) It is a Debian 12 system with:
```
ii libssl-dev:amd64 3.0.14-1~deb12u2 amd64 Secure Sockets Layer toolkit - development files
ii libssl3:amd64    3.0.14-1~deb12u2 amd64 Secure Sockets Layer toolkit - shared libraries
ii openssl          3.0.14-1~deb12u2 amd64 Secure Sockets Layer toolkit - cryptographic utility
```
Just a 'ping' to prevent the bot from closing the issue.. :)
Just a 'ping' to prevent the bot from closing the issue.. :)
Time for yet another 'ping' :)
Hello,
We have noticed the exact issue on our own systems (Kamailio 5.8.2 on Ubuntu 22.04.4 LTS). Every time we run
`kamctl rpc tls.reload`
the "shmem:used_size" output from the 'kamctl stats shmem' command increases by about 3.2MB.
This looks like a memory leak, so I tried to find which module could be increasing its memory use by running the following commands:
* `kamcmd mod.stats all shm`
* `kamcmd pkg.stats`
* `kamcmd mod.stats all pkg`
but if there are any changes before and after the restart those are minimal and they do not amount to anything approximating 3.19 MB for every reload.
I have attached the before and after outputs of the above commands in case these help.
[after_mod.stats_all_pkg.txt](https://github.com/user-attachments/files/18623471/after_mod.stats_all_pkg.t...) [after_mod.stats_all_shm.txt](https://github.com/user-attachments/files/18623474/after_mod.stats_all_shm.t...) [after_pkg.stats.txt](https://github.com/user-attachments/files/18623472/after_pkg.stats.txt) [after_stats_shmem.txt](https://github.com/user-attachments/files/18623473/after_stats_shmem.txt)
[before_mod.stats_all_pkg.txt](https://github.com/user-attachments/files/18623482/before_mod.stats_all_pkg....) [before_mod.stats_all_shm.txt](https://github.com/user-attachments/files/18623481/before_mod.stats_all_shm....) [before_pkg.stats.txt](https://github.com/user-attachments/files/18623484/before_pkg.stats.txt) [before_stats_shmem.txt](https://github.com/user-attachments/files/18623483/before_stats_shmem.txt)
Just pinging this.