Description

I found a memory leak that is causing issues for us in production. I think I was able to reproduce it and determine that it happens when an MSRP connection is set up and that connection is then closed due to inactivity (for example, when tcp_connection_lifetime expires).

Troubleshooting

Reproduction

In this environment we have a WebRTC SIP client connected to Kamailio and a second party using SIP; both send messages via MSRP. The web client and the SIP client establish a session normally and can communicate without problems. At this point the status is:

sudo netstat -natp | grep 10.22.22
tcp        0      0 10.22.22.21:5060        0.0.0.0:*               LISTEN      28646/kamailio 
tcp        0      0 10.22.22.21:10000       0.0.0.0:*               LISTEN      28646/kamailio 
tcp        0      0 10.22.22.21:8080        0.0.0.0:*               LISTEN      28646/kamailio 
tcp        0      0 10.22.22.21:10001       0.0.0.0:*               LISTEN      28646/kamailio 
tcp        0      0 10.22.22.21:10002       0.0.0.0:*               LISTEN      28646/kamailio 
tcp        0      0 10.22.22.21:8080        10.22.22.1:58508        ESTABLISHED 28640/kamailio 
tcp        0      0 10.22.22.21:10000       10.22.22.190:32786      ESTABLISHED 28646/kamailio 
tcp        0      0 10.22.22.21:8080        10.22.22.1:58506        ESTABLISHED 28591/kamailio 

TESTER|centos8.4|tester21  [2021-10-06][16:23:57] [/home/vagrant]$ kamcmd -s udp:127.0.0.1:2046 mod.stats core all | grep tcp
	tcpconn_new(971): 198888
	init_tcp(4700): 8192
	init_tcp(4694): 32768
	init_tcp(4686): 8
	init_tcp(4679): 8
	init_tcp(4672): 8
	init_tcp(4666): 8
	init_tcp(4654): 8
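The two figures that matter in this test are the number of ESTABLISHED sockets and the bytes attributed to tcpconn_new. A minimal POSIX shell sketch for pulling them out of the listings above; the helper names tcpconn_bytes and established_count are hypothetical (not Kamailio tools), and the parsing assumes the "func(line): bytes" layout shown in the mod.stats output.

```shell
# Hypothetical helper (not part of Kamailio): sum the bytes that
# "kamcmd ... mod.stats core all" attributes to tcpconn_new, reading the
# listing from stdin. Lines are assumed to look like "<func>(<line>): <bytes>".
# Summing covers builds where tcpconn_new appears under more than one line number.
tcpconn_bytes() {
    awk -F': ' '/tcpconn_new[(]/ { sum += $2 } END { print sum + 0 }'
}

# Hypothetical helper: count ESTABLISHED sockets in "netstat -natp" output
# read from stdin. grep -c prints 0 (and exits 1) when nothing matches,
# so "|| true" keeps the exit status at 0.
established_count() {
    grep -c 'ESTABLISHED' || true
}
```

Used against the live system, this would be, for example:
kamcmd -s udp:127.0.0.1:2046 mod.stats core all | tcpconn_bytes
sudo netstat -natp | grep 10.22.22 | established_count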

To trigger the failure faster, I set tcp_connection_lifetime=60. We then wait the required number of seconds for the TCP connections to be closed. Finally, we end the session and stop the web client's registration, so all connections are cleaned up.
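While waiting for the 60-second lifetime to expire, it can help to sample the counter periodically instead of only before and after. A minimal POSIX shell sketch, assuming a hypothetical helper named poll_stats (not a Kamailio tool): it runs a given command a fixed number of times, printing a timestamp before each run and sleeping between runs.

```shell
# Hypothetical helper: poll_stats COUNT INTERVAL CMD [ARGS...]
# Runs CMD COUNT times, printing a timestamp before each run and sleeping
# INTERVAL seconds between runs (no sleep after the last run).
poll_stats() {
    count=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        date '+%H:%M:%S'
        "$@"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, nine samples taken 10 seconds apart cover about 80 seconds, which spans the 60-second tcp_connection_lifetime used here:
poll_stats 9 10 sh -c 'kamcmd -s udp:127.0.0.1:2046 mod.stats core all | grep tcpconn_new'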

TESTER|centos8.4|tester21  [2021-10-06][16:28:08] [/home/vagrant]$ sudo netstat -natp | grep 10.22.22
tcp        0      0 10.22.22.21:5060        0.0.0.0:*               LISTEN      28646/kamailio 
tcp        0      0 10.22.22.21:10000       0.0.0.0:*               LISTEN      28646/kamailio 
tcp        0      0 10.22.22.21:8080        0.0.0.0:*               LISTEN      28646/kamailio 
tcp        0      0 10.22.22.21:10001       0.0.0.0:*               LISTEN      28646/kamailio 
tcp        0      0 10.22.22.21:10002       0.0.0.0:*               LISTEN      28646/kamailio 
TESTER|centos8.4|tester21  [2021-10-06][16:28:20] [/home/vagrant]$ kamcmd -s udp:127.0.0.1:2046 mod.stats core all | grep tcp
	tcpconn_new(971): 66296
	init_tcp(4700): 8192
	init_tcp(4694): 32768
	init_tcp(4686): 8
	init_tcp(4679): 8
	init_tcp(4672): 8
	init_tcp(4666): 8
	init_tcp(4654): 8

As you can see, no connections are left, but memory allocated by tcpconn_new is still in use. I know this is related to MSRP because we noticed that our SIP web client does not close the socket, so we just wait for Kamailio to close it.
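The numbers can be cross-checked with a little arithmetic, under the assumption (not something Kamailio guarantees) that every connection allocates the same amount: 198888 bytes for three ESTABLISHED connections is 66296 bytes each, which is exactly what remains after cleanup, suggesting that one connection's allocation is never freed. A minimal sketch of that calculation, using a hypothetical helper named leaked_conns:

```shell
# Hypothetical helper: leaked_conns BEFORE_BYTES OPEN_CONNS AFTER_BYTES
# Estimates how many tcpconn_new allocations are still held, ASSUMING every
# connection allocates the same number of bytes.
leaked_conns() {
    before_bytes=$1    # tcpconn_new bytes while connections were open
    open_conns=$2      # ESTABLISHED connections at that time
    after_bytes=$3     # tcpconn_new bytes after all sockets were closed
    if [ "$open_conns" -le 0 ]; then
        echo "open_conns must be positive" >&2
        return 1
    fi
    per_conn=$((before_bytes / open_conns))
    echo "per connection: $per_conn bytes"
    echo "still allocated: $((after_bytes / per_conn)) connection(s)"
}

# Figures from this report; prints
#   per connection: 66296 bytes
#   still allocated: 1 connection(s)
leaked_conns 198888 3 66296
```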

Log Messages

2021-07-25T23:54:41Z ERROR ERROR: <core> [mem/f_malloc.c:415]: fm_search_defrag(): fm_search_defrag(0x7f39a284c000, 66288); Free fragment not found!
2021-07-25T23:54:41Z ERROR ERROR: <core> [mem/f_malloc.c:498]: fm_malloc(): fm_malloc(0x7f39a284c000, 66288) called from core: tcp_main.c: tcpconn_new(957), module: core; Free fragment not found!
2021-07-25T23:54:41Z ERROR ERROR: <core> [tcp_main.c:959]: tcpconn_new(): mem. allocation failure
2021-07-25T23:54:41Z ERROR ERROR: <core> [tcp_main.c:3985]: handle_new_connect(): tcpconn_new failed, closing socket

Additional Information

version: kamailio 5.1.6 (x86_64/linux) 7d1964
flags: STATS: Off, EXTRA_DEBUG, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, DBG_QM_MALLOC, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144 MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 7d1964 
compiled on 13:54:09 Oct  5 2021 with gcc 8.4.1

I have also tried the latest stable Kamailio version, with the same result.

Linux tester21 4.18.0-305.17.1.el8_4.x86_64 #1 SMP Wed Sep 8 14:00:07 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

CentOS 8

