Hi,
we have some Kamailio instances running (currently the latest 5.4 release), and we need to restart them from time to time. We have a Grafana graph showing the pkg memory usage of one random TCP listener process, and it increases slowly over time. The config is pure Python KEMI.
A mem dump directly after restarting Kamailio says this:
SipSeppBook22:tmp sdamm$ grep alloc pkgmem_before.log | awk '{ print substr( $0, 16, length($0) ) }' | sort | uniq -c | sort -k1n | tail -10
  16 sipproxy qm_status(): alloc'd from core: core/re.c: subst_parser(301)
  31 sipproxy qm_status(): alloc'd from core: core/sr_module.c: load_module(436)
  31 sipproxy qm_status(): alloc'd from core: core/sr_module.c: register_module(236)
  31 sipproxy qm_status(): alloc'd from core: core/sr_module.c: register_module(253)
  40 sipproxy qm_status(): alloc'd from core: core/pvapi.c: pv_init_buffer(2139)
  58 sipproxy qm_status(): alloc'd from core: core/cfg.lex: pp_define(1827)
 133 sipproxy qm_status(): alloc'd from core: core/rpc_lookup.c: rpc_hash_add(101)
 162 sipproxy qm_status(): alloc'd from core: core/counters.c: cnt_hash_add(339)
 211 sipproxy qm_status(): alloc'd from core: core/cfg.lex: addstr(1448)
 265 sipproxy qm_status(): alloc'd from core: core/pvapi.c: pv_table_add(236)
And after running for some weeks, the same dump looks like this:
SipSeppBook22:tmp sdamm$ grep alloc prod_pkgmem.log | awk '{ print substr( $0, 16, length($0) ) }' | sort | uniq -c | sort -k1n | tail -10
  31 ifens5 qm_status(): alloc'd from core: core/sr_module.c: register_module(253)
  40 ifens5 qm_status(): alloc'd from core: core/pvapi.c: pv_init_buffer(2139)
  59 ifens5 qm_status(): alloc'd from core: core/cfg.lex: pp_define(1827)
 133 ifens5 qm_status(): alloc'd from core: core/rpc_lookup.c: rpc_hash_add(101)
 161 ifens5 qm_status(): alloc'd from core: core/counters.c: cnt_hash_add(339)
 203 ifens5 qm_status(): alloc'd from core: core/cfg.lex: addstr(1448)
 265 ifens5 qm_status(): alloc'd from core: core/pvapi.c: pv_table_add(236)
 686 ifens5 qm_status(): alloc'd from core: core/pvapi.c: pv_parse_format(1173)
 694 ifens5 qm_status(): alloc'd from htable: ht_var.c: pv_parse_ht_name(158)
 707 ifens5 qm_status(): alloc'd from core: core/pvapi.c: pv_cache_add(349)
I know that there are currently a few lines in the code which look like this:
self.instance_name = KSR.pv.get("$sht(pbxdata=>ip.%s)" % (ip,))
This has been an issue in the past and I have replaced the code with something like this:
KSR.pv.sets("$var(tmpInstanceIp)", ip)
self.instance_name = KSR.pv.get("$sht(pbxdata=>ip.$var(tmpInstanceIp))")
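For clarity, this is the difference between the old and the new pattern (the comments reflect my understanding of what ends up in the PV cache; the IP in the comment is just an example):

# Old pattern: every distinct ip produces a new PV name string such as
# "$sht(pbxdata=>ip.10.0.0.1)"; each unique name is parsed once and then
# kept in the PV cache in pkg memory, so the cache grows with the number
# of distinct IPs ever seen by this worker.
self.instance_name = KSR.pv.get("$sht(pbxdata=>ip.%s)" % (ip,))

# New pattern: the PV name is one constant string, so only a single
# cached spec is created; the dynamic part travels in $var(...).
KSR.pv.sets("$var(tmpInstanceIp)", ip)
self.instance_name = KSR.pv.get("$sht(pbxdata=>ip.$var(tmpInstanceIp))")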
However, even after changing this, the memory still grows slowly but steadily.
The usage scenario is TLS-only on one side (clients) and TCP-only on the other side (PBXes).
Does anybody have a hint for me on how to debug this? It looks like there is a lot of PV-related stuff in memory, but I don't really know where it comes from.
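(For anyone who wants to pull the same numbers: something like the following should work, though the exact names are from memory, so treat them as an assumption; they need kamcmd access plus the debug-capable qm memory manager and memlog set low enough.

kamcmd pkg.stats
kamcmd cfg.set_now_int core mem_dump_pkg <pid>

pkg.stats prints used/free pkg memory per process, which is enough to see the growth trend, and setting mem_dump_pkg to a worker PID makes that process write a qm_status() dump like the ones above into the log.)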
Thanks for any hints, Sebastian
Hello,
the logs still suggest that you define a lot of htable-related variables:
694 ifens5 qm_status(): alloc'd from htable: ht_var.c: pv_parse_ht_name(158)
The number 694 suggests you have defined something like $sht(x=>1) ... $sht(x=>694) variables, and that number probably grows based on what you do in the KEMI script.
Then, you said:
"""
I know, currently there are a few lines in the code which look like this:
self.instance_name = KSR.pv.get("$sht(pbxdata=>ip.%s)" % (ip,))
"""
If you still have a few of them, then that is still the issue.
The latest version exports dedicated functions to KEMI for managing items in htables, see:
- https://www.kamailio.org/docs/tutorials/devel/kamailio-kemi-framework/module...
There are functions to get/set values without using the PV module and Kamailio-style variables. It is better to switch to them. They are probably in 5.4.x as well.
The assignment above should be something like:
self.instance_name = KSR.htable.sht_get("pbxdata", "ip." + ip)
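As a rough sketch (assuming the KSR.htable exports sht_get/sht_sets/sht_seti are available in your 5.4.x build; the table and key names are taken from your example), inside a KEMI callback reading and writing would look like:

# read: fetches the item directly from the shared-memory htable,
# no PV spec is parsed or cached per unique key
self.instance_name = KSR.htable.sht_get("pbxdata", "ip." + ip)

# write back: sht_sets() for string values (sht_seti() for integers)
KSR.htable.sht_sets("pbxdata", "ip." + ip, self.instance_name)

This keeps the key fully dynamic without adding anything to the PV cache, so the pkg memory of the workers should stay flat for this code path.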
Cheers, Daniel