Hi,
we have a number of Kamailio instances running (currently the latest 5.4 release), and we need to restart them from time to time. We have a Grafana graph showing the pkg memory usage of one randomly picked TCP worker process, and it increases slowly over time. The config is pure Python KEMI.

A pkg memory dump taken directly after restarting Kamailio looks like this:
SipSeppBook22:tmp sdamm$ grep alloc pkgmem_before.log | awk '{ print substr( $0, 16, length($0) ) }' | sort | uniq -c | sort -k1n | tail -10
     16 sipproxy qm_status(): alloc'd from core: core/re.c: subst_parser(301)
     31 sipproxy qm_status(): alloc'd from core: core/sr_module.c: load_module(436)
     31 sipproxy qm_status(): alloc'd from core: core/sr_module.c: register_module(236)
     31 sipproxy qm_status(): alloc'd from core: core/sr_module.c: register_module(253)
     40 sipproxy qm_status(): alloc'd from core: core/pvapi.c: pv_init_buffer(2139)
     58 sipproxy qm_status(): alloc'd from core: core/cfg.lex: pp_define(1827)
    133 sipproxy qm_status(): alloc'd from core: core/rpc_lookup.c: rpc_hash_add(101)
    162 sipproxy qm_status(): alloc'd from core: core/counters.c: cnt_hash_add(339)
    211 sipproxy qm_status(): alloc'd from core: core/cfg.lex: addstr(1448)
    265 sipproxy qm_status(): alloc'd from core: core/pvapi.c: pv_table_add(236)
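
(For reproduction: pkg dumps like the one above can be triggered at runtime, per worker process, with "kamcmd cfg.set_now_int core mem_dump_pkg <pid>", assuming the cfg_rpc module is loaded.)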
And after running for some weeks, the same dump looks like this:
SipSeppBook22:tmp sdamm$ grep alloc prod_pkgmem.log | awk '{ print substr( $0, 16, length($0) ) }' | sort | uniq -c | sort -k1n | tail -10
     31 ifens5 qm_status(): alloc'd from core: core/sr_module.c: register_module(253)
     40 ifens5 qm_status(): alloc'd from core: core/pvapi.c: pv_init_buffer(2139)
     59 ifens5 qm_status(): alloc'd from core: core/cfg.lex: pp_define(1827)
    133 ifens5 qm_status(): alloc'd from core: core/rpc_lookup.c: rpc_hash_add(101)
    161 ifens5 qm_status(): alloc'd from core: core/counters.c: cnt_hash_add(339)
    203 ifens5 qm_status(): alloc'd from core: core/cfg.lex: addstr(1448)
    265 ifens5 qm_status(): alloc'd from core: core/pvapi.c: pv_table_add(236)
    686 ifens5 qm_status(): alloc'd from core: core/pvapi.c: pv_parse_format(1173)
    694 ifens5 qm_status(): alloc'd from htable: ht_var.c: pv_parse_ht_name(158)
    707 ifens5 qm_status(): alloc'd from core: core/pvapi.c: pv_cache_add(349)
I know that the biggest entries in the second dump (pv_cache_add, pv_parse_ht_name, pv_parse_format) do not show up among the top allocations right after a restart. Currently there are still a few lines in the code that build PV names dynamically, like this:

self.instance_name = KSR.pv.get("$sht(pbxdata=>ip.%s)" % (ip,))

Every distinct IP produces a new "$sht(...)" name that is parsed once and then kept in the pv cache forever. This has been an issue for us in the past, and I have replaced such code with a fixed PV name plus a variable, like this:
KSR.pv.sets("$var(tmpInstanceIp)", ip)
self.instance_name = KSR.pv.get("$sht(pbxdata=>ip.$var(tmpInstanceIp))")
However, even after changing this, the memory still grows slowly but steadily.
The usage scenario is TLS-only on one side (clients) and TCP-only on the other side (PBXes).
Does anybody have a hint on how to debug this further? It looks like a lot of pv-related allocations pile up, but I can't tell where they come from.
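
To quantify the suspicion, the pv cache growth can be counted directly in the two dumps, in the same style as the commands above:

SipSeppBook22:tmp sdamm$ grep -c 'pv_cache_add' pkgmem_before.log prod_pkgmem.log

As far as I can tell there are also core parameters pv_cache_limit and pv_cache_action that can warn about or cap a growing pv cache, but that would only contain the symptom rather than remove whatever keeps adding entries.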
Thanks for any hints,
Sebastian