Hi! it seems the pua module in send_subscribe ( ) is leaking some memory because the hentity is not released. According to memory allocations the shmem allocated for hentity is not cleaned. ``` 1454582 file = 0x70e59fd30209 "usrloc: ../../core/ut.h", func = 0x70e59fd326c8 <__func__.16> "shm_str_dup", mname = 0x70e59fd2fdf0 "usrloc", line = 722 698090 file = 0x604218f2fcd5 "core: core/xavp.c", func = 0x604218f31988 <__func__.9> "xavp_new_value", mname = 0x604218f2fcd0 "core", line = 100 360187 file = 0x70e59fd30663 "usrloc: ucontact.c", func = 0x70e59fd326b8 <__func__.17> "new_ucontact", mname = 0x70e59fd2fdf0 "usrloc", line = 94 34208 file = 0x70e59fc91ae9 "pua: send_subscribe.c", func = 0x70e59fc94c80 <__func__.2> "subscribe_cbparam", mname = 0x70e59fc917e0 "pua", line = 773 5569 file = 0x70e59fc91ae9 "pua: send_subscribe.c", func = 0x70e59fc94ca0 <__func__.1> "subs_cbparam_indlg", mname = 0x70e59fc917e0 "pua", line = 861 64 file = 0x70e5a231f424 "tmx: tmx_pretran.c", func = 0x70e5a2320230 <__func__.0> "tmx_check_pretran", mname = 0x70e5a231f420 "tmx", line = 271 64 file = 0x70e5a231f424 "tmx: tmx_pretran.c", func = 0x70e5a2320230 <__func__.0> "tmx_check_pretran", mname = 0x70e5a231f420 "tmx", line = 250 7 file = 0x70e59fc86854 "pua: event_list.c", func = 0x70e59fc86ad0 <__func__.1> "add_pua_event", mname = 0x70e59fc86850 "pua", line = 72 3 file = 0x70e5a297c249 "license: ../../core/ut.h", func = 0x70e5a297d4f8 <__func__.3> "shm_str_dup", mname = 0x70e5a297c000 "license", line = 722 3 file = 0x70e59fd329d4 "usrloc: udomain.c", func = 0x70e59fd35660 <__func__.22> "build_stat_name", mname = 0x70e59fd328a0 "usrloc", line = 56 ``` According to mod_stats for shmem over RPC the pua module was taking double as much memory as usrloc on the busy system.
I have tried to address the issue shown on lines 773 and 861 and added clearing of hentity at the end of send_subscribe () : but it had caused the subscribe_200 outside of dialog to crash kamailio because of running the tm callback that accesses the hentity which is already free'd
So I thought perhaps another approach is required and the tm callback should clean it, while I must say i do not 100% clear understand why this needs to be so complicated compared to other modules. Perhaps i have missed another bug and overcomplicated things because the uac_r seems to be cleaned just fine. The issue around the hentity does not seem to be solved in the latest releases while this is 5.3-something custom build.
Anyway my last resort that could help is convert to use tm to clear memory once it is done, in tm/uac.c : t_uac_prepare() is registering a callback with release function ``` if(uac_r->cb && insert_tmcb(&(new_cell->tmcb_hl), uac_r->cb_flags, *(uac_r->cb), uac_r->cbp, NULL)!=1){ ret=E_OUT_OF_MEM; LM_ERR("short of tmcb shmem\n"); goto error1; } int insert_tmcb(struct tmcb_head_list *cb_list, int types, transaction_cb f, void *param, release_tmcb_param rel_func) { ``` is there any example of using such a relfunc in the other modules? Or anyone else has see the pua causing a mem leak when it is subscribed to reginfo from the config? I would appreciate some ideas what could be the issue and the solution.
As far as I know, *reginfo* is not used (much, if at all) outside of ims world, so it might be related to *pua_reginfo*. Maybe you can look at pua_dialoginfo and see how it deals with TM callbacks.
Overall, I am not completely satisfied with the TM callbacks system, there are many events related to them, not always clear which and when are called, and what component is in charge of cleaning up, the one and engages them or tm on transaction destroy. It can be cases when the callbacks are engaged, but the transaction is not yet created. It seems to work well for commonly used modules, but it is not easy to track what happens always.
Most of the related code in tm is rather old, maybe more than 20 years, designed by developers no longer active it the project, but maybe during Kamailio development meeting planned for the upcoming Novemeber in Dusseldorf we will get enough interested people to dig in and do a review of this internal callbacks system for tm module.
Thank you Daniel!
On Tue, Jul 23, 2024 at 6:28 PM Daniel-Constantin Mierla via sr-dev < sr-dev@lists.kamailio.org> wrote:
As far as I know, *reginfo* is not used (much, if at all) outside of ims world, so it might be related to *pua_reginfo*. Maybe you can look at pua_dialoginfo and see how it deals with TM callbacks.
Overall, I am not completely satisfied with the TM callbacks system, there are many events related to them, not always clear which and when are called, and what component is in charge of cleaning up, the one and engages them or tm on transaction destroy. It can be cases when the callbacks are engaged, but the transaction is not yet created. It seems to work well for commonly used modules, but it is not easy to track what happens always.
Most of the related code in tm is rather old, maybe more than 20 years, designed by developers no longer active it the project, but maybe during Kamailio development meeting planned for the upcoming Novemeber in Dusseldorf we will get enough interested people to dig in and do a review of this internal callbacks system for tm module.
— Reply to this email directly, view it on GitHub https://github.com/kamailio/kamailio/issues/3928#issuecomment-2245681533, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABO7UZMQDY2WTFNFRNCEOTLZNZ7KXAVCNFSM6AAAAABLIU2FCOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENBVGY4DCNJTGM . You are receiving this because you are subscribed to this thread.Message ID: kamailio/kamailio/issues/3928/2245681533@github.com _______________________________________________ Kamailio (SER) - Development Mailing List To unsubscribe send an email to sr-dev-leave@lists.kamailio.org
I think the pua is cleaning the memory in normal scenario, there has been some other passing issue such as innodb locks that caused some transactions to get stuck. Sorry for confusion. I understood the point about the tm callbacks. I am still investigating some mem leak in usrloc for XAVP when we "ucontact.c", new_ucontact, shm_str_dup and xavp_new_value were quite hungry as seen in the gdb. There is something else in the usrloc module where we have seen when using DB-only mode with xavp_contact parameter, the memory allocated for the xavp and temporary contact may be not released. @miconda ``` b642263a083 src/modules/usrloc/udomain.c (Daniel-Constantin Mierla 2019-07-29 10:54:41 +0200 1084) ruid.len = 0; b642263a083 src/modules/usrloc/udomain.c (Daniel-Constantin Mierla 2019-07-29 10:54:41 +0200 1085) if(c->ruid.len > 0 && ul_xavp_contact_name.s != NULL) { b642263a083 src/modules/usrloc/udomain.c (Daniel-Constantin Mierla 2019-07-29 10:54:41 +0200 1086) /* clone ruid to delete attributes out of lock */ b642263a083 src/modules/usrloc/udomain.c (Daniel-Constantin Mierla 2019-07-29 10:54:41 +0200 1087) if(c->ruid.len < RUIDBUF_SIZE - 2) { b642263a083 src/modules/usrloc/udomain.c (Daniel-Constantin Mierla 2019-07-29 10:54:41 +0200 1088) memcpy(ruidbuf, c->ruid.s, c->ruid.len); b642263a083 src/modules/usrloc/udomain.c (Daniel-Constantin Mierla 2019-07-29 10:54:41 +0200 1089) ruidbuf[c->ruid.len] = '\0'; b642263a083 src/modules/usrloc/udomain.c (Daniel-Constantin Mierla 2019-07-29 10:54:41 +0200 1090) ruid.s = ruidbuf; b642263a083 src/modules/usrloc/udomain.c (Daniel-Constantin Mierla 2019-07-29 10:54:41 +0200 1091) ruid.len = c->ruid.len; b642263a083 src/modules/usrloc/udomain.c (Daniel-Constantin Mierla 2019-07-29 10:54:41 +0200 1092) } else { 12dbf48d421 src/modules/usrloc/udomain.c (Daniel-Constantin Mierla 2019-07-30 16:58:45 +0200 1093) LM_ERR("ruid is too long %d for %.*s\n", c->ruid.len, f3d00c91a79 src/modules/usrloc/udomain.c (Victor Seva 2023-02-09 11:45:29 +0100 1094) aor.len, aor.s); b642263a083 src/modules/usrloc/udomain.c (Daniel-Constantin Mierla 2019-07-29 10:54:41 +0200 1095) } b642263a083 src/modules/usrloc/udomain.c (Daniel-Constantin Mierla 2019-07-29 10:54:41 +0200 1096) } 00000000000 src/modules/usrloc/udomain.c (Not Committed Yet 2024-07-25 13:42:15 +0200 1097) mem_delete_ucontact(&r, c); f3cf0a2f4e6 src/modules/usrloc/udomain.c (Victor Seva 2019-10-03 14:21:35 +0200 1098) release_urecord(&r); f3d00c91a79 src/modules/usrloc/udomain.c (Victor Seva 2023-02-09 11:45:29 +0100 1099) unlock_udomain(_d, &aor); b642263a083 src/modules/usrloc/udomain.c (Daniel-Constantin Mierla 2019-07-29 10:54:41 +0200 1100) if(ruid.len > 0 && ul_xavp_contact_name.s != NULL) { b642263a083 src/modules/usrloc/udomain.c (Daniel-Constantin Mierla 2019-07-29 10:54:41 +0200 1101) /* delete attributes by ruid */ b642263a083 src/modules/usrloc/udomain.c (Daniel-Constantin Mierla 2019-07-29 10:54:41 +0200 1102) uldb_delete_attrs_ruid(_d->name, &ruid); b642263a083 src/modules/usrloc/udomain.c (Daniel-Constantin Mierla 2019-07-29 10:54:41 +0200 1103) ``` maybe we need this line to engage mem_delete_ucontact(). The delete_ucontact also in DB-only mode calls mem_delete_ucontact -> free_ucontact but in timer function for DB-only this is not getting done. I will try to do this locally and stress test it.
Which version of Kamailio are you testing? I didn't looked into the details, but in the last year there were some changes and fixes done for usrloc in the area of the different storage modes.
@henningw it is a fork from pre-5.4 Henning, actually I am going to test the latest release because this is a separate kamailio registrar that is not dependable on the IMS modules for which we maintain a lot of customizations. I have not been yet able to test the mem_delete_ucontact() line addition and have mixed feelings in that regard. It seems as long as the DB is functioning properly, the memory usage is constant (measured by the prometheus scraping and the `mod.stats all shm` but as soon as the DB is congested under load it is piling up memory for the xavp/ucontact. Strange
Ok, this is obviously a quite old code base. Please report back the test result from 5.7.x or 5.8.x.
Closed #3928 as completed.
Will do, thanks again.