Hello,

 

 

A follow-up on this (a bit old) topic, as I’m still having issues with shared memory…

 

 

To recap what I’m doing:

Each day, using a custom Kamailio module, I’m loading a lot of data in Kamailio shared memory. After the new data is loaded, the old data (which is no longer used) is freed.

When the old data is being freed, I observe a very significant performance impact on all SIP requests handled at that moment (response times above 500 ms, which triggers a fallback to another Kamailio server).

 

At first, this is relatively minor because the “free” time is very short (2 to 5 seconds).

But after about 30 days, it started increasing dramatically, reaching up to 5 minutes over the last 7 days (after which we restarted all the Kamailio servers).

 

I can’t understand why it works perfectly fine for some time, and then not at all :/

 

 

I noticed two things that I would like your opinion on:

 

 

1)

First, I’ve looked at the memory allocations performed by Kamailio when handling a SIP request.

I used “memlog” to trace all calls, then checked in the source code whether the memory involved was “pkg” or “shm”.

 

I noticed that all of the allocations use “pkg” (e.g. “get_hdr_field”, “parse_contacts”, “set_var_value”, etc – 38 different function calls in total)…

Except for “xavp_new_value”, which uses “shm”.

Why is that?

 

Since there is a locking mechanism involved when allocating/freeing memory, and the lock for shared memory is global, I think this could explain the performance issues I’m having when freeing memory in my custom module.
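For reference, this is how I picture the difference (a rough conceptual sketch only; the names below are placeholders, not the actual Kamailio symbols):

/* stand-in declarations, just so the sketch reads as C */
typedef struct gen_lock gen_lock_t;
extern gen_lock_t *global_shm_lock;
extern void lock_get(gen_lock_t *lock);
extern void lock_release(gen_lock_t *lock);
extern void *heap_malloc(void *block, unsigned long size);
extern void *pkg_block, *shm_block;

/* "pkg" is private to each process: no inter-process lock is needed */
void *pkg_alloc_sketch(unsigned long size)
{
    return heap_malloc(pkg_block, size);
}

/* "shm" is shared by all processes: every alloc/free goes through one
 * global lock, so a long free in one process can stall the SIP workers */
void *shm_alloc_sketch(unsigned long size)
{
    void *p;
    lock_get(global_shm_lock);
    p = heap_malloc(shm_block, size);
    lock_release(global_shm_lock);
    return p;
}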

 

I’m wondering if I could use xavp/xavu in “pkg” memory instead of “shm”?

 

 

2)

Second, I checked the Kamailio shared memory fragmentation before and after the restart (using RPC command “core.shmmem”).

Before the Kamailio restart, the fragment count seems very large: 1416490 (1.4 million fragments!).

After the restart (and initial loading of the data), it’s only about 8000.

(I have Kamailio configured with “mem_join=1”, which, if I understand correctly, tells Kamailio to try to join fragments upon free when possible; in any case, defragmentation is also handled at allocation time if necessary.)
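To make sure I understand what “joining” means here, this is how I picture it (a conceptual sketch only, not Kamailio’s actual data structures):

#include <stddef.h>

struct frag {
    size_t size;
    int is_free;
    struct frag *next;   /* next fragment in address order */
};

/* when a fragment is freed, absorb the next fragment if it is free too,
 * so two small free blocks become one larger free block */
void free_and_join(struct frag *f)
{
    f->is_free = 1;
    if (f->next != NULL && f->next->is_free) {
        f->size += f->next->size;
        f->next = f->next->next;
    }
}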

 

Maybe there is an issue with memory defragmentation?

 

I did not monitor the evolution over time, so I cannot say whether it started increasing along with the performance issues… I’ll monitor this closely from now on.

 

 

 

Thanks in advance for any help!

 

 

 

Regards,

Nicolas.

 

From: Chaigneau, Nicolas <nicolas.chaigneau@capgemini.com>
Sent: Tuesday, January 24, 2023 10:36
To: Henning Westerholt; sr-dev@lists.kamailio.org
Subject: [sr-dev] Re: issues when freeing shared memory in custom module (Kamailio 5.5.2) - available shared memory (follow up), and crash with tlsf

 

Hello Henning,

 

 

It seems you’re right. :)

I did tests over the last four days with allocation/release cycles performed every five minutes.

 

During the first two days, the available memory fluctuates, and drops significantly three times.

But after that, it goes back up twice.

 

So it looks like there is no issue in the long term.

I just have to ensure the shared memory size is configured with this in mind.

 

Thanks again for your help!

 

 

 

 

Now about tlsf, maybe there is an issue…

I can reproduce the crash when starting Kamailio with “-x tlsf”, and it does not look like the issue is with my code…

Here is the debug trace before the segfault:

 

0(82419) INFO: <core> [main.c:2139]: main(): private (per process) memory: 8388608 bytes

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 1024) called from core: core/str_hash.h: str_hash_alloc(59)

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 1024) returns address 0x7f8983b32110

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 256) called from core: core/str_hash.h: str_hash_alloc(59)

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 256) returns address 0x7f8983b32518

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 512) called from core: core/counters.c: init_counters(117)

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 512) returns address 0x7f8983b32620

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 128) called from core: core/counters.c: init_counters(125)

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 128) returns address 0x7f8983b32828

0(82419) DEBUG: <core> [core/cfg.lex:1964]: pp_define(): defining id: KAMAILIO_5

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 11) called from core: core/cfg.lex: pp_define(1995)

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 11) returns address 0x7f8983b328b0

0(82419) DEBUG: <core> [core/cfg.lex:1964]: pp_define(): defining id: KAMAILIO_5_5

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 13) called from core: core/cfg.lex: pp_define(1995)

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 13) returns address 0x7f8983b328f0

0(82419) DEBUG: <core> [core/cfg.lex:1964]: pp_define(): defining id: KAMAILIO_5_5_2

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 15) called from core: core/cfg.lex: pp_define(1995)

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 15) returns address 0x7f8983b32930

0(82419) INFO: <core> [main.c:2198]: main(): shared memory: 268435456 bytes

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 16) called from core: core/route.c: init_rlist(146)

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 16) returns address 0x7f8983b32970

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 128) called from core: core/str_hash.h: str_hash_alloc(59)

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 128) returns address 0x7f8983b329b0

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1232]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 48) called from core: core/route.c: route_add(124)

0(82419) DEBUG: <core> [core/mem/tlsf_malloc.c:1234]: tlsf_malloc(): tlsf_malloc(0x7f8983b30010, 48) returns address 0x7f8983b32a38

0(82419) DEBUG: <core> [core/route.c:129]: route_add(): mapping routing block (0x9bf900)[0] to 0

Segmentation fault

 

 

 

Regards,

Nicolas.

 

From: Henning Westerholt <hw@gilawa.com>
Sent: Monday, January 23, 2023 11:01
To: Chaigneau, Nicolas; sr-dev@lists.kamailio.org
Subject: RE: issues when freeing shared memory in custom module (Kamailio 5.5.2) - available shared memory not recovered

 

Hello,

 

I don’t think there is a generic issue with the order of operations of first allocating the new memory and then freeing the old.

This pattern is also used, e.g., by modules like carrierroute for a routing data reload.

 

The issues in the reported statistics might be related to memory fragmentation/defragmentation. If you free first and then allocate, the memory manager will probably hand you the same memory blocks again. In the opposite order, it needs to allocate new memory blocks.

 

Maybe you can execute the load/reload function several times just as an experiment, as it should even out after a few tries.

 

Cheers,

 

Henning

 

--

Henning Westerholt – https://skalatan.de/blog/

Kamailio services – https://gilawa.com

 

From: Chaigneau, Nicolas <nicolas.chaigneau@capgemini.com>
Sent: Monday, January 23, 2023 10:37 AM
To: Henning Westerholt <hw@gilawa.com>; sr-dev@lists.kamailio.org; Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>
Subject: RE: issues when freeing shared memory in custom module (Kamailio 5.5.2) - available shared memory not recovered

 

Hello,

 

 

I’ve pushed my investigations further, and now I understand a bit better what’s going on.

The following applies to the two memory managers “qm” and “fm”.

 

In its simplest form, what I’m doing is the following:

 

alloc_new_data();

free_old_data();

 

After this, I’m looking at:

1) Kamailio’s available shared memory (as returned by the function “shm_available”; see the sketch below for how I check this).

2) My module’s shared memory usage (as shown by the command “kamcmd mod.stats my_module shm”).
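In rough pseudo-C, the check looks like this (a simplified sketch: alloc_new_data/free_old_data stand for my module’s real functions, and I’m assuming shm_available() returns the free shared memory in bytes):

void reload_and_check(void)
{
    unsigned long before;

    before = shm_available();
    alloc_new_data();
    LM_INFO("free shm after alloc: %lu -> %lu\n", before, shm_available());

    before = shm_available();
    free_old_data();
    LM_INFO("free shm after free: %lu -> %lu\n", before, shm_available());
}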

 

What I’m observing is that Kamailio available shared memory is steadily decreasing (but not systematically after each execution), and that my module shared memory usage is, conversely, steadily increasing.

(My fear is, of course, that at some point the allocation will fail because the available shared memory would be exhausted.)

 

I notice from the reports of “mod.stats” that Kamailio seems to keep track of the exact function and line number where an allocation occurred.

Maybe, as long as such a reference exists, the shared memory is not properly recovered? (Even though it is properly freed using shm_free.)

 

To test this theory, I temporarily changed the code to:

 

free_old_data();

alloc_new_data();

 

With this, all my issues disappear. The available shared memory is stable, as is the reported module shared memory usage.

 

This is really weird. Is this how Kamailio shared memory is supposed to work?

How could I solve this issue?

 

 

Regards,

Nicolas.

 

From: Chaigneau, Nicolas
Sent: Friday, January 20, 2023 15:28
To: Henning Westerholt; sr-dev@lists.kamailio.org
Cc: Kamailio (SER) - Users Mailing List
Subject: RE: issues when freeing shared memory in custom module (Kamailio 5.5.2) - available shared memory

 

Hello Henning,

 

 

 

Thanks for your help. :)

I’m coming with an update, and yet more questions.

 

 

 

First, I tried using “fm” instead of “qm” on real data.

The results are impressive:

- Allocation time is reduced from 85 s to 49 s

- Free time is reduced from 77 s to about 2 s

- And I do not notice high SIP response times when freeing

 

The time difference when freeing is huge. I’m surprised that “fm” is so much faster than “qm”; is this just because we don’t have the same debugging information?

 

 

 

Now, another issue I’m looking into (possible memory leak?).

This happens with both memory managers, “qm” and “fm”.

 

I’m using the “shm_available” function from Kamailio to keep track of the remaining available memory in the shared memory pool.

I’ve noticed something weird. At first I thought that I had a memory leak in my code, but I’m not so sure anymore…

 

Each time I reload the (same) data (through an RPC command), the value of shm_available decreases.

This happens if I load new data before freeing the old data.

However, if I first free the existing data and then load the new data, the available memory shown by shm_available seems to be properly “reset”.

 

For example:

 

Remaining memory available: 758960224  # <- allocate new

Remaining memory available: 756141328  # <- allocate new, then free old

Remaining memory available: 752037032  # <- allocate new, then free old

Remaining memory available: 749523176  # <- allocate new, then free old

 

Remaining memory available: 1073094936  # <- free

Remaining memory available: 758958544  # <- allocate new

Remaining memory available: 756143304  # <- allocate new, then free old

Remaining memory available: 752067480  # <- allocate new, then free old

Remaining memory available: 749532680  # <- allocate new, then free old

 

And so on…

This is for the same exact data used each time.

 

 

I’ve also tried to use the following command to track memory:

 

kamcmd mod.stats my_module shm

 

The results seem consistent with what shm_available reports: the memory used seems to increase for each allocation being tracked, even though the memory is properly freed (or should be: shm_free is called as needed).

Apparently the values are only reset when the free is performed before the new allocation.

 

 

It is as if the memory being tracked is not properly “cleaned up” until everything has been freed…

 

I’m not sure what this entails: is the memory really not properly released, or is it just a reporting issue?

 

 

 

 

One more thing: I think there might be a bug with the command “kamcmd mod.stats my_module shm”, as it can display negative values.

Maybe there’s an integer overflow?

 

 

 

 

Regards,

Nicolas.

 

From: Henning Westerholt <hw@gilawa.com>
Sent: Thursday, January 19, 2023 15:43
To: Chaigneau, Nicolas; sr-dev@lists.kamailio.org
Cc: Kamailio (SER) - Users Mailing List
Subject: RE: Performances issue when freeing shared memory in custom module (Kamailio 5.5.2)

 

Hello Nicolas,

 

Some people are using the TLSF memory manager, so it should certainly not crash. Maybe you could create an issue about it if you have a backtrace and it’s not related to your (custom) module.

 

The QM memory manager provides more debugging information and can also be used to find memory leaks and such. Therefore, it’s enabled by default, as most people are not using huge data sets internally.

 

The FM memory manager is more lightweight, and in your scenario apparently significantly faster. Let us know if it’s also working fine in the production setup.

 

Cheers,

 

Henning

 

--

Henning Westerholt – https://skalatan.de/blog/

Kamailio services – https://gilawa.com

 

From: Chaigneau, Nicolas <nicolas.chaigneau@capgemini.com>
Sent: Thursday, January 19, 2023 12:47 PM
To: Henning Westerholt <hw@gilawa.com>; Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>
Cc: sr-dev@lists.kamailio.org
Subject: RE: Performances issue when freeing shared memory in custom module (Kamailio 5.5.2)

 

[mail resent because I was not subscribed to sr-dev – sorry for the duplicate]

 

 

Hello Henning,

 

 

Thank you for your quick response!

 

 

I do not have any error messages.

 

Shared memory allocation and freeing is done exclusively by the RPC process.

The workers only read that memory (and only the memory that is *not* being allocated or freed by the RPC process).

 

 

I’ve looked at the different shared memory managers as you suggested.

First, “tlsf” does not work: Kamailio crashes on startup with “-x tlsf”.

 

A comparison of “qm” (default) and “fm”:

 

With “fm”, the loading time is reduced by 25%.

The freeing is also much faster (maybe 4 times faster).

And I do not notice the performance issues (which I can reproduce when using “qm”).

But maybe this is because I do not have enough data in my test environment. I’ll have to test this with the real data.

 

But these first results with “fm” look promising! :)

 

 

 

Could you maybe explain to me the main differences between the 3 shared memory managers? And why is “qm” the default?

Also, do you have an idea why “tlsf” makes Kamailio crash? (Does anyone use “tlsf”?)

 

 

Thanks again.

 

 

Regards,

Nicolas.

 

From: Henning Westerholt <hw@gilawa.com>
Sent: Thursday, January 19, 2023 08:28
To: Kamailio (SER) - Users Mailing List
Cc: Chaigneau, Nicolas; sr-dev@lists.kamailio.org
Subject: RE: Performances issue when freeing shared memory in custom module (Kamailio 5.5.2)

 

 

Hello,

 

(Adding sr-dev to CC)

 

This indeed looks a bit strange. Do you get any error messages in the log? In which process are you freeing the memory, one of the worker processes or the RPC process?

 

You could also try to use another memory manager to see if you get better performance. There is a command line parameter to choose one during startup.

 

Cheers,

 

Henning

 

--

Henning Westerholt – https://skalatan.de/blog/

Kamailio services – https://gilawa.com

 

From: Chaigneau, Nicolas <nicolas.chaigneau@capgemini.com>
Sent: Wednesday, January 18, 2023 6:49 PM
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>
Subject: [SR-Users] Performances issue when freeing shared memory in custom module (Kamailio 5.5.2)

 

Hello,

 

 

I'm encountering performance issues with Kamailio (5.5.2).

 

I’m using a custom Kamailio module that loads routing data in memory, using Kamailio shared memory.

This routing data is very large. It can be fully reloaded through a Kamailio RPC command (which is done once each day).

 

When reloading, two sets of data are maintained, one "loading" and another "current" (the latter being used to handle SIP requests).

When loading of the new data is finished, it is swapped to "current". Then, memory of the old (now unused) data is freed.
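In simplified form, the reload looks roughly like this (illustrative names, not the module's real code):

typedef struct routing_data routing_data_t;   /* placeholder type          */
extern routing_data_t *build_new_data(void);  /* many shm_mallocxz() calls */
extern void free_data(routing_data_t *data);  /* many shm_free() calls     */

static routing_data_t *current_data;          /* read by the SIP workers   */

void reload_routing_data(void)
{
    routing_data_t *loading = build_new_data();
    routing_data_t *old = current_data;

    current_data = loading;   /* swap: workers now use the new data */
    free_data(old);           /* freeing the old data: the slow phase */
}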

 

I've noticed that when Kamailio is freeing the old data, there is a very significant performance impact on SIP requests.

This is surprising to me, because the SIP requests do not use this old data.

This is not a CPU issue; idle CPU is at about 99% at that moment.

 

I'm using the following functions:

- shm_mallocxz

- shm_free

 

From what I understand, shm_free is actually "qm_shm_free" defined in "src/core/mem/q_malloc.c" (the default shared memory manager being "qm").

I've noticed that there is also a variant shm_free_unsafe ("qm_free"), which does not perform locking.
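If I read the code correctly, the relationship is roughly the following (a paraphrased sketch; the lock variable name is illustrative, not the exact core symbol):

/* shm_free -> qm_shm_free: takes the global shared-memory lock and then
 * does the same work as the "unsafe" variant */
void qm_shm_free_sketch(void *qm_block, void *p)
{
    lock_get(global_shm_lock);     /* one lock shared by every process */
    qm_free(qm_block, p);          /* what shm_free_unsafe maps to     */
    lock_release(global_shm_lock);
}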

 

I'm wondering if the lock could be the cause of my performance issues?

(But I'm not sure how this could be possible, because although the SIP requests need to access the allocated shared memory, they do not directly use the functions of the shared memory manager.)

 

If the performance issues are caused by the lock, could I use the unsafe version "safely"? (Considering that it is guaranteed that the old data cannot be used by anyone else.)

 

 

 

 

Thanks for your help.

 

 

Regards,

Nicolas.

 
