On Friday, 6 November 2009, Robin Vleij wrote:
Since 1.3.0 (now running 1.4.4) I'm seeing a very slow growth in SHM memory usage on our low-traffic setup (less than 5 cps per machine). I'm looking for some pointers on how to investigate the cause further. :)
I compiled Kamailio 1.4.4-notls with #define SHM_MEM_SIZE 4*32 in config.h in production. For my testing setup I'm running on the standard 32 there.
After about 3 weeks of uptime I start top, sort on memory size and find that the kamailio processes (I'm running with 16 children) all have about 40 MB in the SHR column. My understanding is that this figure should also go down at times, but it only goes up, slowly. More CPS (for example a benchmark using sipp) makes it go up faster, but the figure never seems to go down. I think this is wrong, but I could be wrong myself. :)
On a separate machine with no traffic I compiled with memory debugging enabled, following the "memory troubleshooting" page on the wiki. LOTS of info in the logs. I also ran it under valgrind and didn't find anything interesting (but I'm not really a dev myself).
My plan now is to take out our acc module (compiled with radius support) and see whether that module is causing this. My test on this traffic-less machine is as follows: start, run 20 cps for a while (we do no registers, just routing and auth) and note the SHR data from top. Then, according to my understanding, this figure should drop after a period of 20 minutes with no traffic. Is this a correct assumption? [..] As far as I can tell, the SHR figures never go down. When running with very little SHM in config.h, the process runs out of shm memory and complains, as expected.
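The before/after comparison described above could be scripted rather than read off top by hand. The following is only a minimal sketch, assuming a Linux host where /proc/<pid>/statm is available; the helper names parse_statm and read_shr_bytes are made up for illustration and are not part of Kamailio. The third field of statm (shared pages) is what top reports in the SHR column.

```python
import os


def parse_statm(text, page_size=4096):
    """Return (virt, res, shr) in bytes from the contents of /proc/<pid>/statm.

    The first three fields of statm are total program size, resident set
    size and resident shared pages, all counted in pages.
    """
    fields = text.split()
    size, resident, shared = (int(f) for f in fields[:3])
    return size * page_size, resident * page_size, shared * page_size


def read_shr_bytes(pid):
    """Read the current shared resident size (top's SHR) of one process.

    Hypothetical helper: the pid has to be looked up separately,
    for example from the Kamailio pid file or from ps.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/statm") as fh:
        return parse_statm(fh.read(), page_size)[2]
```

Calling read_shr_bytes once after startup, once after the 20 cps run and once after the idle period gives three numbers per child process that can be compared directly.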
Are my assumptions about all of this correct?
Hello Robin,
do you experience any problems in your setup when you use a reasonable SHM mem size? In my experience the size of the SHM memory (as displayed by top) depends on the load of the machine. But there is a certain level of shared memory that stays in use regardless of the load. Even if the machine has been completely idle for a long time, this memory will not be reclaimed. On one test system, for example, there is a process that has 11 MB SHM at the moment, even though it's completely idle.
For the VIRT column (again in top) it's another story: it will just show something like the SHM + PKG memory size, regardless of the actual load.
If you have a real memory leak in shared memory, then after a certain time the server will report memory allocation errors. Otherwise I don't think it's something to worry about.
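As a rough illustration of the distinction above, SHR readings taken while the server is idle can be checked for a plateau versus continued growth. This is a sketch only: the function name classify_idle_samples and the 1 MB tolerance are arbitrary assumptions, and the result is a hint for further investigation, not proof of a leak. The definitive symptom remains allocation errors in the log.

```python
def classify_idle_samples(samples_mb, tolerance_mb=1.0):
    """Classify SHR readings (in MB) taken at intervals on an idle server.

    Returns "plateau" when the last reading is within tolerance_mb of the
    first one, which matches the constant baseline described above, and
    "growing" when usage kept increasing without any traffic.
    The tolerance is an assumed value, not a Kamailio setting.
    """
    if len(samples_mb) < 2:
        raise ValueError("need at least two samples")
    growth = samples_mb[-1] - samples_mb[0]
    return "growing" if growth > tolerance_mb else "plateau"
```

A steady 11 MB on an idle test system would come out as "plateau"; readings that keep climbing with no calls in progress would come out as "growing" and are worth a closer look with memory debugging enabled.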
Regards,
Henning