Memory Leak on DB Errors?

Klaus Darilion

4 Oct 2011

10:08 a.m.

Hi!

I recently had a problem with Kamailio 3.1.4 (provided Debian packages):

I had some DB problems (missing tables). Thus, the timer module failed to insert the statistics (for siremis):

ERROR: db_mysql [km_dbase.c:120]: driver error on query: Table 'kamailio.statistics_tmx' doesn't exist
ERROR: <core> [db_query.c:130]: error while submitting query
ERROR: sqlops [sql_api.c:217]: cannot do the query

This happened for some time (weeks?), other DB queries were unaffected.

Then, suddenly Kamailio ran out of memory:

ERROR: <core> [sip_msg_clone.c:506]: ERROR: sip_msg_cloner: cannot allocate memory
ERROR: tm [t_lookup.c:1338]: ERROR: new_t: out of mem:
ERROR: tm [t_lookup.c:1478]: ERROR: t_newtran: new_t failed
ERROR: sl [sl_funcs.c:282]: ERROR: sl_reply_error used: I'm terribly sorry, server error occurred (1/SL)

I do not think it is a load problem, as the server is more or less idle. Could it be a memory leak caused by wrong error handling?

Were there any fixes recently?

Thanks Klaus


Daniel-Constantin Mierla

4 Oct

10:24 a.m.

Hello,

sqlops uses pkg memory and tm uses shm, so the two should not be directly related, but maybe they are linked through the way the config file works.

Can you run it again with memlog set lower than debug and see where the allocated (not freed) chunks come from? The pattern should show up soon; there is no need to wait for the out-of-memory message.

If you haven't restarted, there is a way to attach with gdb and walk through the shm allocated chunks to spot the occurrences.

Cheers, Daniel


SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users

-- Daniel-Constantin Mierla -- http://www.asipto.com Kamailio Advanced Training, Dec 5-8, Berlin: http://asipto.com/u/kat http://linkedin.com/in/miconda -- http://twitter.com/miconda

Klaus Darilion

10:27 a.m.

Meanwhile the server was restarted and the DB problems were fixed. As it is a production server, I cannot reproduce the problem anymore.

Sorry Klaus


Daniel-Constantin Mierla

12:03 p.m.

Hello,

On 10/4/11 12:27 PM, Klaus Darilion wrote:

...

Meanwhile the server was restarted and the DB problems were fixed. As it is a production server, I cannot reproduce the problem anymore.

So, once it started, it did not recover and kept failing with that error? How much shm did you configure?

You can try to attach from time to time to one process (it can even be the main one, to avoid blocking a SIP worker) and walk through the shm allocated chunks, in order to see whether there are unexpected repetitions of allocations from the same place in the sources.

I posted the gdb script for walking through pkg at some point; the difference is that you start from the head of the shm list (i.e., starting with shm_block->first_frag instead of mem_block->first_frag):

http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:memory#walking_thr...

You should walk as far as possible toward the end of the allocated list.

Another option is to add shm_status() (see the cfgutils module) to your config and execute it on a special request that you can send with sipsak, udp_flood or sipp. There are other options if you load the cfg_rpc module and send some RPC commands with sercmd.

Cheers, Daniel


Klaus Darilion

5 Oct

9:18 a.m.

On 04.10.2011 14:03, Daniel-Constantin Mierla wrote:

...

http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:memory#walking_thr...

Hi Daniel!

After reading this wiki page I came to the conclusion that for further debugging I have to recompile Kamailio (using the DBG_QM_MALLOC memory manager instead of F_MALLOC). With the default memory manager it is not possible to debug the problem. Is that correct?

regards Klaus

Daniel-Constantin Mierla

6 Oct

11:07 a.m.

Hello,

On 10/5/11 11:18 AM, Klaus Darilion wrote:

...

After reading this wiki page I came to the conclusion that for further debugging I have to recompile Kamailio (using the DBG_QM_MALLOC memory manager instead of F_MALLOC). With the default memory manager it is not possible to debug the problem. Is that correct?

In 3.1, malloc debugging was left on (with the goal of catching buffer overflows quickly, after several years of development without this flag being used in production), so unless you switched it off, you should get the reports. You can check in the output of kamailio -V.

Cheers, Daniel


Klaus Darilion

3:31 p.m.

Indeed, DBG_QM_MALLOC is defined. So I set memlog=1 and dumped the memory info with:

sercmd cfg.set_now_int core mem_dump_pkg 13286
sercmd cfg.set_now_int core mem_dump_shm 13286

The dumps were taken after ~1h of uptime. I cannot offload the traffic and wait until the transactions are freed, so the logs are quite large (~15 MB).

http://pernau.at/kd/memlog.zip

I have no idea what I should look for. Any hints on how to analyze the mem_dump?

Thanks Klaus


Daniel-Constantin Mierla

4:03 p.m.

Hello,

It seems the leak is in snmpstats. I see a lot of allocations like:

ALERT: qm_status: 37599. N address=0xf30cdf74 frag=0xf30cdf5c size=20 used=1
ALERT: qm_status: alloc'd from snmpstats: interprocess_buffer.c: handleContactCallbacks(143)
ALERT: qm_status: start check=f0f0f0f0, end check= c0c0c0c0, abcdefed
ALERT: qm_status: 37600. N address=0xf30cdfb8 frag=0xf30cdfa0 size=16 used=1
ALERT: qm_status: alloc'd from snmpstats: utilities.c: convertStrToCharString(62)
ALERT: qm_status: start check=f0f0f0f0, end check= c0c0c0c0, abcdefed

There are some from usrloc, but very likely they are fine, because usrloc records stay in shm for a long time, unless snmpstats asks usrloc for clones of its structures and forgets to free them (I see one allocation is from handleContactCallbacks).

I have no time to look at the sources, but this is a lead to follow if you want to investigate further.

In general, for a memory leak you have to look for allocated chunks that come from the same place in the code and occur in large numbers. Then decide, by comparing against the number of allocations, whether they are something that should stay around for a long time (like usrloc records) or should have been freed sooner.
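The advice above (group the allocated chunks by the place in the code they come from, then look at the largest groups) can be sketched in Python. This is only an illustration, not a tool that ships with Kamailio: the regular expression is derived from the two "alloc'd from" lines quoted in this message, and the function names and the file name in the usage note are hypothetical.

```python
import re
from collections import Counter

# Matches the "alloc'd from <module>: <file>: <function>(<line>)" part of a
# qm_status line. The pattern is an assumption based on the excerpts quoted
# in this thread; other Kamailio versions may log a different format.
ALLOC_RE = re.compile(r"alloc'd from (\S+): (\S+): (\w+)\((\d+)\)")


def count_allocation_sites(lines):
    """Count how many allocated chunks come from each source location."""
    sites = Counter()
    for line in lines:
        # findall copes with several log entries flattened onto one line
        for module, filename, func, lineno in ALLOC_RE.findall(line):
            sites[(module, filename, func, int(lineno))] += 1
    return sites


def print_top_sites(path, limit=20):
    """Print the most frequent allocation sites found in a dump file."""
    with open(path, errors="replace") as dump:
        sites = count_allocation_sites(dump)
    for (module, filename, func, lineno), count in sites.most_common(limit):
        print(f"{count:8d}  {module}: {filename}: {func}({lineno})")
```

Calling `print_top_sites("memdump.log")` on a saved dump (the file name is a placeholder) lists the busiest allocation sites first. Whether a large count is a leak or a legitimate long-lived structure still has to be judged by hand, as described above.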

The pkg log looks very clean, with allocations only from startup time (maybe it is the main process).

Cheers, Daniel


Daniel-Constantin Mierla

17 Oct

7:33 a.m.

Hi Klaus,

Over the weekend I looked a bit at the snmpstats module. These allocated chunks are for exporting location records. Are you pulling them over SNMP? At first sight, the memory should be freed when the records are consumed.

The fact is that the records are not pulled from the usrloc module at the time of the request over SNMP, but are cached in snmpstats when the registration happens. Practically, it is a partial clone of usrloc commands, which is not the best solution IMO, but I am not the developer. For the moment, to fix this issue, I added a parameter to control whether the location records should be cached by the snmpstats module or not (if not, they cannot be exported). If you actually pull the location records over SNMP, let me know.

I could not test it, but maybe you can give it a try (perhaps you have a testbed for 3.2 with snmpstats) and see whether the memory stays steady with export_registrar set to 0 (which is the default):

http://kamailio.org/docs/modules/devel/modules_k/snmpstats.html#id2539456

Cheers, Daniel


Klaus Darilion

31 Oct

6:56 a.m.

Hi Daniel!

The "out of memory" happened again. This time they were able to dump the memory statistics before restarting the server.

There are almost no allocations from other modules, but a lot of allocations from usrloc and snmpstats:

# grep 'd from usrloc' syslog_core_dump|wc -l
138083
# grep 'd from snmpstats' syslog_core_dump|wc -l
2837533

Thus, snmpstats seems guilty. What about usrloc? Around 2000 clients are registered to this Kamailio. I think 138,000 allocations for just 2000 clients is too many. Are those usrloc allocations related to the snmpstats problem you mentioned?
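As a rough sanity check on the figures above, the counts can be put in relation to the number of registered clients. The short Python sketch below only does the arithmetic; since the client count is given as "around 2000", the ratios are approximate, and the helper name is made up for illustration.

```python
# Figures reported in this message.
USRLOC_CHUNKS = 138083
SNMPSTATS_CHUNKS = 2837533
REGISTERED_CLIENTS = 2000  # "around 2000", so treat the results as estimates


def chunks_per_client(chunks, clients=REGISTERED_CLIENTS):
    """Average number of allocated chunks per registered client."""
    return chunks / clients


print(f"usrloc chunks per client:    {chunks_per_client(USRLOC_CHUNKS):.1f}")     # 69.0
print(f"snmpstats chunks per client: {chunks_per_client(SNMPSTATS_CHUNKS):.1f}")  # 1418.8
print(f"snmpstats / usrloc ratio:    {SNMPSTATS_CHUNKS / USRLOC_CHUNKS:.1f}")     # 20.5
```

So the dump holds roughly 69 usrloc chunks and about 1400 snmpstats chunks per registered client. The numbers alone do not say how many chunks a single contact legitimately needs; they only show how far the two modules are apart.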

AFAIS, your patch was done before the 3.2 branch, so updating to 3.2 should fix the issue (as the default is turned off), correct?

Thanks Klaus


Daniel-Constantin Mierla

4:28 p.m.

Hello,

On 10/31/11 7:56 AM, Klaus Darilion wrote:

...

# grep 'd from usrloc' syslog_core_dump|wc -l
138083
# grep 'd from snmpstats' syslog_core_dump|wc -l
2837533

Is this command catching freed chunks as well?

Can you send the names of the files and the lines that allocate memory chunks and repeat a lot?

...

Thus, snmpstats seems guilty. What about usrloc? Around 2000 clients are registered to this Kamailio. I think 138,000 allocations for just 2000 clients is too many. Are those usrloc allocations related to the snmpstats problem you mentioned?

AFAIS, your patch was done before the 3.2 branch, so updating to 3.2 should fix the issue (as the default is turned off), correct?

Yes, it is in 3.2.0, and I hope I caught it all; at least, that is what looked like the problem.

Cheers, Daniel


Klaus Darilion

4:43 p.m.

On 31.10.2011 17:28, Daniel-Constantin Mierla wrote:

...

On 10/31/11 7:56 AM, Klaus Darilion wrote:

...
Hi Daniel!

The "out of memory" happened again. This time they were able to dump the memory statistics before restarting the server.

There are almost no allocations from other modules, but a lot of allocations from usrloc and snmpstats:

# grep 'd from usrloc' syslog_core_dump|wc -l
138083
# grep 'd from snmpstats' syslog_core_dump|wc -l
2837533

Is this command catching freed chunks as well?

Oops, I have to check this...

klaus

Klaus Darilion

5:18 p.m.

On 31.10.2011 17:28, Daniel-Constantin Mierla wrote:

...

Hello,

On 10/31/11 7:56 AM, Klaus Darilion wrote:

...
Hi Daniel!

The "out of memory" happened again. This time they were able to dump the memory statistics before restarting the server.

There are almost no allocations from other modules, but a lot of allocations from usrloc and snmpstats:

# grep 'd from usrloc' syslog_core_dump|wc -l
138083
# grep 'd from snmpstats' syslog_core_dump|wc -l
2837533

Is this command catching freed chunks as well?

The log of sercmd cfg.set_now_int core mem_dump_shm <pid> shows only "alloc'd" messages, no "free'd" messages. So I guess the report dumps only the currently allocated chunks, not those that were previously allocated and freed.

If my assumption is correct, the above values are correct too. And considering the re-REGISTER intervals, the above values seem to be related to registrations.

regards Klaus
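To check directly whether freed chunks show up in the dump, the lines can be counted per state and per module instead of using a grep pattern that matches both states. Below is a minimal Python sketch. It assumes the chunk lines contain either "alloc'd from <module>" or "free'd from <module>"; the two markers are taken from the message above, and the real layout may differ. The helper name count_by_state_and_module is hypothetical.

```python
import re
from collections import Counter

# Assumption: the state marker is directly followed by "from <module>",
# and the module name ends at the first colon or whitespace.
CHUNK_RE = re.compile(r"(alloc'd|free'd) from\s+([^\s:]+)")


def count_by_state_and_module(lines):
    """Count chunk lines keyed by (state, module)."""
    counts = Counter()
    for line in lines:
        match = CHUNK_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

If the counter holds no ("free'd", ...) keys for a full dump, that would be consistent with the report listing only chunks that are still allocated.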

Klaus Darilion

6 Oct 6 Oct

4:04 p.m.

Daniel, I tried

sercmd cfg.set_now_int core mem_dump_pkg <ps-id-of-MI-FIFO>

but it did not dump anything. Is this by design?

thanks klaus

On 06.10.2011 13:07, Daniel-Constantin Mierla wrote:

...

Hello,

On 10/5/11 11:18 AM, Klaus Darilion wrote:

...
On 04.10.2011 14:03, Daniel-Constantin Mierla wrote:

...
Hello,

On 10/4/11 12:27 PM, Klaus Darilion wrote:

...
Meanwhile the server was restarted and the DB problems were fixed. As it is a production server I cannot reproduce it anymore.

So, once it started it didn't recover, and always continued with that error? How much shm did you configure?

You can try to attach from time to time to one process (it can even be the main one, to avoid blocking a SIP worker) and walk through the shm allocated chunks, in order to see if there are some unexpected repetitions of allocation from the same place in the sources.

I posted the gdb script for walking through pkg at some point; the difference will be to start from the head of the shm list (i.e., starting with shm_block->first_frag instead of mem_block->first_frag):

http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:memory#walking_thr...

Hi Daniel!

After reading this wiki page I came to the conclusion that for further debugging I have to recompile Kamailio (using the DBG_QM_MALLOC memory manager instead of F_MALLOC). With the default memory manager it is not possible to debug the problem. Is that correct?

In 3.1, malloc debug was left on (with the goal of catching buffer overflows quickly, after several years of development without using this flag in production), so unless you switched it off, you should get the reports. You can check in the output of kamailio -V.

Cheers, Daniel

Daniel-Constantin Mierla

4:08 p.m.

Hello,

On 10/6/11 6:04 PM, Klaus Darilion wrote:

...

Daniel, I tried

sercmd cfg.set_now_int core mem_dump_pkg <ps-id-of-MI-FIFO>

but it did not dump anything. Is this by design?

Even if you send an MI command afterwards (like kamctl fifo ps)? Something might be missing for this process, since the dump command is sent via RPC, which comes from the SER development path.

Cheers, Daniel


Klaus Darilion

4 Oct 4 Oct

10:37 a.m.

Daniel, is it necessary to call "sql_result_free()" at the end of the timer route? Could the leak be related to this?

regards Klaus

On 04.10.2011 12:24, Daniel-Constantin Mierla wrote:

...

Hello,

sqlops uses pkg (private) memory and tm uses shm (shared) memory, so the two should not be directly related, though they might be through the way the config file works.

Can you run it again with memlog lower than debug and see where the allocated (not-freed) chunks were allocated? It should show up soon, without waiting for the out-of-memory message.

If you haven't restarted, there is a way to attach with gdb and walk through the shm allocated chunks to spot the occurrences.

Cheers, Daniel

On 10/4/11 12:08 PM, Klaus Darilion wrote:

...
Hi!

I recently had a problem with Kamailio 3.1.4 (provided Debian packages):

I had some DB problems (missing tables). Thus, the timer module failed to insert the statistics (for siremis):

ERROR: db_mysql [km_dbase.c:120]: driver error on query: Table 'kamailio.statistics_tmx' doesn't exist
ERROR: <core> [db_query.c:130]: error while submitting query
ERROR: sqlops [sql_api.c:217]: cannot do the query

This happened for some time (weeks?), other DB queries were unaffected.

Then, suddenly Kamailio ran out of memory:

ERROR: <core> [sip_msg_clone.c:506]: ERROR: sip_msg_cloner: cannot allocate memory
ERROR: tm [t_lookup.c:1338]: ERROR: new_t: out of mem:
ERROR: tm [t_lookup.c:1478]: ERROR: t_newtran: new_t failed
ERROR: sl [sl_funcs.c:282]: ERROR: sl_reply_error used: I'm terribly sorry, server error occurred (1/SL)

I do not think it is a load problem, as the server is more or less idle. Could it be a memory leak caused by incorrect error handling?

Were there any fixes recently?

Thanks Klaus


Daniel-Constantin Mierla

11:48 a.m.

Hello,

On 10/4/11 12:37 PM, Klaus Darilion wrote:

...

Daniel, is it necessary to call "sql_result_free()" at the end of the timer route? Could the leak be related to this?

It is not strictly necessary to call sql_result_free(); call it only if you want to free the existing result right away. Otherwise the result is freed automatically by the next sql_query() that stores into the same result ID.

Even so, SQL results are kept in private memory, not in the shared memory where tm tries to allocate.

Cheers, Daniel

