Got the following errors on one of the SER proxies I maintain this morning:
May 27 09:00:03 hillcrest /usr/sbin/ser[8260]: ERROR:receive_msg: memory allocation failure May 27 09:00:03 hillcrest /usr/sbin/ser[8264]: ERROR: build_req_buf_from_sip_req: out of memory May 27 09:00:03 hillcrest /usr/sbin/ser[8264]: ERROR: print_uac_request: no pkg_mem May 27 09:00:03 hillcrest /usr/sbin/ser[8264]: ERROR: build_res_buf_from_sip_req: out of memory ; needs 567 May 27 09:00:03 hillcrest /usr/sbin/ser[8264]: WARNING: receive_msg: error while trying script
The server is running 0.8.10 with the following patches: http://www.iptel.org/ser/issues/hash_fix.patch http://iptel.org/~janakj/invroute.patch http://www.iptel.org/ser/security/secalert-002-0_8_10.patch http://www.mobile-ip.de/~andrei/ser/0.8.10/ser-0.8.10-bison-1.75.patch
Of these, the only one the indicates memory errors is the invroute.patch. Has anyone seen this before?
If necessary, I have ngrep captures for the time period.
On May 27, 2003 at 10:17, Jamin W. Collins jcollins@asgardsrealm.net wrote:
Got the following errors on one of the SER proxies I maintain this morning:
May 27 09:00:03 hillcrest /usr/sbin/ser[8260]: ERROR:receive_msg: memory allocation failure May 27 09:00:03 hillcrest /usr/sbin/ser[8264]: ERROR: build_req_buf_from_sip_req: out of memory May 27 09:00:03 hillcrest /usr/sbin/ser[8264]: ERROR: print_uac_request: no pkg_mem May 27 09:00:03 hillcrest /usr/sbin/ser[8264]: ERROR: build_res_buf_from_sip_req: out of memory ; needs 567 May 27 09:00:03 hillcrest /usr/sbin/ser[8264]: WARNING: receive_msg: error while trying script
The server is running 0.8.10 with the following patches: http://www.iptel.org/ser/issues/hash_fix.patch http://iptel.org/~janakj/invroute.patch http://www.iptel.org/ser/security/secalert-002-0_8_10.patch http://www.mobile-ip.de/~andrei/ser/0.8.10/ser-0.8.10-bison-1.75.patch
Of these, the only one the indicates memory errors is the invroute.patch. Has anyone seen this before?
No.. Could you stop the proxy and send us the memory status log dumped at the end, or (if you don't want to stop it) kill -SIGUSR1 and again send the mem. log.
What happens is ser runs out of local memory. ser uses shared memory (you can specify the shared mem. size with -m) and "local" memory (per process). The local memory can be alloc'ed/dealloc'ed much faster (it does not require any locking). Usually ser runs out of shared mem., since "local" mem. is used in general only for temporary storage. ser will use maximum 1 Mb of "local" memory. So far we didn't need more. As a quick fix you can try to increase the size. You will have to change the value of PKG_MEM_POOL_SIZE in config.h and recompile.
Andrei
On Tue, May 27, 2003 at 06:58:00PM +0200, Andrei Pelinescu-Onciul wrote:
Could you stop the proxy and send us the memory status log dumped at the end, or (if you don't want to stop it) kill -SIGUSR1 and again send the mem. log.
I'm not sure which log you are referring to. The server itself has been rebooted by the customer to return functionality. I found the errors after the fact.
What happens is ser runs out of local memory. ser uses shared memory (you can specify the shared mem. size with -m) and "local" memory (per process). The local memory can be alloc'ed/dealloc'ed much faster (it does not require any locking). Usually ser runs out of shared mem., since "local" mem. is used in general only for temporary storage. ser will use maximum 1 Mb of "local" memory. So far we didn't need more. As a quick fix you can try to increase the size. You will have to change the value of PKG_MEM_POOL_SIZE in config.h and recompile.
Is the local memory allocated per listening process (as reported by 'ps') or per child process (as configured in ser.cfg) or based on something else? Just trying to get a grasp on how much the change will effect the memory usage in my environment.
Also how much shared memory does SER allocate by default? Would increasing this number possibly help?
The server had been running for upwards of a month prior to this problem.
On May 27, 2003 at 11:16, Jamin W. Collins jcollins@asgardsrealm.net wrote:
On Tue, May 27, 2003 at 06:58:00PM +0200, Andrei Pelinescu-Onciul wrote:
Could you stop the proxy and send us the memory status log dumped at the end, or (if you don't want to stop it) kill -SIGUSR1 and again send the mem. log.
I'm not sure which log you are referring to. The server itself has been rebooted by the customer to return functionality. I found the errors after the fact.
Look in the log for lines conatining: "qm_status" , "heap size=", "dumping all allocked. fragments:". They should appear before ser shutdown.
What happens is ser runs out of local memory. ser uses shared memory (you can specify the shared mem. size with -m) and "local" memory (per process). The local memory can be alloc'ed/dealloc'ed much faster (it does not require any locking). Usually ser runs out of shared mem., since "local" mem. is used in general only for temporary storage. ser will use maximum 1 Mb of "local" memory. So far we didn't need more. As a quick fix you can try to increase the size. You will have to change the value of PKG_MEM_POOL_SIZE in config.h and recompile.
Is the local memory allocated per listening process (as reported by 'ps') or per child process (as configured in ser.cfg) or based on something else? Just trying to get a grasp on how much the change will effect the memory usage in my environment.
per listening process ( total no. of processes = child_no * no_of_listening_addresses + 3).
Also how much shared memory does SER allocate by default? Would increasing this number possibly help?
32 Mb. In this case it will not help since this is not shared mem. The shared mem. runs out (usually :-)) when ser is in statefull mode (tm is used for forwarding) and a lot of calls are handled almost simultaneously (within sip expire time limits). This also depends on the CPU power (if the machine can handle only 100 cps due to cpu limits you will never exceed the default 32 Mb). On a dual athlon 2000+ which can handle ~4900 cps, 256 Mb are enough (ser -m 256).
Andrei
On Tue, May 27, 2003 at 07:34:10PM +0200, Andrei Pelinescu-Onciul wrote:
On May 27, 2003 at 11:16, Jamin W. Collins jcollins@asgardsrealm.net wrote:
I'm not sure which log you are referring to. The server itself has been rebooted by the customer to return functionality. I found the errors after the fact.
Look in the log for lines conatining: "qm_status" , "heap size=", "dumping all allocked. fragments:". They should appear before ser shutdown.
Nothing like that in the syslog or any other log that I can find. There are errors in the logs from SER right up to just prior to the reboot:
May 27 09:41:02 hillcrest /usr/sbin/ser[8258]: process_ins_list(): Error while deleting from database May 27 09:41:02 hillcrest ntpd[269]: ntpd exiting on signal 15 May 27 09:41:04 hillcrest kernel: Kernel logging (proc) stopped. May 27 09:41:04 hillcrest kernel: Kernel log daemon terminating. May 27 09:41:04 hillcrest exiting on signal 15 May 27 09:41:51 hillcrest syslogd 1.4.1#10: restart.
Is the local memory allocated per listening process (as reported by 'ps') or per child process (as configured in ser.cfg) or based on something else? Just trying to get a grasp on how much the change will effect the memory usage in my environment.
per listening process ( total no. of processes = child_no * no_of_listening_addresses + 3).
So, with a child_no of 4 and 1 IP listed for listening, I'm looking at 8 listening processes or ~8 Megs of local memory allocated to SER?
Also how much shared memory does SER allocate by default? Would increasing this number possibly help?
32 Mb. In this case it will not help since this is not shared mem. The shared mem. runs out (usually :-)) when ser is in statefull mode (tm is used for forwarding) and a lot of calls are handled almost simultaneously (within sip expire time limits). This also depends on the CPU power (if the machine can handle only 100 cps due to cpu limits you will never exceed the default 32 Mb). On a dual athlon 2000+ which can handle ~4900 cps, 256 Mb are enough (ser -m 256).
The system is a 1.1 Ghz Celeron with 128 Megs and SER's it's only function. So, if I'm following correctly, it might be a good idea to increase the share memory a bit, but the local memory would help more?
On May 27, 2003 at 13:41, Jamin W. Collins jcollins@asgardsrealm.net wrote:
On Tue, May 27, 2003 at 07:34:10PM +0200, Andrei Pelinescu-Onciul wrote:
On May 27, 2003 at 11:16, Jamin W. Collins jcollins@asgardsrealm.net wrote:
I'm not sure which log you are referring to. The server itself has been rebooted by the customer to return functionality. I found the errors after the fact.
Look in the log for lines conatining: "qm_status" , "heap size=", "dumping all allocked. fragments:". They should appear before ser shutdown.
Nothing like that in the syslog or any other log that I can find. There are errors in the logs from SER right up to just prior to the reboot:
May 27 09:41:02 hillcrest /usr/sbin/ser[8258]: process_ins_list(): Error while deleting from database May 27 09:41:02 hillcrest ntpd[269]: ntpd exiting on signal 15 May 27 09:41:04 hillcrest kernel: Kernel logging (proc) stopped. May 27 09:41:04 hillcrest kernel: Kernel log daemon terminating. May 27 09:41:04 hillcrest exiting on signal 15 May 27 09:41:51 hillcrest syslogd 1.4.1#10: restart.
Probably your debug level is < 4 (4 generates lots of debugging messages).
Could you send us the whole log, (or grep ser syslog), the cfg. file and the packet dumps you mentioned? Use serhelp@lists.iptel.org (no limits and not public, only ser developers on it). Maybe we can figure out where the mem. leak is.
Is the local memory allocated per listening process (as reported by 'ps') or per child process (as configured in ser.cfg) or based on something else? Just trying to get a grasp on how much the change will effect the memory usage in my environment.
per listening process ( total no. of processes = child_no * no_of_listening_addresses + 3).
So, with a child_no of 4 and 1 IP listed for listening, I'm looking at 8 listening processes or ~8 Megs of local memory allocated to SER?
That's the maximum. In practice 3 of the processes (the timer, fifo server and the "main" one) hardly use any local memory (probably only a few hundred bytes). There is also a very small probability that all the others will use all their local memory, and due to OS features like copy-on-write when forking, there will be no real 8Mb used.
You could try to start ser with maximum debuging (e.g debug=9) let it run a few days under your expected load, then stop it /send SIGUSR1 and look in the log for the lines: qm_status heap size= ... used= ..., used+overhead=..., free max used (+overhead)= ...
for each process.
max used will be very interesting.
If you do this and max used is close to 1Mb for some processes, then please send us the whole log, it would be much easier to catch the memory leak.
BTW: is your ser compiled with DBG_QM_MALLOC? (try ser -V, it should be unless you edited the Makefile)
Also how much shared memory does SER allocate by default? Would increasing this number possibly help?
32 Mb. In this case it will not help since this is not shared mem. The shared mem. runs out (usually :-)) when ser is in statefull mode (tm is used for forwarding) and a lot of calls are handled almost simultaneously (within sip expire time limits). This also depends on the CPU power (if the machine can handle only 100 cps due to cpu limits you will never exceed the default 32 Mb). On a dual athlon 2000+ which can handle ~4900 cps, 256 Mb are enough (ser -m 256).
The system is a 1.1 Ghz Celeron with 128 Megs and SER's it's only function. So, if I'm following correctly, it might be a good idea to increase the share memory a bit, but the local memory would help more?
Well if ser it's the only function than increase the shared mem.. 80Mb would be more than enough (I don't think a 1.1Ghz Celeron can handle more than 1500-2000cps). If this is a memory leak increasing the local memory won't help, it will only prolong the agony :-) However it could be that in your setup 1Mb of local memory is not enough (maybe there is a "legitimate" consumer) and we should increase the default.
Andrei
On Wed, May 28, 2003 at 01:31:45PM +0200, Andrei Pelinescu-Onciul wrote:
On May 27, 2003 at 13:41, Jamin W. Collins jcollins@asgardsrealm.net wrote:
Nothing like that in the syslog or any other log that I can find. There are errors in the logs from SER right up to just prior to the reboot:
May 27 09:41:02 hillcrest /usr/sbin/ser[8258]: process_ins_list(): Error while deleting from database May 27 09:41:02 hillcrest ntpd[269]: ntpd exiting on signal 15 May 27 09:41:04 hillcrest kernel: Kernel logging (proc) stopped. May 27 09:41:04 hillcrest kernel: Kernel log daemon terminating. May 27 09:41:04 hillcrest exiting on signal 15 May 27 09:41:51 hillcrest syslogd 1.4.1#10: restart.
Probably your debug level is < 4 (4 generates lots of debugging messages).
Yea, it's at the default of 3.
Could you send us the whole log, (or grep ser syslog), the cfg. file and the packet dumps you mentioned? Use serhelp@lists.iptel.org (no limits and not public, only ser developers on it). Maybe we can figure out where the mem. leak is.
Sent them in to serhelp.
You could try to start ser with maximum debuging (e.g debug=9) let it run a few days under your expected load, then stop it /send SIGUSR1 and look in the log for the lines: qm_status heap size= ... used= ..., used+overhead=..., free max used (+overhead)= ...
for each process.
max used will be very interesting.
How much (if any) would this impact the performance of the proxy? Trying to limit the impact to the customer.
If you do this and max used is close to 1Mb for some processes, then please send us the whole log, it would be much easier to catch the memory leak.
BTW: is your ser compiled with DBG_QM_MALLOC? (try ser -V, it should be unless you edited the Makefile)
Yes, it has the DBG_QM_MALLOC flag.
On May 28, 2003 at 08:46, Jamin W. Collins jcollins@asgardsrealm.net wrote:
On Wed, May 28, 2003 at 01:31:45PM +0200, Andrei Pelinescu-Onciul wrote:
On May 27, 2003 at 13:41, Jamin W. Collins jcollins@asgardsrealm.net wrote:
[...]
Probably your debug level is < 4 (4 generates lots of debugging messages).
Yea, it's at the default of 3.
I just changed the log level for the mem. status messages to 3 on cvs, for the new version. (I think it's better to get the mem. status on program exit and still be able to filter out the other dbg. messages).
Could you send us the whole log, (or grep ser syslog), the cfg. file and the packet dumps you mentioned? Use serhelp@lists.iptel.org (no limits and not public, only ser developers on it). Maybe we can figure out where the mem. leak is.
Sent them in to serhelp.
thanks
You could try to start ser with maximum debuging (e.g debug=9) let it run a few days under your expected load, then stop it /send SIGUSR1 and look in the log for the lines: qm_status heap size= ... used= ..., used+overhead=..., free max used (+overhead)= ...
for each process.
max used will be very interesting.
How much (if any) would this impact the performance of the proxy? Trying to limit the impact to the customer.
It should work without any visible impact for 100-200 cps. It might be ok for more (depends also on syslog speed, disk a.s.o). It will also write huge logs (so if your /var partition is small ...). You might consider also setting log_stderr=yes and starting ser with ser 2>/tmp/ser_log.
BTW: if you want maximum speed from ser, change Makefile.defs: remove -DDBG_QM_MALLOC and add -DF_MALLOC -DNO_DEBUG. -DNO_DEBUG will kill most of the debug messages (but you don't see them anyway since you are running ser with debug=3). -DF_MALLOC will use a much faster malloc version, w/o any debugging code.
Andrei