Hi,
Scenario: Our Kamailio server normally runs with a debug level of 2. The server gets a segmentation fault and dies when we run a certain demo. The error message is:
"[4097]: ALERT: <core> [main.c:789]: child process 4098 exited by a signal 11"
And we got a lot of these before it died:
"error reading: Connection reset by peer (104)"
"ERROR: tcp_read_req: error reading"
Because of the low debug level, I have no idea what child process 4098 was doing before it died.
Here is the weird bit: the problem goes away if I set the debug level to 5, but it always occurs at debug level 2.
Two questions:
1. I noticed that the main process was 4097 and the one that died was 4098. What is the child process created straight after main? My guess is that child process 4098 manages TCP connections. Is this correct?
2. Why does the debug level have an impact on this? Is it because a higher debug level introduces some delay?
Regards,
Allen
On 03/19/2014 06:50 PM, Allen Zhang wrote:
> 1. I noticed that the main process was 4097 and the one that died was 4098. What is the child process created straight after main? My guess is that child process 4098 manages TCP connections. Is this correct?
> 2. Why does the debug level have an impact on this? Is it because a higher debug level introduces some delay?
That's hard to say. However, changing any aspect of the execution behaviour changes the state of the program, and can certainly have an impact on when it crashes, and whether it crashes at all.
The nature of memory bugs is that memory boundaries are often overstepped, but this does not necessarily result in a crash. The crash arises from the consequences of accessing that out-of-bounds memory, such as when the program ingests garbage from that memory area because it has been written to by something else. And, all of this behaviour varies with the order of operations, the particular libc you are using, its version, and the memory footprint of various other executed components.
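To make the point above concrete, here is a minimal C sketch. It is not Kamailio code: the arena layout and the names `arena`, `set_name_unchecked` and `overrun_demo` are invented for illustration. Two logical objects share one array, so the overrun stays inside a single C object and the demo itself is well-defined. The buggy write does not crash; the damage only shows up later, when something reads the neighbouring data.

```c
#include <string.h>

/* Hypothetical layout: one 32-byte arena holding two logical objects.
 *   bytes 0..7  : "name" buffer
 *   bytes 8..31 : "state" string owned by some other component
 * Everything stays inside arena[], so the demo is well-defined C. */
enum { NAME_OFF = 0, NAME_LEN = 8, STATE_OFF = 8, ARENA_LEN = 32 };

static char arena[ARENA_LEN];

/* Buggy copy: trusts the caller's length instead of checking NAME_LEN. */
static void set_name_unchecked(const char *src, size_t n)
{
    memcpy(arena + NAME_OFF, src, n);   /* may run past byte 7 */
}

/* Returns 1 if the neighbouring state is intact, 0 if it was clobbered. */
static int overrun_demo(void)
{
    memset(arena, 0, sizeof arena);
    strcpy(arena + STATE_OFF, "READY");

    /* 12 bytes into an 8-byte buffer: 4 bytes too many, yet no crash here. */
    set_name_unchecked("0123456789AB", 12);

    /* Only when another reader consumes the state does the damage show. */
    return strcmp(arena + STATE_OFF, "READY") == 0;
}
```

Calling `overrun_demo()` reports that the state was clobbered, even though the faulty write itself completed without any error. In a real process the neighbouring bytes could belong to anything, which is why the eventual crash can be far away from the bug.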
The way to troubleshoot an issue like this is to analyse the core dump that is generated by the process that died due to the segmentation fault (signal 11). You should be able to find that core dump somewhere on your system. When you do, you can read it with 'gdb':
gdb /path/to/kamailio/binary /path/to/core.4098
Note that by default, many values will be optimised out. To get a fuller picture, you may need to compile Kamailio without -Ox compiler optimisations, and with additional debug information, e.g. -g.
-- Alex
Hi Alex,
Shouldn't the debug level only have an impact on the amount of information written to the log? And shouldn't that only change the delay between operations?
Allen
-----Original Message----- From: sr-users-bounces@lists.sip-router.org [mailto:sr-users-bounces@lists.sip-router.org] On Behalf Of Alex Balashov Sent: Thursday, 20 March 2014 11:58 a.m. To: sr-users@lists.sip-router.org Subject: Re: [SR-Users] Child process exited by a signal 11
-- Alex Balashov - Principal Evariste Systems LLC 235 E Ponce de Leon Ave Suite 106 Decatur, GA 30030 United States Tel: +1-678-954-0670 Web: http://www.evaristesys.com/, http://www.alexbalashov.com/
_______________________________________________ SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
On 03/19/2014 07:04 PM, Allen Zhang wrote:
> Shouldn't the debug level only have an impact on the amount of information written to the log? And shouldn't that only change the delay between operations?
Well, from a programmatic point of view, not necessarily. Writing debug logs is an operation that involves buffering and parsing strings internally, which in turn draws on static (stack) and dynamic (heap) memory allocations. All of that influences the memory state of the program, and thus has an impact on whether it'll crash, and when it will do so.
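As a rough illustration of why the debug level changes more than timing, consider the following C sketch. It is a hypothetical logger, not Kamailio's logging implementation: `debug_level`, `log_at`, `log_allocs` and `handle_request` are names made up for this example. The assumption is simply that a suppressed message does no formatting work, while an enabled one formats into a heap buffer.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

static int debug_level = 2;        /* hypothetical global setting */
static unsigned long log_allocs;   /* heap allocations made by logging */

/* Hypothetical logger: formats into a heap buffer only when enabled. */
static void log_at(int level, const char *fmt, ...)
{
    if (level > debug_level)
        return;                    /* suppressed: no formatting, no malloc */

    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);
    int need = vsnprintf(NULL, 0, fmt, ap);   /* measure the message */
    va_end(ap);

    if (need >= 0) {
        char *buf = malloc((size_t)need + 1);
        if (buf != NULL) {
            log_allocs++;
            vsnprintf(buf, (size_t)need + 1, fmt, ap2);
            fputs(buf, stderr);
            free(buf);
        }
    }
    va_end(ap2);
}

/* Same code path, different heap activity depending on the level. */
static unsigned long handle_request(int level)
{
    debug_level = level;
    log_allocs = 0;
    log_at(1, "request received\n");
    log_at(4, "parsed header %s\n", "Via");
    log_at(4, "routing to %s:%d\n", "10.0.0.1", 5060);
    return log_allocs;
}
```

With these made-up levels, `handle_request(2)` performs one logging allocation and `handle_request(5)` performs three. The request handling is identical, but the allocator sees a different sequence of requests, so later allocations can end up in different places.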
Yes, this makes sense. But higher debug level = more writing. So increasing the debug level should cause more problems - because there is more buffering and parsing of strings internally, which in turn draws on static (stack) and dynamic (heap) memory allocations - instead of hiding the problem, right?
On 03/19/2014 07:10 PM, Allen Zhang wrote:
> Yes, this makes sense. But higher debug level = more writing. So increasing the debug level should cause more problems - because there is more buffering and parsing of strings internally, which in turn draws on static (stack) and dynamic (heap) memory allocations - instead of hiding the problem, right?
That is logical, and is probably true in many cases.
However, it all depends on the memory allocation strategy used by the program internally, as well as on the operating system side. For instance, more logging could trigger a larger buffer allocation or different fragmentation, which could serve to mask the memory bug by not creating the circumstances that lead to an acute access violation, or not creating them in the same place or as soon.
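The masking effect can be sketched with a toy bump allocator in C. Everything here is an assumption made for illustration (the names `pool`, `bump_alloc` and `run_once`, the sizes, and the idea that extra logging allocates a scratch buffer at level 5); it does not describe how Kamailio or any particular libc lays out memory. The same 4-byte overrun either corrupts a critical flag or lands harmlessly in log scratch space, depending only on whether the extra allocation happened.

```c
#include <stddef.h>
#include <string.h>

static unsigned char pool[64];   /* toy heap; all writes stay inside it */
static size_t pool_used;

/* Toy bump allocator: hands out consecutive slices of pool[]. */
static void *bump_alloc(size_t n)
{
    if (n > sizeof pool - pool_used)
        return NULL;
    void *p = pool + pool_used;
    pool_used += n;
    return p;
}

/* Returns the critical flag after the buggy write:
 * 1 means intact, any other value means corrupted, -1 means alloc failure. */
static int run_once(int debug_level)
{
    memset(pool, 0, sizeof pool);
    pool_used = 0;

    char *buf = bump_alloc(8);            /* buffer with the overrun bug */
    if (debug_level >= 5)
        (void)bump_alloc(16);             /* extra log scratch, high level only */
    unsigned char *flag = bump_alloc(1);  /* critical state allocated next */
    if (buf == NULL || flag == NULL)
        return -1;
    *flag = 1;

    memcpy(buf, "XXXXXXXXXXXX", 12);      /* bug: 4 bytes past the end of buf */

    return *flag;
}
```

Here `run_once(2)` returns a corrupted flag, while `run_once(5)` returns 1 because the overrun hits the scratch area instead. The bug is present in both runs; only its visible consequences differ, which is why the core dump is still the thing to look at.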
Um.... This makes perfect sense. Enhanced my understanding about memory allocation, too. Thanks Alex.