On Thursday 28 February 2008, Sergio Gutierrez wrote:
My OpenSER 1.3 installation running on Solaris SPARC is experiencing random, unexpected crashes, apparently related to the timer process.
The last core dump shows the following backtrace:
#0 0xfe977a04 in get_expired_dlgs (time=4233810208) at dlg_timer.c:194
#1 0xfe977540 in dlg_timer_routine (ticks=7980, attr=0x0) at dlg_timer.c:210
#2 0x000a839c in timer_ticker (timer_list=0x15ec00) at timer.c:275
#3 0x000a80ec in run_timer_process (tpl=0x1b8088, do_jiffies=1) at timer.c:357
#4 0x000a8668 in start_timer_processes () at timer.c:386
#5 0x00035ea8 in main_loop () at main.c:873
#6 0x000397c4 in main (argc=-4195024, argv=0x150e9c) at main.c:1372
Thanks in advance for any hint you can give me.
Hi Sergio,
Signal 10 is SIGBUS on Solaris. It can be caused by an invalid address alignment, an access to a non-existent physical address, or an object-specific hardware error (according to Wikipedia).
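To make the three SIGBUS causes concrete, here is a minimal C sketch that maps the POSIX si_code values delivered with SIGBUS to a short description. The names bus_code_name, on_sigbus and install_sigbus_handler are hypothetical and are not part of OpenSER; this only illustrates how the cause could be told apart, assuming a POSIX system that fills in si_code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Map a SIGBUS si_code (POSIX) to a short description. */
static const char *bus_code_name(int code)
{
    switch (code) {
    case BUS_ADRALN: return "invalid address alignment";
    case BUS_ADRERR: return "non-existent physical address";
    case BUS_OBJERR: return "object-specific hardware error";
    default:         return "unknown SIGBUS cause";
    }
}

/* Hypothetical handler (not OpenSER code): report the cause and exit. */
static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
    const char *why = bus_code_name(info->si_code);

    (void)sig;
    (void)ctx;
    /* write() is async-signal-safe, unlike printf() */
    write(STDERR_FILENO, "SIGBUS: ", 8);
    write(STDERR_FILENO, why, strlen(why));
    write(STDERR_FILENO, "\n", 1);
    _exit(1);
}

/* Install the handler with SA_SIGINFO so that si_code is delivered.
 * Returns 0 on success, -1 on failure (as sigaction() does). */
int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

On SPARC an unaligned load or store is a common way to end up in the BUS_ADRALN case, so a reported alignment error would point at a bad or corrupted pointer rather than at broken hardware.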
The first crashes were both caused by get_all_ucontact, triggered by a timer. This crash is now in another timer routine, the deletion of expired dialogs, which is strange. Is this machine otherwise stable, and when (with which OpenSER release) did these crashes start?
Have you already inspected the data structures used in the get_expired_dlgs function with the debugger? Perhaps something is wrong in there.
Cheers,
Henning