Hi Henning.

Thanks a lot for your answer.

Currently, the machine does not report any hardware problem. Solaris 10 has a service called Fault Manager, which is running on my machine, and it has not reported any errors or faults.

At the moment, I am testing an OpenSER installation compiled with an optimized version of GCC released by Sun for SPARC systems. This release is based on gcc 4, and so far OpenSER has been running for almost 18 hours without a crash.

I will inspect the core file again, and I will be posting what I find.

Best regards, and thanks again.

Sergio Gutierrez.



On Thu, Feb 28, 2008 at 5:19 AM, Henning Westerholt <henning.westerholt@1und1.de> wrote:
On Thursday 28 February 2008, Sergio Gutierrez wrote:
> My OpenSER 1.3 installation running on Solaris SPARC is facing random and
> unexpected crashes, apparently related to the timer process.
>
> The last core presents the following backtrace:
>
> #0  0xfe977a04 in get_expired_dlgs (time=4233810208) at dlg_timer.c:194
> #1  0xfe977540 in dlg_timer_routine (ticks=7980, attr=0x0) at dlg_timer.c:210
> #2  0x000a839c in timer_ticker (timer_list=0x15ec00) at timer.c:275
> #3  0x000a80ec in run_timer_process (tpl=0x1b8088, do_jiffies=1) at timer.c:357
> #4  0x000a8668 in start_timer_processes () at timer.c:386
> #5  0x00035ea8 in main_loop () at main.c:873
> #6  0x000397c4 in main (argc=-4195024, argv=0x150e9c) at main.c:1372
>
>
> Thanks in advance for any hint you can give me.

Hi Sergio,

Signal 10 is SIGBUS on Solaris. It can be caused by an invalid address
alignment, a fault on a nonexistent physical address, or an object-specific
hardware error (Wikipedia).
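
As a rough illustration of the three causes listed above, a POSIX signal handler installed with SA_SIGINFO can report which one occurred by looking at si_code. This is only a sketch and not OpenSER code; the names describe_sigbus, sigbus_reporter and install_sigbus_reporter are made up for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Map the si_code delivered with SIGBUS to a short description.
 * BUS_ADRALN, BUS_ADRERR and BUS_OBJERR are the POSIX-defined codes. */
static const char *describe_sigbus(int si_code)
{
    switch (si_code) {
    case BUS_ADRALN: return "invalid address alignment";
    case BUS_ADRERR: return "nonexistent physical address";
    case BUS_OBJERR: return "object-specific hardware error";
    default:         return "unknown SIGBUS cause";
    }
}

/* Write a string to stderr using only write(2); stdio is avoided
 * because it is not safe to call from a signal handler. */
static void emit(const char *s)
{
    ssize_t rc = write(STDERR_FILENO, s, strlen(s));
    (void)rc; /* nothing useful can be done if the write fails */
}

/* Hypothetical handler: report the cause, then terminate the process. */
static void sigbus_reporter(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    emit("SIGBUS: ");
    emit(describe_sigbus(info->si_code));
    emit("\n");
    _exit(1);
}

/* Install the handler; returns 0 on success, -1 on failure (as sigaction does). */
static int install_sigbus_reporter(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigbus_reporter;
    sa.sa_flags = SA_SIGINFO;   /* request the three-argument handler form */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

On SPARC, misaligned memory accesses generally trap rather than being fixed up by the hardware, so BUS_ADRALN is a plausible candidate to check first. The same si_code should also be recoverable from the core file without changing any code.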

The first two crashes were both caused by get_all_ucontact, triggered by a
timer. This crash comes from a different timer, the deletion of expired
dialogs, which is strange. Is this machine otherwise stable, and when (with
which OpenSER release) did these crashes start?

Have you already used the debugger to inspect the data structures in the
get_expired_dlgs function? Perhaps something is wrong in there.

Cheers,

Henning