Hey Brandon,
On 09.08.2011 17:54, Brandon Armstead wrote:
Looks like I spoke too soon! It is still happening.
Any additional thoughts? All and any help is greatly appreciated.
My original theory with Anton's issue was (and still is) that the dialog module is trying to touch a dialog which has already terminated. When provoking things like that through modifications in the dialog module, we encountered crashes at similar locations in the code.
Is there any chance you can provide me with some (anonymized) SIP traces? Dissecting this problem has proven to be very hard even when given a lot of details, such as Anton's core dumps. Not being able to reconstruct the call flow has been a show stopper so far.
On Tue, Aug 9, 2011 at 8:37 AM, Brandon Armstead <brandon@cryy.com> wrote:
Timo,
Looks like that worked - going to keep watching it and see what happens. However I am now getting:
Aug 9 15:34:00 /usr/local/sbin/kamailio[3040]: CRITICAL: dialog [dlg_timer.c:138]: Trying to insert a bogus dlg tl=0x7f8dc8089368 tl->next=0x7f8dc80202e0 tl->prev=0x7f8dc8108b68
Aug 9 15:34:00 /usr/local/sbin/kamailio[3040]: CRITICAL: dialog [dlg_handlers.c:373]: Unable to insert dlg 0x7f8dc8089318 [603:994585481] on event 3 [2->3] with clid 'CINMGC0320110809153354004076@XXX.XXX.XXX.XXX' and tags 'VPSF506071629460' 'gK0cc7be82'
Which looks similar to the original thread? Not sure if there is still an underlying issue that I should be wary of?
This seems to be related to setting up the dialog timer on reception of a 200 OK message when the given dialog was about to transition from the "early" state (18x reply seen) to the "confirmed without ACK" state (200 OK seen, ACK still outstanding).
I am not quite sure what this means, but it could just be another manifestation of the same bug.
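To illustrate what the "bogus dlg" message appears to guard against, here is a minimal sketch in C. It assumes the dialog timer keeps its entries in a doubly linked list and treats a link whose next/prev pointers are already set as being in the list, which is what the non-NULL tl->next and tl->prev in the log suggest. All type and function names below (timer_link, timer_list, timer_insert, timer_remove) are hypothetical and simplified; this is not the actual dlg_timer.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a dialog's timer link. */
struct timer_link {
    struct timer_link *next;
    struct timer_link *prev;
    unsigned int timeout;
};

/* Circular doubly linked list with a sentinel head (an assumption). */
struct timer_list {
    struct timer_link head;
};

static void timer_list_init(struct timer_list *list)
{
    assert(list != NULL);
    list->head.next = &list->head;
    list->head.prev = &list->head;
    list->head.timeout = 0;
}

/* A fresh link is not part of any list: both pointers are NULL. */
static void timer_link_init(struct timer_link *tl)
{
    assert(tl != NULL);
    tl->next = NULL;
    tl->prev = NULL;
    tl->timeout = 0;
}

/* Returns 0 on success, -1 if the link already sits in a list. */
static int timer_insert(struct timer_list *list, struct timer_link *tl,
                        unsigned int timeout)
{
    if (tl->next != NULL || tl->prev != NULL) {
        /* Same situation as in the log: the link is already chained. */
        fprintf(stderr, "refusing to insert already-linked tl=%p\n",
                (void *)tl);
        return -1;
    }
    tl->timeout = timeout;
    /* Append at the tail; a real timer would keep the list sorted. */
    tl->prev = list->head.prev;
    tl->next = &list->head;
    list->head.prev->next = tl;
    list->head.prev = tl;
    return 0;
}

/* Returns 0 on success, -1 if the link was not in a list. */
static int timer_remove(struct timer_link *tl)
{
    if (tl->next == NULL || tl->prev == NULL)
        return -1;
    tl->prev->next = tl->next;
    tl->next->prev = tl->prev;
    tl->next = NULL;
    tl->prev = NULL;
    return 0;
}
```

In this model, a second timer_insert() on the same link without a timer_remove() in between fails. That would correspond to a dialog whose timer had already been armed by the time the 200 OK was processed.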
Cheers,
--Timo
On Tue, Aug 9, 2011 at 8:16 AM, Brandon Armstead <brandon@cryy.com> wrote:
Timo,
I have actually been researching that thread - it does look *similar* however it does not look 100% related. I am checking out that commit now however - and will see if it resolves the same issue.
As for dlg_end_dlg - I am not calling this via FIFO or anything - I am simply calling dlg_manage() in the routing config - however I am not sure if this is being called internally (I assume that it is) upon a dialog cleanup?
Sincerely,
Brandon Armstead

On Tue, Aug 9, 2011 at 8:08 AM, Timo Reimann <timo.reimann@1und1.de> wrote:
Hello Brandon,
On 09.08.2011 16:17, Brandon Armstead wrote:
> Hello,
>
> Any further insight any of you can provide is very much appreciated.
>
> Here is the core dump syslog:
[snip!]
A few months ago, Anton Roman provided a core dump looking very similar to yours. I wasn't exactly able to pin down the cause but suspected the dlg_end_dlg() function. Is there a chance you used that function around the time the crash happened?
Cheers,
--Timo