Hey Brandon,
On 09.08.2011 17:54, Brandon Armstead wrote:
Looks like I spoke too soon! It is still happening.
Any additional thoughts? Any and all help is greatly appreciated.
My original theory with Anton's issue was (and still is) that the dialog
module is trying to touch a dialog which has already terminated. When
provoking things like that through modifications in the dialog module,
we encountered crashes at similar locations in the code.
Is there any chance you can provide me with some (anonymized) SIP
traces? Dissecting this problem has proven to be very hard even when
given a lot of details, such as Anton's core dumps. Not being able to
reconstruct the call flow has been a show stopper so far.
On Tue, Aug 9, 2011 at 8:37 AM, Brandon Armstead <brandon(a)cryy.com> wrote:
Timo,
Looks like that worked - going to keep watching it and see what happens.
However I am now getting:
Aug 9 15:34:00 /usr/local/sbin/kamailio[3040]: CRITICAL: dialog
[dlg_timer.c:138]: Trying to insert a bogus dlg tl=0x7f8dc8089368
tl->next=0x7f8dc80202e0 tl->prev=0x7f8dc8108b68
Aug 9 15:34:00 /usr/local/sbin/kamailio[3040]: CRITICAL: dialog
[dlg_handlers.c:373]: Unable to insert dlg 0x7f8dc8089318
[603:994585481] on event 3 [2->3] with clid
'CINMGC0320110809153354004076(a)XXX.XXX.XXX.XXX' and tags
'VPSF506071629460' 'gK0cc7be82'
Which looks similar to the original thread? Not sure if there is
still an underlying issue that I should be wary of?
This seems to be related to setting up the dialog timer on reception of
a 200 OK message when the given dialog was about to transition from the
"early" state (18x reply seen) to the "confirmed without ACK" state
(200 OK seen, ACK still outstanding).
Not quite sure what this means but it could possibly be just another
manifestation of the same bug.
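To illustrate the kind of check that produces a "bogus dlg tl" message, here is a minimal sketch of a timer list that refuses to insert a link whose next/prev pointers are already set. All names below (timer_link, timer_list, timer_insert, timer_remove, timer_demo) are hypothetical and this is not the dialog module's actual source, which also does its work under a lock. In the log above both tl->next and tl->prev are non-NULL, which would be consistent with the dialog's timer link already sitting in a list at the time of the 200 OK, though that is only one possible reading.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified timer link (not the module's real struct). */
struct timer_link {
    struct timer_link *next;
    struct timer_link *prev;
    unsigned int timeout;
};

/* Circular doubly linked list with a sentinel head. */
struct timer_list {
    struct timer_link head;
};

static void timer_list_init(struct timer_list *list)
{
    list->head.next = &list->head;
    list->head.prev = &list->head;
    list->head.timeout = 0;
}

/* Returns 0 on success, -1 if the link already looks linked. */
static int timer_insert(struct timer_list *list, struct timer_link *tl,
                        unsigned int timeout)
{
    if (tl->next != NULL || tl->prev != NULL) {
        /* A link with live pointers is either still in a list or stale. */
        fprintf(stderr, "refusing to insert tl=%p next=%p prev=%p\n",
                (void *)tl, (void *)tl->next, (void *)tl->prev);
        return -1;
    }
    tl->timeout = timeout;
    /* Append at the tail, just before the sentinel. */
    tl->prev = list->head.prev;
    tl->next = &list->head;
    list->head.prev->next = tl;
    list->head.prev = tl;
    return 0;
}

/* Returns 0 on success, -1 if the link is not in any list. */
static int timer_remove(struct timer_link *tl)
{
    if (tl->next == NULL || tl->prev == NULL)
        return -1;
    tl->prev->next = tl->next;
    tl->next->prev = tl->prev;
    tl->next = NULL; /* reset so a later insert is accepted */
    tl->prev = NULL;
    return 0;
}

/* Usage example: a second insert without a remove in between is rejected. */
int timer_demo(void)
{
    struct timer_list list;
    struct timer_link tl = { NULL, NULL, 0 };

    timer_list_init(&list);
    assert(timer_insert(&list, &tl, 30) == 0);
    assert(timer_insert(&list, &tl, 30) == -1); /* already linked */
    assert(timer_remove(&tl) == 0);
    assert(timer_insert(&list, &tl, 30) == 0);  /* fine after removal */
    return 0;
}
```

In this sketch the guard trips when the same link is inserted twice without being removed in between; a link belonging to memory that was freed and reused could trip it in the same way.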
Cheers,
--Timo
On Tue, Aug 9, 2011 at 8:16 AM, Brandon Armstead <brandon(a)cryy.com> wrote:
Timo,
I have actually been researching that thread - it does look
*similar* however it does not look 100% related. I am checking
out that commit now however - and will see if it resolves the
same issue.
As for dlg_end_dlg - I am not calling this via FIFO or anything
- I am simply calling dlg_manage() in the routing config -
however I am not sure if this is being called internally (I
assume that it is) upon a dialog cleanup?
Sincerely,
Brandon Armstead
On Tue, Aug 9, 2011 at 8:08 AM, Timo Reimann <timo.reimann(a)1und1.de> wrote:
Hello Brandon,
On 09.08.2011 16:17, Brandon Armstead wrote:
Hello,
Any further insight any of you can provide is very much
appreciated.
Here is the core dump syslog:
[snip!]
A few months ago, Anton Roman provided a core dump looking very
similar to yours. I wasn't exactly able to pin down the cause but
suspected the dlg_end_dlg() function.
Is there a chance you used that function around the time the
crash happened?
Cheers,
--Timo