It seems I found the problem and I have a fix.

The root cause is probably that the locally generated 408 is not updating the dialog to-tag.

However, always checking for a to-tag match before falling back to a match without a to-tag would avoid any issue of this kind.
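
To make the idea concrete, here is a minimal sketch in Python of a lookup that tries an exact to-tag match first and only then accepts a dialog with no recorded to-tag. This is not the dialog module code: the `Dialog` class, `find_dialog()` and the from-tag value are hypothetical names for illustration, and the two entries simply mirror what I suspect happened in the trace (an old dialog in state 5 whose to-tag was never recorded, and the new dialog in state 3).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dialog:
    call_id: str
    from_tag: str
    to_tag: Optional[str]  # None while no to-tag has been recorded
    state: int


def find_dialog(dialogs: List[Dialog], call_id: str, from_tag: str,
                to_tag: Optional[str]) -> Optional[Dialog]:
    """Prefer an exact to-tag match; fall back to a dialog without a to-tag."""
    candidates = [d for d in dialogs
                  if d.call_id == call_id and d.from_tag == from_tag]
    # Pass 1: look for an exact to-tag match across all candidates.
    if to_tag is not None:
        for d in candidates:
            if d.to_tag == to_tag:
                return d
    # Pass 2: only now accept a dialog that has no to-tag yet.
    for d in candidates:
        if d.to_tag is None:
            return d
    return None


if __name__ == "__main__":
    call_id = "562419_125824138_2072238224"
    dialogs = [
        # Attempt #1 (assumed): 408 generated locally, to-tag never stored.
        Dialog(call_id, "ftag-x", None, 5),
        # Attempt #2: answered with 200, to-tag known.
        Dialog(call_id, "ftag-x", "gK02b68836", 3),
    ]
    match = find_dialog(dialogs, call_id, "ftag-x", "gK02b68836")
    print(match.state)  # 3: the ACK lands on the second dialog
```

A lookup that returned the first entry sharing the call-id and from-tag would hand back the state 5 dialog here, which would be consistent with the old[5]new[5] seen in the logs.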

I will prepare a merge request on Monday to start discussing the option of always matching on the to-tag first.

On Fri, Sep 25, 2020 at 11:27 AM Julien Chavanton <jchavanton@gmail.com> wrote:
I caught the logs, and after looking at the trace, it looks like a dialog mismatch in a serial forking scenario:

- log line 3 tells us that a NO-ACK disconnection should be triggered
- log lines 1-2 tell us what happened when the ACK was received in dlg_onroute(). Oddly enough, the state was 5 both before and after. Could it be a mismatch/confusion with the previous dialog? Looking in this direction ...

1: 2020-09-25T16:30:16.896: dialog [dlg_handlers.c:1273]: extra_ack_debug_info(): [ACK][1] state not changed >>> call-id[562419_125824138_2072238224] to-tag[<sip:+14019991904@anon.com>;tag=gK02b68836]
2: 2020-09-25T16:30:16.896: dialog [dlg_handlers.c:1440]: dlg_onroute(): [ACK] state not changed old[5]new[5]
...
3: 2020-09-25T16:32:22.674: dialog [dlg_hash.c:247]: dlg_clean_run(): dialog disconnection no-ACK call-id[562419_125824138_2072238224][1601051416]<[1601051542 - 60]
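
As a sanity check on log line 3, the bracketed numbers can be decoded as Unix timestamps. The short Python sketch below assumes the first value is the dialog's reference time, the second is the current time, and 60 is the no-ACK timeout in seconds; those field meanings are my reading of the log format, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def hms_utc(ts: int) -> str:
    """Format a Unix timestamp as HH:MM:SS in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%H:%M:%S")


def no_ack_expired(ref_ts: int, now_ts: int, timeout: int) -> bool:
    """Same comparison as printed in the log: ref < now - timeout."""
    return ref_ts < now_ts - timeout


if __name__ == "__main__":
    ref_ts, now_ts, timeout = 1601051416, 1601051542, 60  # from log line 3
    print(hms_utc(ref_ts))                          # 16:30:16
    print(hms_utc(now_ts))                          # 16:32:22
    print(now_ts - ref_ts)                          # 126
    print(no_ack_expired(ref_ts, now_ts, timeout))  # True
```

Decoded in UTC, the two values line up with the timestamps of log lines 1-2 and log line 3, so the cleanup ran 126 seconds after the ACK was seen on this call-id.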


After looking at the pcap trace, call-id 562419_125824138_2072238224 was involved in serial forking:

call attempt #1

X >> INVITE >> Y   // no to-tag  
X << 100
...
X << 408           // to-tag=594d50c3218065a60bb91fd47a70fbc1-59edef02 (locally generated)
X >> ACK           // to-tag=594d50c3218065a60bb91fd47a70fbc1-59edef02

call attempt #2

X >> INVITE >> Z   // no to-tag
X << 100
X << 200    << Z   // to-tag=gK02b68836
X >> ACK    >> Z   // to-tag=gK02b68836 (Should be state old[3]new[4], I wonder how it could possibly be state old[5]new[5])



I looked at several occurrences, and each one is preceded by a locally generated 408 carrying its own to-tag. It seems I have a good lead to investigate further.