Hello,
We've got an issue with 'dialog' that we've been trying to track down for days, where the dialogs go through the following life cycle:
1. INVITE -->
2. <-- 407 challenge
3. ACK -->
4. INVITE -->
5. <-- 100 Trying
6. <-- 180 Ringing
7. <-- 183 Session Progress
8. <-- 200 OK
9. ACK -->
10. BYE -->
11. <-- 200 OK
Somewhere around steps #6-7 (18x messages), the dialogs no longer appear to be counted in the dialog profile they're attached to, as gleaned from 'kamctl fifo profile_get_size <profile> <key>'. The dialog count in the profile falls to 0.
Naturally, I suspected a SIP issue that was preventing the dialog state from being tracked correctly. However, subsequent investigation revealed that these very same dialogs are tracked just fine throughout their entire lifetime. The 'kamctl fifo dlg_list' command shows them to have the right states at the right times.
So the dialog itself isn't dropping out of tracking; its attachment to the given profile seems to be the issue.
I can't find any logical explanation for this in the code, nor any technical reasoning to support this hypothesis. However, that's what seems to be happening. I'm not sure that this is the real issue; it might be another problem that merely produces this symptom. I'm just trying to rule out possibilities.
Are there any situations that can cause a dialog that is otherwise being normally tracked to be dumped from a profile of which it was previously part?
Thanks!
-- Alex
Something I should probably add:
On 03/11/2014 10:28 AM, Alex Balashov wrote:
> Hello,
> We've got an issue we've been trying to track for days with 'dialog', where the dialogs go through the following life cycle:
> - INVITE -->
> - <-- 407 challenge
> - ACK -->
> - INVITE -->
> - <-- 100 Trying
> - <-- 180 Ringing
> - <-- 183 Session Progress
> - <-- 200 OK
> - ACK -->
> - BYE -->
> - <-- 200 OK
The 407 challenge here is happening between the endpoints of the call; it is not being done by the proxy. The proxy relays the 407 response back to the caller, and the caller reoriginates the INVITE with credentials, and that's passed back up to the UAS.
Aside from a different CSeq, these invites are the same. Would calling dlg_manage() on both of them (which sets the profile count to 2) cause them both to get deleted from the profile when the 407 challenge comes back and is ACK'd? I would think not, since we call dlg_manage() again when the INVITE with credentials is relayed.
We appear to have fixed this problem by calling dlg_manage() before doing any set_dlg_profile() manipulations.
The documentation is not clear on whether dlg_manage() needs to be called first before doing this. But it makes me wonder: if dlg_manage() is prerequisite, then why would the profile manipulation work at all beforehand?
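Here is a minimal sketch of the ordering described above, as it might look in the initial-INVITE branch of request_route. The profile name "caller", keying it on $fU, and the loaded modules (tm, sl, textops, siputils) are assumptions for illustration. This is not the actual configuration discussed in this thread.

```
# Assumed elsewhere in the config (hypothetical profile name):
#   modparam("dialog", "profiles_with_value", "caller")
if (is_method("INVITE") && !has_totag()) {
    # 1. Create or look up the dialog first.
    dlg_manage();
    # 2. Only then add it to the profile(s); this is the order
    #    reported to keep the profile count correct.
    set_dlg_profile("caller", "$fU");
    if (!t_relay()) {
        sl_reply_error();
    }
    exit;
}
```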
I do wonder if this [relatively new] setting is relevant:
http://kamailio.org/docs/modules/4.1.x/modules/dialog.html#idp1930160
We don't have it set.
Since the default value of this is '1', it makes me wonder if, in the olden days, when one used a flag, it was inconsequential to mess with profile affinity prior to t_relay(), and now it is consequential because the callbacks are run immediately?
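For comparison, here is a rough sketch of the older flag-based style mentioned above. The INVITE is only marked, and the dialog comes into existence later, when t_relay() creates the transaction. Flag number 4 and the profile name are arbitrary placeholders. Whether profile calls made before t_relay() were really harmless in this style is exactly the open question here.

```
# Older style (sketch): dialog creation is driven by a flag.
modparam("dialog", "dlg_flag", 4)

# ... later, in request_route, for an initial INVITE:
setflag(4);                        # mark the INVITE for dialog tracking
set_dlg_profile("caller", "$fU");  # no dialog exists yet at this point
t_relay();                         # the dialog is set up along with the transaction
```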
On 03/11/2014 01:44 PM, Alex Balashov wrote:
> We appear to have fixed this problem by calling dlg_manage() before doing any set_dlg_profile() manipulations.
> The documentation is not clear on whether dlg_manage() needs to be called first before doing this. But it makes me wonder: if dlg_manage() is prerequisite, then why would the profile manipulation work at all beforehand?
On 11/03/14 18:44, Alex Balashov wrote:
> We appear to have fixed this problem by calling dlg_manage() before doing any set_dlg_profile() manipulations.
> The documentation is not clear on whether dlg_manage() needs to be called first before doing this. But it makes me wonder: if dlg_manage() is prerequisite, then why would the profile manipulation work at all beforehand?
It should work both ways. But as I said in a previous email, for the second INVITE the dialog might be found in memory and reused. In that case it might not pick up the new profiles from the local static lists. So my guess is that a set_dlg_profile() before dlg_manage() doesn't find the dialog shortcut (which is probably set by dlg_manage()) and therefore adds the profile to the local static lists. When dlg_manage() is then executed, it first looks up the dialog, finds it, and doesn't create a new one. The local static lists are consulted when a new dialog structure is created, and that is when the dialog is added to those profiles. The code has to be checked, though; this is only my guess.
Cheers, Daniel
Daniel,
On 03/12/2014 04:27 AM, Daniel-Constantin Mierla wrote:
> It should work both ways. But as I said in a previous email, for the second INVITE the dialog might be found in memory and reused. In that case it might not pick up the new profiles from the local static lists. So my guess is that a set_dlg_profile() before dlg_manage() doesn't find the dialog shortcut (which is probably set by dlg_manage()) and therefore adds the profile to the local static lists. When dlg_manage() is then executed, it first looks up the dialog, finds it, and doesn't create a new one. The local static lists are consulted when a new dialog structure is created, and that is when the dialog is added to those profiles. The code has to be checked, though; this is only my guess.
When dlg_manage() was called after the set_dlg_profile() calls, the profile count looked like this:
0 -> 1 -> 2 --> 0 (abrupt crash to 0 after 18x messages)
When we put dlg_manage() prior to the set_dlg_profile() calls, the profile count looks like this:
0 -> 1 -> 2 -> 1 --> 1
This is closer to what we want, of course. The issue is that the dialog module takes 1-2 seconds to delete the old dialog in this scenario (the '2 -> 1' step), and for a high volume of calls, that can be a problem because it inflates the number of calls currently in process by a substantial proportion, breaking the ability to do concurrent call limiting effectively.
Since dialog relies on TM callbacks, my question is: are there any timeouts we can tweak on the TM side that would have the effect of aging the old (407-challenged) dialog out faster, perhaps nearly instantly, once the hop-by-hop ACK for the 407 is received?
That is the fundamental problem we're trying to solve at this point.
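The concern becomes concrete in a per-caller concurrency check. One common pattern reads the profile size before accepting a new call, so a challenged dialog that lingers for 1-2 seconds is still counted against the caller when the authenticated INVITE arrives. Below is a minimal sketch. The limit of 10 calls per From user, the "caller" profile (declared with profiles_with_value), and the rejection code are all hypothetical.

```
if (is_method("INVITE") && !has_totag()) {
    # Current number of tracked calls for this caller.
    get_profile_size("caller", "$fU", "$var(calls)");
    if ($var(calls) >= 10) {
        # Hypothetical limit. A lingering 407-challenged dialog still
        # counts here until the dialog module removes it.
        sl_send_reply("486", "Busy Here - Call Limit Reached");
        exit;
    }
    dlg_manage();
    set_dlg_profile("caller", "$fU");
    if (!t_relay()) {
        sl_reply_error();
    }
    exit;
}
```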
Thanks,
-- Alex
P.S. Something like the following would be an attractive solution:

    if (is_present_hf("Proxy-Authorization")) {
        # track the dialog: dlg_manage(), then the set_dlg_profile() calls
        dlg_manage();
    }

Unfortunately, the upstream gateway does not use authentication in all cases, and we cannot predict which cases those will be.
Are you using the latest 4.1.x? There was a fix related to counting dialogs in profiles: a dialog is now removed from its profiles when the call reaches the terminated state. The dialog is still kept in memory for a bit after termination. Before the fix, it was blocking new calls, because the user appeared to still have another call in the same profile even though that call had ended.
I haven't checked the commit logs, but iirc, quite some time ago someone added the possibility to reuse the structure for authenticated dialogs. Still, if you add the dialog to the profile for the second INVITE, it should work.
Cheers, Daniel
On 11/03/14 17:47, Alex Balashov wrote:
> Something I should probably add:
> The 407 challenge here is happening between the endpoints of the call; it is not being done by the proxy. The proxy relays the 407 response back to the caller, and the caller reoriginates the INVITE with credentials, and that's passed back up to the UAS.
> Aside from a different CSeq, these invites are the same. Would calling dlg_manage() on both of them (which sets the profile count to 2) cause them both to get deleted from the profile when the 407 challenge comes back and is ACK'd? I would think not, since we call dlg_manage() again when the INVITE with credentials is relayed.
Do you happen to have any additional information about this? I've referenced the documentation for both dialog and dialog_ng, and there's no mention of reuse that fits this description. We've confirmed that calling dlg_manage() on a new authenticated invite does result in a second entry in the dialog table. The first entry does fall off after some time, however it would be very preferable to reuse the existing entry as opposed to creating a new one, especially in our case where we are dealing with a somewhat high velocity of initial invites.
Brooks Bridges Senior Technical Consultant Evariste Systems LLC 235 E Ponce de Leon Ave, Suite 106 Decatur, GA 30030 United States Tel: +1-678-954-0670 Web: http://www.evaristesys.com/
On 3/12/2014 3:21 AM, Daniel-Constantin Mierla wrote:
> iirc, quite some time ago someone added the possibility to reuse the structure for authenticated dialogs
On 12/03/14 16:58, Brooks Bridges wrote:
> Do you happen to have any additional information about this?
I have no time to look over the code right now, so I might be mixing it up with the plans for designing the ng version.
> I've referenced the documentation for both dialog and dialog_ng, and there's no mention of reuse that fits this description. We've confirmed that calling dlg_manage() on a new authenticated invite does result in a second entry in the dialog table. The first entry does fall off after some time, however it would be very preferable to reuse the existing entry as opposed to creating a new one, especially in our case where we are dealing with a somewhat high velocity of initial invites.
If your tests show that, then that's how it behaves. Maybe you can reproduce it on a less loaded system with debug=3 (or, if you use the latest version, use the debugger module to set a higher debug level for the dialog module only) to see whether you notice any hints about the profile operations.
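Here is a sketch of the per-module option. It assumes the debugger module in your build provides the mod_level_mode and mod_level parameters, so check the module documentation for your version. The global level stays low, and only the dialog module logs at debug level.

```
# Global log level stays at INFO (2).
debug=2

loadmodule "debugger.so"
# Enable per-module log levels, then raise only 'dialog' to DBG (3).
modparam("debugger", "mod_level_mode", 1)
modparam("debugger", "mod_level", "dialog=3")
```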
Cheers, Daniel
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users