Hello everyone,
I found many references in serdev, openser users, etc. MLs to the issue at hand, but only came up with diverging opinions and no solution whatsoever.
The issue appears in case of a UA sending a re-INVITE inside of an established dialog. Depending on your configuration file, version, network conditions, there is NO guarantee that this re-INVITE will be sent to (I'm not even talking about reaching) the next hop (typically a PSTN gateway) AFTER the ACK to the first INVITE of the dialog.
I perfectly understand why it is this way: openser is transaction stateful, not dialog stateful, and that is the only reason that really matters, given that INVITE and ACK are separate transactions, often handled by separate openser processes.
However, it seems that several gateways do not handle this well at all and just reject the transaction, typically by sending a "500 Server internal error". I've read reports about the Cisco AS5300 and Asterisk, and I am experiencing this myself with an AudioCodes Mediant 2000 gateway.
In my case, the issue is there 99% of the time. I know I could tune my config file a bit to ensure ACKs are processed faster than INVITEs (which is probably already the case), but this is hardly a workaround. I was even tempted to call an external "timer" via exec_msg when detecting a re-INVITE (that is, an INVITE that has a to_tag and is loose routed), but come on, I'm sure we can do better than this!
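For illustration only, the re-INVITE test mentioned above (an INVITE that carries a To tag and is loose routed) could be sketched in Python roughly as follows. This is not openser script: the function name is_reinvite is made up for the example, "loose routed" is approximated by the mere presence of a Route header, and folded header lines are not handled.

```python
def is_reinvite(raw_request: str) -> bool:
    """Return True for an INVITE that has a To tag and at least one Route header."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method = lines[0].split(" ", 1)[0]
    if method != "INVITE":
        return False
    has_to_tag = False
    has_route = False
    for line in lines[1:]:
        name, _, value = line.partition(":")
        name = name.strip().lower()
        if name in ("to", "t"):
            # A tag parameter on To means the dialog already exists.
            has_to_tag = ";tag=" in value.lower().replace(" ", "")
        elif name == "route":
            # Assumption: a Route header stands in for "loose routed".
            has_route = True
    return has_to_tag and has_route
```

An initial INVITE has no To tag yet, so only in-dialog INVITEs pass this test.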
So no matter what the RFCs say or mean to say, no matter how long we argue about this, the facts are :
- gateways often can't cope with this ordering issue
- many iPBXs use re-INVITEs for several call features
- the only way to solve this is to be dialog stateful, at least for ACKs and INVITEs
So my conclusion is :
I'm going to code an ugly little hack, using an external database and avpops, so that ACKs are logged per CSeq+Call-ID, and when re-INVITEs are detected, the relay will be delayed until the last ACK is logged as sent.
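As a rough sketch of the bookkeeping described above, here is an in-memory stand-in for the external database, written in Python. The class and method names (PendingAckLedger, on_invite, on_ack) are hypothetical, and the sketch ignores negative replies, expiry and dialog teardown, all of which a real implementation would need.

```python
class PendingAckLedger:
    """Tracks, per Call-ID, the INVITE CSeq numbers still waiting for their ACK."""

    def __init__(self):
        self._pending = {}  # call_id -> set of CSeq numbers awaiting an ACK

    def on_invite(self, call_id: str, cseq: int) -> bool:
        """Return True if the INVITE may be relayed now, and record it as awaiting its ACK."""
        if self._pending.get(call_id):
            return False  # an earlier INVITE in this dialog has no ACK logged yet
        self._pending.setdefault(call_id, set()).add(cseq)
        return True

    def on_ack(self, call_id: str, cseq: int) -> None:
        """Log the ACK as relayed; an ACK carries the CSeq number of its INVITE."""
        waiting = self._pending.get(call_id)
        if waiting is None:
            return
        waiting.discard(cseq)
        if not waiting:
            del self._pending[call_id]
```

A re-INVITE for which on_invite returns False would then be held back (or dropped) until the matching on_ack call has been made.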
What do you think ? Is there a way to use the dialog module to optimize this (flagging the SIP headers themselves instead of an external DB) ?
Best Regards, Jerome Martin
We solved this problem in an SBC I used to work on by dropping the reInvite if the ACK was still pending. This works in UDP since the reInvite would then be resent. Of course the SBC was dialog stateful.
Dropping the reInvite on the floor in the case of UDP transport certainly is easier than delay/resending.
Hope this helps
T.R.
Users mailing list Users@openser.org http://openser.org/cgi-bin/mailman/listinfo/users
Hi,
Thanks for answering my post,
On Wed, 2006-11-29 at 17:27 -0500, T.R. Missner wrote:
We solved this problem in an SBC I used to work on by dropping the reInvite if the ACK was still pending. This works in UDP since the reInvite would then be resent. Of course the SBC was dialog stateful.
Do you have an idea of the overhead of dialog statefulness? Would DB storage of pending ACKs (meaning 1 DB write per (re)INVITE Call-ID+CSeq, 1 DB write (delete) per ACK, 1 DB read per reINVITE) have a big impact on performance? Am I going to butcher my dear openser 1.1 by implementing this?
Dropping the reInvite on the floor in the case of UDP transport certainly is easier than delay/resending.
Nice one. Do you have an idea of the average timeout when waiting for an INVITE reply, I mean before resending? Because it makes no sense anyway to delay for longer than this timeout instead of dropping the request ... but dropping requires DB overhead.
But still, only delaying the reINVITE for a fixed amount of time will generate escalating resends if the delay is too big, and will not solve 100% of the issue if the delay is kept short enough to avoid resends. So in order to keep the UA from resending, one must carefully send provisional replies while delaying, which starts to get unpleasant to the eye (IMHO).
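On the timeout question, a small Python sketch of the INVITE retransmission schedule over UDP may help. It assumes the RFC 3261 defaults (T1 = 500 ms, Timer A doubling after each retransmission, Timer B = 64*T1 as the transaction timeout); a given UA may be configured differently, and the function names are invented for the example.

```python
from typing import List, Optional


def invite_retransmit_offsets(t1: float = 0.5) -> List[float]:
    """Seconds after the first INVITE at which each retransmission is sent."""
    timer_b = 64 * t1  # the client transaction gives up at this point
    offsets = []
    elapsed, interval = 0.0, t1
    while elapsed + interval < timer_b:
        elapsed += interval
        offsets.append(elapsed)
        interval *= 2  # Timer A doubles after every retransmission
    return offsets


def retransmits_needed(gap: float, t1: float = 0.5) -> Optional[int]:
    """How many retransmissions it takes until one is sent at least `gap`
    seconds after the original; None if the transaction times out first."""
    for count, offset in enumerate(invite_retransmit_offsets(t1), start=1):
        if offset >= gap:
            return count
    return None


print(invite_retransmit_offsets())  # [0.5, 1.5, 3.5, 7.5, 15.5, 31.5]
print(retransmits_needed(1.0))      # 2
```

Under these defaults, a re-INVITE that keeps being dropped for up to one second gets through on its second retransmission, 1.5 s after the original.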
Hope this helps T.R.
Well it does :-) It made me think a bit more. I hadn't really worried about timeouts and resends, but clearly one MUST take them into consideration, either by leveraging the mechanism to do part of the job or by finding a workaround so it doesn't get in the way, depending on the approach. Clearly, if I choose to implement partial dialog statefulness for reINVITE handling, I'll stick to your suggestion, which is much more elegant than delaying.
Thanks again for your answer, Best Regards, Jerome Martin
On 11/30/06, Jerome Martin jmartin@longphone.fr wrote:
Do you have an idea of the overhead of dialog statefulness? Would DB storage of pending ACKs (meaning 1 DB write per (re)INVITE Call-ID+CSeq, 1 DB write (delete) per ACK, 1 DB read per reINVITE) have a big impact on performance? Am I going to butcher my dear openser 1.1 by implementing this?
You might get away cheaper with a little TM modification and using TCP with the gateway:
- you have to enhance TM to allow you to see if a transaction belonging to the same dialog is still pending; if it is, either drop the second transaction (as already suggested) or reply with something like 480 or 491 (neither of them is best suited for the job, but they might work);
- by using one TCP connection you minimize the chance of packet reordering.
Delaying is not really possible without blocking the process doing it: you could end up with a blocked server.
Hth, WL.
Hi Jerome,
I think this is an interesting solution to the problem.
1) Use the dialog module; we can add a function to access the dialog state from the script.
2) When receiving a re-INVITE, if the dialog state is NOT-ACKed, drop the request. The delay between the re-INVITE and the ACK should not be too long (these are 2 requests going in the same direction that merely ended up in reversed order), so 2, at most 3, retransmission cycles should cover it.
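The two steps above can be sketched as a tiny per-dialog state machine. This is a Python illustration only, not the dialog module's code: the state names and the Dialog class are hypothetical, and the drop relies on the UA retransmitting over UDP.

```python
from enum import Enum


class DialogState(Enum):
    EARLY = "early"
    CONFIRMED_NOT_ACKED = "2xx relayed, ACK not seen yet"
    CONFIRMED = "ACK seen"


class Dialog:
    def __init__(self):
        self.state = DialogState.EARLY
        self.dropped = 0  # re-INVITEs dropped while waiting for the ACK

    def on_2xx_to_invite(self) -> None:
        """A 2xx for an INVITE or re-INVITE was relayed; its ACK is now expected."""
        self.state = DialogState.CONFIRMED_NOT_ACKED

    def on_ack(self) -> None:
        if self.state is DialogState.CONFIRMED_NOT_ACKED:
            self.state = DialogState.CONFIRMED

    def on_reinvite(self) -> str:
        """Decide what to do with an in-dialog INVITE."""
        if self.state is not DialogState.CONFIRMED:
            self.dropped += 1
            return "drop"   # the UA is expected to retransmit over UDP
        return "relay"
```

If the ACK and the re-INVITE are processed in parallel and the re-INVITE loses the race, the only cost in this sketch is one extra retransmission cycle.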
At least you can run some tests to see what the average delay between re-INVITE and ACK is, to get an idea of what to expect.
regards, bogdan
On Thu, 2006-11-30 at 11:52 +0200, Bogdan-Andrei Iancu wrote:
Hi Jerome,
I think this is an interesting solution to the problem.
- use the dialog module; we can add a function to access the dialog state from the script.
Yes, that's what I wanted to do, but I fear there could be a race condition for reading/setting the cookie between openser processes handling the ACK and the following reINVITE. What do you think ?
I wanted to add a "PendingACK" parameter to the dialog cookie upon a successful INVITE transaction, and check that parameter for every reINVITE received (and drop the reINVITE as suggested by T.R. Missner).
- when receiving a re-INVITE, if dialog state is NOT-ACKed, drop the
request - the delay between re-INVITE and ACK should not be too long (as there were 2 requests going in the same direction, generated in reversed order), so 2, maximum 3 retransmission cycles should cover it up.
Agreed.
At least you can run some test to see what is the average delay between re-INVITE and ACK to get an idea what you should expect for.
Just as a test, by applying a systematic (blind) delay of 1 second when receiving a reINVITE, the problem fully disappears in my tests.
On 30 Nov 2006, at 10:57, Jerome Martin wrote:
Just as a test, by applying a systematic (blind) delay of 1 second when receiving a reINVITE, the problem fully disappears in my tests.
You can also send a "pending" reply if you're already processing one INVITE and get another in the meantime.
/O
You can also send a "pending" reply if you're already processing one INVITE and get another in the meantime.
Yes, the pending idea is a way to improve basic delaying. But the thing is that ACK and INVITE are separate transactions, so testing if the first INVITE transaction has ended is not enough, we also need to look for the subsequent ACK transaction.
Jerome Martin wrote:
Yes, the pending idea is a way to improve basic delaying. But the thing is that ACK and INVITE are separate transactions, so testing if the first INVITE transaction has ended is not enough, we also need to look for the subsequent ACK transaction.
I wonder how this can be handled with TCP. When using UDP, the reINVITE can be dropped, and will be resent by the client. But what about TCP? Openser must not drop the INVITE as it won't be retransmitted. Delaying the reINVITE also means that this thread is blocked which may cause DoS attacks (it can be exploited by sending INVITE and reINVITEs without ACK).
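One way to combine the suggestions made so far in this thread (drop on UDP, answer with a "pending" reply where no retransmission can be expected) is sketched below in Python. The function names are made up, and using 491 for the TCP case is an assumption of this sketch rather than something agreed on the list. The retry windows are the ones RFC 3261 gives for a UAC that receives a 491.

```python
from typing import Tuple


def action_for_reinvite(ack_pending: bool, transport: str) -> str:
    """Pick an action for a re-INVITE arriving on the given transport."""
    if not ack_pending:
        return "relay"
    if transport.lower() == "udp":
        return "drop"        # the UA retransmits on its own
    return "reply 491"       # reliable transport: no retransmission, so answer instead


def retry_window_491(uac_owns_call_id: bool) -> Tuple[float, float]:
    """Range in seconds from which a UAC picks its retry timer after a 491."""
    return (2.1, 4.0) if uac_owns_call_id else (0.0, 2.0)
```

Neither branch blocks a worker process, which avoids the DoS concern raised about delaying inside the proxy.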
regards klaus
Jerome Martin wrote:
Yes, that's what I wanted to do, but I fear there could be a race condition for reading/setting the cookie between openser processes handling the ACK and the following reINVITE. What do you think ?
With or without a control you will still have this race; it is actually quite impossible to get rid of it. What is important is to be able to handle it and to recover properly if it occurs. So, if you are processing the re-INVITE and the ACK in parallel, in the worst case you will not see the ACK and you will trigger a re-INVITE retransmission.
I wanted to add a "PendingACK" parameter to the dialog cookie upon a successful INVITE transaction, and check that parameter for every reINVITE received (and drop the reINVITE as suggested by T.R. Missner).
You do not need a parameter or cookie. The idea is to use the current "dialog" module and just extend it a bit so that the dialog status can be seen from the script.
Just as a test, by applying a systematic (blind) delay of 1 second when receiving a reINVITE, the problem fully disappears in my tests.
Which means 2 or at most 3 retransmissions.
regards, bogdan
With or without a control you will still have this race; it is actually quite impossible to get rid of it. What is important is to be able to handle it and to recover properly if it occurs.
Yes. We just need to store "pendingACK" in RR, so if we wrongly drop a reINVITE (because the race condition occurs), this will just increase the delay a bit. It is possible because we are storing "pendingACK", and not the opposite, which would be "NoPendingACK". We're sort of lucky because the first one is the easiest in our case and makes handling the race simple. If we were to store the opposite, then the race condition would be a real problem. Am I right?
So, if you are processing the re-INVITE and the ACK in parallel, in the worst case you will not see the ACK and you will trigger a re-INVITE retransmission.
Yes.
I wanted to add a "PendingACK" parameter to the dialog cookie upon a successful INVITE transaction, and check that parameter for every reINVITE received (and drop the reINVITE as suggested by T.R. Missner).
You do not need a parameter or cookie. The idea is to use the current "dialog" module and just extend it a bit so that the dialog status can be seen from the script.
Well, I was referring to the "dialog cookie" added to RR parameters BY the dialog module, as referenced in http://www.openser.org/docs/modules/1.1.x/dialog.html#AEN93
I really want to implement it using RR storage (via the dialog module) instead of an external database. I haven't looked at either TM or DIALOG source code yet, but if it is trivial for you, I'd gladly test in preproduction any patch you send to me to add/read an arbitrary parameter. But what would be the added sugar in those functions compared to RR module add_rr_param(param) and check_route_param(re) ? Can't I use those already ?
Just as a test, by applying a systematic (blind) delay of 1 second when receiving a reINVITE, the problem fully disappears in my tests.
Which means 2 or at most 3 retransmissions.
Right.
Hello everyone,
A little update/summary on the issue of reINVITEs being sent before the ACK of previous INVITE :
- I fixed it temporarily by delaying reINVITEs by 1 second. This is ugly, but it works for now.
- I'd really like to fix it for good using dialog module (instead of maintaining state in an external DB). The idea is to tag dialogs with pending ACKs, and drop reINVITEs while the tag is still set.
- Bogdan offered to export store/read functions from dialog. I'd really like to see this happen. Any update on this, Bogdan ? Should I add a feature request to tracker ? Should I move this conversation to devel ML ?
- I was really wondering what those functions would do that RR module's add_rr_param and check_route_param don't. Could anyone help me to understand this ?
Best Regards, Jérôme Martin
Hi Jerome,
Jerome Martin wrote:
- Bogdan offered to export store/read functions from dialog. I'd
really like to see this happen. Any update on this, Bogdan ? Should I add a feature request to tracker ? Should I move this conversation to devel ML ?
Yes please, open a feature request so that we keep it in mind for the next release. Just to be sure: the enhancement I was referring to is not about adding additional information about the dialog state as an RR param, but just a function to access the internal dialog state.
- I was really wondering what those functions would do that RR
module's add_rr_param and check_route_param don't. Could anyone help me to understand this ?
See above; the dialog module puts a dialog id in as an RR param just for fast dialog matching. The information about the dialog parameters, its state and so on is kept in memory by the dialog module.
Regards, Bogdan
Yes please, open a feature request so that we keep it in mind for the next release. Just to be sure: the enhancement I was referring to is not about adding additional information about the dialog state as an RR param, but just a function to access the internal dialog state.
Does it mean transaction information is kept in-memory as part of dialog state, and I could check if last ACK was processed ? I can't really see how this state information would be structured/what it contains.
See above; the dialog module puts a dialog id in as an RR param just for fast dialog matching. The information about the dialog parameters, its state and so on is kept in memory by the dialog module.
Thanks, I misunderstood how it works. Now it makes sense :-) Do you plan to add AVP access to store and read arbitrary per-dialog information in memory? Like declaring AVPs to be dialog-specific via a dialog module parameter, then just using the AVPs the usual way?
Regards, Jerome
Hi Jerome,
Jerome Martin wrote:
Does it mean transaction information is kept in-memory as part of dialog state, and I could check if last ACK was processed ? I can't really see how this state information would be structured/what it contains.
See the state field of the dlg_cell structure in modules/dialog/dlg_hash.h. There is no state yet saying whether the ACK was received or not, but I guess one can be added without problems.
Thanks, I misunderstood how it works. Now it makes sense :-) Do you plan to add AVP access to store and read arbitrary per-dialog information in memory? Like declaring AVPs to be dialog-specific via a dialog module parameter, then just using the AVPs the usual way?
The dialog module will export a new PV containing the dialog state.
Regards, bogdan
Hi Bogdan,
See the state field of the dlg_cell structure in modules/dialog/dlg_hash.h. There is no state yet saying whether the ACK was received or not, but I guess one can be added without problems.
Going to take a look right now :-)
The dialog module will export a new PV containing the dialog state.
OK, so no arbitrary pairs stored by the config script. Don't you think that would be useful? I could for instance implement my "pending ACK" using this, without adding any code to the module to implement new state information. Plus, the memory consumption of state info would be reduced to only what the user actually needs (think of the acc module's move from 1.1 to 1.2, which made most fields optional via extra accounting).
Regards, Jerome
Jerome,
please open a feature request on the tracker, otherwise I might forget about this :(.
regards, bogdan
Request ID 1610630
Thanks Jerome!
regards, bogdan
Bogdan-Andrei Iancu wrote:
See above; the dialog module puts a dialog id in as an RR param just for fast dialog matching. The information about the dialog parameters, its state and so on is kept in memory by the dialog module.
Further, this is the only valid method, as the Record-Route parameters can't be changed during the dialog.
regards klaus
Hi Klaus,
See above; the dialog module puts a dialog id in as an RR param just for fast dialog matching. The information about the dialog parameters, its state and so on is kept in memory by the dialog module.
Further, this is the only valid method, as the Record-Route parameters can't be changed during the dialog.
I didn't realize that. Thanks for the explanation, I'll look it up in the RFCs. Intuitively, I thought this was true for the actual route hops, but not for added parameters.
Regards, Jerome