5. UA B sends the end-to-end ACK for reinvite #1 and almost
simultaneously sends reinvite #2. The time delta between the ACK for
reinvite #1 and reinvite #2 on the wire is 3 ms.
So the result, for all kinds of stochastic processing and userspace
scheduling reasons, is that reinvite #2 is forwarded first, before the
ACK for reinvite #1. That leads to a 500 / 491 scenario at UA A.
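The cause, as we all know, is that Kamailio's worker processes are
loosely coupled, and incoming UDP datagrams are distributed directly by
the kernel to whichever worker happens to be reading the socket, so
nothing preserves arrival order across workers.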
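
To make the race concrete, here is a rough Python sketch of a toy model (not Kamailio code). It assumes the ACK and the reinvite are picked up by two different workers and that each is forwarded after an independent, exponentially distributed processing delay. The function name, the 2 ms mean delay and the choice of distribution are all assumptions for illustration only; the 3 ms gap is the one observed above.

```python
import random


def reorder_probability(gap_ms=3.0, mean_delay_ms=2.0, trials=100_000, seed=1):
    """Estimate how often the later datagram is forwarded before the earlier one.

    Toy model (an assumption, not a measurement): each datagram is handled by
    a different worker and leaves after an independent exponential delay.
    """
    rng = random.Random(seed)
    reordered = 0
    for _ in range(trials):
        # The ACK arrives at t = 0 and is forwarded after a random delay.
        ack_out = rng.expovariate(1.0 / mean_delay_ms)
        # The reinvite arrives gap_ms later and gets its own random delay.
        reinvite_out = gap_ms + rng.expovariate(1.0 / mean_delay_ms)
        if reinvite_out < ack_out:
            reordered += 1
    return reordered / trials


if __name__ == "__main__":
    for gap in (0.0, 3.0, 30.0):
        print(f"gap={gap:5.1f} ms  reordered fraction={reorder_probability(gap_ms=gap):.4f}")
```

Under these assumptions the reordered fraction shrinks as the gap grows relative to the processing jitter, which is consistent with "space the messaging out" being the usual advice, but it never reaches zero for a gap of a few milliseconds.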
Is there any general guidance on what to do with these scenarios? I
looked at RFC 5407 § 3.1.4, which appears to describe a similar, but not
identical scenario involving an initial INVITE and subsequent reinvite.
As far as I can tell, the recommendation in that standard is "space the
messaging out more in time".
Switching to TCP would presumably help, since any given flow would
involve a single connection handled by a single worker process, and the transport
would guarantee ordering. However, that's not really feasible in this
implementation for a host of reasons.
I know Kamailio has some config locking primitives, but I am extremely
wary of complex synchronisation on dialog-identifying attributes,
particularly if there is a possibility that such locks could stall a
worker perpetually.
Any other thoughts welcome!