I'm observing the following scenario:

mod_dialog callbacks trigger 2 or more times (nearly) simultaneously for the same dialog
pua_dialoginfo sends PUBLISH 1, referencing etag A
pua_dialoginfo sends PUBLISH 2, referencing etag A
presence_dialoginfo processes PUBLISH 1, replies with new etag B
presence_dialoginfo processes PUBLISH 2, replies with a 412 (because etag A no longer exists)
pua_dialoginfo receives the 412 and re-tries it as PUBLISH 3 ("sent a PUBLISH within a dialog that no longer exists, send again an intial PUBLISH")
presence_dialoginfo processes PUBLISH 3, and may or may not accept it

The situation as described is not ideal since it'll fill up your logs with errors, but isn't critical per se. Much more problematic is when there are more than 2 PUBLISHes generated for the same dialog simultaneously, as this can cause a (near) infinite race between the various PUBLISH requests all fighting to update the same etag. For example, 10 PUBLISH are sent out for etag A; all but one are rejected with a 412; then the other 9 keep on bouncing back and forth between pua_dialoginfo and presence_dialoginfo because they do not share the same view on the dialog's latest etag.

Even worse is when presence_dialoginfo is rejecting all incoming PUBLISHes with a 412, for example because of a database/memory/replication problem or a malformed request. A t_reply("412", "Not today") in the presence_dialoginfo server, combined with a single PUBLISH from pua_dialoginfo is enough to reproducibly brick the pua_dialoginfo server because it runs into critical memory fragmentation levels.

I think there are multiple ways to fix or alleviate this problem.

pua generic

pua (publ_cback_func) should not retry 412-failed PUBLISHes indefinitely, but e.g. at most once
pua should not generate simultaneous PUBLISHes for the same presentity. It should delay PUBLISH 2 until PUBLISH 1 is either (permanently) accepted or rejected; or it should discard PUBLISH 2 immediately when it is generated.
Perhaps make handling of 412 replies more fine-grained. Currently every 412 reply is handled like this ("sent a PUBLISH within a dialog that no longer exists"), while that statement doesn't apply to all possible 412 replies.

pua_dialoginfo specific

pua_dialoginfo currently subscribes to a lot of mod_dialog callbacks. For example subscribing to both DLGCB_CONFIRMED and DLGCB_CONFIRMED_NA will always get you two (rapidly succeeding) PUBLISHes with exactly the same contents. Subscribing to DLGCB_REQ_WITHIN means you'll get a new PUBLISH for every re-INVITE such as hold/unhold or codec negotiation, which is useless in many usecases. It would be helpful to allow configuring pua_dialoginfo with a list of callbacks to subscribe to.
With or without a smaller set of mod_dialog callbacks, pua_dialoginfo can generate multiple PUBLISH requests with exactly the same contents. Since pua is aware of the (last) state that it published for the presentity, it could compare if the newly generated PUBLISH is any different from the last known state, and discard it if it's not.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.