On 05/02/2021 04.46, David Escartin wrote:
Sorry, I am editing my previous message.

....
Since version 4.4 we have been setting this parameter to 1 so that ongoing calls keep processing their messages through a particular rtpengine after we disable it manually in the database.
However, we see that enabling it also makes every message select that rtpengine and attempt to send its own op command **when the rtpengine is disabled because its socket or host is unreachable**. This has an impact when an rtpengine crashes (luckily not often).

We have been doing some tests, and with this change
....

On Fri, 5 Feb 2021 at 10:43, David Escartin (<descartin@sonoc.io>) wrote:
Dear all

Since version 4.4 we have been setting this parameter to 1 so that ongoing calls keep processing their messages through a particular rtpengine after we disable it manually in the database.
However, we see that enabling it also makes every message select that rtpengine and attempt to send its own op command. This has an impact when an rtpengine crashes (luckily not often).

We have been doing some tests, and with this change

diff --git a/src/modules/rtpengine/rtpengine.c b/src/modules/rtpengine/rtpengine.c
index 20df725d2e..2f6130c62a 100644
--- a/src/modules/rtpengine/rtpengine.c
+++ b/src/modules/rtpengine/rtpengine.c
@@ -3138,11 +3138,12 @@ select_rtpp_node(str callid, str viabranch, int do_test, struct rtpp_node **quer
                if (node->rn_recheck_ticks == RTPENGINE_MAX_RECHECK_TICKS) {
                        LM_DBG("node=%.*s for calllen=%d callid=%.*s is disabled(permanent) (probably still UP)! Return it\n",
                                node->rn_url.len, node->rn_url.s, callid.len, callid.len, callid.s);
+                        return node;
                } else {
                        LM_DBG("node=%.*s for calllen=%d callid=%.*s is disabled, either broke or timeout disabled! Return it\n",
                                node->rn_url.len, node->rn_url.s, callid.len, callid.len, callid.s);
                }
-               return node;
+               /*return node;*/
        }
 
        return NULL;

allow_op would apply only to manually disabled rtpengine nodes. A node disabled for any other reason would not be selected and no rtpp command would be sent to it, so there would be almost no impact on kamailio (timeouts, congestion), except that some messages in a call would skip rtpengine while others in the same call used it.

Do you think this change could make sense?
Is there any reason for being like it is now?

I don't have any direct experience with that option, but the change makes sense to me. Perhaps adding a third value for the option (0/1/2) is the best way forward here, to retain compatibility for people who rely on the old behaviour. You can open a pull request on GitHub.
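
As a rough illustration of what a three-valued option could look like, here is a minimal standalone sketch. It is not taken from rtpengine.c: the enum, the struct, the function name and the MAX_RECHECK_TICKS value are all hypothetical stand-ins, and it only models the "node is disabled" branch touched by the diff above, under the assumption that a manually disabled node is recognised by its recheck ticks being at the maximum.

```c
#include <stddef.h>

/* Hypothetical stand-in for RTPENGINE_MAX_RECHECK_TICKS; the real value
 * lives in the module headers and may differ. */
#define MAX_RECHECK_TICKS ((unsigned int)-1)

/* Hypothetical modes for a three-valued allow_op. */
enum allow_op_mode {
    ALLOW_OP_OFF = 0,          /* never reuse a disabled node */
    ALLOW_OP_ANY_DISABLED = 1, /* old behaviour: reuse any disabled node */
    ALLOW_OP_MANUAL_ONLY = 2   /* proposed: reuse only manually disabled nodes */
};

/* Simplified node: just the two fields this decision needs. */
struct node {
    int disabled;               /* non-zero when the node is disabled */
    unsigned int recheck_ticks; /* MAX_RECHECK_TICKS means disabled manually */
};

/* Return the node if a message of an ongoing call may still be sent to it,
 * or NULL if the node should be skipped. */
static struct node *pick_node_for_ongoing_call(struct node *n,
                                               enum allow_op_mode mode)
{
    if (n == NULL)
        return NULL;
    if (!n->disabled)
        return n; /* healthy node: always usable */

    switch (mode) {
    case ALLOW_OP_ANY_DISABLED:
        return n; /* value 1: any disabled node is still returned */
    case ALLOW_OP_MANUAL_ONLY:
        /* value 2: only a manually (permanently) disabled node is returned;
         * a node disabled after a timeout or crash is skipped */
        return n->recheck_ticks == MAX_RECHECK_TICKS ? n : NULL;
    default:
        return NULL; /* value 0: disabled nodes are never returned */
    }
}
```

With value 2, a node that crashed (disabled, but with recheck ticks below the maximum) yields NULL, so no command is attempted against it, while value 1 keeps returning it as today.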

Cheers