UDP workers block when one or more rtpengine instances go offline


Sebastian Damm

6 Aug 2018

10:58 a.m.

Hi,

we run multiple rtpengine servers to share the load. Whenever we need to take an rtpengine server offline, we used to just block the control port via iptables, then no new calls ended up on this instance of rtpengine. This worked pretty well in Kamailio 4.4.5.

However, since Kamailio 5.0, and the problem persists with 5.1.4, Kamailio hangs almost immediately after we block the control port traffic. In the log file there are almost no packets processed except every few seconds, which looks like a timeout thing.

Did we configure anything wrong there? Or is the "dead rtpengine detection" just broken?

Our configuration:

loadmodule "rtpengine.so"
modparam("rtpengine", "rtpengine_disable_tout", 120)
modparam("rtpengine", "setid_avp", "$avp(rtpsetid)")
modparam("rtpengine", "rtpengine_sock", "0 == udp:1.2.3.4:9001=2 udp:1.2.3.5:9001=2 udp:1.2.3.6:9001=2")
modparam("rtpengine", "rtpengine_sock", "1 == udp:2.3.4.5:9001=2")

Any help appreciated.

Regards, Sebastian

Daniel Tryba

6 Aug 2018

12:15 p.m.

On Mon, Aug 06, 2018 at 12:58:00PM +0200, Sebastian Damm wrote:

...

we run multiple rtpengine servers to share the load. Whenever we need to take an rtpengine server offline, we used to just block the control port via iptables, then no new calls ended up on this instance of rtpengine. This worked pretty well in Kamailio 4.4.5.

No answer to your question (which sounds like a legit problem), but why not do it with: kamcmd rtpengine.enable udp:x.y.z.a:port 0/1 on the Kamailio machines?

By simply blocking you might interrupt updates in SDP for running calls and prevent hangups from being sent.
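For scripted maintenance, the command above can be wrapped so that a node is taken out of rotation and brought back in a consistent way. The following is only a sketch: it assumes `kamcmd` is on the PATH of the Kamailio host, and the helper names `rtpengine_enable_argv` and `set_rtpengine_state` are hypothetical. Only the `rtpengine.enable` RPC with a node URL and a `0`/`1` flag is taken from this thread.

```python
import subprocess
from typing import List


def rtpengine_enable_argv(node_url: str, enable: bool) -> List[str]:
    """Build the kamcmd argument vector for the RPC quoted in this thread."""
    return ["kamcmd", "rtpengine.enable", node_url, "1" if enable else "0"]


def set_rtpengine_state(node_url: str, enable: bool, dry_run: bool = True) -> List[str]:
    """Return the command; run it only when dry_run is False."""
    argv = rtpengine_enable_argv(node_url, enable)
    if not dry_run:
        # Raises CalledProcessError if kamcmd exits with a non-zero status.
        subprocess.run(argv, check=True)
    return argv


# Dry run: show what would be executed to disable one node.
print(" ".join(set_rtpengine_state("udp:1.2.3.4:9001", enable=False)))
```

The dry-run default is a deliberate choice here, so the command can be reviewed before it touches a live proxy.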

Sebastian Damm

12:37 p.m.

Hi Daniel,

On Mon, Aug 6, 2018 at 2:17 PM Daniel Tryba d.tryba@pocos.nl wrote:

...

No answer to your question (which sounds like a legit problem), but why not do it with: kamcmd rtpengine.enable udp:x.y.z.a:port 0/1 on the Kamailio machines?

Of course, that's probably the better way. But as far as I know, that command wasn't available before 5.0. So I guess our blocking of the control port was the way we did it before that.

But anyhow, if an rtpengine crashes, Kamailio shouldn't block.

Sebastian

Daniel Tryba

4:38 p.m.

On Mon, Aug 06, 2018 at 02:37:15PM +0200, Sebastian Damm wrote:

...

...
kamcmd rtpengine.enable udp:x.y.z.a:port 0/1 on the kamailio machines?

Of course, that's probably the better way. But as far as I know, that command wasn't available before 5.0. So I guess our blocking of the control port was the way we did it before that.

In the past (4.x) the command was: kamctl fifo nh_enable_rtpp udp:x.y.z.a:port 0/1

Daniel Tryba

4:55 p.m.

On Mon, Aug 06, 2018 at 02:37:15PM +0200, Sebastian Damm wrote:

...

Of course, that's probably the better way. But as far as I know, that command wasn't available before 5.0. So I guess our blocking of the control port was the way we did it before that.

But anyhow, if an rtpengine crashes, Kamailio shouldn't block.

BTW forgot to ask: are you REJECTing or DROPing packets? A reject should trigger a failover in the rtpengine_* calls immediately. A drop will result in a timeout mechanism triggering, which according to your description blocks the thread.
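The difference between the two firewall actions can be observed from user space. The following is a minimal sketch, not Kamailio's code: it assumes a Linux host and a connected UDP socket, and `probe_udp` and its payload are hypothetical and chosen for illustration only. A REJECT rule (or a closed port) answers with an ICMP port-unreachable message, which a connected UDP socket reports as an immediate error. A DROP rule answers with nothing, so the caller waits for the full timeout.

```python
import socket


def probe_udp(host: str, port: int, timeout_s: float = 1.0) -> str:
    """Send one datagram and classify the outcome.

    Returns "reply" if any datagram comes back, "refused" if the kernel
    reports an ICMP port-unreachable, or "timeout" if nothing arrives.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        # connect() on a UDP socket lets the kernel deliver ICMP errors to it.
        sock.connect((host, port))
        try:
            # Arbitrary payload; this is not a valid rtpengine control message.
            sock.send(b"probe")
            sock.recv(2048)
            return "reply"
        except ConnectionRefusedError:
            return "refused"   # fast failure, as with REJECT or a closed port
        except socket.timeout:
            return "timeout"   # slow failure, as with DROP
```

Running `probe_udp("1.2.3.4", 9001)` against a node behind each kind of rule shows which failure mode a client of the control port would see.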

Sebastian Damm

7 Aug 2018

12:49 p.m.

On Mon, Aug 6, 2018 at 6:56 PM Daniel Tryba d.tryba@pocos.nl wrote:

...

BTW forgot to ask: are you REJECTing or DROPing packets? A reject should trigger a failover in the rtpengine_* calls immediately.

Of course, we block the traffic with a REJECT rule.

...

A drop will result in a timeout mechanism triggering, which according to your description blocks the thread.

But since we have configured rtpengine_disable_tout to 120 seconds, I would expect a short period of blocking, after which Kamailio should run normally for at least 120 seconds. From what we see, this does not happen.

Oh, and we tested disabling and enabling via kamctl before, but as far as I remember, while disabling still worked, Kamailio crashed reproducibly when enabling an rtpengine again.

Regards, Sebastian

Daniel Tryba

3:52 p.m.

On Tue, Aug 07, 2018 at 02:49:58PM +0200, Sebastian Damm wrote:

...

Oh, and we tested disabling and enabling via kamctl before, but as far as I remember, while disabling still worked, Kamailio crashed reproducibly when enabling an rtpengine again.

We had the same problem; it was fixed in a late 4.4.x release (4.4.6 or 4.4.7).

Richard Fuchs

1:03 p.m.

On 2018-08-06 06:58, Sebastian Damm wrote:

...

Hi,

we run multiple rtpengine servers to share the load. Whenever we need to take an rtpengine server offline, we used to just block the control port via iptables, then no new calls ended up on this instance of rtpengine. This worked pretty well in Kamailio 4.4.5.

However, since Kamailio 5.0, and the problem persists with 5.1.4, Kamailio hangs almost immediately after we block the control port traffic. In the log file there are almost no packets processed except every few seconds, which looks like a timeout thing.

Did we configure anything wrong there? Or is the "dead rtpengine detection" just broken?

Our configuration:

loadmodule "rtpengine.so"
modparam("rtpengine", "rtpengine_disable_tout", 120)
modparam("rtpengine", "setid_avp", "$avp(rtpsetid)")
modparam("rtpengine", "rtpengine_sock", "0 == udp:1.2.3.4:9001=2 udp:1.2.3.5:9001=2 udp:1.2.3.6:9001=2")
modparam("rtpengine", "rtpengine_sock", "1 == udp:2.3.4.5:9001=2")

When you query the running config via kamcmd for the value of rtpengine_tout_ms, what does it say? (Wondering if the default value of 1000 properly gets established or if some other value is in effect - it shouldn't block longer than this value)

Sebastian Damm

8 Aug 2018

11:38 a.m.

On Tue, Aug 7, 2018 at 3:04 PM Richard Fuchs rfuchs@sipwise.com wrote:

...

On 2018-08-06 06:58, Sebastian Damm wrote: When you query the running config via kamcmd for the value of rtpengine_tout_ms, what does it say? (Wondering if the default value of 1000 properly gets established or if some other value is in effect - it shouldn't block longer than this value)

kamcmd> cfg.get rtpengine rtpengine_tout_ms
1000

I actually don't know how long it blocks for one request. But I know that whenever one RTPengine is gone, we get "SIP offline" notifications from our monitoring system (sending SIP OPTIONS) within minutes. I think, waiting for an RTPengine answer for a second is okay if it happens once every 120 seconds, but it's not okay if it happens every time.
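The effect of paying the timeout on every request rather than once per disable period can be estimated with simple arithmetic. The sketch below is a rough model under stated assumptions, not a measurement: the worker count and the 5 ms normal handling time are hypothetical, the 1000 ms timeout is the `rtpengine_tout_ms` value reported above, and a dead fraction of 1/3 assumes that one of the three equally weighted nodes in set 0 is offline and still gets selected.

```python
def worker_capacity(workers: int, service_s: float,
                    timeout_s: float, dead_fraction: float) -> float:
    """Approximate sustainable requests per second for blocking workers.

    Each request costs service_s, plus timeout_s for the share of
    requests (dead_fraction) that is sent to an offline node.
    """
    mean_busy_s = service_s + dead_fraction * timeout_s
    return workers / mean_busy_s


# Hypothetical proxy: 8 UDP workers, 5 ms per request when all nodes answer.
healthy = worker_capacity(workers=8, service_s=0.005, timeout_s=1.0, dead_fraction=0.0)
# Same proxy when every third request waits the full second for a dead node.
degraded = worker_capacity(workers=8, service_s=0.005, timeout_s=1.0, dead_fraction=1 / 3)

print(f"healthy:  {healthy:.0f} req/s")
print(f"degraded: {degraded:.1f} req/s")
```

Under these assumptions capacity falls from roughly 1600 to about 24 requests per second, which would be enough for unrelated requests such as monitoring OPTIONS to queue behind blocked workers.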

Sebastian

Muhammad Zaka

9 Aug 2018

6:32 a.m.

Hi Sebastian,

You may need the following fix for your rtpengine module in Kamailio.

https://github.com/kamailio/kamailio/pull/1593

We had a similar issue with the rtpengine module in Kamailio, as it is using package memory and not shared memory.

Many Thanks

Regards Muhammad Zaka

_______________________________________________ Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users

Sebastian Damm

8 a.m.

Hi Muhammad,

On Thu, Aug 9, 2018 at 8:34 AM Muhammad Zaka muhammad.zaka@cloudcall.com wrote:

...

You may need the following fix for your rtpengine module in Kamailio: https://github.com/kamailio/kamailio/pull/1593 We had a similar issue with the rtpengine module in Kamailio, as it is using package memory and not shared memory.

...

From the commit message I would expect this patch to apply only to deleted rtpengine nodes, not to temporarily offline nodes. Or does the same code run when temporarily disabling a node as well?

Regards, Sebastian

Muhammad Zaka

10 Aug 2018

9:20 a.m.

Hi Sebastian

It looks like your issue is related to the use of package memory rather than shared memory. Each forked Kamailio process picks the same rtpengine and then blocks while sending commands over the socket connection to the offline node.
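The distinction can be illustrated with a toy model. This is not the module's code and does not establish that this is the cause of the problem reported in this thread. It only shows why per-process state would matter: if each worker keeps a private "disabled until" timestamp, every worker has to hit the timeout itself, whereas with state shared between workers a single timeout is enough. The class and function names are hypothetical; the 120 second default mirrors the `rtpengine_disable_tout` value from the configuration above.

```python
class Node:
    """Minimal stand-in for one rtpengine entry in a worker's node list."""

    def __init__(self) -> None:
        self.disabled_until = 0.0  # time until which the node is skipped


def timeouts_paid(workers: int, shared_state: bool,
                  now: float = 0.0, disable_s: float = 120.0) -> int:
    """Count the workers that block once on an offline node at time `now`."""
    shared = Node()
    paid = 0
    for _ in range(workers):
        # Shared memory: all workers see one Node. Package (per-process)
        # memory: each worker has its own copy of the state.
        node = shared if shared_state else Node()
        if now >= node.disabled_until:
            paid += 1  # this worker waits for the timeout
            node.disabled_until = now + disable_s
    return paid


print("shared state: ", timeouts_paid(workers=8, shared_state=True))
print("private state:", timeouts_paid(workers=8, shared_state=False))
```

In this model only one worker blocks when the state is shared, while with private state every worker blocks once per disable period.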

Many Thanks

Regards Muhammad Zaka

sr-users@lists.kamailio.org

11 comments

4 participants

participants (4)

Daniel Tryba
Muhammad Zaka
Richard Fuchs
Sebastian Damm