[SR-Users] kamailio does not responde if an rtpengine is unreachable
miconda at gmail.com
Fri Dec 28 10:15:34 CET 2018
I just pushed a series of commits that rework how loading (and
reloading) of the rtpengines list is done, to avoid the synchronized
probing, which can take long if any of the rtpengines is down.
Now, building the local (per process) structures/sockets for rtpengines
during kamailio start up is done without locking. This is guarded by the
fact that a reload command can be executed only after all children have
been initialized (also added with these commits). Moreover, the probing
of rtpengines is done only by child process 1, because the status is
stored in a shared memory list, so it is visible in all children. Based
on my understanding, doing the probing from all processes is useless
now; it was probably kept from the time when the list was not stored in
shared memory, from the early rtpproxy times.
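To illustrate the single-prober idea described above, here is a minimal sketch in Python (the module itself is C, and this is not its code). The function name `probe_nodes`, the `rank` argument, the dict standing in for the shared memory list, and the `probe` callback are all assumptions made for the example.

```python
def probe_nodes(rank, nodes, shared_status, probe):
    """Probe every node, but only from the child with rank 1.

    shared_status is a plain dict standing in for the shared memory
    list (assumption); probe is a hypothetical callable that returns
    True when a node answers.
    """
    if rank != 1:
        # Other children skip probing and just read shared_status later.
        return False
    for url in nodes:
        shared_status[url] = probe(url)
    return True
```

Because the result lives in one shared place, a slow or unreachable node delays only the one probing child instead of every child.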
There is also a restriction on how often the rtpengine list can be
reloaded: there is now a 10-second interval guard. I added this because
the reload is done over the old list, not by building a new list to swap
with the old one. So it requires some time to walk through the existing
list and update it based on the new records. I went this way for now,
even though building a new list may be better/safer in the long term,
but it would require more work. I also wanted to avoid being very
intrusive right now, given that those patches would need to be
backported.
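The interval guard described above could look roughly like the following Python sketch. It is only an illustration of the idea, not the module's C implementation; the class name `ReloadGuard`, the injectable `clock`, and the choice that a refused reload does not restart the interval are assumptions.

```python
import time

RELOAD_MIN_INTERVAL = 10.0  # seconds, the guard interval mentioned above


class ReloadGuard:
    """Hypothetical helper: refuse a reload that comes too soon after
    the last accepted one."""

    def __init__(self, min_interval=RELOAD_MIN_INTERVAL, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock          # injectable so the guard can be tested
        self.last_reload = None     # time of the last accepted reload

    def try_reload(self):
        now = self.clock()
        if (self.last_reload is not None
                and now - self.last_reload < self.min_interval):
            return False            # too soon, the in-place update may still run
        self.last_reload = now
        return True
```

A caller would check `try_reload()` first and walk the existing list only when it returns True.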
The last relevant change was to use a version number to discover when a
reload was done. So far, as I understood, it was relying on the number
of rtpengines, but one may trigger a reload with same rtpengines, but
different attributes (e.g., disabled or not). Having a version number is
better for detecting when each worker needs to rebuild its local list of
sockets, as well as for troubleshooting, because the value is increased
with each reload, so it is easier to track whether it was done or not.
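The difference between counting nodes and tracking a version can be shown with a small Python sketch. It is an illustration under assumptions, not the module's code: `SharedList`, `Worker`, the `disabled` attribute layout, and the tuples standing in for real sockets are all hypothetical.

```python
class SharedList:
    """Stands in for the shared memory rtpengine list (assumption)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.version = 1

    def reload(self, nodes):
        self.nodes = list(nodes)
        self.version += 1           # bumped on every reload, even if the count is equal


class Worker:
    """Stands in for one kamailio child with its local socket list."""

    def __init__(self, shared):
        self.shared = shared
        self.local_version = 0      # 0 means "never built"
        self.local_sockets = []
        self.rebuilds = 0

    def sync(self):
        if self.local_version == self.shared.version:
            return False            # nothing changed since the last rebuild
        # Placeholder for opening real sockets: keep (url, disabled) pairs.
        self.local_sockets = [(n["url"], n["disabled"]) for n in self.shared.nodes]
        self.local_version = self.shared.version
        self.rebuilds += 1
        return True
```

A reload that keeps the same number of nodes but flips `disabled` on one of them still bumps the version, so each worker notices it on its next `sync()`.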
I didn't have time for any tests, so it would be good if you can test
and report whether it works as expected.
All related commits are in master; if they prove to work fine, we can
backport all those patches.
On 26.12.18 12:46, Juha Heinanen wrote:
> Daniel-Constantin Mierla writes:
>> I pushed a quick fix for the case when db support is not enabled,
>> because these locks are useless in that case, so all children will do
>> the rtpengine init at the same time, without waiting for the others:
> Still took in rtpengine db mode about 2 minutes before kamailio became
> responsive after start.
> -- Juha
Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
Kamailio Advanced Training - Mar 4-6, 2019 in Berlin; Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com