Hi, some questions about the LCR module's ping mechanism:
- How can I know which gws are down? The only indication I see is a NOTICE log:
  ---------
  NOTICE:lcr:gw_set_state: trunk "99.99.99.99:5060" from group: <2> is OFFLINE
  ---------
  Isn't there an MI command to list the down gateways?
- Why can't ping_interval be less than 180 seconds?
- In case of failure_route and "next_gw()", is the used gw (the failing gw) automatically marked as down? (It would be useful, so that we don't need to wait "fr_timer" seconds for each request during "ping_interval".)
Thanks.
On Friday, 20 March 2009, Iñaki Baz Castillo wrote:
> - Why can't ping_interval be less than 180 seconds?
I see this in the code:
-----------------
if (ping_interval < DEF_PING_TIMER) {
    ping_interval = DEF_PING_TIMER;
    LM_DBG("set OPTIONS timer to default value <%d>\n", DEF_PING_TIMER);
}
---------------
I see no reason for such a large minimum value. I need this value to be 5-10 seconds.
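A minimal sketch of how the check could separate the default from the minimum, so that small values such as 5-10 seconds are accepted. It assumes DEF_PING_TIMER is 180 (which the behaviour described above implies) and that an unset parameter arrives as 0; MIN_PING_TIMER and effective_ping_interval() are hypothetical names, not part of the lcr module.

```c
#include <assert.h>
#include <stdio.h>

#define DEF_PING_TIMER 180 /* assumed default, as implied by the thread */
#define MIN_PING_TIMER 5   /* hypothetical lower bound, not in the module */

/* Return the interval to use: the default when the parameter is unset
 * (<= 0, an assumption), otherwise the requested value, raised to
 * min_interval if it is smaller than that. */
int effective_ping_interval(int requested, int min_interval)
{
    assert(min_interval > 0);
    if (requested <= 0)
        return DEF_PING_TIMER;
    if (requested < min_interval) {
        fprintf(stderr, "ping_interval %d too small, using %d\n",
                requested, min_interval);
        return min_interval;
    }
    return requested;
}
```

With this shape, effective_ping_interval(10, MIN_PING_TIMER) keeps the requested 10 seconds instead of forcing 180.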
Hello Iñaki,
You should ask Alexandr; he introduced this feature in rev5452. I'm not sure he is on the lists all the time, so I'll forward your question to him...
regards, Andreas
2009/3/20 Iñaki Baz Castillo ibc@aliax.net
> On Friday, 20 March 2009, Iñaki Baz Castillo wrote:
> > - Why can't ping_interval be less than 180 seconds?
> I see this in the code:
>     if (ping_interval < DEF_PING_TIMER) {
>         ping_interval = DEF_PING_TIMER;
>         LM_DBG("set OPTIONS timer to default value <%d>\n", DEF_PING_TIMER);
>     }
> I see no reason for such a large minimum value. I need this value to be 5-10 seconds.
> -- Iñaki Baz Castillo
Kamailio (OpenSER) - Users mailing list Users@lists.kamailio.org http://lists.kamailio.org/cgi-bin/mailman/listinfo/users http://lists.openser-project.org/cgi-bin/mailman/listinfo/users
On Friday, 20 March 2009, Iñaki Baz Castillo wrote:
> - In case of failure_route and "next_gw()", is the used gw (the failing gw)
>   automatically marked as down? (It would be useful, so that we don't need to wait "fr_timer" seconds for each request during "ping_interval".)
Unfortunately, I've checked and this is not the case: the gateway is not automatically set as "offline" when t_relay fails or when "next_gw" is called again.
Suggestion: it would be really great if "next_gw" automatically marked the previously used gw as down (since it didn't succeed). Or perhaps a new function "mark_previous_gw_offline()" which could be called manually before "next_gw()".
Opinions?
Iñaki Baz Castillo writes:
> Suggestion: it would be really great if "next_gw" automatically marked the previously used gw as down (since it didn't succeed). Or perhaps a new function "mark_previous_gw_offline()" which could be called manually before "next_gw()".
i think there is currently no information left about previous gw (it is removed from the avp). either it would need to be stored to another avp or the marking function would need to be given the ip address of the previous gw as argument.
-- juha
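The second option above (passing the address of the previous gw to the marking function) could look roughly like the sketch below. It is only an illustration: struct gw_entry and mark_gw_offline() are hypothetical and much simpler than the module's real gateway table, and of the state values only 2 (offline) is taken from this thread; 0 for online is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Only the value 2 (offline) comes from the thread; 0 is assumed. */
enum { GW_PING_ONLINE = 0, GW_PING_OFFLINE = 2 };

/* Hypothetical, simplified gateway entry; not the module's real struct. */
struct gw_entry {
    char ip[46];          /* textual IPv4/IPv6 address */
    unsigned short port;
    int ping;             /* one of the GW_PING_* states */
};

/* Mark every gateway matching ip:port as offline.
 * Returns the number of entries whose state actually changed. */
int mark_gw_offline(struct gw_entry *gws, size_t count,
                    const char *ip, unsigned short port)
{
    size_t i;
    int changed = 0;

    assert(gws != NULL && ip != NULL);
    for (i = 0; i < count; i++) {
        if (gws[i].port == port && strcmp(gws[i].ip, ip) == 0
                && gws[i].ping != GW_PING_OFFLINE) {
            gws[i].ping = GW_PING_OFFLINE;
            changed++;
        }
    }
    return changed;
}
```

The caller would still have to know the address of the gw that just failed, which is exactly the information that is said above to be removed from the avp.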
On Friday, 20 March 2009, Juha Heinanen wrote:
> Iñaki Baz Castillo writes:
> > Suggestion: it would be really great if "next_gw" automatically marked the previously used gw as down (since it didn't succeed). Or perhaps a new function "mark_previous_gw_offline()" which could be called manually before "next_gw()".
> i think there is currently no information left about previous gw (it is removed from the avp). either it would need to be stored to another avp or the marking function would need to be given the ip address of the previous gw as argument.
Yes, it's not as easy as I initially thought... but not impossible :)
Iñaki Baz Castillo writes:
> - How can I know which gws are down? The only indication I see is a NOTICE log:
>   NOTICE:lcr:gw_set_state: trunk "99.99.99.99:5060" from group: <2> is OFFLINE
>   Isn't there an MI command to list the down gateways?
lcr_gw_dump should include information if gw is down:
    p = int2str((unsigned long)(*gws)[i].ping, &len);
    attr = add_mi_attr(node, MI_DUP_VALUE, "PING", 4, p, len);
    if (attr == NULL) goto err;
value 2 is offline.
> - Why can't ping_interval be less than 180 seconds?
i don't know. the ping stuff was contributed by another author.
> - In case of failure_route and "next_gw()", is the used gw (the failing gw)
>   automatically marked as down? (It would be useful, so that we don't need to wait "fr_timer" seconds for each request during "ping_interval".)
it might be possible to write a function that you could call (from failure route) to mark the current gw offline.
-- juha
On Fri, Mar 20, 2009 at 4:26 PM, Juha Heinanen jh@tutpro.com wrote:
> Iñaki Baz Castillo writes:
> > - How can I know which gws are down? The only indication I see is a NOTICE log:
> > ---------
> > NOTICE:lcr:gw_set_state: trunk "99.99.99.99:5060" from group: <2> is OFFLINE
> > ---------
> > Isn't there an MI command to list the down gateways?
> lcr_gw_dump should include information if gw is down:
>     p = int2str((unsigned long)(*gws)[i].ping, &len);
>     attr = add_mi_attr(node, MI_DUP_VALUE, "PING", 4, p, len);
>     if (attr == NULL) goto err;
> value 2 is offline.
> > - Why can't ping_interval be less than 180 seconds?
> i don't know. the ping stuff was contributed by another author.
It should be configurable via a param.
> > - In case of failure_route and "next_gw()", is the used gw (the failing gw)
> > automatically marked as down? (It would be useful, so that we don't need to
> > wait "fr_timer" seconds for each request during "ping_interval".)
> it might be possible to write a function that you could call (from failure route) to mark the current gw offline.
And/or a function that returns the status of a particular gw. Before relaying, the script can check the availability of the gw and call next_gw() if necessary.
Regards, Ovidiu Sas
On Friday, 20 March 2009, Ovidiu Sas wrote:
> > > - Why can't ping_interval be less than 180 seconds?
> > i don't know. the ping stuff was contributed by another author.
> It should be configurable via a param.
Well, there is already a param "ping_interval" to set the... ping interval. But for now, if you set a value less than 180, it is set to 180. IMHO it makes no sense to have both parameters:
- ping_interval = 100
- max_ping_interval = 80
XD
> > > - In case of failure_route and "next_gw()", is the used gw (the failing gw)
> > > automatically marked as down? (It would be useful, so that we don't need to
> > > wait "fr_timer" seconds for each request during "ping_interval".)
> > it might be possible to write a function that you could call (from failure route) to mark the current gw offline.
> And/or a function that returns the status of a particular gw. Before relaying, the script can check the availability of the gw and call next_gw() if necessary.
That's already done by the "load_gws()" function: it only loads gws marked as active. What I mean is that a gw is set online/offline only when the ping succeeds/fails, and it would be nice to be able to set a gw as "offline" manually from failure_route and so on (without having to wait for the next ping to be executed, which could still take many seconds).
Regards.
if gw could be marked dead from failure route, i don't see much point in pinging the gws, because invite would be the ping.
-- juha
Iñaki Baz Castillo writes:
> The previous ping would avoid "fr_timer" seconds of waiting in the INVITE.
inaki,
in a typical case, if ping interval is, say, 10 seconds, and fr_timer, say 3 seconds, i fail to see any big advantage in pinging the gws provided that if fr_timer fires, there is a possibility to mark the gw as being dead.
i'll try to implement this as an alternative/complement for the pinging for the next release.
-- juha
On Saturday, 21 March 2009, Juha Heinanen wrote:
> Iñaki Baz Castillo writes:
> > The previous ping would avoid "fr_timer" seconds of waiting in the INVITE.
> inaki,
> in a typical case, if ping interval is, say, 10 seconds, and fr_timer, say 3 seconds, i fail to see any big advantage in pinging the gws provided that if fr_timer fires, there is a possibility to mark the gw as being dead.
True, but note that the default fr_timer is 30 seconds :)
> i'll try to implement this as an alternative/complement for the pinging for the next release.
That's great, thanks a lot.
Iñaki Baz Castillo writes:
> > in a typical case, if ping interval is, say, 10 seconds, and fr_timer, say 3 seconds, i fail to see any big advantage in pinging the gws provided that if fr_timer fires, there is a possibility to mark the gw as being dead.
> True, but note that default fr_timer is 30 seconds :)
when i forward a request to a gw, i set fr_timer to 3 sec.
> > i'll try to implement this as an alternative/complement for the pinging for the next release.
> That's great, thanks a lot.
how about waking a gw up from death? there could be a timestamp telling when a gw died and then a module parameter telling how long to wait before the gw is tried again. that way there would be no need for a timer process.
-- juha
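The timestamp idea above can be sketched as follows. This is only an illustration under stated assumptions: struct gw_health, gw_mark_dead() and gw_is_usable() are hypothetical names, died_at == 0 is assumed to mean "alive", and retry_after stands in for the proposed module parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical gateway record: died_at == 0 means the gw is alive. */
struct gw_health {
    time_t died_at;
};

/* Record the moment a gateway was found dead (e.g. from failure route). */
void gw_mark_dead(struct gw_health *gw, time_t now)
{
    assert(gw != NULL);
    gw->died_at = now;
}

/* Decide whether the gateway may be used at time `now`.
 * A dead gateway becomes usable again once `retry_after` seconds have
 * passed since it died. No timer process is needed because the check
 * happens lazily, at the moment gateways are selected. */
int gw_is_usable(struct gw_health *gw, time_t now, time_t retry_after)
{
    assert(gw != NULL);
    if (gw->died_at == 0)
        return 1;
    if (now - gw->died_at >= retry_after) {
        gw->died_at = 0;   /* give the gateway another chance */
        return 1;
    }
    return 0;
}
```

If the revived gw is still down, the next failed request would simply mark it dead again, so the cost is one fr_timer wait per retry_after period.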
On Sunday, 22 March 2009, Juha Heinanen wrote:
> how about waking a gw up from death? there could be a timestamp telling when a gw died and then a module parameter telling how long to wait before the gw is tried again. that way there would be no need for a timer process.
Good idea.
On Friday, 20 March 2009, Juha Heinanen wrote:
> Iñaki Baz Castillo writes:
> > - How can I know which gws are down? The only indication I see is a NOTICE log:
> >   NOTICE:lcr:gw_set_state: trunk "99.99.99.99:5060" from group: <2> is OFFLINE
> >   Isn't there an MI command to list the down gateways?
> lcr_gw_dump should include information if gw is down:
>     p = int2str((unsigned long)(*gws)[i].ping, &len);
>     attr = add_mi_attr(node, MI_DUP_VALUE, "PING", 4, p, len);
>     if (attr == NULL) goto err;
> value 2 is offline.
Right, I get the following line:
GW:: GRP_ID=2 IP_ADD=x.x.x.x HOSTNAME= PORT=5060 SCHEME=sip TRANSPORT= STRIP=0 TAG= WEIGHT=1 FLAGS=0 PING=2
However, that requires parsing the output...
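For what it's worth, the parsing is small. Below is a minimal sketch that pulls the PING attribute out of one "GW::" line in the format shown above and treats the value 2 as offline, as stated earlier in the thread. gw_line_ping() and gw_line_is_offline() are hypothetical helper names, and the sketch assumes the attributes are separated by single spaces, as in the sample line.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Extract the integer value of the PING attribute from one "GW::" line.
 * Returns 0 on success and stores the value in *ping, -1 otherwise. */
int gw_line_ping(const char *line, int *ping)
{
    const char *p;
    char *end;
    long value;

    assert(line != NULL && ping != NULL);
    p = line;
    while ((p = strstr(p, "PING=")) != NULL) {
        /* accept the key only at the start of the line or after a space */
        if (p == line || p[-1] == ' ')
            break;
        p += 5;
    }
    if (p == NULL)
        return -1;
    p += 5;                       /* skip over "PING=" */
    value = strtol(p, &end, 10);
    if (end == p)                 /* no digits after the key */
        return -1;
    *ping = (int)value;
    return 0;
}

/* Return 1 if the line reports PING=2 (offline per the thread), else 0. */
int gw_line_is_offline(const char *line)
{
    int ping;
    return gw_line_ping(line, &ping) == 0 && ping == 2;
}
```

Feeding each line of the lcr_gw_dump output to gw_line_is_offline() would give the list of down gateways without reading the logs.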
> > - Why can't ping_interval be less than 180 seconds?
> i don't know. the ping stuff was contributed by another author.
OK, so could I submit a patch to just eliminate that requirement?
> > - In case of failure_route and "next_gw()", is the used gw (the failing gw)
> >   automatically marked as down? (It would be useful, so that we don't need to wait "fr_timer" seconds for each request during "ping_interval".)
> it might be possible to write a function that you could call (from failure route) to mark the current gw offline.
ok, that's exactly what I was thinking about. I will try it.
Thanks.
> -- juha