Hi all, I see a very strange behaviour when using RADIUS accounting. If the RADIUS server is up and running, all is OK. If it is down, the dialog module (in db mode) doesn't work properly. In the log I see:
after the 200 OK to the INVITE:
CRITICAL:dialog:log_next_state_dlg: bogus event 6 in state 2 for dlg 0xb61c0918 [360:515907930] with clid 'B8108B3CEBD9B8E3E1CC23DE8361D35F@sipsvr' and tags 'C319BA6F3E777E04867202774CE6443B' ''
after the BYE:
CRITICAL:dialog:log_next_state_dlg: bogus event 7 in state 2 for dlg 0xb61c0918 [360:515907930] with clid 'B8108B3CEBD9B8E3E1CC23DE8361D35F@sipsvr' and tags 'C319BA6F3E777E04867202774CE6443B' ''
and no record is written to the dialog table. Everything works again after starting the RADIUS server. But this is not the main problem... In some circumstances kamailio fails to route calls properly due to retransmissions; for instance, after the client answers I see a lot of 200 OK retransmissions sent to kamailio because it doesn't relay the ACK immediately. In the log I see, for each 200 OK:
...
Mar 9 11:34:01 [27466] DBG:tm:timer_routine: timer routine:3,tl=0xb61a8ecc next=(nil), timeout=75
Mar 9 11:34:01 [27466] DBG:tm:delete_handler: removing 0xb61a8e68
Mar 9 11:34:01 [27466] DBG:tm:delete_cell: delete_cell 0xb61a8e68: can't delete -- still reffed (1)
Mar 9 11:34:01 [27466] DBG:tm:set_timer: relative timeout is 2
Mar 9 11:34:01 [27466] DBG:tm:insert_timer_unsafe: [3]: 0xb61a8ecc (77)
Mar 9 11:34:01 [27466] DBG:tm:delete_handler: done
...
I was not able to reproduce this problem: it was solved by starting the RADIUS server, but it doesn't occur again when I stop the server. I'm using kamailio 1.4.3, but the same problem also occurred on a production system with openser 1.2.3: after the customer's RADIUS server went down, openser wasn't able to route all calls.
Thank you very much for your support. Regards, Antonio.
Antonio Reale writes:
> I see a very strange behaviour using accounting on RADIUS. If RADIUS server is up and running, all is OK. If it is down dialog module (in db mode) can't work properly:
is dialog module really using radius? if not, perhaps it is accounting or something else that you have which uses radius.
if you want your proxy to work fine without having a radius server running, i don't see any other way than to stop using any radius based modules.
-- juha
2009/3/9 Juha Heinanen jh@tutpro.com:
> is dialog module really using radius? if not, perhaps it is accounting or something else that you have which uses radius.
Hi Juha, RADIUS is used only for accounting. I don't see a direct relationship between the RADIUS and dialog modules; I think the errors are due to the slowdown caused by the RADIUS server being unavailable. About this error, Bogdan wrote on the OpenSIPS ML: "'bogus event 6 in state 2 for' means ACK was received while still in EARLY state (no 200 OK received). This is a known bug...".
> if you want your proxy to work fine without having radius server running, i don't see any other way than stop using any radius based modules.
Sure. I use RADIUS in the kamailio script only in the scenarios that really need it. The problem here is: what happens if the RADIUS server goes down because of a fault? I expect calls not to be accounted, but still to be routed correctly. Instead, twice I had a loss of service and had to temporarily disable the RADIUS module to bring the service back up.
Is there a timer that limits how long kamailio waits for responses from the RADIUS server?
Regards. Antonio.
Antonio Reale writes:
> Sure.. I configured RADIUS in the kamailio script only in scenarios that really need it. The problem here is: what happen if the RADIUS server goes down for a fault? I expect that calls are not accounted, but correctly routed. Insted for two times I had a loss of service and I had to temporarly disable RADIUS module to put it in service.
> Is there any timer for the kamailio to wait responses by the RADIUS server?
radiusclient-ng lib has a radius_timeout config variable. i would imagine that if there is no response from the server within that time, the radius request fails. i don't know what the accounting module does if an accounting request fails.
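For reference, a minimal sketch of where these settings live in the radiusclient-ng config file. The option names radius_timeout and radius_retries come from this thread; the values below are illustrative assumptions, not tested recommendations, and the retry comment paraphrases the stock file's own comments, so check it against your installed copy.

```
# /etc/radiusclient-ng/radiusclient.conf (excerpt, illustrative values)

# seconds to wait for a reply from the RADIUS server
radius_timeout 2

# how many times a request is resent before the next configured server is tried
# (assumed semantics; see the comments in your own radiusclient.conf)
radius_retries 1
```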
-- juha
2009/3/10 Juha Heinanen jh@tutpro.com:
> radiusclient-ng lib has radius_timeout config variable.
[CUT]
Thanks, I'll try whether this helps. I'm waiting for someone who can confirm the issue.
Antonio.
Antonio Reale writes:
> Thanks. I'll try if this can help. Waiting for someone that can confirme the issue.
radius accounting is different from other radius modules, because in the acc module accounting is triggered by the tm module. in other radius modules, if a radius request fails, the corresponding script function fails too.
if sending a radius accounting request fails, it should not fail the transaction. i have not tested whether it does.
-- juha
2009/3/10 Juha Heinanen jh@tutpro.com:
> radius accounting is different from other radius modules, because in acc module, accounting is triggered by tm module.
[CUT]
OK. So this could be the link between the accounting and the delay in relaying SIP messages. Is that right?
> if sending of radius accounting request fails, it should not fail the transaction. i have not tested if it does.
I think it doesn't fail the transaction, but it does affect the signaling. Attached is a short wireshark trace taken when I reproduced the issue in the lab with the RADIUS server down. In the trace I see about 10 seconds between kamailio receiving the ACK and relaying it (maybe "radius_timeout 10" in radiusclient.conf?).
P.S.: in the trace you see only two IP addresses because both clients are on 10.10.45.86...
Thank you for your help. Regards. Antonio.
AFAIR, if the radius server is down, the ACK will not be relayed: the call cannot be accounted and therefore the call is dropped.
So, the radius server is a single point of failure.
Regards, Ovidiu Sas
On Tue, Mar 10, 2009 at 6:49 AM, Antonio Reale ant.reale@gmail.com wrote:
[CUT]
Kamailio (OpenSER) - Users mailing list Users@lists.kamailio.org http://lists.kamailio.org/cgi-bin/mailman/listinfo/users http://lists.openser-project.org/cgi-bin/mailman/listinfo/users
Ovidiu Sas writes:
> AFAIR, if radius server is down, ACK will not be relayed. The call cannot be accounted and therefor the call is dropped.
> So, the radius server is a single point of failure.
not necessarily a SINGLE point of failure, i.e., there can be more than one radius server. my suggestion is to have more than one radius server and to set radius_timeout and radius_retries to small values.
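A minimal sketch of the "more than one radius server" part, assuming two accounting servers at the hypothetical addresses 192.0.2.10 and 192.0.2.11. It relies on the radiusclient.conf comment that authserver/acctserver may appear more than once and that the servers are tried in turn when one is not answering. Shared secrets are not set in radiusclient.conf itself but in the file named by its servers option; the secrets shown are placeholders. The small radius_timeout and radius_retries values would be set alongside this.

```
# radiusclient.conf: list more than one accounting server (hypothetical addresses);
# if the first one is not answering, the next one is tried
acctserver 192.0.2.10
acctserver 192.0.2.11

# servers file (the path set by the "servers" option in radiusclient.conf):
# one "server shared-secret" pair per line; both secrets are placeholders
192.0.2.10    hypothetical-secret-1
192.0.2.11    hypothetical-secret-2
```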
-- juha
/etc/radiusclient-ng/radiusclient.conf
# RADIUS server to use for authentication requests. this config
# item can appear more then one time. if multiple servers are
# defined they are tried in a round robin fashion if one
# server is not answering.
# optionally you can specify a the port number on which is remote
# RADIUS listens separated by a colon from the hostname. if
# no port is specified /etc/services is consulted of the radius
# service. if this fails also a compiled in default is used.
authserver 127.0.0.1

# RADIUS server to use for accouting requests. All that I
# said for authserver applies, too.
#
acctserver 127.0.0.1
2009/3/10 Juha Heinanen jh@tutpro.com:
> Ovidiu Sas writes:
> > AFAIR, if radius server is down, ACK will not be relayed.
> > The call cannot be accounted and therefor the call is dropped.
> >
> > So, the radius server is a single point of failure.
>
> not necessarily a SINGLE point of failure, i.e., there can be more than one radius server. my suggestion is have more than one radius server and set radius_timeout and radius_retries to small values.
OK, this is probably a good solution. Anyway, I think the behaviour is strange, because in most of my tests the loss of the RADIUS server affects the accounting but not the signaling; only sometimes does it affect the signaling. Thanks to all for your attention and support.
Regards, Antonio
Would using a local radius-proxy help in this case?
Kamailio would be configured to talk to the local proxy, which can queue requests for the remote radius server while that server is unavailable.
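A sketch of the kamailio side of this idea, assuming a RADIUS proxy (which one is not specified here) listening on the local host on the standard accounting port 1813. radiusclient-ng simply points its accounting server at the loopback address, using the host:port form described in the radiusclient.conf comments. Queuing and forwarding to the remote server would be the proxy's job and is not shown; the shared secret is a placeholder.

```
# radiusclient.conf: send accounting to the local proxy
# (assumed here to listen on 127.0.0.1, port 1813)
acctserver 127.0.0.1:1813

# servers file: shared secret for the local proxy (placeholder value)
127.0.0.1    hypothetical-local-secret
```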