Hi, I would like to share some experience using LCR under Kamailio 3.X, in which there is no longer OPTIONS-based gateway monitoring.
Now, the way to disable a gateway is by calling defunct_gw() in a failure_route block (i.e. when there is no response to a request and fr_timer fires). So it's based on the processing of a single request. This is dangerous, and I will give a real example:
An ugly client sends us a request with a malformed P-Asserted-Identity as follows:
P-Asserted-Identity(sip@domain.com
Note that it's an *invalid* header. But Kamailio "allows" it and the request arrives at the GW. The GW drops the request due to the malformed header, so it sends NO reply at all. Then a timeout occurs in the client transaction and the failure_route block is called, in which I call defunct_gw().
Conclusion: an attacker could disable my gws just by sending a simple malformed request. I strongly miss the monitoring feature of the old LCR module. And even worse, I could make my own monitoring client by sending OPTIONS to all the gateways, but the LCR module does not include a simple MI command to enable/disable a gw. So what should I do? Re-populate all the LCR tables and invoke the LCR reload() MI command every time I detect a gw is offline/online?
Regards.
Iñaki Baz Castillo writes:
> An ugly client sends us a request with a malformed P-Asserted-Identity as follows:
> P-Asserted-Identity(sip@domain.com
> Note that it's an *invalid* header. But Kamailio "allows" it and the request arrives at the GW. The GW drops the request due to the malformed header, so it sends NO reply at all. Then a timeout occurs in the client transaction and the failure_route block is called, in which I call defunct_gw().
check the headers you are forwarding to your gws. also, you can count the number of failures yourself by using htable, for example, and not defunct your gw based on the first failure. further, you could define a timed route, and based on the htable, ping your gws.
> Conclusion: an attacker could disable my gws just by sending a simple malformed request. I strongly miss the monitoring feature of the old LCR module.
my conclusion is as it was before: keep lcr module simple and do monitoring separately. it might be possible to include a mi command to manage defunct time of a gw, but i'm not sure about it, because currently the tables may not include enough info to pinpoint a particular gw.
-- juha
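The counting idea suggested above can be sketched outside Kamailio. The following is a minimal Python illustration, assuming a plain dictionary stands in for the htable; the class name, the threshold and the time window are hypothetical choices and are not part of the lcr or htable modules. In kamailio.cfg the same bookkeeping would live in one htable entry per gateway, and the "True" result would be the point where defunct_gw() is called.

```python
import time


class GatewayFailureCounter:
    """Counts recent failures per gateway (hypothetical helper, not a Kamailio API)."""

    def __init__(self, threshold=3, window=60.0, clock=time.monotonic):
        self.threshold = threshold   # failures needed before defuncting
        self.window = window         # seconds a failure stays relevant
        self.clock = clock           # injectable clock, eases testing
        self._failures = {}          # gateway -> list of failure timestamps

    def record_failure(self, gw):
        """Register one failure; return True if the gateway should be defuncted."""
        now = self.clock()
        recent = [t for t in self._failures.get(gw, []) if now - t < self.window]
        recent.append(now)
        self._failures[gw] = recent
        return len(recent) >= self.threshold

    def record_success(self, gw):
        """Any reply from the gateway clears its failure history."""
        self._failures.pop(gw, None)
```

Note that a threshold only raises the cost of the attack described in this thread: a sender of malformed requests has to send `threshold` of them within the window instead of one.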
2011/12/28 Juha Heinanen jh@tutpro.com
>> An ugly client sends us a request with a malformed P-Asserted-Identity as follows:
>> P-Asserted-Identity(sip@domain.com
>> Note that it's an *invalid* header. But Kamailio "allows" it and the request arrives at the GW. The GW drops the request due to the malformed header, so it sends NO reply at all. Then a timeout occurs in the client transaction and the failure_route block is called, in which I call defunct_gw().
> check the headers you are forwarding to your gws.
There is no way to detect a header like the one I meant:
P-Asserted-Identity(sip@domain.com
The Kamailio parser does not detect such a header as "P-Asserted-Identity".
Also, it's unfeasible for a proxy to check the syntax of all the headers. Typically a proxy cares about just a few headers.
> also, you can count the number of failures yourself by using htable, for example, and not defunct your gw based on the first failure.
So the attacker would just need to send 5 malformed requests rather than one.
> further, you could define a timed route, and based on the htable, ping your gws.
Right, but is failure_route executed for those locally sent requests? I must check it.
>> Conclusion: an attacker could disable my gws just by sending a simple malformed request. I strongly miss the monitoring feature of the old LCR module.
> my conclusion is as it was before: keep lcr module simple and do monitoring separately. it might be possible to include a mi command to manage defunct time of a gw, but i'm not sure about it, because currently the tables may not include enough info to pinpoint a particular gw.
IMHO that's due to the design of the tables in the LCR module. IMHO there should be a table with just the gw definitions (without the lcr_id field). It would make management easier in cases like the present one (just my opinion).
Regards.
-- Iñaki Baz Castillo ibc@aliax.net
Iñaki Baz Castillo writes:
> The Kamailio parser does not detect such a header as "P-Asserted-Identity".
> Also, it's unfeasible for a proxy to check the syntax of all the headers. Typically a proxy cares about just a few headers.
then use the script function that drops all headers except the ones your gw cares about.
>> also, you can count the number of failures yourself by using htable, for example, and not defunct your gw based on the first failure.
> So the attacker would just need to send 5 malformed requests rather than one.
see above. also, there is response '400 bad request'. fix your gw to use it.
> IMHO that's due to the design of the tables in the LCR module. IMHO there should be a table with just the gw definitions (without the lcr_id field). It would make management easier in cases like the present one (just my opinion).
you may be right about that one. when i have time, i'll take a look at it.
-- juha
2011/12/28 Juha Heinanen jh@tutpro.com:
>> The Kamailio parser does not detect such a header as "P-Asserted-Identity".
>> Also, it's unfeasible for a proxy to check the syntax of all the headers. Typically a proxy cares about just a few headers.
> then use the script function that drops all headers except the ones your gw cares about.
The client and the GW are UAs, but the proxy is just a proxy. As a proxy, I cannot decide which headers are mandatory and which ones can be dropped. The UAC (client) and UAS (gw) could decide to negotiate some SIP extension I'm not aware of, or one I do know but that is transparent to me (the proxy), for example PRACK usage, which involves new headers.
Iñaki Baz Castillo writes:
> The client and the GW are UAs, but the proxy is just a proxy. As a proxy, I cannot decide which headers are mandatory and which ones can be dropped. The UAC (client) and UAS (gw) could decide to negotiate some SIP extension I'm not aware of, or one I do know but that is transparent to me (the proxy), for example PRACK usage, which involves new headers.
as i already said, then fix your gw to reply with '400 bad request'.
-- juha
2011/12/28 Juha Heinanen jh@tutpro.com:
> as i already said, then fix your gw to reply with '400 bad request'.
Juha, the wrong header I meant is not a wrong P-Asserted-Identity, but malformed text that makes the *whole* SIP message invalid. So a proxy or server receiving it can just *drop* the request (the same as Kamailio does if the Request URI contains a malformed URI).
So there is nothing to fix in the GW. It's behaving 100% correctly.
Regards.
Iñaki Baz Castillo writes:
> Juha, the wrong header I meant is not a wrong P-Asserted-Identity, but malformed text that makes the *whole* SIP message invalid. So a proxy or server receiving it can just *drop* the request (the same as Kamailio does if the Request URI contains a malformed URI).
> So there is nothing to fix in the GW. It's behaving 100% correctly.
i'm lost. why is your sip proxy forwarding a sip request that is totally (not just one header) malformed? and why is your sip proxy able to forward such a request when your uas is not able to respond to it? and how is this all correct based on rfc3261?
-- juha
2011/12/28 Juha Heinanen jh@tutpro.com:
> i'm lost. why is your sip proxy forwarding a sip request that is totally (not just one header) malformed? and why is your sip proxy able to forward such a request when your uas is not able to respond to it? and how is this all correct based on rfc3261?
Hi Juha, my SIP proxy is Kamailio 3.2 :)
And why it is forwarding such an invalid request is a good question. Please let me get a trace of the malformed request. Tomorrow I will paste it here.
Cheers.
2011/12/28 Iñaki Baz Castillo ibc@aliax.net:
Anyhow, it could occur that the request contains a syntactically valid SIP header (token: value) whose value has invalid grammar (e.g. a P-Asserted-Identity with a wrong URI value). The proxy might not check it (because in some configurations it just ignores such a header), but later the UAS/GW could inspect it and find the header invalid (and it could then reply with a 400 or just drop the request).
Iñaki Baz Castillo writes:
> Anyhow, it could occur that the request contains a syntactically valid SIP header (token: value) whose value has invalid grammar (e.g. a P-Asserted-Identity with a wrong URI value). The proxy might not check it (because in some configurations it just ignores such a header), but later the UAS/GW could inspect it and find the header invalid (and it could then reply with a 400 or just drop the request).
where in rfc3261 is it stated that UAS is allowed to drop a request that it could respond to (i.e., the request has valid via header)?
-- juha
2011/12/28 Juha Heinanen jh@tutpro.com:
> where in rfc3261 is it stated that UAS is allowed to drop a request that it could respond to (i.e., the request has valid via header)?
If the Via header is invalid, then the Proxy/UAS *cannot* reply to the request, or at least not in every case. Theoretically, the Proxy/UAS replies to a request based on the information in the Via header.
Iñaki Baz Castillo writes:
> If the Via header is invalid, then the Proxy/UAS *cannot* reply to the request, or at least not in every case. Theoretically, the Proxy/UAS replies to a request based on the information in the Via header.
since via header was added by your sip proxy, it is valid and the uas should be able to respond to the request. if it does not, show me where in rfc3261 it is written that uas does not need to respond to such a request.
-- juha
2011/12/28 Juha Heinanen jh@tutpro.com:
> since via header was added by your sip proxy, it is valid and the uas should be able to respond to the request. if it does not, show me where in rfc3261 it is written that uas does not need to respond to such a request.
You are assuming that:
- There is a proxy between the UAC and the UAS (GW).
- The proxy does not route the request to another proxy.
- The UAS just inspects the topmost Via.
- The proxy does not inspect the top Via added by the UAC (otherwise it could reject the request).
First of all, a proxy could inspect ALL the Via headers, for example in order to detect spirals or loops as RFC 5393 states.
Anyhow, that is not what we are discussing. Tomorrow I will paste the malformed SIP request that Kamailio forwards to the GW.
Cheers.
Iñaki Baz Castillo writes:
> Anyhow, that is not what we are discussing. Tomorrow I will paste the malformed SIP request that Kamailio forwards to the GW.
originally we were discussing, why lcr module does not poll gws and my reply was that the polling can be done in a timed route that stores the status of the gw in a htable entry. if gw does not respond to a request, your script can check what to do about it by checking the gw's htable entry.
-- juha
Hi,
2011/12/28 Juha Heinanen jh@tutpro.com:
> originally we were discussing, why lcr module does not poll gws and my reply was that the polling can be done in a timed route that stores the status of the gw in a htable entry. if gw does not respond to a request, your script can check what to do about it by checking the gw's htable entry.
Is there any way to dump the gateways defined in the LCR module from the script in order to poll them?
I can get them via the RPC command 'lcr.dump_gws' and defunct them via the script, but I can't figure out how to accomplish both tasks from either one of them.
Thanks in advance.
2011/12/28 Juha Heinanen jh@tutpro.com:
> originally we were discussing, why lcr module does not poll gws and my reply was that the polling can be done in a timed route that stores the status of the gw in a htable entry. if gw does not respond to a request, your script can check what to do about it by checking the gw's htable entry.
AFAIK locally sent requests (from Kamailio) do not generate a transaction, so there is no failure_route to react. So, if I'm not missing anything, your proposal is not feasible.
On the other hand, please check this SIP request:
--------------------------
OPTIONS sip:bob@domain.com SIP/2.0
Via: SIP/2.0/UDP 1.1.1.1;branch=qweqweqwe
From: "Alice" sip:alice@domain.com;tag=tag1234
To: "Bob" sip:bob@domain.com
Bad-Header( lalala sip:lalala@1.2.3.4
Call-id: qweqweqweqwe
Cseq: 1234 OPTIONS
Content-Length: 0
--------------------------
This request is *invalid* since the "Bad-Header" header is fully malformed (it's not a header at all), so the whole SIP message becomes invalid. But send this request to Kamailio and Kamailio will happily forward it. And maybe the UAS receiving it will just *drop* the request rather than replying 400 (since it's not a valid SIP request).
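To make concrete why "Bad-Header(" is not a header at all, here is a small Python sketch that checks each line against the RFC 3261 shape of a header line (a token, optional whitespace, then a colon). It is only an illustration of the grammar; the function name is hypothetical and this is not how Kamailio's parser works or a check that Kamailio offers.

```python
import re
from typing import List

# RFC 3261: header-name is a token, followed by HCOLON (optional SP/HTAB, then ":").
_HEADER_LINE = re.compile(r"^[A-Za-z0-9\-.!%*_+`'~]+[ \t]*:")


def malformed_header_lines(raw_request: str) -> List[str]:
    """Return the header-section lines that do not look like 'name: value'."""
    head = raw_request.split("\r\n\r\n", 1)[0]   # drop the body, if any
    bad = []
    for line in head.split("\r\n")[1:]:          # skip the request line
        if not line:
            continue                             # tolerate a trailing empty line
        if line[0] in (" ", "\t"):
            continue                             # folded continuation of the previous header
        if not _HEADER_LINE.match(line):
            bad.append(line)
    return bad
```

Applied to the request above, the only line returned is `Bad-Header( lalala sip:lalala@1.2.3.4`. A check like this catches lines that are not headers at all, but not a well-formed header name carrying an invalid value.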
2011/12/29 Iñaki Baz Castillo ibc@aliax.net:
The original "header" causing the problem in my system was the following (number and IP replaced):
P-Asserted-Identity(<sip: XXXXXXXX@XXX.XXX.XXX.XXX>)
Cheers.
2011/12/29 Iñaki Baz Castillo ibc@aliax.net:
>> originally we were discussing, why lcr module does not poll gws and my reply was that the polling can be done in a timed route that stores the status of the gw in a htable entry. if gw does not respond to a request, your script can check what to do about it by checking the gw's htable entry.
> AFAIK locally sent requests (from Kamailio) do not generate a transaction, so there is no failure_route to react. So, if I'm not missing anything, your proposal is not feasible.
Hi. Confirmed: locally generated requests don't trigger failure_route blocks.
2011/12/28 Juha Heinanen jh@tutpro.com:
>> IMHO that's due to the design of the tables in the LCR module. IMHO there should be a table with just the gw definitions (without the lcr_id field). It would make management easier in cases like the present one (just my opinion).
> you may be right about that one. when i have time, i'll take a look at it.
Hi Juha, such a design was already proposed by me in the wiki:
http://www.kamailio.org/dokuwiki/doku.php/modules-new-design:lcr-module-desi...
But finally it was changed by you to the current one.
Cheers.
Iñaki Baz Castillo writes:
> Hi Juha, such a design was already proposed by me in the wiki:
> http://www.kamailio.org/dokuwiki/doku.php/modules-new-design:lcr-module-desi...
> But finally it was changed by you to the current one.
there was some reason for the change, but i don't remember anymore what it was. i'll think about it again.
-- juha
Iñaki Baz Castillo writes:
> Hi Juha, such a design was already proposed by me in the wiki:
> http://www.kamailio.org/dokuwiki/doku.php/modules-new-design:lcr-module-desi...
> But finally it was changed by you to the current one.
i think that the tradeoff was that one may want to defunct a gw in one lcr instance and leave it enabled in another instance. that would not be possible if gws were shared by all instances.
-- juha
2011/12/29 Juha Heinanen jh@tutpro.com:
> i think that the tradeoff was that one may want to defunct a gw in one lcr instance and leave it enabled in another instance. that would not be possible if gws were shared by all instances.
Right. Maybe the "defunct" column should be in my proposed lcr_gws_grps table rather than in the lcr_gws table.
Another possibility is to split this concept into "enabled gw" and "available (not defunct) gw". "defunct" would mean that the gw has been detected to fail, so it would make sense within the lcr_gws table. On the other hand, a new column "available" in the proposed lcr_gws_grps table could contain an "enabled" field, so LCR only uses/loads the gw if that field is 1.
Iñaki Baz Castillo writes:
> Another possibility is to split this concept into "enabled gw" and "available (not defunct) gw". "defunct" would mean that the gw has been detected to fail, so it would make sense within the lcr_gws table. On the other hand, a new column "available" in the proposed lcr_gws_grps table could contain an "enabled" field, so LCR only uses/loads the gw if that field is 1.
gw groups concept made the module far too complicated and was therefore dropped. i'm not going to re-introduce it.
it might be possible to add 'enabled' field to lcr_rule_target table like there is one in lcr_rule table if that would help anything.
-- juha
2011/12/29 Juha Heinanen jh@tutpro.com:
> gw groups concept made the module far too complicated and was therefore dropped. i'm not going to re-introduce it.
But it would make LCR management easier via, e.g., a web interface. Currently, building a management web interface for the LCR module is a pain (I mean a real management interface, not just a web page to edit the LCR tables verbatim, like every LCR web interface I've seen).
Iñaki Baz Castillo writes:
> But it would make LCR management easier via, e.g., a web interface. Currently, building a management web interface for the LCR module is a pain (I mean a real management interface, not just a web page to edit the LCR tables verbatim, like every LCR web interface I've seen).
i haven't had difficulties with web interface. on top level i create/delete lcr instances. then under each instance, i have routing and gateways page. click on a prefix on routing page leads to page where gws with priorities/weights can be defined for the prefix.
-- juha
On 28.12.2011 17:12, Iñaki Baz Castillo wrote:
> 2011/12/28 Juha Heinanen jh@tutpro.com
>> further, you could define a timed route, and based on the htable, ping your gws.
> Right, but is failure_route executed for those locally sent requests? I must check it.
Last time I tried it, the failure route was not triggered. I solved the problem by reversing the logic: I "ping" the gateways every few seconds and increase the failure counter with every ping. If I get a response (reply routes work fine), I clear the failure counter. If the failure counter is > 3, I disable the gateway.
It is all possible from kamailio.cfg, although the code is quite complex.
regards Klaus
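The reversed logic described above can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the kamailio.cfg code referred to in the message: the class and method names are hypothetical, `send_ping` stands for whatever sends the OPTIONS request, and re-enabling a gateway as soon as it replies is an assumption the message does not spell out.

```python
class PingMonitor:
    """Counts every ping as a failure until a reply clears it (hypothetical helper)."""

    def __init__(self, gateways, max_missed=3):
        self.max_missed = max_missed
        self.missed = {gw: 0 for gw in gateways}  # pings sent since the last reply
        self.disabled = set()

    def on_timer(self, send_ping):
        """Called every few seconds: bump each counter, then ping each gateway."""
        for gw in self.missed:
            self.missed[gw] += 1
            if self.missed[gw] > self.max_missed:
                self.disabled.add(gw)
            send_ping(gw)

    def on_reply(self, gw):
        """Called from the reply path: the gateway answered, so clear its counter."""
        self.missed[gw] = 0
        self.disabled.discard(gw)  # assumption: a reply re-enables the gateway
```

Because the counter is incremented when the ping is sent rather than when a timeout is detected, no failure_route is needed; only the reply path has to work.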
2012/1/24 Klaus Darilion klaus.mailinglists@pernau.at:
Thanks a lot. Indeed it seems far too complex, even more so taking into account that the old 1.5 LCR module included an integrated monitoring mechanism that just worked fine. That is a lost feature.