On 17/05/18 14:23, Daniel Tryba wrote:
I noticed the same problem (most importantly, with multiple registrars I didn't like the partitioning logic), and my solution was to send my own OPTIONS.
We have it running with multiple (3) registrars that replicate with dmq usrloc, and it has worked perfectly for us so far.
In your favorite scripting language, select all location entries, construct an OPTIONS for each one, and send it directly to the correct proxy with a preloaded Route (based on the Path headers inserted on the proxy). I defined a valid reply as any response code below 500 other than 408.
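A rough sketch of that approach in Python might look like the following. It only builds the request text and classifies a reply; the function names, the UDP transport in the Via header, and the example addresses are assumptions made for illustration, and reading the location entries and sending the packet are left out.

```python
import uuid


def build_options(contact_uri, path_uris, from_uri, local_host, local_port):
    """Build a minimal SIP OPTIONS request with a preloaded Route set.

    path_uris holds the stored Path values, e.g. "<sip:proxy1.example.net;lr>".
    All argument names here are illustrative, not taken from any module.
    """
    branch = "z9hG4bK" + uuid.uuid4().hex
    lines = [
        f"OPTIONS {contact_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch}",
    ]
    # Preload the Route set from the Path values so the request is
    # forwarded through the proxy that the UA registered through.
    lines += [f"Route: {uri}" for uri in path_uris]
    lines += [
        "Max-Forwards: 70",
        f"From: <{from_uri}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{contact_uri}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_host}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


def status_code(response):
    """Extract the numeric status code from a raw SIP response."""
    status_line = response.split("\r\n", 1)[0]
    return int(status_line.split(" ", 2)[1])


def is_valid_reply(response):
    """Apply the rule above: any code below 500 except 408 counts as alive.

    A missing response (None) is treated as a failure.
    """
    if response is None:
        return False
    code = status_code(response)
    return code != 408 and code < 500
```

With this rule a 404 or 405 from the UA still counts as a sign of life, while a 408 generated by an intermediate proxy does not.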
I'm not sure I'm following your logic here. The registrars with nathelper loaded are already doing what you describe above. They are actually sending the OPTIONS keepalives using the path from the location/registrar module. The OPTIONS messages are getting to the UAs, and that part is working as intended.
The part we're having an issue with is when we enable the keepalive_timeout module parameter. My understanding is that with this module parameter enabled, if a UA is sent a keepalive message and nathelper does *not* receive a response back from the client after a predefined number of attempts, then nathelper will flush the contact from location because it deems it "down".
When the OPTIONS message is sent via the correct proxy to a UA that is "down", the proxy retransmits the OPTIONS 3 - 4 times to the UA, and because the UA does not respond (it's down), Kamailio generates a 408 Request Timeout response back to the registrar for that keepalive request. This is the expected behaviour of the proxy under normal conditions. However, it causes an issue in this scenario: the 408 generated by the proxy and sent back to the registrar is interpreted by the nathelper module as a successful response from the UA, so the contact is not removed from the location table even though the request timed out.
Are you saying that you manually remove your contacts from the database using your custom script, based on the responses you get back? We are currently running in memory-only mode, which makes that a little more difficult.
I think what would be nice is if nathelper had an option similar to what dispatcher has, in the sense that you could define, via a module parameter, which response codes are deemed a "failure" in contacting the UA. That way, in this scenario, we could say that a 408 timeout should be treated as identical to no response from the UA.
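To make the idea concrete, here is a small Python sketch of how such a setting could be evaluated. It is purely hypothetical: nathelper is not being described as having this parameter, and the "class=N;code=NNN" syntax is only an assumption, loosely borrowed from the style of dispatcher's ds_ping_reply_codes value (which, if memory serves, lists codes treated as valid rather than as failures).

```python
def parse_failure_codes(spec):
    """Parse a spec such as "code=408;class=5" into (classes, codes).

    The syntax is hypothetical and used only for this illustration.
    """
    classes, codes = set(), set()
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        key = key.strip()
        if key == "class":
            classes.add(int(value))
        elif key == "code":
            codes.add(int(value))
        else:
            raise ValueError(f"unknown item in failure spec: {item!r}")
    return classes, codes


def contact_is_down(reply_code, failure_spec):
    """Decide whether a keepalive result means the contact is down.

    reply_code is None when no response arrived at all; any listed code
    or code class is treated exactly like that non-response case.
    """
    if reply_code is None:
        return True
    classes, codes = parse_failure_codes(failure_spec)
    return reply_code in codes or reply_code // 100 in classes
```

With a spec of "code=408", a proxy-generated 408 would count against the contact in the same way as silence, while a 200 or a 404 from the UA itself would still count as a sign of life.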
I can't think of a way to stop the proxy from sending these 408s for these specific types of messages, or whether that is even possible.