Hi everybody,
OpenSER 1.2.0 now has the capability to do DNS-based failover, according to RFC 3263 (http://www.ietf.org/rfc/rfc3263.txt).
The SIP resolver was enhanced to be able to save DNS queries and resume them later, in order to get all possible IP destinations. The resolving is done step by step (the next IP is fetched only on demand) to minimize the total number of DNS queries. So, in normal processing, this support puts no extra load on the DNS server: additional queries are done only when needed (after a failure is detected).
The SIP resolver scans for new IP destinations on all DNS levels: NAPTR, SRV and A.
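To illustrate the save-and-resume idea, here is a minimal sketch, not OpenSER's actual resolver code. The DNS answers are hard-coded hypothetical data, only the SRV and A levels are modelled, and `struct dns_state`, `next_ip()` and the `queries` counter are invented for illustration. Each call resumes from the saved state and issues a (simulated) query only when the answers already fetched are exhausted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory "DNS" data standing in for real queries.
 * SRV_DATA is assumed to be already sorted by priority. */
struct srv_rec { int prio; const char *target; };
struct a_rec   { const char *host; const char *ip; };

static const struct srv_rec SRV_DATA[] = {
    { 0, "sip1.domain.com" },
    { 1, "sip2.domain.com" },
};
static const struct a_rec A_DATA[] = {
    { "sip1.domain.com", "1.2.3.4" },
    { "sip2.domain.com", "2.2.3.4" },
    { "sip2.domain.com", "2.2.3.5" },
};
#define N_SRV (sizeof(SRV_DATA) / sizeof(SRV_DATA[0]))
#define N_A   (sizeof(A_DATA) / sizeof(A_DATA[0]))

/* Saved resolver state: where to resume on the next failover. */
struct dns_state {
    int srv_loaded;   /* SRV set already queried? */
    size_t srv_idx;   /* current SRV target */
    int a_loaded;     /* A set for the current target already queried? */
    size_t a_pos;     /* next A record to scan */
    int queries;      /* number of simulated DNS queries so far */
};

/* Return the next destination IP, or NULL when all are exhausted.
 * A query is simulated only when the saved answers run out. */
static const char *next_ip(struct dns_state *st)
{
    assert(st != NULL);
    if (!st->srv_loaded) {            /* first call: one SRV query */
        st->srv_loaded = 1;
        st->queries++;
    }
    while (st->srv_idx < N_SRV) {
        const char *target = SRV_DATA[st->srv_idx].target;
        if (!st->a_loaded) {          /* A query for this target, on demand */
            st->a_loaded = 1;
            st->a_pos = 0;
            st->queries++;
        }
        for (; st->a_pos < N_A; st->a_pos++) {
            if (strcmp(A_DATA[st->a_pos].host, target) == 0)
                return A_DATA[st->a_pos++].ip;   /* resume after this one */
        }
        st->srv_idx++;                /* target exhausted: next SRV entry */
        st->a_loaded = 0;
    }
    return NULL;
}
```

In this sketch the first call costs two queries (SRV + A); a further query is made only when failover moves on to the second SRV target.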
Both core and TM are using this new feature.
In core, stateless forwarding can do DNS-based failover only at transport level (if no egress interface is found or the send operation fails for whatever reason).
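A minimal sketch of what transport-level-only failover means for stateless forwarding. `forward_stateless()`, the `send_fn` hook and `demo_send()` are hypothetical names, not OpenSER functions. The next IP is tried only when the local send step fails; once a request has left the proxy, no state is kept, so a missing or negative reply cannot trigger failover.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transport hook: returns 0 if the request was handed to the
 * network, -1 if no egress socket matched or the send call failed. */
typedef int (*send_fn)(const char *ip);

/* Try each resolved destination in turn; move on ONLY when the local send
 * fails. Returns the index of the destination used, or -1 if none worked. */
static int forward_stateless(const char *const dests[], size_t n, send_fn do_send)
{
    assert(do_send != NULL);
    for (size_t i = 0; i < n; i++) {
        if (do_send(dests[i]) == 0)
            return (int)i;   /* sent: stateless, so nothing more to track */
    }
    return -1;
}

/* Demo stub: pretend 1.2.3.4 has no usable egress interface. */
static int demo_send(const char *ip)
{
    return strcmp(ip, "1.2.3.4") == 0 ? -1 : 0;
}
```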
In TM, DNS-based failover is extended to transaction level. If the transaction completes with a 503 reply, or with a 408 timeout when no reply at all was received, a new branch is automatically forked, provided the DNS resolver can find another destination IP. Read more here: http://www.openser.org/docs/modules/1.2.x/tm.html#AEN103
To control this feature, use:
 - the newly added core parameter "disable_dns_failover", to disable DNS-based failover globally. By default it is false.
 - in TM, the new t_relay() flag for turning off DNS-based failover. This setting is per transaction. By default, failover is done.
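The TM failover rule, combined with the global and per-transaction controls, can be sketched as a single predicate. This is an illustration under assumptions: `should_dns_failover()` and its arguments are hypothetical, the C variable merely stands in for the core parameter of the same name, and the real TM logic is spread over its reply and timer handling rather than living in one function.

```c
#include <assert.h>

/* Stand-in for the core parameter of the same name (default: false). */
static int disable_dns_failover = 0;

/* Hypothetical predicate: should TM fork a new branch to the next IP?
 *   code        - final code the transaction completed with
 *   got_reply   - non-zero if any reply, even provisional, was received
 *   no_failover - the per-transaction t_relay() flag */
static int should_dns_failover(int code, int got_reply, int no_failover)
{
    assert(code >= 100 && code <= 699);
    if (disable_dns_failover || no_failover)
        return 0;
    if (code == 503)
        return 1;
    return code == 408 && !got_reply;   /* timeout, destination silent */
}
```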
Any feedback is appreciated.
regards, Bogdan
That's very good news, this will definitely simplify our openser config. I'll let you know once I've tested this new feature.
I'm impressed,
Christian
Users mailing list Users@openser.org http://openser.org/cgi-bin/mailman/listinfo/users
Hi Christian,
note that since DNS-based failover also uses the NAPTR records, the transport protocol may change as well :). Of course, this depends on your proxy settings (whether TCP/TLS is enabled).
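A sketch of how a NAPTR service field could be mapped to a transport, skipping records whose transport is not enabled on the proxy. The service strings come from RFC 3263; `naptr_proto()`, `struct proxy_cfg` and the assumption that UDP is always available are illustrative, not OpenSER internals.

```c
#include <assert.h>
#include <string.h>

enum proto { PROTO_NONE = 0, PROTO_UDP, PROTO_TCP, PROTO_TLS };

/* Hypothetical view of which transports the proxy listens on;
 * UDP is assumed to be always enabled. */
struct proxy_cfg { int tcp_enabled; int tls_enabled; };

/* Map an RFC 3263 NAPTR service field to a transport. PROTO_NONE means the
 * record must be skipped (unknown service or transport disabled), so the
 * next NAPTR record is used instead. Case-sensitive here for brevity. */
static enum proto naptr_proto(const char *service, const struct proxy_cfg *cfg)
{
    assert(service != NULL && cfg != NULL);
    if (strcmp(service, "SIP+D2U") == 0)
        return PROTO_UDP;
    if (strcmp(service, "SIP+D2T") == 0)
        return cfg->tcp_enabled ? PROTO_TCP : PROTO_NONE;
    if (strcmp(service, "SIPS+D2T") == 0)
        return cfg->tls_enabled ? PROTO_TLS : PROTO_NONE;
    return PROTO_NONE;
}
```

Whether a failover lands on TCP or on UDP therefore depends both on the records and on which listeners are configured.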
Regards, Bogdan
Hi Bogdan!
Good news. How are the DNS lookups done in detail? E.g. the following setup:
t_relay to domain.com:
@domain.com              NAPTR 90 50 "s" "SIP+D2T" "" _sip._tcp.domain.com.
                         NAPTR 100 50 "s" "SIP+D2U" "" _sip._udp.domain.com.
@_sip._tcp.domain.com.   SRV 0 0 6060 sip1.domain.com.
                         SRV 1 10 6060 sip2.domain.com.
@_sip._udp.domain.com.   SRV 0 0 6060 sip1.domain.com.
                         SRV 1 10 6060 sip4.domain.com.
sip1.domain.com          A 1.2.3.4
sip2.domain.com          A 2.2.3.4
sip2.domain.com          A 2.2.3.5
sip4.domain.com          A 4.2.3.4
Is the following assumption correct?
1. lookup NAPTR domain.com
2. lookup SRV _sip._tcp.domain.com
3. lookup A sip1.domain.com
4. request to 1.2.3.4; if failure
5. lookup A sip2.domain.com
6. request to 2.2.3.4; if failure
7. request to 2.2.3.5; if failure
8. lookup SRV _sip._udp.domain.com
9. (sip1 cached) send request to 1.2.3.4; if failure
10. lookup A sip4.domain.com
11. request to 4.2.3.4; if failure
12. reply error
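Klaus's sequence can be checked with a small sketch that walks the example records in NAPTR order, then SRV priority, then A record order, reusing hosts that were already resolved. For brevity it computes the whole order eagerly, whereas the resolver described in the announcement resolves step by step; the record tables and `failover_order()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tables holding the example records, pre-sorted by
 * NAPTR order (90 before 100) and SRV priority (0 before 1). */
struct naptr_entry { const char *proto; const char *srv_targets[2]; };
struct a_entry { const char *host; const char *ip; };

static const struct naptr_entry NAPTRS[] = {
    { "tcp", { "sip1.domain.com", "sip2.domain.com" } },
    { "udp", { "sip1.domain.com", "sip4.domain.com" } },
};
static const struct a_entry AS[] = {
    { "sip1.domain.com", "1.2.3.4" },
    { "sip2.domain.com", "2.2.3.4" },
    { "sip2.domain.com", "2.2.3.5" },
    { "sip4.domain.com", "4.2.3.4" },
};
#define COUNT(x) (sizeof(x) / sizeof((x)[0]))

/* Fill out[] with the attempts in failover order as "proto:ip" and return
 * how many there are; *a_lookups counts A queries, skipping cached hosts. */
static size_t failover_order(char out[][32], size_t max, int *a_lookups)
{
    const char *seen[8];
    size_t n_seen = 0, n = 0;

    *a_lookups = 0;
    for (size_t i = 0; i < COUNT(NAPTRS); i++) {
        for (size_t j = 0; j < COUNT(NAPTRS[i].srv_targets); j++) {
            const char *host = NAPTRS[i].srv_targets[j];
            int cached = 0;
            for (size_t k = 0; k < n_seen; k++)
                if (strcmp(seen[k], host) == 0)
                    cached = 1;
            if (!cached) {             /* first time: real A query */
                assert(n_seen < COUNT(seen));
                seen[n_seen++] = host;
                (*a_lookups)++;
            }
            for (size_t k = 0; k < COUNT(AS); k++)
                if (strcmp(AS[k].host, host) == 0 && n < max)
                    snprintf(out[n++], 32, "%s:%s", NAPTRS[i].proto, AS[k].ip);
        }
    }
    return n;
}
```

With these records it yields tcp:1.2.3.4, tcp:2.2.3.4, tcp:2.2.3.5, udp:1.2.3.4, udp:4.2.3.4, with three A lookups (sip1, sip2, sip4), matching the steps listed.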
regards klaus
Devel mailing list Devel@openser.org http://openser.org/cgi-bin/mailman/listinfo/devel
Hi Klaus,
yes, that is entirely correct. Your example points out exactly what I was warning Christian about: during DNS-based failover, the protocol used may change, depending on the DNS records and the proxy configuration.
regards, Bogdan
On Thursday 25 January 2007 15:10, Bogdan-Andrei Iancu wrote:
Any feedback is appreciated.
For me, it doesn't work at all. I get the same behaviour from OpenSER 1.2.x as with 1.1.x. I can see that SER does an SRV query, which should return 2 entries with different priorities. SER then tries only the entry with the lower priority value (out_sip1). If that fails, the script ends.
Apr 13 14:54:32 sip1 outbound1[17834]: DEBUG: mk_proxy: doing DNS lookup...
Apr 13 14:54:32 sip1 outbound1[17834]: DEBUG:sip_resolvehost2: no port, has proto -> do SRV lookup!
Apr 13 14:54:32 sip1 outbound1[17834]: DEBUG:do_srv_lookup: SRV(_sip._udp.outbounds.domain.net) = out_sip1.domain.net:5060
Apr 13 14:54:32 sip1 outbound1[17834]: check_via_address(123.123.123.123, 123.123.123.123, 0)
Apr 13 14:54:32 sip1 outbound1[17834]: DEBUG:tm:set_timer: relative timeout is 500000
Apr 13 14:54:32 sip1 outbound1[17834]: DEBUG: add_to_tail_of_timer[4]: 0xb3a4bc6c (1760600000)
Apr 13 14:54:32 sip1 outbound1[17834]: DEBUG:tm:set_timer: relative timeout is 2
Apr 13 14:54:32 sip1 outbound1[17834]: DEBUG: add_to_tail_of_timer[0]: 0xb3a4bc88 (1762)
Apr 13 14:54:32 sip1 outbound1[17834]: DEBUG:tm:t_relay_to: new transaction fwd'ed
Apr 13 14:54:32 sip1 outbound1[17834]: DEBUG:tm:UNREF_UNSAFE: after is 0
Apr 13 14:54:32 sip1 outbound1[17834]: DEBUG:destroy_avp_list: destroying list (nil)
Apr 13 14:54:32 sip1 outbound1[17834]: receive_msg: cleaning up
Apr 13 14:54:33 sip1 outbound1[17836]: DEBUG: timer routine:4,tl=0xb3a4bc6c next=(nil), timeout=1760600000
Apr 13 14:54:33 sip1 outbound1[17836]: DEBUG: retransmission_handler : request resending (t=0xb3a4bb20, INVITE si ... )
Apr 13 14:54:33 sip1 outbound1[17836]: DEBUG:tm:set_timer: relative timeout is 1000000
Apr 13 14:54:33 sip1 outbound1[17836]: DEBUG: add_to_tail_of_timer[5]: 0xb3a4bc6c (1761600000)
Apr 13 14:54:33 sip1 outbound1[17836]: DEBUG: retransmission_handler : done
Apr 13 14:54:34 sip1 outbound1[17836]: DEBUG: timer routine:5,tl=0xb3a4bc6c next=(nil), timeout=1761600000
Apr 13 14:54:34 sip1 outbound1[17836]: DEBUG: retransmission_handler : request resending (t=0xb3a4bb20, INVITE si ... )
Apr 13 14:54:34 sip1 outbound1[17836]: DEBUG:tm:set_timer: relative timeout is 2000000
Apr 13 14:54:34 sip1 outbound1[17836]: DEBUG: add_to_tail_of_timer[6]: 0xb3a4bc6c (1763600000)
Apr 13 14:54:34 sip1 outbound1[17836]: DEBUG: retransmission_handler : done
Apr 13 14:54:34 sip1 outbound1[17836]: DEBUG: timer routine:0,tl=0xb3a4bc88 next=(nil), timeout=1762
Apr 13 14:54:34 sip1 outbound1[17836]: DEBUG: final_response_handler:stop retr. and send CANCEL (0xb3a4bb20)
Apr 13 14:54:34 sip1 outbound1[17836]: DEBUG:tm:t_should_relay_response: T_code=100, new_code=408
Apr 13 14:54:34 sip1 outbound1[17836]: DEBUG:tm:t_pick_branch: picked branch 0, code 408
Apr 13 14:54:34 sip1 outbound1[17836]: DEBUG:tm:relay_reply: branch=0, save=0, relay=0
Apr 13 14:54:34 sip1 outbound1[17836]: parse_headers: flags=ffffffffffffffff
Apr 13 14:54:34 sip1 outbound1[17836]: check_via_address(217.114.103.90, 217.114.103.90, 0)
Apr 13 14:54:34 sip1 outbound1[17836]: DEBUG:tm:set_timer: relative timeout is 500000
Apr 13 14:54:34 sip1 outbound1[17836]: DEBUG: add_to_tail_of_timer[4]: 0xb3a4bbe8 (1762500000)
Apr 13 14:54:34 sip1 outbound1[17836]: DEBUG:tm:set_timer: relative timeout is 2
Apr 13 14:54:34 sip1 outbound1[17836]: DEBUG: add_to_tail_of_timer[0]: 0xb3a4bc04 (1764)
Apr 13 14:54:34 sip1 outbound1[17836]: DEBUG:tm:relay_reply: sent buf=0x81651e8: SIP/2.0 4..., shmem=0xb3a4da08: SIP/2.0 4
Apr 13 14:54:34 sip1 outbound1[17836]: DBG: trans=0xb3a4bb20, callback type 128, id 0 entered
# dig _sip._udp.outbounds.domain.net SRV
; <<>> DiG 9.3.4 <<>> _sip._udp.outbounds.domain.net SRV
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36981
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;_sip._udp.outbounds.domain.net. IN SRV

;; ANSWER SECTION:
_sip._udp.outbounds.domain.net. 3600 IN SRV 101 100 5060 out_sip2.domain.net.
_sip._udp.outbounds.domain.net. 3600 IN SRV 100 100 5060 out_sip1.domain.net.
;; Query time: 63 msec
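Per RFC 2782, the SRV record with the lowest priority value is tried first, so with the answer above out_sip1 (priority 100) is the initial target, which matches the do_srv_lookup log line, and out_sip2 (priority 101) is the candidate for failover. A minimal ordering sketch; `struct srv_rr`, `by_priority()` and `sort_srv()` are hypothetical, and the weighted random choice among records of equal priority is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct srv_rr {
    unsigned short prio;    /* lower value = more preferred */
    unsigned short weight;  /* ignored in this sketch */
    unsigned short port;
    const char *target;
};

/* qsort comparator: ascending SRV priority value. */
static int by_priority(const void *a, const void *b)
{
    const struct srv_rr *x = a;
    const struct srv_rr *y = b;
    return (int)x->prio - (int)y->prio;
}

/* Order SRV answers so that index 0 is tried first and each following
 * entry is the next failover candidate. */
static void sort_srv(struct srv_rr *rrs, size_t n)
{
    assert(rrs != NULL || n == 0);
    qsort(rrs, n, sizeof *rrs, by_priority);
}
```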
Hi Alex,
did you receive any provisional reply from UAS before the timeout event?
regards, bogdan
On Friday 13 April 2007 16:40, you wrote:
Hi Alex,
did you receive any provisional reply from UAS before the timeout event?
No, the failing node was inserted for testing purposes. It has nothing listening on that port.
I'm a bit worried about this line:
Apr 13 14:54:32 sip1 outbound1[17834]: DEBUG:do_srv_lookup: SRV(_sip._udp.outbounds.domain.net) = out_sip1.domain.net:5060
Shouldn't it mention both hosts?
Alex Hermann wrote:
shouldn't it mention both hosts?
no - only the one selected for use is printed. I'll try to prepare a patch that adds more debug messages to trace the problem.
regards, bogdan
Hi Alex,
apply the attached patch; it prints some additional debug logs that will help in finding the problem.
Regards, Bogdan
Index: resolve.c
===================================================================
--- resolve.c	(revision 2027)
+++ resolve.c	(working copy)
@@ -722,6 +722,8 @@
 		n->vals[l].ival = get_srv(r)->port;
 		n->vals[l].sval = p;
 		memcpy( p, get_srv(r)->name, get_srv(r)->name_len );
 		p += get_srv(r)->name_len;
 		*(p++) = 0;
+		DBG("DEBUG:a2dns_node: storing %s:%d\n",
+			n->vals[l].sval,n->vals[l].ival);
 	}