### Description
The core DNS resolver fails to return a valid IP when the DNS reply contains too many SRV records. It behaves as if no records were found, so the request is not relayed and a 478 reply is generated instead (for example, when a DNS name is set in $ru or $du).
### Troubleshooting
#### Reproduction
It is easy to reproduce with DNS failover and NAPTR enabled (see the parameters under Additional Information) and with DNS records such as:
```
# dig +short NAPTR ko.sip.provider.com
50 30 "S" "SIP+D2U" "" _sip._udp.ko.sip.provider.com.

# dig +short SRV _sip._udp.ko.sip.provider.com.
10 10 5060 endpoint-01.k0.sip.provider.com.
10 10 5060 endpoint-02.k0.sip.provider.com.
10 10 5060 endpoint-03.k0.sip.provider.com.
10 10 5060 endpoint-04.k0.sip.provider.com.
10 10 5060 endpoint-05.k0.sip.provider.com.
10 10 5060 endpoint-06.k0.sip.provider.com.
10 10 5060 endpoint-07.k0.sip.provider.com.
10 10 5060 endpoint-08.k0.sip.provider.com.
10 10 5060 endpoint-09.k0.sip.provider.com.

# Each SRV result above has a corresponding
# 'A' record so that command below gives a correct IP:
# dig +short A endpoint-01.k0.sip.provider.com.
```
To reproduce, relay a request towards it, like: `$du="sip:ko.sip.provider.com"`
#### Debugging data
One interesting thing is that Kamailio behaves exactly like the `sip-dig` tool, and `sip-dig` seems to be limited in the DNS reply size it can handle (see my notes on the RFCs under Possible Solutions). Does Kamailio have the same kind of limitation in its DNS resolution?
#### Log Messages
##### Failure example: with 9 SRV records
```
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (ko.sip.provider.com(26), 35), h=275
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff6a20f0000, 0x7ff6a27777d8), called from core: core/dns_cache.c: dns_destroy_entry(151)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff6a27777a0 alloc'ed from core: core/dns_cache.c: dns_cache_mk_rd_entry(1110)
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 58) called from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 64) returns address 0x7ff72363d8f8 frag. 0x7ff72363d8c0 (size=64) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 92) called from core: core/resolve.c: dns_naptr_parser(405)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 96) returns address 0x7ff72363d9a0 frag. 0x7ff72363d968 (size=96) on 1 -th hit
DEBUG: <core> [core/resolve.c:984]: get_record(): skipping 0 NS (p=0x558fb300dba7, end=0x558fb300dba7)
DEBUG: <core> [core/resolve.c:997]: get_record(): parsing 0 ARs (p=0x558fb300dba7, end=0x558fb300dba7)
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 216) called from core: core/dns_cache.c: dns_cache_mk_rd_entry(1110)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 216) returns address 0x7ff6a27748a8 frag. 0x7ff6a2774870 (size=232) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff72363d9a0), called from core: core/resolve.c: free_rdata_list(678)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff72363d968 alloc'ed from core: core/resolve.c: dns_naptr_parser(405)
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff72363d8f8), called from core: core/resolve.c: free_rdata_list(679)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff72363d8c0 alloc'ed from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/dns_cache.c:1633]: dns_get_related(): (0x7ff6a27748a8 (ko.sip.provider.com, 35), 35, *(nil)) (0)
DEBUG: <core> [core/dns_cache.c:739]: dns_cache_add_unsafe(): adding ko.sip.provider.com(26) 35 (flags=0) at 275
DEBUG: <core> [core/dns_cache.c:2614]: dns_naptr_sip_iterate(): found a valid sip NAPTR rr _sip._udp.ko.sip.provider.com, proto 1
DEBUG: <core> [core/resolve.c:1182]: naptr_choose(): o:-1 w:-1 p:0, o:50 w:30 p:1
DEBUG: <core> [core/resolve.c:1197]: naptr_choose(): changed
DEBUG: <core> [core/dns_cache.c:2625]: dns_naptr_sip_iterate(): choosed NAPTR rr _sip._udp.ko.sip.provider.com, proto 1 tried: 0x0
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (_sip._udp.ko.sip.provider.com(36), 33), h=989
DEBUG: <core> [core/dns_cache.c:3041]: dns_srv_resolve_ip(): ("_sip._udp.ko.sip.provider.com", 0, 0), ret=-5, ip=
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (_sip._udp.ko.sip.provider.com(36), 33), h=989
DEBUG: <core> [core/dns_cache.c:3041]: dns_srv_resolve_ip(): ("_sip._udp.ko.sip.provider.com", 0, 0), ret=-5, ip=
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (_sip._tcp.ko.sip.provider.com(36), 33), h=772
DEBUG: <core> [core/dns_cache.c:3041]: dns_srv_resolve_ip(): ("_sip._tcp.ko.sip.provider.com", 0, 0), ret=-5, ip=
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (_sips._tcp.ko.sip.provider.com(37), 33), h=786
DEBUG: <core> [core/dns_cache.c:3041]: dns_srv_resolve_ip(): ("_sips._tcp.ko.sip.provider.com", 0, 0), ret=-5, ip=
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (ko.sip.provider.com(26), 1), h=275
DEBUG: <core> [core/dns_cache.c:2803]: dns_a_resolve(): (ko.sip.provider.com, 0) returning -7
DEBUG: <core> [core/dns_cache.c:3167]: dns_srv_sip_resolve(): (ko.sip.provider.com, 0, 0), ip, ret=-7
ERROR: tm [ut.h:284]: uri2dst2(): failed to resolve "ko.sip.provider.com" :unresolvable A or AAAA request (-7)
```
##### Comparison with a working example (only 3 SRV records)
```
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (ok.sip.provider.com(26), 35), h=275
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 58) called from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 64) returns address 0x7ff723613ff8 frag. 0x7ff723613fc0 (size=64) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 92) called from core: core/resolve.c: dns_naptr_parser(405)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 96) returns address 0x7ff7236140a0 frag. 0x7ff723614068 (size=96) on 1 -th hit
DEBUG: <core> [core/resolve.c:984]: get_record(): skipping 0 NS (p=0x558fb300dba7, end=0x558fb300dba7)
DEBUG: <core> [core/resolve.c:997]: get_record(): parsing 0 ARs (p=0x558fb300dba7, end=0x558fb300dba7)
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 216) called from core: core/dns_cache.c: dns_cache_mk_rd_entry(1110)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 216) returns address 0x7ff6a27755b8 frag. 0x7ff6a2775580 (size=376) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff7236140a0), called from core: core/resolve.c: free_rdata_list(678)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723614068 alloc'ed from core: core/resolve.c: dns_naptr_parser(405)
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff723613ff8), called from core: core/resolve.c: free_rdata_list(679)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723613fc0 alloc'ed from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/dns_cache.c:1633]: dns_get_related(): (0x7ff6a27755b8 (ok.sip.provider.com, 35), 35, *(nil)) (0)
DEBUG: <core> [core/dns_cache.c:739]: dns_cache_add_unsafe(): adding ok.sip.provider.com(26) 35 (flags=0) at 275
DEBUG: <core> [core/dns_cache.c:2614]: dns_naptr_sip_iterate(): found a valid sip NAPTR rr _sip._udp.ok.sip.provider.com, proto 1
DEBUG: <core> [core/resolve.c:1182]: naptr_choose(): o:-1 w:-1 p:0, o:50 w:30 p:1
DEBUG: <core> [core/resolve.c:1197]: naptr_choose(): changed
DEBUG: <core> [core/dns_cache.c:2625]: dns_naptr_sip_iterate(): choosed NAPTR rr _sip._udp.ok.sip.provider.com, proto 1 tried: 0x0
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (_sip._udp.ok.sip.provider.com(36), 33), h=989
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 68) called from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 72) returns address 0x7ff723613ff8 frag. 0x7ff723613fc0 (size=72) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 46) called from core: core/resolve.c: dns_srv_parser(318)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 48) returns address 0x7ff7236140a8 frag. 0x7ff723614070 (size=48) on 1 -th hit
DEBUG: <core> [core/resolve.c:984]: get_record(): skipping 0 NS (p=0x558fb300dbb4, end=0x558fb300dbb4)
DEBUG: <core> [core/resolve.c:997]: get_record(): parsing 0 ARs (p=0x558fb300dbb4, end=0x558fb300dbb4)
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 176) called from core: core/dns_cache.c: dns_cache_mk_rd_entry(1110)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 176) returns address 0x7ff6a2775900 frag. 0x7ff6a27758c8 (size=176) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff7236140a8), called from core: core/resolve.c: free_rdata_list(678)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723614070 alloc'ed from core: core/resolve.c: dns_srv_parser(318)
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff723613ff8), called from core: core/resolve.c: free_rdata_list(679)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723613fc0 alloc'ed from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/dns_cache.c:1633]: dns_get_related(): (0x7ff6a2775900 (_sip._udp.ok.sip.provider.com, 33), 33, *(nil)) (0)
DEBUG: <core> [core/dns_cache.c:739]: dns_cache_add_unsafe(): adding _sip._udp.ok.sip.provider.com(36) 33 (flags=0) at 989
DEBUG: <core> [core/dns_cache.c:2222]: dns_srv_get_nxt_rr(): (0x7ff6a2775900, 0, 0, 1457300027): selected 0/1 in grp. 0 (rand_w=0, rr=0x7ff6a2775968 rd=0x7ff6a2775980 p=10 w=10 rsum=10)
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (endpoint.ok.sip.provider.com(38), 1), h=530
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 70) called from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 72) returns address 0x7ff723613ff8 frag. 0x7ff723613fc0 (size=72) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 4) called from core: core/resolve.c: dns_a_parser(474)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 8) returns address 0x7ff7236140a8 frag. 0x7ff723614070 (size=8) on 1 -th hit
DEBUG: <core> [core/resolve.c:984]: get_record(): skipping 0 NS (p=0x558fb300db8e, end=0x558fb300db8e)
DEBUG: <core> [core/resolve.c:997]: get_record(): parsing 0 ARs (p=0x558fb300db8e, end=0x558fb300db8e)
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 136) called from core: core/dns_cache.c: dns_cache_mk_rd_entry(1110)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 136) returns address 0x7ff6a2775a18 frag. 0x7ff6a27759e0 (size=136) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff7236140a8), called from core: core/resolve.c: free_rdata_list(678)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723614070 alloc'ed from core: core/resolve.c: dns_a_parser(474)
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff723613ff8), called from core: core/resolve.c: free_rdata_list(679)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723613fc0 alloc'ed from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/dns_cache.c:1633]: dns_get_related(): (0x7ff6a2775a18 (endpoint.ok.sip.provider.com, 1), 1, *(nil)) (0)
DEBUG: <core> [core/dns_cache.c:739]: dns_cache_add_unsafe(): adding endpoint.ok.sip.provider.com(38) 1 (flags=0) at 530
DEBUG: <core> [core/dns_cache.c:2803]: dns_a_resolve(): (endpoint.ok.sip.provider.com, 0) returning 0
DEBUG: <core> [core/dns_cache.c:3041]: dns_srv_resolve_ip(): ("_sip._udp.ok.sip.provider.com", 0, 0), ret=0, ip=[RESOLVED_IP]
DEBUG: <core> [core/dns_cache.c:3241]: dns_naptr_sip_resolve(): (ok.sip.provider.com, 0, 0), srv0, ret=0
```
### Possible Solutions
I had a quick look at the code and did not find any limit on the maximum number of records. There are some maximums defined in `dns_cache.c`, but I did not find any relation between them and my issue.
Could there be a limit on the result size? Here is what I found in the RFCs on that point:
* Extract from **RFC 2782** (DNS SRV RR, mentioned in RFC 3263 as the RFC to follow when implementing DNS in SIP):

  > Currently there's a practical limit of 512 bytes for DNS replies. Until all resolvers can handle larger responses, domain administrators are strongly advised to keep their SRV replies below 512 bytes.

* There is also an RFC that covers how to deal with truncated messages:

  > If a truncated response comes back from an SRV query, the rules described in RFC 2181 (https://tools.ietf.org/html/rfc2181#page-11) shall apply.
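For reference, truncation is signalled by the TC bit in the DNS message header (RFC 1035, section 4.1.1), and a resolver that sees it is expected to retry over TCP. The following is only a minimal sketch of that check on raw message bytes; `is_truncated` and `make_header` are hypothetical helper names used for illustration and are not taken from Kamailio or libc.

```python
import struct

# TC (truncation) bit inside the 16-bit flags field of the DNS header.
TC_FLAG = 0x0200


def is_truncated(response: bytes) -> bool:
    """Return True if the DNS message header has the TC bit set."""
    if len(response) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    _msg_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & TC_FLAG)


def make_header(msg_id: int, flags: int, ancount: int = 0) -> bytes:
    """Hypothetical helper: build a bare 12-byte DNS header for the demo."""
    # Fields: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    return struct.pack("!HHHHHH", msg_id, flags, 1, ancount, 0, 0)


# 0x8180 = response (QR), recursion desired and available, RCODE 0.
complete = make_header(0x1234, 0x8180, ancount=3)
truncated = make_header(0x1234, 0x8180 | TC_FLAG)

for label, msg in (("complete", complete), ("truncated", truncated)):
    action = "retry over TCP" if is_truncated(msg) else "use the answer as-is"
    print(f"{label}: {action}")
```

A resolver that ignores this bit, or that cannot open the TCP connection for the retry, ends up with no usable SRV answer, which would look like the failure log above.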
### Additional Information
* **Kamailio Version** - kamailio 5.3.8
```
dns_try_naptr=yes
dns_tcp_pref = 1
dns_udp_pref = 1
dns_tls_pref = 1
dns_srv_lb=yes
use_dns_failover=yes
use_dns_cache=yes
dns_cache_max_ttl=30
```
* **Operating System**: Debian 9.13 on Docker
Thanks
I haven't implemented the DNS code in Kamailio, but if you didn't find any define setting such limits in our C code and other external tools behave the same way, then maybe the limit comes from the libc DNS resolver functions.
Could it be a DNS over TCP issue?
I remember testing with crazy size SRV record sets on SIPit and don't remember any issues. Just make sure your firewall supports DNS/TCP too.
Things could have changed since then, so don't take for granted that it works today :-)
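To check DNS over TCP independently of `dig`, one option is to send the SRV query by hand. The sketch below builds a minimal query and frames it with the 2-byte length prefix that DNS over TCP requires (RFC 1035, section 4.2.2). All function names are hypothetical, and the resolver address in the usage note is a placeholder.

```python
import socket
import struct


def build_query(name, qtype=33, msg_id=0x1234):
    """Build a minimal DNS query (QTYPE 33 = SRV, class IN, RD flag set)."""
    header = struct.pack("!HHHHHH", msg_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def tcp_frame(message):
    """Prefix the message with its length as a 2-byte big-endian integer."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock, count):
    """Read exactly `count` bytes from the socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before the full message arrived")
        data += chunk
    return data


def query_over_tcp(server, name, timeout=3.0):
    """Send one SRV query to TCP port 53 and return the raw response bytes."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(tcp_frame(build_query(name)))
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)


query = build_query("_sip._udp.ko.sip.provider.com")
print(len(query), len(tcp_frame(query)))
```

Calling `query_over_tcp("192.0.2.53", "_sip._udp.ko.sip.provider.com")` with the real resolver address in place of the placeholder should return the full reply if TCP port 53 is reachable from the Kamailio host or container.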
Thanks for your appreciated replies!
I do not think this is a firewall issue, because I get no issues when using the `dig` command. I tried with two different sets of options:

* `dig +bufsize=512`: got `Truncated, retrying in TCP mode`, then a correct reply received through TCP
* or simple `dig`: in that case I get the answer directly through UDP
But of course it does not work when disallowing the TCP retry and setting a 512-byte buffer size (`dig +bufsize=512 +ignore`).
Tests now clearly show a limit based on packet size (512 bytes), but I still do not know precisely where it comes from. I will investigate further when I have some time.
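To get a feel for how quickly an SRV reply approaches 512 bytes, here is a rough estimate of the wire size. It is only a sketch under stated assumptions: the owner name in each answer is compressed to a 2-byte pointer, the SRV target is not compressed (RFC 2782 forbids compressing it), and there are no authority or additional records. The names are the anonymized ones from this report, so the real sizes will differ.

```python
def encoded_name_len(name):
    """Wire length of an uncompressed DNS name: one length byte per label, plus the root byte."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1


def srv_reply_size(owner, targets, compress_owner=True):
    """Estimate the size in bytes of a DNS reply carrying one SRV record per target."""
    size = 12                              # fixed DNS header
    size += encoded_name_len(owner) + 4    # question: QNAME + QTYPE + QCLASS
    for target in targets:
        owner_len = 2 if compress_owner else encoded_name_len(owner)
        fixed = 10                             # TYPE, CLASS, TTL, RDLENGTH
        rdata = 6 + encoded_name_len(target)   # priority, weight, port, target
        size += owner_len + fixed + rdata
    return size


def make_targets(count):
    """Anonymized target names in the style used in this report."""
    return [f"endpoint-{i:02d}.k0.sip.provider.com." for i in range(1, count + 1)]


owner = "_sip._udp.ko.sip.provider.com."
for count in (3, 9, 10):
    size = srv_reply_size(owner, make_targets(count))
    verdict = "fits in" if size <= 512 else "exceeds"
    print(f"{count} SRV records: {size} bytes ({verdict} 512)")
```

With these anonymized names each SRV record adds 51 bytes, so 9 records come to 506 bytes, just under the limit, and a tenth pushes the reply over it. Slightly longer real host names would be enough for the 9-record reply to exceed 512 bytes.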
@jklingenmeyer - have you had any time to dig in further? Is it libc/OS limitation after all, or something inside Kamailio code?
No activity for a long time, and it may be a limitation coming from library functions, as commented above.
If new troubleshooting details become available, reopen.
Closed #2651.