[sr-dev] [kamailio/kamailio] DNS destination resolution failure when too many SRV records (#2651)

jklingenmeyer notifications at github.com
Wed Feb 24 17:59:59 CET 2021


### Description

The core DNS resolver fails to return a valid IP address when the DNS reply contains too many SRV records.
It behaves as if no records were found, so the request is not relayed and a 478 reply is generated instead (for example, when a DNS name is set in `$ru` or `$du`).

### Troubleshooting

#### Reproduction

It is easy to reproduce with DNS failover and NAPTR enabled (see the parameters under Additional Information)
and with DNS records such as:

```
# dig +short NAPTR ko.sip.provider.com
50 30 "S" "SIP+D2U" "" _sip._udp.ko.sip.provider.com.

# dig +short SRV _sip._udp.ko.sip.provider.com.
10 10 5060 endpoint-01.k0.sip.provider.com.
10 10 5060 endpoint-02.k0.sip.provider.com.
10 10 5060 endpoint-03.k0.sip.provider.com.
10 10 5060 endpoint-04.k0.sip.provider.com.
10 10 5060 endpoint-05.k0.sip.provider.com.
10 10 5060 endpoint-06.k0.sip.provider.com.
10 10 5060 endpoint-07.k0.sip.provider.com.
10 10 5060 endpoint-08.k0.sip.provider.com.
10 10 5060 endpoint-09.k0.sip.provider.com.

# Each SRV result above has a corresponding
# 'A' record so that command below gives a correct IP:
# dig +short A endpoint-01.k0.sip.provider.com.
```

To reproduce, relay a request to that name, for example:
 `$du="sip:ko.sip.provider.com"`

#### Debugging data

One interesting thing is that Kamailio behaves exactly the same as the `sip-dig` tool.
But `sip-dig` seems to be limited in the DNS reply size it can handle (see the RFC excerpts under Possible Solutions).
Does Kamailio have the same kind of limitation in its DNS resolution?

#### Log Messages

##### Failure example: with 9 SRV records

```
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (ko.sip.provider.com(26), 35), h=275
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff6a20f0000, 0x7ff6a27777d8), called from core: core/dns_cache.c: dns_destroy_entry(151)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff6a27777a0 alloc'ed from core: core/dns_cache.c: dns_cache_mk_rd_entry(1110)
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 58) called from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 64) returns address 0x7ff72363d8f8 frag. 0x7ff72363d8c0 (size=64) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 92) called from core: core/resolve.c: dns_naptr_parser(405)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 96) returns address 0x7ff72363d9a0 frag. 0x7ff72363d968 (size=96) on 1 -th hit
DEBUG: <core> [core/resolve.c:984]: get_record(): skipping 0 NS (p=0x558fb300dba7, end=0x558fb300dba7)
DEBUG: <core> [core/resolve.c:997]: get_record(): parsing 0 ARs (p=0x558fb300dba7, end=0x558fb300dba7)
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 216) called from core: core/dns_cache.c: dns_cache_mk_rd_entry(1110)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 216) returns address 0x7ff6a27748a8 frag. 0x7ff6a2774870 (size=232) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff72363d9a0), called from core: core/resolve.c: free_rdata_list(678)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff72363d968 alloc'ed from core: core/resolve.c: dns_naptr_parser(405)
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff72363d8f8), called from core: core/resolve.c: free_rdata_list(679)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff72363d8c0 alloc'ed from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/dns_cache.c:1633]: dns_get_related(): (0x7ff6a27748a8 (ko.sip.provider.com, 35), 35, *(nil)) (0)
DEBUG: <core> [core/dns_cache.c:739]: dns_cache_add_unsafe(): adding ko.sip.provider.com(26) 35 (flags=0) at 275
DEBUG: <core> [core/dns_cache.c:2614]: dns_naptr_sip_iterate(): found a valid sip NAPTR rr _sip._udp.ko.sip.provider.com, proto 1
DEBUG: <core> [core/resolve.c:1182]: naptr_choose(): o:-1 w:-1 p:0, o:50 w:30 p:1
DEBUG: <core> [core/resolve.c:1197]: naptr_choose(): changed
DEBUG: <core> [core/dns_cache.c:2625]: dns_naptr_sip_iterate(): choosed NAPTR rr _sip._udp.ko.sip.provider.com, proto 1 tried: 0x0
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (_sip._udp.ko.sip.provider.com(36), 33), h=989
DEBUG: <core> [core/dns_cache.c:3041]: dns_srv_resolve_ip(): ("_sip._udp.ko.sip.provider.com", 0, 0), ret=-5, ip=
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (_sip._udp.ko.sip.provider.com(36), 33), h=989
DEBUG: <core> [core/dns_cache.c:3041]: dns_srv_resolve_ip(): ("_sip._udp.ko.sip.provider.com", 0, 0), ret=-5, ip=
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (_sip._tcp.ko.sip.provider.com(36), 33), h=772
DEBUG: <core> [core/dns_cache.c:3041]: dns_srv_resolve_ip(): ("_sip._tcp.ko.sip.provider.com", 0, 0), ret=-5, ip=
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (_sips._tcp.ko.sip.provider.com(37), 33), h=786
DEBUG: <core> [core/dns_cache.c:3041]: dns_srv_resolve_ip(): ("_sips._tcp.ko.sip.provider.com", 0, 0), ret=-5, ip=
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (ko.sip.provider.com(26), 1), h=275
DEBUG: <core> [core/dns_cache.c:2803]: dns_a_resolve(): (ko.sip.provider.com, 0) returning -7
DEBUG: <core> [core/dns_cache.c:3167]: dns_srv_sip_resolve(): (ko.sip.provider.com, 0, 0), ip, ret=-7
ERROR: tm [ut.h:284]: uri2dst2(): failed to resolve "ko.sip.provider.com" :unresolvable A or AAAA request (-7)
```
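
To compare the failing and working traces more easily, the resolver result lines can be pulled out of the debug output. The following is only a minimal sketch: the regular expression is an assumption based on the line shapes shown in this report (`ret=<n>` and `returning <n>`), and `resolver_results` is a hypothetical helper name, not part of Kamailio.

```python
import re

# Assumed line shape, taken from the logs in this report:
#   "...]: <func>(): (<name>, ...) ... ret=<n>"  or  "... returning <n>"
RESULT_RE = re.compile(
    r'\]: (?P<func>\w+)\(\): \("?(?P<name>[^",\s)]+)"?.*?'
    r'(?:ret=|returning )(?P<code>-?\d+)'
)


def resolver_results(log_text):
    """Return (function, queried name, return code) for each result line."""
    results = []
    for line in log_text.splitlines():
        match = RESULT_RE.search(line)
        if match:
            results.append((match["func"], match["name"], int(match["code"])))
    return results


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py < kamailio_debug.log
    for func, name, code in resolver_results(sys.stdin.read()):
        print(f"{func:24} {name:40} {code}")
```

Run against the failing trace, this lists every SRV lookup with `-5` followed by the final `-7`, which the `ERROR` line reports as "unresolvable A or AAAA request".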

##### Comparison with a working example (only 3 SRV records)

```
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (ok.sip.provider.com(26), 35), h=275
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 58) called from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 64) returns address 0x7ff723613ff8 frag. 0x7ff723613fc0 (size=64) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 92) called from core: core/resolve.c: dns_naptr_parser(405)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 96) returns address 0x7ff7236140a0 frag. 0x7ff723614068 (size=96) on 1 -th hit
DEBUG: <core> [core/resolve.c:984]: get_record(): skipping 0 NS (p=0x558fb300dba7, end=0x558fb300dba7)
DEBUG: <core> [core/resolve.c:997]: get_record(): parsing 0 ARs (p=0x558fb300dba7, end=0x558fb300dba7)
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 216) called from core: core/dns_cache.c: dns_cache_mk_rd_entry(1110)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 216) returns address 0x7ff6a27755b8 frag. 0x7ff6a2775580 (size=376) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff7236140a0), called from core: core/resolve.c: free_rdata_list(678)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723614068 alloc'ed from core: core/resolve.c: dns_naptr_parser(405)
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff723613ff8), called from core: core/resolve.c: free_rdata_list(679)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723613fc0 alloc'ed from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/dns_cache.c:1633]: dns_get_related(): (0x7ff6a27755b8 (ok.sip.provider.com, 35), 35, *(nil)) (0)
DEBUG: <core> [core/dns_cache.c:739]: dns_cache_add_unsafe(): adding ok.sip.provider.com(26) 35 (flags=0) at 275
DEBUG: <core> [core/dns_cache.c:2614]: dns_naptr_sip_iterate(): found a valid sip NAPTR rr _sip._udp.ok.sip.provider.com, proto 1
DEBUG: <core> [core/resolve.c:1182]: naptr_choose(): o:-1 w:-1 p:0, o:50 w:30 p:1
DEBUG: <core> [core/resolve.c:1197]: naptr_choose(): changed
DEBUG: <core> [core/dns_cache.c:2625]: dns_naptr_sip_iterate(): choosed NAPTR rr _sip._udp.ok.sip.provider.com, proto 1 tried: 0x0
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (_sip._udp.ok.sip.provider.com(36), 33), h=989
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 68) called from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 72) returns address 0x7ff723613ff8 frag. 0x7ff723613fc0 (size=72) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 46) called from core: core/resolve.c: dns_srv_parser(318)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 48) returns address 0x7ff7236140a8 frag. 0x7ff723614070 (size=48) on 1 -th hit
DEBUG: <core> [core/resolve.c:984]: get_record(): skipping 0 NS (p=0x558fb300dbb4, end=0x558fb300dbb4)
DEBUG: <core> [core/resolve.c:997]: get_record(): parsing 0 ARs (p=0x558fb300dbb4, end=0x558fb300dbb4)
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 176) called from core: core/dns_cache.c: dns_cache_mk_rd_entry(1110)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 176) returns address 0x7ff6a2775900 frag. 0x7ff6a27758c8 (size=176) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff7236140a8), called from core: core/resolve.c: free_rdata_list(678)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723614070 alloc'ed from core: core/resolve.c: dns_srv_parser(318)
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff723613ff8), called from core: core/resolve.c: free_rdata_list(679)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723613fc0 alloc'ed from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/dns_cache.c:1633]: dns_get_related(): (0x7ff6a2775900 (_sip._udp.ok.sip.provider.com, 33), 33, *(nil)) (0)
DEBUG: <core> [core/dns_cache.c:739]: dns_cache_add_unsafe(): adding _sip._udp.ok.sip.provider.com(36) 33 (flags=0) at 989
DEBUG: <core> [core/dns_cache.c:2222]: dns_srv_get_nxt_rr(): (0x7ff6a2775900, 0, 0, 1457300027): selected 0/1 in grp. 0 (rand_w=0, rr=0x7ff6a2775968 rd=0x7ff6a2775980 p=10 w=10 rsum=10)
DEBUG: <core> [core/dns_cache.c:527]: _dns_hash_find(): (endpoint.ok.sip.provider.com(38), 1), h=530
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 70) called from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 72) returns address 0x7ff723613ff8 frag. 0x7ff723613fc0 (size=72) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff7234f4010, 4) called from core: core/resolve.c: dns_a_parser(474)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff7234f4010, 8) returns address 0x7ff7236140a8 frag. 0x7ff723614070 (size=8) on 1 -th hit
DEBUG: <core> [core/resolve.c:984]: get_record(): skipping 0 NS (p=0x558fb300db8e, end=0x558fb300db8e)
DEBUG: <core> [core/resolve.c:997]: get_record(): parsing 0 ARs (p=0x558fb300db8e, end=0x558fb300db8e)
DEBUG: <core> [core/mem/q_malloc.c:374]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 136) called from core: core/dns_cache.c: dns_cache_mk_rd_entry(1110)
DEBUG: <core> [core/mem/q_malloc.c:419]: qm_malloc(): qm_malloc(0x7ff6a20f0000, 136) returns address 0x7ff6a2775a18 frag. 0x7ff6a27759e0 (size=136) on 1 -th hit
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff7236140a8), called from core: core/resolve.c: free_rdata_list(678)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723614070 alloc'ed from core: core/resolve.c: dns_a_parser(474)
DEBUG: <core> [core/mem/q_malloc.c:482]: qm_free(): qm_free(0x7ff7234f4010, 0x7ff723613ff8), called from core: core/resolve.c: free_rdata_list(679)
DEBUG: <core> [core/mem/q_malloc.c:526]: qm_free(): freeing frag. 0x7ff723613fc0 alloc'ed from core: core/resolve.c: get_record(862)
DEBUG: <core> [core/dns_cache.c:1633]: dns_get_related(): (0x7ff6a2775a18 (endpoint.ok.sip.provider.com, 1), 1, *(nil)) (0)
DEBUG: <core> [core/dns_cache.c:739]: dns_cache_add_unsafe(): adding endpoint.ok.sip.provider.com(38) 1 (flags=0) at 530
DEBUG: <core> [core/dns_cache.c:2803]: dns_a_resolve(): (endpoint.ok.sip.provider.com, 0) returning 0
DEBUG: <core> [core/dns_cache.c:3041]: dns_srv_resolve_ip(): ("_sip._udp.ok.sip.provider.com", 0, 0), ret=0, ip=[RESOLVED_IP]
DEBUG: <core> [core/dns_cache.c:3241]: dns_naptr_sip_resolve(): (ok.sip.provider.com, 0, 0), srv0, ret=0
```

### Possible Solutions

I had a quick look at the code and did not find any limit on the maximum number of records.
There are some maximums defined in `dns_cache.c`, but I did not find any relation between them and my issue.

Could there be a limit on the reply size? Here is what I found in the RFCs on that point:

* Extract from **RFC 2782** (the DNS SRV RR specification, cited by RFC 3263 as the one to follow for DNS in SIP):

> Currently there's a practical limit of 512 bytes for DNS replies.
> Until all resolvers can handle larger responses, domain administrators are strongly advised to keep their SRV replies below 512 bytes.
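
To get a feel for how quickly SRV replies approach that limit, the reply size can be estimated from the record names. This is a rough sketch under stated assumptions: the answer owner names are sent as 2-byte compression pointers, SRV targets are sent uncompressed, and there are no authority, additional, or EDNS records. The function names are hypothetical.

```python
DNS_HEADER_LEN = 12        # fixed DNS header size (RFC 1035)
CLASSIC_UDP_LIMIT = 512    # UDP payload limit when EDNS0 is not used


def wire_name_len(name):
    """Length in bytes of an uncompressed domain name on the wire."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    # One length byte per label, plus the terminating root label.
    return sum(1 + len(label) for label in labels) + 1


def estimate_srv_reply_size(query_name, targets):
    """Estimate the size of an SRV reply carrying one record per target."""
    size = DNS_HEADER_LEN + wire_name_len(query_name) + 4  # QTYPE + QCLASS
    for target in targets:
        # Owner name as a 2-byte compression pointer (assumption),
        # TYPE(2) + CLASS(2) + TTL(4) + RDLENGTH(2) = 10 bytes,
        # RDATA = priority(2) + weight(2) + port(2) + uncompressed target.
        size += 2 + 10 + 6 + wire_name_len(target)
    return size


targets = [f"endpoint-{i:02d}.k0.sip.provider.com." for i in range(1, 10)]
size = estimate_srv_reply_size("_sip._udp.ko.sip.provider.com.", targets)
verdict = "over" if size > CLASSIC_UDP_LIMIT else "within"
print(f"{size} bytes, {verdict} the {CLASSIC_UDP_LIMIT}-byte limit")
```

With the names exactly as written in this report, the nine-record estimate lands just under 512 bytes. The name lengths printed in the logs (for example `(36)` for the SRV name, which is 29 characters as shown here) suggest the real names are longer, so the actual reply is likely over the limit.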

* The same RFC also says how to deal with truncated messages:

> If a truncated response comes back from an SRV query, the rules described in RFC 2181 (https://tools.ietf.org/html/rfc2181#page-11) shall apply.
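
In practice, the RFC 2181 rule amounts to checking the TC (truncated) bit in the reply header and, if it is set, discarding that reply and querying again over a transport that allows larger replies, such as TCP. The sketch below illustrates this with the Python standard library only. It is not how Kamailio resolves names, and the helper names and the `server` argument are hypothetical.

```python
import socket
import struct

TC_FLAG = 0x0200   # "truncated" bit in the DNS header flags word (RFC 1035)
TYPE_SRV = 33
CLASS_IN = 1


def build_query(name, qtype=TYPE_SRV, query_id=0x1234):
    """Build a minimal DNS query with one question and recursion desired."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, CLASS_IN)


def parse_header(response):
    """Extract the fields needed to detect a truncated reply."""
    query_id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return {
        "id": query_id,
        "truncated": bool(flags & TC_FLAG),
        "rcode": flags & 0x000F,
        "answers": ancount,
    }


def recv_exact(sock, count):
    """Read exactly `count` bytes from a stream socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full DNS message arrived")
        data += chunk
    return data


def query_with_tcp_fallback(name, server, timeout=2.0):
    """Query over UDP first; if the reply is truncated, retry over TCP."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server, 53))
        response, _ = udp.recvfrom(4096)
    if not parse_header(response)["truncated"]:
        return response
    # Truncated: ignore the partial reply and ask again over TCP, where
    # each message is prefixed with a 2-byte length.
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        tcp.sendall(struct.pack("!H", len(query)) + query)
        length = struct.unpack("!H", recv_exact(tcp, 2))[0]
        return recv_exact(tcp, length)
```

Calling `parse_header(query_with_tcp_fallback("_sip._udp.ko.sip.provider.com", "<resolver IP>"))` against the resolver used by Kamailio would show whether the nine-record reply only fits over TCP.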

### Additional Information

* **Kamailio Version**: kamailio 5.3.8
* **DNS parameters used**:

```
dns_try_naptr=yes
dns_tcp_pref = 1
dns_udp_pref = 1
dns_tls_pref = 1
dns_srv_lb=yes
use_dns_failover=yes
use_dns_cache=yes
dns_cache_max_ttl=30
```

* **Operating System**: Debian 9.13 on Docker

Thanks

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/2651