Hello Henning and Richard,
Henning Westerholt pointed me to the relevant code:
You find the implementation of the MTU handling in the
src/core/udp_server.c file. It's just setting the appropriate socket option right now.
I think I found a few bugs, centering around
https://github.com/kamailio/kamailio/blob/master/src/core/udp_server.c#L331…
The file clearly shows how the option is processed,
(pmtu_discovery) ? IP_PMTUDISC_DO : IP_PMTUDISC_DONT
This is IPv4-only, and it looks like a bug that the address family
is not checked before the option is set. Note that Linux defines
/usr/include/linux/in6.h: #define IPV6_MTU_DISCOVER 23
/usr/include/linux/in.h: #define IP_MTU_DISCOVER 10
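To illustrate the family check I have in mind, here is a minimal sketch, not the Kamailio code. The helper name set_pmtu_discovery() and its parameters are made up for this mail; it assumes Linux with glibc headers, where these constants are available through <netinet/in.h>.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper (not from udp_server.c): pick the PMTU discovery
 * option that matches the socket's address family. */
static int set_pmtu_discovery(int sock, int family, int enable)
{
    int opt;

    switch (family) {
    case AF_INET:
        /* IPv4: IP_PMTUDISC_DO sets the DF flag on outgoing packets. */
        opt = enable ? IP_PMTUDISC_DO : IP_PMTUDISC_DONT;
        return setsockopt(sock, IPPROTO_IP, IP_MTU_DISCOVER,
                          &opt, sizeof(opt));
    case AF_INET6:
        /* IPv6: same idea, but a different level and option name. */
        opt = enable ? IPV6_PMTUDISC_DO : IPV6_PMTUDISC_DONT;
        return setsockopt(sock, IPPROTO_IPV6, IPV6_MTU_DISCOVER,
                          &opt, sizeof(opt));
    default:
        /* Anything else is not an IP socket; refuse instead of guessing. */
        errno = EAFNOSUPPORT;
        return -1;
    }
}
```

A caller would pass the family of the listening address, for example set_pmtu_discovery(sock, AF_INET6, 1) for my IPv6-only setup.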
In general, Path MTU discovery only applies to connected sockets,
which is not what happens in udp_server.c -- the IPv4 version sets
the DF flag, which made me wonder if that actually gets handled at all.
The file does set the IP_RECVERR flag described in ip(7), which is intended
for such connectionless MTU handling. For IPv6, there is an IPV6_RECVERR,
/usr/include/linux/in6.h: #define IPV6_RECVERR 25
/usr/include/linux/in.h: #define IP_RECVERR 11
The IPv6 variant is absent from the file, which would be another bug.
(FYI, I use an IPv6-only setup, probably why this turns up.)
This being the mechanism to handle MTU discovery for unconnected
sockets, I read ip(7) and it mentions a flag MSG_ERRQUEUE to be
used with recvmsg(). I could not find this flag in Kamailio, so
I suspect that this treatment was not completed after adding the
IP_RECVERR flag.
An approach that would always be safe AFAIK is to change a socket
with this kind of error to a connected socket, and set the lower
MTU on that. And then, continue sending. Connecting over UDP is
kind-of free, and avoids relying on another protocol in the peer.
The expense would be grabbing an extra socket, which is why it
may be better to await Path MTU failure.
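A rough sketch of the connected-socket idea, again under my own assumptions: connect_udp_with_mtu() is a hypothetical name, and the code relies on the Linux-specific IP_MTU and IPV6_MTU options, which per ip(7) and ipv6(7) are only valid on a connected socket.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a UDP socket connected to dst and ask the
 * kernel which path MTU it currently knows for that destination.
 * Returns the socket on success (with *mtu filled in), -1 on failure. */
static int connect_udp_with_mtu(const struct sockaddr *dst,
                                socklen_t dst_len, int *mtu)
{
    int level, opt, sock;
    socklen_t len = sizeof(*mtu);

    if (dst->sa_family == AF_INET) {
        level = IPPROTO_IP;
        opt = IP_MTU;
    } else if (dst->sa_family == AF_INET6) {
        level = IPPROTO_IPV6;
        opt = IPV6_MTU;
    } else {
        errno = EAFNOSUPPORT;
        return -1;
    }

    sock = socket(dst->sa_family, SOCK_DGRAM, 0);
    if (sock < 0)
        return -1;

    /* connect() on UDP sends nothing; it only fixes the peer address. */
    if (connect(sock, dst, dst_len) < 0 ||
        getsockopt(sock, level, opt, mtu, &len) < 0) {
        int saved = errno;

        close(sock);
        errno = saved;
        return -1;
    }
    return sock;
}
```

The returned socket could then be used with plain send() for retransmissions to that one peer, which is where the cost of an extra socket per destination comes from.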
Richard Fuchs explained in detail what happens:
5. The application wants to send another packet to the
same destination
(e.g. in Kamailio's case probably a retransmission of the first one,
as that packet was never acknowledged).
6. The application does exactly the same thing as in step 1.
7. The kernel now knows about the smaller PMTU to that packet's
destination and will therefore fragment the packet appropriately
before sending the fragments out.
These last steps, however, only apply to a _connected_ UDP socket. I searched
the given file for a connect() call, but did not find one.
I suppose there are also problems in Linux's double use of the MTU as an
implied MRU -- it means that you cannot be conservative in what you
send and liberal in what you accept -- that would have been a useful
OS-level strategy. In lieu of that, I suppose it is an application
problem :'-(
In general, this feels like it is outside my reach. I can understand it,
but cannot fix it. Have I hereby submitted a bug, or is an issue on
GitHub the proper path?
Thanks,
Rick van Rein