Hi,
A bad checksum can be caused by a virtual network adapter that does not perform hardware checksumming. I had this issue on a KVM test setup.
The MTU is also worth looking at, as UDP does not handle packet fragmentation very well (to say the least). Can you reproduce the same issue using an OpenVPN TCP virtual link between your two servers?
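To put numbers on the fragmentation point, here is a rough sketch of how one UDP datagram gets split at a given MTU. It assumes a 20-byte IPv4 header without options and a 1500-byte MTU by default; the function name is made up for illustration, and the real path MTU between the two hosts may differ.

```python
IP_HEADER = 20   # bytes; assumes an IPv4 header without options
UDP_HEADER = 8   # bytes


def fragment_payload_sizes(udp_payload_len, mtu=1500):
    """Return the IP payload size of each fragment for one UDP datagram."""
    total = UDP_HEADER + udp_payload_len      # what IP has to carry
    # Every fragment except the last must carry a multiple of 8 bytes,
    # because the fragment offset is expressed in 8-byte units.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    sizes = []
    while total > 0:
        chunk = min(per_fragment, total)
        sizes.append(chunk)
        total -= chunk
    return sizes


if __name__ == "__main__":
    print(fragment_payload_sizes(2000))   # the 2000-byte test message
    print(fragment_payload_sizes(1472))   # largest payload that fits unfragmented
```

With these assumptions the 2000-byte test message becomes two fragments (1480 and 528 bytes of IP payload), and only the first one carries the UDP header with the port numbers and the checksum.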
On 12/02/2015 12:23, Andrey Utkin wrote:
We are experiencing a strange networking issue, not exactly specific to Kamailio, but still related to it. Rtpengine's "ng" interface uses UDP. Protocol messages contain SDP, and for encrypted video calls those messages exceed 1500 bytes. Everything works fine within localhost, but when rtpengine and Kamailio are on different hosts, and the hosts are Amazon-hosted, we have trouble.
We see this on l3.large and t2.micro instances running Ubuntu 14. I believe we don't have any special settings beyond the system defaults. We send a large datagram from a remote host, e.g. with this trivial Python app:
import socket

UDP_IP = "123.123.123.123"  # remote host IP
UDP_PORT = 33333
MESSAGE = """
.....0010......0020......0030......0040......0050......0060......0070......0080......0090......0100
.....0110......0120......0130......0140......0150......0160......0170......0180......0190......0200
.....0210......0220......0230......0240......0250......0260......0270......0280......0290......0300
.....0310......0320......0330......0340......0350......0360......0370......0380......0390......0400
.....0410......0420......0430......0440......0450......0460......0470......0480......0490......0500
.....0510......0520......0530......0540......0550......0560......0570......0580......0590......0600
.....0610......0620......0630......0640......0650......0660......0670......0680......0690......0700
.....0710......0720......0730......0740......0750......0760......0770......0780......0790......0800
.....0810......0820......0830......0840......0850......0860......0870......0880......0890......0900
.....0910......0920......0930......0940......0950......0960......0970......0980......0990......1000
.....1010......1020......1030......1040......1050......1060......1070......1080......1090......1100
.....1110......1120......1130......1140......1150......1160......1170......1180......1190......1200
.....1210......1220......1230......1240......1250......1260......1270......1280......1290......1300
.....1310......1320......1330......1340......1350......1360......1370......1380......1390......1400
.....1410......1420......1430......1440......1450......1460......1470......1480......1490......1500
.....1510......1520......1530......1540......1550......1560......1570......1580......1590......1600
.....1610......1620......1630......1640......1650......1660......1670......1680......1690......1700
.....1710......1720......1730......1740......1750......1760......1770......1780......1790......1800
.....1810......1820......1830......1840......1850......1860......1870......1880......1890......1900
.....1910......1920......1930......1940......1950......1960......1970......1980......1990......2000"""

print "UDP target IP:", UDP_IP
print "UDP target port:", UDP_PORT
print "message:", MESSAGE

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(MESSAGE, (UDP_IP, UDP_PORT))
Then we listen on that port with this trivial Python app:
import socket

UDP_IP = "172.31.4.102"  # local ip
UDP_PORT = 33333

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((UDP_IP, UDP_PORT))

while True:
    data, addr = sock.recvfrom(0x10000)
    print "received message:", data
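For a quick single-machine baseline, the sender and receiver above can be folded into one script. This is only a sketch in Python 3 (the apps above use Python 2 print statements); the function name, the generated payload and the 2-second timeout are assumptions made for illustration. The timeout makes the "application gets no data" case show up as an exception instead of a receiver that blocks forever.

```python
import socket


def loopback_roundtrip(size=2000, timeout=2.0):
    """Send one UDP datagram of `size` bytes to ourselves over 127.0.0.1."""
    payload = bytes(i % 251 for i in range(size))   # arbitrary test pattern
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
        receiver.settimeout(timeout)        # raise socket.timeout instead of hanging
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(0x10000)
    return data == payload


if __name__ == "__main__":
    print("loopback delivery ok:", loopback_roundtrip())
```

Over loopback the datagram normally is not fragmented, so this only confirms the localhost behaviour; the same pattern with the bind address and destination changed to the two hosts would exercise the failing path.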
Meanwhile, we monitor the traffic with e.g. ngrep:

ngrep -t -e -d any -W byline -O large_udp.pcap port 33333 or '(ip[6:2]' '&' '0x1fff)' '!=' '0'

(the part after "or" catches the later fragments of fragmented packets)

About the host:

# uname -a
Linux hostname 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

linux-image-3.13.0-36-generic and linux-image-3.13.0-45-generic behave the same way.
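The second half of that capture expression reads bytes 6-7 of the IPv4 header, which hold the flags and the 13-bit fragment offset. A small sketch of what it tests, in Python 3; fake_header is a hypothetical helper that fills in only those two bytes, not a real packet builder.

```python
import struct


def fragment_info(ip_header):
    """Decode bytes 6-7 of an IPv4 header (flags + fragment offset)."""
    (field,) = struct.unpack("!H", ip_header[6:8])
    more_fragments = bool(field & 0x2000)   # MF ("more fragments") flag
    offset_bytes = (field & 0x1FFF) * 8     # offset is stored in 8-byte units
    return more_fragments, offset_bytes


def matches_filter(ip_header):
    """Mirror of the capture expression: (ip[6:2] & 0x1fff) != 0."""
    (field,) = struct.unpack("!H", ip_header[6:8])
    return (field & 0x1FFF) != 0


def fake_header(more_fragments, offset_bytes):
    """Hypothetical helper: a 20-byte header with only bytes 6-7 filled in."""
    field = (0x2000 if more_fragments else 0) | (offset_bytes // 8)
    return b"\x00" * 6 + struct.pack("!H", field) + b"\x00" * 12


if __name__ == "__main__":
    first = fake_header(True, 0)        # first fragment: MF set, offset 0
    second = fake_header(False, 1480)   # last fragment: MF clear, offset 1480
    print(fragment_info(first), matches_filter(first))
    print(fragment_info(second), matches_filter(second))
```

The first fragment has offset 0, so the offset test does not match it; it is caught by "port 33333" instead, because it is the only fragment that carries the UDP header.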
What we see:
- ngrep shows the packets with correct contents. All segments are delivered.
- application doesn't get any data at all
Occasionally dmesg shows messages like this:

[ 102.161679] UDP: bad checksum. From 123.123.123.124:56439 to 172.31.4.102:33333 ulen 2008

but it is logged very rarely, so this is surely not what happens on every packet transmission.

This test works fine on e.g. the cheapest DigitalOcean VPS.

I am concerned about this issue because the rtpengine software has a UDP interface. So on Amazon hosts this interface works only within localhost, and I cannot distribute the software across different nodes.
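For reference, the "ulen 2008" in that kernel message is the 2000-byte payload plus the 8-byte UDP header, and the checksum being verified is the RFC 768 one's-complement sum over a pseudo-header, the UDP header and the payload. Below is a sketch of that calculation in Python 3 for IPv4. The function names are made up for illustration, and it says nothing about where on the path the checksum gets broken.

```python
import socket
import struct


def ones_complement_sum(data):
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole 16-bit word
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip, dst_ip, udp_length):
    """IPv4 pseudo-header: addresses, zero byte, protocol 17, UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, socket.IPPROTO_UDP, udp_length))


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload):
    """Checksum a sender would place in the UDP header."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    data = pseudo_header(src_ip, dst_ip, length) + header + payload
    checksum = ~ones_complement_sum(data) & 0xFFFF
    return checksum or 0xFFFF    # 0 means "no checksum", so 0 is sent as 0xFFFF


def udp_checksum_ok(src_ip, dst_ip, src_port, dst_port, payload, checksum):
    """Receiver-side check: the sum including the checksum must be 0xFFFF."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, checksum)
    data = pseudo_header(src_ip, dst_ip, length) + header + payload
    return ones_complement_sum(data) == 0xFFFF


if __name__ == "__main__":
    args = ("123.123.123.124", "172.31.4.102", 56439, 33333)
    payload = b"x" * 2000
    value = udp_checksum(*args, payload)
    print(hex(value), udp_checksum_ok(*args, payload, value))
```

Because the checksum covers the whole datagram, it can only be verified after all fragments have been reassembled.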
Any thoughts? What's wrong, how to fix?