Hi Daniel,

Today it happened again. I have more information:

We are currently running four Debian 9 servers with Kamailio v5:

KamailioA & KamailioB
KamailioC & KamailioD

Servers A/B only handle SSL offloading and load balancing/failover between servers C/D (using dispatcher with a minimal config).

The flow is:

User <-> KamailioA/B <-> KamailioC/D <-> ...

Today KamailioB stopped replying, and I collected the backtraces.

From KamailioA: 0 problems
From KamailioB:

root:~/bt2# grep DISPATCHER /var/log/kamailio/kamailio.log
Jul  1 15:03:06 13cn4 sbc[14833]: WARNING: <script>: [DISPATCHER] - Destination down: OPTIONS sip:A.A.A.A:5060 (<null>)
Jul  1 15:03:06 13cn4 sbc[14833]: WARNING: <script>: [DISPATCHER] - Destination down: OPTIONS sip:B.B.B.B:5060 (<null>)
Jul  1 15:39:50 13cn4 sbc[14818]: WARNING: <script>: [DISPATCHER] - Destination up: OPTIONS sip:A.A.A.A:5060
Jul  1 15:39:50 13cn4 sbc[14818]: WARNING: <script>: [DISPATCHER] - Destination up: OPTIONS sip:B.B.B.B:5060
root:~/bt2#

A.A.A.A and B.B.B.B are KamailioC and KamailioD.

Note that this only happened on KamailioB; KamailioA had no problems.

When I connected to the server, I did the following:

# for i in `kamctl ps | grep PID | awk '{print $2}' | tr -d ","`; do gdb /usr/sbin/kamailio -ex "bt full" --batch $i >> $i.txt 2>&1; done

That gave me one backtrace per PID. However, right after running that command, traffic started flowing again (I didn't restart anything), which is why the file timestamps below match the "Destination up" entries in the log:

root:~/bt2# ls -lh
total 356K
-rw-r--r-- 1 root root 1.8K Jul  1 15:39 14816.txt
-rw-r--r-- 1 root root  15K Jul  1 15:39 14817.txt
-rw-r--r-- 1 root root  15K Jul  1 15:39 14818.txt
-rw-r--r-- 1 root root  15K Jul  1 15:39 14819.txt
-rw-r--r-- 1 root root  48K Jul  1 15:39 14820.txt
-rw-r--r-- 1 root root  47K Jul  1 15:39 14821.txt
-rw-r--r-- 1 root root  15K Jul  1 15:39 14822.txt
-rw-r--r-- 1 root root  15K Jul  1 15:39 14823.txt
-rw-r--r-- 1 root root  15K Jul  1 15:39 14824.txt
-rw-r--r-- 1 root root 3.4K Jul  1 15:39 14825.txt
-rw-r--r-- 1 root root 3.4K Jul  1 15:39 14826.txt
-rw-r--r-- 1 root root 3.4K Jul  1 15:39 14827.txt
-rw-r--r-- 1 root root 3.4K Jul  1 15:39 14828.txt
-rw-r--r-- 1 root root 3.4K Jul  1 15:39 14829.txt
-rw-r--r-- 1 root root 3.4K Jul  1 15:39 14830.txt
-rw-r--r-- 1 root root 3.4K Jul  1 15:39 14831.txt
-rw-r--r-- 1 root root 3.4K Jul  1 15:39 14832.txt
-rw-r--r-- 1 root root  42K Jul  1 15:39 14833.txt
-rw-r--r-- 1 root root 1.8K Jul  1 15:39 14834.txt
-rw-r--r-- 1 root root 2.2K Jul  1 15:39 14835.txt
-rw-r--r-- 1 root root 4.0K Jul  1 15:39 14836.txt
-rw-r--r-- 1 root root 2.9K Jul  1 15:39 14837.txt
-rw-r--r-- 1 root root 2.8K Jul  1 15:39 14838.txt
-rw-r--r-- 1 root root 4.7K Jul  1 15:39 14839.txt
-rw-r--r-- 1 root root 2.7K Jul  1 15:39 14840.txt
-rw-r--r-- 1 root root 2.7K Jul  1 15:39 14841.txt
-rw-r--r-- 1 root root 2.7K Jul  1 15:39 14842.txt
-rw-r--r-- 1 root root 7.5K Jul  1 15:39 14843.txt
-rw-r--r-- 1 root root 2.7K Jul  1 15:39 14844.txt
-rw-r--r-- 1 root root 2.7K Jul  1 15:39 14845.txt
-rw-r--r-- 1 root root 7.5K Jul  1 15:39 14846.txt
-rw-r--r-- 1 root root 2.7K Jul  1 15:39 14847.txt
-rw-r--r-- 1 root root 7.5K Jul  1 15:39 14848.txt
-rw-r--r-- 1 root root 4.2K Jul  1 15:39 14849.txt
root:~/bt2#

backtrace_20170701_1539.tar.gz

I have looked at the backtraces, but they seem OK to me (much like the previous ones, where Kamailio is just waiting for new requests).

So at this point my assumption is that something is causing dispatcher to see both nodes as down, and it therefore stops processing traffic. When dispatcher sees the nodes up again, everything starts working.
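To put a number on how long dispatcher considered each node down, something like the following could be run against the log. This is only a rough sketch: it assumes the syslog line layout shown in the grep output above and that the down and up events fall on the same day, and the function name dispatcher_outages is just a placeholder.

```shell
# Hypothetical helper: reads Kamailio log lines on stdin and prints, for each
# destination, how many seconds passed between "Destination down" and the
# next "Destination up". Assumes the field layout of the log lines above and
# that both events happen on the same day (no midnight rollover handling).
dispatcher_outages() {
    awk '
    /\[DISPATCHER\] - Destination (down|up):/ {
        # $3 is the HH:MM:SS timestamp; convert it to seconds since midnight
        split($3, t, ":")
        secs = t[1] * 3600 + t[2] * 60 + t[3]
        state = $11   # "down:" or "up:"
        dest = $13    # e.g. sip:A.A.A.A:5060
        if (state == "down:") {
            down[dest] = secs
        } else if (dest in down) {
            printf "%s down for %d s\n", dest, secs - down[dest]
            delete down[dest]
        }
    }'
}
```

It would be used as: dispatcher_outages < /var/log/kamailio/kamailio.log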

Let me know what you think, Daniel, and how I can investigate this further.

I also have tcpdump captures from that time. I will check whether the OPTIONS requests were actually being sent out, to narrow this down a little more.
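For the capture check, a possible approach is to count OPTIONS requests per minute and look for gaps during the outage window. The sketch below assumes the SIP traffic towards C/D is unencrypted on port 5060 (as in the log lines above) and that tcpdump prints its default HH:MM:SS timestamps; the function name options_per_minute and the file name capture.pcap are placeholders.

```shell
# Hypothetical helper: reads `tcpdump -A` text output on stdin and prints the
# number of SIP OPTIONS requests seen in each minute (HH:MM count).
options_per_minute() {
    awk '
    # Packet header line: remember its HH:MM and skip it, so that a request
    # line echoed in the header summary is not counted twice.
    /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+ IP/ { minute = substr($1, 1, 5); next }
    # Payload line carrying an OPTIONS request line. Responses are ignored
    # because they only mention OPTIONS in the CSeq header.
    /OPTIONS sip:[^ ]* SIP\/2\.0/ { count[minute]++ }
    END { for (m in count) print m, count[m] }
    ' | sort
}
```

It would be used as: tcpdump -nn -A -r capture.pcap 'port 5060' | options_per_minute

Minutes with no OPTIONS requests do not appear in the output at all, so a missing stretch between 15:03 and 15:39 would suggest the probes were not being sent.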

Thanks!
Joel.

