<p>Hi Daniel,</p>
<p>Today it happened again. I have more information:</p>
<p>We currently have 4 Debian 9 servers running Kamailio v5:</p>
<p>KamailioA & KamailioB<br>
KamailioC & KamailioD</p>
<p>Servers A/B only take care of SSL offloading and load balancing/failover between servers C/D (using dispatcher with a minimal config).</p>
<p>Flow would be:</p>
<p>User <-> KamailioA/B <-> KamailioC/D <-> ...</p>
<p>Today KamailioB stopped replying, and I got the backtraces.</p>
<p>From KamailioA: 0 problems<br>
From KamailioB:</p>
<pre><code>root:~/bt2# grep DISPATCHER /var/log/kamailio/kamailio.log
Jul 1 15:03:06 13cn4 sbc[14833]: WARNING: &lt;script&gt;: [DISPATCHER] - Destination down: OPTIONS sip:A.A.A.A:5060 (&lt;null&gt;)
Jul 1 15:03:06 13cn4 sbc[14833]: WARNING: &lt;script&gt;: [DISPATCHER] - Destination down: OPTIONS sip:B.B.B.B:5060 (&lt;null&gt;)
Jul 1 15:39:50 13cn4 sbc[14818]: WARNING: <script>: [DISPATCHER] - Destination up: OPTIONS sip:A.A.A.A:5060
Jul 1 15:39:50 13cn4 sbc[14818]: WARNING: <script>: [DISPATCHER] - Destination up: OPTIONS sip:B.B.B.B:5060
root:~/bt2#
</code></pre>
<p>A.A.A.A and B.B.B.B would be KamailioC/D.</p>
<p>Note that this only happened on KamailioB; KamailioA had 0 problems.</p>
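<p>To put a number on the outage, here is a minimal sketch that pairs each "Destination down" line with the next "Destination up" line for the same destination. The function name <code>ds_outages</code> is hypothetical (it is not part of Kamailio), and the sketch assumes the syslog layout shown above, with both events on the same day.</p>

```shell
# Hypothetical helper (not part of Kamailio): reads syslog lines on stdin and,
# for each destination, pairs a "Destination down" entry with the next
# "Destination up" entry, printing how long the destination was marked down.
# Assumes both entries fall on the same day (only the HH:MM:SS field is used).
ds_outages() {
    awk '
    /\[DISPATCHER\] - Destination (down|up):/ {
        # Field 3 is the HH:MM:SS timestamp in the log format shown above.
        split($3, t, ":")
        secs = t[1] * 3600 + t[2] * 60 + t[3]

        # The destination URI is the word right after "OPTIONS".
        uri = ""
        for (i = 1; i < NF; i++) if ($i == "OPTIONS") uri = $(i + 1)
        if (uri == "") next

        if ($0 ~ /Destination down:/) {
            down[uri] = secs
        } else if (uri in down) {
            printf "%s down for %d seconds\n", uri, secs - down[uri]
            delete down[uri]
        }
    }'
}
```

<p>Usage would be <code>grep DISPATCHER /var/log/kamailio/kamailio.log | ds_outages</code>. For the four log lines above it reports each destination as down for 2204 seconds, roughly 36 minutes 44 seconds.</p>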
<p>When I connected to the server, I did the following:</p>
<pre><code># for i in `kamctl ps | grep PID | awk '{print $2}' | tr -d ","`; do gdb /usr/sbin/kamailio -ex "bt full" --batch $i >> $i.txt 2>&1; done
</code></pre>
<p>That gave me one backtrace per PID. The thing is, right after running that command, traffic started flowing again (I didn't restart anything), hence the timestamps:</p>
<pre><code>root:~/bt2# ls -lh
total 356K
-rw-r--r-- 1 root root 1.8K Jul 1 15:39 14816.txt
-rw-r--r-- 1 root root 15K Jul 1 15:39 14817.txt
-rw-r--r-- 1 root root 15K Jul 1 15:39 14818.txt
-rw-r--r-- 1 root root 15K Jul 1 15:39 14819.txt
-rw-r--r-- 1 root root 48K Jul 1 15:39 14820.txt
-rw-r--r-- 1 root root 47K Jul 1 15:39 14821.txt
-rw-r--r-- 1 root root 15K Jul 1 15:39 14822.txt
-rw-r--r-- 1 root root 15K Jul 1 15:39 14823.txt
-rw-r--r-- 1 root root 15K Jul 1 15:39 14824.txt
-rw-r--r-- 1 root root 3.4K Jul 1 15:39 14825.txt
-rw-r--r-- 1 root root 3.4K Jul 1 15:39 14826.txt
-rw-r--r-- 1 root root 3.4K Jul 1 15:39 14827.txt
-rw-r--r-- 1 root root 3.4K Jul 1 15:39 14828.txt
-rw-r--r-- 1 root root 3.4K Jul 1 15:39 14829.txt
-rw-r--r-- 1 root root 3.4K Jul 1 15:39 14830.txt
-rw-r--r-- 1 root root 3.4K Jul 1 15:39 14831.txt
-rw-r--r-- 1 root root 3.4K Jul 1 15:39 14832.txt
-rw-r--r-- 1 root root 42K Jul 1 15:39 14833.txt
-rw-r--r-- 1 root root 1.8K Jul 1 15:39 14834.txt
-rw-r--r-- 1 root root 2.2K Jul 1 15:39 14835.txt
-rw-r--r-- 1 root root 4.0K Jul 1 15:39 14836.txt
-rw-r--r-- 1 root root 2.9K Jul 1 15:39 14837.txt
-rw-r--r-- 1 root root 2.8K Jul 1 15:39 14838.txt
-rw-r--r-- 1 root root 4.7K Jul 1 15:39 14839.txt
-rw-r--r-- 1 root root 2.7K Jul 1 15:39 14840.txt
-rw-r--r-- 1 root root 2.7K Jul 1 15:39 14841.txt
-rw-r--r-- 1 root root 2.7K Jul 1 15:39 14842.txt
-rw-r--r-- 1 root root 7.5K Jul 1 15:39 14843.txt
-rw-r--r-- 1 root root 2.7K Jul 1 15:39 14844.txt
-rw-r--r-- 1 root root 2.7K Jul 1 15:39 14845.txt
-rw-r--r-- 1 root root 7.5K Jul 1 15:39 14846.txt
-rw-r--r-- 1 root root 2.7K Jul 1 15:39 14847.txt
-rw-r--r-- 1 root root 7.5K Jul 1 15:39 14848.txt
-rw-r--r-- 1 root root 4.2K Jul 1 15:39 14849.txt
root:~/bt2#
</code></pre>
<p><a href="https://github.com/kamailio/kamailio/files/1117494/backtrace_20170701_1539.tar.gz">backtrace_20170701_1539.tar.gz</a></p>
<p>I have tried to look at the backtraces, but to me they seem OK (much like the previous ones, where Kamailio is just waiting for new requests).</p>
<p>So at this point my assumption is that something is causing dispatcher to see both nodes as down, and it therefore stops processing traffic. When dispatcher sees the nodes as up again, everything starts working.</p>
<p>Let me know what you think, Daniel, and how I can investigate this further.</p>
<p>I also have tcpdump captures from that time. I will check whether OPTIONS are actually being sent out, to try to narrow this down a little more.</p>
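<p>One way that check could be done is sketched below: count OPTIONS requests per minute in the text output of tcpdump and look for a gap during the outage window. The function name <code>options_per_minute</code> and the file name <code>CAPTURE.pcap</code> are placeholders, and port 5060 is taken from the destinations in the log above.</p>

```shell
# Hypothetical helper: reads the text output of
#   tcpdump -nn -A -r CAPTURE.pcap port 5060
# on stdin and prints how many OPTIONS requests were seen per minute.
# Assumes the default tcpdump timestamp format (HH:MM:SS.micros) at the start
# of each packet header line; CAPTURE.pcap is a placeholder file name.
options_per_minute() {
    awk '
    # A line starting with a timestamp begins a new packet.
    /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\./ {
        minute = substr($1, 1, 5)
        seen = 0
    }
    # Count each packet at most once, even if the request line shows up both
    # in the decoded header line and in the ASCII payload dump.
    /OPTIONS sip:[^ ]* SIP\/2\.0/ {
        if (!seen) { count[minute]++; seen = 1 }
    }
    END { for (m in count) print m, count[m] }' | sort
}
```

<p>Usage would be <code>tcpdump -nn -A -r CAPTURE.pcap port 5060 | options_per_minute</code>. Minutes without any OPTIONS request are simply absent from the output, so missing minutes between 15:03 and 15:39 would suggest the pings were not leaving the box. Only requests are counted; replies that merely mention OPTIONS in their CSeq header are ignored.</p>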
<p>Thanks!<br>
Joel.</p>