<h3>Description</h3>
<p>When restarting Kamailio nodes in our infrastructure, we noticed that under traffic some nodes started using 100% of the CPU. With the valuable help of <a class="user-mention" data-hovercard-type="user" data-hovercard-url="/hovercards?user_id=3305097" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/giavac">@giavac</a>, we were able to track the issue down to an infinite loop inside the htable module when synchronizing fairly large (60K) htables via DMQ.</p>
<h3>Troubleshooting</h3>
<h4>Reproduction</h4>
<p>Run one Kamailio instance with a 60K+ htable and start a new instance. The first instance tries to send the whole table to the new instance and enters an infinite loop that consumes 100% of the CPU.</p>
<p>This is caused by <strong>ht_dmq_cell_group_flush</strong> being called twice, which creates a circular list in the JSON structure hierarchy. The second call happens in this block of code (which is why a 60K htable is required):<br>
<a href="https://github.com/kamailio/kamailio/blob/5.2/src/modules/htable/ht_dmq.c#L509">https://github.com/kamailio/kamailio/blob/5.2/src/modules/htable/ht_dmq.c#L509</a></p>
<p>When this happens, <strong>ht_dmq_cell_group_flush</strong> tries to add <strong>ht_dmq_jdoc_cell_group.jdoc_cells</strong> to <strong>ht_dmq_jdoc_cell_group.jdoc->root</strong>, but this root already has <strong>json_cells</strong> as its child.<br>
So when <strong>srjson_AddItemToObject</strong> is called (and, in turn, <strong>srjson_AddItemToArray</strong>), the item is appended to a list that already contains it and ends up linked to itself:<br>
<a href="https://github.com/kamailio/kamailio/blob/master/src/lib/srutils/srjson.c#L813">https://github.com/kamailio/kamailio/blob/master/src/lib/srutils/srjson.c#L813</a></p>
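<p>The self-link described above can be reproduced in isolation. The following is a minimal sketch, not the srjson code: <strong>node_t</strong>, <strong>append_child</strong> and <strong>is_self_linked</strong> are hypothetical names, and the append logic (walk to the tail, then link the new item) is an assumption about how this kind of list append typically works.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a JSON node; this is NOT the real srjson_t. */
typedef struct node {
    struct node *next;   /* next sibling */
    struct node *prev;   /* previous sibling */
    struct node *child;  /* first child */
} node_t;

/* Append item to the end of parent's child list: walk to the tail, then link.
 * Assumed to resemble a typical cJSON-style append; there is no check for
 * whether item is already in the list. */
static void append_child(node_t *parent, node_t *item)
{
    assert(parent != NULL && item != NULL);

    node_t *c = parent->child;
    if (c == NULL) {
        parent->child = item;
        return;
    }
    while (c->next != NULL)
        c = c->next;

    /* If item is already the tail, c == item and this sets item->next = item. */
    c->next = item;
    item->prev = c;
}

/* Returns 1 if the node's next pointer refers to the node itself. */
static int is_self_linked(const node_t *n)
{
    return n != NULL && n->next == n;
}
```

<p>Appending the same node to the same parent twice leaves <strong>is_self_linked</strong> returning 1 for that node, which is the shape of the circular list reported here.</p>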
<p>This circular structure then causes an endless loop in <strong>srjson_PrintUnformatted</strong>, because the <strong>print_object</strong> function iterates over the circular list and never reaches its end:<br>
<a href="https://github.com/kamailio/kamailio/blob/master/src/lib/srutils/srjson.c#L679">https://github.com/kamailio/kamailio/blob/master/src/lib/srutils/srjson.c#L679</a></p>
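<p>A plain <code>while (item) item = item->next;</code> walk never terminates on such a list. As an illustration only, a sibling list can be checked for a cycle before it is printed. This sketch uses Floyd's tortoise-and-hare algorithm on a hypothetical <strong>item_t</strong> type; it is not part of srjson and is not proposed as the fix.</p>

```c
#include <stddef.h>

/* Hypothetical singly linked sibling list; NOT the real srjson_t. */
typedef struct item {
    struct item *next;
} item_t;

/* Floyd's cycle detection: advance one pointer by one step and another by
 * two steps. They can only meet again if the list loops back on itself.
 * Returns 1 if a cycle exists, 0 if the list ends in NULL. */
static int sibling_list_has_cycle(const item_t *head)
{
    const item_t *slow = head;
    const item_t *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return 1;
    }
    return 0;
}
```

<p>A node whose <code>next</code> points to itself is reported as a cycle on the first iteration, while a normal NULL-terminated list is walked once and reported as clean.</p>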
<h3>Possible Solutions</h3>
<p>One possible solution is to destroy and re-initialize the <strong>ht_dmq_jdoc_cell_group</strong> structure after calling the flush:</p>
<pre><code>if (ht_dmq_jdoc_cell_group.size >= dmq_cell_group_max_size) {
  LM_DBG("sending group count[%d]size[%d]\n", ht_dmq_jdoc_cell_group.count, ht_dmq_jdoc_cell_group.size);
  if (ht_dmq_cell_group_flush(node) < 0) {
    ht_slot_unlock(ht, i);
    goto error;
  }
  ht_dmq_cell_group_destroy();
  ht_dmq_cell_group_init();
}
</code></pre>
<p>But we are not sure about the performance implications.</p>
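<p>The same idea can be sketched generically: if the flush routine itself resets the batch, every flush starts from a clean state and a later final flush of an empty batch is a no-op. Everything in this sketch is hypothetical (<strong>batch_t</strong>, <strong>BATCH_MAX_BYTES</strong>, <strong>batch_add</strong>, <strong>batch_flush</strong>); it stands in for the cell group logic and does not use the htable or srjson APIs.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold, playing the role of dmq_cell_group_max_size. */
#define BATCH_MAX_BYTES 64

/* Hypothetical batch of serialized cells waiting to be sent. */
typedef struct batch {
    char buf[BATCH_MAX_BYTES * 2];
    size_t size;   /* bytes currently buffered */
    int count;     /* cells currently buffered */
    int flushes;   /* number of batches "sent" so far */
} batch_t;

/* Drop the buffered cells so the next batch starts empty. */
static void batch_reset(batch_t *b)
{
    b->size = 0;
    b->count = 0;
}

/* "Send" the batch, then reset it. Flushing an empty batch does nothing. */
static int batch_flush(batch_t *b)
{
    if (b->count == 0)
        return 0;
    b->flushes++;      /* a real implementation would transmit b->buf here */
    batch_reset(b);
    return 0;
}

/* Buffer one cell and flush once the size threshold is reached.
 * Returns -1 if the cell does not fit in the remaining buffer space. */
static int batch_add(batch_t *b, const char *cell)
{
    size_t len = strlen(cell);

    if (len > sizeof(b->buf) - b->size)
        return -1;
    memcpy(b->buf + b->size, cell, len);
    b->size += len;
    b->count++;

    if (b->size >= BATCH_MAX_BYTES)
        return batch_flush(b);
    return 0;
}
```

<p>With this shape the caller cannot flush the same content twice by accident, at the cost of one reset per batch rather than one per synchronization.</p>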
<h3>Additional Information</h3>
<pre><code># kamailio -v
version: kamailio 5.2.1 (x86_64/linux) 44e488
flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144 MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 44e488
compiled on 11:52:58 Feb 21 2019 with gcc 5.4.0
</code></pre>
<ul>
<li><strong>Operating System</strong>:<br>
ubuntu:xenial docker container<br>
<code># uname -a Linux kama-0 4.4.0-135-generic #161-Ubuntu SMP Mon Aug 27 10:45:01 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux</code></li>
</ul>
