### Description

When restarting kamailio nodes in our infrastructure, we noticed that under traffic some nodes started using 100% of the CPU. With the precious help of @giavac we were able to track the issue down to an infinite loop inside the htable module when synchronizing somewhat big (60K) htables via dmq.
### Troubleshooting
#### Reproduction

Have one kamailio instance with a 60K+ htable and start a new instance: the first instance will try to send the whole table to the new one and will enter an infinite loop that consumes 100% of the CPU.
This is caused by a double call to **ht_dmq_cell_group_flush**, which creates a circular list in the JSON structure hierarchy. The second call happens in this block of code (which is why a 60K+ htable is required): https://github.com/kamailio/kamailio/blob/5.2/src/modules/htable/ht_dmq.c#L5...
When this happens, **ht_dmq_cell_group_flush** tries to add **ht_dmq_jdoc_cell_group.jdoc_cells** to **ht_dmq_jdoc_cell_group.jdoc->root**, but this root already has **json_cells** as its child, so when **srjson_AddItemToObject** is called (and, in turn, **srjson_AddItemToArray**) the node ends up appended as a sibling of itself, i.e. its `next` pointer points back to itself: https://github.com/kamailio/kamailio/blob/master/src/lib/srutils/srjson.c#L8...
This circular structure then causes an infinite loop when **srjson_PrintUnformatted** is called, because the **print_object** function iterates over the now-circular child list: https://github.com/kamailio/kamailio/blob/master/src/lib/srutils/srjson.c#L6...
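To make the failure mode concrete, here is a minimal, self-contained model of the tail-append logic behind **srjson_AddItemToArray** (the `node_t` type and `add_item` function are simplified stand-ins for illustration, not the actual srjson code): appending a node that is already the tail of the parent's child list leaves the node's `next` pointer referring to itself, so any walk along the sibling chain, like the one in **print_object**, never terminates.

```
/* Minimal model of the sibling-list append performed by
 * srjson_AddItemToArray() (simplified stand-in, not the actual srjson
 * code). Appending a node that is already the tail of the parent's
 * child list leaves its `next` pointer referring to itself, so any
 * traversal of the sibling chain never terminates. */
#include <stdio.h>
#include <stdlib.h>

typedef struct node {
	struct node *next;
	struct node *prev;
	struct node *child;
	const char *name;
} node_t;

static node_t *node_new(const char *name) {
	node_t *n = calloc(1, sizeof(*n));
	n->name = name;
	return n;
}

/* Append `item` to `parent`'s child list: walk to the tail, then link. */
static void add_item(node_t *parent, node_t *item) {
	node_t *c = parent->child;
	if (c == NULL) {
		parent->child = item;
		return;
	}
	while (c->next) c = c->next;  /* find the current tail */
	c->next = item;               /* if c == item, this creates a self-loop */
	item->prev = c;
}

int main(void) {
	node_t *root  = node_new("root");
	node_t *cells = node_new("cells");

	add_item(root, cells);  /* first flush: cells becomes root's child */
	add_item(root, cells);  /* second flush on the same doc: cells is
	                         * already the tail, so cells->next = cells */

	printf("cells->next == cells? %s\n",
			cells->next == cells ? "yes (cycle)" : "no");

	/* A printer walking child->next (like print_object) would now spin
	 * forever; bound the walk here just to show the cycle. */
	int steps = 0;
	for (node_t *c = root->child; c && steps < 5; c = c->next, steps++)
		printf("visiting %s\n", c->name);
	printf("stopped after %d steps (unbounded otherwise)\n", steps);

	free(cells);
	free(root);
	return 0;
}
```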
### Possible Solutions
One possible solution could be to destroy and re-initialize the **ht_dmq_jdoc_cell_group** structure after calling the flush:

```
if (ht_dmq_jdoc_cell_group.size >= dmq_cell_group_max_size) {
	LM_DBG("sending group count[%d]size[%d]\n",
			ht_dmq_jdoc_cell_group.count, ht_dmq_jdoc_cell_group.size);
	if (ht_dmq_cell_group_flush(node) < 0) {
		ht_slot_unlock(ht, i);
		goto error;
	}
	ht_dmq_cell_group_destroy();
	ht_dmq_cell_group_init();
}
```

But we are not sure about the performance implications.
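On the performance question: in the snippet above the destroy/init pair sits inside the size-threshold branch, so it would run once per flushed group rather than once per replicated cell. A rough back-of-the-envelope sketch of that ratio (all figures are made-up assumptions, including the **dmq_cell_group_max_size** value and the interpretation of `size` as serialized bytes):

```
/* Back-of-the-envelope model for the performance concern: the reset runs
 * only inside the size-threshold branch above, i.e. once per flushed
 * group, not once per cell. All figures are made-up assumptions,
 * including the dmq_cell_group_max_size value and the interpretation of
 * `size` as serialized bytes. */
#include <stdio.h>

int main(void) {
	const long cells           = 60000; /* cells replicated during the sync */
	const long bytes_per_cell  = 1100;  /* assumed serialized size per cell */
	const long group_max_bytes = 60000; /* assumed dmq_cell_group_max_size  */

	long total_bytes = cells * bytes_per_cell;
	long flushes     = (total_bytes + group_max_bytes - 1) / group_max_bytes;

	printf("total serialized bytes: %ld\n", total_bytes);
	printf("destroy/init cycles   : %ld (one per flush, vs %ld cells)\n",
			flushes, cells);
	return 0;
}
```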
### Additional Information

```
# kamailio -v
version: kamailio 5.2.1 (x86_64/linux) 44e488
flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144 MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 44e488
compiled on 11:52:58 Feb 21 2019 with gcc 5.4.0
```
* **Operating System**: ubuntu:xenial docker container

```
# uname -a
Linux kama-0 4.4.0-135-generic #161-Ubuntu SMP Mon Aug 27 10:45:01 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
```
Thanks for reporting - I will take a look and fix accordingly.
Please can you try with the above and confirm whether or not it is fixed?
Thank you @charlesrchance, I did some tests with this setup:

kamailio.cfg (meaningful lines):

```
fork=yes
children=1
tcp_connection_lifetime=3605
pv_buffer_size=8192

# ----- dmq params -----
modparam("dmq", "server_address", DMQ_SERVER_ADDRESS)
modparam("dmq", "notification_address", DMQ_NOTIFICATION_ADDRESS)
modparam("dmq", "multi_notify", 1)
modparam("dmq", "num_workers", 1)
modparam("dmq", "ping_interval", 15)
modparam("dmq", "worker_usleep", 1000)

# ----- htable params -----
modparam("htable", "enable_dmq", 1)
modparam("htable", "dmq_init_sync", 1)
modparam("htable", "htable", "ht=>size=16;dmqreplicate=1;autoexpire=10800;")  # Keep track of concurrent channels for accounts. Should be same as dialog
modparam("htable", "htable", "ht1=>size=16;dmqreplicate=1;autoexpire=10800;") # Keep track of concurrent channels for accounts. Should be same as dialog
modparam("htable", "htable", "ht2=>size=16;dmqreplicate=1;autoexpire=10800;") # Keep track of concurrent channels for accounts. Should be same as dialog
modparam("htable", "htable", "ht3=>size=16;dmqreplicate=1;autoexpire=10800;") # Keep track of concurrent channels for accounts. Should be same as dialog

#!define ONEK "really 1 k chars, believe me :)"

event_route[htable:mod-init] {
	$var(name) = POD_NAME + "\n";
	xlog("L_ALERT", "$var(name)");
	if(POD_NAME == "kama-0") {
		$var(count) = 0;
		while($var(count) < 99) {
			$sht(ht=>$var(count)) = ONEK;
			$sht(ht1=>$var(count)) = ONEK;
			$sht(ht2=>$var(count)) = ONEK;
			$sht(ht3=>$var(count)) = ONEK;
			$var(count) = $var(count) + 1;
		}
	}
}

request_route {
	if ($rm == "KDMQ") {
		dmq_handle_message();
	}
	exit;
}
```
I started kama-0, which now has 4 htables of ~99K size each, then started 10 kubernetes pods and launched kamailio 100 times with a timeout of 3 seconds on each pod. So we have roughly 1000 kamailios trying to get these htables from kama-0. I didn't see any dangerous CPU spike, and the loop doesn't happen anymore.
There's something I'm worried about, though: the memory usage of the DMQ worker (measured from top), which usually stays around 0.1%, is now stable at 1.4% and is not going down again.
I fear there's a memory leak somewhere, but I'm not sure where. While debugging the loop issue I had some doubts about how the json_t structures are freed, but that could just be down to my limited knowledge of the code. Can you give us any hints to help investigate this further?
Thanks
Thanks for testing so extensively!
My (limited) understanding is that the call to srjson_DeleteItemFromObject() - by virtue of its internal call to srjson_Delete() - takes care of freeing the entire "cells" structure.
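For illustration, here is a simplified model of that recursive delete (the `node_t` type and `node_delete` function below are stand-ins, not the actual srjson_Delete() code): removing the "cells" node releases every cell item reachable through its child/next pointers in a single call, which is why no per-cell cleanup should be needed after srjson_DeleteItemFromObject().

```
/* Simplified model of the recursive free behind srjson_Delete()
 * (illustration only, not the actual srjson code): deleting the "cells"
 * node walks its children and siblings and frees the whole subtree. */
#include <stdio.h>
#include <stdlib.h>

typedef struct node {
	struct node *next;
	struct node *child;
	const char *name;
} node_t;

static node_t *node_new(const char *name) {
	node_t *n = calloc(1, sizeof(*n));
	n->name = name;
	return n;
}

static void node_delete(node_t *n) {
	while (n) {
		node_t *next = n->next;
		if (n->child)
			node_delete(n->child); /* free the whole subtree first */
		printf("freeing %s\n", n->name);
		free(n);
		n = next;                  /* then continue along the siblings */
	}
}

int main(void) {
	/* cells -> [cell1, cell2], roughly the shape after two cells have
	 * been appended to the group */
	node_t *cells = node_new("cells");
	node_t *c1 = node_new("cell1");
	node_t *c2 = node_new("cell2");
	cells->child = c1;
	c1->next = c2;

	node_delete(cells); /* one call frees the array node and both cells */
	return 0;
}
```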
I will do some testing here to be sure there is no leak but in the meantime, can you confirm whether you see the memory increase linearly in line with the number of times you launch a new Kamailio instance? For example, if you increase from 100 to 200 times, does the memory usage then settle at 2.7%?
Hi @charlesrchance, I tried but couldn't reproduce the issue; I'm wondering how it happened in the first place.
@charlesrchance do you need more tests before merging this fix?
I've been unable to reproduce myself, so if you're also unable to then I think it's good to merge - I'll submit the PR now.
Hello, sorry to bother you; we'd just like to know what timeline to expect for this issue. Unfortunately, we currently have a production deployment with this latent behaviour, which can be triggered simply by restarting one kamailio instance.
@paolovisintin - as soon as the above PR (#1872) is merged (will also be backported).
Fix has been merged/backported. Please reopen if still experiencing issues.
Closed #1863.