<div dir="ltr"><div dir="ltr"><div dir="ltr">Hi guys,<div><br></div><div>We had Kamailio 5.1.4 with the websocket module. Unfortunately, our clients don't support the WebSocket keepalive mechanism at all, so I used TCP keepalive instead with the following parameters:</div><div><br></div><div><div style="color:rgb(212,212,212);background-color:rgb(30,30,30);font-family:"Droid Sans Mono",monospace,monospace,"Droid Sans Fallback";font-size:14px;line-height:19px;white-space:pre-wrap"><div><span style="color:rgb(86,156,214)">tcp_keepalive</span>=yes</div><div><span style="color:rgb(86,156,214)">tcp_keepcnt</span>=6</div><div><span style="color:rgb(86,156,214)">tcp_keepidle</span>=60</div><div><span style="color:rgb(86,156,214)">tcp_keepintvl</span>=10</div></div></div><div><br></div><div><br></div><div>and set keepalive_mechanism to KEEPALIVE_MECHANISM_NONE:</div><div><br></div><div><div style="color:rgb(212,212,212);background-color:rgb(30,30,30);font-family:"Droid Sans Mono",monospace,monospace,"Droid Sans Fallback";font-size:14px;line-height:19px;white-space:pre-wrap"><div>modparam(<span style="color:rgb(206,145,120)">"websocket"</span>, <span style="color:rgb(206,145,120)">"keepalive_mechanism"</span>, 0)</div></div></div><div><br></div><div><br></div><div>During load testing and debugging, with 8k clients sending registrations, we found that shared memory was not freed after connections were closed. 
So I decided to add a new keepalive mechanism that periodically checks the TCP connection behind each websocket:</div><div><br></div><div style="color:rgb(212,212,212);background-color:rgb(30,30,30);font-family:"Droid Sans Mono",monospace,monospace,"Droid Sans Fallback";font-size:14px;line-height:19px;white-space:pre-wrap"><div><span style="color:rgb(86,156,214)">enum</span></div><div>{</div><div>    KEEPALIVE_MECHANISM_NONE = <span style="color:rgb(181,206,168)">0</span>,</div><div>    KEEPALIVE_MECHANISM_PING = <span style="color:rgb(181,206,168)">1</span>,</div><div>    KEEPALIVE_MECHANISM_PONG = <span style="color:rgb(181,206,168)">2</span>,</div><div>    KEEPALIVE_MECHANISM_TCP_CONN_CHECK = <span style="color:rgb(181,206,168)">3</span></div></div><div><span style="background-color:rgb(30,30,30);color:rgb(212,212,212);font-family:"Droid Sans Mono",monospace,monospace,"Droid Sans Fallback";font-size:14px;white-space:pre-wrap">};</span>  </div><div><br></div><div><br></div><div>and added this line to the config:</div><div><br></div><div><div style="color:rgb(212,212,212);background-color:rgb(30,30,30);font-family:"Droid Sans Mono",monospace,monospace,"Droid Sans Fallback";font-size:14px;line-height:19px;white-space:pre-wrap"><div><span style="color:rgb(106,153,85)"># Enable custom tcp-connection-health based keepalive mechanism (3)</span></div><div><span style="color:rgb(106,153,85)"># KEEPALIVE_MECHANISM_NONE = 0,</span></div><div><span style="color:rgb(106,153,85)"># KEEPALIVE_MECHANISM_PING = 1,</span></div><div><span style="color:rgb(106,153,85)"># KEEPALIVE_MECHANISM_PONG = 2,</span></div><div><span style="color:rgb(106,153,85)"># KEEPALIVE_MECHANISM_TCP_CONN_CHECK = 3</span></div><div>modparam(<span style="color:rgb(206,145,120)">"websocket"</span>, <span style="color:rgb(206,145,120)">"keepalive_mechanism"</span>, 3)</div></div></div><div><br></div><div><br></div><div>I also implemented the mechanism in the ws_keepalive function:</div><div><br></div><div><div 
style="color:rgb(212,212,212);background-color:rgb(30,30,30);font-family:"Droid Sans Mono",monospace,monospace,"Droid Sans Fallback";font-size:14px;line-height:19px;white-space:pre-wrap"><div><span style="color:rgb(86,156,214)">void</span> <span style="color:rgb(220,220,170)">ws_keepalive</span>(<span style="color:rgb(86,156,214)">unsigned</span> <span style="color:rgb(86,156,214)">int</span> ticks, <span style="color:rgb(86,156,214)">void</span> *param)</div><div>{</div><div>    <span style="color:rgb(86,156,214)">int</span> check_time =</div><div>            (<span style="color:rgb(86,156,214)">int</span>)<span style="color:rgb(220,220,170)">time</span>(<span style="color:rgb(86,156,214)">NULL</span>) - <span style="color:rgb(220,220,170)">cfg_get</span>(websocket, ws_cfg, keepalive_timeout);</div><br><div>    <span style="color:rgb(78,201,176)">ws_connection_t</span> **list = <span style="color:rgb(86,156,214)">NULL</span>, **list_head = <span style="color:rgb(86,156,214)">NULL</span>;</div><div>    <span style="color:rgb(78,201,176)">ws_connection_t</span> *wsc = <span style="color:rgb(86,156,214)">NULL</span>;</div><br><div>    <span style="color:rgb(106,153,85)">/* get an array of pointer to all ws connection */</span></div><div>    list_head = <span style="color:rgb(220,220,170)">wsconn_get_list</span>();</div><div>    <span style="color:rgb(197,134,192)">if</span>(!list_head)</div><div>        <span style="color:rgb(197,134,192)">return</span>;</div><br><div>    list = list_head;</div><div>    wsc = *list_head;</div><div>    <span style="color:rgb(197,134,192)">while</span>(wsc && wsc-><span style="color:rgb(156,220,254)">last_used</span> < check_time) {</div><div>        <span style="color:rgb(197,134,192)">if</span> (ws_keepalive_mechanism == KEEPALIVE_MECHANISM_TCP_CONN_CHECK) {</div><div>            <span style="color:rgb(86,156,214)">struct</span> tcp_connection *con = <span style="color:rgb(220,220,170)">tcpconn_get</span>(wsc-><span 
style="color:rgb(156,220,254)">id</span>, <span style="color:rgb(181,206,168)">0</span>, <span style="color:rgb(181,206,168)">0</span>, <span style="color:rgb(181,206,168)">0</span>, <span style="color:rgb(181,206,168)">0</span>);</div><div>            <span style="color:rgb(197,134,192)">if</span>(!con) {</div><div>                <span style="color:rgb(220,220,170)">LM_INFO</span>(<span style="color:rgb(206,145,120)">"tcp connection has been lost</span><span style="color:rgb(215,186,125)">\n</span><span style="color:rgb(206,145,120)">"</span>);</div><div>                wsc-><span style="color:rgb(156,220,254)">state</span> = WS_S_CLOSING;</div><div>            } <span style="color:rgb(197,134,192)">else</span> {</div><div>                <span style="color:rgb(106,153,85)">/* tcpconn_get() takes a reference; release it */</span></div><div>                <span style="color:rgb(220,220,170)">tcpconn_put</span>(con);</div><div>            }</div><div>        }</div><br><div>        <span style="color:rgb(197,134,192)">if</span>(wsc-><span style="color:rgb(156,220,254)">state</span> == WS_S_CLOSING || wsc-><span style="color:rgb(156,220,254)">awaiting_pong</span>) {</div><div>            <span style="color:rgb(220,220,170)">LM_INFO</span>(<span style="color:rgb(206,145,120)">"forcibly closing connection</span><span style="color:rgb(215,186,125)">\n</span><span style="color:rgb(206,145,120)">"</span>);</div><div>            <span style="color:rgb(220,220,170)">wsconn_close_now</span>(wsc);</div><div>        } <span style="color:rgb(197,134,192)">else</span> {</div><div>            <span style="color:rgb(86,156,214)">int</span> opcode = (ws_keepalive_mechanism == KEEPALIVE_MECHANISM_PING)</div><div>                                ? 
OPCODE_PING</div><div>                                : OPCODE_PONG;</div><div>            <span style="color:rgb(220,220,170)">ping_pong</span>(wsc, opcode);</div><div>        }</div><br><div>        wsc = *(++list);</div><div>    }</div><br><div>    <span style="color:rgb(220,220,170)">wsconn_put_list</span>(list_head);</div><div>}</div></div></div><div><br></div><div><br></div><div>I also changed the memory allocation in wsconn_get_list and wsconn_put_list from pkg to shm because, as load testing showed, pkg_malloc (per-process private memory) can fail under heavy load.</div><div><br></div><div>These modifications solved the problem. But about a week ago we started switching to version 5.2.1 and found a lot of changes in the websocket module. So I've added my changes in this commit: <a href="https://github.com/korizza/kamailio/commit/b3e03d03574ff4ff076005bb8a01d7461af2f8f5">https://github.com/korizza/kamailio/commit/b3e03d03574ff4ff076005bb8a01d7461af2f8f5</a> . Please take a look.</div><div><br></div><div>BTW: adding ws_conn_put_id in this commit <a href="https://github.com/kamailio/kamailio/commit/a975bca1702ea2f3db47f834f7e4da2786ced201#diff-59c50f19ab1ccf4afe10617cdc346bc2">https://github.com/kamailio/kamailio/commit/a975bca1702ea2f3db47f834f7e4da2786ced201#diff-59c50f19ab1ccf4afe10617cdc346bc2</a> did not solve the problem with the ref counter increasing.</div><div><br></div><div><br></div><div>Thanks,</div><div>Andrey Deykunov</div></div></div></div>