Hi Daniel,
Thank you for the prompt response. We are looking into migrating to 5.5.x shortly.
We did add some additional debugging and got an interesting result. ``` Dec 31 18:29:08.819765 eqx-sjc-ecv3 kamailio_edge[19096]: WARNING: websocket [ws_frame.c:810]: ws_keepalive(): forcibly closing connection
grep "tcp id = 19132" /var/log/kamailio/kamailio_edge.log
Dec 31 18:29:07.815876 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x9edb0e80, tcp id = 19132 Dec 31 18:29:08.819965 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x00000000, tcp id = 19132 Dec 31 18:29:09.824694 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x00000000, tcp id = 19132 Dec 31 18:29:10.833513 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x00000000, tcp id = 19132 Dec 31 18:29:11.838090 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x00000000, tcp id = 19132 Dec 31 18:29:12.841246 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x00000000, tcp id = 19132 Dec 31 18:38:20.299126 eqx-sjc-ecv3 kamailio_edge[19218]: INFO: <core> [core/tcp_main.c:1584]: _tcpconn_detach(): _tcpconn_detach: tcpconn=0x9edb0e80, tcp id = 19132 ```
I did want to get your thoughts on the current implementation of the `ws_keepalive` functionality. ``` if(wsc->state == WS_S_CLOSING || wsc->awaiting_pong) { LM_WARN("forcibly closing connection\n"); wsconn_close_now(wsc); ```
When `wsconn_close_now` is called, the connection state is set to `S_CONN_BAD`. On subsequent `wsconn_close_now` calls, `tcpconn_get` -> `_tcpconn_find` calls will return 0 due to [this line](https://github.com/dialpad/kamailio/blob/master/src/core/tcp_main.c#L1650): `if ((id==c->id)&&(c->state!=S_CONN_BAD)) {`.
Ie. `_tcpconn_find` skips over connections in the `S_CONN_BAD` state.
Why is `wsconn_close_now` being called multiple times? It looks like `ws_keepalive` isn't removing the websocket connection from the list of open connections. Instead, it's [de-referencing an extra time](https://github.com/kamailio/kamailio/blob/master/src/modules/websocket/ws_fr...). ``` void ws_keepalive(unsigned int ticks, void *param) { ... while(list_head[i].id!=-1) { //For each websocket connection, add a refcount wsc = wsconn_get(list_head[i].id); if(wsc && wsc->last_used < check_time) { if(wsc->state == WS_S_CLOSING || wsc->awaiting_pong) { LM_WARN("forcibly closing connection\n"); //Remove extra refcount? wsconn_close_now(wsc); } else if (ws_keepalive_mechanism == KEEPALIVE_MECHANISM_CONCHECK) { tcp_connection_t *con = tcpconn_get(wsc->id, 0, 0, 0, 0); if(con==NULL) { LM_INFO("tcp connection has been lost\n"); wsc->state = WS_S_CLOSING; } else { tcpconn_put(con); } } else { int opcode = (ws_keepalive_mechanism == KEEPALIVE_MECHANISM_PING) ? OPCODE_PING : OPCODE_PONG; ping_pong(wsc, opcode); } } if(wsc) { //Remove refcount wsconn_put_id(list_head[i].id); } i++;
}
wsconn_put_list_ids(list_head); } ```
What about modifying the keepalive to do something like this instead? ``` for(i = 0; list_head[i].id!=-1; i++) { wsc = wsconn_get(list_head[i].id); if(wsc->state == WS_S_CLOSING) { /* Skip closing connections */ continue; } else if (wsc->awaiting_pong) { LM_WARN("forcibly closing connection, no pong received\n"); wsconn_close_now(wsc); continue; } else if ... ```
On skipping `wsconn_close_now` in the `WS_S_CLOSING` case, should it really be the responsibility of the keepalive to close the connection in that case?