Hi Daniel,

Thank you for the prompt response. We are looking into migrating to 5.5.x shortly.

We did add some additional debugging and got an interesting result.

Dec 31 18:29:08.819765 eqx-sjc-ecv3 kamailio_edge[19096]: WARNING: websocket [ws_frame.c:810]: ws_keepalive(): forcibly closing connection

> grep "tcp id = 19132" /var/log/kamailio/kamailio_edge.log
Dec 31 18:29:07.815876 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x9edb0e80, tcp id = 19132
Dec 31 18:29:08.819965 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x00000000, tcp id = 19132
Dec 31 18:29:09.824694 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x00000000, tcp id = 19132
Dec 31 18:29:10.833513 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x00000000, tcp id = 19132
Dec 31 18:29:11.838090 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x00000000, tcp id = 19132
Dec 31 18:29:12.841246 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x00000000, tcp id = 19132
Dec 31 18:38:20.299126 eqx-sjc-ecv3 kamailio_edge[19218]: INFO: <core> [core/tcp_main.c:1584]: _tcpconn_detach(): _tcpconn_detach: tcpconn=0x9edb0e80, tcp id = 19132

I did want to get your thoughts on the current implementation of the ws_keepalive functionality.

	if(wsc->state == WS_S_CLOSING || wsc->awaiting_pong) {
		LM_WARN("forcibly closing connection\n");
		wsconn_close_now(wsc);

When wsconn_close_now is called, the connection state is set to S_CONN_BAD.
On subsequent wsconn_close_now calls, tcpconn_get -> _tcpconn_find returns 0 because of this check: if ((id==c->id) && (c->state!=S_CONN_BAD)).

I.e., _tcpconn_find skips over connections in the S_CONN_BAD state.

Why is wsconn_close_now being called multiple times? It looks like ws_keepalive isn't removing the websocket connection from the list of open connections; instead, it just drops an extra reference on each pass.

void ws_keepalive(unsigned int ticks, void *param)
{
    ...
    while(list_head[i].id!=-1) {
        //For each websocket connection, add a refcount 
        wsc = wsconn_get(list_head[i].id);
        if(wsc && wsc->last_used < check_time) {
            if(wsc->state == WS_S_CLOSING || wsc->awaiting_pong) {
                LM_WARN("forcibly closing connection\n");
                //Remove extra refcount?
                wsconn_close_now(wsc);
            } else if (ws_keepalive_mechanism == KEEPALIVE_MECHANISM_CONCHECK) {
                tcp_connection_t *con = tcpconn_get(wsc->id, 0, 0, 0, 0);
                if(con==NULL) {
                    LM_INFO("tcp connection has been lost\n");
                    wsc->state = WS_S_CLOSING;
                } else {
                    tcpconn_put(con);
                }
            } else {
                int opcode = (ws_keepalive_mechanism == KEEPALIVE_MECHANISM_PING)
                         ? OPCODE_PING
                         : OPCODE_PONG;
                ping_pong(wsc, opcode);
            }
        }
        if(wsc) {
            //Remove refcount
            wsconn_put_id(list_head[i].id);
        }
        i++;

    }

    wsconn_put_list_ids(list_head);
}

What about modifying the keepalive to do something like this instead?

for(i = 0; list_head[i].id != -1; i++) {
    wsc = wsconn_get(list_head[i].id);
    if(wsc == NULL)
        continue;
    if(wsc->state == WS_S_CLOSING) {
        /* Skip connections that are already closing, but still
         * release the reference taken by wsconn_get() */
        wsconn_put_id(list_head[i].id);
        continue;
    } else if(wsc->awaiting_pong) {
        LM_WARN("forcibly closing connection, no pong received\n");
        wsconn_close_now(wsc);
        wsconn_put_id(list_head[i].id);
        continue;
    } else if ...

Regarding skipping wsconn_close_now in the WS_S_CLOSING case: should it really be the keepalive's responsibility to close the connection there?

