[sr-dev] [kamailio/kamailio] WebSocket keepalive logging "forcibly closing connection" with TCP connection errors (Issue #2990)

Max Wardell notifications at github.com
Wed Jan 5 22:24:11 CET 2022


Hi Daniel,

Thank you for the prompt response. We are looking into migrating to 5.5.x shortly.

We did add some additional debugging and got an interesting result.
```
Dec 31 18:29:08.819765 eqx-sjc-ecv3 kamailio_edge[19096]: WARNING: websocket [ws_frame.c:810]: ws_keepalive(): forcibly closing connection

> grep "tcp id = 19132" /var/log/kamailio/kamailio_edge.log
Dec 31 18:29:07.815876 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x9edb0e80, tcp id = 19132
Dec 31 18:29:08.819965 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x00000000, tcp id = 19132
Dec 31 18:29:09.824694 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x00000000, tcp id = 19132
Dec 31 18:29:10.833513 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x00000000, tcp id = 19132
Dec 31 18:29:11.838090 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x00000000, tcp id = 19132
Dec 31 18:29:12.841246 eqx-sjc-ecv3 kamailio_edge[19096]: INFO: websocket [ws_conn.c:371]: wsconn_close_now(): wsconn_close_now: wsc = 0xa2ac94e0, tcp con = 0x00000000, tcp id = 19132
Dec 31 18:38:20.299126 eqx-sjc-ecv3 kamailio_edge[19218]: INFO: <core> [core/tcp_main.c:1584]: _tcpconn_detach(): _tcpconn_detach: tcpconn=0x9edb0e80, tcp id = 19132
```

I did want to get your thoughts on the current implementation of the `ws_keepalive` functionality.
```
	if(wsc->state == WS_S_CLOSING || wsc->awaiting_pong) {
		LM_WARN("forcibly closing connection\n");
		wsconn_close_now(wsc);
```

When `wsconn_close_now` is called, the connection state is set to `S_CONN_BAD`.
On subsequent `wsconn_close_now` calls, `tcpconn_get` -> `_tcpconn_find` calls will return 0 due to [this line](https://github.com/dialpad/kamailio/blob/master/src/core/tcp_main.c#L1650): `if ((id==c->id)&&(c->state!=S_CONN_BAD)) {`.

Ie. `_tcpconn_find` skips over connections in the `S_CONN_BAD` state.

Why is `wsconn_close_now` being called multiple times? It looks like `ws_keepalive` isn't removing the websocket connection from the list of open connections. Instead, it's [de-referencing an extra time](https://github.com/kamailio/kamailio/blob/master/src/modules/websocket/ws_frame.c#L810).
```
void ws_keepalive(unsigned int ticks, void *param)
{
    ...
    while(list_head[i].id!=-1) {
        //For each websocket connection, add a refcount 
        wsc = wsconn_get(list_head[i].id);
        if(wsc && wsc->last_used < check_time) {
            if(wsc->state == WS_S_CLOSING || wsc->awaiting_pong) {
                LM_WARN("forcibly closing connection\n");
                //Remove extra refcount?
                wsconn_close_now(wsc);
            } else if (ws_keepalive_mechanism == KEEPALIVE_MECHANISM_CONCHECK) {
                tcp_connection_t *con = tcpconn_get(wsc->id, 0, 0, 0, 0);
                if(con==NULL) {
                    LM_INFO("tcp connection has been lost\n");
                    wsc->state = WS_S_CLOSING;
                } else {
                    tcpconn_put(con);
                }
            } else {
                int opcode = (ws_keepalive_mechanism == KEEPALIVE_MECHANISM_PING)
                         ? OPCODE_PING
                         : OPCODE_PONG;
                ping_pong(wsc, opcode);
            }
        }
        if(wsc) {
            //Remove refcount
            wsconn_put_id(list_head[i].id);
        }
        i++;

    }

    wsconn_put_list_ids(list_head);
}
```

What about modifying the keepalive to do something like this instead?
```
for(i = 0; list_head[i].id!=-1; i++) {
    wsc = wsconn_get(list_head[i].id);
    if(wsc->state == WS_S_CLOSING) {
        /* Skip closing connections */
        continue;
  } else if (wsc->awaiting_pong) {
      LM_WARN("forcibly closing connection, no pong received\n");
      wsconn_close_now(wsc);
      continue;
  } else if ...
```

On skipping `wsconn_close_now` in the `WS_S_CLOSING` case, should it really be the responsibility of the keepalive to close the connection in that case?

-- 
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/2990#issuecomment-1006088547
You are receiving this because you are subscribed to this thread.

Message ID: <kamailio/kamailio/issues/2990/1006088547 at github.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.kamailio.org/pipermail/sr-dev/attachments/20220105/0eae06a6/attachment.htm>


More information about the sr-dev mailing list