Description

We were running Kamailio 5.1.4 with the websocket module. Unfortunately, our clients don't support the WebSocket keepalive mechanism at all, so I used TCP keepalive instead, with the following parameters:

tcp_keepalive=yes
tcp_keepcnt=6
tcp_keepidle=60
tcp_keepintvl=10

and set up KEEPALIVE_MECHANISM_NONE:

modparam("websocket", "keepalive_mechanism", 0)
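For reference, here is a minimal sketch of what these TCP keepalive settings amount to at the socket level. It assumes Linux, where tcp_keepidle, tcp_keepintvl and tcp_keepcnt correspond to the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options. The function names enable_tcp_keepalive and keepalive_detect_seconds are hypothetical and are not part of Kamailio.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable TCP keepalive on fd (Linux socket options).
 * Returns 0 on success, -1 if any setsockopt() call fails. */
int enable_tcp_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;

    /* switch keepalive probing on for this socket */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    /* idle seconds before the first probe is sent */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    /* seconds between unanswered probes */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0)
        return -1;
    /* unanswered probes before the kernel drops the connection */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}

/* Approximate time until the kernel declares an idle, dead peer gone. */
int keepalive_detect_seconds(int idle_s, int intvl_s, int cnt)
{
    return idle_s + intvl_s * cnt;
}
```

With the values above (60, 10, 6) the kernel should give up on a silent peer after roughly 60 + 6 * 10 = 120 seconds. That only tears down the TCP connection; the websocket module still has to notice that it is gone.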

During load testing and debugging, with 8k clients sending registrations, we found that shared memory was not freed after connections were closed (see the ws_connection_list_t *wsconn_used_list variable in ws_conn.c).

Possible Solutions

I decided to add a new keepalive mechanism that periodically checks the TCP connection associated with each websocket connection:

enum
{
    KEEPALIVE_MECHANISM_NONE = 0,
    KEEPALIVE_MECHANISM_PING = 1,
    KEEPALIVE_MECHANISM_PONG = 2,
    KEEPALIVE_MECHANISM_TCP_CONN_CHECK = 3
};  

and added this line to the config:

# Enable custom tcp-connection-health based keepalive mechanism (3)
# KEEPALIVE_MECHANISM_NONE = 0,
# KEEPALIVE_MECHANISM_PING = 1,
# KEEPALIVE_MECHANISM_PONG = 2,
# KEEPALIVE_MECHANISM_TCP_CONN_CHECK = 3
modparam("websocket", "keepalive_mechanism", 3)

I also implemented the mechanism in the ws_keepalive function:

void ws_keepalive(unsigned int ticks, void *param)
{
    int check_time =
            (int)time(NULL) - cfg_get(websocket, ws_cfg, keepalive_timeout);

    ws_connection_t **list = NULL, **list_head = NULL;
    ws_connection_t *wsc = NULL;

    /* get an array of pointers to all ws connections */
    list_head = wsconn_get_list();
    if(!list_head)
        return;

    list = list_head;
    wsc = *list_head;
    while(wsc && wsc->last_used < check_time) {
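        if (ws_keepalive_mechanism == KEEPALIVE_MECHANISM_TCP_CONN_CHECK) {
            /* tcpconn_get() returns the connection with its refcnt raised */
            struct tcp_connection *con = tcpconn_get(wsc->id, 0, 0, 0, 0);
            if(!con) {
                LM_INFO("tcp connection has been lost\n");
                wsc->state = WS_S_CLOSING;
            } else {
                /* release the reference so the TCP connection can be freed */
                tcpconn_put(con);
            }
        }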

        if(wsc->state == WS_S_CLOSING || wsc->awaiting_pong) {
            LM_INFO("forcibly closing connection\n");
            wsconn_close_now(wsc);
        } else {
            int opcode = (ws_keepalive_mechanism == KEEPALIVE_MECHANISM_PING)
                                ? OPCODE_PING
                                : OPCODE_PONG;
            ping_pong(wsc, opcode);
        }

        wsc = *(++list);
    }

    wsconn_put_list(list_head);
}

and changed the memory allocation in wsconn_get_list and wsconn_put_list from pkg to shm because, as load testing showed, using pkg_malloc (per-process private memory) in these functions may cause failures under heavy load.
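The liveness check described above can be modelled in isolation. The sketch below is a toy model, not Kamailio code: all toy_* names and types are hypothetical stand-ins. It assumes that the TCP lookup takes a reference on success, which the caller must release again. Unlike ws_keepalive, which stops at the first connection that is not idle, the toy version simply skips such entries.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-ins for a TCP connection and a websocket connection. */
typedef struct { int id; int refcnt; int alive; } toy_tcp_conn;
typedef struct { int tcp_id; time_t last_used; int closing; } toy_ws_conn;

/* Look up a live TCP connection by id; takes a reference on success. */
toy_tcp_conn *toy_tcp_get(toy_tcp_conn *tbl, size_t n, int id)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].alive && tbl[i].id == id) {
            tbl[i].refcnt++;
            return &tbl[i];
        }
    }
    return NULL;
}

/* Release a reference taken by toy_tcp_get(). */
void toy_tcp_put(toy_tcp_conn *c)
{
    c->refcnt--;
}

/* Mark idle websocket entries as closing when their TCP connection is gone.
 * Returns the number of entries marked. */
size_t toy_sweep(toy_ws_conn *ws, size_t nws,
                 toy_tcp_conn *tcp, size_t ntcp, time_t check_time)
{
    size_t marked = 0;

    for (size_t i = 0; i < nws; i++) {
        if (ws[i].last_used >= check_time)
            continue; /* used recently, nothing to check */

        toy_tcp_conn *c = toy_tcp_get(tcp, ntcp, ws[i].tcp_id);
        if (c == NULL) {
            ws[i].closing = 1; /* TCP side is gone, schedule cleanup */
            marked++;
        } else {
            toy_tcp_put(c); /* balance the reference taken by the lookup */
        }
    }
    return marked;
}
```

The point of the else branch is that a lookup which only tests for existence must still give back the reference it took. Otherwise the counter on a healthy connection grows with every timer tick.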

These modifications solved the problem. About a week ago, however, we started switching to version 5.2.1 and found a lot of changes in the websocket module, so I've added my changes in this commit: korizza@b3e03d0. Please take a look.

Additional Information

Adding ws_conn_put_id in commit a975bca#diff-59c50f19ab1ccf4afe10617cdc346bc2 did not solve the problem of the reference counter increasing.

