Hi,
please keep the list in CC.
Have you already looked into the ws_keepalive function? It iterates over the WebSocket connections in a different way: instead of walking a list of ws_connection pointers, it walks a list of ws_connection_id entries with an index variable
and uses wsconn_get(..) to fetch the actual connection.
Just guessing, but maybe this method works better for you.
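Below is a self-contained model of the iteration pattern described above. It takes a snapshot of connection IDs, then looks each ID up again, holding a reference only while working on that connection, and it skips IDs whose connection disappeared in the meantime. All names (conn_t, table_get_by_id(), table_put(), table_snapshot_ids()) are hypothetical stand-ins that only loosely correspond to wsconn_get(), wsconn_put_id() and wsconn_get_list_ids(). The sketch ignores shared memory and locking and is only meant to show the control flow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified connection table (not Kamailio's API).
 * Each live connection holds one base reference; lookups add more. */
#define MAX_CONNS 8

typedef struct {
	int id;     /* connection id, -1 marks a free slot */
	int refcnt; /* base reference + one per outstanding lookup */
} conn_t;

static conn_t table[MAX_CONNS];

/* Mark every slot as free. */
static void table_init(void)
{
	for(int i = 0; i < MAX_CONNS; i++) {
		table[i].id = -1;
		table[i].refcnt = 0;
	}
}

/* Store a new connection in the first free slot; 0 on success. */
static int table_add(int id)
{
	for(int i = 0; i < MAX_CONNS; i++) {
		if(table[i].id == -1) {
			table[i].id = id;
			table[i].refcnt = 1; /* base reference */
			return 0;
		}
	}
	return -1;
}

/* Find a connection by id and take a reference; NULL if it is gone. */
static conn_t *table_get_by_id(int id)
{
	for(int i = 0; i < MAX_CONNS; i++) {
		if(table[i].id == id) {
			table[i].refcnt++;
			return &table[i];
		}
	}
	return NULL;
}

/* Drop one reference; the slot is freed when the count reaches zero. */
static void table_put(conn_t *c)
{
	if(--c->refcnt == 0)
		c->id = -1;
}

/* Copy the ids of all live connections into ids[], terminated by -1.
 * Only integers are copied, so the snapshot pins nothing. */
static void table_snapshot_ids(int ids[MAX_CONNS + 1])
{
	int n = 0;
	for(int i = 0; i < MAX_CONNS; i++)
		if(table[i].id != -1)
			ids[n++] = table[i].id;
	ids[n] = -1;
}

/* Walk an id snapshot; returns how many connections still existed. */
static int visit_snapshot(const int *ids)
{
	int visited = 0;
	for(int i = 0; ids[i] != -1; i++) {
		conn_t *c = table_get_by_id(ids[i]);
		if(c == NULL)
			continue; /* removed after the snapshot was taken: skip it */
		visited++;    /* ... work on c here (ping, close, ...) ... */
		table_put(c); /* release the reference taken by the lookup */
	}
	return visited;
}
```

Usage: take a snapshot with table_snapshot_ids(), then call visit_snapshot(). If a connection is removed between the two steps, its ID no longer resolves and the loop skips it instead of reaching it through a stored pointer.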
Cheers,
Henning
--
Henning Westerholt – https://skalatan.de/blog/
Kamailio services – https://gilawa.com
From: Andrey Deykunov <deykunov@gmail.com>
Sent: Monday, May 25, 2020 5:26 PM
To: Henning Westerholt <hw@skalatan.de>
Subject: Re: [sr-dev] Fwd: RPC command to close all WS connections.
Hi Henning,
Thanks for your response. Kamailio crashed when one of our scripts tried to get statistics about the websocket module. As far as I can see in the core dump, shared memory had already been corrupted:
Reading symbols from /var/lib/ums/sbin/kamailio...done.
[New LWP 7234]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/var/lib/ums/sbin/kamailio -m 2048 -M 12 -P /var/run/kamailio/kamailio.pid -f /'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __strcmp_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:32
#1 0x00007eff2ff2e2f5 in rpc_mod_print (rpc=0x7eff2f843540 <binrpc_callbacks>, ctx=0x7ffe484adb38, mname=0x18e4aa9 "websocket", stats=0x18ffd70, flag=2) at mod_stats.c:117
#2 0x00007eff2ff2e0eb in rpc_mod_print_one (rpc=0x7eff2f843540 <binrpc_callbacks>, ctx=0x7ffe484adb38, mname=0x18e4aa9 "websocket", pkg_stats=0x18fe3b0, shm_stats=0x18ffd70, flag=2) at mod_stats.c:159
#3 0x00007eff2ff2dee1 in rpc_mod_mem_stats_mode (rpc=0x7eff2f843540 <binrpc_callbacks>, ctx=0x7ffe484adb38, fmode=0) at mod_stats.c:239
#4 0x00007eff2ff2d84f in rpc_mod_mem_stats (rpc=0x7eff2f843540 <binrpc_callbacks>, ctx=0x7ffe484adb38) at mod_stats.c:251
#5 0x00007eff2f612c80 in process_rpc_req (buf=0x18e4a94 "\241\003\035Cw\274L\221\nmod.stats", size=36, bytes_needed=0x7ffe484adf80, sh=0x7ffe484adef0, saved_state=0x18f4a98) at binrpc_run.c:678
#6 0x00007eff2f60072f in handle_stream_read (s_c=0x18e4a60, idx=-1) at io_listener.c:511
#7 0x00007eff2f5fc121 in handle_io (fm=0x7effb341f0a0, events=1, idx=-1) at io_listener.c:706
#8 0x00007eff2f5fa93a in io_wait_loop_epoll (h=0x7eff2f843348 <io_h>, t=10, repeat=0) at ./../../core/io_wait.h:1062
#9 0x00007eff2f5ee62c in io_listen_loop (fd_no=2, cs_lst=0x17f9940) at io_listener.c:281
#10 0x00007eff2f62472c in mod_child (rank=0) at ctl.c:338
#11 0x0000000000638c14 in init_mod_child (m=0x7effb3297cd8, rank=0) at core/sr_module.c:780
#12 0x000000000063862d in init_mod_child (m=0x7effb32983a0, rank=0) at core/sr_module.c:776
#13 0x000000000063862d in init_mod_child (m=0x7effb3298840, rank=0) at core/sr_module.c:776
#14 0x000000000063862d in init_mod_child (m=0x7effb3298d50, rank=0) at core/sr_module.c:776
#15 0x000000000063862d in init_mod_child (m=0x7effb32991f0, rank=0) at core/sr_module.c:776
#16 0x000000000063862d in init_mod_child (m=0x7effb3299968, rank=0) at core/sr_module.c:776
#17 0x000000000063862d in init_mod_child (m=0x7effb3299fd8, rank=0) at core/sr_module.c:776
#18 0x000000000063862d in init_mod_child (m=0x7effb329a460, rank=0) at core/sr_module.c:776
#19 0x00000000006385b2 in init_child (rank=0) at core/sr_module.c:825
#20 0x000000000043140c in main_loop () at main.c:1753
#21 0x000000000043df6f in main (argc=9, argv=0x7ffe484b22b8) at main.c:2802
I suspect that 'ws.close_all' could have caused this corruption, because our failover service calls this command after a node becomes passive. I'm also using a slightly revised variant of the ws_keepalive function, with KEEPALIVE_MECHANISM_CONCHECK
enabled.
Perhaps something goes wrong when websocket connections are removed within this function:
void ws_keepalive(unsigned int ticks, void *param)
{
	int check_time =
			(int)time(NULL) - cfg_get(websocket, ws_cfg, keepalive_timeout);
	ws_connection_id_t *list_head = NULL;
	ws_connection_t *wsc = NULL;
	int i = 0;
	int idx = (int)(long)param;
	LM_INFO("Keepalive tick\n");
	/* get an array with the ids of all ws connections */
	list_head = wsconn_get_list_ids(idx);
	if(!list_head)
		return;
	while(list_head[i].id != -1) {
		wsc = wsconn_get(list_head[i].id);
		if(wsc && wsc->last_used < check_time) {
			if(wsc->state == WS_S_CLOSING || wsc->awaiting_pong) {
				LM_WARN("forcibly closing connection\n");
				wsconn_close_now(wsc);
			} else if(ws_keepalive_mechanism == KEEPALIVE_MECHANISM_CONCHECK) {
				tcp_connection_t *con = tcpconn_get(wsc->id, 0, 0, 0, 0);
				if(con == NULL) {
					LM_INFO("tcp connection has been lost -> removing ws_conn\n");
					if(wsconn_rm(wsc, WSCONN_EVENTROUTE_YES) < 0)
						LM_ERR("removing WebSocket connection\n");
				} else {
					if(con->state == S_CONN_BAD) {
						LM_INFO("tcp connection is bad and supposed to be removed\n");
						if(wsconn_rm(wsc, WSCONN_EVENTROUTE_YES) < 0)
							LM_ERR("removing WebSocket connection\n");
						con->send_flags.f |= SND_F_CON_CLOSE;
						con->timeout = get_ticks_raw();
					}
					tcpconn_put(con);
				}
			} else {
				int opcode = (ws_keepalive_mechanism == KEEPALIVE_MECHANISM_PING)
									 ? OPCODE_PING
									 : OPCODE_PONG;
				ping_pong(wsc, opcode);
			}
		}
		if(wsc) {
			wsconn_put_id(list_head[i].id);
		}
		i++;
	}
	wsconn_put_list_ids(list_head);
}
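One hypothesis worth ruling out in the CONCHECK branch is reference counting: wsconn_rm() and wsconn_put_id() both end up releasing a reference to the same connection. The model below assumes, for illustration only, that each connection carries one base reference released by removal plus one reference per successful lookup. It counts a release after the count has already reached zero as an underflow, which in real shared memory would amount to a double free. It compares a removal that releases the base reference at most once with one that releases it on every call, for a connection that a second holder (for example an RPC command iterating over a connection list) still references while the timer judges it dead on two consecutive ticks. All names are hypothetical; this models the hazard and is not a diagnosis of the crash.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of a reference-counted connection.
 * Names are hypothetical, not the websocket module's structures. */
typedef struct {
	int refcnt;     /* 1 base reference + 1 per outstanding get */
	bool removed;   /* has the base reference been released? */
	bool freed;     /* memory would be released when refcnt hits 0 */
	int underflows; /* releases attempted after refcnt reached 0 */
} model_conn_t;

static void model_init(model_conn_t *c)
{
	c->refcnt = 1;
	c->removed = false;
	c->freed = false;
	c->underflows = 0;
}

/* Counterpart of a lookup: take a reference, fails once freed. */
static bool model_get(model_conn_t *c)
{
	if(c->freed)
		return false;
	c->refcnt++;
	return true;
}

/* Release one reference; "free" exactly when the count reaches zero. */
static void model_put(model_conn_t *c)
{
	if(c->refcnt == 0) {
		c->underflows++; /* would be a double free on a real allocator */
		return;
	}
	if(--c->refcnt == 0)
		c->freed = true;
}

/* Removal that releases the base reference at most once. */
static void model_rm(model_conn_t *c)
{
	if(c->removed)
		return;
	c->removed = true;
	model_put(c);
}

/* Removal without a guard, for comparison: releases on every call. */
static void model_rm_unguarded(model_conn_t *c)
{
	model_put(c);
}
```

In the guarded case the connection is freed only when the other holder releases its reference. In the unguarded case it is freed during the second tick while the other holder still uses it, and that holder's later release underflows.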
Thanks,
Andrey
On Mon, 25 May 2020 at 16:18, Henning Westerholt <hw@skalatan.de> wrote:
Hi Andrey,
as you know, there is an RPC command, ws.close(), and also a script function to close sessions remotely, so it should certainly be possible to close the sessions over RPC.
As I am not the author of this particular module, it is hard for me to say why it crashes. Try the usual debugging techniques, e.g. gdb, to find more clues.
If you have a working version, please open a pull request. If you have more questions, just ask on this list again.
Cheers,
Henning
From: sr-dev <sr-dev-bounces@lists.kamailio.org> On Behalf Of Andrey Deykunov
Sent: Monday, May 18, 2020 1:02 PM
To: Kamailio (SER) - Development Mailing List <sr-dev@lists.kamailio.org>
Subject: [sr-dev] Fwd: RPC command to close all WS connections.
---------- Forwarded message ---------
From: Andrey Deykunov <deykunov@gmail.com>
Date: Mon, 18 May 2020 at 12:24
Subject: RPC command to close all WS connections.
To: Daniel-Constantin Mierla <miconda@gmail.com>
Hi Daniel,
We run two nodes (active and passive) of our PBX server in production. When the active node becomes passive, we need to forcibly close all WS connections established by clients on that node. So I've added a 'ws.close_all' command to the websocket module to let our failover service close WS connections remotely.
I've added the following code to ws_frame.c:
void ws_rpc_close_all(rpc_t *rpc, void *ctx)
{
	ws_connection_t **list = NULL, **list_head = NULL;
	ws_connection_t *wsc = NULL;
	int ret;
	list_head = wsconn_get_list();
	if(!list_head)
		return;
	list = list_head;
	wsc = *list_head;
	while(wsc) {
		LM_WARN("Closing connection\n");
		ret = close_connection(&wsc, LOCAL_CLOSE, 1000, str_status_normal_closure);
		wsc = *(++list);
	}
	wsconn_put_list(list_head);
}
However, I think this code may be unsafe and could corrupt shared memory, because I've been getting segfaults during failovers since adding this command.
What do you think? Is it possible to close the connections properly and safely, without corrupting shared memory?
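Here is a hedged sketch of one way to structure a close-all command. It assumes, as the WS_S_CLOSING branch of ws_keepalive above suggests, that connections marked as closing are later torn down by the keepalive timer. The RPC side only looks up each ID from a snapshot and marks open connections as closing, so the timer remains the single place where connections are freed. Running the command twice then marks nothing new, and no connection is torn down more than once. The types and functions are hypothetical and single-threaded; a real version would also need the module's locking and reference handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-phase shutdown model; names are illustrative only. */
#define NCONN 4

typedef enum { ST_FREE, ST_OPEN, ST_CLOSING } state_t;

typedef struct {
	int id;
	state_t state;
	int removals; /* how many times this connection was torn down */
} slot_t;

static slot_t conns[NCONN];

/* Look up a live connection by id; NULL if it no longer exists. */
static slot_t *find_by_id(int id)
{
	for(int i = 0; i < NCONN; i++)
		if(conns[i].state != ST_FREE && conns[i].id == id)
			return &conns[i];
	return NULL;
}

/* Phase 1 (RPC side): only request the close, never free anything.
 * Returns the number of connections newly marked as closing. */
static int close_all(void)
{
	int ids[NCONN + 1], n = 0, marked = 0;
	for(int i = 0; i < NCONN; i++) /* snapshot the ids */
		if(conns[i].state != ST_FREE)
			ids[n++] = conns[i].id;
	ids[n] = -1;
	for(int i = 0; ids[i] != -1; i++) {
		slot_t *s = find_by_id(ids[i]);
		if(s != NULL && s->state == ST_OPEN) {
			s->state = ST_CLOSING; /* e.g. after sending a Close frame */
			marked++;
		}
	}
	return marked;
}

/* Phase 2 (timer side): the single place that tears connections down. */
static int timer_pass(void)
{
	int removed = 0;
	for(int i = 0; i < NCONN; i++) {
		if(conns[i].state == ST_CLOSING) {
			conns[i].removals++;
			conns[i].state = ST_FREE;
			removed++;
		}
	}
	return removed;
}
```

With two open connections and one already closing, close_all() marks the two open ones, a second call marks none, and a single timer pass tears down all three exactly once.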
Thanks,
Andrey