### Description On server used WebRTC clients. Some time Kamailio start flooding messages like this ``` INFO: websocket [ws_frame.c:814]: ws_keepalive(): tcp connection has been lost ``` I have enabled debug logs and can see more detailed output. ``` INFO: websocket [ws_frame.c:814]: ws_keepalive(): tcp connection has been lost DEBUG: websocket [ws_conn.c:485]: wsconn_put_id(): wsconn put id [49] DEBUG: websocket [ws_conn.c:490]: wsconn_put_id(): wsc [0xffff7b4af448] refcnt [2] DEBUG: websocket [ws_conn.c:425]: wsconn_put_mode(): wsconn_put start for [0xffff7b4af448] refcnt [2] DEBUG: websocket [ws_conn.c:439]: wsconn_put_mode(): wsconn_put end for [0xffff7b4af448] refcnt [1] DEBUG: websocket [ws_conn.c:461]: wsconn_get(): wsconn_get for id [0] DEBUG: websocket [ws_conn.c:461]: wsconn_get(): wsconn_get for id [0] DEBUG: websocket [ws_conn.c:461]: wsconn_get(): wsconn_get for id [52] DEBUG: websocket [ws_conn.c:467]: wsconn_get(): wsconn_get returns wsc [0xffff7b4df8f8] refcnt [2] INFO: websocket [ws_frame.c:814]: ws_keepalive(): tcp connection has been lost DEBUG: websocket [ws_conn.c:485]: wsconn_put_id(): wsconn put id [52] DEBUG: websocket [ws_conn.c:490]: wsconn_put_id(): wsc [0xffff7b4df8f8] refcnt [2] DEBUG: websocket [ws_conn.c:425]: wsconn_put_mode(): wsconn_put start for [0xffff7b4df8f8] refcnt [2] DEBUG: websocket [ws_conn.c:439]: wsconn_put_mode(): wsconn_put end for [0xffff7b4df8f8] refcnt [1] DEBUG: websocket [ws_conn.c:461]: wsconn_get(): wsconn_get for id [0] DEBUG: websocket [ws_conn.c:461]: wsconn_get(): wsconn_get for id [0] DEBUG: websocket [ws_conn.c:461]: wsconn_get(): wsconn_get for id [0] DEBUG: websocket [ws_conn.c:461]: wsconn_get(): wsconn_get for id [0] DEBUG: websocket [ws_conn.c:461]: wsconn_get(): wsconn_get for id [61] DEBUG: websocket [ws_conn.c:467]: wsconn_get(): wsconn_get returns wsc [0xffff7b57bfa0] refcnt [2] INFO: websocket [ws_frame.c:814]: ws_keepalive(): tcp connection has been lost DEBUG: websocket [ws_conn.c:485]: wsconn_put_id(): wsconn put id [61] DEBUG: websocket [ws_conn.c:490]: wsconn_put_id(): wsc [0xffff7b57bfa0] refcnt [2] DEBUG: websocket [ws_conn.c:425]: wsconn_put_mode(): wsconn_put start for [0xffff7b57bfa0] refcnt [2] DEBUG: websocket [ws_conn.c:439]: wsconn_put_mode(): wsconn_put end for [0xffff7b57bfa0] refcnt [1] DEBUG: websocket [ws_conn.c:461]: wsconn_get(): wsconn_get for id [0] DEBUG: websocket [ws_conn.c:461]: wsconn_get(): wsconn_get for id [0] DEBUG: websocket [ws_conn.c:461]: wsconn_get(): wsconn_get for id [0] DEBUG: websocket [ws_conn.c:461]: wsconn_get(): wsconn_get for id [70] DEBUG: websocket [ws_conn.c:467]: wsconn_get(): wsconn_get returns wsc [0xffff7b5cc770] refcnt [2] INFO: websocket [ws_frame.c:814]: ws_keepalive(): tcp connection has been lost DEBUG: websocket [ws_conn.c:485]: wsconn_put_id(): wsconn put id [70] DEBUG: websocket [ws_conn.c:490]: wsconn_put_id(): wsc [0xffff7b5cc770] refcnt [2] DEBUG: websocket [ws_conn.c:425]: wsconn_put_mode(): wsconn_put start for [0xffff7b5cc770] refcnt [2] DEBUG: websocket [ws_conn.c:439]: wsconn_put_mode(): wsconn_put end for [0xffff7b5cc770] refcnt [1] DEBUG: websocket [ws_conn.c:461]: wsconn_get(): wsconn_get for id [0] DEBUG: websocket [ws_conn.c:461]: wsconn_get(): wsconn_get for id [0] DEBUG: websocket [ws_conn.c:461]: wsconn_get(): wsconn_get for id [76] DEBUG: websocket [ws_conn.c:467]: wsconn_get(): wsconn_get returns wsc [0xffff7b62d0d0] refcnt [2] INFO: websocket [ws_frame.c:814]: ws_keepalive(): tcp connection has been lost ``` On the server installed commit 1d03d9e8312e6e9b494f6d243fe9b2b4ea890eaf ### Troubleshooting I do not know. For me it looks random.
#### Reproduction I do not know. For me it looks random.
#### Debugging Data not applicable
#### Log Messages provided above
#### SIP Traffic not applicable
### Possible Solutions not known
### Additional Information
* **Kamailio Version** Used commit 1d03d9e8312e6e9b494f6d243fe9b2b4ea890eaf
* **Operating System**:
``` [root@sbc-stage-a0 kamailio]# cat /etc/os-release NAME="CentOS Stream" VERSION="8" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="8" PLATFORM_ID="platform:el8" PRETTY_NAME="CentOS Stream 8" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:centos:centos:8" HOME_URL="https://centos.org/" BUG_REPORT_URL="https://bugzilla.redhat.com/" REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 8" REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream" ``` kernel ``` Linux sbc-stage-a0.nga911.com 4.18.0-365.el8.aarch64 #1 SMP Thu Feb 10 16:12:32 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux ```
If only the info message is a concern, then the `debug` parameter can be set to 1. The info logs are sort of special 'debug' messages that can occur many times and the solution is to lower `debug` threshold. It is not unusual that connections are lost because of client side or intermediary nodes like nat or ip routers, firewalls, ...
If you want to suggest turning info to debug message in this case, then over the time we can end up only with two log levels **debug** and **print-always** (or **error**), while the current design with many more log levels expect that **info** level can still be pretty verbose, but not as **debug**.
Besides info messages, I don't see errors, so I am not sure what you mean with the remark in the title that **cannot be terminated**. It may take time to be removed from the internal list, clean up being done on timer. Or maybe you can elaborate more what is the exact problem: the info logs or something else behind that.
Thanks, Daniel @miconda The real issue is that Kamailio stopped processing WebRTC clients/connections and I need to restart Kamailio daemon.
Rate of this info message to high and we do not have such an amount of WebRTC clients. For me, it looks like some loop in the connection handling.
I will try to collect additional info when the issue is reproduced. I want test: 1. TCP SIP clients message processing; 2. UDP client processing; 3. basic HTTP request handling.
Relevant commit e7bd1570e3788e9d04047b4d1d00cb8bd8b172d3 Issue reproduced and kamcmd I can see ``` kamcmd> ws.dump { connections: { 55344: wss:2605:84c0:81:2::101:58431 -> wss:2605:84c0:48:5::6:2856 (state: OPEN, last used 77s ago, sub-protocol: sip) 92612: wss:2605:84c0:81:10::104:61581 -> wss:2605:84c0:48:5::6:2856 (state: OPEN, last used 601s ago, sub-protocol: sip) 92613: wss:2600:1f18:40ed:da30:9265:c6fa:e808:9cf9:50223 -> wss:2605:84c0:48:5::6:2856 (state: OPEN, last used 557s ago, sub-protocol: sip) 87920: wss:2605:84c0:80:1730::104:51995 -> wss:2605:84c0:48:5::6:2856 (state: OPEN, last used 266s ago, sub-protocol: sip) } info: { wscounter: 4 truncated: no } } ``` I closed all connections using `ws.close` and a new connection was created again from the client. The same error message was generated for the new WebSocket connection.
I attached to running process and get backtrache ``` (gdb) bt #0 wsconn_get (id=272) at ws_conn.c:467 #1 0x0000ffff81da9328 in ws_keepalive (ticks=115406406, param=0x2) at ws_frame.c:806 #2 0x00000000006c9d4c in fork_sync_timer (child_id=-1, desc=0xffff81db0ea8 "WEBSOCKET KEEPALIVE", make_sock=1, f=0xffff81da92a4 <ws_keepalive>, param=0x2, interval=1000) at core/timer_proc.c:267 #3 0x0000ffff81d99e94 in child_init (rank=0) at websocket.c:303 #4 0x00000000005f30b8 in init_mod_child (m=0xffff8cc913b0, rank=0) at core/sr_module.c:877 #5 0x00000000005f2c48 in init_mod_child (m=0xffff8cce61e8, rank=0) at core/sr_module.c:870 #6 0x00000000005f2c48 in init_mod_child (m=0xffff8cdb27b0, rank=0) at core/sr_module.c:870 #7 0x00000000005f2c48 in init_mod_child (m=0xffff8cdd8798, rank=0) at core/sr_module.c:870 #8 0x00000000005f2c48 in init_mod_child (m=0xffff8cdd9448, rank=0) at core/sr_module.c:870 #9 0x00000000005f2c48 in init_mod_child (m=0xffff8cddaf00, rank=0) at core/sr_module.c:870 #10 0x00000000005f2c48 in init_mod_child (m=0xffff8ce15c28, rank=0) at core/sr_module.c:870 #11 0x00000000005f2c48 in init_mod_child (m=0xffff8ce76c50, rank=0) at core/sr_module.c:870 #12 0x00000000005f2c48 in init_mod_child (m=0xffff8ce77400, rank=0) at core/sr_module.c:870 #13 0x00000000005f2c48 in init_mod_child (m=0xffff8ceddd18, rank=0) at core/sr_module.c:870 #14 0x00000000005f3ad4 in init_child (rank=0) at core/sr_module.c:931 #15 0x0000000000432c00 in main_loop () at main.c:1841 #16 0x000000000043f088 in main (argc=11, argv=0xffffcd7942a8) at main.c:3074 (gdb) bt full #0 wsconn_get (id=272) at ws_conn.c:467 id_hash = 272 wsc = 0xffff86824708 __func__ = "wsconn_get" #1 0x0000ffff81da9328 in ws_keepalive (ticks=115406406, param=0x2) at ws_frame.c:806 check_time = 1671278465 list_head = 0xffff8ccbec28 wsc = 0x0 i = 0 idx = 2 __func__ = "ws_keepalive" #2 0x00000000006c9d4c in fork_sync_timer (child_id=-1, desc=0xffff81db0ea8 "WEBSOCKET KEEPALIVE", make_sock=1, f=0xffff81da92a4 <ws_keepalive>, param=0x2, interval=1000) at core/timer_proc.c:267 pid = 0 ts1 = 1846502504 ts2 = 1000 #3 0x0000ffff81d99e94 in child_init (rank=0) at websocket.c:303 i = 2 __func__ = "child_init" #4 0x00000000005f30b8 in init_mod_child (m=0xffff8cc913b0, rank=0) at core/sr_module.c:877 ret = 0 __func__ = "init_mod_child" #5 0x00000000005f2c48 in init_mod_child (m=0xffff8cce61e8, rank=0) at core/sr_module.c:870 ret = 65535 __func__ = "init_mod_child" #6 0x00000000005f2c48 in init_mod_child (m=0xffff8cdb27b0, rank=0) at core/sr_module.c:870 ret = 0 __func__ = "init_mod_child" #7 0x00000000005f2c48 in init_mod_child (m=0xffff8cdd8798, rank=0) at core/sr_module.c:870 ret = 0 __func__ = "init_mod_child" #8 0x00000000005f2c48 in init_mod_child (m=0xffff8cdd9448, rank=0) at core/sr_module.c:870 ret = 0 __func__ = "init_mod_child" #9 0x00000000005f2c48 in init_mod_child (m=0xffff8cddaf00, rank=0) at core/sr_module.c:870 ret = 0 __func__ = "init_mod_child" #10 0x00000000005f2c48 in init_mod_child (m=0xffff8ce15c28, rank=0) at core/sr_module.c:870 ret = 0 __func__ = "init_mod_child" #11 0x00000000005f2c48 in init_mod_child (m=0xffff8ce76c50, rank=0) at core/sr_module.c:870 ret = 0 __func__ = "init_mod_child" #12 0x00000000005f2c48 in init_mod_child (m=0xffff8ce77400, rank=0) at core/sr_module.c:870 ret = 0 __func__ = "init_mod_child" #13 0x00000000005f2c48 in init_mod_child (m=0xffff8ceddd18, rank=0) at core/sr_module.c:870 ret = 1 __func__ = "init_mod_child" #14 0x00000000005f3ad4 in init_child (rank=0) at core/sr_module.c:931 ret = -847692824 type = 0x903658 "PROC_MAIN" __func__ = "init_child" #15 0x0000000000432c00 in main_loop () at main.c:1841 i = 8 pid = 5599 si = 0x0 si_desc = "udp receiver child=7 sock=[2605:84c0:48:5::6]:5080\000\214\377\377\000\000 =y\315\377\377\000\000\270\017\201\214\377\377\000\000\360<y\315\377\377\000\000\220U\335\214\377\377\000\000\020=y\315\377\377\000\000 =y\315\377\377\000\000 =y\315\377\377\000\000\360<y\315\377\377\000\000\320\377\377\377\200\377\377\377" nrprocs = 8 woneinit = 1 __func__ = "main_loop" #16 0x000000000043f088 in main (argc=11, argv=0xffffcd7942a8) at main.c:3074 cfg_stream = 0x404c02a0 c = -1 r = 0 tmp = 0xffffcd79fe7a "" tmp_len = 0 port = 0 proto = 65535 ahost = 0x0 aport = 0 options = 0x8c7da0 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:" ret = -1 seed = 1316653355 rfd = 4 debug_save = 0 debug_flag = 0 dont_fork_cnt = 2 n_lst = 0xffffffff p = 0xffff8dfe4348 <__libc_start_main+160> "" st = {st_dev = 25, st_ino = 17801, st_mode = 16832, st_nlink = 2, st_uid = 992, st_gid = 987, st_rdev = 0, __pad1 = 0, st_size = 40, st_blksize = 65536, __pad2 = 0, st_blocks = 0, st_atim = {tv_sec = 1669922150, tv_nsec = 859999981}, st_mtim = {tv_sec = 1669922150, tv_nsec = 859999981}, st_ctim = {tv_sec = 1669922150, tv_nsec = 859999981}, __glibc_reserved = {0, 0}} tbuf = '\000' <repeats 56 times>, "x\376\063\216\377\377\000\000h\376\063\216\377\377\000\000\b\376\063\216\377\377\000\000(\376\063\216\377\377\000\000\070\376\063\216\377\377\000\000\250\376\063\216\377\377\000\000\270\376\063\216\377\377\000\000\310\376\063\216\377\377\000\000H\376\063\216\377\377\000\000X\376\063\216\377\377", '\000' <repeats 18 times>, "\330\375\063\216\377\377", '\000' <repeats 42 times>... option_index = 12 long_options = {{name = 0x8ca178 "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x8c4fe0 "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x8ca180 "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 0x8ca188 "subst", has_arg = 1, flag = 0x0, val = 1025}, {name = 0x8ca190 "substdef", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x8ca1a0 "substdefs", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x8ca1b0 "server-id", has_arg = 1, flag = 0x0, val = 1028}, {name = 0x8ca1c0 "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, {name = 0x8ca1d0 "modparam", has_arg = 1, flag = 0x0, val = 1030}, {name = 0x8ca1e0 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, {name = 0x8ca1f0 "debug", has_arg = 1, flag = 0x0, val = 1032}, {name = 0x8ca1f8 "cfg-print", has_arg = 0, flag = 0x0, val = 1033}, {name = 0x8ca208 "atexit", has_arg = 1, flag = 0x0, val = 1034}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}} __func__ = "main" ```
and websocket print ``` (gdb) p *wsc $3 = {state = WS_S_REMOVING, awaiting_pong = 0, rmticks = 115406401, last_used = 1670038082, used_prev = 0x0, used_next = 0xffff86854bb8, id = 272, id_hash = 272, id_prev = 0x0, id_next = 0x0, rcv = {src_ip = {af = 10, len = 16, u = {addrl = {1153063344836707622, 360569445166350336}, addr32 = {3229877542, 268468480, 0, 83951616}, addr16 = {1318, 49284, 33024, 4096, 0, 0, 0, 1281}, addr = "&\005\204\300\000\201\000\020\000\000\000\000\000\000\001\005"}}, dst_ip = {af = 10, len = 16, u = {addrl = {360367138256717094, 432345564227567616}, addr32 = {3229877542, 83904512, 0, 100663296}, addr16 = {1318, 49284, 18432, 1280, 0, 0, 0, 1536}, addr = "&\005\204\300\000H\000\005\000\000\000\000\000\000\000\006"}}, src_port = 52472, dst_port = 2856, proto_reserved1 = 272, proto_reserved2 = 0, src_su = {s = {sa_family = 10, sa_data = "\314\370\000\000\000\000&\005\204\300\000\201\000\020"}, sin = {sin_family = 10, sin_port = 63692, sin_addr = {s_addr = 0}, sin_zero = "&\005\204\300\000\201\000\020"}, sin6 = {sin6_family = 10, sin6_port = 63692, sin6_flowinfo = 0, sin6_addr = {__in6_u = {__u6_addr8 = "&\005\204\300\000\201\000\020\000\000\000\000\000\000\001\005", __u6_addr16 = {1318, 49284, 33024, 4096, 0, 0, 0, 1281}, __u6_addr32 = {3229877542, 268468480, 0, 83951616}}}, sin6_scope_id = 0}, sas = {ss_family = 10, __ss_padding = "\314\370\000\000\000\000&\005\204\300\000\201\000\020\000\000\000\000\000\000\001\005\000\000\000\000\000\000\000\000\206\000\000\000\000\000\000\000h\257\221", '\000' <repeats 13 times>, "\377\377\377\377\000\000\001\000@\362\223\205\377\377\000\000\200\070y\315\000\000\000\000(BY\000\000\000\000\000\060\070y\315\001\000\000\000\020\306օ\377\377\000\000\000\267\006\215\377\377\000\000H\334\316\203\377\377\000", __ss_align = 7845426436717441312}}, bind_address = 0xffff8c7f5f30, rflags = (unknown: 0), proto = 3 '\003', proto_pad0 = 0 '\000', proto_pad1 = 0}, sub_protocol = 1, refcnt = {val = 2}, run_event = 1, frag_buf = {s = 0xffff86824830 "", len = 0}} ```
For me, it looks like for some reason connection cannot be removed.
Also according to GDB `*wsc` connection from port 52472 but not such connection in kamcmd output ``` kamcmd> [root@ippbx-0b tmp]# kamcmd ws.dump { connections: { 92617: wss:2605:84c0:80:1730::104:52560 -> wss:2605:84c0:48:5::6:2856 (state: OPEN, last used 47s ago, sub-protocol: sip) 92618: wss:2605:84c0:81:2::101:64512 -> wss:2605:84c0:48:5::6:2856 (state: OPEN, last used 48s ago, sub-protocol: sip) 92744: wss:2600:1f18:40ed:da30:9265:c6fa:e808:9cf9:51040 -> wss:2605:84c0:48:5::6:2856 (state: OPEN, last used 48s ago, sub-protocol: sip) 92745: wss:2605:84c0:81:10::104:62050 -> wss:2605:84c0:48:5::6:2856 (state: OPEN, last used 48s ago, sub-protocol: sip) } info: { wscounter: 4 truncated: no } } ```
I have cherry-picked commit e7bd1570e3788e9d04047b4d1d00cb8bd8b172d3 into 5.6 branch and installed it on the server. Now I can see WebSocket ID in Kamailio logs but cannot find it in the log the WebSocket connection is created. Now I can ln the Kamailio log. ``` INFO: WebSocket [ws_frame.c:814]: ws_keepalive(): tcp connection has been lost (id: 34 wsc: 0xffff7e390d90) ``` In the log, I found when the connection created ``` INFO: <script>: websocket|log|opened websocket 34 from 2605:84c0:90:11::101:60157 ``` PCAP for TCP connection via port 60157 is attached. [tcp_60157.pcap.gz](https://github.com/kamailio/kamailio/files/10417302/tcp_60157.pcap.gz)
For me looks like the WebSocket TCP connection closed properly. But Komilio is cleaning not all memory structures related to this connection.
I created this GDB script ``` define check_wsc while(wsc->state == WS_S_REMOVING) p wsc->id p wsc->rmticks cont end end
at 183996 b ws_conn.c:694 condition 1 wsc->state == WS_S_REMOVING set pagination off c check_wsc ``` Where I watch the process with 183996 is `WEBSOCKET TIMER`. ``` [root@ippbx-0a ~]# kamcmd ps | grep WEBSOCKET 183989 WEBSOCKET KEEPALIVE 183992 WEBSOCKET KEEPALIVE 183994 WEBSOCKET KEEPALIVE 183996 WEBSOCKET TIMER ```
Script output shows the webssocket `rmticks` always increased and timeout for `rm_delay_interval` never fired. For me that is a reason why websocket connection cannot be closed.
``` (gdb) c Continuing.
Breakpoint 1, ws_timer (ticks=78432190, param=0x0) at ws_conn.c:694 694 if(wsc->state == WS_S_REMOVING (gdb) check_wsc $1 = 541 $2 = 78432190
Breakpoint 1, ws_timer (ticks=78432191, param=0x0) at ws_conn.c:694 694 if(wsc->state == WS_S_REMOVING $13 = 541 $14 = 78432191
Breakpoint 1, ws_timer (ticks=78432192, param=0x0) at ws_conn.c:694 694 if(wsc->state == WS_S_REMOVING $25 = 541 $26 = 78432192
Breakpoint 1, ws_timer (ticks=78432193, param=0x0) at ws_conn.c:694 694 if(wsc->state == WS_S_REMOVING $37 = 541 $38 = 78432193
Breakpoint 1, ws_timer (ticks=78432194, param=0x0) at ws_conn.c:694 694 if(wsc->state == WS_S_REMOVING $49 = 541 $50 = 78432194
Breakpoint 1, ws_timer (ticks=78432195, param=0x0) at ws_conn.c:694 694 if(wsc->state == WS_S_REMOVING $61 = 541 $62 = 78432195 ```
This happens because of race conditions with websocket state. [ws_frame.c:816](https://github.com/kamailio/kamailio/blob/ac4c2d4579f121395876de2cc6f7649f3c...) - setting socked state `WS_S_CLOSING` [ws_conn.c:703](https://github.com/kamailio/kamailio/blob/master/src/modules/websocket/ws_co...) - setting socket state `WS_S_REMOVING` and updates `rmticks` value.
I will create a pull request to break race conditions.
Related commit pushed a while ago.
Closed #3278 as completed.