Hello,
I have one installation with a strange pkg.stats output:
root@host:~# kamcmd pkg.stats index 36
{
    entry: 36
    pid: 2599
    rank: -4
    used: 16234288
    free: 15788240
    real_used: 16900864
}
After some time I checked it again and found that the used and real_used fields had grown while free had not changed:
root@host:~# kamcmd pkg.stats index 36
{
    entry: 36
    pid: 2599
    rank: -4
    used: 19393184
    free: 15788240
    real_used: 20125968
}
Additional info:
- kamailio-4.0.2 started with -m 128 -M 16
- Process:: ID=36 PID=2599 Type=tcp main process
- this server works as a websocket (ws, wss) to udp/tcp gateway.
How is it possible that real_used is bigger than -M allows, and that the free value does not change while used/real_used keep growing?
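The numbers in this message can be put side by side with a small calculation. This is only a sketch: it assumes that -M is given in megabytes (1 MB = 1024 * 1024 bytes), and the struct and function names are made up for illustration; they are not part of Kamailio.

```c
#include <stdio.h>

/* One pkg.stats sample. Hypothetical type; fields mirror the kamcmd output. */
struct pkg_sample {
    long used;
    long free_bytes;   /* the "free" field of pkg.stats */
    long real_used;
};

/* Convert a -M value (assumed to be in megabytes) to bytes. */
long pkg_limit_bytes(long megabytes)
{
    return megabytes * 1024L * 1024L;
}

/* Bytes by which real_used exceeds the -M limit (negative if below it). */
long bytes_over_limit(const struct pkg_sample *s, long limit_mb)
{
    return s->real_used - pkg_limit_bytes(limit_mb);
}

/* Print how two samples of the same process differ. */
void print_report(const struct pkg_sample *a, const struct pkg_sample *b,
                  long limit_mb)
{
    printf("limit:            %ld bytes\n", pkg_limit_bytes(limit_mb));
    printf("used growth:      %ld bytes\n", b->used - a->used);
    printf("real_used growth: %ld bytes\n", b->real_used - a->real_used);
    printf("free change:      %ld bytes\n", b->free_bytes - a->free_bytes);
    printf("over limit now:   %ld bytes\n", bytes_over_limit(b, limit_mb));
}
```

With the two samples above and -M 16, the limit is 16777216 bytes; the first real_used value is already 123648 bytes above it and the second is 3348752 bytes above it, while free is identical in both samples.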
Didn't check master branch before writing previous email. There were some commits about memory leaks in websocket module. Will try master.
Hello,
can you get the type of the process with 'kamctl ps'?
Cheers, Daniel
On 9/20/13 6:51 PM, Vitaliy Aleksandrov wrote:
Didn't check master branch before writing previous email. There were some commits about memory leaks in websocket module. Will try master.
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
PKG: The top output also showed that the RES column of the tcp workers was constantly growing.
SHM: After some time kamailio stopped accepting wss connections and printed "ssl bug #1491 workaround: not enough memory for safe operation". According to tls_server.c, this is a sign of an shm leak.
Master branch has a lot of fixes to the websocket module which solve problems with pkg and shm memory:
Is it a good idea to use the websocket module from the master branch with a 4.0 version?
I switched to the latest master branch and it seems to work better, but unfortunately I can't tell how much PKG memory kamailio really uses, so I don't know whether it still has problems with PKG.
For instance "kamcmd pkg.stats" always shows that tcp_main process has free: 32627984 (started with -M 32), while real_used: 22560048 and RES in top output for the same process is 8904.
Hello,
the output of top is not relevant, because kamailio uses an internal memory manager. If system memory is increasing, then it is likely to be from an external library.
I saw there was work on websocket; not being the developer, I don't know if it is something to be backported. Maybe Peter or Hugh can jump in here with clarifications.
Cheers, Daniel
On 9/25/13 12:40 PM, Vitaliy Aleksandrov wrote:
I switched to the latest master branch and it seems it works better, but unfortunately I can't understand how much PKG memory kamailio really uses to know it still has problems with PKG.
For instance "kamcmd pkg.stats" always shows that tcp_main process has free: 32627984 (started with -M 32), while real_used: 22560048 and RES in top output for the same process is 8904.
First of all, thanks for trying to help.
It's my fault that I mixed "top" into this story. I just wanted to show that, while my system is working just fine:
1. the "used" and "real_used" fields of a process (tcp receiver) are bigger than what I set with -M
2. "free" hasn't changed since the last restart.
root@proxy:~# kamcmd pkg.stats | grep '2480' -A 4
pid: 2480
rank: 17
used: 34156336
free: 32611216
real_used: 35228672
root@proxy:~# ps aux|grep 2480
kamailio 2480 0.0 0.3 348160 15836 ? S Sep23 0:10 /usr/local/kamailio-master/sbin/kamailio -f /etc/kamailio/kamailio.cfg -P /var/run/kamailio/kamailio.pid -m 128 -M 32 -u kamailio -g kamailio
root@proxy:~# kamctl ps | grep 2480
Process:: ID=22 PID=2480 Type=tcp receiver (generic) child=6
Hi, I fixed some memory leaks in master on 4th July.
The main leak I was investigating was the tcp connection structures used by the websocket module. When the connection is used, the ref count is increased, and should be decreased when each packet/transaction etc has completed. Each connection includes a tcp buffer, so leaking these can use up the memory very fast. I put some extra info into the kamcmd core.tcp_list command to display the refcount. If during use, the refcount is always increasing, then this structure will not be freed when the connection closes, but it will disappear from the list.
I went through and checked that all tcpconn_get and tcpconn_put calls were done, this improved the situation a lot, but I don't think I fully fixed it - we were still getting some increasing refcounts. I sent a message to the dev list (http://lists.sip-router.org/pipermail/sr-dev/2013-July/020624.html) to check if my use of tcpconn_put/get was correct, but no-one agreed or disagreed. Maybe it just needs some extra pairs of eyes on the code.
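To illustrate the kind of leak described above, here is a minimal single-threaded sketch of a reference-counted connection. It is not Kamailio's tcp_conn code: the struct, the conn_* functions and the live_conns counter are all hypothetical, and the real implementation uses atomic operations and shared memory.

```c
#include <stdlib.h>

/* Simplified stand-in for a TCP connection object; NOT Kamailio's struct. */
struct conn {
    int refcnt;   /* one reference for the connection list, plus one per user */
    char *buf;    /* per-connection read buffer, the expensive part */
};

int live_conns = 0;   /* number of allocated connections, to make leaks visible */

/* Allocate a connection with its buffer; the list holds the first reference. */
struct conn *conn_new(size_t buf_size)
{
    struct conn *c = malloc(sizeof(*c));
    if (c == NULL)
        return NULL;
    c->buf = malloc(buf_size);
    if (c->buf == NULL) {
        free(c);
        return NULL;
    }
    c->refcnt = 1;
    live_conns++;
    return c;
}

/* Take a reference before using the connection. */
void conn_get(struct conn *c)
{
    c->refcnt++;
}

/* Drop a reference; free the object when the last one goes. Returns 1 if freed. */
int conn_put(struct conn *c)
{
    if (--c->refcnt > 0)
        return 0;
    free(c->buf);
    free(c);
    live_conns--;
    return 1;
}
```

If a user calls conn_get() and never the matching conn_put(), the final conn_put() made when the connection closes only brings the counter down to 1, so the structure and its buffer stay allocated even though nothing lists them any more.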
Hugh
On 26/09/2013 10:18, Daniel-Constantin Mierla wrote:
Hello,
the output of top is not relevant, because kamailio uses an internal memory manager. If system memory is increasing, then it is likely to be from an external library.
I saw there was work on websocket; not being the developer, I don't know if it is something to be backported. Maybe Peter or Hugh can jump in here with clarifications.
Cheers, Daniel
Yes, I found your commit. That's why I'm now using the latest master.
tcp_list shows 200+ tcp connections and only a few of them have a ref_count bigger than 1. netstat shows the same number of established connections. If lost tcp_conn structures are not shown in tcp_list, how can I check whether this is my problem and there are lost chunks of memory somewhere?
When I was searching for the correct way to decrement ref_count in tcp_conn, I decided to call tcpconn_put(), as you did for websockets. I found that there are several functions to decrement it:
- tcpconn_put()
- tcpconn_chld_put()
- atomic_dec_and_test()
When I was looking for examples in other modules I saw that tcp_read.c uses tcpconn_chld_put(), but forward.h and msg_translator.c use only tcpconn_get() without decrementing the ref counter at all.
I hope someone who knows tcp implementation in kamailio can shed light on this question.
I found one place where tcpconn_put() is never called after tcpconn_get():
--- a/msg_translator.c
+++ b/msg_translator.c
@@ -2509,9 +2509,11 @@ char* via_builder( unsigned int *len,
             } else if (con->rcv.proto==PROTO_WSS) {
                 memcpy(line_buf+MY_VIA_LEN-4, "WSS ", 4);
             } else {
+                tcpconn_put(con);
                 LOG(L_CRIT, "BUG: via_builder: unknown proto %d\n", con->rcv.proto);
                 return 0;
             }
+            tcpconn_put(con);
         }else if (send_info->proto==PROTO_WSS){
             memcpy(line_buf+MY_VIA_LEN-4, "WSS ", 4);
         }else{
I've tried this patch and it fixed my problem with constantly growing ref_count for WSS connections.
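The patch above adds a put on both the error path and the normal path. A common way to make that pairing harder to break is a single exit point, sketched below. Everything here is hypothetical and simplified (the struct, the enum values and the conn_get/conn_put helpers are not Kamailio's; in Kamailio tcpconn_get() is a lookup that returns a referenced connection), so treat it as an illustration of the pattern rather than as the actual via_builder() code.

```c
#include <string.h>

/* Hypothetical protocol values for this sketch only. */
enum proto { P_WS = 1, P_WSS = 2 };

/* Simplified connection: just a reference counter and a protocol. */
struct conn {
    int refcnt;
    int proto;
};

void conn_get(struct conn *c) { c->refcnt++; }
void conn_put(struct conn *c) { c->refcnt--; }

/*
 * Write the Via transport token ("WS " or "WSS ") into out.
 * Returns 0 on success, -1 on an unknown protocol or a too small buffer.
 * Every path, including the error paths, leaves through "done", so the
 * reference taken at the top is always released exactly once.
 */
int write_via_transport(struct conn *c, char *out, size_t out_len)
{
    int ret = -1;
    const char *name = NULL;

    conn_get(c);

    if (c->proto == P_WS)
        name = "WS ";
    else if (c->proto == P_WSS)
        name = "WSS ";
    else
        goto done;              /* unknown protocol: still reaches conn_put() */

    if (strlen(name) + 1 > out_len)
        goto done;              /* buffer too small: still reaches conn_put() */

    strcpy(out, name);
    ret = 0;

done:
    conn_put(c);
    return ret;
}
```

After any call, successful or not, the reference counter is back at the value it had before the call, which is the property the growing ref_count for WSS connections was violating.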
Could you please explain why nathelper aggregates both the WS and WSS transports to "ws", so that msg_translator then has to detect the type of the connection to a destination in order to build the correct Via?
modules/nathelper/nathelper.c, create_rcv_uri() function:
case PROTO_WS:
case PROTO_WSS:
    proto.s = "WS";
    proto.len = 2;
    break;
On 30 September 2013 17:14, Vitaliy Aleksandrov vitalik.voip@gmail.com wrote:
Could you please share why nathelper aggregates both WS and WSS transports to "ws" and then msg_translator have to detect the type of a connection to a destination to build correct via ?
modules/nathelper/nathelper.c create_rcv_uri() function : case PROTO_WS: case PROTO_WSS: proto.s = "WS"; proto.len = 2; break;
Because when the transport is WS (WebSocket over TCP) the URI has a transport parameter like this: ";transport=ws", and when the transport is WSS (secure WebSocket over TLS over TCP) the URI has exactly the same transport parameter: ";transport=ws". In other words, the transport parameter is the same for both, so the Kamailio core has to work out which one applies by checking how the specified socket is actually used.
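The distinction can be sketched in a few lines. The enum and function names below are hypothetical and only mirror what is said in this thread: the URI parameter is "ws" in both cases, while the Via token ("WS" or "WSS") follows the actual connection.

```c
/* Hypothetical protocol identifiers for this sketch; not Kamailio's PROTO_* values. */
enum ws_proto { SK_PROTO_WS, SK_PROTO_WSS };

/* Value of the ";transport=" URI parameter: identical for WS and WSS. */
const char *uri_transport_param(enum ws_proto p)
{
    (void)p;   /* the protocol does not influence the URI parameter */
    return "ws";
}

/* Transport token for the Via header: depends on whether the socket uses TLS. */
const char *via_transport_token(enum ws_proto p)
{
    return p == SK_PROTO_WSS ? "WSS" : "WS";
}
```

Because the URI alone cannot tell the two apart, the Via can only be built correctly after looking up the connection that will carry the message.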
-- Peter Dunkley Technical Director Crocodile RCS Ltd
Hello,
Thank you for the explanation. Could somebody review the attached patch? I tried to fix the problem with a growing tcpconn->refcnt for websocket connections.
Done.
The patch is in master and the 4.0 branch.
Thanks,
Peter
On 1 October 2013 08:22, Vitaliy Aleksandrov vitalik.voip@gmail.com wrote:
Hello,
Thank you for the explanation. Could somebody review the patch in the attachment ? I tried to fix the problem with a growing tcpconn->refcnt for websocket connections.
On 10/01/2013 12:54 PM, Peter Dunkley wrote:
Done.
The patch is in master and the 4.0 branch.
Thanks,
Peter
Thanks for applying the patch. It looks like websockets in master are stable now. No memory leaks have been detected over the last two days. pkg.stats still shows some strange numbers, but that is a separate issue. I hope to get some free time in the near future to dig further into the memory stats.