Hi everybody,
regarding our TCP/TLS stability problems, we have now decided to run tests with kamailio 1.5.1. Nevertheless, it would be interesting to know whether there is a chance to get rid of these problems.
Is anybody using TLS?
Used modules: SNMP, mySQL
Summary of problems
Errors may be related to the following log file entries:
Jun 17 09:01:27 si-.... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (6)
Jun 17 08:54:52 si-.... /usr/local/sbin/openser[13921]: ERROR:core:tcpconn_new: shared memory allocation failure
Jun 17 08:54:52 si-... /usr/local/sbin/openser[13921]: ERROR:core:handle_new_connect: tcpconn_new failed, closing socket
And a few of these also (7613 times):
Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_accept: some error in SSL:
Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_print_errstack: error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure
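Counts such as "7613 times" can be reproduced by tallying the log lines that contain a given message text. The following is a minimal sketch of such a tally, not part of OpenSER; the function name count_matching_lines is hypothetical, and it assumes the whole log has already been read into one NUL-terminated buffer.

```c
#include <stddef.h>
#include <string.h>

/* Count the lines in `log` that contain `needle` at least once.
 * `log` is a NUL-terminated buffer of newline-separated syslog lines.
 * Hypothetical helper for illustration only. */
size_t count_matching_lines(const char *log, const char *needle)
{
    size_t count = 0;
    size_t needle_len = strlen(needle);
    const char *line = log;

    if (needle_len == 0)
        return 0;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* Search only inside the current line, never across a newline. */
        for (size_t i = 0; i + needle_len <= len; i++) {
            if (memcmp(line + i, needle, needle_len) == 0) {
                count++;
                break;          /* count each line once */
            }
        }

        line += len;
        if (*line == '\n')
            line++;
    }
    return count;
}
```

The search string should be specific: "tcpconn_new" alone also matches the handle_new_connect line, which mentions tcpconn_new in its message.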
Shared memory consumption
Shared memory is continuously increasing (set to 1024). PKG_MEM is 1 MB.
High CPU load for some openser processes
Normally after some days we get a high CPU load (50-90%) for a small number of the openser processes. It looks like an endless loop and requires a restart of openser. There may be an endless loop in pass_fd.c:
again:
    ret=sendmsg(unix_socket, &msg, 0);
    if (ret<0){
        if (errno==EINTR) goto again;
        LM_CRIT("sendmsg failed on %d: %s\n", unix_socket, strerror(errno));
    }
Any comments on that?
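To make the question concrete: the quoted loop retries sendmsg() for as long as it fails with EINTR. Below is a minimal sketch of passing a descriptor over a Unix socket with a capped number of EINTR retries, so that a retry storm would show up as an error instead of a spinning process. This is not the OpenSER implementation and it does not establish that this loop causes the CPU load; send_fd_bounded, recv_fd and MAX_EINTR_RETRIES are hypothetical names chosen for illustration.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_EINTR_RETRIES 16   /* hypothetical cap, not an OpenSER setting */

/* Send descriptor `fd` over the connected Unix socket `unix_socket`.
 * Returns 0 on success, -1 on failure (errno is left as set by sendmsg). */
int send_fd_bounded(int unix_socket, int fd)
{
    char payload = 'F';        /* one data byte must accompany the descriptor */
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union {
        struct cmsghdr align;  /* forces correct alignment of buf */
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int retries = 0;
    ssize_t ret;

    memset(&msg, 0, sizeof(msg));
    memset(&control, 0, sizeof(control));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    /* Retry only on EINTR, and only a limited number of times. */
    do {
        ret = sendmsg(unix_socket, &msg, 0);
    } while (ret < 0 && errno == EINTR && ++retries < MAX_EINTR_RETRIES);

    return ret < 0 ? -1 : 0;
}

/* Receive a descriptor sent by send_fd_bounded(). Returns it, or -1. */
int recv_fd(int unix_socket)
{
    char payload;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    if (recvmsg(unix_socket, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

With a cap in place, a caller that sees -1 with errno still set to EINTR knows the retries were exhausted and can log that case separately.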
Best regards,
Albert Munder
Robert Bosch GmbH, IT Systems Engineering (CI/ISE)
________________________________
From: Henning Westerholt [mailto:henning.westerholt@1und1.de]
Sent: Tuesday, 30 June 2009 17:25
To: users@lists.kamailio.org
Cc: Munder Albert (CI/ISE)
Subject: Re: [Kamailio-Users] OpenSER stability problems in pilot project
On Tuesday, 30 June 2009, Munder Albert (CI/ISE) wrote:
[..] We are running OpenSER in a pilot project and unfortunately have some stability problems.
Hello Albert,
- Approx. 5000 subscriber accounts
- Approx. 1200 simultaneously registered users
- Signalling encrypted with TLS
- Media data encrypted with SRTP
- Clients: softphones and hardphones
- Re-registration time for clients: 3600 sec
I don't have much experience with TCP, but I don't think these numbers should be a problem in a setup like this.
OpenSER configuration
- Works as stateful SIP proxy
- mySQL database
- Version 1.3.4-TLS
- Tcp_children: 100 --> is it recommended to increase this number?
That is quite a lot of children, but OK.
- Udp_children: 20
- Tcp_connection_timeout: 3600
- Shared memory: -m 512 when the error occurred, now set to 1024
How much PKG_MEM do you use? The default value?
Problems
- Shared memory consumption
Shared memory usage is permanently increasing (about 50 MB per day). The application has already crashed twice.
This could be a memory leak. What modules do you use? And do you use any proprietary modules? You could use the memory debugging to investigate this further: http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:memory
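As a rough worked example of what "about 50 MB per day" means for the two pool sizes mentioned (512 and 1024), the sketch below divides the remaining pool by the daily growth. It assumes steady linear growth and a known starting usage, neither of which is confirmed in this thread; days_until_exhausted is a hypothetical helper.

```c
/* Days until a shared memory pool of `pool_mb` is used up, given `used_mb`
 * already in use and a steady growth of `growth_mb_per_day`.
 * Returns -1.0 if there is no growth, 0.0 if the pool is already full.
 * Assumption: growth is linear, which the thread does not confirm. */
double days_until_exhausted(double pool_mb, double used_mb,
                            double growth_mb_per_day)
{
    if (growth_mb_per_day <= 0.0)
        return -1.0;
    if (used_mb >= pool_mb)
        return 0.0;
    return (pool_mb - used_mb) / growth_mb_per_day;
}
```

Starting from an empty pool, days_until_exhausted(512, 0, 50) gives about 10 days and days_until_exhausted(1024, 0, 50) about 20 days, so doubling the pool only postpones the failure if the growth is a leak.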
The first messages were these, repeated thousands of times (5915 times):
Jun 17 08:54:52 si-.... /usr/local/sbin/openser[13921]: ERROR:core:tcpconn_new: shared memory allocation failure
Jun 17 08:54:52 si-... /usr/local/sbin/openser[13921]: ERROR:core:handle_new_connect: tcpconn_new failed, closing socket
And a few of these also (7613 times):
Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_accept: some error in SSL:
Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_print_errstack: error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure
These are caused by insufficient memory. I can't comment on the TCP and TLS errors. But before really starting to investigate this problem, would it be possible for you to use a more recent version, e.g. kamailio 1.5.1, for testing?
- TCP errors, lost SIP messages
Examples of error messages (14,100 times in the log file from 17 June 2009):
Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect: poll error: flags 18
Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR (111) Connection refused
Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcpconn_connect: tcp_blocking_connect failed
Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_send: connect failed
Jun 17 04:03:15 si-.. /usr/local/sbin/openser[13863]: ERROR:tm:msg_send: tcp_send failed
Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:tm:t_forward_nonack: sending request failed
The following appears at least 20,000 times; on the day of the last shared memory errors, it appeared 225,794 times in the log file (note that the number in parentheses is usually 1 or 2, but on that day it reached 6):
Jun 17 09:01:27 si-.... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (6)
Jun 17 09:01:27 si-... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (5)
- Certificate validation problems
TCP traffic is currently significantly increased by some (approx. 70) clients that fail to validate the TLS certificate. Their registration is repeated every 5 sec.
Circa 30 thousand per day (on that day, it was 37,162 times in the log):
Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]: ERROR:core:tls_accept: some error in SSL:
Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]: ERROR:core:tls_print_errstack: error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca
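To put the figures from this message side by side: about 1200 clients re-registering every 3600 sec versus about 70 clients retrying every 5 sec. The sketch below computes the average attempt rate for each group. It is a simple model that assumes evenly spread attempts at exactly the stated intervals; attempts_per_second is a hypothetical helper.

```c
/* Average connection or registration attempts per second produced by
 * `clients` that each try once every `interval_sec` seconds.
 * Assumption: attempts are evenly spread and strictly periodic. */
double attempts_per_second(unsigned clients, unsigned interval_sec)
{
    if (interval_sec == 0)
        return 0.0;
    return (double)clients / (double)interval_sec;
}
```

Under these assumptions, attempts_per_second(1200, 3600) is about 0.33 per second, while attempts_per_second(70, 5) is 14 per second, roughly 42 times the regular load. The logged count for that day is far lower than this model predicts, so the 5 sec interval is probably not sustained by all 70 clients all day; even so, the failing clients would account for a large share of the TCP accepts.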
Best regards,
Henning