Hello,
Sorry, this is another attempt at posting this message; earlier attempts with an attachment failed because of the message size. We are running OpenSER in a pilot project and unfortunately have some stability problems. Any help or hints are appreciated.
Project information

OpenSER is used in a pilot project with

* Approx. 5000 subscriber accounts
* Approx. 1200 simultaneously registered users
* Signalling encrypted with TLS
* Media data encrypted with SRTP
* Clients: softphones and hardphones
* Re-registration time for clients: 3600 sec

OpenSER configuration

* Works as stateful SIP proxy
* MySQL database
* Version 1.3.4-TLS
* Tcp_children: 100 --> is it recommended to increase this number?
* Udp_children: 20
* Tcp_connection_timeout: 3600
* Shared memory:
  * -m 512 when error occurred
  * Now set to 1024
Problems

* Shared memory consumption

Shared memory usage is permanently increasing (about 50 MB per day). The application has already crashed twice.

The first messages were these, repeated thousands of times (5915 times):

Jun 17 08:54:52 si-.... /usr/local/sbin/openser[13921]: ERROR:core:tcpconn_new: shared memory allocation failure
Jun 17 08:54:52 si-... /usr/local/sbin/openser[13921]: ERROR:core:handle_new_connect: tcpconn_new failed, closing socket

And these as well (7613 times):

Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_accept: some error in SSL:
Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_print_errstack: error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure
* TCP errors, lost SIP messages

Examples from the error messages (14,100 times in the log file from 17 June 2009):

Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect: poll error: flags 18
Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR (111) Connection refused
Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcpconn_connect: tcp_blocking_connect failed
Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_send: connect failed
Jun 17 04:03:15 si-.. /usr/local/sbin/openser[13863]: ERROR:tm:msg_send: tcp_send failed
Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:tm:t_forward_nonack: sending request failed

The following appears at least 20,000 times; on the day of the last shared memory errors it appeared 225,794 times in the log file (note that the number in parentheses is usually 1 or 2, but on that day it reached 6):

Jun 17 09:01:27 si-.... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (6)
Jun 17 09:01:27 si-... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (5)
* Certificate validation problems

TCP traffic is currently significantly increased by some (approx. 70) clients which fail to validate the TLS certificate. Registration is repeated every 5 sec.

About 30 thousand per day (on that day, it was 37,162 times in the log):

Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]: ERROR:core:tls_accept: some error in SSL:
Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]: ERROR:core:tls_print_errstack: error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca
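Per-message counts like the ones quoted above can be reproduced with a short script. The following is only a minimal sketch in Python: the function name `count_messages` and the regular expression are illustrative assumptions based on the syslog lines shown in this message, not part of OpenSER.

```python
import re
from collections import Counter

# Assumption: every relevant line contains "openser[pid]: LEVEL:module:function:",
# as in the syslog excerpts quoted in this message.
TAG_RE = re.compile(r"openser\[\d+\]: ([A-Z]+):(\w+):(\w+):")


def count_messages(lines):
    """Count log lines per (level, module, function) tag."""
    counts = Counter()
    for line in lines:
        match = TAG_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# Two of the lines quoted above, used as a tiny sample.
sample = [
    "Jun 17 08:54:52 si-.... /usr/local/sbin/openser[13921]: "
    "ERROR:core:tcpconn_new: shared memory allocation failure",
    "Jun 17 09:01:27 si-.... /usr/local/sbin/openser[13921]: "
    "WARNING:core:send2child: no free tcp receiver, "
    "connection passed to the leastbusy one (6)",
]

for tag, n in count_messages(sample).most_common():
    print(n, ":".join(tag))
```

To run it against a real log, pass an open file object to `count_messages` instead of the sample list; the log file location depends on the local syslog setup.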
Mit besten Grüßen | Best regards Albert Munder Robert Bosch GmbH IT Systems Engineering (CI/ISE) Postfach 30 02 20 70442 Stuttgart GERMANY www.bosch.com Tel. +49 711 811-40562 Fax +49 711 811-5113333 Albert.Munder@de.bosch.com Robert Bosch GmbH, Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart HRB 14000 Aufsichtsratsvorsitzender: Hermann Scholl; Geschäftsführung: Franz Fehrenbach, Siegfried Dais; Bernd Bohr, Wolfgang Chur, Rudolf Colm, Gerhard Kümmel, Wolfgang Malchow, Peter Marks; Volkmar Denner, Peter Tyroller.
You report that this pilot is using 1.3.4-TLS. The latest available version is 1.5.1. Is there any special reason to use this 'old' version?
Some time ago (a month?) someone reported shared memory problems to the list; maybe they are related. From what I remember, upgrading to the latest SVN was the solution.
Edson.
Kamailio (OpenSER) - Users mailing list Users@lists.kamailio.org http://lists.kamailio.org/cgi-bin/mailman/listinfo/users http://lists.openser-project.org/cgi-bin/mailman/listinfo/users
Edson - Lists wrote:
You report that this pilot is using 1.3.4-TLS. The latest available version is 1.5.1. Is there any special reason to use this 'old' version?
Some time ago (a month?) someone reported shared memory problems to the list; maybe they are related. From what I remember, upgrading to the latest SVN was the solution.
I think it was me who reported the shared memory problems. We had a problem accessing htable from Perl. We solved it by temporarily saving the htable values in AVPs and accessing the AVPs from Perl:
http://www.mail-archive.com/users@lists.kamailio.org/msg04818.html
Perhaps you are trying something similar?
Christian
On Tuesday, 30 June 2009, Munder Albert (CI/ISE) wrote:
[..] We are running OpenSER in a pilot project and unfortunately have some stability problems.
Hello Albert,
Appr. 5000 subscriber accounts
Appr. 1200 simultaneously registered users
Signalling encrypted with TLS
Media data encrypted with SRTP
Clients: softphones and hardphones
Re-registration time for clients: 3600 sec
I don't have that much experience with TCP, but I don't think these numbers should be a problem in a setup like this.

OpenSER configuration

* Works as stateful SIP proxy
* MySQL database
* Version 1.3.4-TLS
* Tcp_children: 100 --> is it recommended to increase this number?

That is quite a lot of children, but OK.

* Udp_children: 20
* Tcp_connection_timeout: 3600
* Shared memory:
  * -m 512 when error occurred
  * Now set to 1024

How much PKG_MEM do you use? The default value?
Problems
Shared memory consumption
Shared memory usage is permanently increasing (about 50 MB per day) Application already crashed twice
This could be a memory leak. Which modules do you use, and do you use any proprietary modules? You could use memory debugging to investigate this further: http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:memory
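As a rough back-of-the-envelope check on the reported figures (growth of about 50 MB per day, pools of 512 MB and 1024 MB), the time until a steady leak fills the pool can be estimated as below. This is only a sketch in Python: it assumes constant growth and a pool that is nearly empty after startup, neither of which is confirmed in this thread, and the function name is made up for illustration.

```python
def days_until_exhausted(pool_mb, growth_mb_per_day, used_at_start_mb=0.0):
    """Days until a pool of pool_mb is full at a constant growth rate."""
    if growth_mb_per_day <= 0:
        raise ValueError("growth rate must be positive")
    return (pool_mb - used_at_start_mb) / growth_mb_per_day


# The two shared memory sizes mentioned in the report (-m 512 and -m 1024).
for pool in (512, 1024):
    print(f"-m {pool}: about {days_until_exhausted(pool, 50):.1f} days")
```

Under these assumptions, doubling the pool only doubles the time between failures (roughly 10 days versus roughly 20 days), so it would postpone the problem rather than remove it.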
The first messages were these, repeated thousands of times (5915 times):

Jun 17 08:54:52 si-.... /usr/local/sbin/openser[13921]: ERROR:core:tcpconn_new: shared memory allocation failure
Jun 17 08:54:52 si-... /usr/local/sbin/openser[13921]: ERROR:core:handle_new_connect: tcpconn_new failed, closing socket

And these as well (7613 times):

Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_accept: some error in SSL:
Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_print_errstack: error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure

These are caused by insufficient memory. I can't comment on the TCP and TLS errors. But before really starting to investigate this problem, would it be possible for you to use a more recent version, e.g. Kamailio 1.5.1, for testing?
Best regards,
Henning
Henning Westerholt wrote:
On Tuesday, 30 June 2009, Munder Albert (CI/ISE) wrote:
[..] We are running OpenSER in a pilot project and unfortunately have some stability problems.
Hello Albert,
- Appr. 5000 subscriber accounts
- Appr. 1200 simultaneously registered users
- Signalling encrypted with TLS
- Media data encrypted with SRTP
- Clients: softphones and hardphones
- Re-registration time for clients: 3600 sec
What does the network topology look like? Are there NATs or firewalls between the phones and the SIP proxy?
If yes (which means that the SIP proxy cannot establish TCP connections to the SIP phones), you should have NAT keepalive activated in the clients. Also make sure that the SIP proxy does not close idle TCP connections; use: http://www.kamailio.net/docs/modules/1.5.x/registrar.html#id2477171
I don't have that much experience with TCP, but I don't think these numbers should be a problem in a setup like this.
OpenSER configuration
* Works as stateful SIP proxy
* MySQL database
* Version 1.3.4-TLS
Why do you use an old (unmaintained) version? Update ...
* Tcp_children: 100 --> is it recommended to increase this number?
That is quite a lot of children, but OK.
It depends on how much memory you have. :-)
I always had problems with more than 30 children. I don't think it is necessary to have more than 10 children.
* Udp_children: 20
Same here.
* Tcp_connection_timeout: 3600
Much too high. This can block a process for up to 1 hour. Set it to 1 or 2. Also set tcp_send_timeout to 1 or 2.
* Shared memory:
  * -m 512 when error occurred
  * Now set to 1024
How much PKG_MEM do you use? The default value?
Problems
- Shared memory consumption
Shared memory usage is permanently increasing (about 50 MB per day) Application already crashed twice
This could be a memory leak. Which modules do you use, and do you use any proprietary modules? You could use memory debugging to investigate this further: http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:memory

The first messages were these, repeated thousands of times (5915 times):

Jun 17 08:54:52 si-.... /usr/local/sbin/openser[13921]: ERROR:core:tcpconn_new: shared memory allocation failure
Jun 17 08:54:52 si-... /usr/local/sbin/openser[13921]: ERROR:core:handle_new_connect: tcpconn_new failed, closing socket

And these as well (7613 times):

Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_accept: some error in SSL:
Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_print_errstack: error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure

These are caused by insufficient memory. I can't comment on the TCP and TLS errors. But before really starting to investigate this problem, would it be possible for you to use a more recent version, e.g. Kamailio 1.5.1, for testing?
- TCP errors, lost SIP messages
Examples from the error messages (14,100 times in the log file from 17 June 2009):

Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect: poll error: flags 18
Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR (111) Connection refused
Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcpconn_connect: tcp_blocking_connect failed
Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_send: connect failed
Jun 17 04:03:15 si-.. /usr/local/sbin/openser[13863]: ERROR:tm:msg_send: tcp_send failed
Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:tm:t_forward_nonack: sending request failed

The following appears at least 20,000 times; on the day of the last shared memory errors it appeared 225,794 times in the log file (note that the number in parentheses is usually 1 or 2, but on that day it reached 6):

Jun 17 09:01:27 si-.... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (6)
Jun 17 09:01:27 si-... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (5)
You should know that TCP/TLS is blocking in OpenSER: during TCP sending or TCP connection setup, the process that handles the SIP message is blocked.
Thus, to avoid long blocking, reduce the timers as suggested.
Further, make sure that the TCP connections are stable (don't close them). They should stay open all the time, and they should be established by the clients.
Which phones do you use?
Use fix_nated_contact and fix_nated_register so that the proxy sends replies and in-dialog requests via the existing TCP connection.
One more thing: sip-router has lots of TCP improvements compared to the OpenSER core. For example, this feature is useful if the clients are behind NAT/FW and the proxy should not even try to establish a TCP/TLS connection to the clients: http://sip-router.org/wiki/cookbooks/core-cookbook/devel#tcp_no_connect
- Certificate validation problems
TCP traffic is currently significantly increased by some ( appr. 70) clients which failed to validate the TLS certificate. Registration is repeated every 5 sec.
About 30 thousand per day (on that day, it was 37,162 times in the log):

Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]: ERROR:core:tls_accept: some error in SSL:
Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]: ERROR:core:tls_print_errstack: error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca
Seems to be a problem in the clients. Import the root CA certificate into the clients.
regards klaus
Hi everybody,
regarding our TCP/TLS stability problems, we have now decided to run tests with Kamailio 1.5.1. Nevertheless, it would be interesting to know whether there is a chance of getting rid of these problems.
Is anybody using TLS?
Used modules: SNMP, mySQL
Summary of problems
Errors may be related to the following log file entries:
Jun 17 09:01:27 si-.... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (6)
Jun 17 08:54:52 si-.... /usr/local/sbin/openser[13921]: ERROR:core:tcpconn_new: shared memory allocation failure
Jun 17 08:54:52 si-... /usr/local/sbin/openser[13921]: ERROR:core:handle_new_connect: tcpconn_new failed, closing socket
And a few of these also (7613 times):
Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_accept: some error in SSL:
Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_print_errstack: error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure
Shared memory consumption
Shared memory usage is continuously increasing (set to 1024). PKG_MEM is 1 MB.
High CPU load for some openser processes
Normally, after some days, we get a high CPU load (50-90%) for a small number of the openser processes. It looks like an endless loop and requires a restart of openser. There may be an endless loop in pass_fd.c:
again:
    ret=sendmsg(unix_socket, &msg, 0);
    if (ret<0){
        if (errno==EINTR) goto again;
        LM_CRIT("sendmsg failed on %d: %s\n", unix_socket, strerror(errno));
    }
any comments on that?
Best regards
Albert Munder
Munder Albert (CI/ISE) wrote:
Hi everybody,
regarding our TCP/TLS stability problems, we have now decided to run tests with Kamailio 1.5.1. Nevertheless, it would be interesting to know whether there is a chance of getting rid of these problems.
Is anybody using TLS?
Used modules: SNMP, mySQL
Summary of problems
Errors may be related to the following log file entries:
Jun 17 09:01:27 si-…. /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (6)
That means that all of the TCP workers currently have connections assigned to them. It does not mean that the worker processes are really busy.
Jun 17 08:54:52 si-…. /usr/local/sbin/openser[13921]: ERROR:core:tcpconn_new: shared memory allocation failure
Jun 17 08:54:52 si-… /usr/local/sbin/openser[13921]: ERROR:core:handle_new_connect: tcpconn_new failed, closing socket
You are running out of shared memory. Either you allocate too much, or there is a memory leak somewhere.
Please debug according to the following howto: http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:memory
And a few of these also (7613 times):
Jun 17 08:57:24 si-… /usr/local/sbin/openser[13880]: ERROR:core:tls_accept: some error in SSL:
Jun 17 08:57:24 si-… /usr/local/sbin/openser[13880]: ERROR:core:tls_print_errstack: error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure
OpenSSL is running out of memory. OpenSSL does not use OpenSER's memory manager but the standard OS malloc.
Maybe there are so many TCP/TLS connections that you run out of memory? Strange.
*shared memory consumption*
shared memory is continuously increasing (set to 1024)
What do you mean by "continuously increasing"? OpenSER's memory manager allocates the memory for shared memory during startup. During runtime, OpenSER's shared memory stays constant.
If you see memory usage increasing, then it must be caused by the standard OS malloc, which is used by other libraries (e.g. openssl, libxml, mysqlclient, ...).
In that case there may be a bug in the library itself, or OpenSER uses the library in a wrong way.
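One way to tell the two cases apart is to record the per-process memory figures that the operating system reports and compare them over a few days. The following is only a sketch in Python, assuming a Linux host where `/proc/<pid>/status` exposes `VmRSS` and `VmData` in kB; the function names are made up for illustration, and `VmData` is used here only as a rough indicator of privately allocated (malloc) memory.

```python
import re


def parse_status_kb(status_text, fields=("VmRSS", "VmData")):
    """Extract selected memory fields (in kB) from /proc/<pid>/status text."""
    values = {}
    for line in status_text.splitlines():
        match = re.match(r"(\w+):\s+(\d+)\s+kB", line)
        if match and match.group(1) in fields:
            values[match.group(1)] = int(match.group(2))
    return values


def read_status_kb(pid):
    """Read the fields for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        return parse_status_kb(status.read())
```

Calling `read_status_kb(pid)` periodically for each openser process (for example from cron) and logging the result would show whether the private memory of individual processes keeps growing, which would point towards a library using OS malloc rather than OpenSER's own shared memory pool.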
regards Klaus
PKG_MEM is 1 MB
*high CPU load for some openser processes*
Normally, after some days, we get a high CPU load (50-90%) for a small number of the openser processes. It looks like an endless loop and requires a restart of openser. There may be an endless loop in pass_fd.c:
again:
    ret=sendmsg(unix_socket, &msg, 0);
    if (ret<0){
        if (errno==EINTR) goto again;
        LM_CRIT("sendmsg failed on %d: %s\n", unix_socket, strerror(errno));
    }
any comments on that?