[Kamailio-Users] OpenSER stability problems in pilot project

Henning Westerholt henning.westerholt at 1und1.de
Tue Jun 30 17:25:14 CEST 2009


On Tuesday, 30 June 2009, Munder Albert (CI/ISE) wrote:
> [..]
> We are running OpenSER in a pilot project and
> unfortunately have some stability problems. 

Hello Albert,

> *       Appr. 5000 subscriber accounts
> *       Appr. 1200 simultaneously registered users
> *       Signalling encrypted with TLS
> *       Media data encrypted with SRTP
> *       Clients: softphones and hardphones
> *       Re-registration time for clients:       3600 sec

I don't have much experience with TCP, but I don't think these numbers 
should be a problem in a setup like this.

> OpenSER configuration
> *       Works as stateful SIP Proxy
> *       MySQL database
> *       Version 1.3.4.-TLS
> *       Tcp_children:   100 --> is it recommended to increase this number?

That is quite a lot of children, but OK.

> *       Udp_children:   20
> *       Tcp_connection_timeout: 3600
> *       Shared memory: -m 512 when error occurred, now set to 1024

How much PKG_MEM do you use? The default value?

> Problems
> *       Shared memory consumption
> Shared memory usage is permanently increasing (about 50 MB per day)
> Application already crashed twice

This could be a memory leak. Which modules do you use? Do you use any 
proprietary modules? You could use memory debugging to investigate this 
further: http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:memory
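As a rough sanity check of the figures quoted above (about 50 MB of growth per 
day, pools of 512 MB and 1024 MB), the sketch below estimates how long a 
fixed-size pool lasts under linear growth. The function name and the 
baseline_mb parameter are assumptions made for illustration; the actual 
shared memory usage right after startup is not given in this thread, so the 
result is only an upper bound.

```python
def days_until_exhaustion(pool_mb, growth_mb_per_day, baseline_mb=0.0):
    """Estimate days until a fixed-size pool is full, assuming linear growth.

    baseline_mb is the (hypothetical) usage right after startup; it is
    unknown here, so the default of 0 gives an optimistic upper bound.
    """
    if growth_mb_per_day <= 0:
        raise ValueError("growth_mb_per_day must be positive")
    free_mb = max(pool_mb - baseline_mb, 0.0)
    return free_mb / growth_mb_per_day


if __name__ == "__main__":
    # Pool sizes and growth rate taken from the report quoted above.
    for pool in (512, 1024):
        days = days_until_exhaustion(pool, 50)
        print(f"{pool} MB pool: about {days:.1f} days")
```

With these numbers, doubling the pool to 1024 MB roughly doubles the time 
until the allocation failures return, but it does not remove the cause.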

> First messages were, these, repeated thousands of times (5915 times):
> Jun 17 08:54:52 si-.... /usr/local/sbin/openser[13921]: ERROR:core:tcpconn_new: shared memory allocation failure
> Jun 17 08:54:52 si-... /usr/local/sbin/openser[13921]: ERROR:core:handle_new_connect: tcpconn_new failed, closing socket
> And a few of these also (7613 times):
> Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_accept: some error in SSL:
> Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]: ERROR:core:tls_print_errstack: error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure

These are caused by insufficient memory. I can't comment on the TCP and TLS 
errors. But before really starting to investigate this problem, would it be 
possible for you to use a more recent version, e.g. Kamailio 1.5.1, for 
testing?
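To compare error rates before and after such an upgrade, it can help to count 
the messages per type instead of reading the log by eye. The following is a 
minimal sketch, assuming the log lines carry a "LEVEL:module:function:" prefix 
like the samples quoted in this thread; the regular expression and the function 
name count_log_tags are illustrative assumptions, not part of OpenSER or 
Kamailio.

```python
import re
import sys
from collections import Counter

# Assumed prefix format, based only on the quoted samples,
# e.g. "ERROR:core:tcpconn_new:" or "WARNING:core:send2child:".
TAG_RE = re.compile(r"\b(ERROR|WARNING):(\w+):(\w+):")


def count_log_tags(lines):
    """Count occurrences of each LEVEL:module:function tag in the given lines."""
    counts = Counter()
    for line in lines:
        for level, module, func in TAG_RE.findall(line):
            counts[f"{level}:{module}:{func}"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_tags.py /path/to/logfile
    with open(sys.argv[1], errors="replace") as fh:
        for tag, n in count_log_tags(fh).most_common():
            print(f"{n:8d}  {tag}")
```

Running this on the log of one day before and one day after a change gives a 
per-message count that can be compared directly.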

> *       TCP errors, lost SIP messages
>
> Examples from error messages:
> 14.100 times in log file from 17.06.09
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect: poll error: flags 18
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR (111) Connection refused
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcpconn_connect: tcp_blocking_connect failed
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_send: connect failed
> Jun 17 04:03:15 si-.. /usr/local/sbin/openser[13863]: ERROR:tm:msg_send: tcp_send failed
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]: ERROR:tm:t_forward_nonack: sending request failed
>
> Appears at least 20 000  times; and in the day of the last shared memory
> errors, it was 225.794 times in the log file (note that the number in
> parenthesis is usually 1 or 2, but on that day it has reached 6):
> Jun 17 09:01:27 si-.... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (6)
> Jun 17 09:01:27 si-... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (5)
>
> *       Certificate validation problems
> TCP traffic is currently significantly increased by some ( appr. 70)
> clients which failed to validate the TLS certificate. Registration is
> repeated every 5 sec.
>
> Circa 30 thousand per day (on that day, it was 37.162 times in log)
> Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]: ERROR:core:tls_accept: some error in SSL:
> Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]: ERROR:core:tls_print_errstack: error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca

Best regards,

Henning