On Tuesday, 30 June 2009, Munder Albert (CI/ISE) wrote:
> [..]
> We are running OpenSER in a pilot project and
> unfortunately have some stability problems.


Hello Albert,


> * Appr. 5000 subscriber accounts
> * Appr. 1200 simultaneously registered users
> * Signalling encrypted with TLS
> * Media data encrypted with SRTP
> * Clients: softphones and hardphones
> * Re-registration time for clients: 3600 sec


I don't have that much experience with TCP, but I don't think these numbers should be a problem in a setup like this.


> OpenSER configuration
> * Works as stateful SIP Proxy
> * mySQL database
> * Version 1.3.4.-TLS
> * Tcp_children: 100 --> is it recommended to increase this number?


That is quite a lot of children, but OK.


> * Udp_children: 20
> * Tcp_connection_timeout: 3600
> * Shared memory:
>   * -m 512 when error occurred
>   * Now set to 1024


How much PKG_MEM do you use? The default value?


> Problems
> * Shared memory consumption
> Shared memory usage is permanently increasing (about 50 MB per day)
> Application already crashed twice


This could be a memory leak. Which modules do you use, and are any of them proprietary? You could use the memory debugging to investigate this further: http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:memory
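
To put the reported growth in perspective, here is a rough Python sketch of how long a fixed shared memory pool would last. It assumes the growth stays linear at the reported 50 MB per day and that usage starts near zero after a restart; neither is confirmed by the report, and the function name is only illustrative.

```python
def days_until_exhausted(pool_mb: float, used_mb: float, growth_mb_per_day: float) -> float:
    """Days until a fixed-size pool is full, assuming constant linear growth."""
    if growth_mb_per_day <= 0:
        raise ValueError("growth_mb_per_day must be positive")
    # Clamp at zero so an already full pool reports 0 days instead of a negative value.
    free_mb = max(pool_mb - used_mb, 0.0)
    return free_mb / growth_mb_per_day


# Pool sizes from the report (-m 512, later -m 1024) and about 50 MB growth per day.
# used_mb=0 is an assumption: the usage right after a restart is not given.
for pool_mb in (512, 1024):
    days = days_until_exhausted(pool_mb, 0, 50)
    print(f"-m {pool_mb}: about {days:.1f} days until the pool is full")
```

Under these assumptions, doubling the pool only moves the failure from roughly ten to roughly twenty days, so it buys time but does not fix a leak.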


> First messages were, these, repeated thousands of times (5915 times):
> Jun 17 08:54:52 si-.... /usr/local/sbin/openser[13921]:
> ERROR:core:tcpconn_new: shared memory allocation failure Jun 17 08:54:52
> si-... /usr/local/sbin/openser[13921]: ERROR:core:handle_new_connect:
> tcpconn_new failed, closing socket And a few of these also (7613 times):
> Jun 17 08:57:24 si-... /usr/local/sbin/openser[13880]:
> ERROR:core:tls_accept: some error in SSL: Jun 17 08:57:24 si-...
> /usr/local/sbin/openser[13880]: ERROR:core:tls_print_errstack:
> error:1409C041:SSL routines:SSL3_SETUP_BUFFERS:malloc failure


These are caused by insufficient memory. I can't comment on the TCP and TLS errors. But before really starting to investigate this problem, would it be possible for you to test with a more recent version, e.g. Kamailio 1.5.1?
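
For comparing the message counts before and after an upgrade, a small Python sketch like the following could tally log entries by their "LEVEL:module:function:" prefix. The pattern is inferred only from the excerpts quoted in this mail, so treat it as an assumption about the log format; the sample lines are copied from those excerpts.

```python
import re
from collections import Counter

# Matches the "LEVEL:module:function:" prefix seen in the quoted log excerpts.
# Assumption: every entry of interest carries this prefix.
LOG_TAG = re.compile(r"\b(ERROR|WARNING):(\w+):(\w+):")


def tally_messages(lines):
    """Count log entries per (level, module, function) tag."""
    counts = Counter()
    for line in lines:
        # findall copes with several entries wrapped onto one line.
        for tag in LOG_TAG.findall(line):
            counts[tag] += 1
    return counts


sample = [
    "Jun 17 08:54:52 si-.... /usr/local/sbin/openser[13921]: "
    "ERROR:core:tcpconn_new: shared memory allocation failure",
    "Jun 17 08:54:52 si-... /usr/local/sbin/openser[13921]: "
    "ERROR:core:handle_new_connect: tcpconn_new failed, closing socket",
    "Jun 17 09:01:27 si-.... /usr/local/sbin/openser[13921]: "
    "WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (6)",
    "Jun 17 09:01:27 si-... /usr/local/sbin/openser[13921]: "
    "WARNING:core:send2child: no free tcp receiver, connection passed to the leastbusy one (5)",
]

counts = tally_messages(sample)
for (level, module, func), n in counts.most_common():
    print(f"{n:6d}  {level}:{module}:{func}")
```

Since `tally_messages` accepts any iterable of lines, an open log file can be passed to it directly.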


> * TCP errors, lost SIP messages
>
> Examples from error messages:
> 14.100 times in log file from 17.06.09
> Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]:
> ERROR:core:tcp_blocking_connect: poll error: flags 18 Jun 17 04:03:15
> si-... /usr/local/sbin/openser[13863]: ERROR:core:tcp_blocking_connect:
> failed to retrieve SO_ERROR (111) Connection refused Jun 17 04:03:15 si-...
> /usr/local/sbin/openser[13863]: ERROR:core:tcpconn_connect:
> tcp_blocking_connect failed Jun 17 04:03:15 si-...
> /usr/local/sbin/openser[13863]: ERROR:core:tcp_send: connect failed Jun 17
> 04:03:15 si-.. /usr/local/sbin/openser[13863]: ERROR:tm:msg_send: tcp_send
> failed Jun 17 04:03:15 si-... /usr/local/sbin/openser[13863]:
> ERROR:tm:t_forward_nonack: sending request failed
>
> Appears at least 20 000 times; and in the day of the last shared memory
> errors, it was 225.794 times in the log file (note that the number in
> parenthesis is usually 1 or 2, but on that day it has reached 6): Jun 17
> 09:01:27 si-.... /usr/local/sbin/openser[13921]: WARNING:core:send2child:
> no free tcp receiver, connection passed to the leastbusy one (6) Jun 17
> 09:01:27 si-... /usr/local/sbin/openser[13921]: WARNING:core:send2child: no
> free tcp receiver, connection passed to the leastbusy one (5)
>
> * Certificate validation problems
> TCP traffic is currently significantly increased by some ( appr. 70)
> clients which failed to validate the TLS certificate. Registration is
> repeated every 5 sec.
>
> Circa 30 thousand per day (on that day, it was 37.162 times in log)
> Jun 17 04:03:10 si-024lc008 /usr/local/sbin/openser[13801]:
> ERROR:core:tls_accept: some error in SSL: Jun 17 04:03:10 si-024lc008
> /usr/local/sbin/openser[13801]: ERROR:core:tls_print_errstack:
> error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca


Best regards,


Henning