Hi!
With kamailio 3.0 I encounter lots of crashes. I am using SNOM 320 and eyebeam clients. I think crashes are mostly correlated with SNOM 320.
Server is CentOS5.4. OpenSSL is openssl-0.9.8e-12.el5_4.1.
First crashed Kamailio always with the same backtrace:
Program terminated with signal 11, Segmentation fault. [New process 1580] #0 0x0017451d in kssl_keytab_is_available () from /lib/libssl.so.6 (gdb) bt #0 0x0017451d in kssl_keytab_is_available () from /lib/libssl.so.6 #1 0x00156d01 in ssl3_choose_cipher () from /lib/libssl.so.6 #2 0x00151b45 in ssl3_get_client_hello () from /lib/libssl.so.6 #3 0x00152525 in ssl3_accept () from /lib/libssl.so.6 #4 0x00167eda in SSL_accept () from /lib/libssl.so.6 #5 0x004a6cff in tls_accept (c=0xb60ff4a0, error=0x0) at tls_server.c:327 #6 0x004ac15b in tls_h_fix_read_conn (c=0xb60ff4a0) at tls_server.c:1005 #7 0x0813d146 in tcp_read_req (con=0xb60ff4a0, bytes_read=0xbfe248b4, read_flags=0xbfe248b0) at tcp_read.c:654 #8 0x0813d8ef in handle_io (fm=0x82aa9d0, events=1, idx=-1) at tcp_read.c:930 #9 0x08141513 in tcp_receive_loop (unix_sock=27) at io_wait.h:1057 #10 0x0810fc1b in tcp_init_children () at tcp_main.c:4253 #11 0x0809ae69 in main_loop () at main.c:1525 #12 0x0809bc02 in main (argc=1, argv=0xbfe24cc4) at main.c:2251 (gdb) quit
Then I configured RSA ciphers (as suggested by others) and now I get different backtraces. Here is an example where Kamailio wrote 2 core files:
1. Program terminated with signal 11, Segmentation fault. [New process 1735] #0 0x004d01d3 in free_hash_table () at h_table.c:423 423 clist_foreach_safe(&_tm_table->entries[i], p_cell, tmp_cell, (gdb) bt #0 0x004d01d3 in free_hash_table () at h_table.c:423 #1 0x004de1a0 in tm_shutdown () at t_funcs.c:126 #2 0x08104ef2 in destroy_modules () at sr_module.c:635 #3 0x08098400 in cleanup (show_status=1) at main.c:509 #4 0x0809912e in shutdown_children (sig=<value optimized out>, show_status=1) at main.c:649 #5 0x08099aa4 in handle_sigs () at main.c:740 #6 0x0809aab9 in main_loop () at main.c:1562 #7 0x0809bc02 in main (argc=1, argv=0xbfc70b74) at main.c:2251
2. Program terminated with signal 11, Segmentation fault. [New process 1759] #0 0x004f24ce in t_reply_matching (p_msg=0x8298cf0, p_branch=0xbfc70424) at t_lookup.c:983 983 if (p_cell->label != entry_label) (gdb) bt #0 0x004f24ce in t_reply_matching (p_msg=0x8298cf0, p_branch=0xbfc70424) at t_lookup.c:983 #1 0x004f5559 in t_check_msg (p_msg=0x8298cf0, param_branch=0xbfc70424) at t_lookup.c:1138 #2 0x004f5e94 in t_check (p_msg=0x8298cf0, param_branch=0xbfc70424) at t_lookup.c:1180 #3 0x005140d9 in reply_received (p_msg=0x8298cf0) at t_reply.c:1897 #4 0x0808c764 in forward_reply (msg=0x8298cf0) at forward.c:689 #5 0x080c401e in receive_msg ( buf=0xb60fe088 "SIP/2.0 200 Ok\r\nVia: SIP/2.0/TLS 83.136.32.167:5061;branch=z9hG4bKcc38.0a7d6be7.0;i=2\r\nVia: SIP/2.0/TLS 10.10.0.51:40487;received=83.136.33.3;branch=z9hG4bK-d8754z-7b0f1727475bda32-1---d8754z-;rport=2"..., len=1150, rcv_info=0xb60fded4) at receive.c:257 #6 0x0813cf41 in tcp_read_req (con=0xb60fdec0, bytes_read=0xbfc70768, read_flags=0xbfc70760) at tcp_read.c:761 #7 0x0813da0b in handle_io (fm=0x82aa708, events=1, idx=-1) at tcp_read.c:980 #8 0x08141513 in tcp_receive_loop (unix_sock=25) at io_wait.h:1057 #9 0x0810fc1b in tcp_init_children () at tcp_main.c:4253 #10 0x0809ae69 in main_loop () at main.c:1525 #11 0x0809bc02 in main (argc=1, argv=0xbfc70b74) at main.c:2251
Attached is the log file.
I can provided intensive logs (memdbg) too.
thanks klaus
On Feb 22, 2010 at 19:16, Klaus Darilion klaus.mailinglists@pernau.at wrote:
Hi!
With kamailio 3.0 I encounter lots of crashes. I am using SNOM 320 and eyebeam clients. I think crashes are mostly correlated with SNOM 320.
Server is CentOS5.4. OpenSSL is openssl-0.9.8e-12.el5_4.1.
First crashed Kamailio always with the same backtrace:
Program terminated with signal 11, Segmentation fault. [New process 1580] #0 0x0017451d in kssl_keytab_is_available () from /lib/libssl.so.6 (gdb) bt #0 0x0017451d in kssl_keytab_is_available () from /lib/libssl.so.6
This one looks like some openssl bug. What's strange is that it's in some kerberos code, which is supposed to be disabled. Could you try with another openssl version, or even better compile openssl with debug info and link with it?
What exactly did you need to do for reproducing it with a snom (any settings you think there might be relevant both on the phone and in the tls handling in the script)?
[...]
Then I configured RSA ciphers (as suggested by others) and now I get different backtraces. Here is an example where Kamailio wrote 2 core files:
- Program terminated with signal 11, Segmentation fault.
[New process 1735] #0 0x004d01d3 in free_hash_table () at h_table.c:423
This crashed on shutdown for some reason (corrupted list?).
[...]
- Program terminated with signal 11, Segmentation fault.
[New process 1759] #0 0x004f24ce in t_reply_matching (p_msg=0x8298cf0, p_branch=0xbfc70424) at t_lookup.c:983 983 if (p_cell->label != entry_label)
Hmm, this looks like a corrupted transaction list. It's hard to say if it's caused by tls or is something unrelated. Could you print p_cell and p_cell->label in gdb?
(gdb) bt #0 0x004f24ce in t_reply_matching (p_msg=0x8298cf0, p_branch=0xbfc70424) at t_lookup.c:983 #1 0x004f5559 in t_check_msg (p_msg=0x8298cf0, param_branch=0xbfc70424) at t_lookup.c:1138 #2 0x004f5e94 in t_check (p_msg=0x8298cf0, param_branch=0xbfc70424) at t_lookup.c:1180 #3 0x005140d9 in reply_received (p_msg=0x8298cf0) at t_reply.c:1897 #4 0x0808c764 in forward_reply (msg=0x8298cf0) at forward.c:689 #5 0x080c401e in receive_msg ( buf=0xb60fe088 "SIP/2.0 200 Ok\r\nVia: SIP/2.0/TLS 83.136.32.167:5061;branch=z9hG4bKcc38.0a7d6be7.0;i=2\r\nVia: SIP/2.0/TLS 10.10.0.51:40487;received=83.136.33.3;branch=z9hG4bK-d8754z-7b0f1727475bda32-1---d8754z-;rport=2"..., len=1150, rcv_info=0xb60fded4) at receive.c:257 #6 0x0813cf41 in tcp_read_req (con=0xb60fdec0, bytes_read=0xbfc70768, read_flags=0xbfc70760) at tcp_read.c:761 #7 0x0813da0b in handle_io (fm=0x82aa708, events=1, idx=-1) at tcp_read.c:980 #8 0x08141513 in tcp_receive_loop (unix_sock=25) at io_wait.h:1057 #9 0x0810fc1b in tcp_init_children () at tcp_main.c:4253 #10 0x0809ae69 in main_loop () at main.c:1525 #11 0x0809bc02 in main (argc=1, argv=0xbfc70b74) at main.c:2251
Andrei
On Feb 22, 2010 at 20:00, Andrei Pelinescu-Onciul andrei@iptel.org wrote:
On Feb 22, 2010 at 19:16, Klaus Darilion klaus.mailinglists@pernau.at wrote:
Hi!
With kamailio 3.0 I encounter lots of crashes. I am using SNOM 320 and eyebeam clients. I think crashes are mostly correlated with SNOM 320.
Server is CentOS5.4. OpenSSL is openssl-0.9.8e-12.el5_4.1.
First crashed Kamailio always with the same backtrace:
Program terminated with signal 11, Segmentation fault. [New process 1580] #0 0x0017451d in kssl_keytab_is_available () from /lib/libssl.so.6 (gdb) bt #0 0x0017451d in kssl_keytab_is_available () from /lib/libssl.so.6
This one looks like some openssl bug. What's strange is that it's in some kerberos code, which is supposed to be disabled.
It seems there is a code path inside openssl, for which the old disable kerberos trick does not work and causes crashes instead. Probably we haven't seen it before because we didn't tried a client that advertised KRB5 key exchange and an openssl compiled with kerberos support. The proper fix is to upgrade to a newer openssl (real 0.9.8e or above). For people stuck with older openssl versions, I will try to find a way to disable kerberos in a better way.
Could you try in the same configuration (no RSA workaround on the phone), but with the following tls module options:
modparam("tls", "cipher_list", "DEFAULT:!KRB5")
If this works, please try also the RSA stuff on the phone and see if it still crashes in that case (although that case might be unrelated). If everything is ok I'll add it to the code, so that it will be done automatically for openssl < 0.9.8e release.
Note that another very strange thing is the version number.
From your log:
INFO: tls [tls_init.c:486]: tls: _init_tls_h: compiled with openssl version "OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008" (0x0090802f), kerberos ^^^^^^^ ^^^^^^^^^^
0x0090802f means openssl 0.9.8b (release) and not 0.9.8e. It looks like OPENSSL_VERSION_NUMBER is different from OPENSSL_VERSION_TEXT in you include/openssl/opensslv.h. Out o curiosity could you verify that that's the case?
Andrei
Some analysis ...
Am 22.02.2010 22:21, schrieb Andrei Pelinescu-Onciul:
Note that another very strange thing is the version number. From your log: INFO: tls [tls_init.c:486]: tls: _init_tls_h: compiled with openssl version "OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008" (0x0090802f), kerberos ^^^^^^^ ^^^^^^^^^^
0x0090802f means openssl 0.9.8b (release) and not 0.9.8e. It looks like OPENSSL_VERSION_NUMBER is different from OPENSSL_VERSION_TEXT in you include/openssl/opensslv.h. Out o curiosity could you verify that that's the case?
# cat /etc/redhat-release CentOS release 5.4 (Final)
# rpm -qa|grep ssl openssl-devel-0.9.8e-12.el5_4.1 openssl-0.9.8e-12.el5_4.1
extract from /usr/include/openssl/opensslv.h:
#define OPENSSL_VERSION_NUMBER 0x0090802fL #ifdef OPENSSL_FIPS #define OPENSSL_VERSION_TEXT "OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008" #else #define OPENSSL_VERSION_TEXT "OpenSSL 0.9.8e-rhel5 01 Jul 2008" #endif #define OPENSSL_VERSION_PTEXT " part of " OPENSSL_VERSION_TEXT
really strange. I downloaded and extracted RH openssl-0.9.8e-12.el5_4.1.src.rpm and greped for "OPENSSL_VERSION":
openssl-fips-0.9.8e-abi.patch:-#define OPENSSL_VERSION_NUMBER 0x00908060L openssl-fips-0.9.8e-abi.patch:+#define OPENSSL_VERSION_NUMBER 0x0090802fL openssl-fips-0.9.8e-abi.patch:-#define OPENSSL_VERSION_TEXT "OpenSSL 0.9.8f-fips-dev xx XXXX xxxx" openssl-fips-0.9.8e-abi.patch:+#define OPENSSL_VERSION_TEXT "OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008"
Further the SRPM contains openssl-fips-0.9.8e-usa.tar.bz2 which contains a opensslv.h with:
#define OPENSSL_VERSION_NUMBER 0x00908060L #ifdef OPENSSL_FIPS #define OPENSSL_VERSION_TEXT "OpenSSL 0.9.8f-fips-dev xx XXXX xxxx" #else #define OPENSSL_VERSION_TEXT "OpenSSL 0.9.8f-dev xx XXXX xxxx" #endif #define OPENSSL_VERSION_PTEXT " part of " OPENSSL_VERSION_TEXT
Looks like RH is using a 0.9.8f-dev version, which is named 0.9.8e, and numbered like 0.9.8b - maybe for compatibility reasons with earlier RHEL5 releases.
regards klaus
Andrei Pelinescu-Onciul wrote:
On Feb 22, 2010 at 19:16, Klaus Darilion klaus.mailinglists@pernau.at wrote:
Hi!
With kamailio 3.0 I encounter lots of crashes. I am using SNOM 320 and eyebeam clients. I think crashes are mostly correlated with SNOM 320.
Server is CentOS5.4. OpenSSL is openssl-0.9.8e-12.el5_4.1.
First crashed Kamailio always with the same backtrace:
Program terminated with signal 11, Segmentation fault. [New process 1580] #0 0x0017451d in kssl_keytab_is_available () from /lib/libssl.so.6 (gdb) bt #0 0x0017451d in kssl_keytab_is_available () from /lib/libssl.so.6
This one looks like some openssl bug. What's strange is that it's in some kerberos code, which is supposed to be disabled. Could you try with another openssl version, or even better compile openssl with debug info and link with it?
I also guess this is something RH/CentOS related. Rebuilding openssl myself could help, but the thing is that this is current CentOS/Redhat which is used rather often.
klaus
Hi Andrei!
Am 22.02.2010 20:00, schrieb Andrei Pelinescu-Onciul:
What exactly did you need to do for reproducing it with a snom (any settings you think there might be relevant both on the phone and in the tls handling in the script)?
I think the script is not relevant. Seems like it crashes immediately with the first TLS request before entering script.
[1323]: DEBUG: <core> [ip_addr.c:116]: tcpconn_new: new tcp connection: 83.136.33.3 [1323]: DEBUG: <core> [tcp_main.c:1052]: tcpconn_new: on port 2239, type 3 [1323]: DEBUG: <core> [tcp_main.c:1351]: tcpconn_add: hashes: 638:2058:3048, 1 [1323]: DEBUG: <core> [io_wait.h:361]: DBG: io_watch_add(0x8217f60, 27, 2, 0xb60e6390), fd_no=19 [1323]: DEBUG: <core> [io_wait.h:588]: DBG: io_watch_del (0x8217f60, 27, -1, 0x0) fd_no=20 called [1323]: DEBUG: <core> [tcp_main.c:3627]: tcp: DBG: sendig to child, events 1 [1323]: DEBUG: <core> [tcp_main.c:3336]: send2child: to tcp child 0 10(1319), 0xb60e6390 [1319]: DEBUG: <core> [tcp_read.c:884]: received n=4 con=0xb60e6390, fd=8 [1319]: DEBUG: tls [tls_server.c:109]: Using TLS domain TLSs<default> [1323]: : <core> [pass_fd.c:283]: ERROR: receive_fd: EOF on 18 [1323]: DEBUG: <core> [tcp_main.c:3038]: DBG: handle_ser_child: dead child 10, pid 1319 (shutting down?) [1323]: DEBUG: <core> [io_wait.h:588]: DBG: io_watch_del (0x8217f60, 18, -1, 0x0) fd_no=19 called [1323]: DEBUG: <core> [tcp_main.c:2826]: DBG: handle_tcp_child: dead tcp child 0 (pid 1319, no 10) (shutting down?) [1323]: DEBUG: <core> [io_wait.h:588]: DBG: io_watch_del (0x8217f60, 20, -1, 0x0) fd_no=18 called [1297]: ALERT: <core> [main.c:722]: child process 1319 exited by a signal 11
Attached is the complete log file from kamailio startup until crash with first message.
Further I have attached the bt of the generated cores (2 cores were generated). As you already said this looks like a bug in openssl.
# cat /etc/redhat-release CentOS release 5.4 (Final)
# rpm -qa|grep ssl openssl-devel-0.9.8e-12.el5_4.1 openssl-0.9.8e-12.el5_4.1
Strangly, the ciphers announced by SNOM do not indicate Kerberos at all:
SSL Record Layer: Handshake Protocol: Client Hello Content Type: Handshake (22) Version: TLS 1.0 (0x0301) Length: 63 Handshake Protocol: Client Hello Handshake Type: Client Hello (1) Length: 59 Version: TLS 1.0 (0x0301) Random Session ID Length: 0 Cipher Suites Length: 20 Cipher Suites (10 suites) Cipher Suite: TLS_RSA_WITH_RC4_128_MD5 (0x0004) Cipher Suite: TLS_RSA_WITH_RC4_128_SHA (0x0005) Cipher Suite: TLS_RSA_WITH_NULL_MD5 (0x0001) Cipher Suite: TLS_RSA_WITH_NULL_SHA (0x0002) Cipher Suite: TLS_DH_anon_WITH_3DES_EDE_CBC_SHA (0x001b) Cipher Suite: TLS_DH_anon_WITH_RC4_128_MD5 (0x0018) Cipher Suite: TLS_RSA_WITH_DES_CBC_SHA (0x0009) Cipher Suite: TLS_RSA_EXPORT1024_WITH_RC4_56_SHA (0x0064) Cipher Suite: TLS_RSA_EXPORT1024_WITH_DES_CBC_SHA (0x0062) Cipher Suite: TLS_DH_anon_WITH_DES_CBC_SHA (0x001a) Compression Methods Length: 1 Compression Methods (1 method) Compression Method: null (0)
Attached is also the pcap file of the SNOM phone.
regards Klaus