after a few weeks, i generated new sip router from latest master. it failed to start with these kinds of messages:
*** glibc detected *** /usr/sbin/sip-proxy: double free or corruption (out): 0x08868750 ***
Not starting sip-proxy: invalid configuration file!
0(23949) INFO: tls [tls_init.c:375]: tls: init_tls: disabling compression...
Inconsistency detected by ld.so: dl-open.c: 221: dl_open_worker: Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed!
an earlier version of sip router master starts ok with the same config.
any ideas what is going on?
-- juha
Hello,
if you are using ctl module, edit its Makefile and comment the line:
DEFS+=-DCTL_SYSTEM_MALLOC
I recently updated ctl so that it can be compiled to use the system malloc, in order to cope with the large buffers needed to print RPC command results. For the moment this is enabled (otherwise ctl uses pkg memory, as before). Since glibc is detecting a problem with its memory allocator, and ctl now uses the system allocator, this could be the reason.
If that works, then we have found the place of the problem: it means the module has a double free of a pointer, which has to be fixed (pkg figures that out at runtime).
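For readers unfamiliar with the failure mode mentioned here, the following is a minimal sketch of how a double free is usually made harmless with the system allocator. The helper name free_once is hypothetical and is not taken from the ctl module or from sip-router.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not sip-router code): release *pp exactly once.
 * Returns 1 if memory was freed, 0 if there was nothing left to free.
 */
static int free_once(void **pp)
{
    if (pp == NULL || *pp == NULL)
        return 0;      /* already released, so a second call is a no-op */
    free(*pp);
    *pp = NULL;        /* forget the stale address so it cannot be freed again */
    return 1;
}
```

Calling free() twice on the same address is what glibc reports as "double free or corruption"; resetting the pointer after the first free avoids that particular mistake, although it does not help when a second copy of the pointer exists elsewhere.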
Also, what is your glib version? Since I did it, all is working fine on my test system.
If not, the problem is somewhere else, send the version of the working instance.
Cheers, Daniel
On 1/13/12 11:57 PM, Juha Heinanen wrote:
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
Daniel-Constantin Mierla writes:
if you are using ctl module, edit its Makefile and comment the line:
DEFS+=-DCTL_SYSTEM_MALLOC
i did that and compiled/installed ctl again. no change. still the same errors at startup.
Also, what is your glib version? Since I did it, all is working fine on my test system.
my system is debian squeeze with this kind of libglib:
ii libglib2.0-0 2.24.2-1 The GLib library of C routines
If not, the problem is somewhere else, send the version of the working instance.
working one is
version: sip-proxy 3.3.0-dev2 (i386/linux) compiled on 13:22:48 Dec 4 2011 with cc 4.4.5
-- juha
daniel,
i copied whole ctl module from working dec 9 version of master to current master and then rebuilt sip router. when i start it, i still get
*** glibc detected *** /usr/sbin/sip-proxy: double free or corruption (out): 0x08a6d750 ***
Not starting sip-proxy: invalid configuration file!
0(16953) INFO: tls [tls_init.c:375]: tls: init_tls: disabling compression...
Inconsistency detected by ld.so: dl-open.c: 221: dl_open_worker: Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT' failed!
so looks like this issue is not related to ctl module, but is something else.
-- juha
Hello,
I will try to test it this evening on a debian and look over the changes from Dec 9 to current to see what could be the cause. There was another change in the tcp code to clone the buffer, but it was rather small and I double checked it.
Can you skip loading the tls module and see if it works without it? Tls mod was touched a bit, but not related to memory at all, but where init of tls happens.
Cheers, Daniel
On 1/16/12 8:17 PM, Juha Heinanen wrote:
-- Daniel-Constantin Mierla -- http://www.asipto.com http://linkedin.com/in/miconda -- http://twitter.com/miconda
Hello,
On 1/16/12 8:31 PM, Juha Heinanen wrote:
Daniel-Constantin Mierla writes:
Can you skip loading the tls module and see if it works without it? Tls mod was touched a bit, but not related to memory at all, but where init of tls happens.
i get the same errors also when tls module is not loaded and
enable_tls=0
ok, just wanted to exclude that too from the list. I ran it on an ubuntu with the default config and all goes fine here, although of course with a different set of loaded modules.
To understand where the proxy fails to start: if you run with a higher debug level, where does it stop? What is the last module/core file that prints messages?
From the man page of malloc, set MALLOC_CHECK_ environment variable to 2 and run again, you should get a coredump and extract the backtrace -- that should give a proper hint.
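The MALLOC_CHECK_ suggestion can also be applied from a small launcher instead of the shell. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and it assumes a glibc that still honours MALLOC_CHECK_ from the environment, as the malloc man page referred to above describes.

```c
#define _XOPEN_SOURCE 600   /* for setenv() and getrlimit() under strict C modes */
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: make glibc abort on heap corruption and allow
 * a core file to be written. Returns 0 on success, -1 on failure.
 */
static int enable_malloc_check(void)
{
    struct rlimit lim;

    /* Value 2 asks glibc to call abort() as soon as an error is detected. */
    if (setenv("MALLOC_CHECK_", "2", 1) != 0)
        return -1;

    /* Raise the soft core-size limit to the hard limit so a core can be dumped. */
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    return 0;
}

/* Hypothetical wrapper: run argv[0] with the settings above. */
static int run_with_malloc_check(char *const argv[])
{
    if (enable_malloc_check() != 0)
        return -1;
    execvp(argv[0], argv);
    return -1;              /* reached only if the exec itself failed */
}
```

The environment variable is read when the target process starts, so it has to be set before the exec. Once the process aborts, the core file can be opened in gdb together with the binary and "where" prints the backtrace.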
Also, what is the output of 'ldd --version'?
Cheers, Daniel
I've spotted a couple of bugs that have been recently introduced into the pua and presence modules that can cause a crash. The presence one can occur at start up when it loads up data from the database. Not sure if this is what you're seeing, but we'll be checking in a couple of fixes for this later today.
Paul
-----Original Message----- From: Daniel-Constantin Mierla Sent: Monday, January 16, 2012 8:58 PM To: Development mailing list of the sip-router project Subject: Re: [sr-dev] current master fails to start
daniel,
thanks for the MALLOC_CHECK_ tip. after i set it to 2, i found out that the startup failure with current master is due to my local module that contains a small number of non-general purpose functions.
gdb reports on the core file:
(gdb) where
#0 0xb7786424 in __kernel_vsyscall ()
#1 0xb7638751 in raise () from /lib/i686/cmov/libc.so.6
#2 0xb763bb82 in abort () from /lib/i686/cmov/libc.so.6
#3 0xb7679484 in ?? () from /lib/i686/cmov/libc.so.6
#4 0xb767bed4 in ?? () from /lib/i686/cmov/libc.so.6
#5 0xb767c9a8 in ?? () from /lib/i686/cmov/libc.so.6
#6 0xb767de08 in malloc () from /lib/i686/cmov/libc.so.6
#7 0xb778de23 in ?? () from /lib/ld-linux.so.2
#8 0xb778df7b in ?? () from /lib/ld-linux.so.2
#9 0xb778e526 in ?? () from /lib/ld-linux.so.2
#10 0xb7793e9c in ?? () from /lib/ld-linux.so.2
#11 0xb77947f6 in ?? () from /lib/ld-linux.so.2
#12 0xb7793192 in ?? () from /lib/ld-linux.so.2
#13 0xb7798b81 in ?? () from /lib/ld-linux.so.2
#14 0xb77947f6 in ?? () from /lib/ld-linux.so.2
#15 0xb77985c6 in ?? () from /lib/ld-linux.so.2
#16 0xb776cc0b in ?? () from /lib/i686/cmov/libdl.so.2
#17 0xb77947f6 in ?? () from /lib/ld-linux.so.2
#18 0xb776d09c in ?? () from /lib/i686/cmov/libdl.so.2
#19 0xb776cb41 in dlopen () from /lib/i686/cmov/libdl.so.2
#20 0x080f72e4 in load_module (mod_path=0xb7221478 "local") at sr_module.c:569
#21 0x0817e332 in yyparse () at cfg.y:1709
#22 0x08095a1f in main (argc=18, argv=0xbfb55c24) at main.c:2084
which does not tell much to me. Makefile of the module is like this:
include ../../Makefile.defs
auto_gen=
NAME=local.so
DEFS+=-DOPENSER_MOD_INTERFACE
SERLIBPATH=../../lib
SER_LIBS+=$(SERLIBPATH)/kmi/kmi
SER_LIBS+=$(SERLIBPATH)/srdb1/srdb1
SER_LIBS+=$(SERLIBPATH)/kcore/kcore
include ../../Makefile.modules
the module contains mod_init, child_init, mi_child_init, and destroy functions:
struct module_exports exports = {
    "local",
    DEFAULT_DLFLAGS, /* dlopen flags */
    cmds,            /* Exported functions */
    params,          /* Exported parameters */
    0,               /* exported statistics */
    mi_cmds,         /* exported MI functions */
    0,               /* exported pseudo-variables */
    0,               /* extra processes */
    mod_init,        /* module initialization function */
    0,               /* response function*/
    destroy,         /* destroy function */
    child_init       /* child initialization function */
};
i added LM_INFO calls to the beginning of each, and nothing gets printed to syslog, which makes me think that the crash happens before any of them is called.
-- juha
i wrote a new version of local module (below) that is just a skeleton. when i start sip router, i get to console:
Not starting sip-proxy: invalid configuration file!
but i don't get anything about what is wrong with it.
gdb shows:
(gdb) where
#0 0xb7740424 in __kernel_vsyscall ()
#1 0xb75f2751 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0xb75f5b82 in *__GI_abort () at abort.c:92
#3 0xb7633484 in __malloc_assert (assertion=<value optimized out>, file=<value optimized out>, line=4636, function=0xb76ea490 "_int_malloc") at malloc.c:352
#4 0xb7635ed4 in _int_malloc (av=<value optimized out>, bytes=<value optimized out>) at malloc.c:4636
#5 0xb7637c8c in *__GI___libc_malloc (bytes=0) at malloc.c:3661
#6 0xb767e2cc in analyze (preg=<value optimized out>, pattern=<value optimized out>, length=73, syntax=4436732) at regcomp.c:1135
#7 re_compile_internal (preg=<value optimized out>, pattern=<value optimized out>, length=73, syntax=4436732) at regcomp.c:804
#8 0xb767ee5e in __regcomp (preg=0xbf963844, pattern=0xb6ddb800 "^auth_db|dialplan|domain|htable|lcr|local|msilo|mtree|permissions|usrloc$", cflags=11) at regcomp.c:506
#9 0x0809a500 in set_mod_param_regex (regex=0xb6ddb668 "auth_db|dialplan|domain|htable|lcr|local|msilo|mtree|permissions|usrloc", name=0xb6ddb6f0 "db_url", type=1, val=0xb6ddb778) at modparam.c:86
#10 0x0817e4fd in yyparse () at cfg.y:1733
#11 0x08095a1f in main (argc=18, argv=0xbf963f24) at main.c:2084
from that i see that the problem is that my new local module does not have db_url module param, but that does not explain why malloc causes the crash. this is with export MALLOC_CHECK_=2. if i do not define MALLOC_CHECK_=2, i still don't get any more stuff to console about the problem in config.
-- juha
---------------------------------------------------------------------------
/*
 * Local module
 */
#include "../../mod_fix.h"
MODULE_VERSION
int f1(struct sip_msg* _m, char* _uri_user_sp, char* _uri_host_sp) { return 1; }
int f2(struct sip_msg* _msg, char* _s1, char* _s2) { return 1; }
int f3(struct sip_msg* _msg, char* _sp, char* _s2) { return 1; }
int f4(struct sip_msg* _m, char* _condition, char* _s2) { return 1; }
int f5(struct sip_msg* _m, char* _condition, char* _str2) { return 1; }
int f6(struct sip_msg* _m, char* _condition, char* _str2) { return 1; }
/* Exported functions */
static cmd_export_t cmds[] = {
    {"f1", (cmd_function)f1, 2, fixup_pvar_pvar, fixup_free_pvar_pvar,
     REQUEST_ROUTE},
    {"f2", (cmd_function)f2, 0, 0, 0, REQUEST_ROUTE},
    {"f3", (cmd_function)f3, 1, fixup_pvar_null, fixup_free_pvar_null,
     REQUEST_ROUTE|FAILURE_ROUTE|BRANCH_ROUTE},
    {"f4", (cmd_function)f4, 1, 0, 0, REQUEST_ROUTE | FAILURE_ROUTE},
    {"f5", (cmd_function)f5, 1, 0, 0, REQUEST_ROUTE | FAILURE_ROUTE},
    {"f6", (cmd_function)f6, 1, 0, 0, REQUEST_ROUTE | FAILURE_ROUTE},
    {0, 0, 0, 0, 0, 0}
};
static int mod_init(void) { return 0; }
/* Exported parameters */
/* Module interface */
struct module_exports exports = {
    "local",
    DEFAULT_DLFLAGS, /* dlopen flags */
    cmds,            /* Exported functions */
    0,               /* Exported parameters */
    0,               /* exported statistics */
    0,               /* exported MI functions */
    0,               /* exported pseudo-variables */
    0,               /* extra processes */
    mod_init,        /* module initialization function */
    0,               /* response function*/
    0,               /* destroy function */
    0                /* child initialization function */
};
i tried to start sip router without loading my local module. crash still happens at the same point as before, i.e., when sip router tries to compile modparam that contains a regular expression:
modparam("auth_db|dialplan|domain|htable|lcr|msilo|mtree|permissions|usrloc", "db_url", "MYSQL_SIP_PROXY_URL")
(gdb) where
#0 0xb78bd424 in __kernel_vsyscall ()
#1 0xb776f751 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0xb7772b82 in *__GI_abort () at abort.c:92
#3 0xb77b0484 in __malloc_assert (assertion=<value optimized out>, file=<value optimized out>, line=4636, function=0xb7867490 "_int_malloc") at malloc.c:352
#4 0xb77b2ed4 in _int_malloc (av=<value optimized out>, bytes=<value optimized out>) at malloc.c:4636
#5 0xb77b4c8c in *__GI___libc_malloc (bytes=0) at malloc.c:3661
#6 0xb77fb2cc in analyze (preg=<value optimized out>, pattern=<value optimized out>, length=67, syntax=4436732) at regcomp.c:1135
#7 re_compile_internal (preg=<value optimized out>, pattern=<value optimized out>, length=67, syntax=4436732) at regcomp.c:804
#8 0xb77fbe5e in __regcomp (preg=0xbff52264, pattern=0xb7358610 "^auth_db|dialplan|domain|htable|lcr|msilo|mtree|permissions|usrloc$", cflags=11) at regcomp.c:506
#9 0x0809a500 in set_mod_param_regex (regex=0xb7358478 "auth_db|dialplan|domain|htable|lcr|msilo|mtree|permissions|usrloc", name=0xb7358500 "db_url", type=1, val=0xb7358588) at modparam.c:86
#10 0x0817e4fd in yyparse () at cfg.y:1733
#11 0x08095a1f in main (argc=18, argv=0xbff52944) at main.c:2084
i think regcomp.c is part of c library in debian squeeze.
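One way to check whether regcomp itself misbehaves on a given host is to compile the same pattern outside sip router. The sketch below is an assumption-laden illustration, not the code from modparam.c: the function name is hypothetical, the "^...$" wrapping is copied from the pattern visible in the backtrace, and the flags are a guess based on cflags=11, which matches REG_EXTENDED | REG_ICASE | REG_NOSUB on glibc.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical standalone check: wrap a modparam-style module regex in
 * '^' and '$', compile it, and test it against one module name.
 * Returns 1 on match, 0 on no match, -1 if allocation or regcomp fails.
 */
static int modparam_regex_matches(const char *regex, const char *mod_name)
{
    regex_t preg;
    size_t len = strlen(regex);
    char *anchored = malloc(len + 3);   /* '^' + regex + '$' + NUL */
    int rc;

    if (anchored == NULL)
        return -1;
    anchored[0] = '^';
    memcpy(anchored + 1, regex, len);
    anchored[len + 1] = '$';
    anchored[len + 2] = '\0';

    rc = regcomp(&preg, anchored, REG_EXTENDED | REG_ICASE | REG_NOSUB);
    free(anchored);
    if (rc != 0)
        return -1;

    rc = regexec(&preg, mod_name, 0, NULL, 0);
    regfree(&preg);
    return rc == 0 ? 1 : 0;
}
```

If a small program built around this function runs cleanly on the affected host, the abort inside regcomp is more likely a symptom of heap corruption that happened earlier in the process than a problem with the pattern. Note also that in a POSIX extended regex the '^' and '$' bind only to the first and last alternative of an unparenthesized "a|b|c" list.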
-- juha
i installed my sip router package on another debian host and there it starts fine. so perhaps there is a memory or some other hardware problem on my other host that causes these malloc related crashes.
-- juha
On Sunday 22 January 2012, Juha Heinanen wrote:
i installed my sip router package on another debian host and there it starts fine. so perhaps there is a memory or some other hardware problem on my other host that causes these malloc related crashes.
On Saturday 21 January 2012, Juha Heinanen wrote:
i think regcomp.c is part of c library in debian squeeze.
Are you sure your libc is from squeeze and you do not have a libc version 2.13? If it is 2.13, it might be a wrong use of memcpy somewhere in the code. That problem almost always shows itself as vague, seemingly random, failures.
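The classic form of the memcpy misuse that surfaced with glibc 2.13 is copying between overlapping regions, which memcpy does not support. A minimal sketch of the safe variant follows; the helper name drop_prefix is hypothetical and does not come from the sip-router sources.

```c
#include <string.h>

/*
 * Hypothetical helper: discard the first n bytes of buf, which holds
 * len bytes of data, by shifting the remainder to the front.
 * Returns the new data length.
 */
static size_t drop_prefix(char *buf, size_t len, size_t n)
{
    if (n >= len)
        return 0;                    /* nothing left after dropping n bytes */
    /* Source and destination overlap here, so memmove is required. */
    memmove(buf, buf + n, len - n);
    return len - n;
}
```

Using memcpy in place of memmove in a function like this may appear to work with one libc version and corrupt data with another, which fits the vague, seemingly random failures described above.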
Alex Hermann writes:
Are you sure your libc is from squeeze and you do not have a libc version 2.13? If it is 2.13, it might be a wrong use of memcpy somewhere in the code. That problem almost always shows itself as vague, seemingly random, failures.
alex,
thanks for the pointer, but libc version is 2.11.2-10.
i installed a very bare debian squeeze on a virtual host on the same pc and there kamailio starts fine. i then wrote a program that compares versions of all packages that exist in the virtual host debian to the ones that exist in the real host and they are all the same. so the crash is somehow related to the extra packages that only exist in the real host.
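The comparison program itself was not posted. As a rough sketch of the idea only, the function below counts packages that appear in both lists with different versions. It assumes each list is text with one "name<TAB>version" entry per line, which is the default output of dpkg-query -W; all names in the code are hypothetical.

```c
#include <string.h>

/* Find the line starting with name + TAB in list; return its version or NULL. */
static const char *find_version(const char *list, const char *name, size_t name_len)
{
    const char *line = list;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, name, name_len) == 0 && line[name_len] == '\t')
            return line + name_len + 1;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;                         /* step past the newline */
    }
    return NULL;
}

/* Count packages present in both lists whose version strings differ. */
static int count_version_mismatches(const char *list_a, const char *list_b)
{
    int mismatches = 0;
    const char *line = list_a;

    while (line != NULL && *line != '\0') {
        size_t name_len = strcspn(line, "\t\n");

        if (line[name_len] == '\t') {
            const char *va = line + name_len + 1;
            const char *vb = find_version(list_b, line, name_len);

            if (vb != NULL) {
                size_t la = strcspn(va, "\n");
                size_t lb = strcspn(vb, "\n");
                if (la != lb || strncmp(va, vb, la) != 0)
                    mismatches++;
            }
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return mismatches;
}
```

A result of zero corresponds to the situation described above: every package present on both hosts has the same version, so any difference in behaviour has to come from packages that exist on only one of them, or from something outside the package set.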
i guess my only option is to reinstall squeeze on the real host.
-- juha