Juha Heinanen wrote:
i installed a very bare debian squeeze on a virtual host on the same pc and there kamailio starts fine.
kamailio started fine on virtual host, but the reason turned out to be that i didn't have geoip enabled in the virtual host kamailio instance. once i loaded also geoip module, kamailio crashed the same way as on the real host. i also tried another real host running debian squeeze and also there, kamailio crashes if and only if geoip module is enabled.
the crash happens very early during config processing. on one host like this:
(gdb) where
#0 0xb7870424 in __kernel_vsyscall ()
#1 0xb75ce781 in raise () from /lib/i686/cmov/libc.so.6
#2 0xb75d1bb2 in abort () from /lib/i686/cmov/libc.so.6
#3 0xb760f584 in ?? () from /lib/i686/cmov/libc.so.6
#4 0xb7611fd4 in ?? () from /lib/i686/cmov/libc.so.6
#5 0xb7613d8c in malloc () from /lib/i686/cmov/libc.so.6
#6 0xb765a7ac in ?? () from /lib/i686/cmov/libc.so.6
#7 0xb765b33e in regcomp () from /lib/i686/cmov/libc.so.6
#8 0x0809a880 in set_mod_param_regex (regex=0xb71a25f0 "auth_db|dialplan|domain|htable|lcr|local|msilo|mtree|permissions|usrloc", name=0xb71a2678 "db_url", type=1, val=0xb71a2700) at modparam.c:86
#9 0x081807dd in yyparse () at cfg.y:1733
#10 0x08095d9f in main (argc=18, argv=0xbf808934) at main.c:2084
and on another like this:
#0 0xb788a424 in __kernel_vsyscall ()
#1 0xb75e3781 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0xb75e6bb2 in *__GI_abort () at abort.c:92
#3 0xb7624584 in __malloc_assert (assertion=<value optimized out>, file=<value optimized out>, line=4636, function=0xb76dbc10 "_int_malloc") at malloc.c:352
#4 0xb7626fd4 in _int_malloc (av=<value optimized out>, bytes=<value optimized out>) at malloc.c:4636
#5 0xb7628d8c in *__GI___libc_malloc (bytes=8) at malloc.c:3661
#6 0xb769690a in nss_parse_file (database=0xb76dcf07 "passwd", alternate_name=0x0, defconfig=0xb76e0d40 "compat [NOTFOUND=return] files", ni=0xb76ff834) at nsswitch.c:542
#7 *__GI___nss_database_lookup (database=0xb76dcf07 "passwd", alternate_name=0x0, defconfig=0xb76e0d40 "compat [NOTFOUND=return] files", ni=0xb76ff834) at nsswitch.c:134
#8 0xb769728e in *__GI___nss_passwd_lookup2 (ni=0xbf96ebe8, fct_name=<value optimized out>, fct2_name=<value optimized out>, fctp=0xbf96ebe4) at XXX-lookup.c:71
#9 0xb764f7ff in __getpwnam_r (name=0xbf96f8cc "root", resbuf=0xb76fdc1c, buffer=0x9497150 "\320\323o\267\320\323o\267d symbol: mod_register", buflen=1024, result=0xbf96ec18) at ../nss/getXXbyYY_r.c:200
#10 0xb764f14f in getpwnam (name=0xbf96f8cc "root") at ../nss/getXXbyYY.c:117
#11 0x08134560 in user2uid (uid=0x81f1c20, gid=0x81f1c24, user=0xbf96f8cc "root") at ut.c:57
#12 0x080ebb91 in init_shm () at shm_init.c:63
#13 0x081807aa in yyparse () at cfg.y:1728
#14 0x08095d9f in main (argc=18, argv=0xbf96f324) at main.c:2084
the crash happens whether or not my local module is loaded, so it is not the culprit.
could there be something wrong with geoip module even when the backtraces are not geoip related?
any suggestions on how to dig further are welcome.
-- juha
here is more info about the crash at startup on debian squeeze.
in the config.cfg below, if i remove any one of the modules from loadmodule list, sip router starts fine. the config as such results in crash:
# /usr/sbin/sip-proxy -f /etc/sip-proxy/sip-proxy.cfg -P /var/run/sip-proxy/sip-proxy.pid -m 32 -M 4 -u root -g root -w /var/run/sip-proxy -n 4
loading modules under /usr/lib/sip-proxy/modules
loading modules under /usr/lib/sip-proxy/modules_k
sip-proxy: malloc.c:4636: _int_malloc: Assertion `victim->fd_nextsize->bk_nextsize == victim' failed.
Aborted
-- juha
# --- Core params

debug=2
listen=192.98.102.10:5060
fork=yes
log_stderror=no
log_facility=LOG_LOCAL0
check_via=no
dns=no
rev_dns=no
dns_try_ipv6=no
dns_try_naptr=on
dns_udp_pref=1
dns_tcp_pref=1
dns_sctp_pref=1
dns_tls_pref=1
dns_retr_time=5
dns_retr_no=1
dns_use_search_list=no
use_dns_cache=on
use_dns_failover=on
dns_srv_loadbalancing=on
use_dst_blacklist=on
sip_warning=no
tcp_accept_aliases=no
tcp_connect_timeout=5
tcp_send_timeout=5
tcp_connection_lifetime=3610
auto_aliases=no
exit_timeout=1800
syn_branch=0
enable_sctp=0
enable_tls=no
stun_allow_stun=0

# --- Modules

loadpath "/usr/lib/sip-proxy/modules"
loadmodule "db_mysql"
loadmodule "avpops"
loadmodule "dialplan"
loadmodule "enum"
loadmodule "geoip"
loadmodule "lcr"
loadmodule "mi_rpc"
loadmodule "mtree"
loadmodule "sl"
loadmodule "utils"

loadpath "/usr/lib/sip-proxy/modules_k"
loadmodule "tmx"
loadmodule "cfgutils"
loadmodule "sqlops"
loadmodule "pv"

# ---- Module parameters

# -- generic module parameters
modparam("dialplan|lcr|mtree", "db_url", "mysql://foo:bar@localhost/ser")

# -- geoip params
modparam("geoip", "path", "/usr/share/GeoIP/GeoIPCity.dat")

# ---- Request routing logic

route {
    exit;
}
i went through december commits to master and found that before this commit:
http://git.sip-router.org/cgi-bin/gitweb.cgi?p=sip-router;a=commit;h=3775eb7...
sip router works ok with the config below, but does not start after the commit.
could you daniel check if there is something in the commit that could explain the problem?
-- juha
---------------------------------------------------------------------------
# --- Core params

debug=2
listen=192.98.102.10:5060
fork=yes
log_stderror=no
log_facility=LOG_LOCAL0
check_via=no
dns=no
rev_dns=no
dns_try_ipv6=no
dns_try_naptr=on
dns_udp_pref=1
dns_tcp_pref=1
dns_sctp_pref=1
dns_tls_pref=1
dns_retr_time=5
dns_retr_no=1
dns_use_search_list=no
use_dns_cache=on
use_dns_failover=on
dns_srv_loadbalancing=on
use_dst_blacklist=on
sip_warning=no
tcp_accept_aliases=no
tcp_connect_timeout=5
tcp_send_timeout=5
tcp_connection_lifetime=3610
auto_aliases=no
exit_timeout=1800
syn_branch=0
enable_sctp=0
enable_tls=no
stun_allow_stun=0

# --- Modules

loadpath "/usr/lib/sip-proxy/modules"
loadmodule "db_mysql"
loadmodule "avpops"
loadmodule "dialplan"
loadmodule "enum"
loadmodule "geoip"
loadmodule "lcr"
loadmodule "mi_rpc"
loadmodule "mtree"
loadmodule "tm"
loadmodule "sl"
loadmodule "utils"

loadpath "/usr/lib/sip-proxy/modules_k"
loadmodule "tmx"
loadmodule "cfgutils"
loadmodule "sqlops"
loadmodule "pv"

# ---- Module parameters

# -- generic module parameters
modparam("dialplan|lcr|mtree", "db_url", "mysql://openxg:openxg123@localhost/ser")

# -- geoip params
modparam("geoip", "path", "/usr/share/GeoIP/GeoIPCity.dat")

# -- lcr params
modparam("lcr", "lcr_count", 1)
modparam("lcr", "gw_uri_avp", "$avp(lcr_gw_uri)")
modparam("lcr", "ruri_user_avp", "$avp(lcr_ruri_user)")

# -- mtree params
modparam("mtree", "mtree", "name=test;type=2;dbtable=mtrees;")
modparam("mtree", "mt_allow_duplicates", 1)

# ---- Request routing logic

route {
    sl_send_reply("200", "OK");
    exit;
}
On Thursday 02 February 2012 06:03:13 Juha Heinanen wrote:
i went through december commits to master and found that before this commit:
router;a=commit;h=3775eb7730b2cd5491864109945b31f15df28f1a
sip router works ok with the config below, but does not start after the commit.
This looks suspicious in that commit:
+ _tr_buffer_list = (char**)malloc(TR_BUFFER_SLOTS);
Try changing it to:
+ _tr_buffer_list = (char**)malloc(TR_BUFFER_SLOTS * sizeof(char*));
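To make the size difference concrete, here is a minimal sketch of the two requests. TR_BUFFER_SLOTS is given an assumed value of 4 (the thread does not state the real one), and the helper names are hypothetical, not taken from the pv module. With N slots the original call asks for N bytes, while a table of N char pointers needs N * sizeof(char*) bytes (16 bytes for 4 slots on the 32-bit hosts in the backtraces). Writes to the later slots would then land past the end of the block, which could plausibly corrupt heap metadata and make an unrelated malloc() abort later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed value for illustration; the thread does not state the real one. */
#define TR_BUFFER_SLOTS 4

/* Bytes requested by the original call: one byte per slot. */
size_t slot_table_bytes_buggy(void)
{
    return TR_BUFFER_SLOTS;
}

/* Bytes requested by the corrected call: one pointer per slot. */
size_t slot_table_bytes_fixed(void)
{
    return TR_BUFFER_SLOTS * sizeof(char *);
}

/* Hypothetical helper: allocate a zeroed table of `slots` char pointers.
 * calloc() takes the element count and element size separately, so the
 * "forgot sizeof" mistake is harder to make. Returns NULL on failure. */
char **slot_table_alloc(size_t slots)
{
    assert(slots > 0);
    return calloc(slots, sizeof(char *));
}
```

A caller would check the result of slot_table_alloc(TR_BUFFER_SLOTS) for NULL, fill the slots, and free() the table when done.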
On Thursday 02 February 2012 09:34:31 Alex Hermann wrote:
On Thursday 02 February 2012 06:03:13 Juha Heinanen wrote:
i went through december commits to master and found that before this
commit:
router;a=commit;h=3775eb7730b2cd5491864109945b31f15df28f1a
sip router works ok with the config below, but does not start after the commit.
This looks suspicious in that commit:
_tr_buffer_list = (char**)malloc(TR_BUFFER_SLOTS);
Try changing it to:
_tr_buffer_list = (char**)malloc(TR_BUFFER_SLOTS * sizeof(char*));
While looking at this code a question emerged: is mod_register() called before or after forking the subprocesses? (I hope the answer is after)
Hello,
On 2/2/12 9:42 AM, Alex Hermann wrote:
On Thursday 02 February 2012 09:34:31 Alex Hermann wrote:
On Thursday 02 February 2012 06:03:13 Juha Heinanen wrote:
i went through december commits to master and found that before this
commit:
router;a=commit;h=3775eb7730b2cd5491864109945b31f15df28f1a
sip router works ok with the config below, but does not start after the commit.
This looks suspicious in that commit:
_tr_buffer_list = (char**)malloc(TR_BUFFER_SLOTS);
Try changing it to:
_tr_buffer_list = (char**)malloc(TR_BUFFER_SLOTS * sizeof(char*));
While looking at this code a question emerged: is mod_register() called before or after forking the subprocesses? (I hope the answer is after)
before forking (when the module is loaded), why do you hope it is after?
Cheers, Daniel
On Thursday 02 February 2012, Daniel-Constantin Mierla wrote:
On 2/2/12 9:42 AM, Alex Hermann wrote:
While looking at this code a question emerged: is mod_register() called before or after forking the subprocesses? (I hope the answer is after)
before forking (when the module is loaded), why do you hope it is after?
I may misunderstand the concept of shared memory on forking, but isn't the allocated buffer shared among all forked processes then? Without any locking, funny things are going to happen.
On Thursday 02 February 2012, Alex Hermann wrote:
On Thursday 02 February 2012, Daniel-Constantin Mierla wrote:
On 2/2/12 9:42 AM, Alex Hermann wrote:
While looking at this code a question emerged: is mod_register() called before or after forking the subprocesses? (I hope the answer is after)
before forking (when the module is loaded), why do you hope it is after?
I may misunderstand the concept of shared memory on forking,
Apparently I did. Sorry for the noise.
On 2/2/12 11:13 AM, Alex Hermann wrote:
On Thursday 02 February 2012, Alex Hermann wrote:
On Thursday 02 February 2012, Daniel-Constantin Mierla wrote:
On 2/2/12 9:42 AM, Alex Hermann wrote:
While looking at this code a question emerged: is mod_register() called before or after forking the subprocesses? (I hope the answer is after)
before forking (when the module is loaded), why do you hope it is after?
I may misunderstand the concept of shared memory on forking,
Apparently I did. Sorry for the noise.
no problem -- in this case it is system private memory, as I wanted to keep it as it was previously, where no pkg or shm was used for this purpose (it was a static buffer).
Cheers, Daniel
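For readers following the fork question: a buffer obtained with plain malloc() before fork() is private to each process afterwards, because the child gets its own copy of the heap. The sketch below is a generic POSIX illustration, not code from sip-router, and the function name is hypothetical. The child overwrites its copy of the buffer and the parent then checks that its own copy is unchanged.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: store `initial` in a malloc()ed int, fork, let the
 * child overwrite its copy with `child_value`, and return what the parent
 * still sees afterwards. Returns -1 on any system error. */
int parent_view_after_child_write(int initial, int child_value)
{
    /* The demo only proves something if the two values differ. */
    assert(initial != child_value);

    int *buf = malloc(sizeof *buf);
    if (buf == NULL)
        return -1;
    *buf = initial;

    pid_t pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        /* Child: this write only touches the child's copy of the page. */
        *buf = child_value;
        _exit(*buf == child_value ? 0 : 1);
    }

    /* Parent: wait for the child, then read our own copy. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        free(buf);
        return -1;
    }
    int seen = *buf;
    free(buf);
    return seen;
}
```

Calling parent_view_after_child_write(1, 2) returns 1 in the parent, so no locking is needed for memory of this kind. Memory that is explicitly set up as shared between processes behaves differently.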
Hello,
On 2/2/12 9:34 AM, Alex Hermann wrote:
On Thursday 02 February 2012 06:03:13 Juha Heinanen wrote:
i went through december commits to master and found that before this commit:
router;a=commit;h=3775eb7730b2cd5491864109945b31f15df28f1a
sip router works ok with the config below, but does not start after the commit.
This looks suspicious in that commit:
_tr_buffer_list = (char**)malloc(TR_BUFFER_SLOTS);
Try changing it to:
_tr_buffer_list = (char**)malloc(TR_BUFFER_SLOTS * sizeof(char*));
that's right, thanks for spotting. You can commit the change.
Cheers, Daniel
Alex Hermann writes:
This looks suspicious in that commit:
_tr_buffer_list = (char**)malloc(TR_BUFFER_SLOTS);
Try changing it to:
_tr_buffer_list = (char**)malloc(TR_BUFFER_SLOTS * sizeof(char*));
alex,
thanks for the tip. i made the above change and after that sip router starts ok. i'll make more tests tomorrow my time.
in the meantime, perhaps daniel has a comment on this?
-- juha
Hello,
On 2/2/12 6:03 AM, Juha Heinanen wrote:
i went through december commits to master and found that before this commit:
http://git.sip-router.org/cgi-bin/gitweb.cgi?p=sip-router;a=commit;h=3775eb7...
sip router works ok with the config below, but does not start after the commit.
could you daniel check if there is something in the commit that could explain the problem?
there was a problem with the commit, spotted by Alex Hermann -- the fix should be committed soon; I will do it later when I get to the office if Alex or someone else does not do it in the meantime.
Hopefully that will fix your case. It is strange, because I tried with pv and all goes fine on my computers. The problem was that the allocation reserved room for chars only, not for char pointers. The error message from libc was a bit misleading, though...
The goal of the change is to have a pool of buffers in which to store transformation values instead of only one. Some transformations can be chained, and the pool avoids overwriting conflicts between them. The next step in mind is to make the size configurable, so that someone who deals with large values (like xcap bodies) can adjust it via module parameters.
Cheers, Daniel
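The buffer pool idea described above can be sketched roughly as follows. This is only an illustration of a round-robin pool under assumed sizes, not the actual pv module code: TR_BUFFER_SLOTS comes from the thread but its value here is assumed, and TR_BUFFER_SIZE and every function and variable name are hypothetical. Each call hands out the next slot, so the output of one transformation is not overwritten by the next one in a chain until the pool wraps around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TR_BUFFER_SLOTS 4   /* assumed number of slots */
#define TR_BUFFER_SIZE  64  /* assumed size of each buffer, in bytes */

static char **tr_pool = NULL;   /* table of TR_BUFFER_SLOTS buffers */
static size_t tr_pool_idx = 0;  /* next slot to hand out */

/* Free all buffers and the table itself; safe to call on an empty pool. */
void tr_pool_destroy(void)
{
    if (tr_pool == NULL)
        return;
    for (size_t i = 0; i < TR_BUFFER_SLOTS; i++)
        free(tr_pool[i]);
    free(tr_pool);
    tr_pool = NULL;
    tr_pool_idx = 0;
}

/* Allocate the pointer table (one char* per slot) and each buffer.
 * Returns 0 on success, -1 on allocation failure. */
int tr_pool_init(void)
{
    assert(tr_pool == NULL);
    tr_pool = calloc(TR_BUFFER_SLOTS, sizeof(char *));
    if (tr_pool == NULL)
        return -1;
    for (size_t i = 0; i < TR_BUFFER_SLOTS; i++) {
        tr_pool[i] = malloc(TR_BUFFER_SIZE);
        if (tr_pool[i] == NULL) {
            tr_pool_destroy();
            return -1;
        }
    }
    return 0;
}

/* Return the next buffer in round-robin order. */
char *tr_pool_next(void)
{
    assert(tr_pool != NULL);
    char *buf = tr_pool[tr_pool_idx];
    tr_pool_idx = (tr_pool_idx + 1) % TR_BUFFER_SLOTS;
    return buf;
}
```

With 4 slots, two chained transformations get two distinct buffers, and the fifth request reuses the first buffer. Making TR_BUFFER_SIZE a runtime setting would be the natural place for the configurable size mentioned above.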
Daniel-Constantin Mierla writes:
Hopefully that will fix your case. It is strange, because I tried with pv and all goes fine on my computers. The problem was that the allocation reserved room for chars only, not for char pointers. The error message from libc was a bit misleading, though...
i tried also with my regular config.cfg and it worked too after the fix. loading pv module alone didn't cause the problem. it was a combination of modules and the crash occurred usually during parsing of the config file.
these kinds of problems are time consuming to analyze. hopefully we don't get hit by another one any time soon.
thanks, juha