[sr-dev] SCSCF crashing during registration
Daniel Ciprus
daniel.ciprus at acision.com
Thu Mar 13 23:15:01 CET 2014
It looks like the kamailio-debuginfo RPM was from an older version of Kamailio. I'm no longer able to reproduce the core file. Could somebody please explain why?
Secondly:
Mar 13 18:12:57 ricvmf-fusion01 kam-scscf[13524]: WARNING: tm [t_lookup.c:1536]: t_unref(): WARNING: script writer didn't release transaction
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13639]: ERROR: ims_usrloc_pcscf [udomain.c:400]: update_pcontact(): no more shm_mem
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13639]: ERROR: ims_registrar_pcscf [save.c:208]: update_contacts(): failed to update pcscf contact
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13639]: ERROR: ims_usrloc_pcscf [udomain.c:400]: update_pcontact(): no more shm_mem
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13639]: ERROR: ims_registrar_pcscf [save.c:208]: update_contacts(): failed to update pcscf contact
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13639]: ERROR: ims_usrloc_pcscf [udomain.c:400]: update_pcontact(): no more shm_mem
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13639]: ERROR: ims_registrar_pcscf [save.c:208]: update_contacts(): failed to update pcscf contact
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13639]: ERROR: ims_usrloc_pcscf [udomain.c:400]: update_pcontact(): no more shm_mem
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13639]: ERROR: ims_registrar_pcscf [save.c:208]: update_contacts(): failed to update pcscf contact
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13657]: ERROR: <core> [tcp_main.c:4237]: handle_tcpconn_ev(): connect 10.67.64.29:1305 failed
Mar 13 18:12:57 ricvmf-fusion01 kam-scscf[13524]: INFO: ims_registrar_scscf [cxdx_sar.c:79]: create_return_code(): created AVP successfully : [saa_return_code] - [1]
Mar 13 18:12:57 ricvmf-fusion01 kam-scscf[13524]: WARNING: tm [t_lookup.c:1536]: t_unref(): WARNING: script writer didn't release transaction
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13639]: ERROR: ims_usrloc_pcscf [udomain.c:400]: update_pcontact(): no more shm_mem
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13639]: ERROR: ims_registrar_pcscf [save.c:208]: update_contacts(): failed to update pcscf contact
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13639]: ERROR: ims_usrloc_pcscf [udomain.c:400]: update_pcontact(): no more shm_mem
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13639]: ERROR: ims_registrar_pcscf [save.c:208]: update_contacts(): failed to update pcscf contact
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13639]: ERROR: ims_usrloc_pcscf [udomain.c:400]: update_pcontact(): no more shm_mem
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13639]: ERROR: ims_registrar_pcscf [save.c:208]: update_contacts(): failed to update pcscf contact
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13639]: ERROR: ims_usrloc_pcscf [udomain.c:400]: update_pcontact(): no more shm_mem
Mar 13 18:12:57 ricvmf-fusion01 kam-pcscf[13639]: ERROR: ims_registrar_pcscf [save.c:208]: update_contacts(): failed to update pcscf contact
As far as I remember, there was a configurable called "mem=XXX", but I no longer see it in the devel cookbook. Any idea what replaced this variable?
@Hugh: we'll have beer on me once you get here ;)
On 03/13/2014 05:47 PM, Hugh Waite wrote:
Dan,
There are two cores because of a crash in one process, followed by a crash when the other processes are trying to shut down.
What's interesting is that the bt doesn't show useful pointers. If you have installed from RPMs, make sure the kamailio-debuginfo package is from the same build as the other RPMs.
Also, do the logs say anything? There should be a kernel log entry for the segfault/signal that says which module crashed (e.g. registrar.so) and possibly (hopefully) an error message just before that.
Hugh
On 13/03/2014 19:53, Daniel Ciprus wrote:
Jason,
I've tried multiple combinations for the core pattern, but I'm only getting 2 core files ...
Details:
~]# cat /proc/sys/kernel/core_pattern
/tmp/core.%e.sig%s.%p
~]# lsb_release -a
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Release: 6.5
Codename: Santiago
(gdb) bt
#0 0x00000000005350b0 in ?? ()
#1 0x000000000053542a in ?? ()
#2 0x00000000005356c7 in timer_main ()
#3 0x000000000046d572 in main_loop ()
#4 0x000000000047030b in main ()
(gdb) bt full
#0 0x00000000005350b0 in ?? ()
No symbol table info available.
#1 0x000000000053542a in ?? ()
No symbol table info available.
#2 0x00000000005356c7 in timer_main ()
No symbol table info available.
#3 0x000000000046d572 in main_loop ()
No symbol table info available.
#4 0x000000000047030b in main ()
No symbol table info available.
(gdb)
(gdb) bt
#0 0x00000031ba432925 in raise () from /lib64/libc.so.6
#1 0x00000031ba434105 in abort () from /lib64/libc.so.6
#2 0x0000000000546750 in ?? ()
#3 0x000000000054853a in qm_free ()
#4 0x00007f23d98f87de in free_local_ack_unsafe (lack=0x7f23d3319d70) at uac.c:600
#5 0x00007f23d988ea57 in free_cell (dead_cell=0x7f23d3319a70) at h_table.c:217
#6 0x00007f23d988f2ee in free_hash_table () at h_table.c:441
#7 0x00007f23d98a2fca in tm_shutdown () at t_funcs.c:122
#8 0x00000000004f7c7a in destroy_modules ()
#9 0x0000000000466e63 in cleanup ()
#10 0x0000000000467f65 in ?? ()
#11 0x0000000000469679 in handle_sigs ()
#12 0x000000000046db19 in main_loop ()
#13 0x000000000047030b in main ()
(gdb) bt full
#0 0x00000031ba432925 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00000031ba434105 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x0000000000546750 in ?? ()
No symbol table info available.
#3 0x000000000054853a in qm_free ()
No symbol table info available.
#4 0x00007f23d98f87de in free_local_ack_unsafe (lack=0x7f23d3319d70) at uac.c:600
__FUNCTION__ = "free_local_ack_unsafe"
#5 0x00007f23d988ea57 in free_cell (dead_cell=0x7f23d3319a70) at h_table.c:217
b = 0x0
i = 0
rpl = 0x0
tt = 0x0
foo = 0x2fd3221000
cbs = 0x0
cbs_tmp = 0x7f23d35386b8
__FUNCTION__ = "free_cell"
#6 0x00007f23d988f2ee in free_hash_table () at h_table.c:441
p_cell = 0x7f23d3319a70
tmp_cell = 0x7f23d353dca0
i = 580
__FUNCTION__ = "free_hash_table"
#7 0x00007f23d98a2fca in tm_shutdown () at t_funcs.c:122
__FUNCTION__ = "tm_shutdown"
#8 0x00000000004f7c7a in destroy_modules ()
No symbol table info available.
#9 0x0000000000466e63 in cleanup ()
No symbol table info available.
#10 0x0000000000467f65 in ?? ()
No symbol table info available.
#11 0x0000000000469679 in handle_sigs ()
No symbol table info available.
#12 0x000000000046db19 in main_loop ()
No symbol table info available.
#13 0x000000000047030b in main ()
No symbol table info available.
(gdb)
On 03/13/2014 02:58 PM, Jason Penton wrote:
I don't think these cores show the real crash. I'd like more detail on what actually happened. Daniel, can you re-create it? Keep in mind that if the core dump configuration on your box does not name cores by process ID or timestamp, one core will overwrite the other, and you will never see the core that is the root cause.
Which OS are you running?
If Linux, I use the following in /etc/sysctl.conf:
kernel.core_pattern=/tmp/core.%e.%p.%h.%t
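[Editor's sketch] To illustrate the overwrite problem described above, here is a minimal Python model of how the kernel expands a core_pattern template. It is not kernel code: it covers only the core(5) specifiers that appear in this thread (%e, %p, %h, %t, %s, plus %%) and leaves any other specifier untouched. The function name expand_core_pattern, the executable name "kamailio", the second PID, the signal numbers, and the timestamps are placeholders chosen for illustration; the first PID and the hostname are taken from the logs above.

```python
import re


def expand_core_pattern(pattern, exe, pid, host, timestamp, signal):
    """Expand a small subset of core(5) template specifiers.

    %e executable name, %p PID, %h hostname, %t dump time (epoch seconds),
    %s signal number, %% literal percent sign.
    Specifiers outside this subset are left as-is in this sketch.
    """
    values = {
        "e": exe,
        "p": str(pid),
        "h": host,
        "t": str(timestamp),
        "s": str(signal),
        "%": "%",
    }
    return re.sub(r"%(.)", lambda m: values.get(m.group(1), m.group(0)), pattern)


# Two hypothetical crashes: a worker process, then a process aborting on shutdown.
crashes = [
    dict(exe="kamailio", pid=13524, host="ricvmf-fusion01",
         timestamp=1394748777, signal=11),
    dict(exe="kamailio", pid=13525, host="ricvmf-fusion01",
         timestamp=1394748778, signal=6),
]

for pattern in ("/tmp/core.%e", "/tmp/core.%e.%p.%h.%t"):
    names = {expand_core_pattern(pattern, **crash) for crash in crashes}
    print(pattern, "->", len(names), "distinct file name(s)")
```

With a pattern that lacks a per-process or per-time specifier, both crashes map to the same file name, so the later core replaces the earlier one. Adding %p and %t keeps them apart.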
On Thu, Mar 13, 2014 at 8:45 PM, Carsten Bock <carsten at ng-voice.com> wrote:
It looks a little bit like a "double free".
You could try to disable the call to "abort()" in case this happens:
mem_safety=1
See: http://www.kamailio.org/wiki/cookbooks/devel/core#mem_safety
Kind regards,
Carsten
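[Editor's sketch] For readers unfamiliar with the term, the following toy Python model shows what a "double free" is and what the mem_safety suggestion above changes, as described in this message: whether the second free aborts or is only reported. This is not Kamailio's allocator. ToyPool and DoubleFree are hypothetical names, the exception stands in for abort(), and the warnings list stands in for the log.

```python
class DoubleFree(Exception):
    """Stands in for the abort() call in this toy model."""


class ToyPool:
    """Toy allocator that tracks which block handles are currently live."""

    def __init__(self, mem_safety=0):
        self.mem_safety = mem_safety
        self._live = set()
        self._next_handle = 1
        self.warnings = []  # stands in for log output

    def malloc(self):
        handle = self._next_handle
        self._next_handle += 1
        self._live.add(handle)
        return handle

    def free(self, handle):
        if handle in self._live:
            self._live.remove(handle)
            return
        # The block is not live: it was already freed or never allocated.
        message = f"free of block {handle} that is not allocated"
        if self.mem_safety:
            # mem_safety=1 in this model: record the problem and carry on.
            self.warnings.append(message)
            return
        # mem_safety=0 in this model: stop immediately.
        raise DoubleFree(message)


strict = ToyPool(mem_safety=0)
block = strict.malloc()
strict.free(block)
try:
    strict.free(block)  # second free of the same block
except DoubleFree as exc:
    print("aborted:", exc)

lenient = ToyPool(mem_safety=1)
block = lenient.malloc()
lenient.free(block)
lenient.free(block)  # reported, but execution continues
print("warnings:", lenient.warnings)
```

Skipping the abort only hides the symptom. The code path that frees the same block twice still needs to be found.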
2014-03-13 19:44 GMT+01:00 Carsten Bock <carsten at ng-voice.com>:
> It looks a little bit like a "double free".
>
> You could try to disable the call to "abort()" in case this happens:
>
>
> 2014-03-13 17:22 GMT+01:00 Daniel Ciprus <daniel.ciprus at acision.com>:
>> There are no more core files on the filesystem :-(
>>
>> On 03/13/2014 12:18 PM, Jason Penton wrote:
>>
>> I'm afraid this is also not the correct core. Can you check the timestamp on
>> the cores? Can you re-create the crash and send me the correct core?
>>
>>
>>
>>
>> On Thu, Mar 13, 2014 at 5:36 PM, Daniel Ciprus <daniel.ciprus at acision.com>
>> wrote:
>>>
>>> So I cleaned up my junkyard and I got 2 core files:
>>>
>>> (gdb) bt
>>> #0 0x00000000005350b0 in ?? ()
>>> #1 0x000000000053542a in ?? ()
>>> #2 0x00000000005356c7 in timer_main ()
>>> #3 0x000000000046d572 in main_loop ()
>>> #4 0x000000000047030b in main ()
>>> (gdb) bt full
>>> #0 0x00000000005350b0 in ?? ()
>>>
>>> No symbol table info available.
>>> #1 0x000000000053542a in ?? ()
>>>
>>> No symbol table info available.
>>> #2 0x00000000005356c7 in timer_main ()
>>>
>>> No symbol table info available.
>>> #3 0x000000000046d572 in main_loop ()
>>>
>>> No symbol table info available.
>>> #4 0x000000000047030b in main ()
>>>
>>> No symbol table info available.
>>> (gdb)
>>>
>>>
>>> (gdb) bt full
>>> #0 0x00000031ba432925 in raise () from /lib64/libc.so.6
>>> No symbol table info available.
>>> #1 0x00000031ba434105 in abort () from /lib64/libc.so.6
>>> No symbol table info available.
>>> #2 0x0000000000546750 in ?? ()
>>> No symbol table info available.
>>> #3 0x000000000054853a in qm_free ()
>>> No symbol table info available.
>>> #4 0x00007f5bf7d5a7de in free_local_ack_unsafe (lack=0x7f5bf1894528) at
>>> uac.c:600
>>> __FUNCTION__ = "free_local_ack_unsafe"
>>> #5 0x00007f5bf7cf0a57 in free_cell (dead_cell=0x7f5bf1894228) at
>>> h_table.c:217
>>>
>>> b = 0x0
>>> i = 0
>>> rpl = 0x0
>>> tt = 0x0
>>> foo = 0x2ff1683000
>>> cbs = 0x0
>>> cbs_tmp = 0x7f5bf198e508
>>> __FUNCTION__ = "free_cell"
>>> #6 0x00007f5bf7cf12ee in free_hash_table () at h_table.c:441
>>> p_cell = 0x7f5bf1894228
>>> tmp_cell = 0x7f5bf1894228
>>> i = 3533
>>> __FUNCTION__ = "free_hash_table"
>>> #7 0x00007f5bf7d04fca in tm_shutdown () at t_funcs.c:122
>>>
>>> __FUNCTION__ = "tm_shutdown"
>>> #8 0x00000000004f7c7a in destroy_modules ()
>>> No symbol table info available.
>>> #9 0x0000000000466e63 in cleanup ()
>>> No symbol table info available.
>>> #10 0x0000000000467f65 in ?? ()
>>> No symbol table info available.
>>> #11 0x0000000000469679 in handle_sigs ()
>>> No symbol table info available.
>>> #12 0x000000000046db19 in main_loop ()
>>> No symbol table info available.
>>> #13 0x000000000047030b in main ()
>>> No symbol table info available.
>>> (gdb)
>>>
>>>
>>> On 03/13/2014 11:18 AM, Jason Penton wrote:
>>>
>>> Hi Daniel,
>>>
>>> this is the wrong core file. This is the one created on shutdown of
>>> kamailio. Can you do a bt on the other core file that you probably have...
>>>
>>> Cheers
>>> Jason
>>>
>>>
>>> On Thu, Mar 13, 2014 at 5:05 PM, Daniel Ciprus <daniel.ciprus at acision.com>
>>> wrote:
>>>>
>>>> Folks,
>>>>
>>>> This is happening during the registration on SCSCF.
>>>>
>>>> Server:: kamailio (4.2.0-dev2 (x86_64/linux))
>>>> Build:: mi_core.c compiled on 10:01:09 Mar 13 2014 with gcc 4.4.6
>>>> Flags:: STATS: Off, USE_TCP, USE_TLS, TLS_HOOKS, USE_RAW_SOCKS,
>>>> DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC,
>>>> DBG_QM_MALLOC, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE,
>>>> USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
>>>> GIT:: unknown
>>>> Now:: Thu Mar 13 11:04:47 2014
>>>> Up since:: Thu Mar 13 10:58:12 2014
>>>> Up time:: 395 [sec]
>>>>
>>>> (gdb) bt
>>>> #0 0x00000031ba432925 in raise () from /lib64/libc.so.6
>>>> #1 0x00000031ba434105 in abort () from /lib64/libc.so.6
>>>> #2 0x0000000000546750 in ?? ()
>>>> #3 0x000000000054853a in qm_free ()
>>>> #4 0x00007fb4def5b7de in free_local_ack_unsafe (lack=0x7fb4d8b31728) at
>>>> uac.c:600
>>>> #5 0x00007fb4deef1a57 in free_cell (dead_cell=0x7fb4d8b31428) at
>>>> h_table.c:217
>>>> #6 0x00007fb4deef22ee in free_hash_table () at h_table.c:441
>>>> #7 0x00007fb4def05fca in tm_shutdown () at t_funcs.c:122
>>>> #8 0x00000000004f7c7a in destroy_modules ()
>>>> #9 0x0000000000466e63 in cleanup ()
>>>> #10 0x0000000000467f65 in ?? ()
>>>> #11 0x0000000000469679 in handle_sigs ()
>>>> #12 0x000000000046db19 in main_loop ()
>>>> #13 0x000000000047030b in main ()
>>>> (gdb) bt full
>>>> #0 0x00000031ba432925 in raise () from /lib64/libc.so.6
>>>> No symbol table info available.
>>>> #1 0x00000031ba434105 in abort () from /lib64/libc.so.6
>>>> No symbol table info available.
>>>> #2 0x0000000000546750 in ?? ()
>>>> No symbol table info available.
>>>> #3 0x000000000054853a in qm_free ()
>>>> No symbol table info available.
>>>> #4 0x00007fb4def5b7de in free_local_ack_unsafe (lack=0x7fb4d8b31728) at
>>>> uac.c:600
>>>> __FUNCTION__ = "free_local_ack_unsafe"
>>>> #5 0x00007fb4deef1a57 in free_cell (dead_cell=0x7fb4d8b31428) at
>>>> h_table.c:217
>>>> b = 0x0
>>>> i = 0
>>>> rpl = 0x0
>>>> tt = 0x0
>>>> foo = 0x2fd8a8b000
>>>> cbs = 0x0
>>>> cbs_tmp = 0x7fb4d8d9c9e0
>>>> __FUNCTION__ = "free_cell"
>>>> #6 0x00007fb4deef22ee in free_hash_table () at h_table.c:441
>>>> p_cell = 0x7fb4d8b31428
>>>> tmp_cell = 0x7fb4d8b31428
>>>> i = 11517
>>>> __FUNCTION__ = "free_hash_table"
>>>> #7 0x00007fb4def05fca in tm_shutdown () at t_funcs.c:122
>>>> __FUNCTION__ = "tm_shutdown"
>>>> #8 0x00000000004f7c7a in destroy_modules ()
>>>> No symbol table info available.
>>>> #9 0x0000000000466e63 in cleanup ()
>>>> No symbol table info available.
>>>> #10 0x0000000000467f65 in ?? ()
>>>> No symbol table info available.
>>>> #11 0x0000000000469679 in handle_sigs ()
>>>> No symbol table info available.
>>>> #12 0x000000000046db19 in main_loop ()
>>>> No symbol table info available.
>>>> #13 0x000000000047030b in main ()
>>>> No symbol table info available.
>>>> (gdb)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Daniel Ciprus
>>>> Integration engineer
>>>> http://www.acision.com
>>>>
>>>> 9954 Mayland Dr
>>>> Suite 3100
>>>> Richmond, VA 23233
>>>> USA
>>>> T: +1 804 762 5601
>>>> E: daniel.ciprus at acision.com
>>>>
>>>> ________________________________
>>>> This e-mail and any attachment is for authorised use by the intended
>>>> recipient(s) only. It may contain proprietary material, confidential
>>>> information and/or be subject to legal privilege. It should not be copied,
>>>> disclosed to, retained or used by, any other party. If you are not an
>>>> intended recipient then please promptly delete this e-mail and any
>>>> attachment and all copies and inform the sender. Thank you for
>>>> understanding.
>>>>
>>>>
>>>> _______________________________________________
>>>> sr-dev mailing list
>>>> sr-dev at lists.sip-router.org
>>>> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
>>>>
>>>
>>>
>>
>>
>
>
>
> --
> Carsten Bock
> CEO (Geschäftsführer)
>
> ng-voice GmbH
> Schomburgstr. 80
> D-22767 Hamburg / Germany
>
> http://www.ng-voice.com
> mailto:carsten at ng-voice.com
>
> Office +49 40 34927219
> Fax +49 40 34927220
>
> Sitz der Gesellschaft: Hamburg
> Registergericht: Amtsgericht Hamburg, HRB 120189
> Geschäftsführer: Carsten Bock
> Ust-ID: DE279344284
>
> Hier finden Sie unsere handelsrechtlichen Pflichtangaben:
> http://www.ng-voice.com/imprint/
--
Hugh Waite
Principal Design Engineer
Crocodile RCS Ltd.