Hello,

Thanks for your quick reply; my answers are inline.

2011/3/2 Daniel-Constantin Mierla <miconda@gmail.com>
Hello,

It looks related to the callbacks of the dialog module. Are you loading other modules that require the dialog module?
 
We are using some features of the dialog module, such as ending dialogs after a timeout period, and we are also using the engage_mediaproxy() function. It's an old configuration that we had to put into production without enough time to test it. Would you recommend not using the dialog module unless it is strictly required?
 

Checking the timestamps in the acc records and the crash log, the BYE for the dialog came before the crash. However, the To-tag is not printed in the dlg_hash.c log message, although it is present in the acc records for both the INVITE and the BYE. Do you have parallel forking in front of this SIP server? I mean, is there another proxy that could do parallel forking and then send two or more branches to this instance?

AFAIK the client sending those calls is not doing parallel forking; they send the calls over a SIP trunk to our Kamailio. They are calling PSTN numbers and we are sending those calls to a gateway, so they shouldn't be doing parallel forking. I'll capture some traces to check it.
 
I will dig in more to see what went wrong there.

I've attached the debug-level logs I got for the reported problem; I hope they help. I'm still trying to find out why the core is not being generated.
 
Thanks,
Daniel
Thanks a lot,
Anton
 


On 3/2/11 4:34 PM, Anton Roman wrote:
Hi all,

we are running Kamailio 3.1.2 in a production environment, using the dialog module, and it crashed two hours ago.


Here are the logs we got (additional log fragments with the acc records involved in this call are appended at the end of the mail):

Mar  2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28927]: CRITICAL: dialog [dlg_hash.c:599]: bogus ref -1 with cnt 1 for dlg 0x7f23f472db30 [2490:1070436595] with clid 'e0a20cb844d211e0acd8001422093865@<CLIENT IP>' and tags '1577886432-3759264324-335599788-1698171170' ''
Mar  2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28927]: : <core> [mem/q_malloc.c:446]: BUG: qm_free: freeing already freed pointer, first free: dialog: dlg_cb.c: destroy_dlg_callbacks_list(80) - aborting
Mar  2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28896]: ALERT: <core> [main.c:741]: child process 28927 exited by a signal 6
Mar  2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28896]: ALERT: <core> [main.c:744]: core was not generated
Mar  2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28896]: INFO: <core> [main.c:756]: INFO: terminating due to SIGCHLD
Mar  2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28948]: INFO: <core> [main.c:807]: INFO: signal 15 received
Mar  2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28942]: INFO: <core> [main.c:807]: INFO: signal 15 received

We got the Kamailio code from git last week:

sercmd> core.info
{
    version: kamailio 3.1.2
    id: 4ace86
    compiler: gcc 4.3.2
    compiled: 09:12:36 Feb 23 2011
    flags: STATS: Off, USE_IPV6, USE_TCP, USE_TLS, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, DBG_QM_MALLOC, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
}

The problem looks similar to this other one, which was already fixed: http://lists.sip-router.org/pipermail/sr-users/2009-November/027351.html

We have set Kamailio to debug level in case it happens again.

On the other hand, I need to know why the core is not being generated. I have already checked the points mentioned in http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:corefiles

1. disable_core_dump is not set in the config file.

2. From /etc/default/kamailio:
...
DUMP_CORE=yes
...

3. From /etc/init.d/kamailio:
...
if test "$DUMP_CORE" = "yes" ; then
    # set proper ulimit
    ulimit -c unlimited

    # directory for the core dump files
    COREDIR=/home/corefiles
    [ -d $COREDIR ] || mkdir $COREDIR
    chmod 777 $COREDIR
    echo "$COREDIR/core.%e.sig%s.%p" > /proc/sys/kernel/core_pattern
fi
...

4. Write permissions of $COREDIR:

ls -hall /home
...
drwxrwxrwx  2 root   root   4.0K 2010-12-21 09:15 corefiles
...
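
A minimal shell sketch of how the checks listed above could be run against a live Kamailio process. It assumes a Linux host with /proc mounted. The script name check_core.sh and its functions are hypothetical helpers, not part of Kamailio. Besides the per-process core size limit and core_pattern, it also prints fs.suid_dumpable. When that setting is 0, a process that changed credentials after starting is normally not dumpable unless it re-enables dumping itself. This applies, for example, to a daemon that drops root privileges via -u/-g. Whether Kamailio 3.1.2 re-enables dumping is not confirmed here.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kamailio): print the settings that
# usually decide whether a crashing process leaves a core file on Linux.
# Usage: sh check_core.sh <pid>

# Print the soft "Max core file size" limit from a /proc/<pid>/limits
# file: "unlimited" or a size in bytes ("0" means no core is written).
core_soft_limit() {
    awk '/^Max core file size/ { print $5; exit }' "$1"
}

check_core_settings() {
    pid=$1
    soft=$(core_soft_limit "/proc/$pid/limits")
    echo "pid $pid core limit (soft): $soft"
    if [ "$soft" = "0" ]; then
        echo "  -> RLIMIT_CORE is 0 for this process: no core will be written"
    fi
    # An absolute pattern needs a directory the process can write to.
    echo "kernel.core_pattern: $(cat /proc/sys/kernel/core_pattern)"
    # 0 = processes that changed credentials (e.g. dropped root
    # privileges) are not dumped, unless they re-enable it themselves.
    echo "fs.suid_dumpable: $(cat /proc/sys/fs/suid_dumpable)"
}

check_core_settings "${1:-$$}"
```

For example: sh check_core.sh "$(pidof -s kamailio)". The ulimit -c unlimited line in the init script only affects processes started through that script. The limit shown in /proc/<pid>/limits is the one that actually applies to the running process.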

What else should I check?

Thanks in advance,
regards

Antón


Acc records related to the dialog whose destruction caused the problem:

Mar  2 14:42:44 kamailio2 /usr/local/sbin/kamailio[28902]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1299073364;method=INVITE;from_tag=1577886432-3759264324-335599788-1698171170;to_tag=5FFAEA34-6A;call_id=e0a20cb844d211e0acd8001422093865@<client IP>;code=200;reason=OK;src_user=<caller number>;src_domain=<client IP>;dst_ouser=<called number>;dst_user=<called number>;dst_domain=10.90.1.251;src_ip=<client IP>

...

Mar  2 14:42:44 kamailio2 /usr/local/sbin/kamailio[28920]: NOTICE: acc [acc.c:275]: ACC: request acknowledged: timestamp=1299073364;method=ACK;from_tag=1577886432-3759264324-335599788-1698171170;to_tag=5FFAEA34-6A;call_id=e0a20cb844d211e0acd8001422093865@<client IP>;code=200;reason=OK;src_user=<caller number>;src_domain=<client IP>;dst_ouser=<called number>;dst_user=<called number>;dst_domain=10.90.1.251;src_ip=<client IP>
...


Mar  2 14:43:00 kamailio2 /usr/local/sbin/kamailio[28903]: ERROR: <script>:  ACK WITHOUT MATCHING TRANSACTION in e0a20cb844d211e0acd8001422093865@<client IP> call... ignore and discard.

...

Mar  2 14:43:00 kamailio2 /usr/local/sbin/kamailio[28904]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1299073380;method=BYE;from_tag=1577886432-3759264324-335599788-1698171170;to_tag=5FFAEA34-6A;call_id=e0a20cb844d211e0acd8001422093865@<client IP>;code=200;reason=OK;src_user=<caller number>;src_domain=<client IP>;dst_ouser=<called number>;dst_user=<called number>;dst_domain=10.90.1.251;src_ip=<client IP>
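
Daniel's reply correlates these acc records with the crash log by Call-ID and timestamp. Below is a minimal shell sketch of that filtering step. It assumes each acc record occupies a single syslog line in the semicolon-separated key=value format shown above. The function acc_records_for_callid is a hypothetical helper, not a Kamailio tool. It prints the timestamp, method, reply code and To-tag of every acc record for one Call-ID.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kamailio): print one line per acc
# record for a given Call-ID: timestamp, method, reply code and To-tag.
# Assumes one record per syslog line in the key=value;... format.
# Usage: sh acc_by_callid.sh <call-id> <logfile>

acc_records_for_callid() {
    callid=$1
    logfile=$2
    # Keep only acc lines whose call_id field matches exactly.
    grep -F 'ACC:' "$logfile" | grep -F "call_id=$callid;" |
    awk '{
        ts = ""; method = ""; code = ""; totag = ""
        n = split($0, kv, ";")
        for (i = 1; i <= n; i++) {
            p = index(kv[i], "=")
            if (p == 0) continue
            key = substr(kv[i], 1, p - 1)
            sub(/.* /, "", key)   # drop the syslog prefix before "timestamp"
            val = substr(kv[i], p + 1)
            if (key == "timestamp") ts = val
            else if (key == "method") method = val
            else if (key == "code") code = val
            else if (key == "to_tag") totag = val
        }
        print ts, method, code, "to_tag=" totag
    }'
}

if [ $# -eq 2 ]; then
    acc_records_for_callid "$1" "$2"
fi
```

The log file path depends on the local syslog setup. Lines such as the script's "ACK WITHOUT MATCHING TRANSACTION" error are skipped because they are not acc records. Their timestamps have to be compared with the output by hand.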

_______________________________________________
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
sr-users@lists.sip-router.org
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users

-- 
Daniel-Constantin Mierla
http://www.asipto.com