Hello,

I kind of forgot about this thread ...

The backtrace is clean and it doesn't show like a block/infinite loop when you took it. Are you sure you took it when kamailio used 100%CPU?

How many times you run 'n' after backtrace? The loop in cbp could have tens of iterations.

Maybe you can try again and let's see if you get same results on backtrace/gdb.

Cheers,
Daniel


On 25/11/13 21:43, Daniel-Constantin Mierla wrote:
Hello,

the backtrace is ok for the moment. I will look over it and come back with results.

Cheers,
Daniel

On 11/25/13 12:10 PM, Efelin Novak wrote:
Hi Daniel,

sorry it took me more than I expected. Is this sufficient? Meanwhile I found out that this happens when fr_inv_timer triggers and dialog module is turned on.

Backtrace:

bt
#0  futex_release (lock=0xb4b90798) at ../../mem/../futexlock.h:137
#1  next_state_dlg (dlg=dlg@entry=0xb4bc5a38, event=event@entry=4, old_state=old_state@entry=0xbfef08a0, new_state=new_state@entry=0xbfef089c, unref=unref@entry=0xbfef08a4) at dlg_hash.c:950
#2  0xb6b05479 in dlg_onreply (t=0xb4bc5ce8, type=1048576, param=0xbfef097c) at dlg_handlers.c:469
#3  0xb6cc2336 in run_trans_callbacks_internal (cb_lst=0xb4bc5d28, type=type@entry=1048576, trans=0xb4bc5ce8, params=params@entry=0xbfef097c) at t_hooks.c:290
#4  0xb6cc25da in run_trans_callbacks_with_buf (type=type@entry=1048576, rbuf=rbuf@entry=0xb4bc5d54, req=req@entry=0xb4bc6a08, repl=repl@entry=0xffffffff, flags=flags@entry=505) at t_hooks.c:336
#5  0xb6cee953 in _reply_light (trans=trans@entry=0xb4bc5ce8,
    buf=0xb731f588 "SIP/2.0 505 Error\r\nVia: SIP/2.0/UDP 192.168.21.4;branch=z9hG4bKa6f5.8ca73657.0\r\nVia: SIP/2.0/UDP 10.0.0.95;rport=5060;branch=z9hG4bKBc143aFe81jDj\r\nFrom: \"USER\" <sip:USER@10.0.0.95>;t"..., len=len@entry=434, code=code@entry=505, lock=0, bm=0xbfef0e48, to_tag_len=<error reading variable: Unhandled dwarf expression opcode 0xfa>,
    to_tag=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at t_reply.c:628
#6  0xb6ceedeb in _reply (trans=0xb4bc5ce8, trans@entry=0x1f9, p_msg=p_msg@entry=0xb731f568, code=505, text=0xb731f568 "Error", lock=0) at t_reply.c:782
#7  0xb6ceefd7 in t_reply_unsafe (t=0x1f9, t@entry=0xb4bc5ce8, p_msg=0xb731f568, p_msg@entry=0xb6d196c0, code=0, text=text@entry=0xb731f568 "Error") at t_reply.c:1531
#8  0xb6cd6904 in w_t_reply (msg=0xb6d196c0, p1=0xb731c988 "8]0\267\001", p2=0xb731c9b8 "8P0\267 ") at tm.c:1283
#9  0x080651cf in do_action (h=h@entry=0xbfef1258, a=a@entry=0xb7305d98, msg=msg@entry=0xb6d196c0) at action.c:1080
#10 0x08064137 in run_actions (h=h@entry=0xbfef1258, a=a@entry=0xb73053c8, msg=msg@entry=0xb6d196c0) at action.c:1573
#11 0x0806cedd in run_top_route (a=0xb73053c8, msg=msg@entry=0xb6d196c0, c=c@entry=0x0) at action.c:1658
#12 0xb6ceba98 in run_failure_handlers (t=t@entry=0xb4bc5ce8, rpl=0xffffffff, code=408, extra_flags=96) at t_reply.c:1028
#13 0xb6cece22 in t_should_relay_response (Trans=Trans@entry=0xb4bc5ce8, new_code=new_code@entry=408, branch=branch@entry=0, should_store=should_store@entry=0xbfef1464,
    should_relay=should_relay@entry=0xbfef1460, cancel_data=cancel_data@entry=0xbfef1510, reply=reply@entry=0xffffffff) at t_reply.c:1304
#14 0xb6cef076 in relay_reply (t=t@entry=0xb4bc5ce8, p_msg=p_msg@entry=0xffffffff, branch=branch@entry=0, msg_status=msg_status@entry=408, cancel_data=cancel_data@entry=0xbfef1510,
    do_put_on_wait=do_put_on_wait@entry=0) at t_reply.c:1707
#15 0xb6cc28b4 in fake_reply (t=t@entry=0xb4bc5ce8, branch=0, code=code@entry=408) at timer.c:354
#16 0xb6cc34b0 in final_response_handler (t=0xb4bc5ce8, r_buf=0xb4bc5dc8) at timer.c:526
#17 retr_buf_handler (ticks=272914717, tl=0xb4bc5ddc, p=0x3e8) at timer.c:584
#18 0x08183325 in timer_list_expire (slow_mark=12, slow_l=0xb49f2438, h=0xb49f2338, t=272914717) at timer.c:894
#19 timer_handler () at timer.c:959
#20 timer_main () at timer.c:998
#21 0x080b8c65 in main_loop () at main.c:1709
#22 0x08063bfe in main (argc=1, argv=0xbfef1844) at main.c:2566

(I have changed  USER and IP address to hide my network :))

After this when I hit 'n' multiple times I got following output in infinite loop:

22 0x08063bfe in main (argc=1, argv=0xbfef1844) at main.c:2566
(gdb) n
next_state_dlg (dlg=dlg@entry=0xb4bc5a38, event=event@entry=4, old_state=old_state@entry=0xbfef08a0, new_state=new_state@entry=0xbfef089c, unref=unref@entry=0xbfef08a4) at dlg_hash.c:952
952        LM_DBG("dialog %p changed from state %d to "
(gdb)
955    }
(gdb)
dlg_onreply (t=0xb4bc5ce8, type=1048576, param=0xbfef097c) at dlg_handlers.c:470
470        dlg_run_event_route(dlg, (rpl==FAKED_REPLY)?NULL:rpl, old_state, new_state);
(gdb)
472        if (new_state==DLG_STATE_EARLY) {
(gdb)
479        if (new_state==DLG_STATE_CONFIRMED_NA &&
(gdb)
540        if ( new_state==DLG_STATE_DELETED
(gdb)
542                        || old_state==DLG_STATE_EARLY) ) {
(gdb)
541                    && (old_state==DLG_STATE_UNCONFIRMED
(gdb)
559        if (unref) dlg_unref(dlg, unref);
(gdb)
563        dlg_release(dlg);
(gdb)
565    }
(gdb)
run_trans_callbacks_internal (cb_lst=0xb4bc5d28, type=type@entry=1048576, trans=0xb4bc5ce8, params=params@entry=0xbfef097c) at t_hooks.c:292
292            cbp=cbp->next;
(gdb)
283        while(cbp){
(gdb)
286            if ( (cbp->types)&type ) {
(gdb)
292            cbp=cbp->next;
(gdb)
283        while(cbp){
(gdb)
286            if ( (cbp->types)&type ) {
(gdb)
287                DBG("DBG: trans=%p, callback type %d, id %d entered\n",
(gdb)
290                cbp->callback( trans, type, params );
(gdb)
289                params->param = &(cbp->param);
(gdb)
290                cbp->callback( trans, type, params );
(gdb)
292            cbp=cbp->next;
(gdb)
283        while(cbp){
(gdb)
286            if ( (cbp->types)&type ) {
(gdb)
292            cbp=cbp->next;
(gdb)
283        while(cbp){
(gdb)
286            if ( (cbp->types)&type ) {
(gdb)
287                DBG("DBG: trans=%p, callback type %d, id %d entered\n",
(gdb)
290                cbp->callback( trans, type, params );
(gdb)
289                params->param = &(cbp->param);
(gdb)
290                cbp->callback( trans, type, params );
(gdb)
292            cbp=cbp->next;
(gdb)
283        while(cbp){

Regards

Efelin


2013/11/20 Daniel-Constantin Mierla <miconda@gmail.com>
Hello,

I will investigate more -- meanwhile had some traveling. It would speed up if you can send the backtrace of one process that blocks when you applied the patch.

You need to connect with gdb to it:

gdb /path/to/kamailio _PID_

replace _PID_ with the pid of blocked kamailio process.

Cheers,
Daniel


On 11/15/13 5:50 PM, Efelin Novak wrote:
Hi Daniel,

thanks for a reply. I applied the patch and now the Kamailio just prints

WARNING: tm [t_lookup.c:1564]: t_unref(): WARNING: script writer didn't release transaction

and than freezes without any log. It does not resend the incoming "winning" failure reply neither response to any other messages, not even to a new calls. It just freezes. All kamailio processes are running and eating the whole 4-core processor.

Restart of Kamailio solves this problem.

Any ideas how to continue with debug?

Thanks

Efelin


2013/11/15 Daniel-Constantin Mierla <miconda@gmail.com>
Hello,

can you try attached patch?

Let me know if all goes fine and I will commit it to the repository.

Cheers,
Daniel


On 11/15/13 10:25 AM, Efelin Novak wrote:
Hi,

when I use t_reply("505", "Error"); in my failure route, the response is not forwarded and following is written into a log:

kamailio[26216]: WARNING: tm [t_lookup.c:1559]: t_unref(): WARNING: script writer didn't release transaction

plus next line is written exactly 416000 times into a log afterwards:

kamailio[32685]: CRITICAL: dialog [dlg_hash.c:794]: log_next_state_dlg(): bogus event 4 in state 5 for dlg 0xb4af6588 [2575:7017] with clid '121d44f0-6555f4c8' and tags 'd12546d053aadc68o2' ''

My point is to change the incoming code from users and append a Q.850 reason code.
Is there any other way how to do this or a way how to fix this?
I'm using Kamilio 4.0.4 on Debian 7.1


The code is as follows:
failure_route[MANAGE_FAILURE]
{
        if (t_is_canceled()) {
                exit;
        }
        if($T_reply_code == 408 && isflagset(10))
        {
                xlog("Ringing timeout");
                append_to_reply("Reason: Q.850;cause=28\r\n");
                t_reply("505", "Error");
        }
}


_______________________________________________
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
sr-users@lists.sip-router.org
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users

-- 
Daniel-Constantin Mierla - http://www.asipto.com
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
Kamailio Advanced Trainings - Berlin, Nov 25-28
  - more details about Kamailio trainings at http://www.asipto.com -

_______________________________________________
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
sr-users@lists.sip-router.org
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users



-- 
Daniel-Constantin Mierla - http://www.asipto.com
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
Kamailio Advanced Trainings - Berlin, Nov 25-28
  - more details about Kamailio trainings at http://www.asipto.com -


-- 
Daniel-Constantin Mierla - http://www.asipto.com
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda

-- 
Daniel-Constantin Mierla - http://www.asipto.com
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda