Hi Daniel,

I tried again and this time i think I have managed to catch the buggy part:

Now when I do "next" command, the while loop seems to be infinite.

(gdb) n
286            if ( (cbp->types)&type ) {
(gdb) n
292            cbp=cbp->next;
(gdb)
283        while(cbp){
(gdb)
286            if ( (cbp->types)&type ) {
(gdb)
292            cbp=cbp->next;
(gdb)
283        while(cbp){
(gdb)
286            if ( (cbp->types)&type ) {
(gdb)
292            cbp=cbp->next;
(gdb)
.
.
.
283        while(cbp){
(gdb)
286            if ( (cbp->types)&type ) {
(gdb)
292            cbp=cbp->next;
(gdb)
283        while(cbp){
(gdb)
286            if ( (cbp->types)&type ) {
(gdb)
292            cbp=cbp->next;
(gdb)
283        while(cbp){
(gdb)
286            if ( (cbp->types)&type ) {
(gdb) bt
#0  run_trans_callbacks_internal (cb_lst=cb_lst@entry=0xb4b6f078, type=type@entry=2, trans=trans@entry=0xb4b6f038, params=params@entry=0xbfc7b77c) at t_hooks.c:286
#1  0xb6c6b506 in run_trans_callbacks (type=type@entry=2, trans=0xb4b6f038, req=req@entry=0xb4b6fd58, rpl=rpl@entry=0xb72da078, code=183) at t_hooks.c:317
#2  0xb6c7193f in t_reply_matching (p_msg=p_msg@entry=0xb72da078, p_branch=p_branch@entry=0xbfc7b954) at t_lookup.c:1055
#3  0xb6c71be6 in t_check_msg (p_msg=p_msg@entry=0xb72da078, param_branch=param_branch@entry=0xbfc7b954) at t_lookup.c:1158
#4  0xb6c726b3 in t_check (p_msg=p_msg@entry=0xb72da078, param_branch=param_branch@entry=0xbfc7b954) at t_lookup.c:1200
#5  0xb6c9af34 in reply_received (p_msg=0xb72da078) at t_reply.c:2071
#6  0x080aa31c in do_forward_reply (msg=msg@entry=0xb72da078, mode=mode@entry=0) at forward.c:799
#7  0x080aacfb in forward_reply (msg=msg@entry=0xb72da078) at forward.c:882
#8  0x080ef520 in receive_msg (
    buf=buf@entry=0x82f2440 "SIP/2.0 183 Session Progress\r\nVia: SIP/2.0/UDP 192.168.0.5;branch=z9hG4bK7f2c.e047cf93.0\r\nVia: SIP/2.0/UDP 192.168.0.6;branch=z9hG4bK7f2c.67bed544.0\r\nVia: SIP/2.0/UDP 192.168.1.7;rport=5060;branch"..., len=<optimized out>, rcv_info=rcv_info@entry=0xbfc7bd30) at receive.c:272
#9  0x0818d66e in udp_rcv_loop () at udp_server.c:557
#10 0x080b8519 in main_loop () at main.c:1638
#11 0x08063bfe in main (argc=1, argv=0xbfc7bfc4) at main.c:2566


and debug=4 after fr_inv_timer hits is:

 2(30851) DEBUG: tm [t_reply.c:1211]: t_should_relay_response(): ->>>>>>>>> T_code=180, new_code=408
 2(30851) DEBUG: tm [t_lookup.c:1100]: t_check_msg(): DEBUG: t_check_msg: msg id=0 global id=0 T start=0xb4be6cc8
 2(30851) DEBUG: tm [t_lookup.c:1172]: t_check_msg(): DEBUG: t_check_msg: T already found!
 2(30851) DEBUG: tmx [t_var.c:467]: pv_get_tm_reply_code(): reply code is <408>
 2(30851) INFO: <script>: XLOG: 77f655b4-749c-11e3-9d4c-a127c67fb0f5 [MANAGE_FAILURE] Failure triggered 408 Time: 1388770123.897007 2(30851) DEBUG: tm [t_lookup.c:1100]: t_check_msg(): DEBUG: t_check_msg: msg id=0 global id=0 T start=0xb4be6cc8
 2(30851) DEBUG: tm [t_lookup.c:1172]: t_check_msg(): DEBUG: t_check_msg: T already found!
 2(30851) DEBUG: tm [t_lookup.c:1100]: t_check_msg(): DEBUG: t_check_msg: msg id=0 global id=0 T start=0xb4be6cc8
 2(30851) DEBUG: tm [t_lookup.c:1172]: t_check_msg(): DEBUG: t_check_msg: T already found!
 2(30851) DEBUG: tmx [t_var.c:467]: pv_get_tm_reply_code(): reply code is <408>
 2(30851) ERROR: <script>: Ringing timeout 2(30851) DEBUG: tm [t_lookup.c:1100]: t_check_msg(): DEBUG: t_check_msg: msg id=0 global id=0 T start=0xb4be6cc8
 2(30851) DEBUG: tm [t_lookup.c:1172]: t_check_msg(): DEBUG: t_check_msg: T already found!
 2(30851) DEBUG: tm [tm.c:1282]: w_t_reply(): DEBUG: t_reply_unsafe called from w_t_reply
 2(30851) DEBUG: <core> [msg_translator.c:206]: check_via_address(): check_via_address(192.168.0.6, 192.168.0.6, 0)
 2(30851) DEBUG: tm [t_hooks.c:288]: run_trans_callbacks_internal(): DBG: trans=0xb4be6cc8, callback type 1048576, id 0 entered
 2(30851) DEBUG: dialog [dlg_hash.c:600]: dlg_lookup(): ref dlg 0xb4be6a28 with 1 -> 2
 2(30851) DEBUG: dialog [dlg_hash.c:602]: dlg_lookup(): dialog id=11951 found on entry 3135
 2(30851) DEBUG: dialog [dlg_hash.c:954]: next_state_dlg(): dialog 0xb4be6a28 changed from state 2 to state 5, due event 4 (ref 2)
 2(30851) DEBUG: dialog [dlg_hash.c:747]: dlg_ref(): ref dlg 0xb4be6a28 with 1 -> 3
 2(30851) DEBUG: dialog [dlg_handlers.c:1484]: dlg_run_event_route(): executing event_route 3 on state 5
 2(30851) DEBUG: <core> [parser/parse_to.c:799]: parse_to(): end of header reached, state=10
 2(30851) DEBUG: <core> [parser/msg_parser.c:190]: get_hdr_field(): DEBUG: get_hdr_field: <To> [20]; uri=[you@kamailio.org]
 2(30851) DEBUG: <core> [parser/msg_parser.c:192]: get_hdr_field(): DEBUG: to body [<you@kamailio.org>
]
 2(30851) INFO: <script>: XLOG: 123 [dialog:failed] Dialog failed 2(30851) INFO: <script>: XLOG: 123 [dialog:failed] IP is 1.0.0.127 2(30851) DEBUG: dialog [dlg_hash.c:600]: dlg_lookup(): ref dlg 0xb4be6a28 with 1 -> 4
 2(30851) DEBUG: dialog [dlg_hash.c:602]: dlg_lookup(): dialog id=11951 found on entry 3135
 2(30851) DEBUG: dialog [dlg_var.c:215]: print_lists(): Internal var-list ((nil)):
 2(30851) DEBUG: dialog [dlg_var.c:225]: print_lists(): Dialog var-list (0xb4be6c18):
 2(30851) DEBUG: dialog [dlg_var.c:230]: print_lists(): _uac_funew=sip:phone1@example.com (flags 1)
 2(30851) DEBUG: dialog [dlg_var.c:230]: print_lists(): _uac_fu=sip:phone2@example.com (flags 1)
 2(30851) DEBUG: dialog [dlg_hash.c:765]: dlg_unref(): unref dlg 0xb4be6a28 with 1 -> 3
 2(30851) WARNING: tm [t_lookup.c:1564]: t_unref(): WARNING: script writer didn't release transaction
 2(30851) DEBUG: tm [t_reply.c:1551]: cleanup_uac_timers(): DEBUG: cleanup_uac_timers: RETR/FR timers reset
 2(30851) DEBUG: dialog [dlg_handlers.c:322]: dlg_iuid_sfree(): freeing dlg iuid [3135:11951] (0xb4be7998)
 2(30851) DEBUG: dialog [dlg_hash.c:765]: dlg_unref(): unref dlg 0xb4be6a28 with 1 -> 2
 2(30851) DEBUG: dialog [dlg_handlers.c:543]: dlg_onreply(): dialog 0xb4be6a28 failed (negative reply)
 2(30851) DEBUG: dialog [dlg_hash.c:765]: dlg_unref(): unref dlg 0xb4be6a28 with 1 -> 1
 2(30851) DEBUG: dialog [dlg_handlers.c:944]: dlg_set_tm_waitack(): registering TMCB to wait for negative ACK
 2(30851) DEBUG: dialog [dlg_hash.c:747]: dlg_ref(): ref dlg 0xb4be6a28 with 1 -> 2
 2(30851) DEBUG: dialog [dlg_hash.c:765]: dlg_unref(): unref dlg 0xb4be6a28 with 1 -> 1
 2(30851) DEBUG: tm [t_hooks.c:288]: run_trans_callbacks_internal(): DBG: trans=0xb4be6cc8, callback type 1048576, id 0 entered
 2(30851) DEBUG: dialog [dlg_hash.c:600]: dlg_lookup(): ref dlg 0xb4be6a28 with 1 -> 2
 2(30851) DEBUG: dialog [dlg_hash.c:602]: dlg_lookup(): dialog id=11951 found on entry 3135
 2(30851) DEBUG: dialog [dlg_hash.c:954]: next_state_dlg(): dialog 0xb4be6a28 changed from state 5 to state 5, due event 4 (ref 2)
 2(30851) DEBUG: dialog [dlg_hash.c:765]: dlg_unref(): unref dlg 0xb4be6a28 with 1 -> 1
 2(30851) DEBUG: tm [t_hooks.c:288]: run_trans_callbacks_internal(): DBG: trans=0xb4be6cc8, callback type 1048576, id 0 entered
 2(30851) DEBUG: dialog [dlg_hash.c:600]: dlg_lookup(): ref dlg 0xb4be6a28 with 1 -> 2
.
(infinite)
.

Regards

Efelin


2013/12/29 Kelvin Chua <kelchy@gmail.com>
when i remove the 
    t_reply("404","not found")
in failure_route, everything works alright.
note: i intend to use the t_reply if there are no more branches available, instead of a self generated 408

then, putting a t_release() on event_route[dialog:failed] got rid of the error
    WARNING: tm [t_lookup.c:1564]: t_unref(): WARNING: script writer didn't release transaction

no i do not have
    CRITICAL: dialog [dlg_hash.c:794]: log_next_state_dlg(): bogus event 4 in state 5 for dlg ... 
in my logs

Kelvin Chua


On Sat, Dec 28, 2013 at 6:22 AM, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
Hello,

do you get also a log message like:

CRITICAL: dialog [dlg_hash.c:794]: log_next_state_dlg(): bogus event 4 in state 5 for dlg ... ?

The warning itself should be harmless. Maybe you can grab all the log messages with debug=3, that will help to get more hits about what happens.

Cheers,
Daniel



On 26/12/13 06:35, Kelvin Chua wrote:
Hi guys,

was this issue resolved?
I encountered this issue also (4.0.5), in failure_route, i used t_reply("404","not found")
never sent out, instead i get this in logs, 
WARNING: tm [t_lookup.c:1564]: t_unref(): WARNING: script writer didn't release transaction

Kelvin Chua


On Tue, Nov 26, 2013 at 4:43 AM, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
Hello,

the backtrace is ok for the moment. I will look over it and come back with results.

Cheers,
Daniel


On 11/25/13 12:10 PM, Efelin Novak wrote:
Hi Daniel,

sorry it took me more than I expected. Is this sufficient? Meanwhile I found out that this happens when fr_inv_timer triggers and dialog module is turned on.

Backtrace:

bt
#0  futex_release (lock=0xb4b90798) at ../../mem/../futexlock.h:137
#1  next_state_dlg (dlg=dlg@entry=0xb4bc5a38, event=event@entry=4, old_state=old_state@entry=0xbfef08a0, new_state=new_state@entry=0xbfef089c, unref=unref@entry=0xbfef08a4) at dlg_hash.c:950
#2  0xb6b05479 in dlg_onreply (t=0xb4bc5ce8, type=1048576, param=0xbfef097c) at dlg_handlers.c:469
#3  0xb6cc2336 in run_trans_callbacks_internal (cb_lst=0xb4bc5d28, type=type@entry=1048576, trans=0xb4bc5ce8, params=params@entry=0xbfef097c) at t_hooks.c:290
#4  0xb6cc25da in run_trans_callbacks_with_buf (type=type@entry=1048576, rbuf=rbuf@entry=0xb4bc5d54, req=req@entry=0xb4bc6a08, repl=repl@entry=0xffffffff, flags=flags@entry=505) at t_hooks.c:336
#5  0xb6cee953 in _reply_light (trans=trans@entry=0xb4bc5ce8,
    buf=0xb731f588 "SIP/2.0 505 Error\r\nVia: SIP/2.0/UDP 192.168.21.4;branch=z9hG4bKa6f5.8ca73657.0\r\nVia: SIP/2.0/UDP 10.0.0.95;rport=5060;branch=z9hG4bKBc143aFe81jDj\r\nFrom: \"USER\" <sip:USER@10.0.0.95>;t"..., len=len@entry=434, code=code@entry=505, lock=0, bm=0xbfef0e48, to_tag_len=<error reading variable: Unhandled dwarf expression opcode 0xfa>,
    to_tag=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at t_reply.c:628
#6  0xb6ceedeb in _reply (trans=0xb4bc5ce8, trans@entry=0x1f9, p_msg=p_msg@entry=0xb731f568, code=505, text=0xb731f568 "Error", lock=0) at t_reply.c:782
#7  0xb6ceefd7 in t_reply_unsafe (t=0x1f9, t@entry=0xb4bc5ce8, p_msg=0xb731f568, p_msg@entry=0xb6d196c0, code=0, text=text@entry=0xb731f568 "Error") at t_reply.c:1531
#8  0xb6cd6904 in w_t_reply (msg=0xb6d196c0, p1=0xb731c988 "8]0\267\001", p2=0xb731c9b8 "8P0\267 ") at tm.c:1283
#9  0x080651cf in do_action (h=h@entry=0xbfef1258, a=a@entry=0xb7305d98, msg=msg@entry=0xb6d196c0) at action.c:1080
#10 0x08064137 in run_actions (h=h@entry=0xbfef1258, a=a@entry=0xb73053c8, msg=msg@entry=0xb6d196c0) at action.c:1573
#11 0x0806cedd in run_top_route (a=0xb73053c8, msg=msg@entry=0xb6d196c0, c=c@entry=0x0) at action.c:1658
#12 0xb6ceba98 in run_failure_handlers (t=t@entry=0xb4bc5ce8, rpl=0xffffffff, code=408, extra_flags=96) at t_reply.c:1028
#13 0xb6cece22 in t_should_relay_response (Trans=Trans@entry=0xb4bc5ce8, new_code=new_code@entry=408, branch=branch@entry=0, should_store=should_store@entry=0xbfef1464,
    should_relay=should_relay@entry=0xbfef1460, cancel_data=cancel_data@entry=0xbfef1510, reply=reply@entry=0xffffffff) at t_reply.c:1304
#14 0xb6cef076 in relay_reply (t=t@entry=0xb4bc5ce8, p_msg=p_msg@entry=0xffffffff, branch=branch@entry=0, msg_status=msg_status@entry=408, cancel_data=cancel_data@entry=0xbfef1510,
    do_put_on_wait=do_put_on_wait@entry=0) at t_reply.c:1707
#15 0xb6cc28b4 in fake_reply (t=t@entry=0xb4bc5ce8, branch=0, code=code@entry=408) at timer.c:354
#16 0xb6cc34b0 in final_response_handler (t=0xb4bc5ce8, r_buf=0xb4bc5dc8) at timer.c:526
#17 retr_buf_handler (ticks=272914717, tl=0xb4bc5ddc, p=0x3e8) at timer.c:584
#18 0x08183325 in timer_list_expire (slow_mark=12, slow_l=0xb49f2438, h=0xb49f2338, t=272914717) at timer.c:894
#19 timer_handler () at timer.c:959
#20 timer_main () at timer.c:998
#21 0x080b8c65 in main_loop () at main.c:1709
#22 0x08063bfe in main (argc=1, argv=0xbfef1844) at main.c:2566

(I have changed  USER and IP address to hide my network :))

After this when I hit 'n' multiple times I got following output in infinite loop:

22 0x08063bfe in main (argc=1, argv=0xbfef1844) at main.c:2566
(gdb) n
next_state_dlg (dlg=dlg@entry=0xb4bc5a38, event=event@entry=4, old_state=old_state@entry=0xbfef08a0, new_state=new_state@entry=0xbfef089c, unref=unref@entry=0xbfef08a4) at dlg_hash.c:952
952        LM_DBG("dialog %p changed from state %d to "
(gdb)
955    }
(gdb)
dlg_onreply (t=0xb4bc5ce8, type=1048576, param=0xbfef097c) at dlg_handlers.c:470
470        dlg_run_event_route(dlg, (rpl==FAKED_REPLY)?NULL:rpl, old_state, new_state);
(gdb)
472        if (new_state==DLG_STATE_EARLY) {
(gdb)
479        if (new_state==DLG_STATE_CONFIRMED_NA &&
(gdb)
540        if ( new_state==DLG_STATE_DELETED
(gdb)
542                        || old_state==DLG_STATE_EARLY) ) {
(gdb)
541                    && (old_state==DLG_STATE_UNCONFIRMED
(gdb)
559        if (unref) dlg_unref(dlg, unref);
(gdb)
563        dlg_release(dlg);
(gdb)
565    }
(gdb)
run_trans_callbacks_internal (cb_lst=0xb4bc5d28, type=type@entry=1048576, trans=0xb4bc5ce8, params=params@entry=0xbfef097c) at t_hooks.c:292
292            cbp=cbp->next;
(gdb)
283        while(cbp){
(gdb)
286            if ( (cbp->types)&type ) {
(gdb)
292            cbp=cbp->next;
(gdb)
283        while(cbp){
(gdb)
286            if ( (cbp->types)&type ) {
(gdb)
287                DBG("DBG: trans=%p, callback type %d, id %d entered\n",
(gdb)
290                cbp->callback( trans, type, params );
(gdb)
289                params->param = &(cbp->param);
(gdb)
290                cbp->callback( trans, type, params );
(gdb)
292            cbp=cbp->next;
(gdb)
283        while(cbp){
(gdb)
286            if ( (cbp->types)&type ) {
(gdb)
292            cbp=cbp->next;
(gdb)
283        while(cbp){
(gdb)
286            if ( (cbp->types)&type ) {
(gdb)
287                DBG("DBG: trans=%p, callback type %d, id %d entered\n",
(gdb)
290                cbp->callback( trans, type, params );
(gdb)
289                params->param = &(cbp->param);
(gdb)
290                cbp->callback( trans, type, params );
(gdb)
292            cbp=cbp->next;
(gdb)
283        while(cbp){

Regards

Efelin


2013/11/20 Daniel-Constantin Mierla <miconda@gmail.com>
Hello,

I will investigate more -- meanwhile had some traveling. It would speed up if you can send the backtrace of one process that blocks when you applied the patch.

You need to connect with gdb to it:

gdb /path/to/kamailio _PID_

replace _PID_ with the pid of blocked kamailio process.

Cheers,
Daniel


On 11/15/13 5:50 PM, Efelin Novak wrote:
Hi Daniel,

thanks for a reply. I applied the patch and now the Kamailio just prints

WARNING: tm [t_lookup.c:1564]: t_unref(): WARNING: script writer didn't release transaction

and than freezes without any log. It does not resend the incoming "winning" failure reply neither response to any other messages, not even to a new calls. It just freezes. All kamailio processes are running and eating the whole 4-core processor.

Restart of Kamailio solves this problem.

Any ideas how to continue with debug?

Thanks

Efelin


2013/11/15 Daniel-Constantin Mierla <miconda@gmail.com>
Hello,

can you try attached patch?

Let me know if all goes fine and I will commit it to the repository.

Cheers,
Daniel


On 11/15/13 10:25 AM, Efelin Novak wrote:
Hi,

when I use t_reply("505", "Error"); in my failure route, the response is not forwarded and following is written into a log:

kamailio[26216]: WARNING: tm [t_lookup.c:1559]: t_unref(): WARNING: script writer didn't release transaction

plus next line is written exactly 416000 times into a log afterwards:

kamailio[32685]: CRITICAL: dialog [dlg_hash.c:794]: log_next_state_dlg(): bogus event 4 in state 5 for dlg 0xb4af6588 [2575:7017] with clid '121d44f0-6555f4c8' and tags 'd12546d053aadc68o2' ''

My point is to change the incoming code from users and append a Q.850 reason code.
Is there any other way how to do this or a way how to fix this?
I'm using Kamilio 4.0.4 on Debian 7.1


The code is as follows:
failure_route[MANAGE_FAILURE]
{
        if (t_is_canceled()) {
                exit;
        }
        if($T_reply_code == 408 && isflagset(10))
        {
                xlog("Ringing timeout");
                append_to_reply("Reason: Q.850;cause=28\r\n");
                t_reply("505", "Error");
        }
}


_______________________________________________
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
sr-users@lists.sip-router.org
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users

-- 
Daniel-Constantin Mierla - http://www.asipto.com
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
Kamailio Advanced Trainings - Berlin, Nov 25-28
  - more details about Kamailio trainings at http://www.asipto.com -

_______________________________________________
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
sr-users@lists.sip-router.org
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users



-- 
Daniel-Constantin Mierla - http://www.asipto.com
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
Kamailio Advanced Trainings - Berlin, Nov 25-28
  - more details about Kamailio trainings at http://www.asipto.com -


-- 
Daniel-Constantin Mierla - http://www.asipto.com
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda

_______________________________________________
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
sr-users@lists.sip-router.org
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users



-- 
Daniel-Constantin Mierla - http://www.asipto.com
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda