[SR-Users] crash after using async_route()

Daniel-Constantin Mierla miconda at gmail.com
Wed Sep 6 15:01:16 CEST 2017


OK, I will wait a bit and then backport.

Thanks for testing and assisting with troubleshooting.

Daniel


On 06.09.17 14:29, Vitaliy Aleksandrov wrote:
> Thanks for the quick fix.
>
> I installed the latest 5.0 branch with the mentioned patch and have had
> no crashes so far.
> I will do additional testing and report back if I find any issues.
>
> On Wed, Sep 6, 2017 at 12:25 PM, Daniel-Constantin Mierla
> <miconda at gmail.com> wrote:
>
>     I think I caught the issue and fixed it with commit
>     b672d8ef63715cf816390a05ce7a441377c3e468 in the master branch.
>
>     It was caused by not resetting the T_ASYNC_CONTINUE flag after
>     t_continue(), which caused other parts of the code to not reset
>     the reply field of any branch. The reply field could have been set
>     by another process, so at the time of destroying the transaction,
>     the pointer could have referred to the memory zone of another
>     process, and accessing it caused the crash.
>
>     Along with this fix, I added a few other safety checks while
>     investigating the issue.
>
>     Can you cherry-pick this commit and test it in branch 5.0? I want
>     to be sure there are no obvious side effects before backporting it.
>
>     Cheers,
>     Daniel
>
>
>     On 05.09.17 11:02, Daniel-Constantin Mierla wrote:
>>
>>     Hello,
>>
>>     do you happen to have a pcap (or ngrep capture) with the SIP
>>     traffic for the call? It would be useful to see the flow with
>>     requests/replies/retransmissions and their timestamps...
>>
>>     Is this version a snapshot of the 5.0.2 release or a build from
>>     branch 5.0?
>>
>>     Cheers,
>>     Daniel
>>
>>
>>     On 05.09.17 10:01, Vitaliy Aleksandrov wrote:
>>>     Hello kamailio list,
>>>
>>>     I recently found a problem in my configuration, which uses
>>>     async_route().
>>>     Kamailio crashes after several calls, when wait_timer fires.
>>>
>>>     #0  0xb74a8556 in raise () from /lib/libc.so.6
>>>     #1  0xb74a9d78 in abort () from /lib/libc.so.6
>>>     #2  0x08293ae2 in qm_free (qmp=0xad65d000, p=0x3d64692d,
>>>     file=0xb6216a16 "tm: h_table.c", func=0xb621663c
>>>     <__FUNCTION__.18751> "free_cell_helper", line=187,
>>>     mname=0xb621664d "tm") at core/mem/q_malloc.c:471
>>>     #3  0xb613f103 in free_cell_helper (dead_cell=0xae2cd210,
>>>     silent=0, fname=0xb6239ea5 "timer.c", fline=655) at h_table.c:187
>>>     #4  0xb61e7758 in wait_handler (ti=557858937,
>>>     wait_tl=0xae2cd258, data=0xae2cd210) at timer.c:655
>>>     #5  0x0826a2cc in timer_list_expire (t=557858937, h=0xad6b9668,
>>>     slow_l=0xad6ba144, slow_mark=312) at core/timer.c:874
>>>     #6  0x08267cb1 in timer_handler () at core/timer.c:939
>>>     #7  0x0826a4d3 in timer_main () at core/timer.c:978
>>>     #8  0x08069575 in main_loop () at main.c:1721
>>>     #9  0x080707ca in main (argc=11, argv=0xbf85f044) at main.c:2723
>>>
>>>     When the crash happens, Kamailio prints the following message:
>>>     Sep  4 16:15:38 [18938]: : <core> [core/mem/q_malloc.c:469]:
>>>     qm_free(): BUG: qm_free: bad pointer 0x70707553 (out of memory
>>>     block!) called from tm: h_table.c: free_cell_helper(187) - aborting
>>>
>>>     I also had a few crashes in retransmission_handler():
>>>
>>>     #0  0xb750b556 in raise () from /lib/libc.so.6
>>>     #1  0xb750cd78 in abort () from /lib/libc.so.6
>>>     #2  0xb6249b5a in retransmission_handler (r_buf=0xae036674) at
>>>     timer.c:367
>>>     #3  0xb6247558 in retr_buf_handler (ticks=1234464444,
>>>     tl=0xae036688, p=0x1f40) at timer.c:594
>>>     #4  0x0826a2cc in timer_list_expire (t=1234464444, h=0xad71c668,
>>>     slow_l=0xad71cd44, slow_mark=2232) at core/timer.c:874
>>>     #5  0x08267cb1 in timer_handler () at core/timer.c:939
>>>     #6  0x0826a4d3 in timer_main () at core/timer.c:978
>>>     #7  0x08069575 in main_loop () at main.c:1721
>>>     #8  0x080707ca in main (argc=11, argv=0xbff64134) at main.c:2723
>>>
>>>     ERROR: tm [timer.c:366]: retransmission_handler(): transaction
>>>     0xae0365e0 scheduled for deletion and called from RETR timer
>>>     (flags 6d)
>>>
>>>     Both timers fired for an INVITE transaction that was previously
>>>     suspended by async_route(), then resumed, sent out and received
>>>     a 4xx reply (407).
>>>
>>>     This configuration worked fine with Kamailio 4.2.x; the problem
>>>     appeared after upgrading to 5.0.2.
>>>
>>>     I am trying to figure out how to narrow down the problem. Any
>>>     input is appreciated.
>>>
>>
>>     -- 
>>     Daniel-Constantin Mierla
>>     www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>     Kamailio Advanced Training - www.asipto.com
>>     Kamailio World Conference - www.kamailioworld.com
>
>

-- 
Daniel-Constantin Mierla
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Kamailio Advanced Training - www.asipto.com
Kamailio World Conference - www.kamailioworld.com
