[SR-Users] Fwd: kamailio with evapi crashing on high volume

Jayesh Nambiar jayesh1017 at gmail.com
Tue Oct 2 10:58:54 CEST 2018


Thanks Daniel.
I'll upgrade to the latest stable version once it is out.

- Jayesh

On Tue, Oct 2, 2018 at 10:55 AM Daniel-Constantin Mierla <miconda at gmail.com>
wrote:

> Hello,
>
> I looked at the core files when you reported them, but due to busy times
> while traveling, I didn't get the chance to follow up. However, the core
> files are truncated, so there are no useful details there.
>
> We will release 5.1.6 in the coming days; it has a few more bits that might
> help with troubleshooting here, should there be another good core dump file.
>
> Cheers,
> Daniel
>
> On 26.09.18 09:25, Jayesh Nambiar wrote:
>
> Hello Daniel,
>
> Just a quick update on this. After running from the 5.1 branch, the server
> twice became unresponsive. It never crashed, and on forceful shutdown it
> only generated a core from the shutdown routine. Here is what the syslog
> contained while calls were being sent to kamailio but kamailio was not
> processing any of them:
>
> Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
> CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
> freed pointer (0x7fcd23ce5648), called from tm: h_table.c:
> free_cell_helper(235), first free tm: h_table.c: free_cell_helper(235) -
> aborting
> Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
> CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
> freed pointer (0x7fcd24a38ea8), called from tm: h_table.c:
> free_cell_helper(234), first free tm: h_table.c: free_cell_helper(234) -
> aborting
> Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
> CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
> freed pointer (0x7fcd23ce5648), called from tm: h_table.c:
> free_cell_helper(235), first free tm: h_table.c: free_cell_helper(235) -
> aborting
> Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
> CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
> freed pointer (0x7fcd24a38ea8), called from tm: h_table.c:
> free_cell_helper(234), first free tm: h_table.c: free_cell_helper(234) -
> aborting
> Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
> CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
> freed pointer (0x7fcd23ce5648), called from tm: h_table.c:
> free_cell_helper(235), first free tm: h_table.c: free_cell_helper(235) -
> aborting
> Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
> CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
> freed pointer (0x7fcd24a38ea8), called from tm: h_table.c:
> free_cell_helper(234), first free tm: h_table.c: free_cell_helper(234) -
> aborting
> Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
> CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
> freed pointer (0x7fcd23ce5648), called from tm: h_table.c:
> free_cell_helper(235), first fre
>
> And here's the "bt full" of the core file:
> [New LWP 16003]
> Cannot access memory at address 0x7fcd64e1a148
> Cannot access memory at address 0x7fcd64e1a140
> Failed to read a valid object file image from memory.
> Core was generated by `/usr/local/kamailio/sbin/kamailio -P
> /var/run/siptrunk.pid -f /usr/local/carrie'.
> Program terminated with signal SIGABRT, Aborted.
> #0  0x00007fcd6443f428 in ?? ()
> (gdb) bt full
> #0  0x00007fcd6443f428 in ?? ()
> No symbol table info available.
> Backtrace stopped: Cannot access memory at address 0x7ffed40e9e68
>
> Thanks,
>
> - Jayesh
>
> On Sat, Sep 15, 2018 at 7:09 PM Jayesh Nambiar <jayesh1017 at gmail.com>
> wrote:
>
>> Hello Daniel,
>> I have now git cloned from the 5.1 branch and restarted kamailio today.
>> One observation: the crashes happened when one particular customer, with a
>> capacity of around 40-50 cps, started dialing. I will observe in the coming
>> week and report any crashes accordingly.
>>
>> Thanks,
>>
>> - Jayesh
>>
>> On Wed, Sep 12, 2018 at 3:11 AM Jayesh Nambiar <jayesh1017 at gmail.com>
>> wrote:
>>
>>> Thank you Daniel. I will deploy from the 5.1 branch; I will be able to
>>> update it in the next two days. Since this is a production system, I will
>>> have to find a proper time slot to get it updated. I will let you know
>>> once that is done.
>>> Thanks again.
>>>
>>> - Jayesh
>>>
>>> On Tue, Sep 11, 2018 at 2:16 PM Daniel-Constantin Mierla <
>>> miconda at gmail.com> wrote:
>>>
>>>> Hi Jayesh,
>>>>
>>>> I looked over the weekend and it seems like a race or a double free. I
>>>> backported some patches to the tm module from master to the 5.1 branch
>>>> (adding a counter to see if there is more than one free). Would you be
>>>> able to deploy from the 5.1 branch or the nightly builds for 5.1? With
>>>> the next crash I can then evaluate (rule it out or confirm it) whether it
>>>> is a double free or not.
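>>>>
>>>> To illustrate the idea of the counter, the backported guard is
>>>> conceptually like this minimal C sketch (hypothetical names, not the
>>>> actual tm code):
>>>>
>>>>     #include <stdatomic.h>
>>>>     #include <stdio.h>
>>>>     #include <stdlib.h>
>>>>
>>>>     typedef struct cell {
>>>>         atomic_int free_count; /* how many frees were attempted */
>>>>         /* ... transaction fields ... */
>>>>     } cell_t;
>>>>
>>>>     static void free_cell_guarded(cell_t *c, const char *fname, int fline)
>>>>     {
>>>>         /* first caller reads 0 and proceeds; any later caller sees a
>>>>          * double free and aborts instead of corrupting the allocator */
>>>>         if (atomic_fetch_add(&c->free_count, 1) > 0) {
>>>>             fprintf(stderr, "BUG: freeing already freed pointer %p,"
>>>>                     " called from %s:%d - aborting\n", (void *)c,
>>>>                     fname, fline);
>>>>             abort();
>>>>         }
>>>>         free(c); /* kamailio would use its shared-memory free here */
>>>>     }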
>>>>
>>>> I haven't written to you yet because I planned to dig more into libev
>>>> (which is used by evapi), to see if it creates threads that can mess with
>>>> some kamailio globals, although I expect it runs the event loop in a
>>>> single thread. Another work-related project took a lot of my time these
>>>> days, so I am not there yet ...
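>>>>
>>>> For context, the usual libev pattern is a single ev_run() loop, with
>>>> watchers invoked as callbacks from the one thread that runs the loop -
>>>> a minimal sketch (not the evapi code) looks like this:
>>>>
>>>>     #include <ev.h>
>>>>     #include <stdio.h>
>>>>
>>>>     static void stdin_cb(EV_P_ ev_io *w, int revents)
>>>>     {
>>>>         /* runs inside the one thread that called ev_run() */
>>>>         printf("stdin is readable\n");
>>>>         ev_io_stop(EV_A_ w);
>>>>         ev_break(EV_A_ EVBREAK_ALL); /* leave the loop */
>>>>     }
>>>>
>>>>     int main(void)
>>>>     {
>>>>         struct ev_loop *loop = EV_DEFAULT; /* default event loop */
>>>>         ev_io stdin_watcher;
>>>>         ev_io_init(&stdin_watcher, stdin_cb, 0, EV_READ); /* fd 0 */
>>>>         ev_io_start(loop, &stdin_watcher);
>>>>         ev_run(loop, 0); /* blocks; libev starts no threads itself */
>>>>         return 0;
>>>>     }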
>>>> Cheers,
>>>> Daniel
>>>>
>>>> On 11.09.18 09:56, Jayesh Nambiar wrote:
>>>>
>>>> Hi Daniel,
>>>> Just a gentle follow-up on this: could you find anything useful?
>>>>
>>>> Thanks,
>>>>
>>>> - Jayesh
>>>>
>>>>
>>>>>>> On 07.09.18 09:56, Jayesh Nambiar wrote:
>>>>>>>
>>>>>>> Hi Daniel,
>>>>>>> This happens randomly. It is not a testbed, and it has never been
>>>>>>> reproducible on the test server. I can still give you access if you
>>>>>>> wish to analyse the core files to check what exactly is happening here.
>>>>>>> It simply feels like the evapi socket gives up under heavy load.
>>>>>>> Earlier I was running 5.0.2, and there kamailio would get stuck, not
>>>>>>> sending any async events on the evapi socket; not even the
>>>>>>> evapi:connection-closed event got triggered. After I upgraded to the
>>>>>>> latest stable version (5.1.5), it at least started crashing. Here are
>>>>>>> two core dumps put onto pastebin:
>>>>>>> https://pastebin.com/nn6gJapm
>>>>>>> https://pastebin.com/ph7b8vFH
>>>>>>>
>>>>>>> Thanks for all the support,
>>>>>>>
>>>>>>> - Jayesh
>>>>>>>
>>>>>>> On Fri, Sep 7, 2018 at 1:55 AM Daniel-Constantin Mierla <
>>>>>>> miconda at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> are you able to reproduce it somehow? Or does it just happen randomly?
>>>>>>>>
>>>>>>>> Is it on a testbed where I could get access to investigate the core
>>>>>>>> files? If not, then I will ask for more details from the cores over
>>>>>>>> email; the first would be 'bt full' for both cores and 'p *dead_cell'
>>>>>>>> for the second one.
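>>>>>>>>
>>>>>>>> For reference, with a saved core and kamailio built with debug
>>>>>>>> symbols, the session would be along these lines (the core path is
>>>>>>>> just a placeholder):
>>>>>>>>
>>>>>>>>     gdb /usr/local/kamailio/sbin/kamailio /path/to/core
>>>>>>>>     (gdb) bt full
>>>>>>>>     # in the second core, frame 0 is free_cell_helper(), where
>>>>>>>>     # dead_cell is in scope
>>>>>>>>     (gdb) frame 0
>>>>>>>>     (gdb) p *dead_cell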
>>>>>>>> Cheers,
>>>>>>>> Daniel
>>>>>>>>
>>>>>>>> On 05.09.18 17:47, Jayesh Nambiar wrote:
>>>>>>>>
>>>>>>>> Hi Daniel,
>>>>>>>> I have got these core dumps. Let me know if I should be doing a 'bt
>>>>>>>> full'; I'll pastebin it and send. Thanks,
>>>>>>>>
>>>>>>>> Core was generated by `/usr/local/kamailio/sbin/kamailio -P
>>>>>>>> /var/run/siptrunk.pid -f /usr/local/carrie'.
>>>>>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>>>>>> #0  0x0000000000505d30 in sip_msg_shm_clone
>>>>>>>> (org_msg=0x7f6ab0d9f618, sip_msg_len=0x7ffdddb2e8bc, clone_lumps=1) at
>>>>>>>> core/sip_msg_clone.c:491
>>>>>>>> 491 LUMP_LIST_LEN(len, org_msg->add_rm);
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x0000000000505d30 in sip_msg_shm_clone
>>>>>>>> (org_msg=0x7f6ab0d9f618, sip_msg_len=0x7ffdddb2e8bc, clone_lumps=1) at
>>>>>>>> core/sip_msg_clone.c:491
>>>>>>>> #1  0x00007f6af0bdf68d in fake_req (shmem_msg=0x7f6ab0d9f618,
>>>>>>>> extra_flags=0, uac=0x7f6ab1738980, len=0x7ffdddb2e8bc) at t_reply.c:854
>>>>>>>> #2  0x00007f6af0c3aa27 in t_continue_helper (hash_index=58039,
>>>>>>>> label=413633661, rtact=0x7f6af10500f0, cbname=0x0, cbparam=0x0) at
>>>>>>>> t_suspend.c:293
>>>>>>>> #3  0x00007f6af0c3eed4 in t_continue (hash_index=58039,
>>>>>>>> label=413633661, route=0x7f6af10500f0) at t_suspend.c:583
>>>>>>>> #4  0x00007f6aae4dd010 in w_t_continue (msg=0x7ffdddb2fa60,
>>>>>>>> idx=0x7f6af1098e90 "8\306\t\361j\177", lbl=0x7f6af1098ff0
>>>>>>>> "\240\264\t\361j\177", rtn=0x7f6af1099150 "0\275\t\361j\177") at
>>>>>>>> tmx_mod.c:760
>>>>>>>> #5  0x000000000045b477 in do_action (h=0x7ffdddb2f850,
>>>>>>>> a=0x7f6af109ab38, msg=0x7ffdddb2fa60) at core/action.c:1085
>>>>>>>> #6  0x0000000000467fd5 in run_actions (h=0x7ffdddb2f850,
>>>>>>>> a=0x7f6af1096630, msg=0x7ffdddb2fa60) at core/action.c:1565
>>>>>>>> #7  0x000000000045b234 in do_action (h=0x7ffdddb2f850,
>>>>>>>> a=0x7f6af10a0f80, msg=0x7ffdddb2fa60) at core/action.c:1058
>>>>>>>> #8  0x0000000000467fd5 in run_actions (h=0x7ffdddb2f850,
>>>>>>>> a=0x7f6af10a0f80, msg=0x7ffdddb2fa60) at core/action.c:1565
>>>>>>>> #9  0x0000000000468797 in run_top_route (a=0x7f6af10a0f80,
>>>>>>>> msg=0x7ffdddb2fa60, c=0x0) at core/action.c:1654
>>>>>>>> #10 0x00007f6aabe79370 in evapi_run_cfg_route
>>>>>>>> (evenv=0x7ffdddb30250, rt=3, rtname=0x7f6aac08cb18 <_evapi_rts+56>) at
>>>>>>>> evapi_dispatch.c:161
>>>>>>>> #11 0x00007f6aabe7f271 in evapi_recv_client (loop=0x7f6aabe698e0,
>>>>>>>> watcher=0x27af5e0, revents=1) at evapi_dispatch.c:467
>>>>>>>> #12 0x00007f6aabc5fd73 in ev_invoke_pending () from
>>>>>>>> /usr/lib/x86_64-linux-gnu/libev.so.4
>>>>>>>> #13 0x00007f6aabc633de in ev_run () from
>>>>>>>> /usr/lib/x86_64-linux-gnu/libev.so.4
>>>>>>>> #14 0x00007f6aabe7867c in ev_loop (loop=0x7f6aabe698e0, flags=0) at
>>>>>>>> /usr/include/ev.h:835
>>>>>>>> #15 0x00007f6aabe83fc6 in evapi_run_dispatcher
>>>>>>>> (laddr=0x7f6af0f72300 "127.0.0.1", lport=8060) at evapi_dispatch.c:705
>>>>>>>> #16 0x00007f6aabe6e262 in child_init (rank=0) at evapi_mod.c:213
>>>>>>>> #17 0x0000000000542cad in init_mod_child (m=0x7f6af0f71b70, rank=0)
>>>>>>>> at core/sr_module.c:943
>>>>>>>> #18 0x0000000000542971 in init_mod_child (m=0x7f6af0f72968, rank=0)
>>>>>>>> at core/sr_module.c:939
>>>>>>>> #19 0x0000000000542971 in init_mod_child (m=0x7f6af0f73d38, rank=0)
>>>>>>>> at core/sr_module.c:939
>>>>>>>> #20 0x0000000000542971 in init_mod_child (m=0x7f6af0f74670, rank=0)
>>>>>>>> at core/sr_module.c:939
>>>>>>>> #21 0x0000000000542971 in init_mod_child (m=0x7f6af0f76708, rank=0)
>>>>>>>> at core/sr_module.c:939
>>>>>>>> #22 0x0000000000542971 in init_mod_child (m=0x7f6af0f76c08, rank=0)
>>>>>>>> at core/sr_module.c:939
>>>>>>>> #23 0x0000000000542971 in init_mod_child (m=0x7f6af0f770d0, rank=0)
>>>>>>>> at core/sr_module.c:939
>>>>>>>> #24 0x0000000000542971 in init_mod_child (m=0x7f6af0f77cf0, rank=0)
>>>>>>>> at core/sr_module.c:939
>>>>>>>> #25 0x0000000000542971 in init_mod_child (m=0x7f6af0f78808, rank=0)
>>>>>>>> at core/sr_module.c:939
>>>>>>>> #26 0x0000000000542971 in init_mod_child (m=0x7f6af0f78bd8, rank=0)
>>>>>>>> at core/sr_module.c:939
>>>>>>>> #27 0x0000000000542971 in init_mod_child (m=0x7f6af0f794c8, rank=0)
>>>>>>>> at core/sr_module.c:939
>>>>>>>> #28 0x0000000000542971 in init_mod_child (m=0x7f6af0f79920, rank=0)
>>>>>>>> at core/sr_module.c:939
>>>>>>>> #29 0x0000000000542971 in init_mod_child (m=0x7f6af0f7a330, rank=0)
>>>>>>>> at core/sr_module.c:939
>>>>>>>> #30 0x0000000000542971 in init_mod_child (m=0x7f6af0f7afd0, rank=0)
>>>>>>>> at core/sr_module.c:939
>>>>>>>> #31 0x0000000000542971 in init_mod_child (m=0x7f6af0f7bc80, rank=0)
>>>>>>>> at core/sr_module.c:939
>>>>>>>> #32 0x000000000054303d in init_child (rank=0) at
>>>>>>>> core/sr_module.c:970
>>>>>>>> #33 0x0000000000425399 in main_loop () at main.c:1701
>>>>>>>> #34 0x000000000042bd5c in main (argc=13, argv=0x7ffdddb31088) at
>>>>>>>> main.c:2638
>>>>>>>>
>>>>>>>> And this:
>>>>>>>> [New LWP 15804]
>>>>>>>> [Thread debugging using libthread_db enabled]
>>>>>>>> Using host libthread_db library
>>>>>>>> "/lib/x86_64-linux-gnu/libthread_db.so.1".
>>>>>>>> Core was generated by `/usr/local/kamailio/sbin/kamailio -P
>>>>>>>> /var/run/siptrunk.pid -f /usr/local/carrie'.
>>>>>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>>>>>> #0  0x00007f6af0b90b70 in free_cell_helper
>>>>>>>> (dead_cell=0x7f6ab0a6baa8, silent=0, fname=0x7f6af0c8f630 "timer.c",
>>>>>>>> fline=654) at h_table.c:230
>>>>>>>> 230 foo = tt->next;
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x00007f6af0b90b70 in free_cell_helper
>>>>>>>> (dead_cell=0x7f6ab0a6baa8, silent=0, fname=0x7f6af0c8f630 "timer.c",
>>>>>>>> fline=654) at h_table.c:230
>>>>>>>> #1  0x00007f6af0c24409 in wait_handler (ti=932640643,
>>>>>>>> wait_tl=0x7f6ab0a6bb28, data=0x7f6ab0a6baa8) at timer.c:654
>>>>>>>> #2  0x00000000004bb445 in timer_list_expire (t=932640643,
>>>>>>>> h=0x7f6ab03ad158, slow_l=0x7f6ab03ae480, slow_mark=271) at core/timer.c:874
>>>>>>>> #3  0x00000000004bb8ab in timer_handler () at core/timer.c:939
>>>>>>>> #4  0x00000000004bbd30 in timer_main () at core/timer.c:978
>>>>>>>> #5  0x00000000004250f9 in main_loop () at main.c:1691
>>>>>>>> #6  0x000000000042bd5c in main (argc=13, argv=0x7ffdddb31088) at
>>>>>>>> main.c:2638
>>>>>>>>
>>>>>>>> On Wed, Sep 5, 2018 at 3:13 PM Daniel-Constantin Mierla <
>>>>>>>> miconda at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> the backtrace doesn't show any hint about kamailio, only from
>>>>>>>>> closelog() up.
>>>>>>>>>
>>>>>>>>> It may be the core generated by the shutdown procedure. Have you
>>>>>>>>> enabled one core file per pid/process? If not, do that and reproduce
>>>>>>>>> the issue again; you may get two core files, one from the runtime
>>>>>>>>> issue and the other from the shutdown procedure, which is likely an
>>>>>>>>> effect of the first. The one from the runtime is the more relevant.
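>>>>>>>>>
>>>>>>>>> On Linux, one core file per process is typically achieved via the
>>>>>>>>> kernel core pattern - a sketch, assuming root access and an
>>>>>>>>> illustrative output path:
>>>>>>>>>
>>>>>>>>>     # name core files by executable, pid and timestamp
>>>>>>>>>     sysctl -w kernel.core_pattern=/tmp/core.%e.%p.%t
>>>>>>>>>     # or simply append the pid to the default core file name
>>>>>>>>>     sysctl -w kernel.core_uses_pid=1
>>>>>>>>>     # and make sure core dumps are not size-limited
>>>>>>>>>     ulimit -c unlimited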
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Daniel
>>>>>>>>>
>>>>>>>>> On 05.09.18 10:09, Jayesh Nambiar wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>> I'm using kamailio 5.1.5 with evapi. I have a node.js application
>>>>>>>>> connected to kamailio's evapi, to which I send events and from which
>>>>>>>>> I consume events that drive the routing. I have 8 evapi workers
>>>>>>>>> defined in the config.
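>>>>>>>>> The relevant part of the config is roughly like this (a simplified
>>>>>>>>> sketch, not the actual production config; the message format and the
>>>>>>>>> RESUME route are illustrative):
>>>>>>>>>
>>>>>>>>>     loadmodule "tm.so"
>>>>>>>>>     loadmodule "tmx.so"
>>>>>>>>>     loadmodule "evapi.so"
>>>>>>>>>
>>>>>>>>>     # where the node.js client connects
>>>>>>>>>     modparam("evapi", "bind_addr", "127.0.0.1:8060")
>>>>>>>>>     modparam("evapi", "workers", 8)
>>>>>>>>>
>>>>>>>>>     request_route {
>>>>>>>>>         # create the transaction, then suspend it while node.js
>>>>>>>>>         # makes the routing decision
>>>>>>>>>         if (!t_newtran()) exit;
>>>>>>>>>         evapi_async_relay("$T(id_index):$T(id_label):$ru");
>>>>>>>>>         exit;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     event_route[evapi:message-received] {
>>>>>>>>>         # node.js answers with "index:label:..."; resume the
>>>>>>>>>         # suspended transaction from there
>>>>>>>>>         $var(idx) = $(evbody{s.select,0,:});
>>>>>>>>>         $var(lbl) = $(evbody{s.select,1,:});
>>>>>>>>>         t_continue("$var(idx)", "$var(lbl)", "RESUME");
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     route[RESUME] {
>>>>>>>>>         t_relay();
>>>>>>>>>     }
>>>>>>>>>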
>>>>>>>>> The problem is that kamailio randomly crashes under high load. I'm
>>>>>>>>> assuming it is related to the evapi module, as the rest of the config
>>>>>>>>> is pretty straightforward. I could get a core file and here's the
>>>>>>>>> core dump:
>>>>>>>>> [New LWP 14042]
>>>>>>>>> [Thread debugging using libthread_db enabled]
>>>>>>>>> Using host libthread_db library
>>>>>>>>> "/lib/x86_64-linux-gnu/libthread_db.so.1".
>>>>>>>>> Core was generated by `/usr/local/kamailio/sbin/kamailio -P
>>>>>>>>> /var/run/siptrunk.pid -f /usr/local/carrie'.
>>>>>>>>> Program terminated with signal SIGABRT, Aborted.
>>>>>>>>> #0  0x00007f9995283428 in __GI_raise (sig=sig at entry=6) at
>>>>>>>>> ../sysdeps/unix/sysv/linux/raise.c:54
>>>>>>>>> 54    ../sysdeps/unix/sysv/linux/raise.c: No such file or
>>>>>>>>> directory.
>>>>>>>>> (gdb) bt
>>>>>>>>> #0  0x00007f9995283428 in __GI_raise (sig=sig at entry=6) at
>>>>>>>>> ../sysdeps/unix/sysv/linux/raise.c:54
>>>>>>>>> #1  0x00007f999528502a in __GI_abort () at abort.c:89
>>>>>>>>> #2  0x000000000041a029 in sig_alarm_abort (signo=14) at main.c:646
>>>>>>>>> #3  <signal handler called>
>>>>>>>>> #4  0x00007f999534f497 in __libc_cleanup_routine (f=<optimized
>>>>>>>>> out>) at ../sysdeps/nptl/libc-lockP.h:291
>>>>>>>>> #5  closelog () at ../misc/syslog.c:415
>>>>>>>>> #6  0x0000000000000000 in ?? ()
>>>>>>>>>
>>>>>>>>> Any help in this regard would allow me to identify the reason for
>>>>>>>>> the crash. Thanks for the support.
>>>>>>>>>
>>>>>>>>> - Jayesh
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Daniel-Constantin Mierla -- www.asipto.com
>>>>>>>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>>>>>>>> Kamailio World Conference -- www.kamailioworld.com
>>>>>>>>> Kamailio Advanced Training, Nov 12-14, 2018, in Berlin -- www.asipto.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>> --
>>>>>>>> Daniel-Constantin Mierla -- www.asipto.com
>>>>>>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>>>>>>> Kamailio World Conference -- www.kamailioworld.com
>>>>>>>> Kamailio Advanced Training, Nov 12-14, 2018, in Berlin -- www.asipto.com
>>>>>>>>
>>>>>>>>
>>>>>>> --
>>>>>>> Daniel-Constantin Mierla -- www.asipto.com
>>>>>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>>>>>> Kamailio World Conference -- www.kamailioworld.com
>>>>>>> Kamailio Advanced Training, Nov 12-14, 2018, in Berlin -- www.asipto.com
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Daniel-Constantin Mierla -- www.asipto.com
>>>>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>>>>> Kamailio World Conference -- www.kamailioworld.com
>>>>>> Kamailio Advanced Training, Nov 12-14, 2018, in Berlin -- www.asipto.com
>>>>>>
>>>>>>
>>>> --
>>>> Daniel-Constantin Mierla -- www.asipto.com
>>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>>> Kamailio World Conference -- www.kamailioworld.com
>>>> Kamailio Advanced Training, Nov 12-14, 2018, in Berlin -- www.asipto.com
>>>>
>>>>
>
> _______________________________________________
> Kamailio (SER) - Users Mailing List
> sr-users at lists.kamailio.org
> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>
>
> --
> Daniel-Constantin Mierla -- www.asipto.com
> www.twitter.com/miconda -- www.linkedin.com/in/miconda
> Kamailio World Conference -- www.kamailioworld.com
> Kamailio Advanced Training, Nov 12-14, 2018, in Berlin -- www.asipto.com
>
>