[SR-Users] Fwd: kamailio with evapi crashing on high volume

Jayesh Nambiar jayesh1017 at gmail.com
Wed Sep 26 09:25:34 CEST 2018


Hello Daniel,

Just a quick update on this. After running from the 5.1 branch, the server has
twice become unresponsive. It never crashed, and on forceful shutdown it only
generated a core from the shutdown routine. Here is what the syslog contained
while calls were being sent to kamailio but kamailio was not processing any of
them:

Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
freed pointer (0x7fcd23ce5648), called from tm: h_table.c:
free_cell_helper(235), first free tm: h_table.c: free_cell_helper(235) -
aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
freed pointer (0x7fcd24a38ea8), called from tm: h_table.c:
free_cell_helper(234), first free tm: h_table.c: free_cell_helper(234) -
aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
freed pointer (0x7fcd23ce5648), called from tm: h_table.c:
free_cell_helper(235), first free tm: h_table.c: free_cell_helper(235) -
aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
freed pointer (0x7fcd24a38ea8), called from tm: h_table.c:
free_cell_helper(234), first free tm: h_table.c: free_cell_helper(234) -
aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
freed pointer (0x7fcd23ce5648), called from tm: h_table.c:
free_cell_helper(235), first free tm: h_table.c: free_cell_helper(235) -
aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
freed pointer (0x7fcd24a38ea8), called from tm: h_table.c:
free_cell_helper(234), first free tm: h_table.c: free_cell_helper(234) -
aborting
Sep 22 16:09:04 triviamedia /usr/local/kamailio/sbin/kamailio[16006]:
CRITICAL: <core> [core/mem/q_malloc.c:502]: qm_free(): BUG: freeing already
freed pointer (0x7fcd23ce5648), called from tm: h_table.c:
free_cell_helper(235), first fre

And here's the "bt full" of the core file:
[New LWP 16003]
Cannot access memory at address 0x7fcd64e1a148
Cannot access memory at address 0x7fcd64e1a140
Failed to read a valid object file image from memory.
Core was generated by `/usr/local/kamailio/sbin/kamailio -P
/var/run/siptrunk.pid -f /usr/local/carrie'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fcd6443f428 in ?? ()
(gdb) bt full
#0  0x00007fcd6443f428 in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x7ffed40e9e68

Thanks,

- Jayesh

On Sat, Sep 15, 2018 at 7:09 PM Jayesh Nambiar <jayesh1017 at gmail.com> wrote:

> Hello Daniel,
> I have now git cloned from the 5.1 branch and restarted kamailio today. One
> observation: kamailio kept crashing whenever one particular customer with
> around 40-50 cps capacity started dialing. I will observe in the coming week
> and report any crashes accordingly.
>
> Thanks,
>
> - Jayesh
>
> On Wed, Sep 12, 2018 at 3:11 AM Jayesh Nambiar <jayesh1017 at gmail.com>
> wrote:
>
>> Thank you, Daniel. I will deploy from the 5.1 branch. I should be able to
>> update it in the next two days. Since this is a production system, I will
>> have to find a proper time slot to get this updated. Will let you know once
>> that is done.
>> Thanks again.
>>
>> - Jayesh
>>
>> On Tue, Sep 11, 2018 at 2:16 PM Daniel-Constantin Mierla <
>> miconda at gmail.com> wrote:
>>
>>> Hi Jayesh,
>>>
>>> I looked over the weekend and it seems like a race or a double free. I
>>> backported some patches to the tm module from master to the 5.1 branch (a
>>> counter to see if there is more than one free). Would you be able to deploy
>>> from branch 5.1 or the nightly builds for 5.1? With the next crash I can
>>> evaluate (rule it out or confirm it) whether it is a double free or not.
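
A minimal sketch of that kind of free-counter guard, with illustrative names
(cell_t, release_count, free_cell) rather than the actual tm module code:

#include <stdio.h>
#include <stdlib.h>

typedef struct cell {
    int release_count;    /* incremented on every free attempt */
    char *payload;
} cell_t;

/* Returns 0 on the first (legitimate) free, -1 on a detected double free. */
static int free_cell(cell_t *c, const char *fname, int fline)
{
    c->release_count++;
    if (c->release_count > 1) {
        /* second or later free: log it and bail out instead of freeing again */
        fprintf(stderr, "BUG: cell %p already freed, attempt %d, called from %s:%d\n",
                (void *)c, c->release_count, fname, fline);
        return -1;
    }
    free(c->payload);
    c->payload = NULL;
    /* the cell header itself stays allocated here (in tm the cell lives in
     * shared memory until the wait timer releases it), which is what makes
     * the counter usable for catching a second free */
    return 0;
}

int main(void)
{
    cell_t *c = calloc(1, sizeof(*c));
    if (c == NULL)
        return 1;
    c->payload = malloc(64);
    free_cell(c, __FILE__, __LINE__);
    free_cell(c, __FILE__, __LINE__);    /* triggers the BUG log */
    free(c);
    return 0;
}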
>>>
>>> I haven't written to you yet because I planned to dig more into libev
>>> (which is used by evapi), to see if it creates any threads that could
>>> interfere with kamailio globals, although I expect it runs the event loop in
>>> a single thread. Another work-related project took a lot of time these days,
>>> so I am not there yet ...
>>> Cheers,
>>> Daniel
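
For reference, the pattern visible in the evapi backtraces further down in this
thread (an ev_io watcher dispatched from ev_run()) is a single-threaded libev
event loop. A minimal standalone sketch of that pattern, not evapi's actual
code (compile with -lev; the callback name stdin_cb is made up):

#include <ev.h>
#include <stdio.h>
#include <unistd.h>

/* invoked from inside ev_run(), in the same (only) thread of the process */
static void stdin_cb(struct ev_loop *loop, ev_io *w, int revents)
{
    char buf[256];
    ssize_t n;

    (void)revents;
    n = read(w->fd, buf, sizeof(buf));
    if (n <= 0) {
        ev_io_stop(loop, w);
        ev_break(loop, EVBREAK_ALL);    /* leave the loop on EOF or error */
        return;
    }
    printf("read %zd bytes, still in the single event-loop thread\n", n);
}

int main(void)
{
    struct ev_loop *loop = EV_DEFAULT;  /* default loop; libev starts no threads */
    ev_io watcher;

    ev_io_init(&watcher, stdin_cb, 0 /* stdin fd */, EV_READ);
    ev_io_start(loop, &watcher);

    ev_run(loop, 0);                    /* blocks, invoking callbacks one at a time */
    return 0;
}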
>>>
>>> On 11.09.18 09:56, Jayesh Nambiar wrote:
>>>
>>> Hi Daniel,
>>> Just a gentle follow up on this. Could you find anything useful?
>>>
>>> Thanks,
>>>
>>> - Jayesh
>>>
>>>
>>>>>> On 07.09.18 09:56, Jayesh Nambiar wrote:
>>>>>>
>>>>>> Hi Daniel,
>>>>>> This happens randomly. It is not a testbed and is never reproducible on
>>>>>> the test server. I can still give you access if you wish to analyse the
>>>>>> core files to check what exactly is happening here.
>>>>>> It simply feels like the evapi socket gives up under heavy load. Earlier
>>>>>> I was running 5.0.2, and there kamailio would get stuck, not sending any
>>>>>> async events on the evapi socket. Not even the evapi:connection-closed
>>>>>> event got triggered. After I upgraded to the latest stable version
>>>>>> (5.1.5), it at least started crashing. Here are two core dumps put on
>>>>>> pastebin:
>>>>>> https://pastebin.com/nn6gJapm
>>>>>> https://pastebin.com/ph7b8vFH
>>>>>>
>>>>>> Thanks for all the support,
>>>>>>
>>>>>> - Jayesh
>>>>>>
>>>>>> On Fri, Sep 7, 2018 at 1:55 AM Daniel-Constantin Mierla <
>>>>>> miconda at gmail.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> are you able to reproduce it somehow, or does it just happen randomly?
>>>>>>>
>>>>>>> Is it on a testbed where I could get access to investigate the core
>>>>>>> files? If not, I will ask for more details from the cores over email; the
>>>>>>> first would be 'bt full' for both cores and 'p *dead_cell' for the second
>>>>>>> one.
>>>>>>> Cheers,
>>>>>>> Daniel
>>>>>>>
>>>>>>> On 05.09.18 17:47, Jayesh Nambiar wrote:
>>>>>>>
>>>>>>> Hi Daniel,
>>>>>>> I have got these core dumps. Let me know if I should be doing a bt
>>>>>>> full; I'll pastebin it and send. Thanks,
>>>>>>>
>>>>>>> Core was generated by `/usr/local/kamailio/sbin/kamailio -P
>>>>>>> /var/run/siptrunk.pid -f /usr/local/carrie'.
>>>>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>>>>> #0  0x0000000000505d30 in sip_msg_shm_clone (org_msg=0x7f6ab0d9f618,
>>>>>>> sip_msg_len=0x7ffdddb2e8bc, clone_lumps=1) at core/sip_msg_clone.c:491
>>>>>>> 491 LUMP_LIST_LEN(len, org_msg->add_rm);
>>>>>>> (gdb) bt
>>>>>>> #0  0x0000000000505d30 in sip_msg_shm_clone (org_msg=0x7f6ab0d9f618,
>>>>>>> sip_msg_len=0x7ffdddb2e8bc, clone_lumps=1) at core/sip_msg_clone.c:491
>>>>>>> #1  0x00007f6af0bdf68d in fake_req (shmem_msg=0x7f6ab0d9f618,
>>>>>>> extra_flags=0, uac=0x7f6ab1738980, len=0x7ffdddb2e8bc) at t_reply.c:854
>>>>>>> #2  0x00007f6af0c3aa27 in t_continue_helper (hash_index=58039,
>>>>>>> label=413633661, rtact=0x7f6af10500f0, cbname=0x0, cbparam=0x0) at
>>>>>>> t_suspend.c:293
>>>>>>> #3  0x00007f6af0c3eed4 in t_continue (hash_index=58039,
>>>>>>> label=413633661, route=0x7f6af10500f0) at t_suspend.c:583
>>>>>>> #4  0x00007f6aae4dd010 in w_t_continue (msg=0x7ffdddb2fa60,
>>>>>>> idx=0x7f6af1098e90 "8\306\t\361j\177", lbl=0x7f6af1098ff0
>>>>>>> "\240\264\t\361j\177", rtn=0x7f6af1099150 "0\275\t\361j\177") at
>>>>>>> tmx_mod.c:760
>>>>>>> #5  0x000000000045b477 in do_action (h=0x7ffdddb2f850,
>>>>>>> a=0x7f6af109ab38, msg=0x7ffdddb2fa60) at core/action.c:1085
>>>>>>> #6  0x0000000000467fd5 in run_actions (h=0x7ffdddb2f850,
>>>>>>> a=0x7f6af1096630, msg=0x7ffdddb2fa60) at core/action.c:1565
>>>>>>> #7  0x000000000045b234 in do_action (h=0x7ffdddb2f850,
>>>>>>> a=0x7f6af10a0f80, msg=0x7ffdddb2fa60) at core/action.c:1058
>>>>>>> #8  0x0000000000467fd5 in run_actions (h=0x7ffdddb2f850,
>>>>>>> a=0x7f6af10a0f80, msg=0x7ffdddb2fa60) at core/action.c:1565
>>>>>>> #9  0x0000000000468797 in run_top_route (a=0x7f6af10a0f80,
>>>>>>> msg=0x7ffdddb2fa60, c=0x0) at core/action.c:1654
>>>>>>> #10 0x00007f6aabe79370 in evapi_run_cfg_route (evenv=0x7ffdddb30250,
>>>>>>> rt=3, rtname=0x7f6aac08cb18 <_evapi_rts+56>) at evapi_dispatch.c:161
>>>>>>> #11 0x00007f6aabe7f271 in evapi_recv_client (loop=0x7f6aabe698e0,
>>>>>>> watcher=0x27af5e0, revents=1) at evapi_dispatch.c:467
>>>>>>> #12 0x00007f6aabc5fd73 in ev_invoke_pending () from
>>>>>>> /usr/lib/x86_64-linux-gnu/libev.so.4
>>>>>>> #13 0x00007f6aabc633de in ev_run () from
>>>>>>> /usr/lib/x86_64-linux-gnu/libev.so.4
>>>>>>> #14 0x00007f6aabe7867c in ev_loop (loop=0x7f6aabe698e0, flags=0) at
>>>>>>> /usr/include/ev.h:835
>>>>>>> #15 0x00007f6aabe83fc6 in evapi_run_dispatcher (laddr=0x7f6af0f72300
>>>>>>> "127.0.0.1", lport=8060) at evapi_dispatch.c:705
>>>>>>> #16 0x00007f6aabe6e262 in child_init (rank=0) at evapi_mod.c:213
>>>>>>> #17 0x0000000000542cad in init_mod_child (m=0x7f6af0f71b70, rank=0)
>>>>>>> at core/sr_module.c:943
>>>>>>> #18 0x0000000000542971 in init_mod_child (m=0x7f6af0f72968, rank=0)
>>>>>>> at core/sr_module.c:939
>>>>>>> #19 0x0000000000542971 in init_mod_child (m=0x7f6af0f73d38, rank=0)
>>>>>>> at core/sr_module.c:939
>>>>>>> #20 0x0000000000542971 in init_mod_child (m=0x7f6af0f74670, rank=0)
>>>>>>> at core/sr_module.c:939
>>>>>>> #21 0x0000000000542971 in init_mod_child (m=0x7f6af0f76708, rank=0)
>>>>>>> at core/sr_module.c:939
>>>>>>> #22 0x0000000000542971 in init_mod_child (m=0x7f6af0f76c08, rank=0)
>>>>>>> at core/sr_module.c:939
>>>>>>> #23 0x0000000000542971 in init_mod_child (m=0x7f6af0f770d0, rank=0)
>>>>>>> at core/sr_module.c:939
>>>>>>> #24 0x0000000000542971 in init_mod_child (m=0x7f6af0f77cf0, rank=0)
>>>>>>> at core/sr_module.c:939
>>>>>>> #25 0x0000000000542971 in init_mod_child (m=0x7f6af0f78808, rank=0)
>>>>>>> at core/sr_module.c:939
>>>>>>> #26 0x0000000000542971 in init_mod_child (m=0x7f6af0f78bd8, rank=0)
>>>>>>> at core/sr_module.c:939
>>>>>>> #27 0x0000000000542971 in init_mod_child (m=0x7f6af0f794c8, rank=0)
>>>>>>> at core/sr_module.c:939
>>>>>>> #28 0x0000000000542971 in init_mod_child (m=0x7f6af0f79920, rank=0)
>>>>>>> at core/sr_module.c:939
>>>>>>> #29 0x0000000000542971 in init_mod_child (m=0x7f6af0f7a330, rank=0)
>>>>>>> at core/sr_module.c:939
>>>>>>> #30 0x0000000000542971 in init_mod_child (m=0x7f6af0f7afd0, rank=0)
>>>>>>> at core/sr_module.c:939
>>>>>>> #31 0x0000000000542971 in init_mod_child (m=0x7f6af0f7bc80, rank=0)
>>>>>>> at core/sr_module.c:939
>>>>>>> #32 0x000000000054303d in init_child (rank=0) at core/sr_module.c:970
>>>>>>> #33 0x0000000000425399 in main_loop () at main.c:1701
>>>>>>> #34 0x000000000042bd5c in main (argc=13, argv=0x7ffdddb31088) at
>>>>>>> main.c:2638
>>>>>>>
>>>>>>> And this:
>>>>>>> [New LWP 15804]
>>>>>>> [Thread debugging using libthread_db enabled]
>>>>>>> Using host libthread_db library
>>>>>>> "/lib/x86_64-linux-gnu/libthread_db.so.1".
>>>>>>> Core was generated by `/usr/local/kamailio/sbin/kamailio -P
>>>>>>> /var/run/siptrunk.pid -f /usr/local/carrie'.
>>>>>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>>>>>> #0  0x00007f6af0b90b70 in free_cell_helper
>>>>>>> (dead_cell=0x7f6ab0a6baa8, silent=0, fname=0x7f6af0c8f630 "timer.c",
>>>>>>> fline=654) at h_table.c:230
>>>>>>> 230 foo = tt->next;
>>>>>>> (gdb) bt
>>>>>>> #0  0x00007f6af0b90b70 in free_cell_helper
>>>>>>> (dead_cell=0x7f6ab0a6baa8, silent=0, fname=0x7f6af0c8f630 "timer.c",
>>>>>>> fline=654) at h_table.c:230
>>>>>>> #1  0x00007f6af0c24409 in wait_handler (ti=932640643,
>>>>>>> wait_tl=0x7f6ab0a6bb28, data=0x7f6ab0a6baa8) at timer.c:654
>>>>>>> #2  0x00000000004bb445 in timer_list_expire (t=932640643,
>>>>>>> h=0x7f6ab03ad158, slow_l=0x7f6ab03ae480, slow_mark=271) at core/timer.c:874
>>>>>>> #3  0x00000000004bb8ab in timer_handler () at core/timer.c:939
>>>>>>> #4  0x00000000004bbd30 in timer_main () at core/timer.c:978
>>>>>>> #5  0x00000000004250f9 in main_loop () at main.c:1691
>>>>>>> #6  0x000000000042bd5c in main (argc=13, argv=0x7ffdddb31088) at
>>>>>>> main.c:2638
>>>>>>>
>>>>>>> On Wed, Sep 5, 2018 at 3:13 PM Daniel-Constantin Mierla <
>>>>>>> miconda at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> the backtrace doesn't show any hint about kamailio, only from
>>>>>>>> closelog() up.
>>>>>>>>
>>>>>>>> It may be the core generated by the shutdown procedure. Have you
>>>>>>>> enabled one core file per pid/process? If not, do it and reproduce the
>>>>>>>> issue again; you may get two core files, one from the runtime issue and
>>>>>>>> the other from the shutdown procedure, which is likely just an effect of
>>>>>>>> the first. The one from the runtime is the more relevant.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Daniel
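
The mechanics behind that advice, sketched as plain C rather than anything
Kamailio-specific: core dumps have to be allowed via RLIMIT_CORE, and getting
one core file per process requires a %p in the kernel's core_pattern (normally
set with sysctl kernel.core_pattern):

#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

int main(void)
{
    /* allow unlimited core size for this process and anything it forks */
    struct rlimit rl = { RLIM_INFINITY, RLIM_INFINITY };
    if (setrlimit(RLIMIT_CORE, &rl) != 0)
        perror("setrlimit(RLIMIT_CORE)");

    /* one core file per pid needs %p in the kernel core pattern,
     * e.g. kernel.core_pattern = core.%e.%p */
    char pattern[256] = "";
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (f != NULL) {
        if (fgets(pattern, sizeof(pattern), f) != NULL)
            printf("core_pattern: %s", pattern);
        fclose(f);
        if (strstr(pattern, "%p") == NULL)
            printf("note: no %%p in core_pattern, cores may overwrite each other\n");
    }
    return 0;
}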
>>>>>>>>
>>>>>>>> On 05.09.18 10:09, Jayesh Nambiar wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>> I'm using kamailio 5.1.5 with evapi. I have a node.js application
>>>>>>>> connected to kamailio's evapi socket, to which I send events and from
>>>>>>>> which I also consume events that drive the routing. I have 8 evapi
>>>>>>>> workers defined in the config.
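
A rough sketch of what such an external evapi client does, written here in C
rather than node.js; the address and port come from the evapi_run_dispatcher()
frame in the backtrace further up (127.0.0.1:8060), and it assumes the module's
default netstring framing:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* address and port as seen in the evapi_run_dispatcher() backtrace frame;
     * adjust to the actual bind_addr setting of the evapi module */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8060);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        perror("connect");
        return 1;
    }

    /* with the default netstring_format=1, each relayed event arrives framed
     * as "<len>:<payload>," -- this sketch just dumps the raw stream */
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf) - 1)) > 0) {
        buf[n] = '\0';
        printf("evapi data: %s\n", buf);
    }
    close(fd);
    return 0;
}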
>>>>>>>> The problem is that kamailio randomly crashes on high load. I'm
>>>>>>>> assuming it is related to the evapi module, as the rest of the config is
>>>>>>>> pretty straightforward. I could get a core file and here's the core dump:
>>>>>>>> [New LWP 14042]
>>>>>>>> [Thread debugging using libthread_db enabled]
>>>>>>>> Using host libthread_db library
>>>>>>>> "/lib/x86_64-linux-gnu/libthread_db.so.1".
>>>>>>>> Core was generated by `/usr/local/kamailio/sbin/kamailio -P
>>>>>>>> /var/run/siptrunk.pid -f /usr/local/carrie'.
>>>>>>>> Program terminated with signal SIGABRT, Aborted.
>>>>>>>> #0  0x00007f9995283428 in __GI_raise (sig=sig@entry=6) at
>>>>>>>> ../sysdeps/unix/sysv/linux/raise.c:54
>>>>>>>> 54    ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x00007f9995283428 in __GI_raise (sig=sig@entry=6) at
>>>>>>>> ../sysdeps/unix/sysv/linux/raise.c:54
>>>>>>>> #1  0x00007f999528502a in __GI_abort () at abort.c:89
>>>>>>>> #2  0x000000000041a029 in sig_alarm_abort (signo=14) at main.c:646
>>>>>>>> #3  <signal handler called>
>>>>>>>> #4  0x00007f999534f497 in __libc_cleanup_routine (f=<optimized
>>>>>>>> out>) at ../sysdeps/nptl/libc-lockP.h:291
>>>>>>>> #5  closelog () at ../misc/syslog.c:415
>>>>>>>> #6  0x0000000000000000 in ?? ()
>>>>>>>>
>>>>>>>> Any help in this regard would allow me to identify the reason for the
>>>>>>>> crash. Thanks for the support.
>>>>>>>>
>>>>>>>> - Jayesh
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Kamailio (SER) - Users Mailing List
>>>>>>>> sr-users at lists.kamailio.org
>>>>>>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Daniel-Constantin Mierla -- www.asipto.com
>>>>>>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>>>>>>> Kamailio World Conference -- www.kamailioworld.com
>>>>>>>> Kamailio Advanced Training, Nov 12-14, 2018, in Berlin -- www.asipto.com
>>>>>>>>
>>>>>>>>
>>>>>>> --
>>>>>>> Daniel-Constantin Mierla -- www.asipto.com
>>>>>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>>>>>> Kamailio World Conference -- www.kamailioworld.com
>>>>>>> Kamailio Advanced Training, Nov 12-14, 2018, in Berlin -- www.asipto.com
>>>>>>>
>>>>>>>
>>> --
>>> Daniel-Constantin Mierla -- www.asipto.com
>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>> Kamailio World Conference -- www.kamailioworld.com
>>> Kamailio Advanced Training, Nov 12-14, 2018, in Berlin -- www.asipto.com
>>>
>>>