[sr-dev] [Redis module] Kamailio crashes when the connection to the redis server is lost

Vicente Hernando vhernando at systemonenoc.com
Thu Nov 28 11:36:18 CET 2013


Hello,

Could you test this patch and confirm that the bug has disappeared?

Thanks,
Vicente.

On 11/28/2013 11:10 AM, Tuan Viet Nguyen wrote:
> Hi Vicente,
>
> Thank you for your quick reply.
>
> I'm ready to retest the patch.
>
> Regards,
>
>
> On Thu, Nov 28, 2013 at 11:07 AM, Vicente Hernando
> <vhernando at systemonenoc.com> wrote:
>
>     Hello,
>
>     I think you have found a bug I introduced in my use of variadic functions.
>
>     I am going to send a patch to fix it very soon.
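A minimal C sketch of this kind of variadic-function bug. It assumes that the problem is a `va_list` being read again after an earlier call has already consumed it. The patch itself is not shown in this archive. `format_twice()` is a hypothetical helper, not Kamailio or hiredis code. It shows one portable remedy from standard C: close the list with `va_end()` and restart it with `va_start()` before reading the arguments a second time.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not Kamailio code): formats the same variadic
 * arguments twice. vsnprintf() consumes the va_list it is given, so the
 * list is closed with va_end() and restarted with va_start() before the
 * second pass. Reusing the consumed list instead is undefined behavior. */
static int format_twice(char *first, char *second, size_t size,
                        const char *fmt, ...)
{
    va_list ap;
    int n1, n2;

    assert(first != NULL && second != NULL && size > 0);

    va_start(ap, fmt);
    n1 = vsnprintf(first, size, fmt, ap);   /* consumes ap */
    va_end(ap);

    va_start(ap, fmt);                      /* rewind to the first variadic argument */
    n2 = vsnprintf(second, size, fmt, ap);
    va_end(ap);

    /* both passes must have seen identical arguments */
    return (n1 == n2 && strcmp(first, second) == 0) ? n1 : -1;
}
```

Passing the already-consumed list to the second `vsnprintf()` would be undefined behavior. That could produce garbage argument values like those seen in the backtrace further down the thread.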
>
>
>     Thanks,
>     Vicente.
>
>
>     On 11/28/2013 10:14 AM, Tuan Viet Nguyen wrote:
>>     Hello Vicente,
>>
>>     Thank you for your reply; you'll find my answers below.
>>
>>     On Thu, Nov 28, 2013 at 12:03 AM, Vicente Hernando
>>     <vhernando at systemonenoc.com> wrote:
>>
>>         Hello,
>>
>>         The full steps to crash Kamailio and reproduce the error
>>         would also be helpful.
>>
>>
>>     Here is the architecture:
>>
>>     A <--> Asterisk <--> Kamailio 1 <---> Kamailio 2 <--- ISP ---> mobile
>>
>>     Kamailio 1 and 2 are connected to a local Redis server.
>>     1/ I restarted the Redis server.
>>     2/ From the mobile, I called A and then cancelled the call. In the
>>     kamailio1 script, if a call is missed or fails, a message is sent
>>     to Redis. In this case, it crashes.
>>
>>
>>
>>
>>
>>         On 11/27/2013 11:35 PM, Daniel-Constantin Mierla wrote:
>>>         Hello,
>>>
>>>         can you send the full output of 'bt full' from gdb on the
>>>         core file? You gave only a partial list of the frames, which
>>>         is not enough to see the execution trace.
>>>
>>>         Cheers,
>>>         Daniel
>>>
>>>         On 11/27/13 6:52 PM, Tuan Viet Nguyen wrote:
>>>>         Hello,
>>>>
>>>>         I tried shutting down the Redis server to test the behavior
>>>>         of Kamailio, and it crashed when a call was received and
>>>>         then cancelled.
>>>>
>>>>         *1/ The Kamailio version is 4.0.4*
>>>>
>>>>         *2/ Kamailio log*
>>>>         /usr/local/sbin/kamailio[25333]: ERROR: ndb_redis
>>>>         [redis_client.c:364]: redisc_exec(): Redis error: Server
>>>>         closed the connection
>>>>         /usr/local/sbin/kamailio[25361]: : <core> [pass_fd.c:293]:
>>>>         receive_fd(): ERROR: receive_fd: EOF on 13
>>>>         /usr/local/sbin/kamailio[25328]: ALERT: <core>
>>>>         [main.c:788]: handle_sigs(): child process 25333 exited by
>>>>         a signal 11
>>>>         /usr/local/sbin/kamailio[25328]: ALERT: <core>
>>>>         [main.c:791]: handle_sigs(): core was generated
>>>>
>>         I assume you disconnected the Redis server and did not
>>         reconnect it. Is that correct?
>>
>>         In that case, this line is logged as an error, but Kamailio
>>         should recover from it. I should probably log it as a warning
>>         instead of an error.
>>
>>         /usr/local/sbin/kamailio[25333]: ERROR: ndb_redis
>>         [redis_client.c:364]: redisc_exec(): Redis error: Server
>>         closed the connection
>>
>>
>>     Yes, it was restarted.
>>
>>
>>>>         _*3/ Interesting information in the core*_
>>>>         #3  0x00007fc79412893d in redisvCommand (c=0x64657461,
>>>>         format=0x9 <Address 0x9 out of bounds>, ap=0x30,
>>>>         ap@entry=0x7fff0ff56aa8) at hiredis.c:1304
>>>>         No locals.
>>>>         #4  0x00007fc794341713 in redisc_exec
>>>>         (srv=srv@entry=0x7fff0ff56be0,
>>>>         res=res@entry=0x7fff0ff56c00, cmd=cmd@entry=0x7fff0ff56bf0)
>>>>         at redis_client.c:368
>>>>                 rsrv = 0x7fc794565150
>>>>                 rpl = 0x7fc7946fab70
>>>>                 c = 0 '\000'
>>>>                 ap = {{gp_offset = 48, fp_offset = 48,
>>>>         overflow_arg_area = 0x7fff0ff56bb0, reg_save_area =
>>>>         0x7fff0ff56ac0}}
>>>>                 __FUNCTION__ = "redisc_exec"
>>>>         #5  0x00007fc79433b781 in w_redis_cmd5 (msg=<optimized
>>>>         out>, ssrv=<optimized out>, scmd=<optimized out>,
>>>>         sargv1=<optimized out>, sargv2=0x7fc7946f7bf0
>>>>         "p\243_\224\307\177", sres=0x7fc7946f7c50 "
>>>>         \253_\224\307\177") at ndb_redis_mod.c:250
>>>>                 s = {{s = 0x7fc7945fb300 "kamailio_redis", len =
>>>>         14}, {s = 0x7fc7945f5f50 "PUBLISH %s %s", len = 13}, {s =
>>>>         0x7fc7945fab20 "r", len = 1}}
>>>>                 arg1 = {s = 0x7fc7945f5f80 "notification", len = 12}
>>>>                 arg2 = {
>>>>                   s = 0x7fc794551c60 "info XXX"...,
>>>>                   len = 212}
>>>>                 c1 = 0 '\000'
>>>>                 c2 = 0 '\000'
>>>>                 __FUNCTION__ = "w_redis_cmd5"
>>>>
>>>>
>>         In the source code:
>>
>>             rpl->rplRedis = redisvCommand(rsrv->ctxRedis, cmd->s, ap );
>>             if(rpl->rplRedis == NULL)
>>             {
>>                 /* null reply, reconnect and try again */
>>                 if(rsrv->ctxRedis->err)
>>                 {
>>                     LM_ERR("Redis error: %s\n", rsrv->ctxRedis->errstr);
>>                 }
>>                 if(redisc_reconnect_server(rsrv)==0)
>>                 {
>>                     rpl->rplRedis = redisvCommand(rsrv->ctxRedis, cmd->s, ap);
>>                 }
>>             }
>>
>>         First, redisvCommand executes but returns NULL. Then it
>>         logs a Redis error.
>>
>>         It then tries to reconnect and apparently succeeds (??),
>>         since no further errors are shown.
>>
>>         Then it executes redisvCommand again and crashes.
>>
>>         If the server is down, it should not be able to connect, and
>>         so it should not execute redisvCommand again.
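The same idea can be applied to the retry logic quoted above. This sketch assumes that the crash comes from the second redisvCommand() call receiving an `ap` that the first call had already read. It is not necessarily what redis_redisc_exec.patch does.

`fake_ctx`, `fake_vcommand()`, `fake_reconnect()` and `exec_with_retry()` are hypothetical stand-ins for the hiredis context, redisvCommand(), redisc_reconnect_server() and redisc_exec(), so the sketch runs without a Redis server. Here the retry gets its own copy of the arguments, taken with `va_copy()` before the first attempt.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a hiredis context; NOT the real redisContext. */
typedef struct {
    int connected;   /* 0 = server closed the connection */
    int calls;       /* number of command attempts so far */
    char last[64];   /* last formatted command, kept for inspection */
} fake_ctx;

/* Stand-in for redisvCommand(): it formats the command first (reading the
 * arguments from ap, which leaves ap unusable for the caller) and only
 * then fails if the connection is gone. */
static void *fake_vcommand(fake_ctx *c, const char *fmt, va_list ap)
{
    c->calls++;
    vsnprintf(c->last, sizeof c->last, fmt, ap);
    return c->connected ? (void *)c->last : NULL;
}

/* Stand-in for redisc_reconnect_server(): always succeeds here. */
static int fake_reconnect(fake_ctx *c)
{
    memset(c->last, 0, sizeof c->last);  /* drop state from the old link */
    c->connected = 1;
    return 0;
}

/* Same shape as the quoted retry code, but the retry reads its own copy
 * of the arguments instead of the list the first attempt consumed. */
static void *exec_with_retry(fake_ctx *c, const char *fmt, ...)
{
    va_list ap, ap_retry;
    void *reply;

    assert(c != NULL && fmt != NULL);

    va_start(ap, fmt);
    va_copy(ap_retry, ap);               /* snapshot before first use */

    reply = fake_vcommand(c, fmt, ap);   /* ap is indeterminate after this */
    if (reply == NULL && fake_reconnect(c) == 0)
        reply = fake_vcommand(c, fmt, ap_retry);

    va_end(ap_retry);
    va_end(ap);
    return reply;
}
```

With `fake_ctx c = {0};`, the first attempt fails and the stub reconnect succeeds. The retry then formats `PUBLISH notification info` correctly from `ap_retry`.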
>>
>>
>>     According to the core, we MUST be in this branch:
>>     *if(redisc_reconnect_server(rsrv)==0)*
>>     But I wonder how the first redisvCommand can succeed before the
>>     reconnection (the kamailio1 <-> Redis connection had already been
>>     taken down). Is the whole Redis context still there when we
>>     first call redisvCommand?
>>
>>
>>
>>         Maybe I could find more clues with more information.
>>
>>         Regards,
>>         Vicente.
>>
>>
>>     Thank you
>>     Regards,
>>
>>
>>
>>>>         I found a post saying that this issue had been fixed, but
>>>>         it seems that it still happens...
>>>>         http://www.mail-archive.com/search?l=sr-users@lists.sip-router.org&q=subject:%22Re%3A+%5BSR-Users%5D+ndb_redis+module+fails+after+a+while%22
>>>>
>>>>         Do you have any idea?
>>>>         Thank you
>>>>
>>>>
>>>>         _______________________________________________
>>>>         sr-dev mailing list
>>>>         sr-dev at lists.sip-router.org
>>>>         http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
>>>
>>>         -- 
>>>         Daniel-Constantin Mierla - http://www.asipto.com
>>>         http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
>>>
>>
>>
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: redis_redisc_exec.patch
Type: text/x-patch
Size: 1044 bytes
Desc: not available
URL: <http://lists.sip-router.org/pipermail/sr-dev/attachments/20131128/2b53284a/attachment-0001.bin>

