[sr-dev] [Redis module] Kamailio crashes in case of connection lost to redis server

Tuan Viet Nguyen ntvietvn at gmail.com
Thu Nov 28 13:31:58 CET 2013


OK, thank you for your help. I've updated the source.

Regards,


On Thu, Nov 28, 2013 at 12:36 PM, Vicente Hernando <
vhernando at systemonenoc.com> wrote:

>  Hello Nguyen,
>
> I have uploaded the patch in devel, 4.0, and 4.1 versions.
>
>
> Regards,
> Vicente.
>
>
> On 11/28/2013 12:07 PM, Tuan Viet Nguyen wrote:
>
>  Hi Vicente,
>
>  It works now. Thank you for the patch. In which version will this fix be
> integrated?
>
>
>  Regards,
>
>
>  On Thu, Nov 28, 2013 at 11:36 AM, Vicente Hernando <
> vhernando at systemonenoc.com> wrote:
>
>>  Hello,
>>
>> could you test this patch and confirm the bug has disappeared?
>>
>> Thanks,
>> Vicente.
>>
>>
>> On 11/28/2013 11:10 AM, Tuan Viet Nguyen wrote:
>>
>>   Hi Vicente,
>>
>>  Thank you for your quick reply.
>>
>>  I'm ready to retest the patch.
>>
>>  Regards,
>>
>>
>> On Thu, Nov 28, 2013 at 11:07 AM, Vicente Hernando <
>> vhernando at systemonenoc.com> wrote:
>>
>>>  Hello,
>>>
>>> I think you have discovered a bug I introduced when using variadic functions.
>>>
>>> I will send a patch to correct it very soon.
>>>
>>>
>>> Thanks,
>>> Vicente.
>>>
>>>
>>> On 11/28/2013 10:14 AM, Tuan Viet Nguyen wrote:
>>>
>>> Hello Vicente,
>>>
>>>  Thank you for your reply, you'll find my answer below
>>>
>>> On Thu, Nov 28, 2013 at 12:03 AM, Vicente Hernando <
>>> vhernando at systemonenoc.com> wrote:
>>>
>>>>  Hello,
>>>>
>>>> also full steps to crash kamailio and reproduce the error would be good.
>>>>
>>>
>>>  Here is the architecture
>>>
>>>  A <--> Asterisk <--> Kamailio 1 <---> kamailio2 <--- ISP---> mobile
>>>
>>>  Kamailio 1 & 2 are connected to a local redis server
>>>  1/ I restarted the redis server
>>>  2/ From the mobile I made a call to A, then cancelled it. In the script
>>> of kamailio1, if a call is missed or fails, it sends a message to
>>> redis. In this case, it crashes.
>>>
>>>
>>>
>>>>
>>>>
>>>> On 11/27/2013 11:35 PM, Daniel-Constantin Mierla wrote:
>>>>
>>>> Hello,
>>>>
>>>> can you give the full output of 'bt full' with gdb on the core file?
>>>> You gave only a partial list of the frames, which is not enough to see the
>>>> execution trace.
>>>>
>>>> Cheers,
>>>> Daniel
>>>>
>>>> On 11/27/13 6:52 PM, Tuan Viet Nguyen wrote:
>>>>
>>>>    Hello,
>>>>
>>>>  I tried shutting down the redis server to test the behavior of
>>>> kamailio, and it crashed when a call was received and then cancelled.
>>>>
>>>> *1/The kamailio version is 4.0.4*
>>>>
>>>>  *2/ Kamailio log *
>>>>  /usr/local/sbin/kamailio[25333]: ERROR: ndb_redis
>>>> [redis_client.c:364]: redisc_exec(): Redis error: Server closed the
>>>> connection
>>>> /usr/local/sbin/kamailio[25361]: : <core> [pass_fd.c:293]:
>>>> receive_fd(): ERROR: receive_fd: EOF on 13
>>>> /usr/local/sbin/kamailio[25328]: ALERT: <core> [main.c:788]:
>>>> handle_sigs(): child process 25333 exited by a signal 11
>>>> /usr/local/sbin/kamailio[25328]: ALERT: <core> [main.c:791]:
>>>> handle_sigs(): core was generated
>>>>
>>>>     I assume you disconnected the redis server and didn't reconnect it. Is
>>>> that correct?
>>>>
>>>> Then this line is an error, but it should recover from that. I probably
>>>> should log this as a warning instead of an error.
>>>>
>>>> /usr/local/sbin/kamailio[25333]: ERROR: ndb_redis [redis_client.c:364]:
>>>> redisc_exec(): Redis error: Server closed the connection
>>>>
>>>
>>>  Yes, it has been restarted
>>>
>>>
>>>>      *3/ Interesting information in the core*
>>>> #3  0x00007fc79412893d in redisvCommand (c=0x64657461, format=0x9
>>>> <Address 0x9 out of bounds>, ap=0x30, ap@entry=0x7fff0ff56aa8) at
>>>> hiredis.c:1304
>>>> No locals.
>>>> #4  0x00007fc794341713 in redisc_exec (srv=srv@entry=0x7fff0ff56be0,
>>>> res=res@entry=0x7fff0ff56c00, cmd=cmd@entry=0x7fff0ff56bf0) at
>>>> redis_client.c:368
>>>>         rsrv = 0x7fc794565150
>>>>         rpl = 0x7fc7946fab70
>>>>         c = 0 '\000'
>>>>         ap = {{gp_offset = 48, fp_offset = 48, overflow_arg_area =
>>>> 0x7fff0ff56bb0, reg_save_area = 0x7fff0ff56ac0}}
>>>>         __FUNCTION__ = "redisc_exec"
>>>> #5  0x00007fc79433b781 in w_redis_cmd5 (msg=<optimized out>,
>>>> ssrv=<optimized out>, scmd=<optimized out>, sargv1=<optimized out>,
>>>> sargv2=0x7fc7946f7bf0 "p\243_\224\307\177", sres=0x7fc7946f7c50 "
>>>> \253_\224\307\177") at ndb_redis_mod.c:250
>>>>         s = {{s = 0x7fc7945fb300 "kamailio_redis", len = 14}, {s =
>>>> 0x7fc7945f5f50 "PUBLISH %s %s", len = 13}, {s = 0x7fc7945fab20 "r", len =
>>>> 1}}
>>>>         arg1 = {s = 0x7fc7945f5f80 "notification", len = 12}
>>>>         arg2 = {
>>>>           s = 0x7fc794551c60 "info XXX"...,
>>>>           len = 212}
>>>>         c1 = 0 '\000'
>>>>         c2 = 0 '\000'
>>>>         __FUNCTION__ = "w_redis_cmd5"
>>>>
>>>>
>>>>     In the source code:
>>>>
>>>>     rpl->rplRedis = redisvCommand(rsrv->ctxRedis, cmd->s, ap );
>>>>     if(rpl->rplRedis == NULL)
>>>>     {
>>>>         /* null reply, reconnect and try again */
>>>>         if(rsrv->ctxRedis->err)
>>>>         {
>>>>             LM_ERR("Redis error: %s\n", rsrv->ctxRedis->errstr);
>>>>         }
>>>>         if(redisc_reconnect_server(rsrv)==0)
>>>>         {
>>>>             rpl->rplRedis = redisvCommand(rsrv->ctxRedis, cmd->s, ap);
>>>>         }
>>>>     }
>>>>
>>>> The first redisvCommand call executes but returns NULL. Then it logs a
>>>> redis error.
>>>>
>>>> It tries to reconnect and apparently succeeds, because it logs no
>>>> more errors.
>>>>
>>>> Then it executes redisvCommand again and crashes.
>>>>
>>>> If the server is down, it should not be able to connect, and so should
>>>> not execute redisvCommand again.
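
One possible reading of the snippet above, in line with the variadic-function
bug acknowledged elsewhere in this thread: the same `ap` is passed to
redisvCommand twice. In C, a va_list that has been handed to another function
is left in an indeterminate state, so a retry with it may read garbage
arguments. The sketch below is not the actual patch (which is not shown in
this thread). `fake_vcommand`, `exec_with_retry` and `attempts` are
hypothetical stand-ins for illustration only, and no hiredis call is used. It
only shows taking a `va_copy` before the first use so that the retry starts
from the first argument again.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counter: how many times the fake command was attempted. */
static int attempts = 0;

/* Hypothetical stand-in for a v*-style command function. It walks through
 * `ap` while formatting, and fails on the first attempt to simulate a
 * dropped connection. */
static int fake_vcommand(char *out, size_t outlen, const char *fmt, va_list ap)
{
    assert(out != NULL && outlen > 0);
    vsnprintf(out, outlen, fmt, ap);   /* consumes ap */
    attempts++;
    if (attempts == 1) {
        memset(out, 0, outlen);        /* pretend nothing was sent */
        return -1;
    }
    return 0;
}

/* Retry wrapper: copy the va_list BEFORE the first use, so the second
 * attempt gets a fresh list instead of the already-consumed one. */
static int exec_with_retry(char *out, size_t outlen, const char *fmt, ...)
{
    va_list ap, ap_retry;
    int ret;

    va_start(ap, fmt);
    va_copy(ap_retry, ap);

    ret = fake_vcommand(out, outlen, fmt, ap);
    if (ret != 0) {
        /* a reconnect would happen here in real code */
        ret = fake_vcommand(out, outlen, fmt, ap_retry);
    }

    va_end(ap_retry);
    va_end(ap);
    return ret;
}
```

Calling `exec_with_retry(buf, sizeof buf, "PUBLISH %s %s", "notification",
"info XXX")` makes two attempts and formats the command correctly on the
second one. Without the `va_copy`, the second attempt would reuse a consumed
va_list, which is undefined behavior and could plausibly explain a crash
inside redisvCommand after a successful reconnect.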
>>>>
>>>
>>>  According to the core, we MUST be in this case:
>>> *if(redisc_reconnect_server(rsrv)==0) *
>>> But I am wondering how the first redisvCommand call can run before the
>>> reconnection (the connection kamailio1 <-> redis has already been taken
>>> down). Is the redis context always still there when we first call
>>> redisvCommand?
>>>
>>>
>>>>
>>>>
>>>> Maybe I would get more clues with more information.
>>>>
>>>> Regards,
>>>> Vicente.
>>>>
>>>
>>>  Thank you
>>>  Regards,
>>>
>>>
>>>>
>>>>
>>>>     I found a post saying that this issue had been fixed, but it seems
>>>> that it still occurs.
>>>>
>>>> http://www.mail-archive.com/search?l=sr-users@lists.sip-router.org&q=subject:%22Re%3A+%5BSR-Users%5D+ndb_redis+module+fails+after+a+while%22
>>>>
>>>>  Do you have any idea?
>>>>  Thank you
>>>>
>>>>
>>>> _______________________________________________
>>>> sr-dev mailing list
>>>> sr-dev at lists.sip-router.org
>>>> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
>>>>
>>>>
>>>> --
>>>> Daniel-Constantin Mierla - http://www.asipto.com
>>>> http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>