[sr-dev] [Redis module] Kamailio crashes when the connection to the Redis server is lost
Vicente Hernando
vhernando at systemonenoc.com
Thu Nov 28 12:36:25 CET 2013
Hello Nguyen,
I have uploaded the patch to the devel, 4.0, and 4.1 versions.
Regards,
Vicente.
On 11/28/2013 12:07 PM, Tuan Viet Nguyen wrote:
> Hi Vicente,
>
> It works now. Thank you for the patch. In which version will
> this be integrated?
>
>
> Regards,
>
>
> On Thu, Nov 28, 2013 at 11:36 AM, Vicente Hernando
> <vhernando at systemonenoc.com> wrote:
>
> Hello,
>
> could you test this patch and confirm the bug has disappeared?
>
> Thanks,
> Vicente.
>
>
> On 11/28/2013 11:10 AM, Tuan Viet Nguyen wrote:
>> Hi Vicente,
>>
>> Thank you for your quick reply.
>>
>> I'm ready to retest the patch.
>>
>> Regards,
>>
>>
>> On Thu, Nov 28, 2013 at 11:07 AM, Vicente Hernando
>> <vhernando at systemonenoc.com> wrote:
>>
>> Hello,
>>
>> I think you have discovered a bug I introduced in the use of
>> variadic functions.
>>
>> I am going to send a patch to correct it very soon.
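A note on this class of bug, as a minimal sketch and not necessarily what the patch does: in C, once a va_list has been handed to a function that walks it with va_arg (hiredis' redisvCommand() presumably does so while formatting the command), the caller's va_list is indeterminate and must not be passed on again. It has to be restarted with va_end()/va_start(), or duplicated beforehand with va_copy(). The helper format_twice() below is hypothetical, and vsnprintf() stands in for redisvCommand().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only: formats the same variadic
 * arguments twice, as a "try, then retry after reconnect" path would.
 * vsnprintf() stands in for redisvCommand(); both walk the va_list
 * they receive with va_arg. */
static int format_twice(char *first, char *second, size_t len,
                        const char *fmt, ...)
{
    va_list ap, ap_retry;

    assert(first != NULL && second != NULL && len > 0);

    va_start(ap, fmt);
    /* Duplicate the argument list BEFORE the first consumer walks it.
     * Passing 'ap' itself a second time would be undefined behaviour. */
    va_copy(ap_retry, ap);

    vsnprintf(first, len, fmt, ap);        /* consumes ap */
    vsnprintf(second, len, fmt, ap_retry); /* uses the untouched copy */

    /* Every va_start() and va_copy() needs a matching va_end(). */
    va_end(ap_retry);
    va_end(ap);

    /* 1 if both attempts produced the same text, 0 otherwise. */
    return strcmp(first, second) == 0;
}
```

For example, format_twice(a, b, sizeof(a), "PUBLISH %s %s", "notification", "info") fills both buffers with "PUBLISH notification info" and returns 1.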
>>
>>
>> Thanks,
>> Vicente.
>>
>>
>> On 11/28/2013 10:14 AM, Tuan Viet Nguyen wrote:
>>> Hello Vicente,
>>>
>>> Thank you for your reply; you'll find my answers below.
>>>
>>> On Thu, Nov 28, 2013 at 12:03 AM, Vicente Hernando
>>> <vhernando at systemonenoc.com> wrote:
>>>
>>> Hello,
>>>
>>> also, the full steps to crash Kamailio and reproduce the
>>> error would help.
>>>
>>>
>>> Here is the architecture:
>>>
>>> A <--> Asterisk <--> Kamailio 1 <---> Kamailio 2 <--- ISP ---> mobile
>>>
>>> Kamailio 1 and 2 are connected to a local Redis server.
>>> 1/ I restarted the Redis server.
>>> 2/ From the mobile, I called A and then cancelled the call. In
>>> the Kamailio 1 script, when a call is missed or fails, a message
>>> is sent to Redis. In this case, Kamailio crashes.
>>>
>>>
>>>
>>>
>>>
>>> On 11/27/2013 11:35 PM, Daniel-Constantin Mierla wrote:
>>>> Hello,
>>>>
>>>> can you give the full output of 'bt full' from gdb on
>>>> the core file? You gave only a partial list of the
>>>> frames, which is not enough to see the execution trace.
>>>>
>>>> Cheers,
>>>> Daniel
>>>>
>>>> On 11/27/13 6:52 PM, Tuan Viet Nguyen wrote:
>>>>> Hello,
>>>>>
>>>>> I tried shutting down the Redis server to test the
>>>>> behavior of Kamailio, and it crashed when a call was
>>>>> received and then cancelled.
>>>>>
>>>>> *1/ The Kamailio version is 4.0.4*
>>>>>
>>>>> *2/ Kamailio log*
>>>>> /usr/local/sbin/kamailio[25333]: ERROR: ndb_redis
>>>>> [redis_client.c:364]: redisc_exec(): Redis error:
>>>>> Server closed the connection
>>>>> /usr/local/sbin/kamailio[25361]: : <core>
>>>>> [pass_fd.c:293]: receive_fd(): ERROR: receive_fd: EOF
>>>>> on 13
>>>>> /usr/local/sbin/kamailio[25328]: ALERT: <core>
>>>>> [main.c:788]: handle_sigs(): child process 25333
>>>>> exited by a signal 11
>>>>> /usr/local/sbin/kamailio[25328]: ALERT: <core>
>>>>> [main.c:791]: handle_sigs(): core was generated
>>>>>
>>> I assume you disconnected the Redis server and did not
>>> reconnect it. Is that correct?
>>>
>>> In that case, this line is an error, but Kamailio should
>>> recover from it. I should probably log it as a warning
>>> instead of an error.
>>>
>>> /usr/local/sbin/kamailio[25333]: ERROR: ndb_redis
>>> [redis_client.c:364]: redisc_exec(): Redis error: Server
>>> closed the connection
>>>
>>>
>>> Yes, it was restarted.
>>>
>>>
>>>>> _*3/ Interesting information in the core*_
>>>>> #3 0x00007fc79412893d in redisvCommand (c=0x64657461,
>>>>> format=0x9 <Address 0x9 out of bounds>, ap=0x30,
>>>>> ap@entry=0x7fff0ff56aa8) at hiredis.c:1304
>>>>> No locals.
>>>>> #4 0x00007fc794341713 in redisc_exec
>>>>> (srv=srv@entry=0x7fff0ff56be0,
>>>>> res=res@entry=0x7fff0ff56c00,
>>>>> cmd=cmd@entry=0x7fff0ff56bf0) at redis_client.c:368
>>>>> rsrv = 0x7fc794565150
>>>>> rpl = 0x7fc7946fab70
>>>>> c = 0 '\000'
>>>>> ap = {{gp_offset = 48, fp_offset = 48,
>>>>> overflow_arg_area = 0x7fff0ff56bb0, reg_save_area =
>>>>> 0x7fff0ff56ac0}}
>>>>> __FUNCTION__ = "redisc_exec"
>>>>> #5 0x00007fc79433b781 in w_redis_cmd5 (msg=<optimized
>>>>> out>, ssrv=<optimized out>, scmd=<optimized out>,
>>>>> sargv1=<optimized out>, sargv2=0x7fc7946f7bf0
>>>>> "p\243_\224\307\177", sres=0x7fc7946f7c50 "
>>>>> \253_\224\307\177") at ndb_redis_mod.c:250
>>>>> s = {{s = 0x7fc7945fb300 "kamailio_redis", len
>>>>> = 14}, {s = 0x7fc7945f5f50 "PUBLISH %s %s", len = 13},
>>>>> {s = 0x7fc7945fab20 "r", len = 1}}
>>>>> arg1 = {s = 0x7fc7945f5f80 "notification", len
>>>>> = 12}
>>>>> arg2 = {
>>>>> s = 0x7fc794551c60 "info XXX"...,
>>>>> len = 212}
>>>>> c1 = 0 '\000'
>>>>> c2 = 0 '\000'
>>>>> __FUNCTION__ = "w_redis_cmd5"
>>>>>
>>>>>
>>> In the source code:
>>>
>>> rpl->rplRedis = redisvCommand(rsrv->ctxRedis, cmd->s, ap);
>>> if(rpl->rplRedis == NULL)
>>> {
>>>     /* null reply, reconnect and try again */
>>>     if(rsrv->ctxRedis->err)
>>>     {
>>>         LM_ERR("Redis error: %s\n", rsrv->ctxRedis->errstr);
>>>     }
>>>     if(redisc_reconnect_server(rsrv)==0)
>>>     {
>>>         rpl->rplRedis = redisvCommand(rsrv->ctxRedis, cmd->s, ap);
>>>     }
>>> }
>>>
>>> First, redisvCommand executes but returns NULL. Then it
>>> logs a Redis error.
>>>
>>> It tries to reconnect and apparently succeeds, since no
>>> further errors are logged.
>>>
>>> Then it executes redisvCommand again and crashes.
>>>
>>> If the server is down, it should not be able to connect, and
>>> so it should not execute redisvCommand again.
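Applied to the fragment quoted above, one possible shape of a fix is to take a va_copy() before the first redisvCommand() call and use the copy for the retry, since passing ap again after the first call is undefined behaviour. The sketch below keeps the same control flow but replaces hiredis and ndb_redis with hypothetical stand-ins (struct fake_conn, fake_vcommand(), fake_reconnect(), exec_with_retry()). fake_vcommand() formats the command before checking the connection, which is my understanding of how redisvCommand() behaves. It is an illustration under those assumptions, not the committed patch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the hiredis context; NOT the real API. */
struct fake_conn {
    int connected;    /* 0 simulates "Server closed the connection" */
    int reconnect_ok; /* whether a reconnect attempt will succeed */
};

/* Hypothetical stand-in for redisvCommand(): returns a heap-allocated
 * "reply" (here simply the formatted command) or NULL on failure. */
static char *fake_vcommand(struct fake_conn *c, const char *fmt, va_list ap)
{
    char *buf = malloc(256);

    if (buf == NULL)
        return NULL;
    /* Format first: this walks ap even if "sending" fails below. */
    vsnprintf(buf, 256, fmt, ap);
    if (!c->connected) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Hypothetical stand-in for redisc_reconnect_server(): 0 on success. */
static int fake_reconnect(struct fake_conn *c)
{
    if (!c->reconnect_ok)
        return -1;
    c->connected = 1;
    return 0;
}

/* Same control flow as the quoted redisc_exec() fragment, but the retry
 * uses a copy of the argument list taken before the first attempt. */
static char *exec_with_retry(struct fake_conn *c, const char *fmt, ...)
{
    va_list ap, ap_retry;
    char *rpl;

    assert(c != NULL && fmt != NULL);
    va_start(ap, fmt);
    va_copy(ap_retry, ap);

    rpl = fake_vcommand(c, fmt, ap); /* ap is indeterminate after this */
    if (rpl == NULL && fake_reconnect(c) == 0) {
        /* null reply: reconnected, so try again with the fresh copy */
        rpl = fake_vcommand(c, fmt, ap_retry);
    }

    va_end(ap_retry);
    va_end(ap);
    return rpl; /* caller frees; NULL if both attempts failed */
}
```

With a connection that is down but can reconnect (connected = 0, reconnect_ok = 1), exec_with_retry(&c, "PUBLISH %s %s", "notification", "info") returns "PUBLISH notification info" from the second attempt; if reconnecting fails, it returns NULL.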
>>>
>>>
>>> According to the core, we MUST be in this branch:
>>> *if(redisc_reconnect_server(rsrv)==0)*
>>> But I wonder how the first redisvCommand can succeed
>>> before the reconnection, since the Kamailio 1 <-> Redis
>>> connection had already been taken down. Is the whole Redis
>>> context still there when we first call redisvCommand?
>>>
>>>
>>>
>>> Maybe more information would give me more clues.
>>>
>>> Regards,
>>> Vicente.
>>>
>>>
>>> Thank you
>>> Regards,
>>>
>>>
>>>
>>>>> I found a post saying that this issue had been fixed,
>>>>> but it seems it still happens:
>>>>> http://www.mail-archive.com/search?l=sr-users@lists.sip-router.org&q=subject:%22Re%3A+%5BSR-Users%5D+ndb_redis+module+fails+after+a+while%22
>>>>>
>>>>> Do you have any idea?
>>>>> Thank you
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> sr-dev mailing list
>>>>> sr-dev at lists.sip-router.org
>>>>> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
>>>>
>>>> --
>>>> Daniel-Constantin Mierla - http://www.asipto.com
>>>> http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
>>>>
>>>
>>>
>>
>>
>
>