[SR-Users] ndb_redis module fails after a while

Javier Gallart jgallartm at gmail.com
Fri Jan 13 12:27:51 CET 2012


Hi Daniel

Both values are null. I might have found something: apparently some of the
kamailio->redis sockets had been idle for a while and were being closed on
the redis end. This is the default redis config:
# Close the connection after a client is idle for N seconds (0 to disable)
timeout 600

I've set the timeout value to 0 to confirm whether this is actually the problem.
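
For anyone who wants to see the effect without redis or kamailio, here is a
minimal Python sketch of the suspected mechanism. It is not the ndb_redis
code: the tiny server, the function names and the "+OK" reply are all made up
for illustration. The server drops a client that stays silent longer than
idle_timeout (similar in spirit to the redis "timeout" setting), and the
client only notices when it next touches the socket.

```python
import socket
import threading
import time


def idle_closing_server(listener, idle_timeout):
    """Hypothetical stand-in for a server with an idle timeout (not redis itself)."""
    conn, _ = listener.accept()
    conn.settimeout(idle_timeout)
    try:
        while True:
            data = conn.recv(1024)
            if not data:
                break  # client closed first
            conn.sendall(b"+OK\r\n")
    except socket.timeout:
        pass  # client was idle too long; fall through and close it
    finally:
        conn.close()


def demo(idle_timeout=0.2):
    """Return (first_reply, peer_closed) for one client that goes idle."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    worker = threading.Thread(
        target=idle_closing_server, args=(listener, idle_timeout), daemon=True
    )
    worker.start()

    client = socket.create_connection(listener.getsockname())
    client.settimeout(5)
    client.sendall(b"PING\r\n")
    first_reply = client.recv(1024)

    # Stay idle longer than the server allows. The server closes its end,
    # which leaves the client socket in CLOSE_WAIT until the client reacts.
    time.sleep(idle_timeout * 3)

    # An empty read means the peer has closed the connection.
    peer_closed = client.recv(1024) == b""

    client.close()
    worker.join()
    listener.close()
    return first_reply, peer_closed


if __name__ == "__main__":
    print(demo())
```

A client that keeps writing to such a socket instead of checking it gets a
broken pipe, which would match the "signal 13" (SIGPIPE) line in the kamailio
log. Python ignores SIGPIPE by default, so there a write would raise an
exception rather than deliver the signal.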

In case it is useful for somebody, we've used lsof in repeat mode (-r)
to monitor the socket status:

server# lsof -i :6379 -r 5"m===%T==="  | grep -e == -e kamailio
===05:28:26===
kamailio  13365 kamailio    4u  IPv4  58622      0t0  TCP localhost:34994->localhost:6379 (ESTABLISHED)
kamailio  13366 kamailio    4u  IPv4  58626      0t0  TCP localhost:34995->localhost:6379 (ESTABLISHED)
kamailio  13367 kamailio    4u  IPv4  58628      0t0  TCP localhost:34996->localhost:6379 (ESTABLISHED)
kamailio  13368 kamailio    4u  IPv4  58632      0t0  TCP localhost:34997->localhost:6379 (ESTABLISHED)
kamailio  13369 kamailio    4u  IPv4  58649      0t0  TCP localhost:35000->localhost:6379 (ESTABLISHED)
kamailio  13370 kamailio    4u  IPv4  58661      0t0  TCP localhost:35003->localhost:6379 (ESTABLISHED)
kamailio  13376 kamailio   10u  IPv4  58710      0t0  TCP localhost:35013->localhost:6379 (ESTABLISHED)
kamailio  13377 kamailio    4u  IPv4  58705      0t0  TCP localhost:35012->localhost:6379 (ESTABLISHED)
kamailio  13378 kamailio    4u  IPv4  58695      0t0  TCP localhost:35008->localhost:6379 (ESTABLISHED)
kamailio  13381 kamailio    4u  IPv4  58691      0t0  TCP localhost:35006->localhost:6379 (ESTABLISHED)
kamailio  13382 kamailio    4u  IPv4  58693      0t0  TCP localhost:35007->localhost:6379 (ESTABLISHED)
===05:28:31===
kamailio  13365 kamailio    4u  IPv4  58622      0t0  TCP localhost:34994->localhost:6379 (ESTABLISHED)
kamailio  13366 kamailio    4u  IPv4  58626      0t0  TCP localhost:34995->localhost:6379 (CLOSE_WAIT)
kamailio  13367 kamailio    4u  IPv4  58628      0t0  TCP localhost:34996->localhost:6379 (ESTABLISHED)
kamailio  13368 kamailio    4u  IPv4  58632      0t0  TCP localhost:34997->localhost:6379 (CLOSE_WAIT)
kamailio  13369 kamailio    4u  IPv4  58649      0t0  TCP localhost:35000->localhost:6379 (CLOSE_WAIT)
kamailio  13370 kamailio    4u  IPv4  58661      0t0  TCP localhost:35003->localhost:6379 (CLOSE_WAIT)
kamailio  13376 kamailio   10u  IPv4  58710      0t0  TCP localhost:35013->localhost:6379 (CLOSE_WAIT)
kamailio  13377 kamailio    4u  IPv4  58705      0t0  TCP localhost:35012->localhost:6379 (CLOSE_WAIT)
kamailio  13378 kamailio    4u  IPv4  58695      0t0  TCP localhost:35008->localhost:6379 (CLOSE_WAIT)
kamailio  13381 kamailio    4u  IPv4  58691      0t0  TCP localhost:35006->localhost:6379 (CLOSE_WAIT)
kamailio  13382 kamailio    4u  IPv4  58693      0t0  TCP localhost:35007->localhost:6379 (CLOSE_WAIT)
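
Reading these snapshots by eye gets tedious over a long run, so here is a
small Python sketch that counts socket states per snapshot from saved output
like the above. The function name and the regular expressions are my own and
assume the "===HH:MM:SS===" marker used in the lsof command above. It only
looks at the "(STATE)" at the end of a line, so it copes with lines wrapped
by the mail client as well.

```python
import re
from collections import Counter

# Snapshot marker produced by the -r format above, e.g. ===05:28:31===
MARKER = re.compile(r"^===(\d{2}:\d{2}:\d{2})===$")
# TCP state printed by lsof at the end of a line, e.g. (CLOSE_WAIT)
STATE = re.compile(r"\(([A-Z_0-9]+)\)\s*$")


def count_states(text):
    """Return {timestamp: Counter({state: count})} for saved lsof output."""
    snapshots = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        marker = MARKER.match(line)
        if marker:
            current = snapshots.setdefault(marker.group(1), Counter())
            continue
        state = STATE.search(line)
        if state and current is not None:
            current[state.group(1)] += 1
    return snapshots


if __name__ == "__main__":
    import sys

    # Usage (file name is an example): python count_states.py < lsof.log
    for stamp, states in count_states(sys.stdin.read()).items():
        print(stamp, dict(states))
```

A sudden jump in CLOSE_WAIT between two snapshots, as in the 05:28:31 one
above, is the pattern to look for.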

Regards

Javi

On Fri, Jan 13, 2012 at 9:35 AM, Daniel-Constantin Mierla <miconda at gmail.com> wrote:

> Hello,
>
>
> On 1/13/12 8:00 AM, Javier Gallart wrote:
>
>> Hi all
>>
>> I have started making some tests with the ndb_redis module. So far we
>> have not stressed the module (no more than 5 HGET  commands/second at
>> maximum). It works well, but at some point it starts failing. The
>> failures are easily found because the logs always show this:
>> INFO: <core> [main.c:811]: INFO: signal 13 received
>>
> This is due to a broken connection. What do you get in the redis reply and
> info variables?
>
>
>> After that the redis value is always null. If I restart kamailio it
>> starts working again.
>> I've run kamailio with debug=4 but I haven't seen more useful
>> information. On the redis side, I could find nothing in the logs either,
>> the number of clients connected is always much less than the configured
>> maximum. Any idea?
>> On the other hand, if I restart redis we need to restart kamailio to
>> restore the connections. Is the reconnection to redis on the roadmap?
>>
>
> It should not be that complex, there is the code for initializing the
> connection, it should be reused for doing it again in case of failure.
>
> Cheers,
> Daniel
>
> --
> Daniel-Constantin Mierla -- http://www.asipto.com
> http://linkedin.com/in/miconda -- http://twitter.com/miconda
>
>