[SR-Users] ndb_redis module fails after a while
Daniel-Constantin Mierla
miconda at gmail.com
Fri Feb 17 10:47:18 CET 2012
Hello,
I made a patch for server reconnect (attached). I have no access to a
computer with the redis library installed at the moment, so hopefully it
compiles. If you can try it and report the result, that would be great;
I can commit it then.
Cheers,
Daniel
On 1/16/12 12:15 PM, Javier Gallart wrote:
> Hi Daniel
>
> On Mon, Jan 16, 2012 at 9:47 AM, Daniel-Constantin Mierla
> <miconda at gmail.com> wrote:
>
> Hello,
>
>
> On 1/13/12 12:27 PM, Javier Gallart wrote:
>> Hi Daniel
>>
>> both values are null.
> ok, that could be a hint that the connection is down and that we
> should try a reconnect...
>
>
>> I might have found something: apparently some of the sockets
>> kamailio->redis were inactive for a while and were being closed
>> in the redis end.
>
> Do you know if there is a keepalive mechanism that redis offers,
> or a command to set the timeout value from the client side?
>
>
> In the redis config file, the only related value I've seen is "timeout". If
> set to 0, the server never disconnects inactive clients. From the
> client perspective, what about this: http://www.redis.io/commands/ping
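
A rough sketch of how a client could decide when such a PING is due, in case it helps. Nothing here is ndb_redis or hiredis code: the struct, the function name and the "ping after half of the server timeout" rule are assumptions made up for illustration.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical bookkeeping for one client connection. */
struct keepalive_state {
    time_t last_activity;   /* when the last command was sent */
    unsigned int timeout;   /* server-side "timeout" in seconds, 0 = never */
};

/* Returns 1 when a PING should be sent now, 0 otherwise. Assumption: a PING
 * is due once the connection has been idle for half of the server timeout. */
int keepalive_due(const struct keepalive_state *ks, time_t now)
{
    assert(ks != NULL);

    if (ks->timeout == 0)
        return 0;   /* the server never drops idle clients */
    if (now < ks->last_activity)
        return 0;   /* clock went backwards; wait for the next check */
    return (now - ks->last_activity) >= (time_t)(ks->timeout / 2);
}
```

With the default "timeout 600" quoted further down, this would ask for a PING after 300 idle seconds. A timer or the next incoming request would have to call it; that part is left out.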
>
> Regards
>
> Javi
>
> Cheers,
> Daniel
>
>> This is redis default config:
>> # Close the connection after a client is idle for N seconds (0 to
>> disable)
>> timeout 600
>>
>> I've set the timeout value to 0 to confirm if this is actually
>> the problem.
>>
>> In case it might be useful for somebody, we've used lsof in
>> repeat mode to monitor the socket status:
>>
>> server# lsof -i :6379 -r 5"m===%T===" | grep -e == -e kamailio
>> ===05:28:26===
>> kamailio 13365 kamailio 4u IPv4 58622 0t0 TCP
>> localhost:34994->localhost:6379 (ESTABLISHED)
>> kamailio 13366 kamailio 4u IPv4 58626 0t0 TCP
>> localhost:34995->localhost:6379 (ESTABLISHED)
>> kamailio 13367 kamailio 4u IPv4 58628 0t0 TCP
>> localhost:34996->localhost:6379 (ESTABLISHED)
>> kamailio 13368 kamailio 4u IPv4 58632 0t0 TCP
>> localhost:34997->localhost:6379 (ESTABLISHED)
>> kamailio 13369 kamailio 4u IPv4 58649 0t0 TCP
>> localhost:35000->localhost:6379 (ESTABLISHED)
>> kamailio 13370 kamailio 4u IPv4 58661 0t0 TCP
>> localhost:35003->localhost:6379 (ESTABLISHED)
>> kamailio 13376 kamailio 10u IPv4 58710 0t0 TCP
>> localhost:35013->localhost:6379 (ESTABLISHED)
>> kamailio 13377 kamailio 4u IPv4 58705 0t0 TCP
>> localhost:35012->localhost:6379 (ESTABLISHED)
>> kamailio 13378 kamailio 4u IPv4 58695 0t0 TCP
>> localhost:35008->localhost:6379 (ESTABLISHED)
>> kamailio 13381 kamailio 4u IPv4 58691 0t0 TCP
>> localhost:35006->localhost:6379 (ESTABLISHED)
>> kamailio 13382 kamailio 4u IPv4 58693 0t0 TCP
>> localhost:35007->localhost:6379 (ESTABLISHED)
>> ===05:28:31===
>> kamailio 13365 kamailio 4u IPv4 58622 0t0 TCP
>> localhost:34994->localhost:6379 (ESTABLISHED)
>> kamailio 13366 kamailio 4u IPv4 58626 0t0 TCP
>> localhost:34995->localhost:6379 (CLOSE_WAIT)
>> kamailio 13367 kamailio 4u IPv4 58628 0t0 TCP
>> localhost:34996->localhost:6379 (ESTABLISHED)
>> kamailio 13368 kamailio 4u IPv4 58632 0t0 TCP
>> localhost:34997->localhost:6379 (CLOSE_WAIT)
>> kamailio 13369 kamailio 4u IPv4 58649 0t0 TCP
>> localhost:35000->localhost:6379 (CLOSE_WAIT)
>> kamailio 13370 kamailio 4u IPv4 58661 0t0 TCP
>> localhost:35003->localhost:6379 (CLOSE_WAIT)
>> kamailio 13376 kamailio 10u IPv4 58710 0t0 TCP
>> localhost:35013->localhost:6379 (CLOSE_WAIT)
>> kamailio 13377 kamailio 4u IPv4 58705 0t0 TCP
>> localhost:35012->localhost:6379 (CLOSE_WAIT)
>> kamailio 13378 kamailio 4u IPv4 58695 0t0 TCP
>> localhost:35008->localhost:6379 (CLOSE_WAIT)
>> kamailio 13381 kamailio 4u IPv4 58691 0t0 TCP
>> localhost:35006->localhost:6379 (CLOSE_WAIT)
>> kamailio 13382 kamailio 4u IPv4 58693 0t0 TCP
>> localhost:35007->localhost:6379 (CLOSE_WAIT)
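
For what it's worth, CLOSE_WAIT in the listing above means the remote end has closed while the local process still holds the socket open. Below is a minimal sketch of how a client could notice that before sending. The helper name is hypothetical and this is not what ndb_redis does; it relies only on standard socket calls (MSG_DONTWAIT is available on Linux and the BSDs).

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if the peer has closed the stream
 * socket fd, 0 if it still looks open, -1 on an unexpected error. */
int peer_has_closed(int fd)
{
    char byte;
    ssize_t n;

    assert(fd >= 0);

    /* Peek without blocking and without consuming pending data. */
    n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n == 0)
        return 1;   /* orderly shutdown by the peer (CLOSE_WAIT for TCP) */
    if (n > 0)
        return 0;   /* data is waiting, so the connection is alive */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;   /* nothing to read, but no close was seen either */
    return -1;
}
```

A check like this only reports what the kernel already knows; a peer that vanished without sending a FIN would still look open.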
>>
>> Regards
>>
>> Javi
>>
>> On Fri, Jan 13, 2012 at 9:35 AM, Daniel-Constantin Mierla
>> <miconda at gmail.com> wrote:
>>
>> Hello,
>>
>>
>> On 1/13/12 8:00 AM, Javier Gallart wrote:
>>
>> Hi all
>>
>> I have started running some tests with the ndb_redis
>> module. So far we have not stressed the module (no more
>> than 5 HGET commands/second). It works well at first,
>> but at some point it starts failing. The failures
>> are easy to spot because the logs always show this:
>> INFO: <core> [main.c:811]: INFO: signal 13 received
>>
>> this is due to a broken connection. What do you get in the redis
>> reply and info variables?
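
Some background on the log line: on Linux, signal 13 is SIGPIPE, which is raised when a process writes to a socket or pipe whose other end is already closed. The sketch below is not Kamailio code; it uses a plain pipe and a hypothetical helper name only to show that, once SIGPIPE is ignored, the same write fails with EPIPE, which a client can take as a cue to reconnect.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write msg to a pipe whose read end is closed and
 * return the resulting errno (0 if the write unexpectedly succeeds,
 * -1 if the pipe cannot be created). */
int write_to_closed_peer(const char *msg, size_t len)
{
    int fds[2];
    int err = 0;

    assert(msg != NULL);

    if (pipe(fds) < 0)
        return -1;

    /* Process-wide setting. Without it, the write below would deliver
     * SIGPIPE, whose default action terminates the process. */
    signal(SIGPIPE, SIG_IGN);

    close(fds[0]);              /* simulate the peer going away */
    if (write(fds[1], msg, len) < 0)
        err = errno;            /* EPIPE is expected here */

    close(fds[1]);
    return err;
}
```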
>>
>>
>> After that the redis value is always null. If I restart
>> kamailio it starts working again.
>> I've run kamailio with debug=4 but I haven't seen any more
>> useful information. On the redis side, I could find
>> nothing in the logs either, and the number of connected
>> clients is always much lower than the configured maximum.
>> Any idea?
>> On the other hand, if I restart redis we need to restart
>> kamailio to restore the connections. Is the reconnection
>> to redis on the roadmap?
>>
>>
>> It should not be that complex: the code for initializing the
>> connection already exists and can be reused to connect
>> again in case of failure.
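
The idea above (reuse the connect code, then retry the failed command once) could look roughly like this. Everything in the sketch is a stand-in invented for illustration: mock_ctx, mock_server and the mock_* functions are not the hiredis or ndb_redis API, they only mimic "a NULL reply means the connection is unusable".

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a client library context; NOT the hiredis API. */
struct mock_ctx {
    int alive;
};

struct mock_server {
    struct mock_ctx *ctx;
    int reconnects;         /* how many times mock_reconnect() ran */
};

/* Release the old context first, then open a fresh one. */
int mock_reconnect(struct mock_server *srv)
{
    assert(srv != NULL);

    free(srv->ctx);         /* free(NULL) is a no-op */
    srv->ctx = malloc(sizeof(*srv->ctx));
    if (srv->ctx == NULL)
        return -1;
    srv->ctx->alive = 1;
    srv->reconnects++;
    return 0;
}

/* Returns a reply string, or NULL when the connection is unusable. */
const char *mock_command(struct mock_server *srv, const char *cmd)
{
    if (srv->ctx == NULL || !srv->ctx->alive)
        return NULL;
    return cmd != NULL ? "OK" : NULL;
}

/* Run a command; on a NULL reply reconnect once and try again. */
const char *exec_with_retry(struct mock_server *srv, const char *cmd)
{
    const char *reply = mock_command(srv, cmd);

    if (reply == NULL && mock_reconnect(srv) == 0)
        reply = mock_command(srv, cmd);
    return reply;
}
```

Retrying only once keeps a dead server from stalling the caller; whether one attempt is enough in practice is a judgment call.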
>>
>> Cheers,
>> Daniel
>>
>> --
>> Daniel-Constantin Mierla -- http://www.asipto.com
>> http://linkedin.com/in/miconda -- http://twitter.com/miconda
>>
>>
>>
>>
>> _______________________________________________
>> SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
>> sr-users at lists.sip-router.org
>> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
>
> --
> Daniel-Constantin Mierla -- http://www.asipto.com
> http://linkedin.com/in/miconda -- http://twitter.com/miconda
>
>
>
>
--
Daniel-Constantin Mierla -- http://www.asipto.com
http://linkedin.com/in/miconda -- http://twitter.com/miconda
-------------- next part --------------
diff --git a/modules/ndb_redis/redis_client.c b/modules/ndb_redis/redis_client.c
index 9f4ffc4..f477f92 100644
--- a/modules/ndb_redis/redis_client.c
+++ b/modules/ndb_redis/redis_client.c
@@ -199,6 +199,62 @@ redisc_server_t *redisc_get_server(str *name)
/**
*
*/
+int redisc_reconnect_server(redisc_server_t *rsrv)
+{
+	char *addr;
+	unsigned int port, db;
+	param_t *pit = NULL;
+	struct timeval tv;
+
+	tv.tv_sec = 1;
+	tv.tv_usec = 0;
+	addr = "127.0.0.1";
+	port = 6379;
+	db = 0;
+	for (pit = rsrv->attrs; pit; pit=pit->next)
+	{
+		if(pit->name.len==4 && strncmp(pit->name.s, "addr", 4)==0) {
+			addr = pit->body.s;
+			addr[pit->body.len] = '\0';
+		} else if(pit->name.len==4 && strncmp(pit->name.s, "port", 4)==0) {
+			if(str2int(&pit->body, &port) < 0)
+				port = 6379;
+		} else if(pit->name.len==2 && strncmp(pit->name.s, "db", 2)==0) {
+			if(str2int(&pit->body, &db) < 0)
+				db = 0;
+		}
+	}
+	/* free the stale context before clearing the pointer */
+	if(rsrv->ctxRedis!=NULL) {
+		redisFree(rsrv->ctxRedis);
+		rsrv->ctxRedis = NULL;
+	}
+
+	rsrv->ctxRedis = redisConnectWithTimeout(addr, port, tv);
+	if(!rsrv->ctxRedis)
+		goto err;
+	if (rsrv->ctxRedis->err)
+		goto err2;
+	if (redisCommandNR(rsrv->ctxRedis, "PING"))
+		goto err2;
+	if (redisCommandNR(rsrv->ctxRedis, "SELECT %i", db))
+		goto err2;
+
+	return 0;
+
+err2:
+	LM_ERR("error communicating with redis server [%.*s] (%s:%d/%d): %s\n",
+		rsrv->sname->len, rsrv->sname->s, addr, port, db, rsrv->ctxRedis->errstr);
+	return -1;
+err:
+	LM_ERR("failed to connect to redis server [%.*s] (%s:%d/%d)\n",
+		rsrv->sname->len, rsrv->sname->s, addr, port, db);
+	return -1;
+}
+
+/**
+ *
+ */
int redisc_exec(str *srv, str *cmd, str *argv1, str *argv2, str *argv3,
str *res)
{
@@ -237,6 +293,14 @@ int redisc_exec(str *srv, str *cmd, str *argv1, str *argv2, str *argv3,
c = cmd->s[cmd->len];
cmd->s[cmd->len] = '\0';
rpl->rplRedis = redisCommand(rsrv->ctxRedis, cmd->s);
+	if(rpl->rplRedis == NULL)
+	{
+		/* null reply, reconnect and try again */
+		if(redisc_reconnect_server(rsrv)==0)
+		{
+			rpl->rplRedis = redisCommand(rsrv->ctxRedis, cmd->s);
+		}
+	}
cmd->s[cmd->len] = c;
return 0;
}