[SR-Users] ndb_redis module fails after a while

Daniel-Constantin Mierla miconda at gmail.com
Fri Feb 17 10:47:18 CET 2012


Hello,

I made a patch for server reconnect. I had no access to a computer 
with the redis library installed at the moment, so hopefully it compiles. If you 
can try it and report the result, that would be great; I can commit it then.

Cheers,
Daniel

On 1/16/12 12:15 PM, Javier Gallart wrote:
> Hi Daniel
>
> On Mon, Jan 16, 2012 at 9:47 AM, Daniel-Constantin Mierla 
> <miconda at gmail.com <mailto:miconda at gmail.com>> wrote:
>
>     Hello,
>
>
>     On 1/13/12 12:27 PM, Javier Gallart wrote:
>>     Hi Daniel
>>
>>     both values are null.
>     OK, that could be a hint that the connection is down and that a
>     reconnect should be tried...
>
>
>>     I might have found something: apparently some of the
>>     kamailio->redis sockets were inactive for a while and were being
>>     closed at the redis end.
>
>     Do you know if redis offers a keepalive mechanism, or a command
>     to set the timeout value from the client side?
>
>
> In the redis config file the only related value I've seen is "timeout". If 
> set to 0, the server never disconnects inactive clients. From the 
> client side, what about this: http://www.redis.io/commands/ping
>
> Regards
>
> Javi
>
>     Cheers,
>     Daniel
>
>>     This is the redis default config:
>>     # Close the connection after a client is idle for N seconds (0 to disable)
>>     timeout 600
>>
>>     I've set the timeout value to 0 to confirm if this is actually
>>     the problem.
>>
>>     In case it is useful to somebody, we've used lsof in
>>     repeat mode to monitor the socket status:
>>
>>     server# lsof -i :6379 -r 5"m===%T==="  | grep -e == -e kamailio
>>     ===05:28:26===
>>     kamailio  13365 kamailio    4u  IPv4  58622      0t0  TCP localhost:34994->localhost:6379 (ESTABLISHED)
>>     kamailio  13366 kamailio    4u  IPv4  58626      0t0  TCP localhost:34995->localhost:6379 (ESTABLISHED)
>>     kamailio  13367 kamailio    4u  IPv4  58628      0t0  TCP localhost:34996->localhost:6379 (ESTABLISHED)
>>     kamailio  13368 kamailio    4u  IPv4  58632      0t0  TCP localhost:34997->localhost:6379 (ESTABLISHED)
>>     kamailio  13369 kamailio    4u  IPv4  58649      0t0  TCP localhost:35000->localhost:6379 (ESTABLISHED)
>>     kamailio  13370 kamailio    4u  IPv4  58661      0t0  TCP localhost:35003->localhost:6379 (ESTABLISHED)
>>     kamailio  13376 kamailio   10u  IPv4  58710      0t0  TCP localhost:35013->localhost:6379 (ESTABLISHED)
>>     kamailio  13377 kamailio    4u  IPv4  58705      0t0  TCP localhost:35012->localhost:6379 (ESTABLISHED)
>>     kamailio  13378 kamailio    4u  IPv4  58695      0t0  TCP localhost:35008->localhost:6379 (ESTABLISHED)
>>     kamailio  13381 kamailio    4u  IPv4  58691      0t0  TCP localhost:35006->localhost:6379 (ESTABLISHED)
>>     kamailio  13382 kamailio    4u  IPv4  58693      0t0  TCP localhost:35007->localhost:6379 (ESTABLISHED)
>>     ===05:28:31===
>>     kamailio  13365 kamailio    4u  IPv4  58622      0t0  TCP localhost:34994->localhost:6379 (ESTABLISHED)
>>     kamailio  13366 kamailio    4u  IPv4  58626      0t0  TCP localhost:34995->localhost:6379 (CLOSE_WAIT)
>>     kamailio  13367 kamailio    4u  IPv4  58628      0t0  TCP localhost:34996->localhost:6379 (ESTABLISHED)
>>     kamailio  13368 kamailio    4u  IPv4  58632      0t0  TCP localhost:34997->localhost:6379 (CLOSE_WAIT)
>>     kamailio  13369 kamailio    4u  IPv4  58649      0t0  TCP localhost:35000->localhost:6379 (CLOSE_WAIT)
>>     kamailio  13370 kamailio    4u  IPv4  58661      0t0  TCP localhost:35003->localhost:6379 (CLOSE_WAIT)
>>     kamailio  13376 kamailio   10u  IPv4  58710      0t0  TCP localhost:35013->localhost:6379 (CLOSE_WAIT)
>>     kamailio  13377 kamailio    4u  IPv4  58705      0t0  TCP localhost:35012->localhost:6379 (CLOSE_WAIT)
>>     kamailio  13378 kamailio    4u  IPv4  58695      0t0  TCP localhost:35008->localhost:6379 (CLOSE_WAIT)
>>     kamailio  13381 kamailio    4u  IPv4  58691      0t0  TCP localhost:35006->localhost:6379 (CLOSE_WAIT)
>>     kamailio  13382 kamailio    4u  IPv4  58693      0t0  TCP localhost:35007->localhost:6379 (CLOSE_WAIT)
>>
>>     Regards
>>
>>     Javi
>>
>>     On Fri, Jan 13, 2012 at 9:35 AM, Daniel-Constantin Mierla
>>     <miconda at gmail.com <mailto:miconda at gmail.com>> wrote:
>>
>>         Hello,
>>
>>
>>         On 1/13/12 8:00 AM, Javier Gallart wrote:
>>
>>             Hi all
>>
>>             I have started making some tests with the ndb_redis
>>             module. So far we have not stressed the module (no more
>>             than 5 HGET commands/second at most). It works well,
>>             but at some point it starts failing. The failures
>>             are easy to spot because the logs always show this:
>>             INFO: <core> [main.c:811]: INFO: signal 13 received
>>
>>         This is due to a broken connection. What do you get in the
>>         redis reply and info variables?
>>
>>
>>             After that the redis value is always null. If I restart
>>             kamailio it starts working again.
>>             I've run kamailio with debug=4 but I haven't seen any more
>>             useful information. On the redis side I could find
>>             nothing in the logs either, and the number of clients
>>             connected is always much lower than the configured maximum.
>>             Any idea?
>>             On the other hand, if I restart redis we need to restart
>>             kamailio to restore the connections. Is reconnection
>>             to redis on the roadmap?
>>
>>
>>         It should not be that complex: there is already code for
>>         initializing the connection, and it could be reused to do it
>>         again after a failure.
>>
>>         Cheers,
>>         Daniel
>>
>>         -- 
>>         Daniel-Constantin Mierla -- http://www.asipto.com
>>         http://linkedin.com/in/miconda -- http://twitter.com/miconda
>>
>>
>>
>>
>>     _______________________________________________
>>     SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
>>     sr-users at lists.sip-router.org  <mailto:sr-users at lists.sip-router.org>
>>     http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
>


-------------- next part --------------
diff --git a/modules/ndb_redis/redis_client.c b/modules/ndb_redis/redis_client.c
index 9f4ffc4..f477f92 100644
--- a/modules/ndb_redis/redis_client.c
+++ b/modules/ndb_redis/redis_client.c
@@ -199,6 +199,62 @@ redisc_server_t *redisc_get_server(str *name)
 /**
  *
  */
+int redisc_reconnect_server(redisc_server_t *rsrv)
+{
+	char *addr;
+	unsigned int port, db;
+	param_t *pit = NULL;
+	struct timeval tv;
+
+	tv.tv_sec = 1;
+	tv.tv_usec = 0;
+	addr = "127.0.0.1";
+	port = 6379;
+	db = 0;
+	for (pit = rsrv->attrs; pit; pit=pit->next)
+	{
+		if(pit->name.len==4 && strncmp(pit->name.s, "addr", 4)==0) {
+			addr = pit->body.s;
+			addr[pit->body.len] = '\0';
+		} else if(pit->name.len==4 && strncmp(pit->name.s, "port", 4)==0) {
+			if(str2int(&pit->body, &port) < 0)
+				port = 6379;
+		} else if(pit->name.len==2 && strncmp(pit->name.s, "db", 2)==0) {
+			if(str2int(&pit->body, &db) < 0)
+				db = 0;
+		}
+	}
+	if(rsrv->ctxRedis!=NULL) {
+		/* free the old context before clearing the pointer */
+		redisFree(rsrv->ctxRedis);
+		rsrv->ctxRedis = NULL;
+	}
+
+	rsrv->ctxRedis = redisConnectWithTimeout(addr, port, tv);
+	if(!rsrv->ctxRedis)
+		goto err;
+	if (rsrv->ctxRedis->err)
+		goto err2;
+	if (redisCommandNR(rsrv->ctxRedis, "PING"))
+		goto err2;
+	if (redisCommandNR(rsrv->ctxRedis, "SELECT %i", db))
+		goto err2;
+
+	return 0;
+
+err2:
+	LM_ERR("error communicating with redis server [%.*s] (%s:%d/%d): %s\n",
+		rsrv->sname->len, rsrv->sname->s, addr, port, db, rsrv->ctxRedis->errstr);
+	return -1;
+err:
+	LM_ERR("failed to connect to redis server [%.*s] (%s:%d/%d)\n",
+		rsrv->sname->len, rsrv->sname->s, addr, port, db);
+	return -1;
+}
+
+/**
+ *
+ */
 int redisc_exec(str *srv, str *cmd, str *argv1, str *argv2, str *argv3,
 		str *res)
 {
@@ -237,6 +293,14 @@ int redisc_exec(str *srv, str *cmd, str *argv1, str *argv2, str *argv3,
 	c = cmd->s[cmd->len];
 	cmd->s[cmd->len] = '\0';
 	rpl->rplRedis = redisCommand(rsrv->ctxRedis, cmd->s);
+	if(rpl->rplRedis == NULL)
+	{
+		/* null reply, reconnect and try again */
+		if(redisc_reconnect_server(rsrv)==0)
+		{
+			rpl->rplRedis = redisCommand(rsrv->ctxRedis, cmd->s);
+		}
+	}
 	cmd->s[cmd->len] = c;
 	return 0;
 }

