### Description ndb_redis is treating MOVED reply from redis cluster as error and breaking out even before the cluster handling code gets a chance to process this. ``` if(rpl->rplRedis->type == REDIS_REPLY_ERROR) { LM_ERR("Redis error:%.*s\n", (int)rpl->rplRedis->len, rpl->rplRedis->str); goto error_exec; }
if (check_cluster_reply(rpl->rplRedis, &rsrv)) { ... } ```
### Troubleshooting
#### Reproduction Setup a redis cluster with at least 2 nodes. Set modparam cluster=1 and allow_dynamic_nodes=1 fire redis get commands
#### Log Messages
``` ERROR: ndb_redis [redis_client.c:1037]: redisc_exec(): Redis error:MOVED 1090 10.4.20.69:6379 ```
### Possible Solutions revert https://github.com/kamailio/kamailio/commit/d00b14704805d728f5a845a6af900eff... or Add another check to ignore the above logic if cluster support is enabled. or Do not treat MOVED replies as of type REDIS_REPLY_ERROR
### Additional Information
``` kamailio built from branch 5.4.1 ```
@linuxmaniac - the commit referenced for revert was pushed by you, can you check and see if anything needs to be sorted out?
Closed #2461 via 5557b9b715a9ca754c454b5edaebf2a43b832015.
@kritarthh can you please confirm that 5557b9b fixes your issue? I will backport it afterwards.
@linuxmaniac This is working for me. Thanks for the fix.
@linuxmaniac Encountered a crash regarding this today. Relevant backtrace frame: ``` #0 0x00007fdc33f5e906 in redisc_exec (srv=0x7ffcb7b121b0, res=0x7ffcb7b121d0, cmd=0x7ffcb7b121c0) at redis_client.c:1073 rsrv = 0x7fdc63b2d878 rpl = 0x7fdc63b2d1d0 c = 0 '\000' ap = <error reading variable ap (Attempt to dereference a generic pointer.)> ap2 = <error reading variable ap2 (Attempt to dereference a generic pointer.)> ap3 = <error reading variable ap3 (Attempt to dereference a generic pointer.)> ap4 = <error reading variable ap4 (Attempt to dereference a generic pointer.)> ret = -1 __func__ = "redisc_exec" ``` Let's look at the code for this, with the line number mentioned on the left: ``` 1052 rpl->rplRedis = redisvCommand(rsrv->ctxRedis, cmd->s, ap3 ); 1053 if(rpl->rplRedis == NULL) 1054 { 1055 /* null reply, reconnect and try again */ 1056 if(rsrv->ctxRedis->err) 1057 { 1058 LM_ERR("Redis error: %s\n", rsrv->ctxRedis->errstr); 1059 } 1060 if(redisc_reconnect_server(rsrv)==0) 1061 { 1062 rpl->rplRedis = redisvCommand(rsrv->ctxRedis, cmd->s, ap4); 1063 } else { 1064 LM_ERR("unable to reconnect to redis server: %.*s\n", 1065 srv->len, srv->s); 1066 STR_ZTOV(cmd->s[cmd->len], c); 1067 goto error_exec; 1068 } 1069 } 1070 } 1071 1072 LM_DBG("rpl->rplRedis->type:%d\n", rpl->rplRedis->type); 1073 if(rpl->rplRedis->type == REDIS_REPLY_ERROR) { 1074 LM_ERR("Redis error:%.*s\n", 1075 (int)rpl->rplRedis->len, rpl->rplRedis->str); 1076 goto error_exec; 1077 } ``` This crashes if the code at :1062 fails, since after that we don't perform a null check on `rpl->rplRedis` and try to get its type at :1072 (or :1073 if debug logging is disabled). Earlier this was not a problem since we never tried to fetch the reply type in this function after the missing null check.
Added a simple null check to fix this. ``` diff --git a/src/modules/ndb_redis/redis_client.c b/src/modules/ndb_redis/redis_client.c index 42880310f2..cd3f14c7fb 100644 --- a/src/modules/ndb_redis/redis_client.c +++ b/src/modules/ndb_redis/redis_client.c @@ -1059,6 +1059,11 @@ int redisc_exec(str *srv, str *res, str *cmd, ...) if(redisc_reconnect_server(rsrv)==0) { rpl->rplRedis = redisvCommand(rsrv->ctxRedis, cmd->s, ap4); + if (rpl->rplRedis == NULL) + { + redis_count_err_and_disable(rsrv); + goto error_exec; + } } else { LM_ERR("unable to reconnect to redis server: %.*s\n", srv->len, srv->s); ```
@kritarthh - do not comment on closed old issues, just open a new one, otherwise it is a high risk not to spot the notification or forget about new comments as it does not show up in the list of open issues.
Also, if you already have a patch to propose a fix, you can even make a pull request directly, see contributing guidelines for how to format the commit message:
* https://github.com/kamailio/kamailio/blob/master/.github/CONTRIBUTING.md#com...
It seems that I forgot to backport the fix to 5.4, so backporting both now