Daniel/Henning,

The root cause of the crash lies in the sql_connect function in sqlops/sql_api.c. I have pasted that function here so we can refer to it while going through my notes, which follow it:

int sql_connect(int mode)
{
    sql_con_t *sc;
    sc = _sql_con_root;
    while(sc)
    {
        if (db_bind_mod(&sc->db_url, &sc->dbf))
        {
            LM_DBG("database module not found for [%.*s]\n",
                    sc->name.len, sc->name.s);
            return -1;
        }
        if (!DB_CAPABILITY(sc->dbf, DB_CAP_RAW_QUERY))
        {
            LM_ERR("database module does not have DB_CAP_ALL [%.*s]\n",
                    sc->name.len, sc->name.s);
            return -1;
        }
        sc->dbh = sc->dbf.init(&sc->db_url);
        if (sc->dbh==NULL)
        {
            if(mode) {
                LM_ERR("failed to connect to the database [%.*s]\n",
                        sc->name.len, sc->name.s);
                return -1;
            } else {
                LM_INFO("failed to connect to the database [%.*s] - trying next\n",
                        sc->name.len, sc->name.s);
            }
        }
        sc = sc->next;
    }
    return 0;
}

Notice the if(mode) clause. It looks like the statements in its two branches need to be swapped. That is, if mode is set, the loop should continue trying to connect to the other database instances. If mode is not set, it should return -1 immediately.
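
To make the proposed change concrete, here is a minimal, self-contained sketch of the loop with the two branches swapped. It does not use the real sqlops types: demo_con_t, demo_handle and demo_connect_reversed are hypothetical stand-ins, and the "reachable" flag simply simulates whether init() would succeed. It is meant only to illustrate the control flow, not to serve as the actual patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for sql_con_t; not the real Kamailio definition. */
typedef struct demo_con {
    const char *name;
    int reachable;          /* 1 if the simulated database accepts connections */
    int bound;              /* set once the simulated db_bind_mod() step has run */
    void *dbh;              /* simulated handle, NULL when not connected */
    struct demo_con *next;
} demo_con_t;

static int demo_handle;     /* dummy object whose address serves as a handle */

/* Walks the list with the if(mode) branches swapped, as proposed above. */
int demo_connect_reversed(demo_con_t *root, int mode)
{
    demo_con_t *sc;

    for (sc = root; sc != NULL; sc = sc->next) {
        sc->bound = 1;      /* stands in for db_bind_mod() succeeding */
        sc->dbh = sc->reachable ? (void *)&demo_handle : NULL;
        if (sc->dbh == NULL) {
            if (mode) {
                /* mode set: note the failure and try the next instance */
                continue;
            }
            /* mode not set: give up immediately */
            return -1;
        }
    }
    return 0;
}
```

With a three-entry list whose middle entry is unreachable, calling demo_connect_reversed(root, 1) visits and binds all three entries, whereas calling it with mode 0 stops at the middle entry and leaves the last one untouched.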

The crash is set up when sql_connect encounters a database instance it cannot connect to and returns -1 while more database instances remain in the sql_con_t linked list.

If SQL is later submitted to one of those remaining database instances (the ones we never tried to connect to because of the premature return), sql_reconnect gets called, because the connection structure for that instance has been initialized. Unfortunately, sql_connect never reached that instance, so its sc->dbf member was never populated. In other words, this piece of code in sql_connect never gets executed for the remaining database instances in the linked list:
if (db_bind_mod(&sc->db_url, &sc->dbf))

sc->dbf remains unset (its function pointers are null), and using it in sql_reconnect causes the segmentation fault.
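
The failure sequence described above can be reproduced in miniature with the sketch that follows. Everything in it is a hypothetical stand-in (demo_funcs_t, demo_entry_t, demo_connect_original, demo_reconnect), not the real sqlops code. It mirrors the early return in the quoted loop and shows that later entries are left with a null init pointer. The null check in demo_reconnect is my own illustrative guard, added so the sketch reports the problem instead of crashing; I am not claiming the real sql_reconnect has or should have exactly this check.

```c
#include <stddef.h>

/* Hypothetical stand-in for the table of database functions (sc->dbf). */
typedef struct demo_funcs {
    void *(*init)(const char *url);
} demo_funcs_t;

/* Hypothetical stand-in for one entry of the connection list. */
typedef struct demo_entry {
    const char *url;
    int reachable;          /* 1 if the simulated database accepts connections */
    demo_funcs_t dbf;       /* zeroed until the bind step runs */
    void *dbh;
    struct demo_entry *next;
} demo_entry_t;

static int demo_token;      /* dummy object whose address serves as a handle */

static void *demo_init_ok(const char *url) { (void)url; return &demo_token; }
static void *demo_init_fail(const char *url) { (void)url; return NULL; }

/* Mirrors the quoted loop: bind, init, return -1 on first failure if mode. */
int demo_connect_original(demo_entry_t *root, int mode)
{
    demo_entry_t *sc;

    for (sc = root; sc != NULL; sc = sc->next) {
        /* stands in for db_bind_mod() filling in the function table */
        sc->dbf.init = sc->reachable ? demo_init_ok : demo_init_fail;
        sc->dbh = sc->dbf.init(sc->url);
        if (sc->dbh == NULL && mode)
            return -1;      /* entries after sc are never bound */
    }
    return 0;
}

/* Illustrative guard: refuse to call through an unbound function table. */
int demo_reconnect(demo_entry_t *sc)
{
    if (sc->dbf.init == NULL)
        return -1;          /* calling init here would dereference null */
    sc->dbh = sc->dbf.init(sc->url);
    return sc->dbh != NULL ? 0 : -1;
}
```

With a two-entry list whose first entry is unreachable, demo_connect_original(root, 1) returns -1 before touching the second entry, so that entry's dbf.init is still null when demo_reconnect is later called on it.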

This is clearly seen in the gdb output.

I have tested the code with the logic in the if(mode) statement reversed, and everything works well.

If you agree with my analysis, please let me know how we should proceed here.

Either I can make the change or you can. I am fine with either.

Thanks,

Karthik

