Hello,

I am going to take a step back here. Before proceeding with the issue in this ticket, it might be best to address another issue that I have found, which is closely related to it.

Regarding the following statements I made in this ticket: "In issue 1681 there is code that allows Kamailio to start even if a database connection can not be established. Queries attempting to run against the offline database fail gracefully. And once the database is back online, a connection is established and queries against it are successful."

Those statements are indeed true. However, I have noticed that if I leave the database offline and another, unrelated query executes via sqlops using a different database handle, the program crashes. That other handle points to a database that is online and is connected to at startup. Yet, judging from Kamailio's logs and the gdb output, the code thinks this online database is not connected and attempts a reconnect, at which point the program crashes. So we have the following scenario: one database is offline and another is online; a test query to the offline database is gracefully rejected; but a query to the online database crashes Kamailio.

So, the setup is this:
1. Leave a database offline (shut down).
2. Start up Kamailio.
3. Kamailio starts even though the database is offline, which is good.
4. My test query executes against the offline database, and Kamailio reacts gracefully to not being able to submit the query to it. Also good.
5. Now, if another, entirely unrelated query is executed (a query against an online database, for example), the program crashes.
I figure it might be best to tackle this issue first, before addressing the one originally referenced in this ticket, where the program crashes when a database engine is shut down during normal call processing.

Here is the gdb output for the case where the database remains offline from start to end.

(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00002ba7077959fb in sql_reconnect (sc=0x2ba5e34aac10) at sql_api.c:187
#2 0x00002ba7077a3448 in sql_check_connection (dbl=0x2ba5e34aac10) at sqlops.c:234
#3 0x00002ba7077a3765 in sql_query (msg=0x2ba5e3b59178, dbl=0x2ba5e34aac10 "(\253J\343\245+", query=0x2ba5e3b79c10 "\320\335c\343\245+",
res=0x2ba5e35f87d0 "\257\242\376h") at sqlops.c:247
#4 0x0000000000433f19 in do_action (h=0x7fff05edb820, a=0x2ba5e363e800, msg=0x2ba5e3b59178) at core/action.c:1085
#5 0x00000000004414df in run_actions (h=0x7fff05edb820, a=0x2ba5e363e800, msg=0x2ba5e3b59178) at core/action.c:1564
#6 0x0000000000441bd6 in run_actions_safe (h=0x7fff05ede060, a=0x2ba5e363e800, msg=0x2ba5e3b59178) at core/action.c:1625
#7 0x00000000004dd859 in lval_pvar_assign (h=0x7fff05ede060, msg=0x2ba5e3b59178, lv=0x2ba5e363e688, rv=0x2ba5e363f450) at core/lvalue.c:284
#8 0x00000000004dbf19 in lval_assign (h=0x7fff05ede060, msg=0x2ba5e3b59178, lv=0x2ba5e363e688, rve=0x2ba5e363f448) at core/lvalue.c:400
#9 0x000000000043f14d in do_action (h=0x7fff05ede060, a=0x2ba5e363de48, msg=0x2ba5e3b59178) at core/action.c:1443
#10 0x00000000004414df in run_actions (h=0x7fff05ede060, a=0x2ba5e363d3f8, msg=0x2ba5e3b59178) at core/action.c:1564
#11 0x00000000004305a6 in do_action (h=0x7fff05ede060, a=0x2ba5e390da70, msg=0x2ba5e3b59178) at core/action.c:691
#12 0x00000000004414df in run_actions (h=0x7fff05ede060, a=0x2ba5e3907958, msg=0x2ba5e3b59178) at core/action.c:1564
#13 0x00000000004305a6 in do_action (h=0x7fff05ede060, a=0x2ba5e3989260, msg=0x2ba5e3b59178) at core/action.c:691
#14 0x00000000004414df in run_actions (h=0x7fff05ede060, a=0x2ba5e3989260, msg=0x2ba5e3b59178) at core/action.c:1564
#15 0x0000000000441bd6 in run_actions_safe (h=0x7fff05ee1920, a=0x2ba5e3989260, msg=0x2ba5e3b59178) at core/action.c:1625
#16 0x00000000005843e0 in rval_get_int (h=0x7fff05ee1920, msg=0x2ba5e3b59178, i=0x7fff05ede768, rv=0x2ba5e39893b8, cache=0x0) at core/rvalue.c:915
#17 0x0000000000586d00 in rval_expr_eval_int (h=0x7fff05ee1920, msg=0x2ba5e3b59178, res=0x7fff05ede768, rve=0x2ba5e39893b0) at core/rvalue.c:1913
#18 0x0000000000587133 in rval_expr_eval_int (h=0x7fff05ee1920, msg=0x2ba5e3b59178, res=0x7fff05edf0e0, rve=0x2ba5e3989ae0) at core/rvalue.c:1921
#19 0x0000000000433810 in do_action (h=0x7fff05ee1920, a=0x2ba5e398a448, msg=0x2ba5e3b59178) at core/action.c:1043
#20 0x00000000004414df in run_actions (h=0x7fff05ee1920, a=0x2ba5e398a448, msg=0x2ba5e3b59178) at core/action.c:1564
#21 0x00000000004305a6 in do_action (h=0x7fff05ee1920, a=0x2ba5e350f4f0, msg=0x2ba5e3b59178) at core/action.c:691
#22 0x00000000004414df in run_actions (h=0x7fff05ee1920, a=0x2ba5e350f4f0, msg=0x2ba5e3b59178) at core/action.c:1564
#23 0x0000000000433ccc in do_action (h=0x7fff05ee1920, a=0x2ba5e3513e28, msg=0x2ba5e3b59178) at core/action.c:1058
#24 0x00000000004414df in run_actions (h=0x7fff05ee1920, a=0x2ba5e3503118, msg=0x2ba5e3b59178) at core/action.c:1564
#25 0x0000000000433ccc in do_action (h=0x7fff05ee1920, a=0x2ba5e35141b0, msg=0x2ba5e3b59178) at core/action.c:1058
#26 0x00000000004414df in run_actions (h=0x7fff05ee1920, a=0x2ba5e34ccb70, msg=0x2ba5e3b59178) at core/action.c:1564
#27 0x0000000000441cad in run_top_route (a=0x2ba5e34ccb70, msg=0x2ba5e3b59178, c=0x0) at core/action.c:1646
#28 0x0000000000545339 in receive_msg (
buf=0x595f530 "REGISTER sip:pipeline.bbpsphone.net SIP/2.0\r\nVia: SIP/2.0/TLS 10.65.5.1:27155;branch=z9hG4bK9QDJ45PWpBmNPe21;rport\r\nContact: <sip:259ef72cf54b2a6.79355009@10.65.5.1:27155;rinstance=377AD7AA;transport="..., len=948, rcv_info=0x2ba6070bb358) at core/receive.c:340
#29 0x0000000000640719 in receive_tcp_msg (
tcpbuf=0x2ba6070bb638 "REGISTER sip:pipeline.bbpsphone.net SIP/2.0\r\nVia: SIP/2.0/TLS 10.65.5.1:27155;branch=z9hG4bK9QDJ45PWpBmNPe21;rport\r\nContact: <sip:259ef72cf54b2a6.79355009@10.65.5.1:27155;rinstance=377AD7AA;transport="..., len=948, rcv_info=0x2ba6070bb358, con=0x2ba6070bb340) at core/tcp_read.c:1448
#30 0x00000000006426b0 in tcp_read_req (con=0x2ba6070bb340, bytes_read=0x7fff05ee2444, read_flags=0x7fff05ee243c) at core/tcp_read.c:1631
#31 0x000000000064c49d in handle_io (fm=0x2ba5e3ba3d50, events=1, idx=-1) at core/tcp_read.c:1862
#32 0x00000000006505d8 in io_wait_loop_epoll (h=0xad0bc0, t=2, repeat=0) at core/io_wait.h:1061
#33 0x0000000000645f12 in tcp_receive_loop (unix_sock=37) at core/tcp_read.c:1974
#34 0x000000000063347a in tcp_init_children () at core/tcp_main.c:5083
#35 0x0000000000425978 in main_loop () at main.c:1728
#36 0x000000000042bd72 in main (argc=13, argv=0x7fff05ee2de8) at main.c:2666

(gdb) frame 1
#1 0x00002ba7077959fb in sql_reconnect (sc=0x2ba5e34aac10) at sql_api.c:187
187 sc->dbh = sc->dbf.init(&sc->db_url);
(gdb) list
182 }
183 if (sc->dbh!=NULL) {
184 /* already connected */
185 return 0;
186 }
187 sc->dbh = sc->dbf.init(&sc->db_url);
188 if (sc->dbh==NULL) {
189 LM_ERR("failed to connect to the database [%.*s]\n",
190 sc->name.len, sc->name.s);
191 return -1;

(gdb) frame 2
#2 0x00002ba7077a3448 in sql_check_connection (dbl=0x2ba5e34aac10) at sqlops.c:234
234 if(sql_reconnect(dbl)<0) {
(gdb) list
229 LM_CRIT("no database handle with reconnect disabled\n");
230 return -1;
231 }
232
233 LM_DBG("try to establish SQL connection\n");
234 if(sql_reconnect(dbl)<0) {
235 LM_ERR("failed to connect to database\n");
236 return -1;
237 }
238 return 0;
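
Frame #0 sits at address 0x0 and is reached from line 187 of sql_api.c, which suggests that the sc->dbf.init function pointer itself is NULL for this handle, i.e. the DB API functions may never have been bound to it. Below is a minimal sketch of a guard, assuming that reading is correct. The types and the name sql_reconnect_sketch are simplified, hypothetical stand-ins of my own, not the actual sqlops definitions.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the sqlops types.
 * The real definitions in Kamailio differ. */
typedef struct {
    const char *s;
    int len;
} str_sketch_t;

typedef struct db_handle_sketch db_handle_sketch_t; /* opaque connection handle */

typedef struct {
    /* NULL if the DB driver functions were never bound to this handle */
    db_handle_sketch_t *(*init)(const str_sketch_t *url);
} db_funcs_sketch_t;

typedef struct {
    str_sketch_t name;
    str_sketch_t db_url;
    db_handle_sketch_t *dbh;
    db_funcs_sketch_t dbf;
} sql_con_sketch_t;

/* Returns 0 when connected, -1 on any failure. Never calls through a
 * NULL function pointer. */
int sql_reconnect_sketch(sql_con_sketch_t *sc)
{
    if (sc == NULL) {
        fprintf(stderr, "connection structure not initialized\n");
        return -1;
    }
    if (sc->dbh != NULL) {
        /* already connected */
        return 0;
    }
    if (sc->dbf.init == NULL) {
        /* assumed crash cause: calling init here would jump to address 0x0 */
        fprintf(stderr, "no DB API bound for connection [%.*s]\n",
                sc->name.len, sc->name.s);
        return -1;
    }
    sc->dbh = sc->dbf.init(&sc->db_url);
    if (sc->dbh == NULL) {
        fprintf(stderr, "failed to connect to the database [%.*s]\n",
                sc->name.len, sc->name.s);
        return -1;
    }
    return 0;
}
```

With a check like this, a query on a handle whose function table was never set up would fail with -1 in the same graceful way as a query to the offline database, instead of crashing. It would not explain why the table is empty for a database that is online, so it is at most a stopgap.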

