Hi all,
Is it normal that a simple MySQL error causes the worker process to die, which in turn shuts down all other openser processes? If I remember correctly, this never happened to me with version 1.2.
The following log comes from a 1.3 installation:
[30812]: ERROR:mysql:db_mysql_submit_query: driver error: Can't lock file (errno: 4009)
[30812]: ERROR:mysql:db_mysql_raw_query: error while submitting query
[30812]: ERROR:usrloc:get_all_db_ucontacts: raw_query failed
[30792]: INFO:core:handle_sigs: child process 30812 exited by a signal 11
[30792]: INFO:core:handle_sigs: core was not generated
[30792]: INFO:core:handle_sigs: terminating due to SIGCHLD
[30797]: INFO:core:sig_usr: signal 15 received
[30814]: INFO:core:sig_usr: signal 15 received
[30798]: INFO:core:sig_usr: signal 15 received
[30800]: INFO:core:sig_usr: signal 15 received
[30802]: INFO:core:sig_usr: signal 15 received
[30804]: INFO:core:sig_usr: signal 15 received
[30806]: INFO:core:sig_usr: signal 15 received
[30808]: INFO:core:sig_usr: signal 15 received
Jan 14 12:54:45 proxy-2 ser[30810]: INFO:core:sig_usr: signal 15 received
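For readers unfamiliar with the handle_sigs lines in the log: a parent process can tell that a child was killed by a signal by reaping it with waitpid() and inspecting the status. The following is only a generic POSIX sketch of that mechanism, not OpenSER's actual handle_sigs code; the function name reap_crashing_child is made up for illustration, and the crash is simulated with raise().

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that kills itself with `sig`,
 * then reaps it in the parent.
 * Returns the terminating signal number, 0 if the child exited
 * normally, or -1 on error. */
int reap_crashing_child(int sig)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(sig);   /* simulate the crash in the worker */
        _exit(0);     /* only reached if the signal did not kill us */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status)) {
        fprintf(stderr, "child process %d exited by a signal %d\n",
                (int)pid, WTERMSIG(status));
        return WTERMSIG(status);
    }
    return 0;
}
```

Calling reap_crashing_child(SIGSEGV) reports the SIGSEGV (signal 11 on Linux) that killed the child. A supervising process that decides to shut down at this point would typically send SIGTERM (signal 15) to its remaining children, which would match the "signal 15 received" lines above.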
/Christian
On Thursday 17 January 2008, Christian Schlatter wrote:
Hi Christian,
no, this is not normal. But looking at the trace, it seems that the problem is not located in the mysql driver but in the calling functions from usrloc, since the error is propagated up to this layer: submit_query -> raw_query -> get_all_.
The error is probably located in the function that called get_all_db_ucontacts: perhaps a return value is not checked, and a NULL pointer is accessed because the necessary data is not returned. If the SIG 11 were in the mysql driver, the server would never get back to this point, or am I wrong?
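To illustrate the kind of bug suspected here, a minimal sketch in C. All names (result_t, run_query, count_contacts) are hypothetical stand-ins and not the real OpenSER usrloc or db API; the point is only that the caller must check the return value before touching the result pointer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical result type; NOT the real OpenSER result structure. */
typedef struct {
    int n_rows;
} result_t;

/* Hypothetical stand-in for a raw query call: on failure it returns -1
 * and leaves *res untouched, mimicking a driver error. */
int run_query(int simulate_error, result_t **res)
{
    static result_t ok = { 3 };

    if (simulate_error)
        return -1;
    *res = &ok;
    return 0;
}

/* Returns the number of rows, or -1 if the query failed.
 * Without the check below, res would still be NULL after a failed
 * query and res->n_rows would crash with a signal 11. */
int count_contacts(int simulate_error)
{
    result_t *res = NULL;

    if (run_query(simulate_error, &res) < 0 || res == NULL) {
        fprintf(stderr, "query failed, skipping result processing\n");
        return -1;
    }
    return res->n_rows;
}
```

With this check in place, count_contacts(1) returns -1 instead of dereferencing a NULL pointer, while count_contacts(0) returns the row count of the simulated result.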
It would be nice to have a backtrace from gdb. :-)
Cheers,
Henning
Hi Henning,
I'll try to get a backtrace. Unfortunately it didn't generate a core dump the last time it happened, I guess because the openser user couldn't write to the openser working directory. Shouldn't a core be generated even if openser is run as non-root, at least as long as 'disable_core_dump' is off?
The other thing is that I can't easily reproduce this problem. It looks like it only happens with mysql error 4009 (mysql cluster error), but not with e.g. error 2 (can't connect). Strange ...
/Christian
Henning Westerholt wrote:
On Thursday 17 January 2008, Christian Schlatter wrote:
Hi Henning,
I'll try to get a backtrace. Unfortunately it didn't generate a core dump the last time it happened, I guess because the openser user couldn't write to the openser working directory. Shouldn't a core be generated even if openser is run as non-root, at least as long as 'disable_core_dump' is off?
Hi Christian,
yes, a dump should be generated, as long as the server working directory (as specified with the working directory parameter) is writable for this user and the ulimit allows it too. The ulimit is set if you enable core dumps in /etc/default/openser.
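The two conditions named above can be checked from C roughly like this. It is only a sketch: the function name core_dump_possible is made up, and it looks at just the core size limit and the directory permissions. Other system settings that can also suppress a core file are not covered.

```c
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a core dump looks possible for the
 * calling process when its working directory is `dir`, 0 if one of the
 * two conditions fails, and -1 on error.
 * Note: access() checks permissions with the real user ID. */
int core_dump_possible(const char *dir)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("getrlimit");
        return -1;
    }
    if (lim.rlim_cur == 0)
        return 0;   /* soft limit forbids core files ("ulimit -c 0") */
    if (access(dir, W_OK) != 0)
        return 0;   /* directory missing or not writable for this user */
    return 1;
}
```

Running this as the openser user against the configured working directory would show which of the two conditions is not met.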
The other thing is that I can't easily reproduce this problem. It looks like it only happens with mysql error 4009 (mysql cluster error), but not with e.g. error 2 (can't connect). Strange ...
Perhaps this error causes a SIG11 in libmysql? This whole cluster stuff doesn't seem that stable on errors sometimes, at least my short Google research suggests this. Normal driver errors don't cause this problem; I observed this too.
Cheers,
Henning
Hi Christian,
I took a look and made a fix related to this. Are you using db mode DB_ONLY in usrloc? Also, which module are you using, nathelper or mediaproxy?
Please update from CVS (devel branch) and give it a try.
Thanks and regards, Bogdan
Users mailing list Users@lists.openser.org http://lists.openser.org/cgi-bin/mailman/listinfo/users
Hi Bogdan,
Bogdan-Andrei Iancu wrote:
Hi Christian,
I took a look and made a fix related to this. Are you using db mode DB_ONLY in usrloc? Also, which module are you using, nathelper or mediaproxy?
I'm using usrloc DB_ONLY mode together with the mediaproxy module. Thanks for the patch, I'll give it a try.
/Christian