[sr-dev] MySQL weirdness - timeout does not work

Klaus Darilion klaus.mailinglists at pernau.at
Mon May 18 23:49:42 CEST 2015


Am 18.05.2015 um 14:36 schrieb Daniel-Constantin Mierla:
>
> On 18/05/15 13:31, Klaus Darilion wrote:
>> On 15.05.2015 22:48, Daniel-Constantin Mierla wrote:
>>> On 15/05/15 21:55, Klaus Darilion wrote:
>>>> Am 15.05.2015 um 17:54 schrieb Klaus Darilion:
>>>>> Hi Daniel!
>>>>>
>>>>> On 08.05.2015 15:33, Daniel-Constantin Mierla wrote:
>>>>>> Hello,
>>>>>>
>>>>>> On 08/05/15 15:13, Klaus Darilion wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> (Kamailio 4.2.2 on Debian Squeeze with libmysqlclient16 5.1.73)
>>>>>>>
>>>>>>> I have a problem that a lost TCP connection is not detected by
>>>>>>> Kamailio/MySQL-CLient and this blocks the Kamailio process for 15
>>>>>>> minutes until Linux reports the TCP connection as broken.
>>>>>>>
>>>>>>> I am using db_mysql with the default settings. Thus, according to the
>>>>>>> db_mysql README, the default timeout_interval() of 2 seconds should
>>>>>>> be used.
>>>>>>>
>>>>>>> As this is not the case I took a look at the code and got confused: it
>>>>>>> seems there are 2 different mysql modules in modules/db_mysql, eg: the
>>>>>>> interface is defined in mysql_mod.c and km_db_mysql.c. Which one is
>>>>>>> actually used? I see they have different timeout setting code.
>>>>>>>
>>>>>>> Any hints on why the timeout is not working?
>>>>>> the files starting with km_* are used by kamailio modules. The other
>>>>>> are
>>>>>> for db api v2, used by few modules inherited from ser (those prefixed
>>>>>> with uid_*).
>>>>> I checked the code and the timeouts are correctly initialized.
>>>>>
>>>>> During all my tests Kamailio behaves correct and the lost TCP connection
>>>>> is detected after the configured timeout.
>>>>>
>>>>> But what drives my nuts is: it does not work on my production server. Do
>>>>> you have any ideas what could be the case that the timeout is not
>>>>> detected?
>>>> I think it is related to the debugger module. First tests show that
>>>> the mysql timeout works if the debugger module is not loaded. Is there
>>>> a known issue, e.g. with signal handlers, when the debugger module is
>>>> loaded?
>>>>
>>> I am not aware of issues and I don't remember debugger module
>>> overwriting any signal handler. What parameters do you set for debugger
>>> module?
>> It seems that I was wrong. After I enabled the debugger module again, it
>> still works. Maybe it only worked due to the restart and the problem
>> starts only after a certain uptime.
>>
>> Once I connected with strace to the process which waited for a mysql
>> timeout and the timeout did not worked. The strace output was (a few
>> times per second):
>>
>> sigreturn()                             = ? (mask now [])
>> read(6, 0xa395f60, 16384)               = -1 EINTR (Interrupted system call)
>> --- SIGALRM (Alarm clock) @ 0 (0) ---
>> sigreturn()                             = ? (mask now [])
>> read(6, 0xa395f60, 16384)               = -1 EINTR (Interrupted system call)
>> --- SIGALRM (Alarm clock) @ 0 (0) ---
>> sigreturn()                             = ? (mask now [])
>> read(6, 0xa395f60, 16384)               = -1 EINTR (Interrupted system call)
>> --- SIGALRM (Alarm clock) @ 0 (0) ---
>>
>> Do you have an idea what that could be related too?
> Just a guess now, it might be the non-blocking read inside the
> libmysqlclient.
Very strange: This is an strace on the timer process (failure route) 
with a good DB connection:
22:51:51.042122 write(6, "e\0\0\0\3select value,attribute,type"..., 105) 
= 105
22:51:51.042188 read(6, 
"\1\0\0\1\3F\0\0\2\3def\10kamailio\17usr_prefe"..., 16384) = 251

This is an strace on the timer process (failure route) with a broken DB 
connection:
22:54:37.042057 poll([{fd=6, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
22:54:37.042092 write(6, "e\0\0\0\3select value,attribute,type"..., 105) 
= 105
22:54:37.042154 read(6, 0x9cabb28, 16384) = -1 EINTR (Interrupted system 
call)
22:54:37.103256 --- SIGALRM (Alarm clock) @ 0 (0) ---
22:54:37.103314 sigreturn()             = ? (mask now [])
22:54:37.103360 read(6, 0x9cabb28, 16384) = -1 EINTR (Interrupted system 
call)
22:54:37.165745 --- SIGALRM (Alarm clock) @ 0 (0) ---
22:54:37.165800 sigreturn()             = ? (mask now [])
22:54:37.165843 read(6, 0x9cabb28, 16384) = -1 EINTR (Interrupted system 
call)
22:54:37.228245 --- SIGALRM (Alarm clock) @ 0 (0) ---
...

The strange thing is, that this does not happen always. Sometimes the 
read just returns EAGAIN and then the read is tried 3 times and the 
timeout is detected after 3x2s=6 seconds. Really weird. I never had 
issues with Kamailio 3.1, but now with Kamailio 4.1 (I still use the 
same old libmysqlclient 5.1.73)

If I understand it correctly then the read is interruted by a EINTR 
signal and it seems libmysqlclient handles this buggy and retries 
endlessly (until the IP stock reports the broken TCP connection). But 
who sends the SIGALRM (approx every 60 ms)? Is it Kamailio? Any ideas 
somebody?

Thanks
Klaus







More information about the sr-dev mailing list