[sr-dev] MySQL weirdness - timeout does not work

Daniel-Constantin Mierla miconda at gmail.com
Tue May 19 17:03:23 CEST 2015



On 18/05/15 23:49, Klaus Darilion wrote:
> Am 18.05.2015 um 14:36 schrieb Daniel-Constantin Mierla:
>>
>> On 18/05/15 13:31, Klaus Darilion wrote:
>>> On 15.05.2015 22:48, Daniel-Constantin Mierla wrote:
>>>> On 15/05/15 21:55, Klaus Darilion wrote:
>>>>> Am 15.05.2015 um 17:54 schrieb Klaus Darilion:
>>>>>> Hi Daniel!
>>>>>>
>>>>>> On 08.05.2015 15:33, Daniel-Constantin Mierla wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> On 08/05/15 15:13, Klaus Darilion wrote:
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> (Kamailio 4.2.2 on Debian Squeeze with libmysqlclient16 5.1.73)
>>>>>>>>
>>>>>>>> I have a problem that a lost TCP connection is not detected by
>>>>>>>> Kamailio/MySQL-CLient and this blocks the Kamailio process for 15
>>>>>>>> minutes until Linux reports the TCP connection as broken.
>>>>>>>>
>>>>>>>> I am using db_mysql with the default settings. Thus, according
>>>>>>>> to the
>>>>>>>> db_mysql README, the default timeout_interval() of 2 seconds
>>>>>>>> should
>>>>>>>> be used.
>>>>>>>>
>>>>>>>> As this is not the case I took a look at the code and got
>>>>>>>> confused: it
>>>>>>>> seems there are 2 different mysql modules in modules/db_mysql,
>>>>>>>> eg: the
>>>>>>>> interface is defined in mysql_mod.c and km_db_mysql.c. Which
>>>>>>>> one is
>>>>>>>> actually used? I see they have different timeout setting code.
>>>>>>>>
>>>>>>>> Any hints on why the timeout is not working?
>>>>>>> the files starting with km_* are used by kamailio modules. The
>>>>>>> other
>>>>>>> are
>>>>>>> for db api v2, used by few modules inherited from ser (those
>>>>>>> prefixed
>>>>>>> with uid_*).
>>>>>> I checked the code and the timeouts are correctly initialized.
>>>>>>
>>>>>> During all my tests Kamailio behaves correct and the lost TCP
>>>>>> connection
>>>>>> is detected after the configured timeout.
>>>>>>
>>>>>> But what drives my nuts is: it does not work on my production
>>>>>> server. Do
>>>>>> you have any ideas what could be the case that the timeout is not
>>>>>> detected?
>>>>> I think it is related to the debugger module. First tests show that
>>>>> the mysql timeout works if the debugger module is not loaded. Is
>>>>> there
>>>>> a known issue, e.g. with signal handlers, when the debugger module is
>>>>> loaded?
>>>>>
>>>> I am not aware of issues and I don't remember debugger module
>>>> overwriting any signal handler. What parameters do you set for
>>>> debugger
>>>> module?
>>> It seems that I was wrong. After I enabled the debugger module
>>> again, it
>>> still works. Maybe it only worked due to the restart and the problem
>>> starts only after a certain uptime.
>>>
>>> Once I connected with strace to the process which waited for a mysql
>>> timeout and the timeout did not worked. The strace output was (a few
>>> times per second):
>>>
>>> sigreturn()                             = ? (mask now [])
>>> read(6, 0xa395f60, 16384)               = -1 EINTR (Interrupted
>>> system call)
>>> --- SIGALRM (Alarm clock) @ 0 (0) ---
>>> sigreturn()                             = ? (mask now [])
>>> read(6, 0xa395f60, 16384)               = -1 EINTR (Interrupted
>>> system call)
>>> --- SIGALRM (Alarm clock) @ 0 (0) ---
>>> sigreturn()                             = ? (mask now [])
>>> read(6, 0xa395f60, 16384)               = -1 EINTR (Interrupted
>>> system call)
>>> --- SIGALRM (Alarm clock) @ 0 (0) ---
>>>
>>> Do you have an idea what that could be related too?
>> Just a guess now, it might be the non-blocking read inside the
>> libmysqlclient.
> Very strange: This is an strace on the timer process (failure route)
> with a good DB connection:
> 22:51:51.042122 write(6, "e\0\0\0\3select value,attribute,type"...,
> 105) = 105
> 22:51:51.042188 read(6,
> "\1\0\0\1\3F\0\0\2\3def\10kamailio\17usr_prefe"..., 16384) = 251
>
> This is an strace on the timer process (failure route) with a broken
> DB connection:
> 22:54:37.042057 poll([{fd=6, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
> 22:54:37.042092 write(6, "e\0\0\0\3select value,attribute,type"...,
> 105) = 105
> 22:54:37.042154 read(6, 0x9cabb28, 16384) = -1 EINTR (Interrupted
> system call)
> 22:54:37.103256 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 22:54:37.103314 sigreturn()             = ? (mask now [])
> 22:54:37.103360 read(6, 0x9cabb28, 16384) = -1 EINTR (Interrupted
> system call)
> 22:54:37.165745 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 22:54:37.165800 sigreturn()             = ? (mask now [])
> 22:54:37.165843 read(6, 0x9cabb28, 16384) = -1 EINTR (Interrupted
> system call)
> 22:54:37.228245 --- SIGALRM (Alarm clock) @ 0 (0) ---
> ...
>
> The strange thing is, that this does not happen always. Sometimes the
> read just returns EAGAIN and then the read is tried 3 times and the
> timeout is detected after 3x2s=6 seconds. Really weird. I never had
> issues with Kamailio 3.1, but now with Kamailio 4.1 (I still use the
> same old libmysqlclient 5.1.73)
>
> If I understand it correctly then the read is interruted by a EINTR
> signal and it seems libmysqlclient handles this buggy and retries
> endlessly (until the IP stock reports the broken TCP connection). But
> who sends the SIGALRM (approx every 60 ms)? Is it Kamailio? Any ideas
> somebody?
SIGALRM could be issued for the timer process.

Cheers,
Daniel

-- 
Daniel-Constantin Mierla
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
Kamailio World Conference, May 27-29, 2015
Berlin, Germany - http://www.kamailioworld.com




More information about the sr-dev mailing list