[Devel] Postgres reconnect patch
Klaus Darilion
klaus.mailinglists at pernau.at
Tue Aug 9 19:39:28 CEST 2005
Hi all!
I found the bug in the posgres module. sprintf instead of snprintf.
I will commit (with Michael's reconnect patch) tommorow (CVS is down)
regards,
klaus
Bogdan-Andrei Iancu wrote:
> Hi there,
>
> looking into ACC code, I see no reason for the module to crash if the
> the insert fails - actually the result is not relevant for the module
> since there is nothing else to be done after the DB insert.
>
> My first guess is that the crash occurs inside the postgres driver - as
> the log shows, the insert function never returned to display the acc error:
> "ERROR:acc:acc_db_request: Error while inserting to database"
>
> Klaus, have you extra info from the core file?
>
> regards,
> bogdan
>
>
> Michael Ulitskiy wrote:
>
>> Sure, it's neccessary :)
>> But I suspect that the problem is in acc module. When query fails the
>> postgres module
>> would return an error to the caller and I guess this is what acc
>> module doesn't like.
>> I don't use acc, but with avpops, uri_db, auth_db and usrloc I didn't
>> manage to crash it.
>> At least so far :)
>>
>> Michael
>>
>> On Friday 05 August 2005 11:25 am, Klaus Darilion wrote:
>>
>>
>>> Hi!
>>>
>>> The patched opeser still crashes if the acc table does not exist:
>>>
>>> 8(14200) PG[367] submit_query query 'insert into acc
>>> (from_uri,to_uri,sip_method,i_uri,o_uri,sip_from,sip_callid,sip_to,sip_status,username,totag,fromtag,domain,time
>>> ) values
>>> ('sip:klaus at enum.at','sip:klaus.darilion at nic.at43.at','INVITE','sip:klaus.darilion at nic.at43.at;transport=udp','sip:10.10.0.76:14528','Klaus
>>> Darilion enum
>>> eyebeam<sip:klaus at enum.at>;tag=d96e8a16','c34fcc4e3b141d18','<sip:klaus.darilion at nic.at43.at>;tag=bf99c5e5ef87440cadfbabf7b1e47a7c','200','klaus','bf99c5e5ef87440cadfbabf7b1e47a7c','d96e8a16','enum.at','2005-08-05
>>> 15:21:24')', result 'ERROR: Relation "acc" does not exist
>>> ...
>>> ...
>>> 19(14218) 0(14186) child process 14195 exited by a signal 11
>>> 0(14186) core was generated
>>> 0(14186) INFO: terminating due to SIGCHLD
>>> 3(14191) INFO: signal 15 received
>>>
>>>
>>> Still some code review is necessary ;-)
>>>
>>> klaus
>>>
>>>
>>> Michael Ulitskiy wrote:
>>>
>>>
>>>> On Friday 05 August 2005 10:04 am, you wrote:
>>>>
>>>>
>>>>
>>>>> Michael Ulitskiy wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hi Klaus,
>>>>>>
>>>>>> I've seen the problem described by Klaus too, but first of all I
>>>>>> was concerned that postgres module cannot reconnect to db after
>>>>>> connection failure. It crashed openser.
>>>>>>
>>>>>
>>>>> I tried CVS version and stoped postgres during operation. Then,
>>>>> openser failed to update the location table (of course) but did not
>>>>> crashed. Maybe it depends on the module which tries to access the
>>>>> database. Nevertheless, reconnect failed after postgres returned.
>>>>>
>>>>> Then I applied your patch. Now, reconnect works.
>>>>> Although, the debug message looks a little bit strange when the
>>>>> postgres module tries to reconnect when the db ist still down:
>>>>>
>>>>> 7(32748) PG[142] connect_db could not connect to server: ÿÿÿÿ
>>>>>
>>>>> ^^^^
>>>>>
>>>>
>>>>
>>>> Well, it tries to reconnect every time openser wants to do a query.
>>>> If db is still down
>>>> then the message looks ok.
>>>> BTW as far as I see the problem you described is also gone. I did a
>>>> couple of tests -
>>>> removing permissions for openser user and dropping a view it tries
>>>> to select from
>>>> and it didn't crash. It got back to normal after I restored the
>>>> changes.
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> regards,
>>>>> klaus
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> I've done some research on the module source code and I believe
>>>>>> I've found a logical mistake in it.
>>>>>> The issue is that parse_sql_url() function is supplied with the
>>>>>> only copy
>>>>>> of sql url and it corrupts it. So if the connection fails and module
>>>>>> tries to reparse url it fails as CON_SQLURL(_h) is corrupted by first
>>>>>> parsing. I've created a simple patch that corrects the problem.
>>>>>> What it does it
>>>>>> it introduce a temporary buffer with sql url string for
>>>>>> parse_sql_url to work
>>>>>> on. Also it makes so that original sql url is not deleted from
>>>>>> connection structure
>>>>>> until db_close() is called.
>>>>>> Patch is attached.
>>>>>> Also if in dbase.c in db_init() function you comment out the
>>>>>> following:
>>>>>>
>>>>>> /*
>>>>>> if (connect_db(res) < 0)
>>>>>> {
>>>>>> PLOG("db_init", "Error while trying to open
>>>>>> database, FATAL\n")
>>>>>> aug_free(res);
>>>>>> return((db_con_t *) 0);
>>>>>> }
>>>>>> */
>>>>>>
>>>>>> you'll get a "delay connect until used" feature that can be
>>>>>> usefull due to lack
>>>>>> of connection pool for postgres.
>>>>>> Could please some of developers review this patch?
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> On Thursday 04 August 2005 11:39 am, Klaus Darilion wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> I had several situations in which openser crashed if the DB
>>>>>>> lookup fails, e.g:
>>>>>>> table does not exist (acc module)
>>>>>>> wrong SQL query (lcr module)
>>>>>>>
>>>>>>> Whereas is some cases (wrong table permissions, avpops module)
>>>>>>> openser keeps running.
>>>>>>>
>>>>>>> I do not know if these problems also occurs with mysql. If not,
>>>>>>> the postgresql module would really need some review.
>>>>>>>
>>>>>>> regards,
>>>>>>> klaus
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>
>
More information about the Devel
mailing list