[Devel] Postgres reconnect patch

Klaus Darilion klaus.mailinglists at pernau.at
Tue Aug 9 19:39:28 CEST 2005


Hi all!

I found the bug in the posgres module. sprintf instead of snprintf.

I will commit (with Michael's reconnect patch) tommorow (CVS is down)

regards,
klaus

Bogdan-Andrei Iancu wrote:
> Hi there,
> 
> looking into ACC code, I see no reason for the module to crash if the 
> the insert fails - actually the result is not relevant for the module 
> since there is nothing else to be done after the DB insert.
> 
> My first guess is that the crash occurs inside the postgres driver - as 
> the log shows, the insert function never returned to display the acc error:
>    "ERROR:acc:acc_db_request: Error while inserting to database"
> 
> Klaus, have you extra info from the core file?
> 
> regards,
> bogdan
> 
> 
> Michael Ulitskiy wrote:
> 
>> Sure, it's neccessary :)
>> But I suspect that the problem is in acc module. When query fails the 
>> postgres module
>> would return an error to the caller and I guess this is what acc 
>> module doesn't like.
>> I don't use acc, but with avpops, uri_db, auth_db and usrloc  I didn't 
>> manage to crash it.
>> At least so far :)
>>
>> Michael
>>
>> On Friday 05 August 2005 11:25 am, Klaus Darilion wrote:
>>  
>>
>>> Hi!
>>>
>>> The patched opeser still crashes if the acc table does not exist:
>>>
>>>  8(14200) PG[367] submit_query query 'insert into acc 
>>> (from_uri,to_uri,sip_method,i_uri,o_uri,sip_from,sip_callid,sip_to,sip_status,username,totag,fromtag,domain,time 
>>> ) values 
>>> ('sip:klaus at enum.at','sip:klaus.darilion at nic.at43.at','INVITE','sip:klaus.darilion at nic.at43.at;transport=udp','sip:10.10.0.76:14528','Klaus 
>>> Darilion enum 
>>> eyebeam<sip:klaus at enum.at>;tag=d96e8a16','c34fcc4e3b141d18','<sip:klaus.darilion at nic.at43.at>;tag=bf99c5e5ef87440cadfbabf7b1e47a7c','200','klaus','bf99c5e5ef87440cadfbabf7b1e47a7c','d96e8a16','enum.at','2005-08-05 
>>> 15:21:24')', result 'ERROR:  Relation "acc" does not exist
>>> ...
>>> ...
>>> 19(14218)  0(14186) child process 14195 exited by a signal 11
>>>  0(14186) core was generated
>>>  0(14186) INFO: terminating due to SIGCHLD
>>>  3(14191) INFO: signal 15 received
>>>
>>>
>>> Still some code review is necessary ;-)
>>>
>>> klaus
>>>
>>>
>>> Michael Ulitskiy wrote:
>>>   
>>>
>>>> On Friday 05 August 2005 10:04 am, you wrote:
>>>>
>>>>     
>>>>
>>>>> Michael Ulitskiy wrote:
>>>>>
>>>>>       
>>>>>
>>>>>> Hi Klaus,
>>>>>>
>>>>>> I've seen the problem described by Klaus too, but first of all I 
>>>>>> was concerned that postgres module cannot reconnect to db after 
>>>>>> connection failure. It crashed openser.
>>>>>>         
>>>>>
>>>>> I tried CVS version and stoped postgres during operation. Then, 
>>>>> openser failed to update the location table (of course) but did not 
>>>>> crashed. Maybe it depends on the module which tries to access the 
>>>>> database. Nevertheless, reconnect failed after postgres returned.
>>>>>
>>>>> Then I applied your patch. Now, reconnect works.
>>>>> Although, the debug message looks a little bit strange when the 
>>>>> postgres module tries to reconnect when the db ist still down:
>>>>>
>>>>>  7(32748) PG[142] connect_db could not connect to server: ÿÿÿÿ
>>>>>
>>>>>                                                           ^^^^
>>>>>       
>>>>
>>>>
>>>> Well, it tries to reconnect every time openser wants to do a query. 
>>>> If db is still down
>>>> then the message looks ok.
>>>> BTW as far as I see the problem you described is also gone. I did a 
>>>> couple of tests -
>>>> removing permissions for openser user and dropping a view it tries 
>>>> to select from
>>>> and it didn't crash. It got back to normal after I restored the 
>>>> changes.
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>>     
>>>>
>>>>> regards,
>>>>> klaus
>>>>>
>>>>>
>>>>>       
>>>>>
>>>>>> I've done some research on the module source code and I  believe
>>>>>> I've found a logical mistake in it.
>>>>>> The issue is that parse_sql_url() function is supplied with the 
>>>>>> only copy
>>>>>> of sql url and it corrupts it. So if the connection fails and module
>>>>>> tries to reparse url it fails as CON_SQLURL(_h) is corrupted by first
>>>>>> parsing. I've created a simple patch that corrects the problem. 
>>>>>> What it does it
>>>>>> it introduce a temporary buffer with sql url string for 
>>>>>> parse_sql_url to work
>>>>>> on. Also it makes so that original sql url is not deleted from 
>>>>>> connection structure
>>>>>> until db_close() is called.
>>>>>> Patch is attached.
>>>>>> Also if in dbase.c in db_init() function you comment out the 
>>>>>> following:
>>>>>>
>>>>>> /*
>>>>>>       if (connect_db(res) < 0)
>>>>>>       {
>>>>>>               PLOG("db_init", "Error while trying to open 
>>>>>> database, FATAL\n")
>>>>>>               aug_free(res);
>>>>>>               return((db_con_t *) 0);
>>>>>>       }
>>>>>> */
>>>>>>
>>>>>> you'll get a "delay connect until used" feature that can be 
>>>>>> usefull due to lack
>>>>>> of connection pool for postgres.
>>>>>> Could please some of developers review this patch?
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> On Thursday 04 August 2005 11:39 am, Klaus Darilion wrote:
>>>>>>
>>>>>>
>>>>>>         
>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> I had several situations in which openser crashed if the DB 
>>>>>>> lookup fails, e.g:
>>>>>>> table does not exist (acc module)
>>>>>>> wrong SQL query (lcr module)
>>>>>>>
>>>>>>> Whereas is some cases (wrong table permissions, avpops module) 
>>>>>>> openser keeps running.
>>>>>>>
>>>>>>> I do not know if these problems also occurs with mysql. If not, 
>>>>>>> the postgresql module would really need some review.
>>>>>>>
>>>>>>> regards,
>>>>>>> klaus
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>>           
>>>>>
>>>>>       
>>>>
>>>>     
>>>
>>>   
>>
>>
>>  
>>
> 
> 




More information about the Devel mailing list