[Devel] Postgres reconnect patch

Bogdan-Andrei Iancu bogdan at voice-system.ro
Tue Aug 9 11:33:06 CEST 2005


Hi there,

looking into ACC code, I see no reason for the module to crash if the 
the insert fails - actually the result is not relevant for the module 
since there is nothing else to be done after the DB insert.

My first guess is that the crash occurs inside the postgres driver - as 
the log shows, the insert function never returned to display the acc error:
    "ERROR:acc:acc_db_request: Error while inserting to database"

Klaus, have you extra info from the core file?

regards,
bogdan


Michael Ulitskiy wrote:

>Sure, it's neccessary :)
>But I suspect that the problem is in acc module. When query fails the postgres module
>would return an error to the caller and I guess this is what acc module doesn't like.
>I don't use acc, but with avpops, uri_db, auth_db and usrloc  I didn't manage to crash it.
>At least so far :)
>
>Michael
>
>On Friday 05 August 2005 11:25 am, Klaus Darilion wrote:
>  
>
>>Hi!
>>
>>The patched opeser still crashes if the acc table does not exist:
>>
>>  8(14200) PG[367] submit_query query 'insert into acc 
>>(from_uri,to_uri,sip_method,i_uri,o_uri,sip_from,sip_callid,sip_to,sip_status,username,totag,fromtag,domain,time 
>>) values 
>>('sip:klaus at enum.at','sip:klaus.darilion at nic.at43.at','INVITE','sip:klaus.darilion at nic.at43.at;transport=udp','sip:10.10.0.76:14528','Klaus 
>>Darilion enum 
>>eyebeam<sip:klaus at enum.at>;tag=d96e8a16','c34fcc4e3b141d18','<sip:klaus.darilion at nic.at43.at>;tag=bf99c5e5ef87440cadfbabf7b1e47a7c','200','klaus','bf99c5e5ef87440cadfbabf7b1e47a7c','d96e8a16','enum.at','2005-08-05 
>>15:21:24')', result 'ERROR:  Relation "acc" does not exist
>>...
>>...
>>19(14218)  0(14186) child process 14195 exited by a signal 11
>>  0(14186) core was generated
>>  0(14186) INFO: terminating due to SIGCHLD
>>  3(14191) INFO: signal 15 received
>>
>>
>>Still some code review is necessary ;-)
>>
>>klaus
>>
>>
>>Michael Ulitskiy wrote:
>>    
>>
>>>On Friday 05 August 2005 10:04 am, you wrote:
>>>
>>>      
>>>
>>>>Michael Ulitskiy wrote:
>>>>
>>>>        
>>>>
>>>>>Hi Klaus,
>>>>>
>>>>>I've seen the problem described by Klaus too, but first of all I was 
>>>>>concerned that postgres module cannot reconnect to db after connection 
>>>>>failure. It crashed openser.
>>>>>          
>>>>>
>>>>I tried CVS version and stoped postgres during operation. Then, openser 
>>>>failed to update the location table (of course) but did not crashed. 
>>>>Maybe it depends on the module which tries to access the database. 
>>>>Nevertheless, reconnect failed after postgres returned.
>>>>
>>>>Then I applied your patch. Now, reconnect works.
>>>>Although, the debug message looks a little bit strange when the postgres 
>>>>module tries to reconnect when the db ist still down:
>>>>
>>>>  7(32748) PG[142] connect_db could not connect to server: ÿÿÿÿ
>>>>
>>>>                                                           ^^^^
>>>>        
>>>>
>>> 
>>>Well, it tries to reconnect every time openser wants to do a query. If db is still down
>>>then the message looks ok.
>>>BTW as far as I see the problem you described is also gone. I did a couple of tests -
>>>removing permissions for openser user and dropping a view it tries to select from
>>>and it didn't crash. It got back to normal after I restored the changes.
>>>
>>>Michael
>>>
>>> 
>>>
>>>      
>>>
>>>>regards,
>>>>klaus
>>>>
>>>>
>>>>        
>>>>
>>>>>I've done some research on the module source code and I  believe
>>>>>I've found a logical mistake in it.
>>>>>The issue is that parse_sql_url() function is supplied with the only copy
>>>>>of sql url and it corrupts it. So if the connection fails and module
>>>>>tries to reparse url it fails as CON_SQLURL(_h) is corrupted by first
>>>>>parsing. 
>>>>>I've created a simple patch that corrects the problem. What it does it
>>>>>it introduce a temporary buffer with sql url string for parse_sql_url to work
>>>>>on. Also it makes so that original sql url is not deleted from connection structure
>>>>>until db_close() is called.
>>>>>Patch is attached.
>>>>>Also if in dbase.c in db_init() function you comment out the following:
>>>>>
>>>>>/*
>>>>>       if (connect_db(res) < 0)
>>>>>       {
>>>>>               PLOG("db_init", "Error while trying to open database, FATAL\n")
>>>>>               aug_free(res);
>>>>>               return((db_con_t *) 0);
>>>>>       }
>>>>>*/
>>>>>
>>>>>you'll get a "delay connect until used" feature that can be usefull due to lack
>>>>>of connection pool for postgres.
>>>>>Could please some of developers review this patch?
>>>>>
>>>>>Michael
>>>>>
>>>>>
>>>>>On Thursday 04 August 2005 11:39 am, Klaus Darilion wrote:
>>>>>
>>>>>
>>>>>          
>>>>>
>>>>>>Hi!
>>>>>>
>>>>>>I had several situations in which openser crashed if the DB lookup 
>>>>>>fails, e.g:
>>>>>>table does not exist (acc module)
>>>>>>wrong SQL query (lcr module)
>>>>>>
>>>>>>Whereas is some cases (wrong table permissions, avpops module) openser 
>>>>>>keeps running.
>>>>>>
>>>>>>I do not know if these problems also occurs with mysql. If not, the 
>>>>>>postgresql module would really need some review.
>>>>>>
>>>>>>regards,
>>>>>>klaus
>>>>>>
>>>>>>_______________________________________________
>>>>>>            
>>>>>>
>>>>        
>>>>
>>>      
>>>
>>    
>>
>
>  
>




More information about the Devel mailing list