[Devel] Postgres reconnect patch

Michael Ulitskiy mdu113 at acedsl.com
Tue Aug 9 15:17:14 CEST 2005


It also may be multi-threading issue.
I compiled ser with -D_REENTRANT to use with postgres
and it works quite reliably for me.

Michael

On Tuesday 09 August 2005 05:33 am, you wrote:
> Hi there,
> 
> looking into ACC code, I see no reason for the module to crash if the 
> the insert fails - actually the result is not relevant for the module 
> since there is nothing else to be done after the DB insert.
> 
> My first guess is that the crash occurs inside the postgres driver - as 
> the log shows, the insert function never returned to display the acc error:
>     "ERROR:acc:acc_db_request: Error while inserting to database"
> 
> Klaus, have you extra info from the core file?
> 
> regards,
> bogdan
> 
> 
> Michael Ulitskiy wrote:
> 
> >Sure, it's neccessary :)
> >But I suspect that the problem is in acc module. When query fails the postgres module
> >would return an error to the caller and I guess this is what acc module doesn't like.
> >I don't use acc, but with avpops, uri_db, auth_db and usrloc  I didn't manage to crash it.
> >At least so far :)
> >
> >Michael
> >
> >On Friday 05 August 2005 11:25 am, Klaus Darilion wrote:
> >  
> >
> >>Hi!
> >>
> >>The patched opeser still crashes if the acc table does not exist:
> >>
> >>  8(14200) PG[367] submit_query query 'insert into acc 
> >>(from_uri,to_uri,sip_method,i_uri,o_uri,sip_from,sip_callid,sip_to,sip_status,username,totag,fromtag,domain,time 
> >>) values 
> >>('sip:klaus at enum.at','sip:klaus.darilion at nic.at43.at','INVITE','sip:klaus.darilion at nic.at43.at;transport=udp','sip:10.10.0.76:14528','Klaus 
> >>Darilion enum 
> >>eyebeam<sip:klaus at enum.at>;tag=d96e8a16','c34fcc4e3b141d18','<sip:klaus.darilion at nic.at43.at>;tag=bf99c5e5ef87440cadfbabf7b1e47a7c','200','klaus','bf99c5e5ef87440cadfbabf7b1e47a7c','d96e8a16','enum.at','2005-08-05 
> >>15:21:24')', result 'ERROR:  Relation "acc" does not exist
> >>...
> >>...
> >>19(14218)  0(14186) child process 14195 exited by a signal 11
> >>  0(14186) core was generated
> >>  0(14186) INFO: terminating due to SIGCHLD
> >>  3(14191) INFO: signal 15 received
> >>
> >>
> >>Still some code review is necessary ;-)
> >>
> >>klaus
> >>
> >>
> >>Michael Ulitskiy wrote:
> >>    
> >>
> >>>On Friday 05 August 2005 10:04 am, you wrote:
> >>>
> >>>      
> >>>
> >>>>Michael Ulitskiy wrote:
> >>>>
> >>>>        
> >>>>
> >>>>>Hi Klaus,
> >>>>>
> >>>>>I've seen the problem described by Klaus too, but first of all I was 
> >>>>>concerned that postgres module cannot reconnect to db after connection 
> >>>>>failure. It crashed openser.
> >>>>>          
> >>>>>
> >>>>I tried CVS version and stoped postgres during operation. Then, openser 
> >>>>failed to update the location table (of course) but did not crashed. 
> >>>>Maybe it depends on the module which tries to access the database. 
> >>>>Nevertheless, reconnect failed after postgres returned.
> >>>>
> >>>>Then I applied your patch. Now, reconnect works.
> >>>>Although, the debug message looks a little bit strange when the postgres 
> >>>>module tries to reconnect when the db ist still down:
> >>>>
> >>>>  7(32748) PG[142] connect_db could not connect to server: ÿÿÿÿ
> >>>>
> >>>>                                                           ^^^^
> >>>>        
> >>>>
> >>> 
> >>>Well, it tries to reconnect every time openser wants to do a query. If db is still down
> >>>then the message looks ok.
> >>>BTW as far as I see the problem you described is also gone. I did a couple of tests -
> >>>removing permissions for openser user and dropping a view it tries to select from
> >>>and it didn't crash. It got back to normal after I restored the changes.
> >>>
> >>>Michael
> >>>
> >>> 
> >>>
> >>>      
> >>>
> >>>>regards,
> >>>>klaus
> >>>>
> >>>>
> >>>>        
> >>>>
> >>>>>I've done some research on the module source code and I  believe
> >>>>>I've found a logical mistake in it.
> >>>>>The issue is that parse_sql_url() function is supplied with the only copy
> >>>>>of sql url and it corrupts it. So if the connection fails and module
> >>>>>tries to reparse url it fails as CON_SQLURL(_h) is corrupted by first
> >>>>>parsing. 
> >>>>>I've created a simple patch that corrects the problem. What it does it
> >>>>>it introduce a temporary buffer with sql url string for parse_sql_url to work
> >>>>>on. Also it makes so that original sql url is not deleted from connection structure
> >>>>>until db_close() is called.
> >>>>>Patch is attached.
> >>>>>Also if in dbase.c in db_init() function you comment out the following:
> >>>>>
> >>>>>/*
> >>>>>       if (connect_db(res) < 0)
> >>>>>       {
> >>>>>               PLOG("db_init", "Error while trying to open database, FATAL\n")
> >>>>>               aug_free(res);
> >>>>>               return((db_con_t *) 0);
> >>>>>       }
> >>>>>*/
> >>>>>
> >>>>>you'll get a "delay connect until used" feature that can be usefull due to lack
> >>>>>of connection pool for postgres.
> >>>>>Could please some of developers review this patch?
> >>>>>
> >>>>>Michael
> >>>>>
> >>>>>
> >>>>>On Thursday 04 August 2005 11:39 am, Klaus Darilion wrote:
> >>>>>
> >>>>>
> >>>>>          
> >>>>>
> >>>>>>Hi!
> >>>>>>
> >>>>>>I had several situations in which openser crashed if the DB lookup 
> >>>>>>fails, e.g:
> >>>>>>table does not exist (acc module)
> >>>>>>wrong SQL query (lcr module)
> >>>>>>
> >>>>>>Whereas is some cases (wrong table permissions, avpops module) openser 
> >>>>>>keeps running.
> >>>>>>
> >>>>>>I do not know if these problems also occurs with mysql. If not, the 
> >>>>>>postgresql module would really need some review.
> >>>>>>
> >>>>>>regards,
> >>>>>>klaus
> >>>>>>
> >>>>>>_______________________________________________
> >>>>>>            
> >>>>>>
> >>>>        
> >>>>
> >>>      
> >>>
> >>    
> >>
> >
> >  
> >
> 
> 

-- 
See you later,
                    Michael



More information about the Devel mailing list