[Serdev] Re: postgres module

Mon Jan 26 18:08:43 UTC 2004

Jan,

Thank you for writing me about these subjects.
I am very opinionated.  I have wrestled with most of the
subjects you discuss over the years (decades actually) and
I have found things that work for me.  My responses are in
no way directed at you or SER...I am just expressing my opinions!
I appreciate being offered the forums, and I'll respond candidly.

Jan Janak wrote:
> Hello Greg,
> 
> We are planning to deploy your postgres module on our servers mainly to
> stress it a little bit (mysql has been tested enough).
> 
> I am thinking about some changes to postgres module and I would like to
> check whether they would be fine with you.
> 
> - Connection pool -- I would like to implement the same connection pool
>   which is now implemented in mysql module. It allows sharing of
>   connections with the same URL among modules within the same process.
>   That means the number of connections will not grow with the number of
>   modules using db anymore.

A connection pool is fine with me.

There are basic problems with the
approach that you are using with database operations.
I went over this when I created the first postgres module.
The main problem is that a file descriptor is *not* a database
connection.  The practice of opening the database and then
forking is just completely wrong for postgres!  The correct
approach for postgres is to open the database *in the thread or process*
that it is used in.
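
A minimal sketch of that ordering, in C.  The names db_conn, db_open and
worker_owns_connection are hypothetical and only stand in for a real
client library (for postgres, db_open would be a call such as libpq's
PQconnectdb).  The point is just that the open happens after the fork,
in the process that will run the queries.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a real connection handle (e.g. a PGconn). */
struct db_conn {
    pid_t owner;    /* process that opened the connection */
};

/* Hypothetical open routine. A real module would call the database
 * client library here; this one only records who opened the handle. */
struct db_conn db_open(void)
{
    struct db_conn c;
    c.owner = getpid();
    return c;
}

/* Returns nonzero if the calling process opened this connection. */
int db_conn_is_mine(const struct db_conn *c)
{
    assert(c != NULL);
    return c->owner == getpid();
}

/* Fork one worker and report whether the worker owns its connection.
 * open_before_fork != 0: parent opens, child inherits the handle.
 * open_before_fork == 0: child opens its own handle after the fork.
 * Returns 1 if the worker owns the handle, 0 if not, -1 on error. */
int worker_owns_connection(int open_before_fork)
{
    struct db_conn c = {0};
    if (open_before_fork)
        c = db_open();              /* opened in the parent */

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (!open_before_fork)
            c = db_open();          /* opened in the worker itself */
        _exit(db_conn_is_mine(&c) ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

Calling worker_owns_connection(0) models the approach I am arguing for;
worker_owns_connection(1) models open-then-fork, where the worker ends up
holding a handle it never opened.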

If the connection pool operates outside
the ser modules, and is communicated with over a pipe/datagram/ip
then that would be fine.  If all queries are atomic in nature (that
is, they do not span multiple queries, like 'select' followed by
'update') then a completely shared pool would work.  Otherwise,
the pool would need to be 'reserved' so that a transaction
can be started, run, and committed/aborted.  This would require
reuse of the same database connection throughout the entire
transaction.
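
Roughly, the reservation idea could look like the sketch below.  This is
not the mysql module's pool or anything in SER; conn_pool, pool_reserve
and pool_release are made-up names, the pool lives in one process, and
there is no locking.  A caller reserves a connection, runs the whole
transaction on it, and releases it only after the commit or abort.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4

/* Hypothetical pooled connection; `id` stands in for a real handle. */
struct pooled_conn {
    int id;
    int reserved;   /* nonzero while a transaction owns this connection */
};

struct conn_pool {
    struct pooled_conn conns[POOL_SIZE];
};

/* Mark every connection in the pool as free. */
void pool_init(struct conn_pool *p)
{
    assert(p != NULL);
    for (int i = 0; i < POOL_SIZE; i++) {
        p->conns[i].id = i;
        p->conns[i].reserved = 0;
    }
}

/* Reserve the first free connection for the length of a transaction.
 * Returns NULL when every connection is already reserved. */
struct pooled_conn *pool_reserve(struct conn_pool *p)
{
    assert(p != NULL);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!p->conns[i].reserved) {
            p->conns[i].reserved = 1;
            return &p->conns[i];
        }
    }
    return NULL;
}

/* Hand the connection back once the transaction has committed or aborted. */
void pool_release(struct pooled_conn *c)
{
    if (c != NULL)
        c->reserved = 0;
}
```

A purely atomic query would reserve and release around a single
statement; a 'select' followed by 'update' would hold the same
pooled_conn across both.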

We (Andy Fullford mostly) actually coded a module called
RI (relational interface) a long time ago.  It does pooling,
communicates with remote processes via IP/datagram/pipe, and
insulates the client program from the underlying database type.
That is a different story.

> 
> - Memory management functions -- I've noticed that you have been using
>   your own memory management functions that allow to find mem leaks
>   easily. I'd like to remove them. I understand that they are good for
>   debugging, but they also introduce performance bottleneck which is not
>   necessary. Of course I take the responsibility for any memory leaks
>   which I might introduce and will fix them immediately.

I have strong opinions about memory management.  I feel that with current
processor speeds and memory sizes, memory management should lean towards
robustness at the expense of efficiency.  Certainly if there is a
performance problem it needs to be addressed.  Have you determined
there is a performance problem?  I have profiled this stuff; it
doesn't have a measurable performance hit, nor does it really take
too much memory (the machine I just built for our backup-SER
box has a 3GHz processor and 4GB of memory!).

Our memory routines were donated to the cause.  If they aren't needed
I won't mind.  From a programmer's point of view I find it very
appealing to free a single pointer (like the memory associated with
a dbopen) and I know in my heart that all memory associated with
that pointer is freed.  So, all memory associated with the database
connection is freed with a single free.  Or all memory associated
with a single query is freed with a single free.  That's clean.
I don't think micro-management of strings inside one memory
allocation is necessary or called for.
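
To illustrate the idea (this is not the code in the postgres module, and
mem_group, group_alloc, group_strdup and group_free are names invented
for the example), here is a small sketch where every allocation is
chained to one parent, so freeing the parent frees everything hanging
off it.  It assumes a C11 compiler for max_align_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Header placed in front of each allocation. The max_align_t member
 * keeps the payload that follows suitably aligned for any type. */
union mem_block {
    union mem_block *next;
    max_align_t align;
};

/* Hypothetical parent object, e.g. one per connection or per query. */
struct mem_group {
    union mem_block *head;
};

/* Create an empty group. Returns NULL if out of memory. */
struct mem_group *group_new(void)
{
    return calloc(1, sizeof(struct mem_group));
}

/* Allocate `size` bytes that belong to group `g`. */
void *group_alloc(struct mem_group *g, size_t size)
{
    assert(g != NULL);
    union mem_block *b = malloc(sizeof *b + size);
    if (b == NULL)
        return NULL;
    b->next = g->head;      /* chain the block to its parent */
    g->head = b;
    return b + 1;           /* payload starts right after the header */
}

/* Copy a string into memory owned by the group. */
char *group_strdup(struct mem_group *g, const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = group_alloc(g, n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Free the group and every block allocated from it.
 * Returns the number of blocks released. */
size_t group_free(struct mem_group *g)
{
    size_t freed = 0;
    if (g == NULL)
        return 0;
    union mem_block *b = g->head;
    while (b != NULL) {
        union mem_block *next = b->next;
        free(b);
        b = next;
        freed++;
    }
    free(g);
    return freed;
}
```

A query would call group_new() once, take all of its strings and result
buffers from that group, and end with a single group_free().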

I'm an old dog, and the tricks I know work for me.  I'm not going to
learn any new tricks.  If you can manage the memory through brute
force then by all means, go for it.  However, if it were me, I would
use the memory management we have developed everywhere else in the SER
code.

> 
> - Postgres API allows to specify queries in the form of string vectors
>   (which means that the sql query buffer will be no more needed). I am
>   still studying the API, but it looks like it will fit our needs better
>   than MySQL API (which does not allow this and you have to assemble
>   the query by yourself before you pass it to the library, which is the
>   ugliest part in mysql module now).
> 

In my opinion postgres is the only real database to use.  We have
experience with almost all database platforms.  Postgres is full
featured, and the price is right.  I personally like the text buffer
for sql queries, but yes indeed postgres can introduce the query first,
then the arguments at a later time (over and over again, against the
same query definition).

By the way, we have developed 'views' for postgres that completely
isolate each 'domain' from the others as far as SER is concerned.
Each view has insert/delete/update ability.  Each 'domain' has its own
login, and the views only allow access to that domain's records.  The
schema can be published to the domain holder, and access to the database
can be granted without concern about that domain seeing and manipulating
other domains' records.  The postgres views enabled the changes to the
database without any changes to the SER code.

---greg
Greg Fausak

> Please let me know what do you think about this. I can send you patches
> for review before I commit, of course.
> 
> We also plan to introduce the database unification into serweb. Once it
> is done the two database modules should become fully interchangeable.
> 
>   Jan.
> 
>