[Serdev] Re: postgres module
Greg Fausak
lgfausak at august.net
Mon Jan 26 21:37:14 UTC 2004
Jan Janak wrote:
> Greg, comments inline.
>
> On 26-01 12:08, Greg Fausak wrote:
>
>>Jan,
>>
>>Thank you for writing me about these subjects.
>>I am very opinionated. I have wrestled with most of the
>>subjects you discuss over the years (decades actually) and
>>I have found things that work for me. My responses are in
>>no way directed at you or SER...I am just expressing my opinions!
>>I appreciate being offered the forums, and I'll respond candidly.
>
>
> Sure, that's what we have this mailing list for.
>
>
>>>- Connection pool -- I would like to implement the same connection pool
>>> which is now implemented in mysql module. It allows sharing of
>>> connections with the same URL among modules within the same process.
>>> That means the number of connections will not grow with the number of
>>> modules using db anymore.
>>
>>A connection pool is fine with me.
>>
>>There are basic problems with the
>>approach that you are using for database operations.
>>I went over this when I created the first postgres module.
>>The main problem is that a file descriptor is *not* a database
>>connection. The practice of opening the database and then
>>forking is just completely wrong for postgres! The correct
>>approach for postgres is to open the database *in the thread or process*
>>that it is used in.
>
>
> I remember the problem you are mentioning, but I have removed all such
> constructs, so this shouldn't be a problem anymore. Currently all
> database connections are opened and closed in the process in which
> they will be used and are never inherited. The last relics were the
> usrloc and auth_db modules, which inherited connections but never used
> them in the descendants.
>
> I've fixed even those two modules over the weekend. Actually, the
> mysql module does not allow inherited connections and will scream (and
> ser won't start) if it detects open inherited database connections.
>
>
>>If the connection pool operates outside
>>the ser modules, and is communicated with over a pipe/datagram/ip
>>then that would be fine. If all queries are atomic in nature (that
>>is, they do not span multiple statements, like a 'select' followed by
>>an 'update') then a completely shared pool would work. Otherwise,
>>the pool would need to be 'reserved' so that a transaction
>>can be started, run, and committed/aborted. This would require
>>reuse of the same database connection throughout the entire
>>transaction.
>
>
> The term connection pool has a slightly different meaning here. Let me
> briefly describe what it is good for.
>
> Older versions of ser opened a huge number of database connections
> (you might remember some emails on the list about maximum number of
> allowed connections in mysql).
>
> For example, suppose you configure ser to start 16 processes and you
> load the usrloc, auth_db, acc and domain modules.
>
> Usrloc will open a database connection in each child, that is, 16
> database connections. After that the auth_db module gets initialized
> and also opens a database connection in each child, so we have 16 +
> 16 = 32 database connections.
>
> The same happens for the acc and domain modules, so we will end up
> with 64 open database connections. All the connections usually have
> the same username, password and database. Each new module that needs
> the database will add 16 new connections. Each connection will start
> a new thread (in the case of mysql) on the server.
>
> Module functions within one process will never be executed in
> parallel; they will always be executed in the order in which they
> are written in the configuration file. A function must return before
> the next one is called. If a function performs any database
> operations, they will be finished before the function returns (this is
> true in all ser modules and in fact it must be true).
>
> Given the constraints described above, a single database connection
> can be reused by multiple modules within the same process (as long as
> they are configured with the same database URL). Modules will never
> conflict with each other because they are executed sequentially and
> each database operation is finished before a function from a different
> module is executed.
>
> The connection pool relies upon these facts. When a module opens a
> database connection, the connection will be remembered by the database
> module (mysql in this case).
>
> When a different module opens a connection _within the same process_,
> the database module will iterate through the pool to see if a
> connection with the same URL has already been opened. If so, it
> will return a reference to the connection opened by the previous
> module; otherwise it will open a new one. And so on... Each ser
> process has a distinct connection pool.
>
> With the connection pool, the example (mentioned above) will look like
> this:
>
> Again, you start ser with 16 processes. The usrloc module will open a
> database connection in each process; since it is the first module,
> there are no open connections in the pool yet, so 16 new connections
> will be opened.
>
> After that, auth_db tries to open a connection in each child. But
> because it was configured with the same database URL as usrloc, a
> previously opened connection is found in the pool and returned to
> auth_db. So in fact auth_db doesn't open any new database connections.
> The same happens in acc and domain provided that they were configured
> with the same database URL.
>
> So in this case we have only 16 opened database connections (compared
> to 64 previously). That's it. The purpose of the connection pool is to
> reduce the number of opened connections.
I hacked dbase.c in the postgres module to:
1) Skip the connect_db() when doing db_init()
2) Call connect_db() in any operation that tries to use the
database while the database is not currently open.
Using this technique has cut the number of database connections
to two. Maybe a combination of both techniques is in order?
>
>
>>We (Andy Fullford, mostly) actually coded a module called
>>RI (relational interface) a long time ago. It does pooling,
>>communicates with remote processes via IP/datagram/pipe, and
>>insulates the client program from the underlying database type.
>>That is a different story.
>
>
> Yes, we have a similar (but probably simpler) API in ser.
>
>
>>>- Memory management functions -- I've noticed that you have been using
>>>  your own memory management functions that make it easy to find
>>>  memory leaks. I'd like to remove them. I understand that they are
>>>  good for debugging, but they also introduce an unnecessary
>>>  performance bottleneck. Of course I take responsibility for any
>>>  memory leaks I might introduce and will fix them immediately.
>>
>>I have strong opinions about memory management. I feel that with
>>current processor speeds and memory sizes, memory management should
>>lean towards robustness at the expense of efficiency. Certainly if
>>there is a performance problem it needs to be addressed. Have you
>>determined there is a performance problem? I have profiled this stuff;
>>it doesn't have a measurable performance hit, nor does it really take
>>too much memory (the machine I just built for our backup SER
>>box has a 3 GHz processor and 4 GB of memory!).
>
>
> No, I haven't done any performance measurements of postgres module and
> maybe you are right that the performance impact is minimal compared to
> the rest of the server.
>
> We've taken a different approach in ser -- performance and efficiency
> at the expense of the programmer's convenience.
>
> SER is anything but memory efficient. We use handcrafted memory
> management which is very fast, but it uses much more memory than
> needed. SER in the default configuration allocates only 1 MB of
> private memory and only this memory can be used. If you reach the
> limit, ser will bail out and you will have to recompile it. (The
> shared memory segment is much bigger, of course.)
>
> The postgres module uses the standard malloc (the one from libc),
> which is slower than ours. In addition, our memory allocator can be
> switched into a debugging mode which makes it easy to find memory
> leaks. You simply start the server, let it run for a while, and then
> stop it. It will print all unfreed memory blocks along with the file
> and line at which they were allocated. We are able to search for
> memory leaks in the whole of ser.
>
> Because of that I think that having a separate garbage collector in
> the postgres module is not necessary, and I would like to switch it
> over to the memory allocator we are using everywhere else. I don't
> know if there will be any performance boost (probably not), but at
> least the postgres module will be smaller and easier to read and
> understand, and we will have all the memory management in one place.
>
> I would prefer a smaller code base and memory allocation aligned with
> the rest of the server at the expense of careful memory handling in
> the module.
>
>
>>Our memory routines were donated to the cause. If they aren't needed,
>>I won't mind. From a programmer's point of view I find it very
>>appealing to free a single pointer (like the memory associated with
>>a dbopen) and know in my heart that all memory associated with
>>that pointer is freed. So, all memory associated with the database
>>connection is freed with a single free. Or all memory associated
>>with a single query is freed with a single free. That's clean.
>>I don't think micro-management of strings inside one memory
>>allocation is necessary or called for.
>
>
> First of all, I have nothing against your memory allocation routines,
> and we are thankful to anyone who is willing to contribute and give
> their work away for free like you did.
>
> I am just presenting my point of view and trying to clarify why I
> think it would be better without them. Of course you have the final
> word as the module maintainer -- I wouldn't do anything you disagree
> with.
>
>
>>I'm an old dog, and the tricks I know work for me. I'm not going to
>>learn any new tricks. If you can manage the memory through brute
>>force then by all means, go for it. However, if it were me, I would
>>use the memory management we have developed everywhere else in the SER
>>code.
>
>
> It's a tradeoff between efficiency and programmer's comfort. We have
> chosen the former. For the latter, C is probably not the best language
> (that's one of the reasons why so many people use java :-).
I program in java as well. It really isn't a fair comparison. Java is
extremely inefficient and remarkably elegant. We are talking about
a memory management routine in compiled C. Anyway, feel free to replace
the memory management.
>
>
>>[...]
>>
>>By the way, we have developed 'views' for postgres that completely
>>isolate each 'domain' from the others as far as SER is concerned.
>>Each view has insert/delete/update ability. Each 'domain' has its own
>>login, and the views only allow access to that domain's records. The
>>schema can be published to the domain holder, and access to the
>>database can be granted without concern about that domain seeing and
>>manipulating other domains' records. The postgres views enabled these
>>changes to the database without any changes to the SER code.
>
>
> That sounds cool. Are you willing to contribute this? (In the form of
> a description, scripts, whatever.)
Sure. We use a schema DDL generator; I can donate the output of the DDL
generation, which includes the .html descriptions and creation scripts
(both tables and views).
---greg
>
> Jan.
>
> _______________________________________________
> Serdev mailing list
> serdev at lists.iptel.org
> http://lists.iptel.org/mailman/listinfo/serdev
>
>