[OpenSER-Devel] Re: [OpenSER-Users] New db_berkeley module

Will Quan wiquan at employees.org
Fri Oct 12 06:11:49 CEST 2007


[Resend; please disregard the previous response; Thunderbird error on my part.]

Henning, thanks for working through this. I can definitely understand that
consistency across the DB modules is important architecturally.
I have been thinking about this all day, and I don't think I have an 'easy'
answer to the issue of the row id as a primary key in db_berkeley.
The Berkeley database is not relational, and the extra burden of
maintaining an artificial key (id) for each row will not actually
improve performance as it would in a relational database.
I am not an expert in DB internals, so I'll just explain things as I
understand them. We need to hash this out :)
The API for querying in Berkeley DB is either:
1. get() - where you provide the key, which in our case must be
lexicographically equal in order to find a result. I believe this is
effectively the 'natural join'.
2. cursor() - where you iterate over each row, do the join on any
columns you want, and build a result set.
As implemented, without the id columns, the queries use get(), which
implies a natural join, i.e. exact string equality on the 'key' - in most
cases a composite key made up of the METADATA_KEY columns separated by a
delimiter. Since the underlying access method is db_hash, the query
runtime is constant.
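To make the two access paths concrete, here is a minimal sketch (not the
actual db_berkeley code) of a hash-keyed store where a row is found either by
an exact composite key (get) or by a full scan (cursor). The '|' delimiter,
the column names, and the table contents are illustrative assumptions only.

```python
# Illustrative sketch of get() vs. cursor()-style access on a hash store.
# The composite key is the METADATA_KEY columns joined by a delimiter.
DELIM = "|"

# A "table" keyed by the composite key (username, domain) - assumed columns.
store = {
    "alice|example.com": {"username": "alice", "domain": "example.com"},
    "bob|example.com":   {"username": "bob",   "domain": "example.com"},
}

def get(username, domain):
    """Exact-match lookup: constant time, but every key column must be known."""
    return store.get(username + DELIM + domain)

def cursor_query(**where):
    """Cursor-style scan: iterate all rows, match on any subset of columns."""
    return [row for row in store.values()
            if all(row.get(col) == val for col, val in where.items())]
```

The trade-off this shows: get() is O(1) but needs the full key, while
cursor_query() can match on any column (e.g. only the domain) at the cost of
scanning every row.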
I think if we change the bdb schema to use the id column as part of the
composite key, we will limit ourselves to cursor-based queries, since we
will not know the id until after the first query.
As an aside, my understanding is that future development would implement
queries that fetch and store the oid, such that subsequent queries on that
table would use a 'WHERE id = oid' clause. (Please let me know if this
assumption is incorrect.) As I sit here, I think I would have to create a
secondary bdb database for each table that requires the id column. The key
would be a unique integer id, and the value would point to the row of the
'real' table. This would probably work, but it adds a layer of complexity
that we take for granted in relational databases. Today, these secondary
databases are not implemented, and there are other issues not yet
discussed, like ensuring the uniqueness of the ids, etc. However, to be
honest, I don't know if I can get all this secondary db work done in the
next 2 months.
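A hedged sketch of what such a secondary database might look like: a
separate id-to-primary-key map, so that a 'WHERE id = oid' lookup can still
reach a row with two get() calls instead of a cursor scan. The names and the
id-allocation scheme here are my assumptions, not the module's actual design.

```python
# Illustrative only: a secondary "database" mapping a unique integer id to
# the composite key of the primary table.
DELIM = "|"
primary = {}    # composite key -> row
secondary = {}  # unique integer id -> composite key in `primary`
_next_id = 0    # uniqueness of ids must be maintained somewhere like this

def insert(username, domain, **cols):
    """Store a row under its composite key and register it in the secondary db."""
    global _next_id
    _next_id += 1
    key = username + DELIM + domain
    primary[key] = dict(id=_next_id, username=username, domain=domain, **cols)
    secondary[_next_id] = key
    return _next_id

def get_by_id(oid):
    """Two constant-time lookups replace a full cursor scan."""
    key = secondary.get(oid)
    return primary.get(key) if key is not None else None
```

This is exactly the added complexity mentioned above: every insert, update,
and delete now has to keep two databases consistent.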

Please do not take this as me rejecting your ideas; rather, it is full
disclosure that making db_berkeley more 'relational' comes at the cost of
additional complexity that is not implemented yet.

As another aside, I started looking at the code for the openserctl cmds
today, and I think I need to add some FIFO cmds to the modules, since
openser is actually running at the time the openserctl utility is invoked.
This means the DBs are open and some data may not yet be committed to disk,
etc. I thought I'd use the carrierroute module as the starting example for
implementing such FIFO commands, but I need a few more days to get all
those commands implemented and tested. I will continue on this path over
the next few days, so that there will be parity between the db modules
from the perspective of the openserctl cmds.

If you prefer discussions on this list, that is fine, but I am
also available via SIP if you want to discuss by voice. Just so you know
it's an option.
regards,
--will


Henning Westerholt wrote:
> On Thursday 11 October 2007, William Quan wrote:
>   
>> I was poking around, and I don't think Berkeley DB has indexes like
>> we're used to in relational databases (or if it does, they are not
>> exposed via the API).
>>
>> So basically each Berkeley DB maps to a SQL 'table'. The 'rows' are
>> mapped to key/value pairs in the bdb, and 'columns' are
>> application-encapsulated fields that the module needs to manipulate.
>> Conceptually it's like a big hash table, where you need to know the key
>> for the query to find a row. Because of this, I did not include the id
>> column in the tables, as it's the auto-incremented column that a relational
>> db would use for an index, not something that is ordinarily provided in
>> a query by the application. I did not see your xslt file, but could we
>> modify it to not include the id columns for the berkeleydb stuff?
>>     
>
> Hello William,
>
> the 'id' column is currently not used by the openser server, but this is 
> planned for future releases. For that reason we also include the id field in 
> the dbtext tables; conceptually, that db is somewhat like the db_berkeley 
> module.
>
> We also had some real pain in the past supporting different db tables for 
> all the modules, so I really would like to use the same tables for this module 
> too. If it is possible with dbtext, it should be possible with 
> db_berkeley, too. :-)
> BTW, the xml source is in db/schema, the xsl scripts are in doc/dbschema/xsl.
>
>   
>> I use this module for registration so that involves the modules auth_db,
>> registrar, and usrloc. These modules use primarily tables subscriber and
>> location.
>> This stuff has been working for a while, but due to the key definition
>> of the subscriber and location tables, it does require you to set
>> use_domain=1 in the script.
>> I also tested tables acc and version, but the rest remain to be tested.
>>     
>
> So, is it ok if I set the METADATA_KEY field, e.g. for subscriber, to 
> 0 1 2 (id, username, domain)? What happens if I don't set the use_domain 
> parameter?
>
>   
>> I should have some more code in the next few days.
>>     
>
> Great!
>
> Best regards,
>
> Henning
>   
