[OpenSER-Devel] Re: [OpenSER-Users] New db_berkeley module

Will Quan wiquan at employees.org
Fri Oct 12 06:09:19 CEST 2007


Henning, Thanks for working through this. I can definitely understand that
consistency across the DB modules is important architecturally.
I have been thinking about this all day, and I don't think I have a
favorable response to the issue of the row id as a primary key in Berkeley DB.
The Berkeley database is not relational, and the extra burden of
maintaining an artificial key (id) for each row will not actually
improve performance the way it would in a relational database.
I am not an expert in DB internals, so I'll just explain things as I 
understand them. We need to hash this out :)
The API for querying in Berkeley DB is one of two calls (there is a rough
sketch of both below):
1. get() - where you provide the key, and in our case it must be
lexicographically equal in order to find a result. I believe this is the
'natural join'.
2. cursor() - where you iterate over each row, do the join on any
columns you want, and create a result set.
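
Roughly, in plain Berkeley DB C API terms, the two calls look something
like this (only a sketch against libdb, error checks omitted; the file
name "subscriber.db" and the key string are made up for the example):

    #include <string.h>
    #include <stdio.h>
    #include <db.h>

    int main(void)
    {
        DB *db;
        DBC *cur;
        DBT key, data;
        int ret;

        db_create(&db, NULL, 0);
        db->open(db, NULL, "subscriber.db", NULL, DB_HASH, DB_CREATE, 0664);

        /* 1. get(): the key must match exactly, byte for byte */
        memset(&key, 0, sizeof(DBT));
        memset(&data, 0, sizeof(DBT));
        key.data = "alice|example.com";
        key.size = strlen(key.data);
        ret = db->get(db, NULL, &key, &data, 0);
        if (ret == 0)
            printf("found: %.*s\n", (int)data.size, (char *)data.data);

        /* 2. cursor(): walk every record and build the result set yourself */
        db->cursor(db, NULL, &cur, 0);
        memset(&key, 0, sizeof(DBT));
        memset(&data, 0, sizeof(DBT));
        while ((ret = cur->c_get(cur, &key, &data, DB_NEXT)) == 0) {
            /* inspect/compare whatever columns you like here */
        }
        cur->c_close(cur);
        db->close(db, 0);
        return 0;
    }
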
As implemented today, without the id columns, queries go through get(),
which implies a natural join, i.e. exact string equality on the 'key'; in
most cases that key is a composite key made up of the METADATA_KEY columns
separated by a delimiter. Since the underlying access method is db_hash,
the query runtime is constant.
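
Tying that to the sketch above: the "alice|example.com" key there is
exactly such a composite key. In code it would be built from the
METADATA_KEY column values, something like this (the '|' delimiter and
the column order are just assumptions for the example):

    /* hypothetical column values; in the module they come from the query */
    const char *username = "alice", *domain = "example.com";
    char kbuf[256];
    DBT key;

    /* composite key = the METADATA_KEY columns joined by the delimiter */
    snprintf(kbuf, sizeof(kbuf), "%s|%s", username, domain);

    memset(&key, 0, sizeof(DBT));
    key.data = kbuf;
    key.size = strlen(kbuf);
    /* one db->get() on the hash then finds the row in constant time */
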
I think if we change the bdb schema to use the id column as part of the
composite key, we will be limiting ourselves to cursor-based queries,
since we will not know the id until after the first query.
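
To make that concrete: if the key became e.g. "id|username|domain", a
lookup by username/domain alone could no longer use get() at all, because
the id prefix is unknown. We would have to fall back to a cursor and
compare fields out of each stored value ourselves, continuing the sketch
above (the value layout is again invented for the example):

    DBC *cur;
    DBT key, data;
    char row[512];
    int ret;

    db->cursor(db, NULL, &cur, 0);
    memset(&key, 0, sizeof(DBT));
    memset(&data, 0, sizeof(DBT));
    while ((ret = cur->c_get(cur, &key, &data, DB_NEXT)) == 0) {
        char *user, *end;

        /* copy the value so it is NUL-terminated; it holds the
           delimited row, e.g. "1|alice|example.com|..." */
        snprintf(row, sizeof(row), "%.*s", (int)data.size, (char *)data.data);

        user = strchr(row, '|');        /* skip the id we do not know */
        if (user == NULL)
            continue;
        user++;
        end = strchr(user, '|');        /* cut off the rest of the row */
        if (end)
            *end = '\0';

        if (strcmp(user, "alice") == 0) {
            /* match found, but only after visiting every record:
               O(n) instead of the constant-time get() above */
        }
    }
    cur->c_close(cur);
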
Aside, my understanding is that future development would implement
queries that fetch and store the oid, such that subsequent queries on
that table would use a 'WHERE id = oid' clause. (Please let me know if
this assumption is incorrect.) As I sit here, I think I would have to
create a secondary bdb database for each table that requires the id
column. The key would be a unique integer id, and the value would point
to the row of the 'real' table. This would probably work, but it does add
a layer of complexity that we take for granted in relational databases.
Today, these secondary databases are not implemented, and there are other
issues not yet discussed, like the uniqueness of the ids, etc. However,
to be honest I don't know if I can get all this secondary db stuff
working in the next 2 months.
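
For completeness, Berkeley DB does have a building block for this: a
secondary database can be associate()d with the primary one, with a
callback that extracts the id column as the secondary key, and BDB then
keeps the mapping up to date on every put/delete. Very roughly, and I
must stress none of this exists in db_berkeley today (file names and the
value layout are invented for the sketch):

    #include <string.h>
    #include <db.h>

    /* secondary-key callback: use the id column (assumed to be the first
     * '|'-delimited field of the stored row) as the secondary key */
    static int get_id_key(DB *sec, const DBT *pkey, const DBT *pdata,
                          DBT *skey)
    {
        char *p = memchr(pdata->data, '|', pdata->size);

        memset(skey, 0, sizeof(DBT));
        skey->data = pdata->data;
        skey->size = p ? (u_int32_t)((char *)p - (char *)pdata->data)
                       : pdata->size;
        return 0;
    }

    int main(void)
    {
        DB *primary, *by_id;

        db_create(&primary, NULL, 0);
        primary->open(primary, NULL, "subscriber.db", NULL,
                      DB_HASH, DB_CREATE, 0664);

        db_create(&by_id, NULL, 0);
        by_id->open(by_id, NULL, "subscriber_id.db", NULL,
                    DB_HASH, DB_CREATE, 0664);

        /* BDB now maintains the id -> row mapping automatically */
        primary->associate(primary, NULL, by_id, get_id_key, 0);

        /* a 'WHERE id = oid' style lookup then becomes a get()/pget()
         * on by_id instead of a full cursor scan of the primary */

        by_id->close(by_id, 0);
        primary->close(primary, 0);
        return 0;
    }

Generating unique ids in the first place is exactly one of the open
issues I mentioned above.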

Please do not take this as me rejecting your ideas, but rather as full
disclosure that making db_berkeley more 'relational' comes at the cost of
additional complexity that has not been implemented yet.

Aside, I started looking at the code for the openserctl cmds today, and
I think I need to add some fifo cmds to the modules, since openser is
actually running at the time the openserctl util is being invoked. This
means the DBs are open and some data may not be committed to disk, etc. I
thought I'd use the carrierroute module as the starting example for
implementing such fifo commands, but I need a few more days to get all
those commands implemented and tested.

If you prefer discussions in this working group, that is good, but I am
also available via SIP if you want to g


Henning Westerholt wrote:
> On Thursday 11 October 2007, William Quan wrote:
>   
>> I was poking around and I don't think the Berkeley DB has indexes like
>> we're used to in relational databases (or if it does, they are not
>> exposed via the API).
>>
>> So basically each Berkeley DB maps to a SQL 'table'. The 'rows' are
>> mapped to key/value pairs in the bdb, and 'columns' are
>> application-encapsulated fields that the module needs to manipulate.
>> Conceptually it's like a big hash table, where you need to know the key
>> for the query to find a row. Because of this, I did not include the id
>> column in the tables, as it's the auto-incremented column that a
>> relational db would use for an index, not something that is ordinarily
>> provided in a query by the application. I did not see your xslt file,
>> but could we modify it to not include the id columns for the berkeleydb
>> stuff?
>>     
>
> Hello William,
>
> the 'id' column is currently not used by the openser server, but this is
> planned for future releases. For that reason we also include the id field
> in the dbtext tables; that db is conceptually somewhat similar to the
> db_berkeley module.
>
> We also had some real pain in the past supporting different db tables for
> all the modules, so I really would like to use the same tables for this
> module too. If this is possible with dbtext, it should be possible with
> db_berkeley, too. :-)
> BTW, the xml source is in db/schema, the xsl scripts are in doc/dbschema/xsl.
>
>   
>> I use this module for registration, so that involves the modules auth_db,
>> registrar, and usrloc. These modules primarily use the subscriber and
>> location tables.
>> This stuff has been working for a while, but due to the key definition
>> of the subscriber and location tables, it does require you to set
>> use_domain=1 in the script.
>> I also tested the acc and version tables, but the rest remain to be tested.
>>     
>
> So, is it OK if I set the METADATA_KEY field, e.g. for subscriber, to
> 0 1 2 (id, username, domain)? What happens if I don't set the use_domain
> parameter?
>
>   
>> I should have some more code in the next few days.
>>     
>
> Great!
>
> Best regards,
>
> Henning
>   



