[OpenSER-Devel] Re: [OpenSER-Users] New db_berkeley module

Fri Oct 12 18:56:59 CEST 2007

Apparently there is much discussion on the topic of 'natural key' vs.
'surrogate key' in the DB world out there :) .
I spent a just a few minutes reviewing basic arguements on
[http://en.wikipedia.org/wiki/Surrogate_key ].
Having read this, I am more curious and will have to go back and look
for old threads in this group to learn what your motivations are.
>From what I read so far, I don't think you are supposed to use the id in
application queries; that was a misunderstanding on my part.
Further, I should have clarified by concisely stating that db_berkeley
relies on a 'natural key' schema. As a consequence, I considered each
table and attempted to define the 'natural' keys for that table. I may
have made some mistakes along the way. METADATA_KEY indicates which
columns comprise the 'natural' (composite) key.


Henning Westerholt wrote:
> On Friday 12 October 2007, Will Quan wrote:
>   
>> Henning, Thanks for working through this. I can definitely understand
>> consistency across the DB modules is important architecturally.
>> I have been think about this all day, and I dont think I have a 'easy'
>> response to the issue of the row id as a primary key in db_berkeley.
>>     
>
> Hello William.
>
>   
>> The berkeley database is not relational and the extra burden of
>> maintaining an artificial key (id) for each row will not actually
>> improve performance as it would in a relational database.
>> I am not an expert in DB internals, so I'll just explain things as I
>> understand them. We need to hash this out :)
>> The api for querying in berkeley is either:
>> 1. get() - where your provide the key, and in our case it must be
>> lexicographically equal in order to find a result. I believe this is the
>> 'natural join'.
>>     
>
> Thank you for the detailed explanation, now i understand the problem in much 
> more detail.
>
>   
>> 2. cursor() - where you iterate over each row, do the join on any
>> columns you want, and create a result set.
>> As implemented, without the id columns, the queries are implemented with
>> get() which implies a natural join, or exact string equality on the
>> 'key', which is in most cases a composite key comprised of the
>> METADATA_KEY columns seperated by a delimiter. 
>>     
>
> It is not possible to use only one key of the set instead of all? E.g. use 
> only the username, or the id?
>   
Consistency is the key here. If you are inserting data into table
subscriber with a key like 'wiquan|domain.com', then  you cant later
query with only the username. The db_berkeley module will fill in any
missing keys with 'NULL' so the actual key used for the get() would look
like 'wiquan|NULL' which would not yield a result row.

>   
>> Since the underlying 
>> access method is db_hash, the query runtime is constant.
>> I think if we change things in the bdb schema to use the id column as
>> part of the composite key, we will be limiting ourselves to using cursor
>> based queries, since we will not know the id until after the first query.
>>     
>
> Well, if i understand it correctly, this would be rather slow, iterating over 
> the columns. So this is not a good solution.
>
>   
I agree.

>> Aside, my understanding is that that future development would implement
>> queries that fetch and store the oid such that subsequent queries would
>> perform queries in that table with a 'WHERE id = oid' clause. (Please
>> let me know if this assumption is incorrect.)
>>     
>
> I don't think any current module that uses a id query. Daniel or Bogdan, is 
> this planned for the for future, and in what timeframe? 
> I remember a discussion some month ago that this was the reason for the 
> introduction of the id columns..
>
>   
>> As I sit here, I think I 
>> would have to create a secondary bdb database for each table that
>> requires the id column. The key would be a unique integer id, and the
>> value would point to the row of the 'real' table. This would probably
>> work but it does add a layer of complexity that we take for granted in
>> the relational databases. Today, these secondary databases are not
>> implemented, and there are other issues not discussed like the concept
>> of uniqueness of the ids, etc. However, to be honest I dont know if I
>> can get all this secondary db stuff working in the next 2 months.
>>     
>
> As long as no one using this access method, you don't need to hurry at the 
> moment in this area, in my opinion. Using DB->associate (from berkeley_db) 
> sounds not so difficult, but i'm not an bdb expert. There probably many other 
> issues that need to be worked out.
>   

I agree.
> Its not possible to implement this for 1.3 anyway, the code is frozen.
>
>   
>> Please do not take this as me rejecting your ideas, but rather full
>> discloser that making db_berkeley more 'relational' comes at the cost of
>> additional complexities that are not implemented yet.
>>     
>
> No problem, as i wrote this mail i don't understand the full implication of 
> this problem completely.
>
>   
>> Aside, I started looking at the code for the openserctl cmds today, and
>> I think I need to add some fifo cmds to the modules since openser is
>> actually running at the time the openserctl util is being invoked. This
>> means the DBs are open and some data may not be commited to disk, etc. I
>> thought I'd use the carrierroute module as the starting example for
>> implemented such fifo commands, but I need a few more days to get all
>> those command implemeted/tested. I will continue on this path over the
>> next few days, such that there will be parity between the db modules
>> from the perspective of the openserctl cmds.
>>     
>
> Ok, so you want to implement some kind of "flush data" parameter?
>   
Yes, something like that. How do you feel about a 'reload' cmd, which is
a close() that flushes the cache to disk, followed by an open().
But there are other 'core' cmds that will require a live db such as:
1. add
2. rm
3. showdb
>   
>> If you prefer discussions in this working group that is good, but I am
>> also available via sip if you want to discuss voice. Just so you know
>> its an option.
>>     
>
> I can give you my company number if you like, but i'm also available at the 
> openser irc channel for private chat thorough the day (german time). 
>
> Cheers,
>
> Henning
>