[Serusers] SER data model discussion - developers please read

Sun Dec 2 03:48:21 CET 2007

Hi All,
This is specifically for the SER/OpenSER developers, but I'm not a serdev
list member so I'm posting here.

I've been using SER since version 0.8.X and I'm still running
0.8.14production for my company PBX to this day.

I was very excited as version 2 became a release candidate and I downloaded
it for testing. I was pretty disappointed with one aspect of the new data
model and I'm requesting that the developers consider a further revision on
the data model.

Basically, taking all fields out of the subscriber table like Last_name,
first_name, email, timezone, rpid/asserted identity, etc, etc is not the
greatest idea. It's a better idea from a database architecture and
performance perspective to keep adding columns into that table for data that
has a 1 to 1 relationship with a user, and that is common in > 90% of SER's
use cases (i.e. corporate, carrier/VSP.) I would suggest adding
voicemail_password, and maybe every other field that is being added into the
default attributes script that I saw in CVS recently. If you already know
what attributes a user has (and they have a 1 to 1 relationship), then its
far better from a db performance perspective to keep these attributes in the
user table. I know that the code becomes more complicated, but I think it
may be a tradeoff worth discussing.

See this discussion (
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056)
between Oracle users and Tom, (an Oracle engineer/architect.)  The
full
text of this discussion is very informative and I highly recommend people
read it through.

Tom's conclusion is that the type of data model being discussed, and now
being used in SER fails for all but the most trivial of applications. Maybe
SER *by itself* qualifies as "trivial" from a database architect's
perspective, but think about things like Asterisk integration, which is
quite common. You quickly run into some very nasty queries . . .

Please note that I am not a software developer nor a database engineer, just
a user who reads a lot, so I'm open to being the ignorant one here, but I
thought that this should be discussed among users and developers.

Thanks for considering,
Mahatma

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056

The following is an excerpt from the above link:

Here is a excerpt from my forthcoming book where I talk about this
(and show you how
ugly, hard and inefficient queries against your very flexible model will be)

(2)Do not use Generic Data Models

Frequently I see applications built on a generic data model for
"maximum flexibility" or
applications built in ways that prohibit performance. Many times -
these are one in the
same thing! For example, it is well known you can represent any object
in a database
using just four tables:

Create table objects ( oid int primary key, name varchar2(255) );

Create table attributes
( attrId int primary key, attrName varchar2(255),
datatype varchar2(25) );

Create table object_Attributes
( oid int, attrId int, value varchar2(4000),
primary key(oid,attrId) );

Create table Links ( oid1 int, oid2 int,
primary key (oid1, oid2) );

That's it - no more CREATE TABLE for me! I can fill the attributes
table up with rows
like this:

insert into attributes values ( 1, 'DATE_OF_BIRTH', 'DATE' );
insert into attributes values ( 2, 'FIRST_NAME',    'STRING' );
insert into attributes values ( 3, 'LAST_NAME',    'STRING' );
commit;

And now I'm ready to create a PERSON record:

insert into objects values ( 1, 'PERSON' );
insert into object_Attributes values( 1, 1, '15-mar-1965' );
insert into object_Attributes values( 1, 2, 'Thomas' );
insert into object_Attributes values( 1, 3, 'Kyte' );
commit;

insert into objects values ( 2, 'PERSON' );
insert into object_Attributes values( 2, 1, '21-oct-1968' );
insert into object_Attributes values( 2, 2, 'John' );
insert into object_Attributes values( 2, 3, 'Smith' );
commit;

And since I'm good at SQL, I can even query this record up to get the
FIRST_NAME and
LAST_NAME of all PERSON records:

ops$tkyte at ORA920> select
     max( decode(attrName, 'FIRST_NAME', value, null )) first_name,
  2  max( decode( attrName, 'LAST_NAME',  value, null ) ) last_name
  3    from objects, object_attributes, attributes
  4   where attributes.attrName in ( 'FIRST_NAME', 'LAST_NAME' )
  5     and object_attributes.attrId = attributes.attrId
  6     and object_attributes.oid = objects.oid
  7     and objects.name = 'PERSON'
  8   group by objects.oid
  9  /

FIRST_NAME           LAST_NAME
-------------------- --------------------
Thomas               Kyte
John                 Smith

Looks great, right? I mean, the developers don't have to create tables
anymore, we can
add columns at the drop of a hat (just requires an insert into the
ATTRIBUTES table). The
developers can do whatever they want and the DBA can't stop them. This
is ultimate
"flexibility". I've seen people try to build entire systems on this model.

But, how does it perform? Miserably, terribly, horribly. A simple
"select first_name,
last_name from person" query is transformed into a 3-table join with
aggregates and all.
Further, if the attributes are "NULLABLE" - that is, there might not
be a row in
OBJECT_ATTRIBUTES for some attributes, you may have to outer join
instead of just joining
which in some cases can remove more optimal query plans from consideration.

Writing queries might look pretty straightforward, but it's impossible
to do in a
performant fashion. For example, if we wanted to get everyone that was
born in MARCH or
has a LAST_NAME = 'SMITH', we could simply take the query from above
and just wrap an
inline view around that:

ops$tkyte at ORA920> select *
  2    from (
  3  select
     max(decode(attrName, 'FIRST_NAME', value, null)) first_name,
  4  max(decode(attrName, 'LAST_NAME',  value, null)) last_name,
  5  max(decode(attrName, 'DATE_OF_BIRTH',  value, null))
                                                      date_of_birth
  6    from objects, object_attributes, attributes
  7   where attributes.attrName in ( 'FIRST_NAME',
                                     'LAST_NAME', 'DATE_OF_BIRTH' )
  8     and object_attributes.attrId = attributes.attrId
  9     and object_attributes.oid = objects.oid
 10     and objects.name = 'PERSON'
 11   group by objects.oid
 12         )
 13   where last_name = 'Smith'
 14      or date_of_birth like '%-mar-%'
 15  /

FIRST_NAME           LAST_NAME            DATE_OF_BIRTH
-------------------- -------------------- --------------------
Thomas               Kyte                 15-mar-1965
John                 Smith                21-oct-1968

So, it looks "easy" to query, but think about the performance! If you
had a couple
thousand OBJECT records, and a couple tens of thousands of
OBJECT_ATTRIBUTES - Oracle
would have to process the entire inner group by query first and then
apply the WHERE
clause.

This is not a made up data model, one that I crafted just to make a
point. This is an
actual data model that I've seen people try to use. Their goal is
ultimate flexibility.
They don't know what OBJECTS they need, they don't know what
ATTRIBUTES they will have.
Well - that is what the database was written for in the first place:
Oracle implemented
this thing called SQL to define OBJECTS and ATTRIBUTES and lets you
use SQL to query
them. You are trying to put a generic layer on top of a generic layer
- and it fails each
and every time except for the most trivial of applications.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.sip-router.org/pipermail/sr-users/attachments/20071201/22d04d8b/attachment.htm>