Re: [Serusers] SER data model discussion - developers please read

3 Dec 2007

      The current data model, while designed to make certain things easier 
DOES indeed encourage data duplication, as, in order to create a unified 
system, most of us will opt to duplicate data into more usable tables.
When meshing with an overall system, gathering all relevant data for a 
particular substructure or user by using joins is neither speedy 
(especially when the tables get HUGE -- and they very much will) nor 
terribly convenient.
Of course, data duplication is hardly a cardinal sin, but there's ALWAYS 
a trade off between abstraction and actual functionality. I understand 
the reasons WHY they've chosen the current data model, and to a degree 
it makes sense for the core SER system, but for meshing SER with other 
systems, it's god-awful ugly. :)
My recommendation, Mahatma, is to AVOID using the new data model for 
anything other than the most basic of SER functionality, or, if you 
gather users in the 50-100,000 user range, your user_attrs table is just 
going to be one ugly, unmanageably large pile of annoyance. Since the 
user_attrs and domain_attrs are designed in part, from what I can tell, 
to make selects more rational and to allow basic SIP flexibility, if you 
don't NEED to keep the data there, don't do it.  They could keep 
modifying the data model to suit everyone's tastes, but right now, it's 
designed more to ensure that SER works and works well. Want to store 
user information like a cell phone number and a fax number and a 
timezone and a flag on whether or not that user has DND enabled or has 
access to the PSTN, etc, etc? Keep it elsewhere in a more usable table 
for such purposes.
As we've begun our testing with SER 2.0, we've had to modify things more 
drastically than we modified them in SER 0.9.6, but in ANY environment, 
you're going to have to modify things to work for you. At least the core 
system is flexible enough to let you tailor things. :)
N.
Greger V. Teigre wrote:
...
Dear Mahatma,
If you want to engage the developers, you will have to subscribe to 
serdev, as many of the developers don't follow serusers.
I have not participated in SER's 2.0 data model, but my initial take 
on your suggestion is the following:
With all respect, I believe you may have misunderstood what Tom is 
addressing in his discussion.  He targets a generic database model 
built around attribute value pairs. This is not the case for SER, on 
the contrary, SER's new data model is much sounder from a db 
perspective (than 0.8 and 0..9) and is built around the uid and did as 
unique identifiers. Queries will through joins across the tables 
construct the needed data in a very efficient manner, as uid and did 
are indexed and where the queries will use uid and did in the where 
clause. Without having checked, I assume the tables have been 
normalized just as they should (i.e. splitting them up).
The attribute-value pairs you are referring to are not part of the 
core data model (which Tom covers), but rather attributes that may be 
loaded and made available in ser.cfg through a query created to 
retrieve the attributes-value pairs. Without creating a limited set of 
attributes that can be supported in ser.cfg, the generic avpairs 
cannot be avoided. However, the queries that retrieve avpairs do not 
use the semantic of the avpairs to select which avpairs to load, ALL 
avpairs belonging to a specific uid and did are loaded at the same time.
Also, I'm afraid this statement is wrong: "It's a better idea from a 
database architecture and performance perspective to keep adding 
columns into that table for data that has a 1 to 1 relationship with a 
user." 
This is exactly how you should not do it if you have complex data 
relationships that need to be represented and retrieved without 
duplicating data.
Does this answer alleviate your fears? 
If not, please subscribe to serdev and post your comment there to 
engage people closer to the design of the database.
g-)
X Z wrote:
...
Hi All,
This is specifically for the SER/OpenSER developers, but I'm not a 
serdev list member so I'm posting here.
I've been using SER since version 0.8.X and I'm still running 0.8.14 
production for my company PBX to this day.
I was very excited as version 2 became a release candidate and I 
downloaded it for testing. I was pretty disappointed with one aspect 
of the new data model and I'm requesting that the developers consider 
a further revision on the data model.
Basically, taking all fields out of the subscriber table like 
Last_name, first_name, email, timezone, rpid/asserted identity, etc, 
etc is not the greatest idea. It's a better idea from a database 
architecture and performance perspective to keep adding columns into 
that table for data that has a 1 to 1 relationship with a user, and 
that is common in > 90% of SER's use cases ( i.e. corporate, 
carrier/VSP.) I would suggest adding voicemail_password, and maybe 
every other field that is being added into the default attributes 
script that I saw in CVS recently. If you already know what 
attributes a user has (and they have a 1 to 1 relationship), then its 
far better from a db performance perspective to keep these attributes 
in the user table. I know that the code becomes more complicated, but 
I think it may be a tradeoff worth discussing.
See this discussion ( 
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084... 
) between Oracle users and Tom, (an Oracle engineer/architect.)  The 
full text of this discussion is very informative and I highly 
recommend people read it through.
Tom's conclusion is that the type of data model being discussed, and 
now being used in SER fails for all but the most trivial of 
applications. Maybe SER *by itself* qualifies as "trivial" from a 
database architect's perspective, but think about things like 
Asterisk integration, which is quite common. You quickly run into 
some very nasty queries . . .
Please note that I am not a software developer nor a database 
engineer, just a user who reads a lot, so I'm open to being the 
ignorant one here, but I thought that this should be discussed among 
users and developers.
Thanks for considering,
Mahatma
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084... 
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056
The following is an excerpt from the above link:
Here is a excerpt from my forthcoming book where I talk about this (and show you how 
ugly, hard and inefficient queries against your very flexible model will be)
(2)Do not use Generic Data Models
Frequently I see applications built on a generic data model for "maximum flexibility" or 
applications built in ways that prohibit performance. Many times - these are one in the
same thing! For example, it is well known you can represent any object in a database 
using just four tables:
Create table objects ( oid int primary key, name varchar2(255) );
Create table attributes
( attrId int primary key, attrName varchar2(255), 
datatype varchar2(25) );
Create table object_Attributes 
( oid int, attrId int, value varchar2(4000), 
primary key(oid,attrId) );
Create table Links ( oid1 int, oid2 int,
primary key (oid1, oid2) );
That's it - no more CREATE TABLE for me! I can fill the attributes table up with rows 
like this:
insert into attributes values ( 1, 'DATE_OF_BIRTH', 'DATE' );
insert into attributes values ( 2, 'FIRST_NAME',    'STRING' );
insert into attributes values ( 3, 'LAST_NAME',    'STRING' );
commit;
And now I'm ready to create a PERSON record:
insert into objects values ( 1, 'PERSON' );
insert into object_Attributes values( 1, 1, '15-mar-1965' );
insert into object_Attributes values( 1, 2, 'Thomas' );
insert into object_Attributes values( 1, 3, 'Kyte' );
commit;
insert into objects values ( 2, 'PERSON' );
insert into object_Attributes values( 2, 1, '21-oct-1968' );
insert into object_Attributes values( 2, 2, 'John' );
insert into object_Attributes values( 2, 3, 'Smith' );
commit;
And since I'm good at SQL, I can even query this record up to get the FIRST_NAME and 
LAST_NAME of all PERSON records:
ops$tkyte@ORA920> select 
     max( decode(attrName, 'FIRST_NAME', value, null )) first_name,
2  max( decode( attrName, 'LAST_NAME',  value, null ) ) last_name
  3    from objects, object_attributes, attributes
  4   where attributes.attrName in ( 'FIRST_NAME', 'LAST_NAME' )
5     and object_attributes.attrId = attributes.attrId
  6     and object_attributes.oid = objects.oid
  7     and objects.name http://objects.name = 'PERSON'
  8   group by objects.oid
9  /
FIRST_NAME           LAST_NAME

Thomas               Kyte
John                 Smith
Looks great, right? I mean, the developers don't have to create tables anymore, we can
add columns at the drop of a hat (just requires an insert into the ATTRIBUTES table). The 
developers can do whatever they want and the DBA can't stop them. This is ultimate 
"flexibility". I've seen people try to build entire systems on this model.
But, how does it perform? Miserably, terribly, horribly. A simple "select first_name, 
last_name from person" query is transformed into a 3-table join with aggregates and all. 
Further, if the attributes are "NULLABLE" - that is, there might not be a row in
OBJECT_ATTRIBUTES for some attributes, you may have to outer join instead of just joining 
which in some cases can remove more optimal query plans from consideration.
Writing queries might look pretty straightforward, but it's impossible to do in a
performant fashion. For example, if we wanted to get everyone that was born in MARCH or 
has a LAST_NAME = 'SMITH', we could simply take the query from above and just wrap an 
inline view around that:
ops$tkyte@ORA920> select *
  2    from (
  3  select 
     max(decode(attrName, 'FIRST_NAME', value, null)) first_name,
  4  max(decode(attrName, 'LAST_NAME',  value, null)) last_name,
5  max(decode(attrName, 'DATE_OF_BIRTH',  value, null)) 
                                                      date_of_birth
  6    from objects, object_attributes, attributes
  7   where attributes.attrName
 in ( 'FIRST_NAME', 
                                     'LAST_NAME', 'DATE_OF_BIRTH' )
  8     and object_attributes.attrId = attributes.attrId
  9     and object_attributes.oid = objects.oid
10     and objects.name http://objects.name = 'PERSON'
 11   group by objects.oid
 12         )
 13   where last_name = 'Smith'
 14      or date_of_birth like '%-mar-%'
15  /
FIRST_NAME           LAST_NAME            DATE_OF_BIRTH

Thomas               Kyte                 15-mar-1965
John                 Smith                21-oct-1968
So, it looks "easy" to query, but think about the performance! If you had a couple 
thousand OBJECT records, and a couple tens of thousands of OBJECT_ATTRIBUTES - Oracle 
would have to process the entire inner group by query first and then apply the WHERE
clause.
This is not a made up data model, one that I crafted just to make a point. This is an 
actual data model that I've seen people try to use. Their goal is ultimate flexibility. 
They don't know what OBJECTS they need, they don't know what ATTRIBUTES they will have.
Well - that is what the database was written for in the first place: Oracle implemented 
this thing called SQL to define OBJECTS and ATTRIBUTES and lets you use SQL to query 
them. You are trying to put a generic layer on top of a generic layer - and it fails each
and every time except for the most trivial of applications.

Serusers mailing list
Serusers@lists.iptel.org
http://lists.iptel.org/mailman/listinfo/serusers

Serusers mailing list
Serusers@lists.iptel.org
http://lists.iptel.org/mailman/listinfo/serusers

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Re: [Serusers] SER data model discussion - developers please read