[Serusers] SER data model discussion - developers please read

Tue Dec 4 00:33:33 CET 2007

At 19:39 03/12/2007, SIP wrote:
>On the flip side of having to do more modifications for integration, of 
>course, is that we get the ABILITY to do more modifications.
>
>SER 0.9.6 wasn't limited, per se, but certain aspects of what we've 
>implemented (user-configurable timers, number-based AND uri-based call 
>blocking, call forwarding, call return, whitelisting number mode, etc) 
>in 0.9.6 has felt like a complete and total hack because of the way it 
>has had to be done using a mish mash of AVPs.  We're HOPING we can 
>forego some of that with the new SER using the new db queries and such, 
>as well as being able to better-implement some functionality of other 
>RFCs using the more flexible core.
>
>But yes... integration with the new db schema is proving to be a pain in 
>the left ventricle if only because it's so totally focused around 
>optimisation for SER.

The question to me is whether this pain is really specific to current release
of the data model (which I apparently don't think so) or to do the burden
of migration from the previous. IMO, it is the latter, to be compensated
by extensibility which minimizes future migration burden.

-jiri

>As with all new stuff, there's good and there's bad.
>
>N.
>
>
>X Z wrote:
>> Hello serdev list,
>>
>> Please see the thread below started on serusers list.
>>
>> I do understand the power and design of the new data model and that it 
>> does allow the SER Core functionality to be very efficient.
>>
>> In addition, I definitely agree that of all the scenarios below, data 
>> duplication is the best of the undesirable options.
>>
>> The undesirable options being:
>> 1. The old data model that didn't support avpairs sufficiently,
>> 2. The new data model that introduces very inefficient joins and ugly 
>> SQL queries in order to create a view to integrate with Asterisk or 
>> any other third party application
>> 3. The new data model that works very well for SER core functionality 
>> PLUS a "subscriber" table that includes user data which may duplicated 
>> in the avpairs tables.
>>
>> I wasn't suggesting that the avpairs model was bad, just that the 
>> result of the data model as it stands makes it inefficient to 
>> integrate with other apps data models (like Asterisk for example.)
>>
>> I might also suggest that the subscriber table comes back into the 
>> core code in an expanded form that covers a high percentage of 
>> integration use cases, and that triggers get added in to populate the 
>> data which would be duplicated. Users can simply opt to not populate 
>> the subscriber table at all and the triggers can be disabled as a db 
>> installation option.
>>
>> The great advantage to this is that new users don't have to reinvent 
>> the wheel when they want to integrate SER with third party apps to 
>> create an "overall system" as sip at arcdiv says below.
>>
>> I'd be happy to propose a table structure and triggers if this idea 
>> gains traction with the development team. We could begin a discussion 
>> in the serusers list as to what people would like to see in this table.
>>
>> I guess I'm suggesting that SER core is great, but coming from the 
>> user base, I'd like to see a few more considerations going into how 
>> SER will be used. I'm guessing that a vast majority of users will not 
>> be using it completely stand alone. Everyone is trying to add 
>> value-added services like voicemail, presence, a web interface for 
>> users and admin management, etc., etc. Can we come up with a reference 
>> model for this and included it in the repository?
>>
>> I think it's very interesting that sip at ardev is commenting that he has 
>> to do more modifications to use the new version than he had to do 
>> before. I would hope that the development roadmap has improvements in 
>> core functionality AND functionality that helps users actually use SER 
>> (like simpler integrations and fewer custom mods). Making software 
>> simpler to implement always helps encourage adoption and helps 
>> increase user base and market share, which supports the success of the 
>> project long term.
>>
>> Thanks,
>> Mahatma
>>
>>
>> On Dec 3, 2007 4:17 AM, SIP <sip at arcdiv.com <mailto:sip at arcdiv.com>> 
>> wrote:
>>
>>     The current data model, while designed to make certain things easier
>>     DOES indeed encourage data duplication, as, in order to create a
>>     unified
>>     system, most of us will opt to duplicate data into more usable tables.
>>
>>     When meshing with an overall system, gathering all relevant data for a
>>     particular substructure or user by using joins is neither speedy
>>     (especially when the tables get HUGE -- and they very much will) nor
>>     terribly convenient.
>>
>>     Of course, data duplication is hardly a cardinal sin, but there's
>>     ALWAYS
>>     a trade off between abstraction and actual functionality. I understand
>>     the reasons WHY they've chosen the current data model, and to a
>>     degree
>>     it makes sense for the core SER system, but for meshing SER with other
>>     systems, it's god-awful ugly. :)
>>
>>     My recommendation, Mahatma, is to AVOID using the new data model for
>>     anything other than the most basic of SER functionality, or, if you
>>     gather users in the 50-100,000 user range, your user_attrs table
>>     is just
>>     going to be one ugly, unmanageably large pile of annoyance. Since the
>>     user_attrs and domain_attrs are designed in part, from what I can
>>     tell,
>>     to make selects more rational and to allow basic SIP flexibility,
>>     if you
>>     don't NEED to keep the data there, don't do it.  They could keep
>>     modifying the data model to suit everyone's tastes, but right now,
>>     it's
>>     designed more to ensure that SER works and works well. Want to store
>>     user information like a cell phone number and a fax number and a
>>     timezone and a flag on whether or not that user has DND enabled or has
>>     access to the PSTN, etc, etc? Keep it elsewhere in a more usable
>>     table
>>     for such purposes.
>>
>>     As we've begun our testing with SER 2.0, we've had to modify
>>     things more
>>     drastically than we modified them in SER 0.9.6, but in ANY
>>     environment,
>>     you're going to have to modify things to work for you. At least
>>     the core
>>     system is flexible enough to let you tailor things. :)
>>
>>     N.
>>
>>
>>     Greger V. Teigre wrote:
>>     > Dear Mahatma,
>>     > If you want to engage the developers, you will have to subscribe to
>>     > serdev, as many of the developers don't follow serusers.
>>     >
>>     > I have not participated in SER's 2.0 data model, but my initial take
>>     > on your suggestion is the following:
>>     > With all respect, I believe you may have misunderstood what Tom is
>>     > addressing in his discussion.  He targets a generic database model
>>     > built around attribute value pairs. This is not the case for SER, on
>>     > the contrary, SER's new data model is much sounder from a db
>>     > perspective (than 0.8 and 0..9) and is built around the uid and
>>     did as
>>     > unique identifiers. Queries will through joins across the tables
>>     > construct the needed data in a very efficient manner, as uid and
>>     did
>>     > are indexed and where the queries will use uid and did in the where
>>     > clause. Without having checked, I assume the tables have been
>>     > normalized just as they should (i.e. splitting them up).
>>     >
>>     > The attribute-value pairs you are referring to are not part of the
>>     > core data model (which Tom covers), but rather attributes that
>>     may be
>>     > loaded and made available in ser.cfg through a query created to
>>     > retrieve the attributes-value pairs. Without creating a limited
>>     set of
>>     > attributes that can be supported in ser.cfg, the generic avpairs
>>     > cannot be avoided. However, the queries that retrieve avpairs do
>>     not
>>     > use the semantic of the avpairs to select which avpairs to load, ALL
>>     > avpairs belonging to a specific uid and did are loaded at the
>>     same time.
>>     >
>>     > Also, I'm afraid this statement is wrong: "It's a better idea
>>     from a
>>     > database architecture and performance perspective to keep adding
>>     > columns into that table for data that has a 1 to 1 relationship
>>     with a
>>     > user."
>>     > This is exactly how you should not do it if you have complex data
>>     > relationships that need to be represented and retrieved without
>>     > duplicating data.
>>     >
>>     > Does this answer alleviate your fears?
>>     > If not, please subscribe to serdev and post your comment there to
>>     > engage people closer to the design of the database.
>>     > g-)
>>     >
>>     >
>>     >
>>     >
>>     > X Z wrote:
>>     >> Hi All,
>>     >> This is specifically for the SER/OpenSER developers, but I'm not a
>>     >> serdev list member so I'm posting here.
>>     >>
>>     >> I've been using SER since version 0.8.X and I'm still running
>>     0.8.14
>>     >> production for my company PBX to this day.
>>     >>
>>     >> I was very excited as version 2 became a release candidate and I
>>     >> downloaded it for testing. I was pretty disappointed with one
>>     aspect
>>     >> of the new data model and I'm requesting that the developers
>>     consider
>>     >> a further revision on the data model.
>>     >>
>>     >> Basically, taking all fields out of the subscriber table like
>>     >> Last_name, first_name, email, timezone, rpid/asserted identity,
>>     etc,
>>     >> etc is not the greatest idea. It's a better idea from a database
>>     >> architecture and performance perspective to keep adding columns
>>     into
>>     >> that table for data that has a 1 to 1 relationship with a user,
>>     and
>>     >> that is common in > 90% of SER's use cases ( i.e. corporate,
>>     >> carrier/VSP.) I would suggest adding voicemail_password, and maybe
>>     >> every other field that is being added into the default attributes
>>     >> script that I saw in CVS recently. If you already know what
>>     >> attributes a user has (and they have a 1 to 1 relationship),
>>     then its
>>     >> far better from a db performance perspective to keep these
>>     attributes
>>     >> in the user table. I know that the code becomes more
>>     complicated, but
>>     >> I think it may be a tradeoff worth discussing.
>>     >>
>>     >> See this discussion (
>>     >>
>>     http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056
>>     >> ) between Oracle users and Tom, (an Oracle engineer/architect.)
>>      The
>>     >> full text of this discussion is very informative and I highly
>>     >> recommend people read it through.
>>     >>
>>     >> Tom's conclusion is that the type of data model being
>>     discussed, and
>>     >> now being used in SER fails for all but the most trivial of
>>     >> applications. Maybe SER *by itself* qualifies as "trivial" from a
>>     >> database architect's perspective, but think about things like
>>     >> Asterisk integration, which is quite common. You quickly run into
>>     >> some very nasty queries . . .
>>     >>
>>     >> Please note that I am not a software developer nor a database
>>     >> engineer, just a user who reads a lot, so I'm open to being the
>>     >> ignorant one here, but I thought that this should be discussed
>>     among
>>     >> users and developers.
>>     >>
>>     >> Thanks for considering,
>>     >> Mahatma
>>     >>
>>     >>
>>     http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056
>>     <http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056>
>>     >>
>>     <http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056>
>>     >>
>>     >>
>>     >> The following is an excerpt from the above link:
>>     >>
>>     >> Here is a excerpt from my forthcoming book where I talk about
>>     this (and show you how
>>     >> ugly, hard and inefficient queries against your very flexible
>>     model will be)
>>     >>
>>     >>
>>     >>
>>     >>
>>     >> (2)Do not use Generic Data Models
>>     >>
>>     >> Frequently I see applications built on a generic data model for
>>     "maximum flexibility" or
>>     >> applications built in ways that prohibit performance. Many
>>     times - these are one in the
>>     >>
>>     >> same thing! For example, it is well known you can represent any
>>     object in a database
>>     >> using just four tables:
>>     >>
>>     >> Create table objects ( oid int primary key, name varchar2(255) );
>>     >>
>>     >> Create table attributes
>>     >>
>>     >> ( attrId int primary key, attrName varchar2(255),
>>     >> datatype varchar2(25) );
>>     >>
>>     >> Create table object_Attributes
>>     >> ( oid int, attrId int, value varchar2(4000),
>>     >> primary key(oid,attrId) );
>>     >>
>>     >> Create table Links ( oid1 int, oid2 int,
>>     >>
>>     >> primary key (oid1, oid2) );
>>     >>
>>     >>
>>     >> That's it - no more CREATE TABLE for me! I can fill the
>>     attributes table up with rows
>>     >> like this:
>>     >>
>>     >> insert into attributes values ( 1, 'DATE_OF_BIRTH', 'DATE' );
>>     >>
>>     >> insert into attributes values ( 2, 'FIRST_NAME',    'STRING' );
>>     >> insert into attributes values ( 3, 'LAST_NAME',    'STRING' );
>>     >> commit;
>>     >>
>>     >>
>>     >> And now I'm ready to create a PERSON record:
>>     >>
>>     >>
>>     >> insert into objects values ( 1, 'PERSON' );
>>     >> insert into object_Attributes values( 1, 1, '15-mar-1965' );
>>     >> insert into object_Attributes values( 1, 2, 'Thomas' );
>>     >> insert into object_Attributes values( 1, 3, 'Kyte' );
>>     >>
>>     >> commit;
>>     >>
>>     >> insert into objects values ( 2, 'PERSON' );
>>     >> insert into object_Attributes values( 2, 1, '21-oct-1968' );
>>     >> insert into object_Attributes values( 2, 2, 'John' );
>>     >> insert into object_Attributes values( 2, 3, 'Smith' );
>>     >>
>>     >> commit;
>>     >>
>>     >> And since I'm good at SQL, I can even query this record up to
>>     get the FIRST_NAME and
>>     >> LAST_NAME of all PERSON records:
>>     >>
>>     >> ops$tkyte at ORA920 > select
>>     >>      max( decode(attrName, 'FIRST_NAME', value, null )) first_name,
>>     >>
>>     >>   2  max( decode( attrName, 'LAST_NAME',  value, null ) ) last_name
>>     >>   3    from objects, object_attributes, attributes
>>     >>   4   where attributes.attrName in ( 'FIRST_NAME', 'LAST_NAME' )
>>     >>
>>     >>   5     and object_attributes.attrId = attributes.attrId
>>     >>   6     and object_attributes.oid = objects.oid
>>     >>   7     and objects.name <http://objects.name>
>>     <http://objects.name> = 'PERSON'
>>     >>   8   group by objects.oid
>>     >>
>>     >>   9  /
>>     >>
>>     >> FIRST_NAME           LAST_NAME
>>     >> -------------------- --------------------
>>     >> Thomas               Kyte
>>     >> John                 Smith
>>     >>
>>     >>
>>     >> Looks great, right? I mean, the developers don't have to create
>>     tables anymore, we can
>>     >>
>>     >> add columns at the drop of a hat (just requires an insert into
>>     the ATTRIBUTES table). The
>>     >> developers can do whatever they want and the DBA can't stop
>>     them. This is ultimate
>>     >> "flexibility". I've seen people try to build entire systems on
>>     this model.
>>     >>
>>     >>
>>     >> But, how does it perform? Miserably, terribly, horribly. A
>>     simple "select first_name,
>>     >> last_name from person" query is transformed into a 3-table join
>>     with aggregates and all.
>>     >> Further, if the attributes are "NULLABLE" - that is, there
>>     might not be a row in
>>     >>
>>     >> OBJECT_ATTRIBUTES for some attributes, you may have to outer
>>     join instead of just joining
>>     >> which in some cases can remove more optimal query plans from
>>     consideration.
>>     >>
>>     >> Writing queries might look pretty straightforward, but it's
>>     impossible to do in a
>>     >>
>>     >> performant fashion. For example, if we wanted to get everyone
>>     that was born in MARCH or
>>     >> has a LAST_NAME = 'SMITH', we could simply take the query from
>>     above and just wrap an
>>     >> inline view around that:
>>     >>
>>     >>
>>     >>
>>     >> ops$tkyte at ORA920> select *
>>     >>   2    from (
>>     >>   3  select
>>     >>      max(decode(attrName, 'FIRST_NAME', value, null)) first_name,
>>     >>   4  max(decode(attrName, 'LAST_NAME',  value, null)) last_name,
>>     >>
>>     >>   5  max(decode(attrName, 'DATE_OF_BIRTH',  value, null))
>>     >>                                                      
>>     date_of_birth
>>     >>   6    from objects, object_attributes, attributes
>>     >>   7   where attributes.attrName
>>     >>  in ( 'FIRST_NAME',
>>     >>                                      'LAST_NAME',
>>     'DATE_OF_BIRTH' )
>>     >>   8     and object_attributes.attrId = attributes.attrId
>>     >>   9     and object_attributes.oid = objects.oid
>>     >>
>>     >>  10     and objects.name <http://objects.name>
>>     <http://objects.name> = 'PERSON'
>>     >>  11   group by objects.oid
>>     >>  12         )
>>     >>  13   where last_name = 'Smith'
>>     >>  14      or date_of_birth like '%-mar-%'
>>     >>
>>     >>  15  /
>>     >>
>>     >> FIRST_NAME           LAST_NAME            DATE_OF_BIRTH
>>     >> -------------------- -------------------- --------------------
>>     >> Thomas               Kyte                 15-mar-1965
>>     >> John                 Smith                21-oct-1968
>>     >>
>>     >>
>>     >> So, it looks "easy" to query, but think about the performance!
>>     If you had a couple
>>     >> thousand OBJECT records, and a couple tens of thousands of
>>     OBJECT_ATTRIBUTES - Oracle
>>     >> would have to process the entire inner group by query first and
>>     then apply the WHERE
>>     >>
>>     >> clause.
>>     >>
>>     >> This is not a made up data model, one that I crafted just to
>>     make a point. This is an
>>     >> actual data model that I've seen people try to use. Their goal
>>     is ultimate flexibility.
>>     >> They don't know what OBJECTS they need, they don't know what
>>     ATTRIBUTES they will have.
>>     >>
>>     >> Well - that is what the database was written for in the first
>>     place: Oracle implemented
>>     >> this thing called SQL to define OBJECTS and ATTRIBUTES and lets
>>     you use SQL to query
>>     >> them. You are trying to put a generic layer on top of a generic
>>     layer - and it fails each
>>     >>
>>     >> and every time except for the most trivial of applications.
>>     >>
>>     >>
>>     >>
>>     ------------------------------------------------------------------------
>>     >>
>>     >> _______________________________________________
>>     >> Serusers mailing list
>>     >> Serusers at lists.iptel.org <mailto:Serusers at lists.iptel.org>
>>     >> http://lists.iptel.org/mailman/listinfo/serusers
>>     <http://lists.iptel.org/mailman/listinfo/serusers>
>>     >>
>>     >
>>     ------------------------------------------------------------------------
>>     >
>>     > _______________________________________________
>>     > Serusers mailing list
>>     > Serusers at lists.iptel.org <mailto:Serusers at lists.iptel.org>
>>     > http://lists.iptel.org/mailman/listinfo/serusers
>>     >
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Serusers mailing list
>> Serusers at lists.iptel.org
>> http://lists.iptel.org/mailman/listinfo/serusers
>>   
>
>_______________________________________________
>Serusers mailing list
>Serusers at lists.iptel.org
>http://lists.iptel.org/mailman/listinfo/serusers

--
Jiri Kuthan            http://iptel.org/~jiri/