[Serusers] SER data model discussion - developers please read

Tue Dec 4 03:37:36 CET 2007

>I disagree entirely that it's strictly based on migration.
>
>Let's look at the simple act of adding information to the system:
>
>With a flat subscriber table with all the fields I desire, adding 
>information to the system is 1 insert.
>Let's say that my subscriber table looks like this (because in the dev 
>ser 0.9.6 box, it does):
>
>+-------------------+--------------+------+-----+---------------------+-------+
>| Field             | Type         | Null | Key | Default             | 
>Extra |
>+-------------------+--------------+------+-----+---------------------+-------+
>| phplib_id         | varchar(32)  | NO   | UNI |                     
>|       |
>| username          | varchar(64)  | NO   | PRI |                     
>|       |
>| domain            | varchar(128) | NO   | PRI |                     
>|       |
>| password          | varchar(25)  | NO   |     |                     
>|       |
>| first_name        | varchar(25)  | NO   |     |                     
>|       |
>| last_name         | varchar(45)  | NO   |     |                     
>|       |
>| phone             | varchar(15)  | NO   |     |                     
>|       |
>| email_address     | varchar(50)  | NO   |     |                     
>|       |
>| datetime_created  | datetime     | NO   |     | 0000-00-00 00:00:00 
>|       |
>| datetime_modified | timestamp    | NO   |     | CURRENT_TIMESTAMP   
>|       |
>| confirmation      | varchar(64)  | NO   |     |                     
>|       |
>| flag              | char(1)      | NO   |     | o                   
>|       |
>| sendnotification  | varchar(50)  | NO   |     |                     
>|       |
>| greeting          | varchar(50)  | NO   |     |                     
>|       |
>| ha1               | varchar(128) | NO   |     |                     
>|       |
>| ha1b              | varchar(128) | NO   |     |                     
>|       |
>| allow_find        | char(1)      | NO   |     | 0                   
>|       |
>| timezone          | varchar(128) | YES  |     | NULL                
>|       |
>| rpid              | varchar(128) | YES  |     | NULL                
>|       |
>| domn              | int(10)      | YES  |     | NULL                
>|       |
>| uuid              | varchar(64)  | YES  |     | NULL                
>|       |
>| signup_ip         | varchar(15)  | NO   |     |                     
>|       |
>+-------------------+--------------+------+-----+---------------------+-------+
>
>
>Now... ignoring the fact that we have 1 possibly 2 unused fields in that 
>table, let's assume we have to add a user in the new schema.

let's be first just a bit more accurate:
- uuid, username,password,domain name, ha's, and their equivalents are
  like before in the table, which is now called credentials (about 7)
- so we are speaking about the application stuff, which may be very
  different for different uses: callid, signup_up, timezone, email
  address, confirmation string, etc. etc. (about 15, how many depending
  on the actual deployment -- I'm wondering who is for example using
  "sendnotification")

>That's 22 inserts right there (one for each attribute, especially since 
>we can't have defaults).

I would guess about ten as opposed to 22.
Nevertheless, as long as they are inserted by machines as opposed to 
humans, I'm less worried about that.

>Searching through data? Let's say I want to know the 
>first_name,last_name,username,domain,confirmation, and the 
>datetime_modified of a user.  Simple enough.  That's 1 query. If any of 
>those data points were null or their default, grabbing that is still 
>easy as can be.
>
>Now... in the new schema, that's a join (which IS less efficient than a 
>query, 

It is not necessarily join -- it may be a double query, with the first
query being trivial (delivering UID) and the second actually too
(querying attributed by UID). Many applications have UID stored about
lifetime of their session (a subscriber logs in, his UID is stored)
and effectively carry out only the latter query.

>I'm afraid, though I can't quote exact meaningful metrics on how 
>much). 

Well, without that, this discussion is a bit missing grounds.
I mean we can't really speak about performance concerns unless
we have the numbers for it.

>If any of those data points are not SET (assuming we were going 
>to minimise our impact by NOT doing a full 22-insert creation for each 
>user), it causes yet another series of headaches. When you want to 
>reference data in other relations, it becomes even MORE of a hassle. The 
>queries themselves become these unwieldy beasts that increase the chance 
>of an error in syntax or logic.
>
>One of the simplest queries we have is with our admin interface to query 
>who's logged in in such a way that the data will be used in the web 
>interface:
>
>select username, first_name, last_name, signup_ip from subscriber where 
>username=ANY (select username from location);
>
>Simple. Succinct. Clear.We grab the fields we need and can format them. 
>One short line.

I would add in your languagne "non-relational" too. Squizing everything
you can in a single table does not have much of relation in it, does it?

>Now I'll leave the new schema version as an exercise for the reader, as 
>I don't recall the syntax right off (which is the first clue as to its 
>added complexity). It's lengthy and utterly non-relational -- which is 
>why we simply don't USE it for these sorts of things.

I understand that formulating a query may be more complex here, and 
that's for sure a good point.

>This is not just a migration headache. This is a schema problem.  Now, I 
>know you love your schema, and you're allowed, and I fully understand 
>that it's flexible and good for the SER service. But for integration 
>with any sort of system beyond the most basic, it's a right pain in the 
>ass. You can tout its joys and wonders all you'd like, but we're 
>developing with it, and I can assure you, it's not as easy and as 
>straightforward as you like to believe.

Maybe it depends on the type of apps you develop. We have had very
dynamic apps too (say a la phpmyadmin), and the pain of changing scheme
was unbearable.

>I'm not saying there's anything wrong with the schema overall. There 
>have been some great forward leaps in this one, such as the riddance of 
>the incredibly tiresome domain column in every table (that was sometimes 
>checked and sometimes not). But it DOES make for additional complexity 
>when building a system around it.

Point taken, that's for sure agreeable.
Again, if you think that's a way too big obstacle, doing a patch which allows
to read AVPs from credentials table should be easy and should not cause
conflicts with rest of codebase. It could be even done in a way that works
over both types of structures. (Despite a certain mess risk.)

-jiri

>N.
>_______________________________________________
>Serusers mailing list
>Serusers at lists.iptel.org
>http://lists.iptel.org/mailman/listinfo/serusers

--
Jiri Kuthan            http://iptel.org/~jiri/