On 8/30/12 12:50 PM, Øyvind Kolbu wrote:
On 2012-08-30 at 11:32, Daniel-Constantin Mierla wrote:
On 8/30/12 10:29 AM, Øyvind Kolbu wrote:
Yes, I actually want both, HA + replication, by using db_cluster as the replicator. And as I've demonstrated, the location table on non-primary servers will lack entries after downtime, which is bad.
Replication is done with parallel writing as long as all nodes are available. What you are really looking for is re-synchronization after downtime, which is another story and not easy to code, because it is db specific.
I don't want Kamailio to synchronise my data, but I think it is reasonable to expect it to treat the write targets individually, independent of the result of the initial read.
The write is independent of the read operation. Reads cannot be performed in parallel, as that would return duplicated records to the application.
Then, keep in mind that there are several operations, one after the other:
- update - performed against all write db servers
- test affected rows (which always works on the last write (update) connection)
- insert, if affected rows is 0

So it is not an atomic operation like update_and_if_affected_rows_is_0_then_insert. All of this is done in the usrloc layer, in sequential steps, which works fine for one server, but not for multiple nodes.
I am not sure what it would take to implement this kind of operation inside the database drivers, but then it might work. Based on quick thoughts, the code is there; it just has to be structured for each db connector, exported via the db api and propagated to db_cluster.
Perhaps enabling db_update_as_insert is the only current option.
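If I read the usrloc module docs correctly, that would just be something like this (a minimal sketch):

  modparam("usrloc", "db_update_as_insert", 1)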
This can end up in a lot of records, as there is no update - and if you set constraints at the database layer, then you get failures on unique keys.
If this indeed is impossible, I'll have to continue with our current scheme of SIP level replication.
If you use db mode 3, then you can do database level replication; it is the same.
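A minimal sketch of the usrloc side, assuming a db_cluster definition named cls1 (the name is just an example):

  # DB-Only mode: contacts are kept only in the database, no in-memory cache,
  # so database level replication keeps both nodes consistent
  modparam("usrloc", "db_mode", 3)
  modparam("usrloc", "db_url", "cluster://cls1")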
  | DB1 |      | DB2 |
     |            |
     |            |
  | LOC1 |     | LOC2 |
      \          /
       ----------
       | Phones |
       ----------
The above setup is my scenario. When everything is up, LOC1 uses DB1 for reading and writes to both DB1 and DB2. Similarly, LOC2 uses DB2 for reading and writes to both DB1 and DB2. Each uses the "other" DB as failover for reading. LOC1 and LOC2 are set up with even load in an SRV record.
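For clarity, LOC1 is configured roughly like this (hostnames, credentials and the cluster name are placeholders); LOC2 mirrors it with the read priorities swapped:

  # read from DB1 first (DB2 as failover), write to both in parallel
  modparam("db_cluster", "connection", "db1=>mysql://kamailio:secret@db1.example.org/kamailio")
  modparam("db_cluster", "connection", "db2=>mysql://kamailio:secret@db2.example.org/kamailio")
  # per connection: first digit+letter is the read policy, second the write policy
  modparam("db_cluster", "cluster", "cls1=>db1=9s9p;db2=8s9p")
  modparam("usrloc", "db_url", "cluster://cls1")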
- While DB2 is down, say rebooting for a new kernel, phone A boots and registers at LOC1, and the entry is written to DB1. Reading works fine from LOC2 against DB1.
- DB2 is back again.
- Phone A re-registers at LOC1. The previous entry is found in the location table and an UPDATE is issued for both DB1 and DB2, but DB2 will still lack the entry.
DB2 will _never_ get an entry for phone A until the phone reboots and gets a new Call-ID, or for some reason chooses to register with LOC2 instead. In that case a duplicate entry will end up in DB1, as LOC2 will blindly issue an INSERT to both DB1 and DB2.
As the location servers are evenly used, ~every second call to phone A will fail with 404.
You have to do cross replication at the database layer and use db_cluster for read/write failover access (e.g., try read/write on db1 and, if that fails, try the other one).
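As a rough sketch, with the databases cross-replicating each other (e.g., MySQL master-master), the cluster definition only has to handle failover - serial read and serial write, preferring db1:

  # use db1 while it is up, fall back to db2 on failure
  modparam("db_cluster", "cluster", "cls1=>db1=9s9s;db2=8s8s")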
Cheers, Daniel