Hello,
following some recent discussions in github issues, on the mailing list and on irc, I am opening a topic to see if we can come up with something that accommodates the needs out there in the best way for everyone.
These come up mostly in the context of growing deployment sizes, but also with the option to use a no-sql backend, where some internal bits of usrloc may need tuning.
The points to be considered:
1) discovering which server wrote the record
2) efficiently getting the records that require nat keepalive
The two are sometimes related, but not necessarily always.
Victor Seva was saying on the irc channel that he is planning to propose a patch for 1), relying on the socket value in the location table. The use case presented was many servers saving and looking up records in the same table, with each instance fetching only the records it wrote itself when sending keepalives. It makes sense not to send keepalives from all instances, for load as well as bandwidth considerations. This could work relying on the local socket, but I am not sure how efficient it will be when listening on many sockets (different ips, but also different ports or protocols - ipv4/6, tcp, tls, udp -- all add sockets).
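To make the socket-based variant concrete, here is a minimal sketch in Python with an in-memory SQLite table. The table layout, the helper name contacts_for_sockets and the socket strings are simplified assumptions for illustration, not the real usrloc schema or the query from the planned patch. It only shows that the match needs one string comparison per local listen socket.

```python
import sqlite3

# Hypothetical, simplified location table; the real usrloc schema has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE location (id INTEGER PRIMARY KEY, contact TEXT, socket TEXT)")
conn.executemany(
    "INSERT INTO location (contact, socket) VALUES (?, ?)",
    [
        ("sip:alice@10.0.0.10:5060", "udp:192.0.2.1:5060"),
        ("sip:bob@10.0.0.11:5060", "tls:192.0.2.1:5061"),
        ("sip:carol@10.0.0.12:5060", "udp:192.0.2.2:5060"),  # written by another server
    ],
)

def contacts_for_sockets(conn, local_sockets):
    """Return contacts whose socket column matches one of the local sockets."""
    placeholders = ", ".join("?" for _ in local_sockets)
    query = f"SELECT contact FROM location WHERE socket IN ({placeholders}) ORDER BY id"
    return [row[0] for row in conn.execute(query, local_sockets)]

# Every extra ip, port or protocol this instance listens on adds one more string to match.
local_sockets = ["udp:192.0.2.1:5060", "tcp:192.0.2.1:5060", "tls:192.0.2.1:5061"]
mine = contacts_for_sockets(conn, local_sockets)
```

Here mine holds the contacts of alice and bob only; the list of strings in the match grows with every socket the instance listens on.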
On 2), it has been on my list to review for a while, as the module uses custom sql queries (or even functions) to match the records for sending keepalives -- it does the matching with bitwise operations to see if the flags for nat are set. That doesn't work with no-sql databases (including db_text, db_mongodb and, I expect, db_cassandra). Even for sql, that kind of query is not efficient, especially when dealing with a large number of records.
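As an illustration of the bitwise match, here is a small sketch in Python with an in-memory SQLite table. The table layout, the cflags column name and the flag value are assumptions chosen for the example; the module's actual query text differs and depends on the database driver.

```python
import sqlite3

NAT_FLAG = 1 << 6  # hypothetical bit marking a natted contact

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE location (id INTEGER PRIMARY KEY, contact TEXT, cflags INTEGER)")
conn.executemany(
    "INSERT INTO location (contact, cflags) VALUES (?, ?)",
    [
        ("sip:alice@10.0.0.10:5060", 0),
        ("sip:bob@10.0.0.11:5060", NAT_FLAG),
        ("sip:carol@10.0.0.12:5060", NAT_FLAG | 1),
    ],
)

# The flag test is an expression computed on the column value, not a plain
# equality match, so the backend must support a bitwise operator in the filter.
natted = [
    row[0]
    for row in conn.execute(
        "SELECT contact FROM location WHERE (cflags & ?) = ? ORDER BY id",
        (NAT_FLAG, NAT_FLAG),
    )
]
```

The result contains bob and carol. A backend that only offers equality or range matches on stored values has no direct way to express this filter.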
The solutions that came to my mind for now:
- for 1) -- add the server_id as a new column in the location table. It was pointed out to me that ser location had it. This will turn a query matching on many socket-representation strings into one expression done on an integer. The value for server_id can be set via the core parameter with the same name.
- for 2) -- add a new column 'keepalive' to be set internally by the module to 1 if any of the flags for sending keepalive was set. Then the query to fetch the records will just use a match on it, rather than bitwise operations in the match expression. Furthermore, this can be set to a different value when enabling the keepalive partitioning (i.e., sending keepalives from different timer processes, each handling a part of the natted users) - right now, for the db only mode, the selection is done by column id % (number of keepalive processes).
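A rough sketch of how the two columns could be used together, again in Python with an in-memory SQLite table. Everything beyond the column names server_id and keepalive is an assumption for illustration: the helper names, the flag value, the index and the simplified table are hypothetical, not module code.

```python
import sqlite3

SERVER_ID = 1        # would come from the core parameter server_id
NAT_FLAGS = 1 << 6   # hypothetical bit(s) that request keepalives

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE location ("
    "id INTEGER PRIMARY KEY, contact TEXT, cflags INTEGER, "
    "server_id INTEGER, keepalive INTEGER)"
)
# Hypothetical index; both columns are plain integers, so equality matches can use it.
conn.execute("CREATE INDEX ka_idx ON location (server_id, keepalive)")

def save_contact(conn, contact, cflags, server_id=SERVER_ID):
    """Store a contact; keepalive is derived once, at write time, from the flags."""
    keepalive = 1 if cflags & NAT_FLAGS else 0
    conn.execute(
        "INSERT INTO location (contact, cflags, server_id, keepalive) VALUES (?, ?, ?, ?)",
        (contact, cflags, server_id, keepalive),
    )

def keepalive_targets(conn, server_id=SERVER_ID):
    """Fetch the contacts to ping using only equality matches on integers."""
    rows = conn.execute(
        "SELECT contact FROM location WHERE server_id = ? AND keepalive = 1 ORDER BY id",
        (server_id,),
    )
    return [row[0] for row in rows]

save_contact(conn, "sip:alice@10.0.0.10:5060", 0)
save_contact(conn, "sip:bob@10.0.0.11:5060", NAT_FLAGS)
save_contact(conn, "sip:carol@10.0.0.12:5060", NAT_FLAGS, server_id=2)
targets = keepalive_targets(conn)
```

Here targets contains only bob: alice has no nat flag set and carol was written by server 2. The bitwise test moves to write time, so the fetch is reduced to equality matches that a no-sql backend should also be able to express. With partitioning, the same column could hold a per-process value instead of 1, so that each timer process matches on its own value; how that value would be assigned is left open in this sketch.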
Opinions, improvements, other proposals?
Cheers, Daniel