Hello
I use the htable module a lot, but the one problem when I add a new entry to an htable is deciding when to delete it. My question is: if the hash table is completely filled and I try to add a new value, do I get an error, or is an old value automatically deleted to make room for the new one? If old values were automatically deleted whenever a new value is added, I wouldn't have to bother deleting the values I no longer need.
Thanks Catalina
Hi,
On 26.08.2009 14:26 Uhr, catalina oancea wrote:
They are deleted only if you have auto-expire set for the htable -- see the README for how to define htables.
A hash table is full when no more shared memory (shm) is available; it is better not to get there, since not much else will work at that point.
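For reference, a minimal definition sketch (the table name "a" and the values here are illustrative; the parameter syntax is described in the htable README):

```
# htable "a": 2^8 = 256 slots, entries auto-expire after 300 seconds
modparam("htable", "htable", "a=>size=8;autoexpire=300;")
```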
Cheers, Daniel
Daniel-Constantin Mierla wrote:
There is no way to manually delete a key->value in a bucket?
On 26.08.2009 14:38 Uhr, Alex Balashov wrote:
There is no way to manually delete a key->value in a bucket?
It is possible (it has been there from the first day):
$sht(a=>x) = null;
I have been talking about auto-delete.
There are also options to delete entries by regular-expression matching against the key or value; see: http://kamailio.org/docs/modules/1.5.x/htable.html#id2491912
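For example, a sketch of the deletion options (the table name "a" and the patterns are illustrative; the export names follow the 1.5 htable docs linked above):

```
# remove a single key manually
$sht(a=>x) = null;

# remove all entries whose key name matches a regular expression
sht_rm_name_re("a=>tmp:.*");

# remove all entries whose value matches a regular expression
sht_rm_value_re("a=>^stale$");
```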
Cheers, Daniel
Daniel-Constantin Mierla wrote:
Oh, I see. Sorry for the confusion!
What would be interesting and pleasant is some sort of map() function (in the Perl and/or functional programming style) that allows one to easily load database result sets into a hash table, keyed by a column.
On 26.08.2009 15:29 Uhr, Alex Balashov wrote:
This is available in Kamailio via the sqlops and htable modules. You can iterate through an SQL result and access each column name and value.
I am not sure right now what it takes to export this to Perl or the like. The C code has a function to add an item to the table, as simple as add_item(str key, int_str value).
Cheers, Daniel
Daniel-Constantin Mierla wrote:
Yes, but the point is that it is not a terse and elegant one-liner.
You like elegance, right? :-)
What do you mean by: "A hash table is filled when no more shm is available, it is better not to get there since not much will work at that time"?
From the docs I understood that the size parameter determines the number of entries: "size - number specifying the size of hash table. The number of entries in the table is 2^size".
The size of a hash table only indicates how many buckets it has, not how many entries it can hold. Different entries can hash to the same bucket ID; this results in collisions, which hash table implementations must manage. The various algorithms for dealing with collisions allow the buckets to be oversubscribed, if necessary.
Typical collision-management strategies are either to hang a linked list off a bucket, containing all entries beyond the head that also mapped to that bucket, or some mathematical approach that computes a different bucket in a deterministic relation to the one that collided.
Larger table sizes will - given a decent and appropriate hash algorithm - cause fewer (if any) collisions, since there are more possible slots for keys to map to. This is desirable because a collision-ridden table takes longer to search, undermining its usefulness as a data structure with O(1) search complexity.
For example, if a list of collided keys (and value pointers) is hung off a bucket, that list is searched linearly once the hash computation is done and the first value encountered is not the one sought.
So a very small table size will cause your table to degenerate into a few parallel linear structures - and that is assuming a perfectly uniform distribution from the hash function and variance in the keys. Larger table sizes eliminate, or at least mitigate, this problem.
Comprehensive description ... for more, see Wikipedia:
http://en.wikipedia.org/wiki/Hash_table
Cheers, Daniel
Hey Daniel,
Another feature idea:
Have a way to serialise a hash table and pass it around between elements condensed into a header value. It can then be deserialised and resurrected into a hash table on the other side.
I have a setup I run that involves 3 Kamailio proxies that perform different parts of the logic and have to constantly use 5-6 custom headers to shuttle data around. It is very annoying and messy to manage all this.
What do you think?
-- Alex
Oh, and dynamically instantiating hash tables without having to enumerate them statically in modparam("htable", "htable", ...) would be a huge plus. :-)
That seems like an interesting idea, Alex.
I still have problems understanding the size parameter. For example what exactly does size=4 mean?
So, imagine a two-dimensional array. An array of size 4 would mean a 2x2 array. Or, if you prefer, a one-dimensional array of 4 elements. But "tables" are easier to visualise as a grid.
Every time a value is inserted,
location = hash_function(key, table_size);
hash_table[location] = value;
happens. hash_function() is structured so that it produces a deterministic result: the same key always yields the same hash. It is also structured never to return a hash value greater than or equal to the size of the table, as that would overflow the array.
The size is the size of the set of all possible values hash_function() can produce. How the hash function works exactly depends on the data and there are many strategies. Good hash functions achieve as much of an "optimal" distribution as possible; with size=4, the aim is to have about 25% of the stored values go in hash_table[0], 25% in hash_table[1], 25% in [2], and 25% in [3].
How best to achieve this is a question I leave to the mathematicians; there are many hash algorithms one can use. Of course, whether the result is optimal also depends on the relationship between the hash algorithm and the variation in the data. A very basic one, commonly used in introductory teaching, is to sum the ASCII values of the key string, divide by the size of the table, and use the remainder as the hash value, e.g.
char key[] = "catalina";
int sum = 0;

for(int i = 0; i < strlen(key); i++)
    sum += (int) key[i];

return (sum % HASH_TABLE_SIZE);
Of course, this is a rather naive hash function and more sophisticated ones are available.
Anyway, the point is that if you have a table with a size of 4, the best-case scenario is that the first 4 insertions each map to a unique bucket among those 4.
Anything inserted AFTER that will *necessarily* collide - i.e. there is already another element in that bucket, and it becomes necessary to chain the elements and/or use some sort of linear approach. This increases the computational complexity of the search and removes the speed advantage that is the point of using a hash table in the first place:
struct value_data *hash_lookup(table_t *tbl, char *key)
{
    int idx = hash_function(key, tbl->size);
    struct bucket_node *scan = NULL;

    if(tbl->buckets[idx] == NULL)
        return NULL;

    for(scan = tbl->buckets[idx]; scan; scan = scan->next) {
        if(!strcmp(scan->key, key))
            return scan->value;
    }

    return NULL;
}
With a larger table, you get the advantage of more possible unique buckets for the data to go into, thus reducing or eliminating the need for a linear search. If you try to stuff 200 values into 4 buckets, you're going to average about N/4 = 50 comparisons per lookup. With a table size > 200, you can get best-case O(1) lookups, which is the point of a hash table.
All associative arrays and "maps" in various high-level languages, e.g. Perl
my %hash = ( 'key' => 'value' );
and PHP:
$arr = array( 'key' => 'value' );
and Java:
HashMap<String, String> m = new HashMap<String, String>(); m.put("Catalina", "Oancea");
... work on this principle, but dynamically resize themselves in the background in accordance with the runtime population of the structure.
The difference is that in the Kamailio 'htable' module, you must specify a fixed size at initialisation.
-- Alex
Just to be sure everyone got it right regarding the size attribute in the htable parameter: the number of slots in the hash table is 2^size (2 to the power of size). So size=4 means the hash table has 16 slots.
The reason for that is the speed of computing the slot number - the modulo is very fast in this case, since the slot number is actually a bitwise operation: hash_value & (2^size - 1), where 2^size is cached at startup.
Cheers, Daniel
On 26.08.2009 16:58 Uhr, Alex Balashov wrote:
catalina oancea wrote:
That seems like an interesting idea, Alex.
I still have problems understanding the size parameter. For example what exactly does size=4 mean?
So, imagine a two-dimensional array. An array of size 4 would mean a 2x2 array. Or, if you prefer, a one-dimensional array of 4 elements. But "tables" are easier to visualise as a grid.
Every time a value is inserted,
location = hash_function(key, table_size); hash_table[location] = value;
happens. The hash_function() is structured in such a way that it will produce a deterministic result; the same key will always result in the same hash. It also is structured to never return a hash value that is greater than the size of the table, as that would result in an array overflow.
The size is the size of the set of all possible values hash_function() can produce. How the hash function works exactly depends on the data and there are many strategies. Good hash functions achieve as much of an "optimal" distribution as possible; with size=4, the aim is to have about 25% of the stored values go in hash_table[0], 25% in hash_table[1], 25% in [2], and 25% in [3].
How exactly to achieve this best is a question I leave to the mathematicians; there are many hash algorithms one can use. Of course, whether the result is optimal or not also depends on the relationship of the hash algorithm to the variation of the data. A very basic and common one, often used for introductory teaching, is to sum up the ASCII values of the key string, divide the sum by the size of the table, and use the remainder as the hash value, e.g.
char key[] = "catalina";
int sum = 0;

for (int i = 0; i < strlen(key); i++)
    sum += (int) key[i];

return (sum % HASH_TABLE_SIZE);
Of course, this is a rather naive hash function and more sophisticated ones are available.
Anyway, the point is that if you have a table with a size of 4, the most optimal, best-case scenario is that you will have the first 4 insertions map to a unique value among those 4.
Anything inserted AFTER that will *necessarily* result in a collision - AKA there is another element in that bucket, and it is necessary to chain the elements and/or use some sort of linear approach. This increases the computational complexity of the search and removes the speed advantage which is the point of using a hash table as a data structure in the first place:
struct value_data *hash_lookup(table_t *tbl, char *key)
{
    int idx = hash_function(key, tbl->size);
    struct bucket_node *scan = NULL;

    if (tbl->buckets[idx] == NULL)
        return NULL;

    for (scan = tbl->buckets[idx]; scan; scan = scan->next) {
        if (!strcmp(scan->key, key))
            return scan->value;
    }

    return NULL;
}
With a larger table, you get the advantage of having more mathematically possible unique buckets for the data to go into, thus reducing or eliminating the need for a linear search. If you try to stuff 200 values into 4 buckets, you're going to get a best-case performance of about O(N/4), or O(50). With a table size > 200, you'll get a best-case performance of O(1), which is the point of a hash table.
All associative arrays and "maps" in various high-level languages, e.g. Perl
my %hash = ( 'key' => 'value' );
and PHP:
$arr = array( 'key' => 'value' );
and Java:
HashMap<String, String> m = new HashMap<String, String>(); m.put("Catalina", "Oancea");
... work on this principle, but dynamically resize themselves in the background in accordance with the runtime population of the structure.
The difference is that in the Kamailio 'htable' module, you must specify a fixed size at initialisation.
-- Alex
Daniel-Constantin Mierla wrote:
just to be sure everyone got it right regarding the size attribute in the htable parameter ... the number of slots in the hash table is 2^size (2 to the power of size). So, size=4 means that the hash table has 16 slots.
Reason for that is the speed in getting the slot number - modulo function is very fast in such cases, the slot number being actually a bitwise operation: hash_value & (2^size - 1) --- 2^size is cached at startup.
Thanks, that was a much-needed clarification omitted from my explanation.
I have always been told that table sizes that are prime numbers result in better distribution of factors due to the way modulo works, although I am not a mathematician and don't have a ready understanding of why. Is this true? Is there some reason you did not take this approach? Is hash_value & (2^size - 1) equally fast?
On 26.08.2009 17:14 Uhr, Alex Balashov wrote:
Daniel-Constantin Mierla wrote:
just to be sure everyone got it right regarding the size attribute in the htable parameter ... the number of slots in the hash table is 2^size (2 to the power of size). So, size=4 means that the hash table has 16 slots.
Reason for that is the speed in getting the slot number - modulo function is very fast in such cases, the slot number being actually a bitwise operation: hash_value & (2^size - 1) --- 2^size is cached at startup.
Thanks, that was a much-needed clarification omitted from my explanation.
I have always been told that table sizes that are prime numbers result in better distribution of factors due to the way modulo works, although I am not a mathematician and don't have a ready understanding of why. Is this true? Is there some reason you did not take this approach? Is hash_value & (2^size - 1) equally fast?
not a mathematician myself and haven't tried the approach with prime numbers. The fair distribution is (attempted to be) ensured by the hash functions. They have existed for a long time in ser/openser; iirc, Andrei Pelinescu implemented them (with some improvements in SR) and he tested them to get a fair distribution for strings and SIP addresses (being used in the location table as well).
I did tests as well, mainly for location, and under heavy load the (max - min) was on average less than 10% of max, which is good.
Cheers, Daniel
Daniel-Constantin Mierla wrote:
On 26.08.2009 17:14 Uhr, Alex Balashov wrote:
Daniel-Constantin Mierla wrote:
just to be sure everyone got it right regarding the size attribute in the htable parameter ... the number of slots in the hash table is 2^size (2 to the power of size). So, size=4 means that the hash table has 16 slots.
Reason for that is the speed in getting the slot number - modulo function is very fast in such cases, the slot number being actually a bitwise operation: hash_value & (2^size - 1) --- 2^size is cached at startup.
Thanks, that was a much-needed clarification omitted from my explanation.
I have always been told that table sizes that are prime numbers result in better distribution of factors due to the way modulo works, although I am not a mathematician and don't have a ready understanding of why. Is this true? Is there some reason you did not take this approach? Is hash_value & (2^size - 1) equally fast?
not a mathematician myself and haven't tried the approach with prime numbers. The fair distribution is (attempted to be) ensured by the hash functions. They have existed for a long time in ser/openser; iirc, Andrei Pelinescu implemented them (with some improvements in SR) and he tested them to get a fair distribution for strings and SIP addresses (being used in the location table as well).
I did tests as well, mainly for location, and under heavy load the (max - min) was on average less than 10% of max, which is good.
I suppose the key is, as always, a hash function that produces radically different results even for very cosmetically similar data, much like MD5 /CRC32/etc. does.
On 26.08.2009 16:32 Uhr, Alex Balashov wrote:
Oh, and dynamically instantiating hash tables without having to enumerate them statically in modparam("htable", "htable", ...) would be a huge plus. :-)
could be possible to do this way, but the number of slots has to be initialized at startup. Unlike in other cases, here the hash table is in shared memory, each slot being protected by a mutex to prevent access conflicts.
Might be better to create an issue on the tracker collecting these ideas -- as time allows, somebody may contribute.
Cheers, Daniel
Alex Balashov wrote:
Hey Daniel,
Another feature idea:
Have a way to serialise a hash table and pass it around between elements condensed into a header value. It can then be deserialised and resurrected into a hash table on the other side.
I have a setup I run that involves 3 Kamailio proxies that perform different parts of the logic and have to constantly use 5-6 custom headers to shuttle data around. It is very annoying and messy to manage all this.
What do you think?
-- Alex
Daniel-Constantin Mierla wrote:
comprehensive description ... for more, see wikipedia:
http://en.wikipedia.org/wiki/Hash_table
Cheers, Daniel
On 26.08.2009 15:43 Uhr, Alex Balashov wrote:
The size of a hash table only indicates how many buckets it has, not how many entries it can hold. Multiple entries can hash to the same bucket ID; this results in collisions, which hash table implementations can manage. Various algorithms for dealing with collision scenarios allow the buckets to be oversubscribed, if necessary.
Typical collision management strategies are either to hang a linked list off a bucket containing all entries beyond the head that also mapped to that bucket, or some sort of mathematical approach that computes a different bucket to use in deterministic relation to the one that has collided.
Larger table sizes will - given a decent and appropriate hash algorithm - cause fewer (if any) collisions since the factors and/or divisors involved in common hash functions are more numerous. This is desirable because a collision-ridden table takes longer to search, undermining its usefulness as a data structure of O(1) search complexity.
For example, if a list of collided keys (and value pointers) is hung off a bucket, that list is searched linearly once the hash computation is run and the first value encountered is not found to be the one sought.
So, a very small table size will cause your table to degenerate into a few parallel linear structures, and that's assuming a perfect uniform distribution from the hash function and variance in keys. Larger table sizes eliminate - or mitigate - this problem.
catalina oancea wrote:
What do you mean with: "A hash table is filled when no more shm is available, it is better not to get there since not much will work at that time." ?
In the docs I understood that the size parameter decides the number of entries: "size - number specifying the size of hash table. The number of entries in the table is 2^size "
2009/8/26 Daniel-Constantin Mierla miconda@gmail.com:
On 26.08.2009 14:38 Uhr, Alex Balashov wrote:
There is no way to manually delete a key->value in a bucket?
it is (was there from first day):
$sht(a=>x) = null;
I have been talking about auto-delete.
There are options to delete by regular expression matching against key or value, see: http://kamailio.org/docs/modules/1.5.x/htable.html#id2491912
Cheers, Daniel
-- Daniel-Constantin Mierla
Kamailio (OpenSER) - Users mailing list
Users@lists.kamailio.org
http://lists.kamailio.org/cgi-bin/mailman/listinfo/users
http://lists.openser-project.org/cgi-bin/mailman/listinfo/users
Daniel-Constantin Mierla wrote:
could be possible to do this way, but the number of slots has to be initialized at startup. Unlike in other cases, here the hash table is in shared memory, each slot being protected by a mutex to prevent access conflicts.
While it would be nice to have the table dynamically resize itself at runtime, it is not necessary. All I was suggesting was a wrapper around the process of iterating through a serialised data structure and re-hashing the keys=>values. The table can still be fixed-size. It's really just serialisation/deserialisation that is the essence of the request.
Setting the size on the deserialising end in a way appropriate to the required performance characteristics can still be up to the user.
Might be better to create an issue on the tracker collecting these ideas -- as time allows, somebody may contribute.
Will do. Is the htable code fairly simple? I might contribute.
On 26.08.2009 17:11 Uhr, Alex Balashov wrote:
Daniel-Constantin Mierla wrote:
could be possible to do this way, but the number of slots has to be initialized at startup. Unlike in other cases, here the hash table is in shared memory, each slot being protected by a mutex to prevent access conflicts.
While it would be nice to have the table dynamically resize itself at runtime, it is not necessary.
what I wanted to say is that it is not possible - I read ahead to your email where you said that Java etc. do not need the size of the hash table to be specified and do the resizing dynamically.
Cheers, Daniel
All I was suggesting was a wrapper around the process of iterating through a serialised data structure and re-hashing the keys=>values. The table can still be fixed-size. It's really just serialisation/deserialisation that is the essence of the request.
Setting the size on the deserialising end in a way appropriate to the required performance characteristics can still be up to the user.
Might be better to create an issue on the tracker collecting these ideas -- as time allows, somebody may contribute.
Will do. Is the htable code fairly simple? I might contribute.