[Kamailio-Users] What happens with filled htables?
Daniel-Constantin Mierla
miconda at gmail.com
Wed Aug 26 16:11:34 CEST 2009
just to be sure everyone got it right regarding the size attribute in
htable parameter ... the number of slots for hash table is 2^size (2
power size). So, size=4 means that the hash table has 16 slots.
Reason for that is the speed in getting the slot number - modulo
function is very fast in such cases, the slot number being actually a
bitwise operation: hash_value & (2^size - 1)
--- 2^size is cached at startup.
Cheers,
Daniel
On 26.08.2009 16:58 Uhr, Alex Balashov wrote:
> catalina oancea wrote:
>
>> That seems like an interesting idea, Alex.
>>
>> I still have problems understanding the size parameter. For example
>> what exactly does size=4 mean?
>
> So, imagine a two-dimensional array. An array of size 4 would mean a
> 2x2 array. Or, if you prefer, a one-dimensional array of 4 elements.
> But "tables" are easier to visualise as a grid.
>
> Every time a value is inserted,
>
> location = hash_function(key, table_size);
> hash_table[location] = value;
>
> happens. The hash_function() is structured in such a way that it will
> produce a deterministic result; the same key will always result in
> the same hash. It also is structured to never return a hash value
> that is greater than the size of the table, as that would result in an
> array overflow.
>
> The size is the size of the set of all possible values hash_function()
> can produce. How the hash function works exactly depends on the data
> and there are many strategies. Good hash functions achieve as much of
> an "optimal" distribution as possible; with size=4, the aim is to
> have about 25% of the stored values go in hash_table[0], 25% in
> hash_table[1], 25% in [2], and 25% in [3].
>
> How exactly to achieve this best is a question I leave to the
> mathematicians; there are many hash algorithms one can use. Of
> course, whether the result is optimal or not also depends on the
> relationship of the hash algorithm to the variation of the data. A
> very basic and common one is used for introductory teaching is to sum
> up the ASCII values of the key string and divide them by the size of
> the table and use the remainder as the hash value, e.g.
>
> char key[] = "catalina";
> int sum = 0;
>
> for(int i = 0; i < sizeof(key); i ++)
> sum += (int) i;
>
> return (sum % HASH_TABLE_SIZE);
>
> Of course, this is a rather naive hash function and more sophisticated
> ones are available.
>
> Anyway, the point is that if you have a table with a size of 4, the
> most optimal, best-case scenario is that you will have the first 4
> insertions map to a unique value among those 4.
>
> Anything inserted AFTER that will *necessarily* result in a collision
> - AKA there is another element in that bucket, and it is necessary to
> chain the elements and/or use some sort of linear approach. This
> increases the computational complexity of the search and removes the
> speed advantage which is the point of using a hash table as a data
> structure in the first place:
>
> struct value_data *hash_lookup(table_t **tbl, char *key) {
> int idx = hash_lookup(tbl->size, key);
> struct bucket_node *scan = NULL;
>
> if(tbl->buckets[idx] == NULL)
> return NULL;
>
> for(scan = tbl->buckets; scan; scan = scan->next) {
> if(!strcmp(scan->key, key))
> return scan->value;
> }
>
> return NULL;
> }
>
> With a larger table, you get the advantage of having more
> mathematically possible unique buckets for the data to go into, thus
> reducing or eliminating the need for a linear search. If you try to
> stuff 200 values into 4 buckets, you're going to get a best-case
> performance of about O(N/4), or O(50). With a table size > 200,
> you'll get a best-case performance of O(1), which is the point of a
> hash table.
>
> All associative arrays and "maps" in various high-level languages,
> e.g. Perl
>
> my %hash = (
> 'key' => 'value'
> );
>
> and PHP:
>
> my $arr = array(
> 'key' => 'value'
> );
>
> and Java:
>
> HashMap<String, String> m = new HashMap<String, String>();
> m.put("Catalina", "Oancea");
>
> ... work on this principle, but dynamically resize themselves in the
> background in accordance with the runtime population of the structure.
>
> The difference is that in the Kamailio 'htable' module, you must
> specify a fixed size at initialisation.
>
> -- Alex
>
--
Daniel-Constantin Mierla
* http://www.asipto.com/
More information about the Users
mailing list