[Kamailio-Users] What happens with filled htables?

Daniel-Constantin Mierla miconda at gmail.com
Wed Aug 26 16:11:34 CEST 2009


just to be sure everyone got it right regarding the size attribute in 
htable parameter ... the number of slots for hash table is 2^size (2 
power size). So, size=4 means that the hash table has 16 slots.

Reason for that is the speed in getting the slot number - modulo 
function is very fast in such cases, the slot number being actually a 
bitwise operation: hash_value & (2^size - 1)
--- 2^size is cached at startup.

Cheers,
Daniel


On 26.08.2009 16:58 Uhr, Alex Balashov wrote:
> catalina oancea wrote:
>
>> That seems like an interesting idea, Alex.
>>
>> I still have problems understanding the size parameter. For example
>> what exactly does size=4 mean?
>
> So, imagine a two-dimensional array.  An array of size 4 would mean a 
> 2x2 array.  Or, if you prefer, a one-dimensional array of 4 elements. 
> But "tables" are easier to visualise as a grid.
>
> Every time a value is inserted,
>
>    location = hash_function(key, table_size);
>    hash_table[location] = value;
>
> happens.  The hash_function() is structured in such a way that it will 
> produce a deterministic result;  the same key will always result in 
> the same hash.  It also is structured to never return a hash value 
> that is greater than the size of the table, as that would result in an 
> array overflow.
>
> The size is the size of the set of all possible values hash_function() 
> can produce.  How the hash function works exactly depends on the data 
> and there are many strategies.  Good hash functions achieve as much of 
> an "optimal" distribution as possible;  with size=4, the aim is to 
> have about 25% of the stored values go in hash_table[0], 25% in 
> hash_table[1], 25% in [2], and 25% in [3].
>
> How exactly to achieve this best is a question I leave to the 
> mathematicians;  there are many hash algorithms one can use.  Of 
> course, whether the result is optimal or not also depends on the 
> relationship of the hash algorithm to the variation of the data.  A 
> very basic and common one is used for introductory teaching is to sum 
> up the ASCII values of the key string and divide them by the size of 
> the table and use the remainder as the hash value, e.g.
>
>    char       key[] = "catalina";
>    int     sum = 0;
>
>    for(int i = 0; i < sizeof(key); i ++)
>            sum += (int) i;
>
>    return (sum % HASH_TABLE_SIZE);
>
> Of course, this is a rather naive hash function and more sophisticated 
> ones are available.
>
> Anyway, the point is that if you have a table with a size of 4, the 
> most optimal, best-case scenario is that you will have the first 4 
> insertions map to a unique value among those 4.
>
> Anything inserted AFTER that will *necessarily* result in a collision 
> - AKA there is another element in that bucket, and it is necessary to 
> chain the elements and/or use some sort of linear approach.  This 
> increases the computational complexity of the search and removes the 
> speed advantage which is the point of using a hash table as a data 
> structure in the first place:
>
>    struct value_data   *hash_lookup(table_t **tbl, char *key) {
>            int idx = hash_lookup(tbl->size, key);
>            struct bucket_node *scan = NULL;
>
>            if(tbl->buckets[idx] == NULL)
>                    return NULL;
>
>            for(scan = tbl->buckets; scan; scan = scan->next) {
>                    if(!strcmp(scan->key, key))
>                            return scan->value;
>            }
>
>            return NULL;
>    }
>
> With a larger table, you get the advantage of having more 
> mathematically possible unique buckets for the data to go into, thus 
> reducing or eliminating the need for a linear search.  If you try to 
> stuff 200 values into 4 buckets, you're going to get a best-case 
> performance of about O(N/4), or O(50).  With a table size > 200, 
> you'll get a best-case performance of O(1), which is the point of a 
> hash table.
>
> All associative arrays and "maps" in various high-level languages, 
> e.g. Perl
>
>    my    %hash = (
>     'key' => 'value'
>    );
>
> and PHP:
>
>    my  $arr = array(
>         'key' => 'value'
>    );
>
> and Java:
>
>    HashMap<String, String> m = new HashMap<String, String>();
>    m.put("Catalina", "Oancea");
>
> ... work on this principle, but dynamically resize themselves in the 
> background in accordance with the runtime population of the structure.
>
> The difference is that in the Kamailio 'htable' module, you must 
> specify a fixed size at initialisation.
>
> -- Alex
>

-- 
Daniel-Constantin Mierla
* http://www.asipto.com/




More information about the Users mailing list