Hello Henning,
On 06/13/2012 03:10 PM, Henning Westerholt wrote:
Am Dienstag, 12. Juni 2012, 13:05:18 schrieb Vicente
Hernando:
str_str function in lib/cds/sstr.c file has a
comment: /* FIXME:
reimplement using better algorithm */
Attached patch reimplements that function using
Boyer-Moore-Horspool-Raita algorithm, which is the one most text editors
use to search for substrings.
Boyer-Moore-Horspool-Raita algorithm is a simplification from
Boyer-Moore one, with less preprocessing overhead.
If n is the string length, and m is substring length, naive brute force
algorithm has a cost of O(n*m).
Raita algoritm has same cost in worst case but Ω(n/m) a better average
cost.
As a drawback it adds a preprocessing cost of O(m + charset_length)
where charset_length is 256 for us, because we use char.
Another way to approach this substring search issue would be separate it
into two functions, a preprocessing one, and a searching one. Then in
module initializacion where the function is used, calculate the needed
preprocess and when the function is called do the search only.
This Raita algorithm could also be useful for str_search function in
ut.c. Its usefulness depends on n, and m lengths, the bigger they are
the better the algoritm works.
http://en.wikipedia.org/wiki/String_searching_algorithm
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm
http://www-igm.univ-mlv.fr/~lecroq/string/node22.html Hallo Vincente,
thanks for your patch. One question, would it not make sense to use the strstr
function of<string.h>, with a small wrapper, which is probably quite
optimised instead of implementing it our self?
actually, I do not know. Kamailio
provides some functions that are
already available in some libraries, like isdigit, etc.
Is there any kamailio policy about what functions to use?
str_str function is used in presence_b2b SER module.
We would manage further optimization by separating substring
preprocessing and searching in the string, because we know we are always
going to search for "accepted", "terminated" and "pending".
Best regards,
Vicente
Best regards,
Henning