[sr-dev] str_str function

Vicente Hernando vhernando at systemonenoc.com
Tue Jun 12 13:05:18 CEST 2012


Hello,

str_str function searches for a substring into a bigger string.

str_str function in lib/cds/sstr.c file has a comment: /* FIXME: 
reimplement using better algorithm */

Attached patch reimplements that function using 
Boyer-Moore-Horspool-Raita algorithm, which is the one most text editors 
use to search for substrings.

Boyer-Moore-Horspool-Raita algorithm is a simplification from 
Boyer-Moore one, with less preprocessing overhead.


If n is the string length, and m is substring length, naive brute force 
algorithm has a cost of O(n*m).

Raita algoritm has same cost in worst case but Ω(n/m) a better average cost.
As a drawback it adds a preprocessing cost of O(m + charset_length) 
where charset_length is 256 for us, because we use char.


Another way to approach this substring search issue would be separate it 
into two functions, a preprocessing one, and a searching one. Then in 
module initializacion where the function is used, calculate the needed 
preprocess and when the function is called do the search only.


This Raita algorithm could also be useful for str_search function in 
ut.c. Its usefulness depends on n, and m lengths, the bigger they are 
the better the algoritm works.

http://en.wikipedia.org/wiki/String_searching_algorithm
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm
http://www-igm.univ-mlv.fr/~lecroq/string/node22.html

Regards,
Vicente.




-------------- next part --------------
A non-text attachment was scrubbed...
Name: str_str.patch
Type: text/x-diff
Size: 1692 bytes
Desc: not available
URL: <http://lists.sip-router.org/pipermail/sr-dev/attachments/20120612/4ce614e8/attachment.patch>


More information about the sr-dev mailing list