Juha Heinanen wrote:
Alex Balashov writes:
What about some sort of hashing algorithm that operates on regular expression sums or is otherwise tuned by regex elements?
the problem is: given a string, find from set of prefixes the longest one that matches the string. if prefixes are known strings, they can be organized in a data structure (tree, hash table, etc), where matching is fast to execute.
if prefixes are regular expressions, i don't know any other solution than linearly match them one by one to string.
The solution, I imagine, would be to devise a tree or hash structure wherein certain generalisable and atomic elements of regular expression parsing are used to place the keys.
That said, I do not know what such an algorithm would be, cannot point to any concrete implementations as examples, etc. and probably ought to shut up. :)
Besides, I frequently make the erroneous assumption that such data structures and optimisations exist where they don't, or at least are not implemented. For instance, I was recently party to a thread on a list for network service providers using Cisco gear in which it was revealed to me, contrary to what I have characteristically and habitually assumed, that ordinary access lists are actually evaluated in a linear way; thus, it makes sense to place the most frequently-matched rules near the top of the list if it is long.
I had no idea. I just took for granted that of course the matching is done using some sort of compiled tree or hash structure that gives matching performance somewhere between O(1) and O(N).
So, don't listen to me. :)
Cheers,