Hello,
Following up on this: now that master is open for changes again, here's a patch showing how I'd suggest changing the way dialplan does regular expression substitution.
As a short summary, this patch is trying to achieve the following:
Up until now, the dialplan module has performed RE substitution in a "break down the string and construct a new one" fashion. In other words, RE sub-patterns are used to extract certain pieces from the original string, and a new string is then built from those pieces to replace the original. A side effect is that parts of the string that the RE didn't match are discarded. While this is sufficient for all necessary string rewriting operations, there are two drawbacks to this approach:
1) It's counter-intuitive to anybody who's ever used RE substitution in any other tool or language. Be it Vim, sed, Bash, Perl, PHP, Python, you name it, they all do RE substitution as a "search and replace" operation. Whatever is matched by the RE is replaced by the replacement pattern, and anything not matched by the RE is left untouched. The dialplan module, however, discards those non-matched parts.
2) There's a slight performance cost in many use cases, namely those that only act on a known prefix or suffix. Most commonly, the user wants to match a certain prefix or suffix and then either strip it or replace it with something else while leaving the rest of the string alone, or simply prepend or append something to the string. As it is now, the RE has to be constructed so that it always matches the whole string, with the parts that are to be left alone captured in a sub-pattern. With this patch, the RE only needs to match the part of the string you're interested in, meaning the PCRE engine has less work to do.
(More performance savings are possible by unifying match_exp and subst_exp, as it doesn't seem to make sense to first match on one part of the string and then perform substitution on some other part. There might be some use cases where that is useful, but I'd say the benefits of doing only one RE match instead of two outweigh them. That would be another patch anyway.)
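To make the difference concrete, here is a minimal Python sketch of the two behaviors described above. It is not the dialplan code: the function names are made up for illustration, Python's re module stands in for PCRE, only the first match is handled, and returning the input unchanged when nothing matches is an assumption.

```python
import re

def subst_build_new(subst_exp, repl_exp, value):
    """Model of the current behavior: the result consists only of the
    expanded replacement, so unmatched parts of the input are dropped."""
    m = re.search(subst_exp, value)
    if m is None:
        return value  # assumption: no match leaves the value alone
    return m.expand(repl_exp)

def subst_search_replace(subst_exp, repl_exp, value):
    """Model of the proposed behavior: only the matched span is replaced,
    everything before and after it is kept."""
    m = re.search(subst_exp, value)
    if m is None:
        return value  # assumption: no match leaves the value alone
    return value[:m.start()] + m.expand(repl_exp) + value[m.end():]

number = "0049123456"

# A whole-string pattern gives the same result either way.
whole_old = subst_build_new(r"^00(.*)", r"\1", number)        # "49123456"
whole_new = subst_search_replace(r"^00(.*)", r"\1", number)   # "49123456"

# A prefix-only pattern only works with search-and-replace semantics.
prefix_old = subst_build_new(r"^00", "", number)              # ""
prefix_new = subst_search_replace(r"^00", "", number)         # "49123456"
```

With the current model, the prefix-only pattern empties the string, which is why every pattern has to capture the remainder explicitly.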
Here are a few examples of common substitution patterns (subst_exp and repl_exp pairs) and how they can be simplified with this patch:
Stripping prefix: old: "^00(.*)" -> "\1" new: "^00" -> ""
Replacing prefix: old: "^\+(.*)" -> "00\1" new: "^\+" -> "00"
Prepending new prefix: old: "(.*)" -> "00\1" new: "^" -> "00"
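As a quick sanity check, the three pairs can be run through an ordinary search-and-replace engine. The sketch below uses Python's re.sub limited to one replacement as a stand-in for the proposed behavior; the sample numbers and helper name are invented for illustration, and the plus sign is written as `\+` because Python's re rejects a bare `^+`.

```python
import re

# (description, sample input, old pair, new pair)
EXAMPLES = [
    ("strip prefix",   "0049123456", (r"^00(.*)", r"\1"),   (r"^00", "")),
    ("replace prefix", "+49123456",  (r"^\+(.*)", r"00\1"), (r"^\+", "00")),
    ("prepend prefix", "49123456",   (r"(.*)", r"00\1"),    (r"^", "00")),
]

def search_and_replace(pair, value):
    """Replace only the first match, leaving the rest of the value alone."""
    subst_exp, repl_exp = pair
    return re.sub(subst_exp, repl_exp, value, count=1)

# Map each description to (result of old pair, result of new pair).
RESULTS = {
    desc: (search_and_replace(old, value), search_and_replace(new, value))
    for desc, value, old, new in EXAMPLES
}

for desc, (old_result, new_result) in RESULTS.items():
    print(f"{desc}: old={old_result!r} new={new_result!r}")
```

In each case the old and new pair yield the same string, which is the simplification the examples are meant to show.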
It should be noted that all "old" patterns from those examples will continue to work as they did before, even with the patch applied. More generally, all patterns that are designed to always match the complete string will continue to work unchanged, which provides backwards compatibility. Only patterns that deliberately rely on the side effect of stripping out unmatched parts of the string will break, but I don't think there are many of those out in the wild. If desired, a new module option could be added to switch between the new and the old behavior.
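For illustration, here is a hypothetical pattern of the kind that would break, next to a rewritten form that behaves the same under both semantics. This is a Python sketch under assumptions: the helper and its keep_unmatched flag are invented for the example (they are not the proposed module option), Python's re stands in for PCRE, and only the first match is handled.

```python
import re

def substitute(subst_exp, repl_exp, value, keep_unmatched):
    """keep_unmatched=False models the current behavior (result is only the
    expanded replacement); True models the proposed search-and-replace."""
    m = re.search(subst_exp, value)
    if m is None:
        return value  # assumption: no match leaves the value alone
    replaced = m.expand(repl_exp)
    if keep_unmatched:
        return value[:m.start()] + replaced + value[m.end():]
    return replaced

number = "0049123456"

# Relies on the side effect: keeps the first four digits, drops the rest.
truncating = (r"^(\d{4})", r"\1")
# Same intent, but written to match the complete string.
anchored = (r"^(\d{4}).*$", r"\1")

truncating_old = substitute(*truncating, number, keep_unmatched=False)  # "0049"
truncating_new = substitute(*truncating, number, keep_unmatched=True)   # "0049123456"
anchored_old = substitute(*anchored, number, keep_unmatched=False)      # "0049"
anchored_new = substitute(*anchored, number, keep_unmatched=True)       # "0049"
```

Extending such a pattern so that it consumes the whole string is enough to make it independent of which behavior is in effect.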
As for the patch itself, it looks bigger than it actually is, because it mostly moves things around. If you apply it and then view the changes with diff -b, you get a clearer view of what's changed.
Comments welcome.
cheers Richard
On 05/07/12 16:47, Daniel-Constantin Mierla wrote:
Hello,
On 5/7/12 8:57 PM, Juha Heinanen wrote:
Richard Fuchs writes:
I know that, but that doesn't answer my question. :) Regex substitutions work the same everywhere, s/(.)/\1/ in sed or Perl for example leaves the string unchanged. Why is the dialplan module different?
are you sure that (.) does the trick? i have used (.*) and that works ok.
indeed the .* is at least the posix standard way to match everything, '.' being for matching one single character.
Replacements in the configuration file/dialplan do not use an external library for substitution, only for matching (posix regexp for core/textops, which is in libc, and libpcre for dialplan). The replacement itself is made via a function from the core (iirc, Andrei Pelinescu-Onciul implemented it in the very early days of ser).
I am not that familiar with perl/sed and their full substitution rules, but in Kamailio, the subst_exp is in practice supposed to break the matched value into tokens, and back-references in repl_exp can then be used to build the new value.
Maybe I just got used to this kind of model, grouping parts of the matched values, and everything went fine for me.
For what Andreas exemplified in the first email in this thread, I would have used:
subst="^999(.*)" repl="\1"
I would consider it a bug if there were no way to remove the full value (i.e., set the result to an empty string), for instance if no change happened with: subst="(.*)" repl=""
Personally, I would not mind an update to get to a more common behaviour, as long as it is properly documented and references other well-established languages/tutorials. So far the term was 'perl-like substitutions', not 'perl substitutions', referring more to the syntax/idea. Also, we have our own specific behaviour: the repl expression can include cfg variables (e.g., $avp(...), $var(...), ...) that are expanded when building the result.
However, I think it is too late for 3.3.0, because it would introduce a lot of changes, perhaps many in the behaviour as well, and we are already 2 weeks into the testing phase.
Cheers, Daniel