Hello,
Following up on this, now that master is open for changes again, here's
a patch showing how I'd suggest changing the way dialplan performs
regular expression substitution.
As a short summary, this patch is trying to achieve the following:
Up until now, the dialplan module performs RE substitution in a "break
down string and construct new string" fashion, in other words RE
sub-patterns are used to extract certain pieces from the original
string, and then a new string is built from those pieces to replace the
original string. A side effect of this is that parts of the string that
the RE didn't match are discarded. While this is sufficient to perform
all necessary string rewriting operations, there are two drawbacks to
this approach:
1) It's counter-intuitive to anybody who's ever used RE substitution in
any other tool or language. Be it in VIM, SED, BASH, Perl, PHP, Python,
you name it, they all do RE substitution as a "search and replace"
operation. Whatever is matched by the RE is replaced by the replacement
pattern, but anything not matched by the RE is left untouched. The
dialplan module, however, discards those non-matched parts.
2) There's a slight performance impact in many use cases, namely in
those that only perform action on a known prefix or suffix. Most
commonly, the user wants to match against a certain prefix or suffix and
then either strip it out or replace it with something else, while
leaving the rest of the string alone, or alternatively simply prepend or
append something to the string. As it is now, the RE has to be
constructed so that it always matches the whole string, with the parts
of the string that are to be left alone captured in a sub-pattern. With
this patch, the RE can be constructed so that it only matches the part
of the string that you're interested in, meaning the PCRE engine has
less work to do.
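To make the difference between the two models concrete, here is a rough Python sketch. It is not the dialplan C code, and the function names are made up for illustration. It also assumes that only the first match is replaced, which may or may not be what the patch does.

```python
import re


def rebuild_subst(pattern, repl, subject):
    """'Break down and construct' model (illustrative name).

    The result is built from the replacement template alone, so any
    part of the subject outside the match is discarded.
    """
    m = re.search(pattern, subject)
    if m is None:
        return None
    return m.expand(repl)


def search_replace_subst(pattern, repl, subject):
    """'Search and replace' model (illustrative name).

    Only the matched span is replaced; the rest is left untouched.
    Replacing just the first match is an assumption of this sketch.
    """
    if re.search(pattern, subject) is None:
        return None
    return re.sub(pattern, repl, subject, count=1)


if __name__ == "__main__":
    number = "0049123"
    # Pattern that matches only the prefix: the two models disagree.
    print(repr(rebuild_subst(r"^00", "", number)))         # ''
    print(repr(search_replace_subst(r"^00", "", number)))  # '49123'
    # Pattern that matches the whole string: both models agree.
    print(repr(rebuild_subst(r"^00(.*)", r"\1", number)))         # '49123'
    print(repr(search_replace_subst(r"^00(.*)", r"\1", number)))  # '49123'
```

With the prefix-only pattern, the first model throws away the rest of the number, while the second one keeps it.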
(More performance savings are possible by unifying match_exp and
subst_exp, as it doesn't seem to make sense to first match on one part
of the string and then perform substitution on some other part. There
might be some use cases where it comes in useful, but I'd say the
benefits of doing only one RE match vs. two outweigh that. But that
would be another patch anyway.)
Here are a few examples of common substitution patterns (subst_exp and
repl_exp pairs) and how they can be simplified with this patch:
Stripping prefix:
old: "^00(.*)" -> "\1"
new: "^00" -> ""
Replacing prefix:
old: "^\+(.*)" -> "00\1"
new: "^\+" -> "00"
Prepending new prefix:
old: "(.*)" -> "00\1"
new: "^" -> "00"
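The pairs above can be checked against ordinary search-and-replace semantics with a short Python sketch. Python's re module stands in for PCRE here, the sample numbers are invented, and replacing only the first match is an assumption rather than something taken from the patch.

```python
import re

# (label, sample input, old (subst_exp, repl_exp), new (subst_exp, repl_exp))
# The sample inputs are made up for illustration.
PAIRS = [
    ("strip prefix", "0049123456", (r"^00(.*)", r"\1"), (r"^00", "")),
    ("replace prefix", "+49123456", (r"^\+(.*)", r"00\1"), (r"^\+", "00")),
    ("prepend prefix", "49123456", (r"(.*)", r"00\1"), (r"^", "00")),
]


def substitute(pattern, repl, subject):
    # Replace the first match only and keep everything else as it is.
    return re.sub(pattern, repl, subject, count=1)


def compare():
    """Return (label, result of old pair, result of new pair) per example."""
    return [
        (label, substitute(*old, subject), substitute(*new, subject))
        for label, subject, old, new in PAIRS
    ]


if __name__ == "__main__":
    for label, old_result, new_result in compare():
        print(f"{label}: old={old_result} new={new_result}")
```

For each of the three examples, the old and the new pair produce the same result under these semantics.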
It should be noted that all "old" patterns from those examples will
continue to work as they did before even with the patch applied. More
generally, all patterns that are designed to always match the complete
string will continue to work unchanged, providing backwards
compatibility. Only patterns that make deliberate use of the side-effect
of stripping out unmatched parts of the string will break, but I don't
think there's a whole lot of those out there in the wild. However,
there's always the possibility of adding a new module option to act as a
new behavior vs. old behavior switch if desired.
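As an illustration of the kind of pattern that would break, here is a hypothetical rule that relies on the unmatched tail being dropped in order to truncate a number to its first four digits. The pattern, the sample number and the helper names are invented for this sketch, and Python's re module stands in for PCRE.

```python
import re


def old_model(pattern, repl, subject):
    # Result is built from the replacement only; unmatched text is dropped.
    m = re.search(pattern, subject)
    return m.expand(repl) if m else None


def new_model(pattern, repl, subject):
    # Only the matched span is replaced (first match only, by assumption).
    return re.sub(pattern, repl, subject, count=1)


if __name__ == "__main__":
    number = "1234567"
    # Relies on the side effect: truncates under the old model only.
    print(old_model(r"^(\d{4})", r"\1", number))    # 1234
    print(new_model(r"^(\d{4})", r"\1", number))    # 1234567
    # Matching the complete string gives the same result in both models.
    print(old_model(r"^(\d{4}).*", r"\1", number))  # 1234
    print(new_model(r"^(\d{4}).*", r"\1", number))  # 1234
```

Such a rule can be fixed by extending the pattern so that it matches the complete string, as in the last two calls.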
As for the patch itself, it looks like it's more than it actually is,
because it mostly moves things around. If you apply it and look at it
with diff -b, it gives you a clearer view of what's changed.
Comments welcome.
cheers
Richard
On 05/07/12 16:47, Daniel-Constantin Mierla wrote:
Hello,
On 5/7/12 8:57 PM, Juha Heinanen wrote:
Richard Fuchs writes:
I know that, but that doesn't answer my question. :) Regex substitutions
work the same everywhere, s/(.)/\1/ in sed or Perl for example leaves
the string unchanged. Why is the dialplan module different?
are you sure that (.) does the trick? i have used (.*) and that works ok.
indeed the .* is at least the posix standard way to match everything,
'.' being for matching one single character.
Replacements in configuration file/dialplan do not use external library
for substitution, only for matching (posix regexp for core/textops which
is in libc and libpcre for dialplan). The replacement itself is made via
a function from the core (iirc, Andrei Pelinescu-Onciul implemented it
in the very early days of ser).
I am not that familiar with perl/sed and their full substitution rules,
but in Kamailio, practically the subst_exp is supposed to break the
matched value into tokens and then back-references in repl_exp can be
used to build the new value.
Maybe I just got used to this kind of model, grouping parts of the
matched value, and everything went fine for me.
For what Andreas exemplified in the first email in this thread, I would
have used:
subst="^999(.*)"
repl="\1"
I would consider it a bug if there were no way to remove the full value
(i.e., set the result to an empty string), for example if no change
happened with:
subst="(.*)"
repl=""
Personally, I would not mind an update to get to a more common
behaviour, if it is properly documented and refers to other
well-established languages/tutorials. So far the term has been
'perl-like substitutions', not 'perl substitutions', referring more to
the syntax/idea. Also, we have our own specific behaviour: the repl
expression can include cfg variables (e.g., $avp(...), $var(...), ...)
that are expanded when building the result.
However, I think it is too late for 3.3.0, because it will introduce a
lot of changes, perhaps many in the behaviour as well, and we are
already 2 weeks into the testing phase.
Cheers,
Daniel