[sr-dev] dialplan and empty repl_exp

Richard Fuchs rfuchs at sipwise.com
Tue Jun 19 14:20:40 CEST 2012


Hello,

Following up on this, now that master is open for changes again, here's 
a patch showing how I'd suggest changing the way dialplan does 
regular expression substitution.

As a short summary, this patch is trying to achieve the following:

Up until now, the dialplan module has performed RE substitution in a 
"break down string and construct new string" fashion. In other words, RE 
sub-patterns are used to extract certain pieces from the original 
string, and then a new string is built from those pieces to replace the 
original string. A side effect of this is that parts of the string that 
the RE didn't match are discarded. While this is enough to perform all 
necessary operations for string rewriting, there are two drawbacks to 
this approach:

1) It's counter-intuitive to anybody who's ever used RE substitution in 
any other tool or language. Be it in Vim, sed, Bash, Perl, PHP, Python, 
you name it, they all do RE substitution as a "search and replace" 
operation. Whatever is matched by the RE is replaced by the replacement 
pattern, but anything not matched by the RE is left untouched. The 
dialplan module, however, discards those non-matched parts.

2) There's a slight performance impact in many use cases, namely in 
those that only act on a known prefix or suffix. Most 
commonly, the user wants to match against a certain prefix or suffix and 
then either strip it out or replace it with something else, while 
leaving the rest of the string alone, or alternatively simply prepend or 
append something to the string. As it is now, the RE has to be 
constructed so that it always matches the whole string, with the parts 
of the string that are to be left alone captured in a sub-pattern. With 
this patch, the RE can be constructed so that it only matches the part 
of the string that you're interested in, meaning the PCRE engine has 
less work to do.
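
To make the difference between the two approaches concrete, here is a 
rough sketch in Python. It is not the module's C code; the function 
names construct_new and search_replace are made up for illustration, 
and both the "leave the string alone if nothing matches" fallback and 
the single (non-global) substitution are assumptions of the sketch.

```python
import re

def construct_new(subst_exp, repl_exp, s):
    """Current dialplan-style behaviour as described above: the result
    is built only from repl_exp, so unmatched text is dropped."""
    m = re.search(subst_exp, s)
    if m is None:
        return s  # assumption: no match means no change
    return m.expand(repl_exp)

def search_replace(subst_exp, repl_exp, s):
    """Search-and-replace behaviour: only the matched part is replaced,
    the rest of the string is left untouched."""
    return re.sub(subst_exp, repl_exp, s, count=1)

number = "0049123"
old_style = construct_new(r"^00", "", number)    # "" (the rest is lost)
new_style = search_replace(r"^00", "", number)   # "49123"
```

With the current behaviour, the short pattern "^00" throws away the 
whole number, which is why the rest of the string has to be captured 
in a sub-pattern today.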

(More performance savings are possible by unifying match_exp and 
subst_exp, as it doesn't seem to make sense to first match on one part 
of the string and then perform substitution on some other part. There 
might be some use cases where it comes in useful, but I'd say the 
benefit of doing only one RE match instead of two outweighs that. But that 
would be another patch anyway.)

Here are a few examples of common substitution patterns (subst_exp and 
repl_exp pairs) and how they can be simplified with this patch:

Stripping prefix:
old: "^00(.*)" -> "\1"
new: "^00" -> ""

Replacing prefix:
old: "^\+(.*)" -> "00\1"
new: "^\+" -> "00"

Prepending new prefix:
old: "(.*)" -> "00\1"
new: "^" -> "00"

It should be noted that all "old" patterns from those examples will 
continue to work as they did before even with the patch applied. More 
generally, all patterns that are designed to always match the complete 
string will continue to work unchanged, providing backwards 
compatibility. Only patterns that make deliberate use of the side effect 
of stripping out unmatched parts of the string will break, but I don't 
think there are a whole lot of those out there in the wild. However, 
there's always the possibility of adding a new module option to switch 
between the new and the old behavior if desired.
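
The compatibility claim can be checked with a small Python sketch. 
Again, this only models the two behaviours and is not the dialplan 
code; the helper names, the test numbers and the last "breaking" 
pattern are made up for illustration, and a single (non-global) 
substitution is assumed.

```python
import re

def construct_new(subst_exp, repl_exp, s):
    # Model of the current behaviour: result built from repl_exp only.
    m = re.search(subst_exp, s)
    return m.expand(repl_exp) if m else s  # assumption: no match, no change

def search_replace(subst_exp, repl_exp, s):
    # Model of the proposed behaviour: unmatched text is kept.
    return re.sub(subst_exp, repl_exp, s, count=1)

# (input, old subst_exp, old repl_exp, new subst_exp, new repl_exp)
examples = [
    ("0049123", r"^00(.*)", r"\1",   r"^00", ""),    # stripping prefix
    ("+49123",  r"^\+(.*)", r"00\1", r"^\+", "00"),  # replacing prefix
    ("49123",   r"(.*)",    r"00\1", r"^",   "00"),  # prepending prefix
]

results = []
for s, old_re, old_repl, new_re, new_repl in examples:
    before = construct_new(old_re, old_repl, s)       # old pattern, old code
    after_old = search_replace(old_re, old_repl, s)   # old pattern, patched
    after_new = search_replace(new_re, new_repl, s)   # new pattern, patched
    results.append((before, after_old, after_new))

# A pattern relying on the side effect: keep only the first two digits.
truncating_old = construct_new(r"^(\d{2})", r"\1", "4912345")
truncating_new = search_replace(r"^(\d{2})", r"\1", "4912345")
```

For the three pairs above, all three results agree in each row. The 
last pattern is the kind that would break: it truncates the number 
under the current behaviour but leaves it unchanged under search and 
replace.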

As for the patch itself, it looks like it's more than it actually is, 
because it mostly moves things around. If you apply it and look at it 
with diff -b, it gives you a clearer view of what's changed.

Comments welcome.

cheers
Richard


On 05/07/12 16:47, Daniel-Constantin Mierla wrote:
> Hello,
>
> On 5/7/12 8:57 PM, Juha Heinanen wrote:
>> Richard Fuchs writes:
>>
>>> I know that, but that doesn't answer my question. :) Regex substitutions
>>> work the same everywhere, s/(.)/\1/ in sed or Perl for example leaves
>>> the string unchanged. Why is the dialplan module different?
>> are you sure that (.) does the trick? i have used (.*) and that works ok.
> indeed the .* is at least the posix standard way to match everything,
> '.' being for matching one single character.
>
> Replacements in configuration file/dialplan do not use external library
> for substitution, only for matching (posix regexp for core/textops which
> is in libc and libpcre for dialplan). The replacement itself is made via
> a function from the core (iirc, Andrei Pelinescu-Onciul implemented it
> in the very early days of ser).
>
> I am not that familiar with perl/sed and their full substitution rules,
> but in Kamailio, practically the subst_exp is supposed to break the
> matched value in tokens and then back-references in repl_exp can be used
> to build the new value.
>
> Maybe I got used to this kind of model, to group parts of matches values
> and everything went fine for me.
>
> For what Andreas exemplified in the first email in this thread, I would
> have used:
>
> subst="^999(.*)"
> repl="\1"
>
> I would consider a bug if there is no way to remove the full value
> (i.e., set the result to empty string), like no change will happen with:
> subst="(.*)"
> repl=""
>
> Personally, I would not mind an update to get to a more common
> behaviour, if it will be properly documented and referenced to other
> well established languages/tutorials. So far the term was 'perl-like
> substitutions' not 'perl substitutions' more for syntax/idea. Also, we
> have our specific behaviour, the repl expression can include cfg
> variables (e.g., $avp(...), $var(...), ...) that are expanded when
> building the result.
>
> However, I think it is too late for 3.3.0, because it will introduce lot
> of changes, perhaps many in the behaviour as well, and we are already 2
> weeks in the testing phase.
>
> Cheers,
> Daniel
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dialplan-regexp-subst.patch
Type: text/x-patch
Size: 9898 bytes
Desc: not available
URL: <http://lists.sip-router.org/pipermail/sr-dev/attachments/20120619/7df04fea/attachment-0001.bin>

