Interpolate a variable into a regular expression - regex

I am used to Perl, but I am a Perl 6 newbie.
I want to hold a regular expression in a string variable, as I would have done in Perl 5:
my $a = 'abababa';
my $b = '^aba';
if ($a =~ m/$b/) {
print "True\n";
} else {
print "False\n";
}
But if I do the same in Perl 6, it doesn't work:
my $a = 'abababa';
my $b = '^aba';
say so $a ~~ /^aba/; # True
say so $a ~~ /$b/; # False
I'm puzzled... What am I missing?

You need to have a closer look at Quoting Constructs.
In this case, enclose the variable in angle brackets, or in <{ and }>, so that its contents are treated as regex code rather than as a literal string:
my $a = 'abababa';
my $b = '^aba';
say so $a ~~ /<$b>/; # True, starts with aba
say so $a ~~ /<{$b}>/; # True, starts with aba
my $c = '<[0..5]>';
say so $a ~~ /<$c>/; # False, no digits 0 to 5 in $a
say so $a ~~ /<{$c}>/; # False, no digits 0 to 5 in $a
Another story is when you need to pass a variable into a limiting quantifier. That is where you need to only use braces:
my $ok = "12345678";
my $not_ok = "1234567";
my $min = 8;
say so $ok ~~ / ^ \d ** {$min .. *} $ /; # True, the string consists of 8 or more digits
say so $not_ok ~~ / ^ \d ** {$min .. *} $ /; # False, there are 7 digits only

Is there a reason why you don't pick the regex object for these types of uses?
my $a = 'abababa';
my $b = rx/^aba/;
say so $a ~~ /^aba/; # True
say so $a ~~ $b; # True

Related

Raku: Using topic variable (from a 'for') inside a regex

I have this code that works as expected:
my @words = 'foo', 'bar';
my $text = 'barfoo';
for @words -> $to-regex {
$text ~~ m/ ($to-regex) {say "matched $0"}/;
}
It prints:
matched foo
matched bar
However, if I try to use topic variable on the for loop, as in:
for @words { # implicit "-> $_", AFAIK
$text ~~ m/ ($_) {say "matched $0"}/;
}
I get this:
matched barfoo
matched barfoo
Same results using postfix for:
$text ~~ m/ ($_) {say "matched $0"}/ for @words; # implicit "-> $_", AFAIK
Is this a special case of the topic variable inside a regex?
Is it supposed to hold the whole string it is matching against?
The smart-match operator has 3 stages:
1. alias the left argument temporarily to $_
2. run the expression on the right
3. call .ACCEPTS($_) on that result
So it isn't a special case for a regex, it is how ~~ always works.
for 1,2,3 {
$_.print;
'abc' ~~ $_.say
}
# 1abc
# 2abc
# 3abc

swapping values

I was wondering if there is an easy/clean way of swapping values as follows, perhaps using a single regex/substitution?
If $a ends with "x", substitute it with "y". And similarly if $a ends with "y", swap it with "x":
$a = "test_x";
if ($a =~ /x$/) {
$a =~ s/x$/y/;
} else {
$a =~ s/y$/x/;
}
I can only think of something like this:
$a = $a =~ /x$/ ? s/x$/y/ : s/y$/x/;
This is simply:
$a =~ s/x$/y/ or $a =~ s/y$/x/;
It's almost always redundant to do a match to see if you should do a substitution.
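The reason the one-liner above works is that s/// returns a true value when it replaced something and a false value when it did not. A minimal sketch of that behaviour, using a hypothetical helper name swap_suffix that is not part of the original answer:

```perl
use strict;
use warnings;

# swap_suffix is a hypothetical helper, used only for illustration.
# s/// returns a true value when it substituted something, so the
# second substitution only runs if the first one found no match.
sub swap_suffix {
    my ($s) = @_;
    $s =~ s/x$/y/ or $s =~ s/y$/x/;
    return $s;
}

print swap_suffix($_), "\n" for qw(test_x test_y test_z);
# test_y
# test_x
# test_z
```

A string that ends in neither character passes through unchanged, because both substitutions fail.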
Another way:
substr($a,-1) =~ y/xy/yx/;
You can squeeze it in a line like you show, perhaps a bit nicer with /r (with v5.14+).
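One way the /r version might look, as a sketch only: the helper name swap_last is an assumption made for illustration. With /r, each substitution returns the modified copy instead of changing its operand, so the result of the conditional can be assigned directly.

```perl
use v5.14;      # enables say, strict and the /r modifier
use warnings;

# swap_last is a hypothetical helper, used only for illustration.
sub swap_last {
    my ($s) = @_;
    # /r returns the changed copy and leaves $s itself untouched.
    return $s =~ /x$/ ? $s =~ s/x$/y/r : $s =~ s/y$/x/r;
}

say swap_last($_) for qw(test_x test_y test_z);
# test_y
# test_x
# test_z
```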
Or you can prepare a hash. This also relieves the code from hard-coding particular characters.
my %swap = (x => 'y', y => 'x', a => 'b', b => 'a'); # expand as needed
my @test = map { 'test_' . $_ } qw(x y a b Z);
for my $string (@test)
{
$string =~ s| (.)$ | $swap{$1} // $1 |ex;
say $string;
}
The // (defined-or) is there to handle the case where the last character isn't in the hash, in which case $swap{$1} returns undef. Thanks to user52889 for the comment.
To swap individual characters, you can use tr///.
I'm not sure what your criteria for cleanliness or ease are, but you could even do this inside the right-hand side of the substitution:
$xy = "test_x" =~ s`([xy])$`$1=~tr/xy/yx/r`re; # $xy is "test_y"

What is name of a variable within a variable

I'm assigning a series of regexes to variables. Some of the regex values will be identical, but each should remain identifiable by its variable name ($a and $c, for example).
#various regex
$a = "([\d]{1,2})"
$b = "([\d]{3})"
$c = $b #Note this has the same regex as $b
$d = "\s[-]\s"
$e = "[_]"
#select the pattern
$patternNum = 4
I then want to be able to concat the vars in different orders to create a larger regex.
Switch ($patternNum){
#create a pattern
1 { $pattern = ($a, $e, $b) }
2 { $pattern = ($a, $d, $b) }
3 { $pattern = ($a, $d, $a, $e, $b) }
4 { $pattern = ($a, $e, $b, $e, $c) }
}
This creates the expanded regex string I'm hoping for:
#so i can use full regex pattern later
$selectedPattern = -join $pattern
But I want to be able to associate the var in $pattern with the original var name and not the literal string that's associated with the var (as some strings will be the same)
#find the index of each var and assign to another var so var can be used later to identify position within match
$var1 = [array]::IndexOf($pattern, $a) # [0]
$var2 = [array]::IndexOf($pattern, $b) # [2]
$var3 = [array]::IndexOf($pattern, $c) # [2] but I want it to be [4]
The regex will be used for matching, and each match will be used in different strings and at different positions.
I thought I'd be able to use a script block {} and then convert it back to a string, but that doesn't seem to work. Can anybody think of a way to get each variable's original name, or of a better way of doing this?
Using named captures
Use the (?<name>...) syntax to create named captures. Make the capture names the same as your variable names, e.g.:
$A = '(?<A>\d{3})'
$B = '(?<B>\D{3})'
$string = 'ABC123'
$regex = $B + $A
$string -match $regex
$Matches
Name Value
---- -----
A 123
B ABC
0 ABC123
Now you can correlate the variables to the position they matched in the string like this:
$string.IndexOf($Matches.A)
3
$string.IndexOf($Matches.B)
0
Following your code, I would do it like this, although someone who knows your real need may be able to suggest another solution:
$a = "([\d]{1,2})"
$b = "([\d]{3})"
$c = $b
$d = "\s[-]\s"
$e = "[_]"
#select the pattern
$patternNum = 4
Switch ($patternNum){
#create a pattern
1 { $pattern = ('$a', '$e', '$b') }
2 { $pattern = ('$a', '$d', '$b') }
3 { $pattern = ('$a', '$d', '$a', '$e', '$b') }
4 { $pattern = ('$a', '$e', '$b', '$e', '$c') }
}
$selectedPattern = -join $pattern
$var1 = [array]::IndexOf($pattern, '$a') # [0]
$var2 = [array]::IndexOf($pattern, '$b') # [2]
$var3 = [array]::IndexOf($pattern, '$c') # [4]
#converting literal to your pattern
$regexpattern = $ExecutionContext.InvokeCommand.ExpandString( -JOIN $pattern )
$regexpattern
([\d]{1,2})[_]([\d]{3})[_]([\d]{3})

Pattern binding operator on assignment

I am working through uncommented Perl code and came across a passage that looks too Perl-ish to me as a Perl beginner. This is a simplified adaptation:
my $foo;
my $bar = "x|y|z|";
$bar =~ s{\|$}{};
($foo = $bar) =~ s{ }{}gs;
I understand that $bar =~ s{\|$}{} applies the regular expression on the right to the string inside $bar.
But what does the expression ($foo = $bar) =~ s{ }{}gs; mean? I am not asking about the regular expression but about the expression it is applied to.
Just follow the precedence that the parentheses dictate and evaluate each statement one at a time:
($a = $b) =~ s{ }{}gs;
#^^^^^^^^--- executed first
($a = $b) # set $a to the value contained in $b
$a =~ s{ }{}gs; # perform the regex on $a
The /g global modifier causes the regex to match as many times as possible, the /s modifier makes the wildcard . match newline as well (so it now really matches everything). The /s modifier is redundant for this regex, since there are no wildcards . in it.
Note that $a and $b are predeclared variables which are used by sort, and you should avoid using them.
When in doubt, you can always print the variables and see how they change. For example:
use Data::Dumper;
my $x = 'foo bar';
(my $y = $x) =~ s{ }{}gs;
print Dumper $x, $y;
Output:
$VAR1 = 'foo bar';
$VAR2 = 'foobar';
A scalar assignment in scalar context returns its left-hand-side operand (as shown here). That means
$a = $b
assigns the value of $b to $a and returns $a. That means
($a = $b) =~ s{ }{}gs;
is short for
$a = $b; $a =~ s{ }{}gs;
and long for
$a = $b =~ s{ }{}gsr; # Requires 5.14+
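The three spellings above can be compared side by side. The following is a small sketch with made-up variable names ($one, $two, $three) and a made-up input string, assuming Perl 5.14 or later for /r:

```perl
use v5.14;      # enables say, strict and the /r modifier
use warnings;

my $bar = 'x | y | z';

# Form 1: the assignment returns $one itself, which s/// then modifies.
(my $one = $bar) =~ s{ }{}g;

# Form 2: the same thing written as two statements.
my $two = $bar;
$two =~ s{ }{}g;

# Form 3: /r leaves $bar alone and returns the modified copy.
my $three = $bar =~ s{ }{}gr;

say for $one, $two, $three, $bar;
# x|y|z
# x|y|z
# x|y|z
# x | y | z
```

In all three forms the source variable keeps its original value; only the copy is changed.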
But what does the expression ($a = $b) =~ s{ }{}gs; mean?
It is the same as
$a = $b;
$a =~ s{ }{}gs;
s{ }{}gs is the substitution s/ //gs written with {} as delimiters.

Replace only up to N matches on a line

In Perl, how to write a regular expression that replaces only up to N matches per string?
I.e., I'm looking for a middle ground between s/aa/bb/; and s/aa/bb/g;. I want to allow multiple substitutions, but only up to N times.
I can think of three reliable ways. The first is to replace everything after the Nth match with itself.
my $max = 5;
$s =~ s/(aa)/ $max-- > 0 ? 'bb' : $1 /eg;
That's not very efficient if there are far more than N matches. For that, we need to move the loop out of the regex engine. The next two methods are ways of doing that.
my $max = 5;
my $out = '';
$out .= $1 . 'bb' while $max-- && $in =~ /\G(.*?)aa/gcs;
$out .= $1 if $in =~ /\G(.*)/gcs;
And this time, in-place:
my $max = 5;
my $replace = 'bb';
while ($max-- && $s =~ s/\G.*?\Kaa/$replace/s) {
pos($s) = $-[0] + length($replace);
}
You might be tempted to do something like
my $max = 5;
$s =~ s/aa/bb/ for 1..$max;
but that approach will fail for other patterns and/or replacement expressions.
my $max = 5;
$s =~ s/aa/ba/ for 1..$max; # XXX Turns 'aaaaaaaa'
# into 'bbbbbaaa'
# instead of 'babababa'
And of course, starting from the beginning of the string every time could be expensive.
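The difference between the counting substitution and the repeated single substitution can be checked with a short sketch. The helper names replace_counted and replace_repeated are assumptions made for illustration; the input and the aa to ba replacement are the ones from the failing example above.

```perl
use strict;
use warnings;

# Counting substitution: one left-to-right pass, at most $max replacements.
# Matches beyond the limit are replaced with themselves.
sub replace_counted {
    my ($s, $max) = @_;
    $s =~ s/(aa)/ $max-- > 0 ? 'ba' : $1 /eg;
    return $s;
}

# Repeated single substitution: every pass restarts at the beginning of
# the string, so it can match text produced by an earlier replacement.
sub replace_repeated {
    my ($s, $max) = @_;
    $s =~ s/aa/ba/ for 1 .. $max;
    return $s;
}

print replace_counted('aaaaaaaa', 5), "\n";     # babababa
print replace_repeated('aaaaaaaa', 5), "\n";    # bbbbbaaa
print replace_counted('aaaaaaaa', 2), "\n";     # babaaaaa
```

The counted version never rescans its own output, which is why it yields the intended result here.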
What you want is not possible with a regular expression alone. But you can put the replacement in a for loop:
my $i;
my $aa = 'aaaaaaaaaaaaaaaaaaaa';
for ($i=0;$i<4;$i++) {
$aa =~ s/aa/bb/;
}
print "$aa\n";
result:
bbbbbbbbaaaaaaaaaaaa
You can use the /e flag which evaluates the right side as an expression:
my $n = 3;
$string =~ s/(aa)/$n-- > 0 ? "bb" : $1/ge;
Here's a solution using the /e modifier, with which you can use Perl code to generate the replacement string:
my $count = 0;
$string =~ s{ $pattern }
{
$count++;
if ($count < $limit ) {
$replace;
} else {
$&; # faking a no-op, replacing with the original match.
}
}xeg;
With Perl 5.10 or later you can drop the $& (which has weird performance complications) and use ${^MATCH} via the /p modifier:
$string =~ s{ $pattern }
{
$count++;
if ($count < $limit ) {
$replace;
} else {
${^MATCH};
}
}xegp;
It's too bad you can't just do this, but you can't:
last if $count >= $limit;