Raku: Using topic variable (from a 'for') inside a regex

I have this code that works as expected:
my @words = 'foo', 'bar';
my $text = 'barfoo';
for @words -> $to-regex {
    $text ~~ m/ ($to-regex) {say "matched $0"}/;
}
It prints:
matched foo
matched bar
However, if I try to use the topic variable in the for loop, as in:
for @words { # implicit "-> $_", AFAIK
    $text ~~ m/ ($_) {say "matched $0"}/;
}
I get this:
matched barfoo
matched barfoo
I get the same results using postfix for:
$text ~~ m/ ($_) {say "matched $0"}/ for @words; # implicit "-> $_", AFAIK
Is this a special case of the topic variable inside a regex?
Is it supposed to hold the whole string it is matching against?

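The smart-match operator works in three stages:
1. alias the left argument temporarily to $_
2. run the expression on the right
3. call .ACCEPTS($_) on that result
So it isn't a special case for a regex; it is how ~~ always works. Inside your regex, $_ is therefore $text ('barfoo'), not the loop element, which is why the whole string matches both times.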
for 1,2,3 {
$_.print;
'abc' ~~ $_.say
}
# 1abc
# 2abc
# 3abc

Related

perl regex matching, why is it not finding all matches, why is the order important?

I ran into a problem with Perl's regex matching. I distilled it down to a small example on the command line. Why is the order in which the matches are attempted important here?
1.
$ echo "XYG" | perl -ne 'if ($_ =~ m/X/gi) { print "Matches X\n"; } ; if ($_ =~ m/Y/gi) { print "Matches Y\n"; } ; if ($_ =~ m/G/gi) { print "Matches G\n"; } '
Matches X
Matches Y
Matches G
2.
$ echo "GXY" | perl -ne 'if ($_ =~ m/X/gi) { print "Matches X\n"; } ; if ($_ =~ m/Y/gi) { print "Matches Y\n"; } ; if ($_ =~ m/G/gi) { print "Matches G\n"; } else { print "No match on G\n"; } '
Matches X
Matches Y
No match on G
The first example matches all three letters as expected, but the second example does not match the letter G. Why?
However if I create an intermediate variable, here named $aa:
$ echo "GXY" | perl -ne 'if ($_ =~ m/X/gi) { print "Matches X\n"; } ; if ($_ =~ m/Y/gi) { print "Matches Y\n"; } ; $aa = $_; if ($aa =~ m/G/gi) { print "Matches G\n"; } '
Matches X
Matches Y
Matches G
Then the match works again. Why?
My perl version is:
$ perl -e 'print "$]\n";'
5.022001
On a LM 18.2 machine
$ lsb_release -d
Description: Linux Mint 18.2 Sonya
Ty+BR
Max.
Because when you match a regex in scalar context like that with the g flag (global matching) set, the match is iterative: each attempt resumes where the previous successful match on that string left off. That's what allows you to write while ( m/somepattern/g ) { and have it trigger multiple times.
That's because g means:
g - globally match the pattern repeatedly in the string
It wouldn't be particularly useful if the position reset each time you tried it. But you can also use it slightly differently, in list context:
my @matches = $str =~ m/(some_capture)/g;
And that'll select them all into a list.
But with your code and regex debugging:
#!/usr/bin/env perl
use strict;
use warnings;
use re 'debug';
$_ = 'GXY';
if ( $_ =~ m/X/gi ) { print "Matches X\n"; }
if ( $_ =~ m/Y/gi ) { print "Matches Y\n"; }
if ( $_ =~ m/G/gi ) { print "Matches G\n"; }
else { print "No match on G\n"; }
You'll get (snipped for brevity):
Matching REx "X" against "GXY"
Matching REx "Y" against "Y"
Matching REx "G" against ""
The first match 'eats' "GX" to find "X", leaving "Y" for the next match, but nothing at all for the "G" match.
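The position that a scalar-context /g match resumes from is stored on the string itself and can be inspected with pos(). A minimal sketch, assuming the same 'GXY' input (the names $str and @trace are only for illustration):

```perl
use strict;
use warnings;

my $str = 'GXY';
my @trace;

# In scalar context, each /g match starts at pos($str) and moves it forward.
push @trace, "X ok, pos=" . pos($str) if $str =~ /X/g;
push @trace, "Y ok, pos=" . pos($str) if $str =~ /Y/g;

# Nothing is left after offset 3, so this attempt fails.
# A failed /g match (without /c) resets pos($str) to undef.
push @trace, $str =~ /G/g ? "G ok" : "G failed";

# Because the position was reset, the next attempt starts from the beginning.
push @trace, $str =~ /G/g ? "G ok, pos=" . pos($str) : "G failed";

print "$_\n" for @trace;
```

The stored position belongs to the variable, not to the string value, which is also why matching against a fresh copy such as $aa finds the G.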
The simple workaround is to omit the g flag, because then you're explicitly saying 'match once', and you'll get:
Matches X
Matches Y
Matches G
Alternatively, you can use the global match with a character class:
$_ = 'GXY';
my @matches = m/([GYX])/g; # implicitly operates on $_
print "Match on $_\n" for @matches;

Interpolate a variable into a regular expression

I am used to Perl but am a Perl 6 newbie.
I want to hold a regular expression in a text variable, like I would have done in Perl 5:
my $a = 'abababa';
my $b = '^aba';
if ($a =~ m/$b/) {
print "True\n";
} else {
print "False\n";
}
But if I do the same in Perl6 it doesn't work:
my $a = 'abababa';
my $b = '^aba';
say so $a ~~ /^aba/; # True
say so $a ~~ /$b/; # False
I'm puzzled... What am I missing?
You need to have a closer look at Quoting Constructs.
For this case, enclose the part of the LHS that is a separate token with angle brackets or <{ and }>:
my $a = 'abababa';
my $b = '^aba';
say so $a ~~ /<$b>/; # True, starts with aba
say so $a ~~ /<{$b}>/; # True, starts with aba
my $c = '<[0..5]>';
say so $a ~~ /<$c>/; # False, no digits 0 to 5 in $a
say so $a ~~ /<{$c}>/; # False, no digits 0 to 5 in $a
Another story is when you need to pass a variable into a limiting quantifier. That is where you need to only use braces:
my $ok = "12345678";
my $not_ok = "1234567";
my $min = 8;
say so $ok ~~ / ^ \d ** {$min .. *} $ /; # True, the string consists of 8 or more digits
say so $not_ok ~~ / ^ \d ** {$min .. *} $ /; # False, there are 7 digits only
Is there a reason why you don't pick the regex object for these types of uses?
my $a = 'abababa';
my $b = rx/^aba/;
say so $a ~~ /^aba/; # True
say so $a ~~ $b; # True

Perl match multiple strings on same line (anything inside double and single quotes)

I thought this should be simple, matching strings in double/single quotes on same line
for example, following string all on same line
"hello" 'world' 'foo' "bar"
I have
print /(".*?")|('.*?')/g;
but I got following errors
Use of uninitialized value in print at ...
The following will return the warnings you mention:
use strict;
use warnings;
my $str = q{"hello" 'world' 'foo' "bar"};
print $str =~ /(".*?")|('.*?')/g;
That is because each match satisfies only one of the two capture groups. The other group does not take part in that match and so returns undef.
The following will demonstrate:
while ($str =~ /(".*?")|('.*?')/g) {
print "one = " . (defined $1 ? $1 : 'undef') . "\n";
print "two = " . (defined $2 ? $2 : 'undef') . "\n";
print "\n";
}
Outputs:
one = "hello"
two = undef
one = undef
two = 'world'
one = undef
two = 'foo'
one = "bar"
two = undef
To get your desired behavior, just put the capture group around the entire expression.
print $str =~ /(".*?"|'.*?')/g;
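The difference shows up when you count what each pattern returns in list context. A small sketch (the names @two_groups and @one_group are illustrative):

```perl
use strict;
use warnings;

my $str = q{"hello" 'world' 'foo' "bar"};

# Two groups: every match returns ($1, $2), and one of them is always undef.
my @two_groups = $str =~ /(".*?")|('.*?')/g;

# One group around the alternation: every match returns exactly one value.
my @one_group = $str =~ /(".*?"|'.*?')/g;

printf "two groups: %d values, %d undefined\n",
    scalar @two_groups, scalar grep { !defined $_ } @two_groups;
printf "one group:  %d values\n", scalar @one_group;
print join(' ', @one_group), "\n";
```

Printing @two_groups directly is what triggers the "uninitialized value" warnings, since half of its elements are undef.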
You might want to check Text::ParseWords
use Text::ParseWords;
my $s = q{"hello" 'world' 'foo' "bar"};
my @words = quotewords('\s+', 0, $s);
use Data::Dumper; print Dumper \@words;
Output:
$VAR1 = [
'hello',
'world',
'foo',
'bar'
];
Another option, using a backreference:
use strict;
use warnings;
my $str = q{"hello" 'world' 'foo' "bar"};
while ($str =~ /(["']).*?\1/g) {
print $& . "\n";
}

Pattern binding operator on assignment

I am working through uncommented Perl code. I came across a passage that looks too Perl-ish to me as a Perl beginner. This is a simplified adaptation:
my $foo;
my $bar = "x|y|z|";
$bar =~ s{\|$}{};
($foo = $bar) =~ s{ }{}gs;
I understand that $bar =~ s{\|$}{} applies the regular expression on the right to the string inside $bar.
But what does the expression ($foo = $bar) =~ s{ }{}gs; mean? I am not asking about the regular expression but about the expression it is applied to.
Just follow the precedence that the parentheses dictate and solve each statement one at the time:
($a = $b) =~ s{ }{}gs;
#^^^^^^^^--- executed first
($a = $b) # set $a to the value contained in $b
$a =~ s{ }{}gs; # perform the regex on $a
The /g global modifier causes the regex to match as many times as possible, the /s modifier makes the wildcard . match newline as well (so it now really matches everything). The /s modifier is redundant for this regex, since there are no wildcards . in it.
Note that $a and $b are predeclared variables which are used by sort, and you should avoid using them.
When in doubt, you can always print the variables and see how they change. For example:
use Data::Dumper;
my $x = 'foo bar';
(my $y = $x) =~ s{ }{}gs;
print Dumper $x, $y;
Output:
$VAR1 = 'foo bar';
$VAR2 = 'foobar';
A scalar assignment in scalar context returns its left-hand-side operand (as shown here). That means
$a = $b
assigns the value of $b to $a and returns $a. That means
($a = $b) =~ s{ }{}gs;
is short for
$a = $b; $a =~ s{ }{}gs;
and long for
$a = $b =~ s{ }{}gsr; # Requires 5.14+
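The three spellings can be checked side by side. A short sketch, using $src and lexical targets instead of $a and $b (all names are illustrative; the /r form assumes Perl 5.14 or later):

```perl
use strict;
use warnings;

my $src = 'foo bar baz';

# 1. Assign, then substitute on the result of the assignment (the new variable).
(my $combined = $src) =~ s{ }{}g;

# 2. The same thing written as two statements.
my $two_step = $src;
$two_step =~ s{ }{}g;

# 3. Non-destructive substitution: /r returns the modified copy.
my $with_r = $src =~ s{ }{}gr;

print "$combined $two_step $with_r\n";   # all three hold 'foobarbaz'
print "$src\n";                          # the source string is untouched
```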
But what does the expression ($a = $b) =~ s{ }{}gs; mean?
It is same as
$a = $b;
$a =~ s{ }{}gs;
s{ }{}gs is the substitution s/ //gs, written with {} as delimiters.

How to replace a variable with another variable in PERL?

I am trying to replace all words from a text except some that I have in an array. Here's my code:
my $text = "This is a text!And that's some-more text,text!";
while ($text =~ m/([\w']+)/g) {
next if $1 ~~ @ignore_words;
my $search = $1;
my $replace = uc $search;
$text =~ s/$search/$replace/e;
}
However, the program doesn't work. Basically I am trying to make all words uppercase but skip the ones in @ignore_words. I know it's a problem with the variables being used in the regular expression, but I can't figure the problem out.
#!/usr/bin/perl
my $text = "This is a text!And that's some-more text,text!";
my @ignorearr=qw(is some);
my %h1=map{$_ => 1}@ignorearr;
$text=~s/([\w']+)/($h1{$1})?$1:uc($1)/ge;
print $text;
On running this,
THIS is A TEXT!AND THAT'S some-MORE TEXT,TEXT!
The problem in your code is that the loop body modifies the same variable that the while condition is matching against. Instead, just let s/../../eg do the replacement globally for you:
my $text = "This is a text!And that's some-more text,text!";
my @ignore_words = qw{ is more };
$text =~ s/([\w']+)/$1 ~~ @ignore_words ? $1 : uc($1)/eg;
print $text;
And on running:
THIS is A TEXT!AND THAT'S SOME-more TEXT,TEXT!