How to capture every match in a global regex substitution?

I realize it is possible to achieve this with a slight workaround, but I am hoping there is a simpler way (since I often make use of this type of expression).
Given the example string:
my $str = "An example: sentence!*";
A regex can be used to match each punctuation mark and capture them in an array.
Thereafter, I can simply repeat the regex and replace the matches as in the following code:
push(@matches, $1) while $str =~ /([\*\!:;])/g;
$str =~ s/([\*\!:;])//g;
Would it be possible to combine this into a single step in Perl where substitution occurs globally while also keeping tabs on the replaced matches?

You can embed code to run in your regular expression:
my @matches;
my $str = 'An example: sentence!*';
$str =~ s/([\*\!:;])(?{push @matches, $1})//g;
But with a match this simple, I'd just do the captures and substitution separately.

Yes, it's possible.
my @matches;
$str =~ s/[*!:;]/ push @matches, $&; "" /eg;
However, I'm not convinced that the above is faster or clearer than the following:
my @matches = $str =~ /[*!:;]/g;
$str =~ tr/*!:;//d;
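For reference, here is a self-contained sketch of the single-pass variant, assuming the sample string from the question. It uses a capture group and $1 rather than $&, which is a stylistic choice and not part of the answer above.

```perl
use strict;
use warnings;

my $str = 'An example: sentence!*';
my @matches;

# /e evaluates the replacement as code: record the match, then return
# an empty string so the matched character is deleted.
$str =~ s/([*!:;])/ push @matches, $1; '' /eg;

print "remaining: $str\n";
print "captured: @matches\n";
```

After the substitution, $str should hold the sentence without punctuation and @matches the removed characters in the order they appeared.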

Use:
use feature 'say';
use Data::Dumper;
my $str = "An example: sentence!*";
my @matches = $str =~ /([\*\!:;])/g;
say Dumper \@matches;
$str =~ tr/*!:;//d;
Output:
$VAR1 = [
':',
'!',
'*'
];

Is that what you're looking for ?
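my ($str, @matches) = ("An example: sentence!*");
# first method (records only the last capture, because s///g finishes before push runs):
($str =~ s/([\*\!:;])//g) && push(@matches, $1);
# second method (no /g, so each pass removes and records one match):
push(@matches, $1) while ($str =~ s/([\*\!:;])//);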

Try:
my $str = "An example: sentence!*";
push(@mys, ($str=~m/([^\w\s])/g));
print join "\n", @mys;
Thanks.

Related

Different ways to test for $1 after regex?

Normally when I check if the regex succeeded I do
if ($var =~ /aaa(\d+)bbb(\d+)/) {
# $1 and $2 should be defined now
}
but I recall seeing a variation of this that seemed shorter. Perhaps it was only with one buffer.
Can anyone think of other ways to test $1 after a successful regex?
You can avoid $1 and similar altogether:
if (my ($anum, $bnum) = $var =~ /aaa(\d+)bbb(\d+)/) {
# Work with $anum and $bnum
}
The only shorter way that I can think of is if the match is on $_. So for instance:
for (@strings) {
if (m/aaa(\d+)bbb(\d+)/) {
...
If the match succeeds then $1 and $2 will be populated.
Never forget about
use strict;
use warnings;
I like plain syntax in Perl, but not in this way:
my $str = 'abc101abc';
$str =~ m/(\d+)/ and do {print $1;}
OR
$str =~ m/(\d+)/ and print $1;
OR
(with $str in $_, i.e. after $_ = $str;)
m/(\d+)/ and print $1;
BUT! TIMTOWTDI helps you to dream about your own style :)
I prefer old-if style.
Reading both answers, I now recall that this was what I had seen
my $str = 'abc101abc';
$str =~ m/(\d+)/;
print $1 if $1;
print $1 if $str =~ m/(\d+)/;

How to save a matching regex's value to a variable in one line of perl?

I'm sure there is a very simple way to do this, but whenever I search for examples, I get the two step method. Here is what I typically do:
$data =~ m/(my_query)/;
$result = $1;
I want to set $result in the same line as the regex and never use $1. Thanks!
my($result) = ($data =~ m/(my_query)/);
As noted in a comment, my($result) needs the parentheses to provide list context for the result of the match. In list context, the match returns the captured groups ($1 and so on), which are then assigned to the list. You could use @result = ($data =~ m/(my_query)/); you could omit the my, but you would need to keep the parentheses; or you could subscript the list with $result = ($data =~ m/(my_query)/)[0]; (thanks ysth). The key words here are 'list context' (often loosely called 'array context').
Examples:
$ perl -e '$data="abcdef";my($result)=($data =~ m/(cde)/); print "$result\n"'
cde
$ perl -e '$data="abcdef"; ($result)=($data =~ m/(cde)/); print "$result\n"'
cde
$ perl -e '$data="abcdef"; @result =($data =~ m/(cde)/); print "$result[0]\n"'
cde
$ perl -e '$data="abcdef"; $result =($data =~ m/(cde)/)[0]; print "$result\n"'
cde
$
You didn't specify what problem you want to avoid, but there is definitely one to avoid. When the pattern doesn't match, the following code assigns to $result whatever $1 held from an earlier successful match (or undef if there was none):
$data =~ /(my_query)/;
my $result = $1;
You could use a conditional to assign something useful to $result when the pattern doesn't match
my $result = $data =~ /(my_query)/ ? $1 : undef;
Or you could take advantage of the fact that m// in list context returns what it captured.
my ($result) = $data =~ /(my_query)/;
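The difference between the two approaches can be seen in a small sketch. The strings and variable names ($text, $first, $stale, $second) are made up for illustration.

```perl
use strict;
use warnings;

# Successful match in list context: $first receives the capture.
my ($first) = 'abc123' =~ /(\d+)/;

# Failed match: $1 is not reset, so it still holds the capture
# from the earlier successful match.
my $text = 'no digits';
$text =~ /(\d+)/;
my $stale = $1;

# The list-context assignment yields undef when the match fails.
my ($second) = $text =~ /(\d+)/;

print "first:  $first\n";
print "stale:  $stale\n";
print "second: ", (defined $second ? $second : 'undef'), "\n";
```

The stale value is the reason to prefer the list assignment or the conditional form over reading $1 unconditionally.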
$data="abcde";
$data =~ s/(cde)/$result=$1/e;

Use variable as RegEx pattern

I'd like to use a variable as a RegEx pattern for matching filenames:
my $file = "test~";
my $regex1 = '^.+\Q~\E$';
my $regex2 = '^.+\\Q~\\E$';
print int($file =~ m/$regex1/)."\n";
print int($file =~ m/$regex2/)."\n";
print int($file =~ m/^.+\Q~\E$/)."\n";
The result (or on ideone.com):
0
0
1
Can anyone explain to me how I can use a variable as a RegEx pattern?
As documentation says:
$re = qr/$pattern/;
$string =~ /foo${re}bar/; # can be interpolated in other patterns
$string =~ $re; # or used standalone
$string =~ /$re/; # or this way
So, use the qr quote-like operator.
You cannot use \Q in a single-quoted / non-interpolated string. It must be seen by the lexer.
Anyway, tilde isn’t a meta-character.
Add use re "debug"; and you will see what is actually happening.
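Putting the points above together, a minimal sketch might look like the following. The $suffix variable is a hypothetical name introduced here; it holds the literal text to match, and the escaping is applied where the pattern is built.

```perl
use strict;
use warnings;

my $file   = 'test~';
my $suffix = '~';    # literal text to match (hypothetical variable)

# \Q...\E is processed while the pattern is interpolated, so it has to
# appear in the qr// (or m//) itself, not inside a single-quoted string.
my $re = qr/^.+\Q$suffix\E$/;

# quotemeta() applies the same escaping when building a pattern string.
my $pattern = '^.+' . quotemeta($suffix) . '$';

my $ok_qr  = $file =~ $re        ? 1 : 0;
my $ok_str = $file =~ /$pattern/ ? 1 : 0;
print "$ok_qr\n$ok_str\n";
```

Both forms should report a match for the sample filename. The qr// version also keeps the compiled pattern reusable inside other patterns.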

How to add a modifier to a quoted regular (qr) expression

Is there an easy way to add regex modifiers such as 'i' to a quoted regular expression? For example:
$pat = qr/F(o+)B(a+)r/;
$newpat = $pat . 'i'; # This doesn't work
The only way I can think of is to print "$pat\n", get back (?-xism:F(o+)B(a+)r), and try to remove the 'i' in ?-xism: with a substitution.
You cannot put the flag inside the result of qr that you already have, because it’s protected. Instead, use this:
$pat = qr/F(o+)B(a+)r/i;
You can modify an existing regex as if it was a string as long as you recompile it afterwards
my $pat = qr/F(o+)B(a+)r/;
print $pat, "\n";
print 'FOOBAR' =~ $pat ? "match\n" : "mismatch\n";
$pat =~ s/i//;
$pat = qr/(?i)$pat/;
print $pat, "\n";
print 'FOOBAR' =~ $pat ? "match\n" : "mismatch\n";
OUTPUT
(?-xism:F(o+)B(a+)r)
mismatch
(?-xism:(?i)(?-xsm:F(o+)B(a+)r))
match
Looks like the only way is to stringify the RE, replace (-i) with (i-) and re-quote it back:
my $pat = qr/F(o+)B(a+)r/;
my $str = "$pat";
$str =~ s/(?<!\\)(\(\?\w*)-([^i:]*)i([^i:]*):/$1i-$2$3:/g;
$pati = qr/$str/;
UPDATE: Perl 5.14 stringifies regexps in a different way (with a (?^ prefix), so my sample should probably look like
my $pat = qr/F(o+)B(a+)r/;
my $str = "$pat";
$str =~ s/(?<!\\)\(\?\^/(?^i/g;
$pati = qr/$str/;
But I don't have perl 5.14 at hand and can't test it.
UPD2: I also failed to check for escaped opening parenthesis.
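If the pattern source is still available as plain text, one way to avoid editing the stringified qr// is to compile it twice. This is only a sketch under that assumption: $src is a hypothetical variable holding the pattern text, and the approach does not help when all you have is an already compiled regex.

```perl
use strict;
use warnings;

my $src   = 'F(o+)B(a+)r';    # hypothetical: pattern kept as plain text
my $pat   = qr/$src/;         # case-sensitive
my $pat_i = qr/$src/i;        # same source, compiled with /i

my $plain  = 'FOOBAR' =~ $pat   ? 'match' : 'mismatch';
my $nocase = 'FOOBAR' =~ $pat_i ? 'match' : 'mismatch';
print "$plain\n$nocase\n";
```

This sidesteps any dependence on how a particular Perl version stringifies compiled regexes.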

Matching a regular expression multiple times with Perl

Noob question here. I have a very simple perl script and I want the regex to match multiple parts in the string
my $string = "ohai there. ohai";
my @results = $string =~ /(\w\w\w\w)/;
foreach my $x (@results){
print "$x\n";
}
This isn't working the way I want, as it only returns ohai. I would like it to match and print out ohai, ther, and ohai.
How would I go about doing this?
Thanks
Would this do what you want?
my $string = "ohai there. ohai";
while ($string =~ m/(\w\w\w\w)/g) {
print "$1\n";
}
It returns
ohai
ther
ohai
From perlretut:
The modifier "//g" stands for global matching and allows the matching operator to match within a string as many times as possible.
Also, if you want to put the matches in an array instead you can do:
my $string = "ohai there. ohai";
my @matches = ($string =~ m/(\w\w\w\w)/g);
foreach my $x (@matches) {
print "$x\n";
}
Or you could do this
my $string = "ohai there. ohai";
my @matches = split(/\s/, $string);
foreach my $x (@matches) {
print "$x\n";
}
The split function in this case splits on spaces and prints
ohai
there.
ohai