I am trying to simultaneously remove and store (into an array) all matches of some regex in a string.
To return matches from a string into an array, you could use
my #matches = $string=~/$pattern/g;
I would like to use a similar pattern for a substitution regex. Of course, one option is:
my #matches = $string=~/$pattern/g;
$string =~ s/$pattern//g;
But is there really no way to do this without running the regex engine over the full string twice? Something like
my #matches = $string=~s/$pattern//g
Except that this will only return the number of subs, regardless of list context. I would also take, as a consolation prize, a method to use qr// where I could simply modify the quoted regex to to a sub regex, but I don't know if that's possible either (and that wouldn't preclude searching the same string twice).
Perhaps the following will be helpful:
use warnings;
use strict;
my $string = 'I thistle thing am thinking this Thistle a changed thirsty string.';
my $pattern = '\b[Tt]hi\S+\b';
my #matches;
$string =~ s/($pattern)/push #matches, $1; ''/ge;
print "New string: $string; Removed: #matches\n";
Output:
New string: I am a changed string.; Removed: thistle thing thinking this Thistle thirsty
Here is another way to do it without executing Perl code inside the substitution. The trick is that the s///g will return one capture at a time and undef if it does not match, thus quitting the while loop.
use strict;
use warnings;
use Data::Dump;
my $string = "The example Kenosis came up with was way better than mine.";
my #matches;
push #matches, $1 while $string =~ s/(\b\w{4}\b)\s//;
dd #matches, $string;
__END__
(
"came",
"with",
"than",
"The example Kenosis up was way better mine.",
)
Related
I have written a basic program using regular expression.
However the entire line is being returned instead of the matched part.
I want to extract the number only.
use strict;
use warnings;
my $line = "ABMA 1234";
$line =~ /(\s)(\d){4}/;
print $line; #prints *ABMA 1234*
Is my regular expression incorrect?
If you want to print 1234, you need to change your regex and print the 2nd match:
use strict;
use warnings;
my $line = "ABMA 1234";
$line =~ /(\s)(\d{4})/;
print $2;
You can replace the exact value with the corresponding values. And your are not removing the text \w;
use strict;
use warnings;
my $line = "ABMA 1234";
$line=~s/([A-z]*)\s+(\d+)/$2/;
print $line; #prints only 1234
If you want to store the value in the new string then
(my $newstring = $line)=~s/([A-z]*)\s+(\d+)/$2/;
print $newstring; #prints only 1234
Just try this:
I don't know how you output the match in perl but you can use below regex for output the full match in your regex, you might getting space appended with your result in your current regex.
\b[\d]{4}
DEMO
Given the following code,
my $string = "foo";
my $regex = s/foo/bar/;
$string =~ $regex;
print $string, "\n";
I would have expected the output to be bar, however it is foo. Why is that the case, and how can I solve that problem?
Note that in my actual case, the regex is more complicated, and I actually want to store several of them in a hash (so I can write something like $string =~ $rules{$key}).
You're looking for substitution, not only the regex part so I guess compiled regex (qr//) is not what you're looking for,
use strict;
use warnings;
my $string = "foo";
my $regex = sub { $_[0] =~ s/foo/bar/ };
$regex->($string);
print $string, "\n";
Your statement
my $regex = s/foo/bar/
is equivalent to
my $regex = $_ =~ s/foo/bar/
s/// returns the number of substitutions made, or it returns false (specifically, the empty string). So $regex is now '' or 1 (it could be more if the /g modifier was in effect) and
$string =~ $regex
is doing 'foo' =~ // or 'foo' =~ /1/ depending on what $_ contained originally.
You can store a regex pattern in a variable but, in your example, the regex is just foo, and there is a lot more going on than just that pattern
The statement s/foo/bar/ is more complex than it seems -- it is a fully-fledged statement that applies a regex pattern to a target string and substitutes a replacement string if the pattern is found. In this case the target string is the default variable $_ and the replacement string is foo. You could think of it as a call to a subroutine
substitute($_, 'foo', 'bar')
and the regex pattern is only the second parameter
What you can do is store a regex pattern. The regex part of that substitution is foo, and you can say
my $pattern = qr/foo/;
s/$pattern/bar/;
But you really should explain the problem that you're trying to solve so that we can help you better
In the assignment, you need to tell Perl not to evaluate the regular expression but just to keep it. This is what qr is for.
But you can't do this with whole substitutions, which is why Сухой27 suggests using a subroutine.
I am trying to construct a small regex during runtime, but somehow it never matches -- what am I doing wrong?
my $word = quotemeta("test");
my $lines = "just a test to testing find tester testönig something fastest out pentest";
my $rule = "m/" . $word . "/g";
my $regex = qr/$rule/;
while ($lines =~ $regex) {
# this never happens...
print "\nFound pattern: '$&'";
}
Your code:
my $word = quotemeta("test");
my $rule = "m/" . $word . "/g";
my $regex = qr/$rule/;
is the same as this:
my $word = quotemeta("test");
my $rule = "m/test/g"; # interpolated $word
my $regex = qr~m/test/g~; # interpolated $rule
That is, it matches the literal string "m/test/g" and nothing else.
buff has already given pretty much the same code I would have suggested, except that I recommend avoiding the use of $& due to a performance penalty as noted in perlvar:
The use of this variable anywhere in a program imposes a considerable
performance penalty on all regular expression matches. To avoid this
penalty, you can extract the same substring by using #-. Starting with
Perl v5.10.0, you can use the /p match flag and the ${^MATCH} variable
to do the same thing for particular match operations.
This might be what you want:
#!/usr/bin/perl
use strict;
use warnings;
my $word = quotemeta("test");
my $lines = "just a test to testing find tester testönig something fastest out pentest";
my $regex = qr/$word/;
while ($lines =~ /$regex/g) {
print "\nFound pattern: '$&'";
}
You cannot use /g directly with qr.
The pattern matching quantifiers of a Perl regular expression are "greedy" (they match the longest possible string). To force the match to be "ungreedy", a ? can be appended to the pattern quantifier (*, +).
Here is an example:
#!/usr/bin/perl
$string="111s11111s";
#-- greedy match
$string =~ /^(.*)s/;
print "$1\n"; # prints 111s11111
#-- ungreedy match
$string =~ /^(.*?)s/;
print "$1\n"; # prints 111
But how one can find the second, third and .. possible string match in Perl? Make a simple example of yours --if need a better one.
Utilize a conditional expression, a code expression, and backtracking control verbs.
my $skips = 1;
$string =~ /^(.*)s(?(?{$skips-- > 0})(*FAIL))/;
The above will use greedy matching, but will cause the largest match to intentionally fail. If you wanted the 3rd largest, you could just set the number of skips to 2.
Demonstrated below:
#!/usr/bin/perl
use strict;
use warnings;
my $string = "111s11111s11111s";
$string =~ /^(.*)s/;
print "Greedy match - $1\n";
$string =~ /^(.*?)s/;
print "Ungreedy match - $1\n";
my $skips = 1;
$string =~ /^(.*)s(?(?{$skips-- > 0})(*FAIL))/;
print "2nd Greedy match - $1\n";
Outputs:
Greedy match - 111s11111s11111
Ungreedy match - 111
2nd Greedy match - 111s11111
When using such advanced features, it is important to have a full understanding of regular expressions to predict the results. This particular case works because the regex is fixed on one end with ^. That means that we know that each subsequent match is also one shorter than the previous. However, if both ends could shift, we could not necessarily predict order.
If that were the case, then you find them all, and then you sort them:
use strict;
use warnings;
my $string = "111s11111s";
my #seqs;
$string =~ /^(.*)s(?{push #seqs, $1})(*FAIL)/;
my #sorted = sort {length $b <=> length $a} #seqs;
use Data::Dump;
dd #sorted;
Outputs:
("111s11111s11111", "111s11111", 111)
Note for Perl versions prior to v5.18
Perl v5.18 introduced a change, /(?{})/ and /(??{})/ have been heavily reworked, that enabled the scope of lexical variables to work properly in code expressions as utilized above. Before then, the above code would result in the following errors, as demonstrated in this subroutine version run under v5.16.2:
Variable "$skips" will not stay shared at (re_eval 1) line 1.
Variable "#seqs" will not stay shared at (re_eval 2) line 1.
The fix for older implementations of RE code expressions is to declare the variables with our, and for further good coding practices, to localize them when initialized. This is demonstrated in this modified subroutine version run under v5.16.2, or as put below:
local our #seqs;
$string =~ /^(.*)s(?{push #seqs, $1})(*FAIL)/;
Start by getting all possible matches.
my $string = "111s1111s11111s";
local our #matches;
$string =~ /^(.*)s(?{ push #matches, $1 })(?!)/;
This finds
111s1111s11111
111s1111
111
Then, it's just a matter of finding out which one is the second longuest and filtering out the others.
use List::MoreUtils qw( uniq );
my $target_length = ( sort { $b <=> $a } uniq map length, #matches )[1];
#matches = uniq grep { length($_) == $target_length } #matches
if $target_length;
I have text in the form:
Name=Value1
Name=Value2
Name=Value3
Using Perl, I would like to match /Name=(.+?)/ every time it appears and extract the (.+?) and push it onto an array. I know I can use $1 to get the text I need and I can use =~ to perform the regex matching, but I don't know how to get all matches.
A m//g in list context returns all the captured matches.
#!/usr/bin/perl
use strict; use warnings;
my $str = <<EO_STR;
Name=Value1
Name=Value2
Name=Value3
EO_STR
my #matches = $str =~ /=(\w+)/g;
# or my #matches = $str =~ /=([^\n]+)/g;
# or my #matches = $str =~ /=(.+)$/mg;
# depending on what you want to capture
print "#matches\n";
However, it looks like you are parsing an INI style configuration file. In that case, I will recommend Config::Std.
my #values;
while(<DATA>){
chomp;
push #values, /Name=(.+?)$/;
}
print join " " => #values,"\n";
__DATA__
Name=Value1
Name=Value2
Name=Value3
The following will give all the matches to the regex in an array.
push (#matches,$&) while($string =~ /=(.+)$/g );
Use a Config:: module to read configuration data. For something simple like that, I might reach for ConfigReader::Simple. It's nice to stay out of the weeds whenever you can.
Instead of using a regular expression you might prefer trying a grammar engine like:
Parse::RecDescent
Regexp::Grammars
I've given a snippet of a Parse::ResDescent answer before on SO. However Regexp::Grammars looks very interesting and is influenced by Perl6 rules & grammars.
So I thought I'd have a crack at Regexp::Grammars ;-)
use strict;
use warnings;
use 5.010;
my $text = q{
Name=Value1
Name = Value2
Name=Value3
};
my $grammar = do {
use Regexp::Grammars;
qr{
<[VariableDeclare]>*
<rule: VariableDeclare>
<Var> \= <Value>
<token: Var> Name
<rule: Value> <MATCH= ([\w]+) >
}xms;
};
if ( $text =~ $grammar ) {
my #Name_values = map { $_->{Value} } #{ $/{VariableDeclare} };
say "#Name_values";
}
The above code outputs Value1 Value2 Value3.
Very nice! The only caveat is that it requires Perl 5.10 and that it may be overkill for the example you provided ;-)
/I3az/