How can I find all matches to a regular expression in Perl? - regex

I have text in the form:
Name=Value1
Name=Value2
Name=Value3
Using Perl, I would like to match /Name=(.+?)/ every time it appears and extract the (.+?) and push it onto an array. I know I can use $1 to get the text I need and I can use =~ to perform the regex matching, but I don't know how to get all matches.

A m//g in list context returns all the captured matches.
#!/usr/bin/perl
use strict; use warnings;
my $str = <<EO_STR;
Name=Value1
Name=Value2
Name=Value3
EO_STR
my @matches = $str =~ /=(\w+)/g;
# or my @matches = $str =~ /=([^\n]+)/g;
# or my @matches = $str =~ /=(.+)$/mg;
# depending on what you want to capture
print "@matches\n";
However, it looks like you are parsing an INI-style configuration file. In that case, I would recommend Config::Std.
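As a follow-up to the list-context match above: when the pattern has more than one capture group, m//g returns the captures flattened in order, so the result can be assigned straight to a hash. A minimal sketch, assuming made-up sample keys (Color, Size) that are not in the original question:

```perl
use strict;
use warnings;

# Hypothetical sample data; only the Name line comes from the question.
my $str = "Name=Value1\nColor=Red\nSize=Large\n";

# Two capture groups: the list is key1, value1, key2, value2, ...
# /m lets ^ and $ match at every line, /g finds every occurrence.
my %pairs = $str =~ /^(\w+)=(.+)$/mg;

print "$_ => $pairs{$_}\n" for sort keys %pairs;
```

This only works cleanly when keys are unique; repeated keys (as in the question's data) would overwrite each other, so an array is the better target there.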

my @values;
while(<DATA>){
chomp;
push @values, /Name=(.+?)$/;
}
print join " " => @values,"\n";
__DATA__
Name=Value1
Name=Value2
Name=Value3

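The following pushes every captured value onto an array. Note that $& would hold the whole match, including the = sign, so $1 is used instead, and /m is needed so that $ matches at the end of each line:
push(@matches, $1) while $string =~ /=(.+)$/mg;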
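In scalar context, m//g finds one match per call and remembers where it stopped, which is what makes a while loop like the one above work. A small sketch of that behaviour, using pos() to show the offset where the next search will start; the sample string is an assumption based on the question's data:

```perl
use strict;
use warnings;

my $string = "Name=Value1\nName=Value2\nName=Value3\n";
my @matches;

# Each iteration resumes the search at pos($string).
while ($string =~ /^Name=(.+)$/mg) {
    push @matches, $1;
    printf "captured %s, next search starts at offset %d\n", $1, pos($string);
}

print scalar(@matches), " values found\n";
```

When the match finally fails, pos() is reset and the loop ends.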

Use a Config:: module to read configuration data. For something simple like that, I might reach for ConfigReader::Simple. It's nice to stay out of the weeds whenever you can.

Instead of using a regular expression you might prefer trying a grammar engine like:
Parse::RecDescent
Regexp::Grammars
I've given a snippet of a Parse::RecDescent answer before on SO. However, Regexp::Grammars looks very interesting and is influenced by Perl 6 rules and grammars.
So I thought I'd have a crack at Regexp::Grammars ;-)
use strict;
use warnings;
use 5.010;
my $text = q{
Name=Value1
Name = Value2
Name=Value3
};
my $grammar = do {
use Regexp::Grammars;
qr{
<[VariableDeclare]>*
<rule: VariableDeclare>
<Var> \= <Value>
<token: Var> Name
<rule: Value> <MATCH= ([\w]+) >
}xms;
};
if ( $text =~ $grammar ) {
my @Name_values = map { $_->{Value} } @{ $/{VariableDeclare} };
say "@Name_values";
}
The above code outputs Value1 Value2 Value3.
Very nice! The only caveat is that it requires Perl 5.10 and that it may be overkill for the example you provided ;-)
/I3az/

Related

Add html to perl Regex

I am trying to replace every pair of backticks with an HTML code tag.
replace:
$string = "Foo `FooBar` Bar";
with:
$string = "Foo <code>FooBar</code> Bar";
I tried these:
$pattern = '`(.*?)`';
my $replace = "<code/>$&</code>";
$subject =~ s/$pattern/$replace/im;
#And
$subject =~ s/$pattern/<code/>$&</code>/im;
But neither of them works.
Assuming you meant $string instead of $subject...
use strict;
use warnings;
use v5.10;
my $string = "Foo `FooBar` Bar";
my $pattern = '`(.*?)`';
my $replace = "<code/>$&</code>";
$string =~ s{$pattern}{$replace}im;
say $string;
This results in...
$ perl ~/tmp/test.plx
Use of uninitialized value $& in concatenation (.) or string at /Users/schwern/tmp/test.plx line 9.
Foo <code/></code> Bar
There are a few problems here. First, $& means the string matched by the last match. That would be all of `FooBar`, backticks included. You just want FooBar, which is inside the capturing parens. You get that with $1. See Extracting Matches in the Perl Regex Tutorial.
Second, $& and $1 are variables. If you put them in double quotes like $replace = "<code/>$&</code>" then Perl will interpolate them immediately, before any match has happened. That means $replace is <code/></code>. This is where the warning comes from. If you want to use $1, it has to go directly into the replacement part of the substitution.
Finally, when quoting regexes it's best to use qr{}. That does special regex quoting. It avoids all sorts of quoting issues.
Put it all together...
use strict;
use warnings;
use v5.10;
my $string = "Foo `FooBar` Bar";
my $pattern = qr{`(.*?)`};
$string =~ s{$pattern}{<code>$1</code>}im;
say $string;
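If the replacement really does need to live in a variable, one option is to store a code reference and call it with the /e flag, so that $1 is read only after the match has happened. A minimal sketch; the $wrap name and the second backtick pair in the sample string are made up for illustration:

```perl
use strict;
use warnings;

my $string  = "Foo `FooBar` Bar and `Baz`";
my $pattern = qr{`(.*?)`};

# The replacement logic lives in a sub, so nothing is interpolated early.
my $wrap = sub { "<code>$_[0]</code>" };

# Copy first, then substitute on the copy; /e evaluates the replacement as code.
(my $html = $string) =~ s{$pattern}{ $wrap->($1) }ge;

print "$html\n";
```

With /g every backtick pair is wrapped, not just the first one.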

Simple perl regex replacement

Here is my perl code:
my $var="[url=/jobs/]click here[/url]";
$var =~ /\[url=(.+?)\](.+?)\[\/url\]/\2/g
I'm very new to Perl, so I am aware that it's incorrect, but how do I perform this regex replacement correctly?
The end result would be a transformation of $var to click here
So, with all the answers you know the substitute form is s///.
However, with something this big you should break it up into parts
to make it easier to maintain. That also helps you get out of the
quagmire of delimiter hell.
This uses a pre-compiled regex and a callback function invoked with s///e:
use strict;
use warnings;
# Pre-compiled regex
my $rx = qr{\[url=(.+?)\](.+?)\[/url\]};
# Callback
sub MakeAnchor {
my ($href,$text) = @_;
return '<a href="' . $href . '">' . $text . '</a>';
}
my $input = '[url=/jobs/]click here[/url]';
$input =~ s/$rx/MakeAnchor($1,$2)/eg;
print $input;
Output
<a href="/jobs/">click here</a>
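Named captures (Perl 5.10 and later) are another way to keep a pattern like this readable without a callback. A sketch, assuming the goal is an HTML anchor built from the URL and the link text; the capture names href and text and the surrounding sample sentence are my own:

```perl
use strict;
use warnings;
use 5.010;

my $input = 'See [url=/jobs/]click here[/url] for details';

# (?<name>...) stores each capture in the %+ hash under that name.
my $rx = qr{\[url=(?<href>.+?)\](?<text>.+?)\[/url\]};

# Brace delimiters avoid having to escape the slashes in the markup.
(my $output = $input) =~ s{$rx}{<a href="$+{href}">$+{text}</a>}g;

say $output;
```

Neither version escapes the captured text, so untrusted input would still need HTML escaping before being emitted.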

Perl split by regexp issue

I'm writing a parser in Perl and have run into a problem with split. Here is my code:
my $str = 'a,b,"c,d",e';
my @arr = split(/,(?=([^\"]*\"[^\"]*\")*[^\"]*$)/, $str);
# try to split the string by comma delimiter, but only if comma is followed by the even or zero number of quotes
foreach my $val (@arr) {
print "$val\n"
}
I'm expecting the following:
a
b
"c,d"
e
But this is what I actually received:
a
b,"c,d"
b
"c,d"
"c,d"
e
I see that my string parts are in the array at indices 0, 2, 4 and 6. But how can I avoid the odd b,"c,d" and the other leftover string parts in the resulting array? Is there an error in my regexp delimiter, or are there some special split options?
You need to use a non-capturing group:
my @arr = split(/,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)/, $str);
                      ^^
See IDEONE demo
Otherwise, the captured texts are output as part of the resulting array.
See perldoc reference:
If the regex has groupings, then the list produced contains the matched substrings from the groupings as well
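The difference is easy to see on a smaller input. A minimal sketch (the sample string is an assumption, chosen to keep the output short) comparing a capturing and a non-capturing group in the split pattern:

```perl
use strict;
use warnings;

my $str = 'a,b,c';

my @with_capture    = split /(,)/,   $str;   # separators are kept in the list
my @without_capture = split /(?:,)/, $str;   # separators are discarded

print scalar(@with_capture), " vs ", scalar(@without_capture), " elements\n";
print join('|', @with_capture), "\n";
print join('|', @without_capture), "\n";
```

The capturing version returns five elements because each matched comma is inserted between the fields.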
What's tripping you up is a feature in split in that if you're using a group, and it's set to capture - it returns the captured 'bit' as well.
But rather than using split I would suggest the Text::CSV module, that already handles quoting for you:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new();
my $fields = $csv->getline( \*DATA );
print join "\n", @$fields;
__DATA__
a,b,"c,d",e
Prints:
a
b
c,d
e
My reasoning is fairly simple: you're doing quote matching and may have things like quoted or escaped quotes, which means you're effectively trying to do a recursive parse, and that is something regexes simply aren't well suited to.
You can use parse_line() from Text::ParseWords if you are not bound to using a regex:
use Text::ParseWords;
my $str = 'a,b,"c,d",e';
my @arr = parse_line(',', 1, $str);
foreach (@arr)
{
print "$_\n";
}
Output:
a
b
"c,d"
e
Do matching instead of splitting.
use strict; use warnings;
my $str = 'a,b,"c,d",e';
my @matches = $str =~ /"[^"]*"|[^,]+/g;
foreach my $val (@matches) {
print "$val\n"
}

escape the word in regexp

I have a string like this
$str = '"filename","lf","$data","{ }",0';
How to remove all " from the string?
I tried to use this kind of regexp:
$str =~ s/"(.+?)"//s;
It should match the word and remove "-s
You can do it like this: $str =~ s/"//g;
Your $str looks like you're dealing with a CSV file. Saddam's answer will work for most cases of course, but if you're really working with a .csv file, then I suggest that you use an actual parser like Text::CSV. That way, if there are commas embedded in your double-quoted values, they'll be handled properly:
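For deleting a single character everywhere, the transliteration operator with the /d flag is an alternative to a regex substitution. A short sketch using the string from the question; the $clean variable is introduced here only so the original stays untouched:

```perl
use strict;
use warnings;

my $str = '"filename","lf","$data","{ }",0';

# Copy the string, then delete every double quote from the copy.
(my $clean = $str) =~ tr/"//d;

print "$clean\n";
```

The single quotes around the original string matter: they stop Perl from trying to interpolate $data.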
use Text::CSV;
use strict;
use warnings;
my $csv = Text::CSV->new();
my $str = '"filename","lf","$data","{ }",0';
$csv->parse($str);
my @columns = $csv->fields();
use Data::Dump;
dd \@columns;

Perl regex return matches from substitution

I am trying to simultaneously remove and store (into an array) all matches of some regex in a string.
To return matches from a string into an array, you could use
my #matches = $string=~/$pattern/g;
I would like to use a similar pattern for a substitution regex. Of course, one option is:
my #matches = $string=~/$pattern/g;
$string =~ s/$pattern//g;
But is there really no way to do this without running the regex engine over the full string twice? Something like
my #matches = $string=~s/$pattern//g
Except that this only returns the number of substitutions, regardless of list context. As a consolation prize, I would also accept a way to take a qr// pattern and turn it into a substitution regex, but I don't know if that's possible either (and that wouldn't avoid searching the same string twice).
Perhaps the following will be helpful:
use warnings;
use strict;
my $string = 'I thistle thing am thinking this Thistle a changed thirsty string.';
my $pattern = '\b[Tt]hi\S+\b';
my @matches;
$string =~ s/($pattern)/push @matches, $1; ''/ge;
print "New string: $string; Removed: @matches\n";
Output:
New string: I am a changed string.; Removed: thistle thing thinking this Thistle thirsty
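The same s///ge idea can be wrapped in a small helper so that the removal and the collection happen in one pass and the caller gets both results back. A sketch under stated assumptions: the remove_and_collect name and the sample input are hypothetical, not part of the original answer:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the stripped text and an array ref of what was removed.
sub remove_and_collect {
    my ($text, $re) = @_;
    my @removed;
    # The outer parens make $1 the whole match; /e runs the push for each match,
    # and the trailing '' is the value substituted in.
    $text =~ s/($re)/push @removed, $1; ''/ge;
    return ($text, \@removed);
}

my ($rest, $removed) = remove_and_collect('a1b22c333', qr/\d+/);
print "Remaining: $rest; Removed: @$removed\n";
```

Because the text is passed by value, the caller's original string is left unchanged.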
Here is another way to do it without executing Perl code inside the substitution. The trick is that s/// without /g removes one match per call and returns false when nothing matches, which ends the while loop.
use strict;
use warnings;
use Data::Dump;
my $string = "The example Kenosis came up with was way better than mine.";
my @matches;
push @matches, $1 while $string =~ s/(\b\w{4}\b)\s//;
dd @matches, $string;
__END__
(
"came",
"with",
"than",
"The example Kenosis up was way better mine.",
)