I have a string like the one below:
atom:link[#me="samiron" and #test1="t1" and #test2="t2"]
and I need a regular expression that will generate the following back references:
#I would prefer to have
$1 = #test1
$2 = t1
$3 = #test2
$4 = t2
#Or at least this; I will break these up into parts later on.
$1 = #test1="t1"
$2 = #test2="t2"
I've tried something like ( and [#\w]+=["\w]+)*\] which returns only the last match (and #test2="t2"). I'm completely out of ideas. Any help?
Edit:
Actually, the number of #test1="t1" patterns is not fixed, and the regex must handle that. Thanks @Pietzcker.
This will give you a hash that maps "#test1" => "t1" and so on:
my %matches = ($str =~ /and (\#\w+)="(\w+)"/g);
Explanation: a /g global match in list context returns a list of all the captures, like
"#test1", "t1", "#test2", "t2", ...
When that list is assigned to the hash %matches, Perl treats it as key-value pairs.
As a result, %matches contains what you are looking for in a convenient hash format.
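A minimal runnable sketch of this answer, assuming the sample string from the question. It prints each pair and shows that #me is left out because it is preceded by "[" rather than "and ".

```perl
use strict;
use warnings;

my $str = 'atom:link[#me="samiron" and #test1="t1" and #test2="t2"]';

# In list context, m//g returns the captures of every match in order:
# ('#test1', 't1', '#test2', 't2'). A hash assignment pairs them up.
my %matches = ( $str =~ /and (\#\w+)="(\w+)"/g );

# #me is preceded by "[" rather than "and ", so it is not captured.
for my $key ( sort keys %matches ) {
    print "$key => $matches{$key}\n";
}
```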
You can do it like this:
use Data::Dumper;
my $text = 'atom:link[#me="samiron" and #test1="t1" and #test2="t2"]';
my @results;
# Each pass of the /g loop picks up the next 'and #name="value"' pair.
while ($text =~ m/and (#\w+)="(\w+)"/g) {
    push @results, $1, $2;
}
print Dumper \@results;
Result:
$VAR1 = [
'#test1',
't1',
'#test2',
't2'
];
When you use a repeating capturing group, each new match will overwrite any previous match.
So the only way to collect every pair is a "find all" with a regex like
my @result = $subject =~ m/(?<= and )([#\w]+)=(["\w]+)(?= and |\])/g;
to get an array of all matches.
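To see the overwriting behaviour, here is a sketch comparing a quantified group with a /g match on the question's string. Both patterns are illustrative variants written for this comparison (they capture the value without its quotes), not the exact patterns from the answers.

```perl
use strict;
use warnings;

my $subject = 'atom:link[#me="samiron" and #test1="t1" and #test2="t2"]';

# A quantified group is a single capture slot: every repetition
# overwrites it, so only the last iteration survives in $1/$2.
my @last = $subject =~ /(?: and ([#\w]+)="(\w+)")+\]/;
print "repeated group: @last\n";    # #test2 t2

# With /g in list context, every match contributes its captures.
my @all = $subject =~ / and ([#\w]+)="(\w+)"/g;
print "global match:   @all\n";     # #test1 t1 #test2 t2
```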
This works for me:
my @result = $s =~ /(#(?!me).*?)="(.*?)"/g;
foreach (@result){
print "$_\n";
}
The output is:
#test1
t1
#test2
t2
Related
How can I extract string after a symbol in Perl?
I tried doing some searches but even the code I found didn't work.
I'm trying to extract the string after a colon. So I want to show everything after the colon.
Example:
string = day1: string over here
substring = string over here
So far I have tried:
$substring = $string=~ /(\:.*)\s*$/;
But it only outputs the number 1 over and over.
That's because a pattern match in scalar context is a boolean test: it returns 1 on success, which is what you are printing. If you want the contents of the capture groups (the bracketed parts), you need list context. It's fine if the list has only one element.
Try this:
my ( $substring ) = $string=~ /(\:.*)\s*$/;
The difference may be subtle, but basically we are assigning 'all the hits' from the pattern match to a list that comprises one element.
Note that this is also what lets you do:
my @matches = $string =~ m/(.)/g;
and get multiple 'hits' returned. With a one-element list as above, you only keep the first capture, which doesn't matter for your pattern, but you can also do:
my ( $key, $value ) = $string =~ m/(\w+)=(\w+)/;
for example.
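A small sketch of the scalar/list difference. It assumes the example string from the question and uses the pattern :\s*(.*)$ instead of the OP's (\:.*), so that the capture starts after the colon and the following space; (\:.*) would include the colon itself.

```perl
use strict;
use warnings;

my $string = 'day1: string over here';

# Scalar context: the match only reports success (1) or failure ('').
my $flag = $string =~ /:\s*(.*)$/;

# List context: the match returns the captured text itself.
my ($substring) = $string =~ /:\s*(.*)$/;

print "scalar: $flag\n";         # scalar: 1
print "list:   $substring\n";    # list:   string over here
```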
I usually use parentheses to extract part of the text and then refer to the result stored in the $1 variable.
For example:
my $text = "day1: string over here";
print $1 if ($text =~ /:\s*(.+)$/);
but a similar result can be obtained with this code too:
my $text = "day1: string over here";
my ($after) = $text =~ /:\s*(.+)$/;
print $after;
You can also get the desired substring with the split function:
#!/usr/bin/perl
use warnings;
use strict;
my $string = "day1: string over here";
my (undef, $substring) = split(/:\s*/, $string, 2); # limit 2: later colons stay in $substring
print $substring, "\n";
Output:
string over here
Or you can use capturing groups () in a regex:
my $string = "day1: string over here";
if ($string =~ m/(.*)\:\s+(.*)$/) {
    my $substring = $2;
    print $substring, "\n";
}
How do you create a $scalar from the result of a regex match?
Is there any way, once the script has matched the regex, to assign the result to a variable so it can be used later on, outside of the block?
I.e., if $regex_result = blah blah then do something.
I understand that I should make the regex as non-greedy as possible.
#!/usr/bin/perl
use strict;
use warnings;
# use diagnostics;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Outlook';
my @Qmail;
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
my $outlook = Win32::OLE->new('Outlook.Application')
or warn "Failed Opening Outlook.";
my $namespace = $outlook->GetNamespace("MAPI");
my $folder = $namespace->Folders("test")->Folders("Inbox");
my $items = $folder->Items;
foreach my $msg ( $items->in ) {
if ( $msg->{Subject} =~ m/^(.*test alert) / ) {
my $name = $1;
print " processing Email for $name \n";
push @Qmail, $msg->{Body};
}
}
for(@Qmail) {
next unless /$regex|^\s*description/i;
print; # prints what i want ie lines that start with owner and description
}
print $sentence; # prints ^\\s\*offense \ # not lines that start with owner.
One way is to check that a match occurred, and copy the values you need while they are still set:
use strict;
use warnings;
my $str = "hello what world";
my $match = 'no match found';
my $what = 'no what found';
if ( $str =~ /hello (what) world/ )
{
$match = $&;
$what = $1;
}
print '$match = ', $match, "\n";
print '$what = ', $what, "\n";
Use the following Perl variables to meet your requirements:
$` = The string preceding whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already.
$& = Contains the string matched by the last pattern match
$' = The string following whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already. For example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi
The match of a regex is stored in special variables (there are also more readable names for them if you load the English module, and the ${^MATCH} family if you use the /p flag).
For the whole last match you're looking for the $& variable ($MATCH under use English). This is covered in the manual page perlvar.
So say you wanted to store your last loop's matches in an array called @matches; you could write the loop as follows (I've used foreach, although for and foreach are interchangeable in Perl):
use English;    # gives $& the readable alias $MATCH
my @matches = ();
foreach (@Qmail) {
    next unless /$regex|^\s*description/i;
    push @matches, $MATCH;
    print;
}
I think you have a problem in your code. I'm not sure of the original intention but looking at these lines:
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\s*owner #/";
I'll step through that as:
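Assign to $regex the string ^\s*owner #.
Assign to $sentence the result of matching the string in $regex against the quoted string "/^\s*owner #/". A quoted string after =~ is used as the pattern itself, so the slashes become literal characters, and the ^ after a literal / can never match. The match therefore fails and $sentence is false (the empty string); had it matched, $sentence would be 1.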
I think. I'm actually not exactly certain what that line will do or was meant to do.
I'm not quite sure what part of the match you want: the captures, or something else. I've written Regexp::Result which you can use to grab all the captures etc. on a successful match, and Regexp::Flow to grab multiple results (including success statuses). If you just want numbered captures, you can also use Data::Munge
You can do the following:
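use feature 'say';
my $str = "hello world";
my ($hello, $world) = $str =~ /(hello)|(what)/;
say "[$_]" for ($hello, $world);
As you can see, $hello contains "hello"; $world is undef because the (what) alternative did not take part in the match.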
If you have an older perl on your system like me (5.18 or earlier) and you use $`, $& or $' as in codequestor's answer above, it will slow down every regex match in your program, not just the ones that use them.
Instead, you can use your regex pattern with the modifier /p, and then check these 3 variables: ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} for your matching results.
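A minimal sketch of the /p variables, reusing the abcdefghi example from the $` $& $' answer above. On perl 5.20 and later /p is no longer needed, but it is harmless and the variables behave the same.

```perl
use strict;
use warnings;

my $text  = 'abcdefghi';
my $parts = '';

# /p asks perl to keep ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} for
# this match, instead of relying on $`, $& and $'.
if ( $text =~ /def/p ) {
    $parts = join ':', ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH};
}

print "$parts\n";    # abc:def:ghi
```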
I am matching a string of the form A<=>B!C<=>D!E<=>F... and want to do checks on the letters. Basically I want to tell if the letters are in the class according to a hash I have defined. I had the idea of doing the following regex and then looping through the matched strings:
$a =~ /(.)<=>(.)/g;
But I can't figure out how to tell how many $1, $2 variables have matched. How do I know how many there are? Also, is there a better way to do this? I am using Perl 5.8.8.
You can use the 'countof' idiom (assigning to an empty list and then to a scalar) to count the values the /g match returns, which with two capture groups is two per match:
my $count = () = $string =~ /(.)<=>(.)/g;
Replacing the empty list with an array will retain the matches:
my @matches = $string =~ /(.)<=>(.)/g;
Which provides another way to get the $count:
my $count = @matches; # scalar @matches works too
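One subtlety worth checking: the () = idiom counts the values the match returns, not the number of matches. A sketch with the question's string, where each match yields two captures:

```perl
use strict;
use warnings;

my $string = 'A<=>B!C<=>D!E<=>F';

# The list assignment in scalar context evaluates to the number of
# elements on its right: two captures per match, so 6 here, not 3.
my $captures = () = $string =~ /(.)<=>(.)/g;

# Divide by the number of capture groups to get the number of matches.
my $pairs = $captures / 2;

print "captures: $captures, pairs: $pairs\n";    # captures: 6, pairs: 3
```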
Use a while loop
use warnings;
use strict;
my %letters = map { $_ => 1 } qw(A C F);
my $s = 'A<=>B!C<=>D!E<=>F';
while ($s =~ /(.)<=>(.)/g) {
print "$1\n" if exists $letters{$1};
print "$2\n" if exists $letters{$2};
}
__END__
A
C
F
Create a variable and increment it each time you go through your loop?
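A sketch of the counter suggestion, assuming the while loop from the previous answer. Each pass through a while (... =~ /.../g) loop is one match, so the counter ends up at the number of matches rather than the number of captures.

```perl
use strict;
use warnings;

my $s     = 'A<=>B!C<=>D!E<=>F';
my $count = 0;

# One iteration per successful match; $1 and $2 hold that match's pair.
while ( $s =~ /(.)<=>(.)/g ) {
    $count++;
}

print "$count\n";    # 3
```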
I want to be able to do a regex match on a variable and assign the results to the variable itself. What is the best way to do it?
I want to essentially combine lines 2 and 3 in a single line of code:
$variable = "some string";
$variable =~ /(find something).*/;
$variable = $1;
Is there a shorter/simpler way to do this? Am I missing something?
my($variable) = "some string" =~ /(e\s*str)/;
This works because
If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1, $2, $3 …).
and because my($variable) = ... (note the parentheses around the scalar) supplies list context to the match.
If the pattern fails to match, $variable gets the undefined value.
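A short sketch of both outcomes, using the string and pattern from this answer plus a deliberately failing pattern (xyz, chosen only for illustration):

```perl
use strict;
use warnings;

# The parentheses around the variable give the match list context,
# so it receives $1 rather than a true/false value.
my ($found) = 'some string' =~ /(e\s*str)/;

# When the match fails the list is empty and the variable stays undef.
my ($missing) = 'some string' =~ /(xyz)/;

print defined $found   ? "found: '$found'\n"   : "no match\n";    # found: 'e str'
print defined $missing ? "found: '$missing'\n" : "no match\n";    # no match
```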
Why do you want it to be shorter? Does it really matter?
$variable = $1 if $variable =~ /(find something).*/;
If you are worried about the variable name or doing this repeatedly, wrap the thing in a subroutine and forget about it:
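some_sub( $variable, qr/pattern/ );
# Capture in list context: a $1 set inside an eval { } block is restored when that block exits.
sub some_sub { my ($capture) = $_[0] =~ $_[1]; $_[0] = $capture if defined $capture; $capture }
However you implement it, the point of the subroutine is to make it reusable, so you give a particular set of lines a short name that stands in their place.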
Several other answers mention a destructive substitution:
( my $new = $variable ) =~ s/pattern/replacement/;
I tend to keep the original data around, and Perl v5.14 has an /r flag that leaves the original alone and returns a new string with the replacement (instead of the count of replacements):
my $match = $variable =~ s/pattern/replacement/r;
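A runnable sketch of the /r form (Perl 5.14 or later), borrowing the stackoverflow example used further down this thread; the pattern is only illustrative.

```perl
use strict;
use warnings;

my $variable = 'stackoverflow';

# /r returns the modified copy and leaves $variable untouched.
my $new = $variable =~ s/(\w+)overflow/$1/r;

print "$variable -> $new\n";    # stackoverflow -> stack
```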
Well, you could say
my $variable;
($variable) = ($variable = "find something soon") =~ /(find something).*/;
or
(my $variable = "find something soon") =~ s/^.*?(find something).*/$1/;
You can do a substitution:
$a = 'stackoverflow';
$a =~ s/(\w+)overflow/$1/;
$a is now "stack"
From Perl Cookbook 2nd ed
6.1 Copying and Substituting Simultaneously
$dst = $src;
$dst =~ s/this/that/;
becomes
($dst = $src) =~ s/this/that/;
I just assumed everyone did it this way; I'm amazed no one gave this answer.
Almost ....
You can combine the match and retrieve the matched value with a substitution.
$variable =~ s/.*(find something).*/$1/;
AFAIK, you will always have to copy the value, unless you don't mind clobbering the original.
$variable2 = "stackoverflow";
(my $variable1) = ($variable2 =~ /stack(\w+)/);
$variable1 now equals "overflow".
I do this:
#!/usr/bin/perl
use strict;
use warnings;
my $target = "n: 123";
($target) = $target =~ /n:\s*(\d+)/;
print $target; # the var $target now is "123"
To build on the accepted answer, you can use the ternary operator to supply a default if there is no match ("find something" and 'default value' are placeholders):
my $match = $variable =~ /(find something)/ ? $1 : 'default value';
Is it possible to store all matches for a regular expression into an array?
I know I can use ($1,...,$n) = m/expr/g;, but it seems as though that can only be used if you know the number of matches you are looking for. I have tried my @array = m/expr/g;, but that doesn't seem to work.
If you're doing a global match (/g) then the regex in list context will return all of the captured matches. Simply do:
my @matches = ( $str =~ /pa(tt)ern/g );
This command for example:
perl -le '@m = ( "foo12gfd2bgbg654" =~ /(\d+)/g ); print for @m'
Gives the output:
12
2
654
Sometimes you need to get all matches globally, like PHP's preg_match_all does. If that's your case, you can write something like:
# a dummy example
my $subject = 'Philip Fry Bender Rodriguez Turanga Leela';
my @matches;
push @matches, [$1, $2] while $subject =~ /(\w+) (\w+)/g;
use Data::Dumper;
print Dumper(\@matches);
It prints
$VAR1 = [
[
'Philip',
'Fry'
],
[
'Bender',
'Rodriguez'
],
[
'Turanga',
'Leela'
]
];
See the perlop manual page (perldoc perlop) under "Matching in List Context":
If the /g option is not used, m// in list context returns a list consisting of the
subexpressions matched by the parentheses in the pattern, i.e., ($1 , $2 , $3 ...)
The /g modifier specifies global pattern matching--that is, matching as many times as
possible within the string. How it behaves depends on the context. In list context, it
returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
You can simply grab all the matches by assigning to an array, or otherwise performing the evaluation in list context:
my @matches = ($string =~ m/word/g);
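To illustrate the last sentence of the perlop quote, here is a sketch showing how capture groups change what m//g returns (the sample string is made up for the example):

```perl
use strict;
use warnings;

my $string = 'one two three four';

# No capture groups: m//g in list context returns each whole match.
my @words = $string =~ m/\w+/g;

# With a capture group it returns the captures instead; here, the
# first letter of every word.
my @initials = $string =~ m/(\w)\w*/g;

print "@words\n";       # one two three four
print "@initials\n";    # o t t f
```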
I think this is a self-explanatory example. Note the /g modifier in the first regex:
use Data::Dumper;
my $string = "one two three four";
my @res = $string =~ m/(\w+)/g;
print Dumper(\@res); # @res = ("one", "two", "three", "four")
@res = $string =~ m/(\w+) (\w+)/;
print Dumper(\@res); # @res = ("one", "two")
Remember, the assignment needs list context, which means you have to put parentheses around scalar variables on the left-hand side:
($one, $two) = $string =~ m/(\w+) (\w+)/;
Is it possible to store all matches for a regular expression into an array?
Yes. In Perl 5.25.7 the variable @{^CAPTURE} was added (it first appeared in a stable release with 5.26), which holds "the contents of the capture buffers, if any, of the last successful pattern match". This means it contains ($1, $2, ...) even if the number of capture groups is unknown.
Before Perl 5.25.7 (since 5.6.0) you could build the same array using @- and @+ as suggested by @Jaques in his answer. You would have to do something like this:
my @capture = ();
for (my $i = 1; $i < @+; $i++) {
    push @capture, substr $subject, $-[$i], $+[$i] - $-[$i];
}
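A sketch putting both approaches side by side on a made-up string with four capture groups. It assumes Perl 5.26 or later for @{^CAPTURE}; the loop uses $#+, the last index of @+, which equals the number of capture groups in the last match.

```perl
use strict;
use warnings;

my $subject = 'key=value; other=thing';
my ( @modern, @legacy );

if ( $subject =~ /(\w+)=(\w+); (\w+)=(\w+)/ ) {
    # Perl 5.26+: @{^CAPTURE} holds ($1, $2, ...) of the last successful match.
    @modern = @{^CAPTURE};

    # Older perls (5.6+): rebuild the list from the @- (start) and
    # @+ (end) offset arrays.
    for my $i ( 1 .. $#+ ) {
        push @legacy, substr( $subject, $-[$i], $+[$i] - $-[$i] );
    }
}

print "@modern\n";    # key value other thing
print "@legacy\n";    # key value other thing
```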
I am surprised this is not already mentioned here, but the Perl documentation describes the standard arrays @- and @+. To quote the entry for @+:
This array holds the offsets of the ends of the last successful submatches in the currently active dynamic scope.
@- holds the matching start offsets, so to get the value caught by the first capture group, one would write:
print substr( $str, $-[1], $+[1] - $-[1] ), "\n"; # equivalent to $1
As a side note, there is also the standard variable %- which is very nifty, because it not only contains named captures, but also allows for duplicate names to be stored in an array.
Using the example provided in the documentation:
/(?<A>1)(?<B>2)(?<A>3)(?<B>4)/
would yield a hash with entries such as:
$-{A}[0] : '1'
$-{A}[1] : '3'
$-{B}[0] : '2'
$-{B}[1] : '4'
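A runnable sketch of the documentation example, matched against the string 1234 (an assumption, since the quote only shows the pattern). It also contrasts %- with %+, which keeps just the leftmost defined capture for each name.

```perl
use strict;
use warnings;

my ( $first_a, @all_a, @all_b );

# Perl allows the same group name more than once in a pattern.
if ( '1234' =~ /(?<A>1)(?<B>2)(?<A>3)(?<B>4)/ ) {
    $first_a = $+{A};         # %+: only the leftmost defined capture named A
    @all_a   = @{ $-{A} };    # %-: array ref of every capture named A
    @all_b   = @{ $-{B} };
}

print "first A: $first_a; all A: @all_a; all B: @all_b\n";
# first A: 1; all A: 1 3; all B: 2 4
```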
Note that if you know the number of capturing groups you need per match, you can use this simple approach, which I present as an example with 2 capturing groups.
Suppose you have some 'data' like
my $mess = <<'IS_YOURS';
Richard Rich
April May
Harmony Ha\rm
Winter Win
Faith Hope
William Will
Aurora Dawn
Joy
IS_YOURS
With the following regex
my $oven = qr'^(\w+)\h+(\w+)$'ma; # skip the /a modifier if using perl < 5.14
I can capture all 12 values (6 pairs from the 8 lines: Harmony escaped, and Joy has no partner) in @box below.
my @box = $mess =~ m[$oven]g;
If I want to "hash out" the details of the box I could just do:
my %hash = @box;
Or I could have skipped the box entirely:
my %hash = $mess =~ m[$oven]g;
Note that %hash contains the following. Order is lost and dupe keys (if any had existed) are squashed:
(
'April' => 'May',
'Richard' => 'Rich',
'Winter' => 'Win',
'William' => 'Will',
'Faith' => 'Hope',
'Aurora' => 'Dawn'
);