Perl how do you assign a varanble to a regex match result - regex

How do you create a $scalar from the result of a regex match?
Is there any way that once the script has matched the regex that it can be assigned to a variable so it can be used later on, outside of the block.
IE. If $regex_result = blah blah then do something.
I understand that I should make the regex as non-greedy as possible.
#!/usr/bin/perl
use strict;
use warnings;
# use diagnostics;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Outlook';
my #Qmail;
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
my $outlook = Win32::OLE->new('Outlook.Application')
or warn "Failed Opening Outlook.";
my $namespace = $outlook->GetNamespace("MAPI");
my $folder = $namespace->Folders("test")->Folders("Inbox");
my $items = $folder->Items;
foreach my $msg ( $items->in ) {
if ( $msg->{Subject} =~ m/^(.*test alert) / ) {
my $name = $1;
print " processing Email for $name \n";
push #Qmail, $msg->{Body};
}
}
for(#Qmail) {
next unless /$regex|^\s*description/i;
print; # prints what i want ie lines that start with owner and description
}
print $sentence; # prints ^\\s\*offense \ # not lines that start with owner.

One way is to verify a match occurred.
use strict;
use warnings;
my $str = "hello what world";
my $match = 'no match found';
my $what = 'no what found';
if ( $str =~ /hello (what) world/ )
{
$match = $&;
$what = $1;
}
print '$match = ', $match, "\n";
print '$what = ', $what, "\n";

Use Below Perl variables to meet your requirements -
$` = The string preceding whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already.
$& = Contains the string matched by the last pattern match
$' = The string following whatever was matched by the last pattern match, not counting patterns matched in nested blockes that have been exited already. For example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi

The match of a regex is stored in special variables (as well as some more readable variables if you specify the regex to do so and use the /p flag).
For the whole last match you're looking at the $MATCH (or $& for short) variable. This is covered in the manual page perlvar.
So say you wanted to store your last for loop's matches in an array called #matches, you could write the loop (and for some reason I think you meant it to be a foreach loop) as:
my #matches = ();
foreach (#Qmail) {
next unless /$regex|^\s*description/i;
push #matches_in_qmail $MATCH
print;
}
I think you have a problem in your code. I'm not sure of the original intention but looking at these lines:
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\s*owner #/";
I'll step through that as:
Assign $regexto the string ^\s*owner #.
Assign $sentence to value of running a match within $regex with the regular expression /^s*owner $/ (which won't match, if it did $sentence will be 1 but since it didn't it's false).
I think. I'm actually not exactly certain what that line will do or was meant to do.

I'm not quite sure what part of the match you want: the captures, or something else. I've written Regexp::Result which you can use to grab all the captures etc. on a successful match, and Regexp::Flow to grab multiple results (including success statuses). If you just want numbered captures, you can also use Data::Munge

You can do the following:
my $str ="hello world";
my ($hello, $world) = $str =~ /(hello)|(what)/;
say "[$_]" for($hello,$world);
As you see $hello contains "hello".

If you have older perl on your system like me, perl 5.18 or earlier, and you use $ $& $' like codequestor's answer above, it will slow down your program.
Instead, you can use your regex pattern with the modifier /p, and then check these 3 variables: ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} for your matching results.

Related

extract string between two dots

I have a string of the following format:
word1.word2.word3
What are the ways to extract word2 from that string in perl?
I tried the following expression but it assigns 1 to sub:
#perleval $vars{sub} = $vars{string} =~ /.(.*)./; 0#
EDIT:
I have tried several suggestions, but still get the value of 1. I suspect that the entire expression above has a problem in addition to parsing. However, when I do simple assignment, I get the correct result:
#perleval $vars{sub} = $vars{string} ; 0#
assigns word1.word2.word3 to variable sub
. has a special meaning in regular expressions, so it needs to be escaped.
.* could match more than intended. [^.]* is safer.
The match operator (//) simply returns true/false in scalar context.
You can use any of the following:
$vars{sub} = $vars{string} =~ /\.([^.]*)\./ ? $1 : undef;
$vars{sub} = ( $vars{string} =~ /\.([^.]*)\./ )[0];
( $vars{sub} ) = $vars{string} =~ /\.([^.]*)\./;
The first one allows you to provide a default if there's no match.
Try:
/\.([^\.]+)\./
. has a special meaning and would need to be escaped. Then you would want to capture the values between the dots, so use a negative character class like ([^\.]+) meaning at least one non-dot. if you use (.*) you will get:
word1.stuff1.stuff2.stuff3.word2 to result in:
stuff1.stuff2.stuff3
But maybe you want that?
Here is my little example, I do find the perl one liners a little harder to read at times so I break it out:
use strict;
use warnings;
if ("stuff1.stuff2.stuff3" =~ m/\.([^.]+)\./) {
my $value = $1;
print $value;
}
else {
print "no match";
}
result
stuff2
. has a special meaning: any character (see the expression between your parentheses)
Therefore you have to escape it (\.) if you search a literal dot:
/\.(.*)\./
You've got to make sure you're asking for a list when you do the search.
my $x= $string =~ /look for (pattern)/ ;
sets $x to 1
my ($x)= $string =~ /look for (pattern)/ ;
sets $x to pattern.

Extract string after a symbol in Perl

How can I extract string after a symbol in Perl?
I tried doing some searches but even the code I found didn't work.
I'm trying to extract the string after a colon. So I want to show everything after the colon.
Example:
string = day1: string over here
substring = string over here
So far I have tried:
$substring = $string=~ /(\:.*)\s*$/;
But it only outputs the number 1 over and over.
That's because pattern matches in a scalar context are boolean tests. If you want to capture bracket content (capture groups), you need a list context. It's ok if the list is only one element though:
try this:
my ( $substring ) = $string=~ /(\:.*)\s*$/;
Difference maybe a bit subtle, but basically - we are assigning 'all the hits' from the pattern match to a list... that comprises one element.
Note - that's so you can do:
my #matches = $string =~ m/(.)/g;
And get multiple 'hits' returned. If you do as above, you will only get the first match - which is irrelevant given your pattern, but you can do:
my ( $key, $value ) = $string =~ m/(\w+)=(\w+)/;
for example.
I usually use parentheses to extract a part from text and then refer to the result stored in $1 variable.
look at example:
my $text = "day1: string over here";
print $1 if ($text =~ /:\s*(.+)$/);
but similar result may be recieved with this code too:
my $text = "day1: string over here";
my ($a) = $text =~ /:\s*(.+)$/;
print $a;
You can achieve desire substring by using split function also:
#!/usr/bin/perl
use warnings;
use strict;
my $string = "day1: string over here";
my (undef, $substring) = split(':\s*', $string);
print $substring, "\n";
Output:
string over here
Or you can get this by using capturing group () in regex:
my $string = "day1: string over here";
$string =~ m/(.*)\:\s+(.*)$/;
my $substring = $2;
print $substring, "\n";

How do I use a variable as a regex modifier in Perl?

I'm writing an abstraction function that will ask the user a given question and validate the answer based on a given regular expression. The question is repeated until the answer matches the validation regexp.
However, I also want the client to be able to specify whether the answer must match case-sensitively or not.
So something like this:
sub ask {
my ($prompt, $validationRe, $caseSensitive) = #_;
my $modifier = ($caseSensitive) ? "" : "i";
my $ans;
my $isValid;
do {
print $prompt;
$ans = <>;
chomp($ans);
# What I want to do that doesn't work:
# $isValid = $ans =~ /$validationRe/$modifier;
# What I have to do:
$isValid = ($caseSensitive) ?
($ans =~ /$validationRe/) :
($ans =~ /$validationRe/i);
} while (!$isValid);
return $ans;
}
Upshot: is there a way to dynamically specify a regular expression's modifiers?
Upshot: is there a way to dynamically specify a regular expression's modifiers?
From perldoc perlre:
"(?adlupimsx-imsx)"
"(?^alupimsx)"
One or more embedded pattern-match modifiers, to be turned on (or
turned off, if preceded by "-") for the remainder of the pattern or
the remainder of the enclosing pattern group (if any).
This is particularly useful for dynamic patterns, such as those read
in from a configuration file, taken from an argument, or specified in
a table somewhere. Consider the case where some patterns want to be
case-sensitive and some do not: The case-insensitive ones merely need
to include "(?i)" at the front of the pattern.
Which gives you something along the lines of
$isValid = $ans =~ m/(?$modifier)$validationRe/;
Just be sure to take the appropriate security precautions when accepting user input in this way.
You might also like the qr operator which quotes its STRING as a regular expression.
my $rex = qr/(?$mod)$pattern/;
$isValid = <STDIN> =~ $rex;
Get rid of your $caseSensitive parameter, as it will be useless in many cases. Instead, users of that function can encode the necessary information directly in the $validationRe regex.
When you create a regex object like qr/foo/, then the pattern is at that point compiled into instructions for the regex engine. If you stringify a regex object, you'll get a string that when interpolated back into a regex will have exactly the same behaviour as the original regex object. Most importantly, this means that all flags provided or omitted from the regex object literal will be preserved and can't be overridden! This is by design, so that a regex object will continue to behave identical no matter what context it is used in.
That's a bit dry, so let's use an example. Here is a match function that tries to apply a couple similar regexes to a list of strings. Which one will match?
use strict;
use warnings;
use feature 'say';
# This sub takes a string to match on, a regex, and a case insensitive marker.
# The regex will be recompiled to anchor at the start and end of the string.
sub match {
my ($str, $re, $i) = #_;
return $str =~ /\A$re\z/i if $i;
return $str =~ /\A$re\z/;
}
my #words = qw/foo FOO foO/;
my $real_regex = qr/foo/;
my $fake_regex = 'foo';
for my $re ($fake_regex, $real_regex) {
for my $i (0, 1) {
for my $word (#words) {
my $match = 0+ match($word, $re, $i);
my $output = qq("$word" =~ /$re/);
$output .= "i" if $i;
say "$output\t-->" . uc($match ? "match" : "fail");
}
}
}
Output:
"foo" =~ /foo/ -->MATCH
"FOO" =~ /foo/ -->FAIL
"foO" =~ /foo/ -->FAIL
"foo" =~ /foo/i -->MATCH
"FOO" =~ /foo/i -->MATCH
"foO" =~ /foo/i -->MATCH
"foo" =~ /(?^:foo)/ -->MATCH
"FOO" =~ /(?^:foo)/ -->FAIL
"foO" =~ /(?^:foo)/ -->FAIL
"foo" =~ /(?^:foo)/i -->MATCH
"FOO" =~ /(?^:foo)/i -->FAIL
"foO" =~ /(?^:foo)/i -->FAIL
First, we should notice that the string representation of regex objects has this weird (?^:...) form. In a non-capturing group (?: ... ), modifiers for the pattern inside the group can be added or removed between the question mark and colon, while the ^ indicates the default set of flags.
Now when we look at the fake regex that's actually just a string being interpolated, we can see that the addition of the /i flag makes a difference as expected. But when we use a real regex object, it doesn't change anything: The outside /i cannot override the (?^: ... ) flags.
It is probably best to assume that all regexes already are regex objects and should not be interfered with. If you load the regex patterns from a file, you should require the regexes to use the (?: ... ) syntax to apply flages (e.g. (?^i:foo) as an equivalent to qr/foo/i). E.g. loading one regex per line from a file handle could look like:
my #regexes;
while (<$fh>) {
chomp;
push #regexes, qr/$_/; # will die here on regex syntax errors
}
You need to use the eval function. The below code will work:
sub ask {
my ($prompt, $validationRe, $caseSensitive) = #_;
my $modifier = ($caseSensitive) ? "" : "i";
my $ans;
my $isValid;
do {
print $prompt;
$ans = <>;
chomp($ans);
# What I want to do that doesn't work:
# $isValid = $ans =~ /$validationRe/$modifier;
$isValid = eval "$ans =~ /$validationRe/$modifier";
# What I have to do:
#$isValid = ($caseSensitive) ?
# ($ans =~ /$validationRe/) :
# ($ans =~ /$validationRe/i);
} while (!$isValid);
return $ans;
}

How to user Perl to find all instances of strings that match a regexp?

How to user Perl to find and print all strings that match a regexp?
The following only finds the first match.
$text="?Adsfsadfgaasdf.
?Bafadfdsaadsfadsf.
xcxvfdgfdg";
if($text =~ m/\\?([^\.]+\.)/) {
print "$1\n";
}
EDIT1: /g doesn't work
#!/usr/bin/env perl
$text="?Adsfsadfgaasdf.
?Bafadfdsaadsfadsf.
xcxvfdgfdg";
if($text =~ m/\\?([^\.]+\.)/g) {
print "$1\n";
}
$ ./test.pl
?Adsfsadfgaasdf.
The problem is that the /g modifier does not use capture groups for multiple matches. You need to either iterate over the matches in scalar context, or catch the returned list in list context. For example:
use v5.10; # required for say()
$text="?Adsfsadfgaasdf.
?Bafadfdsaadsfadsf.
xcxvfdgfdg";
while ($text =~ /\?([^.]+\.)/g) { # scalar context
say $1;
}
for ($text =~ /\?[^.]+\./g) { # list context
say; # match is held in $_
}
Note in the second case, I skipped the parens, because in list context the whole match is returned if there are no parens. You may add parens to select part of the string.
Your version, using if, uses scalar context, which saves the position of the most recent match, but does not continue. A way to see what happens is:
if($text =~ m/\?([^\.]+\.)/g) {
print "$1\n";
}
say "Rest of string: ", substr $text, pos;
pos gives the position of the most recent match.
In previous answer #TLP correctly wrote that matching should be in list context.
use Data::Dumper;
$text="?Adsfsadfgaasdf.
?Bafadfdsaa.
dsfadsf.
xcxvfdgfdg";
#arr = ($text =~ /\?([^\.]+\.)/g);
print Dumper(#arr);
Expected result:
$VAR1 = 'Adsfsadfgaasdf.';
$VAR2 = 'Bafadfdsaa.';
You seem to be missing the /g flag, which tells perl to repeat the match as many times as possible.

In Perl, how can I get the matched substring from a regex?

My program read other programs source code and colect information about used SQL queries. I have problem with getting substring.
...
$line = <FILE_IN>;
until( ($line =~m/$values_string/i && $line !~m/$rem_string/i) || eof )
{
if($line =~m/ \S{2}DT\S{3}/i)
{
# here I wish to get (only) substring that match to pattern \S{2}DT\S{3}
# (7 letter table name) and display it.
$line =~/\S{2}DT\S{3}/i;
print $line."\n";
...
In result print prints whole line and not a substring I expect. I tried different approach, but I use Perl seldom and probably make basic concept error. ( position of tablename in line is not fixed. Another problem is multiple occurrence i.e.[... SELECT * FROM AADTTAB, BBDTTAB, ...] ). How can I obtain that substring?
Use grouping with parenthesis and store the first group.
if( $line =~ /(\S{2}DT\S{3})/i )
{
my $substring = $1;
}
The code above fixes the immediate problem of pulling out the first table name. However, the question also asked how to pull out all the table names. So:
# FROM\s+ match FROM followed by one or more spaces
# (.+?) match (non-greedy) and capture any character until...
# (?:x|y) match x OR y - next 2 matches
# [^,]\s+[^,] match non-comma, 1 or more spaces, and non-comma
# \s*; match 0 or more spaces followed by a semi colon
if( $line =~ /FROM\s+(.+?)(?:[^,]\s+[^,]|\s*;)/i )
{
# $1 will be table1, table2, table3
my #tables = split(/\s*,\s*/, $1);
# delim is a space/comma
foreach(#tables)
{
# $_ = table name
print $_ . "\n";
}
}
Result:
If $line = "SELECT * FROM AADTTAB, BBDTTAB;"
Output:
AADTTAB
BBDTTAB
If $line = "SELECT * FROM AADTTAB;"
Output:
AADTTAB
Perl Version: v5.10.0 built for MSWin32-x86-multi-thread
I prefer this:
my ( $table_name ) = $line =~ m/(\S{2}DT\S{3})/i;
This
scans $line and captures the text corresponding to the pattern
returns "all" the captures (1) to the "list" on the other side.
This psuedo-list context is how we catch the first item in a list. It's done the same way as parameters passed to a subroutine.
my ( $first, $second, #rest ) = #_;
my ( $first_capture, $second_capture, #others ) = $feldman =~ /$some_pattern/;
NOTE:: That said, your regex assumes too much about the text to be useful in more than a handful of situations. Not capturing any table name that doesn't have dt as in positions 3 and 4 out of 7? It's good enough for 1) quick-and-dirty, 2) if you're okay with limited applicability.
It would be better to match the pattern if it follows FROM. I assume table names consist solely of ASCII letters. In that case, it is best to say what you want. With those two remarks out of the way, note that a successful capturing regex match in list context returns the matched substring(s).
#!/usr/bin/perl
use strict;
use warnings;
my $s = 'select * from aadttab, bbdttab';
if ( my ($table) = $s =~ /FROM ([A-Z]{2}DT[A-Z]{3})/i ) {
print $table, "\n";
}
__END__
Output:
C:\Temp> s
aadttab
Depending on the version of perl on your system, you may be able to use a named capturing group which might make the whole thing easier to read:
if ( $s =~ /FROM (?<table>[A-Z]{2}DT[A-Z]{3})/i ) {
print $+{table}, "\n";
}
See perldoc perlre.
Parens will let you grab part of the regex into special variables: $1, $2, $3...
So:
$line = ' abc andtabl 1234';
if($line =~m/ (\S{2}DT\S{3})/i) {
# here I wish to get (only) substring that match to pattern \S{2}DT\S{3}
# (7 letter table name) and display it.
print $1."\n";
}
Use a capturing group:
$line =~ /(\S{2}DT\S{3})/i;
my $substr = $1;
$& contains the string matched by the last pattern match.
Example:
$str = "abcdefghijkl";
$str =~ m/cdefg/;
print $&;
# Output: "cdefg"
So you could do something like
if($line =~m/ \S{2}DT\S{3}/i) {
print $&."\n";
}
WARNING:
If you use $& in your code it will slow down all pattern matches.