Perl: Constructing simple regular expression during runtime - regex

I am trying to construct a small regex during runtime, but somehow it never matches -- what am I doing wrong?
my $word = quotemeta("test");
my $lines = "just a test to testing find tester testönig something fastest out pentest";
my $rule = "m/" . $word . "/g";
my $regex = qr/$rule/;
while ($lines =~ $regex) {
# this never happens...
print "\nFound pattern: '$&'";
}

Your code:
my $word = quotemeta("test");
my $rule = "m/" . $word . "/g";
my $regex = qr/$rule/;
is the same as this:
my $word = quotemeta("test");
my $rule = "m/test/g"; # interpolated $word
my $regex = qr~m/test/g~; # interpolated $rule
That is, it matches the literal string "m/test/g" and nothing else.
buff has already given pretty much the same code I would have suggested, except that I recommend avoiding the use of $& due to a performance penalty as noted in perlvar:
The use of this variable anywhere in a program imposes a considerable
performance penalty on all regular expression matches. To avoid this
penalty, you can extract the same substring by using #-. Starting with
Perl v5.10.0, you can use the /p match flag and the ${^MATCH} variable
to do the same thing for particular match operations.

This might be what you want:
#!/usr/bin/perl
use strict;
use warnings;
my $word = quotemeta("test");
my $lines = "just a test to testing find tester testönig something fastest out pentest";
my $regex = qr/$word/;
while ($lines =~ /$regex/g) {
print "\nFound pattern: '$&'";
}
You cannot use /g directly with qr.

Related

Simple perl regex replacement

Here is my perl code:
my $var="[url=/jobs/]click here[/url]";
$var =~ /\[url=(.+?)\](.+?)\[\/url\]/\2/g
I'm very new to perl so i am aware that its incorrect but how do i perform this regex replacement correctly.
The end result would be a transformation of $var to click here
So, with all the answers you know the substitute form is s///
However, with something this big you should break it up into parts
to make it easier to maintain. And also helps to get out of the
quagmire of delimiter hell.
This uses a pre-compiled regex and a callback function invoked with s///e
use strict;
use warnings;
# Pre-compiled regex
my $rx = qr{\[url=(.+?)\](.+?)\[/url\]};
# Callback
sub MakeAnchor {
my ($href,$text) = #_;
return '' . $text . '';
}
my $input = '[url=/jobs/]click here[/url]';
$input =~ s/$rx/MakeAnchor($1,$2)/eg;
print $input;
Outout
click here

Why can't I store a regexp in a variable?

Given the following code,
my $string = "foo";
my $regex = s/foo/bar/;
$string =~ $regex;
print $string, "\n";
I would have expected the output to be bar, however it is foo. Why is that the case, and how can I solve that problem?
Note that in my actual case, the regex is more complicated, and I actually want to store several of them in a hash (so I can write something like $string =~ $rules{$key}).
You're looking for substitution, not only the regex part so I guess compiled regex (qr//) is not what you're looking for,
use strict;
use warnings;
my $string = "foo";
my $regex = sub { $_[0] =~ s/foo/bar/ };
$regex->($string);
print $string, "\n";
Your statement
my $regex = s/foo/bar/
is equivalent to
my $regex = $_ =~ s/foo/bar/
s/// returns the number of substitutions made, or it returns false (specifically, the empty string). So $regex is now '' or 1 (it could be more if the /g modifier was in effect) and
$string =~ $regex
is doing 'foo' =~ // or 'foo' =~ /1/ depending on what $_ contained originally.
You can store a regex pattern in a variable but, in your example, the regex is just foo, and there is a lot more going on than just that pattern
The statement s/foo/bar/ is more complex than it seems -- it is a fully-fledged statement that applies a regex pattern to a target string and substitutes a replacement string if the pattern is found. In this case the target string is the default variable $_ and the replacement string is foo. You could think of it as a call to a subroutine
substitute($_, 'foo', 'bar')
and the regex pattern is only the second parameter
What you can do is store a regex pattern. The regex part of that substitution is foo, and you can say
my $pattern = qr/foo/;
s/$pattern/bar/;
But you really should explain the problem that you're trying to solve so that we can help you better
In the assignment, you need to tell Perl not to evaluate the regular expression but just to keep it. This is what qr is for.
But you can't do this with whole substitutions, which is why Сухой27 suggests using a subroutine.

Perl how do you assign a varanble to a regex match result

How do you create a $scalar from the result of a regex match?
Is there any way that once the script has matched the regex that it can be assigned to a variable so it can be used later on, outside of the block.
IE. If $regex_result = blah blah then do something.
I understand that I should make the regex as non-greedy as possible.
#!/usr/bin/perl
use strict;
use warnings;
# use diagnostics;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Outlook';
my #Qmail;
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
my $outlook = Win32::OLE->new('Outlook.Application')
or warn "Failed Opening Outlook.";
my $namespace = $outlook->GetNamespace("MAPI");
my $folder = $namespace->Folders("test")->Folders("Inbox");
my $items = $folder->Items;
foreach my $msg ( $items->in ) {
if ( $msg->{Subject} =~ m/^(.*test alert) / ) {
my $name = $1;
print " processing Email for $name \n";
push #Qmail, $msg->{Body};
}
}
for(#Qmail) {
next unless /$regex|^\s*description/i;
print; # prints what i want ie lines that start with owner and description
}
print $sentence; # prints ^\\s\*offense \ # not lines that start with owner.
One way is to verify a match occurred.
use strict;
use warnings;
my $str = "hello what world";
my $match = 'no match found';
my $what = 'no what found';
if ( $str =~ /hello (what) world/ )
{
$match = $&;
$what = $1;
}
print '$match = ', $match, "\n";
print '$what = ', $what, "\n";
Use Below Perl variables to meet your requirements -
$` = The string preceding whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already.
$& = Contains the string matched by the last pattern match
$' = The string following whatever was matched by the last pattern match, not counting patterns matched in nested blockes that have been exited already. For example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi
The match of a regex is stored in special variables (as well as some more readable variables if you specify the regex to do so and use the /p flag).
For the whole last match you're looking at the $MATCH (or $& for short) variable. This is covered in the manual page perlvar.
So say you wanted to store your last for loop's matches in an array called #matches, you could write the loop (and for some reason I think you meant it to be a foreach loop) as:
my #matches = ();
foreach (#Qmail) {
next unless /$regex|^\s*description/i;
push #matches_in_qmail $MATCH
print;
}
I think you have a problem in your code. I'm not sure of the original intention but looking at these lines:
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\s*owner #/";
I'll step through that as:
Assign $regexto the string ^\s*owner #.
Assign $sentence to value of running a match within $regex with the regular expression /^s*owner $/ (which won't match, if it did $sentence will be 1 but since it didn't it's false).
I think. I'm actually not exactly certain what that line will do or was meant to do.
I'm not quite sure what part of the match you want: the captures, or something else. I've written Regexp::Result which you can use to grab all the captures etc. on a successful match, and Regexp::Flow to grab multiple results (including success statuses). If you just want numbered captures, you can also use Data::Munge
You can do the following:
my $str ="hello world";
my ($hello, $world) = $str =~ /(hello)|(what)/;
say "[$_]" for($hello,$world);
As you see $hello contains "hello".
If you have older perl on your system like me, perl 5.18 or earlier, and you use $ $& $' like codequestor's answer above, it will slow down your program.
Instead, you can use your regex pattern with the modifier /p, and then check these 3 variables: ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} for your matching results.

Perl regex return matches from substitution

I am trying to simultaneously remove and store (into an array) all matches of some regex in a string.
To return matches from a string into an array, you could use
my #matches = $string=~/$pattern/g;
I would like to use a similar pattern for a substitution regex. Of course, one option is:
my #matches = $string=~/$pattern/g;
$string =~ s/$pattern//g;
But is there really no way to do this without running the regex engine over the full string twice? Something like
my #matches = $string=~s/$pattern//g
Except that this will only return the number of subs, regardless of list context. I would also take, as a consolation prize, a method to use qr// where I could simply modify the quoted regex to to a sub regex, but I don't know if that's possible either (and that wouldn't preclude searching the same string twice).
Perhaps the following will be helpful:
use warnings;
use strict;
my $string = 'I thistle thing am thinking this Thistle a changed thirsty string.';
my $pattern = '\b[Tt]hi\S+\b';
my #matches;
$string =~ s/($pattern)/push #matches, $1; ''/ge;
print "New string: $string; Removed: #matches\n";
Output:
New string: I am a changed string.; Removed: thistle thing thinking this Thistle thirsty
Here is another way to do it without executing Perl code inside the substitution. The trick is that the s///g will return one capture at a time and undef if it does not match, thus quitting the while loop.
use strict;
use warnings;
use Data::Dump;
my $string = "The example Kenosis came up with was way better than mine.";
my #matches;
push #matches, $1 while $string =~ s/(\b\w{4}\b)\s//;
dd #matches, $string;
__END__
(
"came",
"with",
"than",
"The example Kenosis up was way better mine.",
)

Use variable as RegEx pattern

I'd like to use a variable as a RegEx pattern for matching filenames:
my $file = "test~";
my $regex1 = '^.+\Q~\E$';
my $regex2 = '^.+\\Q~\\E$';
print int($file =~ m/$regex1/)."\n";
print int($file =~ m/$regex2/)."\n";
print int($file =~ m/^.+\Q~\E$/)."\n";
The result (or on ideone.com):
0
0
1
Can anyone explain to me how I can use a variable as a RegEx pattern?
As documentation says:
$re = qr/$pattern/;
$string =~ /foo${re}bar/; # can be interpolated in other patterns
$string =~ $re; # or used standalone
$string =~ /$re/; # or this way
So, use the qr quote-like operator.
You cannot use \Q in a single-quoted / non-interpolated string. It must be seen by the lexer.
Anyway, tilde isn’t a meta-character.
Add use regex "debug" and you will see what is actually happening.