Turn Perl variable into regex - regex

Here I have
my %id_to_name = (
51803 => 'Jim bob and associates',
);
while (my ($key, $value) = each %id_to_name) {
$regex = qr/^.*?$value.*?$/;
$value = $regex;
}
I basically want to match $value to:
a bunch of random text blah blah 'Jim bob and associates' blah blah.
I can't seem to get a match because of all the text before and after.
I am trying qr// but it does not seem to work. Any suggestions?

Looks like you don't need regex for that... The index function will let you check if a string contains a substring.
print $value if index($input, $value) >= 0;
FYI, a regex solution would be:
print $value if $input =~ m/\Q$value\E/;
You can use it if you need modifiers (like i for a case insensitive match). \Q...\E is like quotemeta.
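A minimal sketch of the two approaches side by side. The helper names contains_literal and contains_literal_ci are made up for illustration and are not part of the answer above; the sample text is the one from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: plain substring test, no regex involved.
sub contains_literal {
    my ($input, $needle) = @_;
    return index($input, $needle) >= 0 ? 1 : 0;
}

# Hypothetical helper: regex test with the needle quoted by \Q...\E,
# plus the /i modifier for a case-insensitive match.
sub contains_literal_ci {
    my ($input, $needle) = @_;
    return $input =~ /\Q$needle\E/i ? 1 : 0;
}

my $input = "a bunch of random text blah blah 'Jim bob and associates' blah blah.";

print "index: found\n" if contains_literal($input, 'Jim bob and associates');
print "regex: found\n" if contains_literal_ci($input, 'JIM BOB AND ASSOCIATES');

# \Q...\E keeps metacharacters literal, so "a.c" does not match "abc".
print "dot stays literal\n" unless contains_literal_ci('abc', 'a.c');
```

index is enough when the text must match exactly; the regex form only pays off once a modifier such as /i is needed.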

On Perl 5.18.2, this works:
my %id_to_name = (
51803 => 'Jim bob and associates',
);
while (my ($key, $value) = each %id_to_name) {
$regex = qr/^.*?$value.*?$/;
print "$regex\n";
$test="a bunch of random text blah blah 'Jim bob and associates' blah blah.";
print "match" if $test =~/$value/;
}
Prints:
(?^:^.*?Jim bob and associates.*?$)
match
As stated in comments, the leading and trailing .*? are pointless.
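If the goal is to keep one compiled pattern per id, a sketch along these lines may be closer to what the question is after. It assumes the names should match literally, so it uses \Q...\E and drops the leading and trailing .*?; the hash %id_to_regex and the helper ids_matching are illustrative names, not something from the question.

```perl
use strict;
use warnings;

my %id_to_name = (
    51803 => 'Jim bob and associates',
);

# Build a separate hash of compiled patterns instead of overwriting the names.
my %id_to_regex = map { $_ => qr/\Q$id_to_name{$_}\E/ } keys %id_to_name;

# Hypothetical helper: return the ids whose name occurs somewhere in $text.
sub ids_matching {
    my ($text) = @_;
    my @ids = grep { $text =~ $id_to_regex{$_} } sort keys %id_to_regex;
    return @ids;
}

my $test = "a bunch of random text blah blah 'Jim bob and associates' blah blah.";
print "matched id: $_\n" for ids_matching($test);
```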

Related

RegEx for capping multiple groups between two words

Consider the following strings:
targethelloluketestlukeluketestluktestingendtarget
sourcehelloluketestlukeluketestluktestingendsource
I want to replace all instances of luke with something else, but only if it's between target...endtarget, not when it's between source...endsource. The result should be that all three instances of luke in the top string are replaced with whatever I want.
I got this far, but this will only capture one instance of luke. How do I replace all of them?
(?<=target)(?:.*?(luke).*?)(?=target)
SOLUTION
Thanks to the help of this great community, I arrived at the following solution. I find RegEx really convoluted when it comes to this, but in PHP the following works great and is a lot easier to understand:
function replaceBetweenTags($starttag, $endtag, $replace, $with, $text) {
$starttag = escapeStringToRegEx($starttag);
$endtag = escapeStringToRegEx($endtag);
$text = preg_replace_callback(
'/' . $starttag . '.*?' . $endtag . '/',
function ($matches) use ($replace, $with) {
return str_replace($replace, $with, $matches[0]);
},
$text
);
return $text;
}
function escapeStringToRegEx($string)
{
$string = str_replace('\\', '\\\\', $string);
$string = str_replace('.', '\.', $string);
$string = str_replace('^', '\^', $string);
$string = str_replace('$', '\$', $string);
$string = str_replace('*', '\*', $string);
$string = str_replace('+', '\+', $string);
$string = str_replace('-', '\-', $string);
$string = str_replace('?', '\?', $string);
$string = str_replace('(', '\(', $string);
$string = str_replace(')', '\)', $string);
$string = str_replace('[', '\[', $string);
$string = str_replace(']', '\]', $string);
$string = str_replace('{', '\{', $string);
$string = str_replace('}', '\}', $string);
$string = str_replace('|', '\|', $string);
$string = str_replace(' ', '\s', $string);
$string = str_replace('/', '\/', $string);
return $string;
}
I'm aware of the fact that the escapeStringToRegEx is really quick and dirty, and maybe not even entirely correct, but it's a good starting point to work from.
Here is a solution using a PHP regex callback function:
$input = "luke is here and targethelloluketestlukeluketestluktestingendtarget and luke is also here";
$output = preg_replace_callback(
"/target.*?endtarget/",
function ($matches) {
return str_replace("luke", "peter", $matches[0]);
},
$input
);
echo $output;
This prints:
luke is here and targethellopetertestpeterpetertestluktestingendtarget and luke is also here
Note that occurrences of luke have been replaced with peter only inside the target ... endtarget bounds.
You can use
(?:\G(?!\A)|target)(?:(?!luke|(?:end)?target).)*\Kluke(?=(?:(?!(?:end)?target).)*endtarget)
See the regex demo. If the string has line breaks, you need to use the s flag, or prepend the pattern with (?s) inline PCRE_DOTALL modifier.
Regex details:
(?:\G(?!\A)|target) - either the end of the previous successful match or target string
(?:(?!luke|(?:end)?target).)* - any one char, zero or more occurrences but as many as possible, that is not a starting point for the endtarget, target or luke char sequence
\K - a match reset operator that discards the text matched so far
luke - string to replace
(?=(?:(?!(?:end)?target).)*endtarget) - a positive lookahead that matches a location that must be immediately followed with
(?:(?!(?:end)?target).)* - any one char, zero or more occurrences but as many as possible that is not a starting point for the endtarget or target char sequence
endtarget - an endtarget string.
If you can use preg_replace_callback, use it:
preg_replace_callback('/target.*?endtarget/s', function ($m) {
return str_replace("luke", "<SOME>", $m[0]);
}, $input)
Or, unrolling the loop:
preg_replace_callback('/target[^e]*(?:e(?!ndtarget)[^e]*)*endtarget/', function ($m) {
return str_replace("luke", "<SOME>", $m[0]);
}, $input)
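Since the rest of this page is about Perl, here is a rough Perl analogue of the callback approach, as a sketch only: the /e modifier lets the replacement side run code, which plays the role of the PHP callback. The helper name replace_between is made up for illustration, and the tags and words are treated as literal text via \Q...\E.

```perl
use strict;
use warnings;

# Hypothetical helper: replace $from with $to, but only inside
# $start ... $end spans (shortest span first, newlines allowed via /s).
sub replace_between {
    my ($text, $start, $end, $from, $to) = @_;
    $text =~ s{(\Q$start\E.*?\Q$end\E)}{
        my $span = $1;                 # copy before the inner match resets $1
        $span =~ s/\Q$from\E/$to/g;    # replace every occurrence inside the span
        $span;
    }gse;
    return $text;
}

my $input = "luke is here and targethelloluketestlukeluketestluktestingendtarget and luke is also here";
print replace_between($input, 'target', 'endtarget', 'luke', 'peter'), "\n";
```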

Dynamically capture regular expression match in Perl

I'm trying to dynamically catch regex matching in Perl. I know that eval will help me do this but I may be doing something wrong.
Code:
use strict;
use warnings;
my %testHash = (
'(\d+)\/(\d+)\/(\d+)' => '$1$2$3'
);
my $str = '1/12/2016';
foreach my $pattern (keys (%testHash)) {
my $value = $testHash{$pattern};
my $result;
eval {
local $_ = $str;
/$pattern/;
print "\$1 - $1\n";
print "\$2 - $2\n";
print "\$3 - $3\n";
eval { print "$value\n"; }
}
}
Is it also possible to store captured regex patterns in an array?
I believe what you really want is a dynamic version of the following:
say $str =~ s/(\d+)\/(\d+)\/(\d+)/$1$2$3/gr;
String::Substitution provides what we need to achieve that.
use String::Substitution qw( gsub_copy );
for my $pattern (keys(%testHash)) {
my $replacement = $testHash{$pattern};
say gsub_copy($str, $pattern, $replacement);
}
Note that $replacement can also be a callback. This permits far more complicated substitutions. For example, if you wanted to convert 1/12/2016 into 2016-01-12, you could use the following:
'(\d+)/(\d+)/(\d+)' => sub { sprintf "%d-%02d-%02d", @_[3,1,2] },
To answer your actual question:
use String::Substitution qw( interpolate_match_vars last_match_vars );
for my $pattern (keys(%testHash)) {
my $template = $testHash{$pattern};
$str =~ $pattern # Or /$pattern/ if you prefer
or die("No match!\n");
say interpolate_match_vars($template, last_match_vars());
}
I am not completely sure what you want to do here, but I don't think your program does what you think it does.
You are using eval with a BLOCK of code. That's like a try block. If it dies inside of that eval block, it will catch that error. It will not run your string like it was code. You need a string eval for that.
Instead of explaining that, here's an alternative.
This program uses sprintf and numbers the parameters. The %1$s syntax in the format says: take the first argument (1$) and format it as a string (%s). You don't need to localize or assign to $_ to do a match. The =~ operator does that on other variables for you. I also use qr{} to create a quoted regular expression (essentially a variable containing a precompiled pattern) that I can use directly. Because of the {} as delimiter, I don't need to escape the slashes.
use strict;
use warnings;
use feature 'say'; # like print ..., "\n"
my %testHash = (
qr{(\d+)/(\d+)/(\d+)} => '%1$s.%2$s.%3$s',
qr{(\d+)/(\d+)/(\d+) nomatch} => '%1$s.%2$s.%3$s',
qr{(\d+)/(\d+)/(\d\d\d\d)} => '%3$4d-%2$02d-%1$02d',
qr{\d} => '%s', # no capture group
);
my $str = '1/12/2016';
foreach my $pattern ( keys %testHash ) {
my @captures = ( $str =~ $pattern );
say "pattern: $pattern";
if ($#+ == 0) {
say " no capture groups";
next;
}
unless (@captures) {
say " no match";
next;
}
# debug-output
for my $i ( 1 .. $#- ) {
say sprintf " \$%d - %s", $i, $captures[ $i - 1 ];
}
say sprintf $testHash{$pattern}, @captures;
}
I included four examples:
The first pattern is the one you had. It uses %1$s and so on as explained above.
The second one does not match. We check the number of elements in @captures by looking at it in scalar context.
The third one shows that you can also reorder the result, or even use the sprintf formatting.
The last one has no capture group. We check by looking at the index of the last element ($# as the sigil for arrays that usually have an @ sigil) in @+, which holds the offsets of the ends of the last successful submatches in the currently active dynamic scope. The first element is the end of the overall match, so if this only has one element, we don't have capture groups.
The output for me is this:
pattern: (?^:(\d+)/(\d+)/(\d\d\d\d))
$1 - 1
$2 - 12
$3 - 2016
2016-12-01
pattern: (?^:(\d+)/(\d+)/(\d+) nomatch)
no match
pattern: (?^:\d)
no capture groups
pattern: (?^:(\d+)/(\d+)/(\d+))
$1 - 1
$2 - 12
$3 - 2016
1.12.2016
Note that the order in the output is mixed up. That's because hashes are not ordered in Perl, and if you iterate over the keys in a hash without sort the order is random.
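For reference, the block-versus-string eval distinction mentioned at the start of this answer can be sketched as follows. This is only an illustration, and it assumes the template comes from a trusted source, because a string eval runs whatever Perl code it is given.

```perl
use strict;
use warnings;

my $str      = '1/12/2016';
my $template = '$1$2$3';    # single quotes: literal text, nothing interpolated yet

my ($block_result, $string_result);
if ( $str =~ m{(\d+)/(\d+)/(\d+)} ) {
    # Block eval: runs the block and traps exceptions.
    # $template is just a string here, so it comes back unchanged.
    $block_result = eval { "$template" };

    # String eval: compiles the text "$1$2$3" as Perl at run time,
    # so the capture variables are interpolated.
    $string_result = eval qq{"$template"};
}

print "block eval:  $block_result\n";
print "string eval: $string_result\n";
```

The sprintf approach above avoids the string eval entirely, which is usually the safer choice.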
Apologies! I realized that both my question and my sample code were vague. But after reading your suggestions I came up with the following code.
I haven't optimized this code yet and there is a limit to the replacement.
foreach my $key (keys %testHash) {
if ( $str =~ $key ) {
my @matchArr = ($str =~ $key); # Capture all matches
# Search and replace (limited from $1 to $9)
for ( my $i = 0; $i < @matchArr; $i++ ) {
my $num = $i+1;
$testHash{$key} =~ s/\$$num/$matchArr[$i]/;
}
$result = $testHash{$key};
last;
}
}
print "$result\n";
Evaluating the regexp in list context returns the matches. So in your example:
use Data::Dumper; # so we can see the result
foreach my $pattern (keys (%testHash)) {
my @a = ($str =~/$pattern/);
print Dumper(\@a);
}
would do the job.
HTH
Georg
Is it also possible to store captured regex patterns in an array?
Of course it is possible to store captured substrings in an array:
#!/usr/bin/env perl
use strict;
use warnings;
my @patterns = map qr{$_}, qw{
(\d+)/(\d+)/(\d+)
};
my $str = '1/12/2016';
foreach my $pattern ( @patterns ) {
my @captured = ($str =~ $pattern)
or next;
print "'$_'\n" for @captured;
}
Output:
'1'
'12'
'2016'
I do not quite understand what you are trying to do with combinations of local, eval EXPR and eval BLOCK in your code and the purpose of the following hash:
my %testHash = (
'(\d+)\/(\d+)\/(\d+)' => '$1$2$3'
);
If you are trying to codify that this pattern should result in three captures, you can do that like this:
my @tests = (
{
pattern => qr{(\d+)/(\d+)/(\d+)},
ncaptures => 3,
}
);
my $str = '1/12/2016';
foreach my $test ( @tests ) {
my @captured = ($str =~ $test->{pattern})
or next;
unless (@captured == $test->{ncaptures}) {
# handle failure
}
}
See this answer to find out how you can automate counting the number of capture groups in a pattern. Using the technique in that answer:
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;
my @tests = map +{ pattern => qr{$_}, ncaptures => number_of_capturing_groups($_) }, qw(
(\d+)/(\d+)/(\d+)
);
my $str = '1/12/2016';
foreach my $test ( @tests ) {
my @captured = ($str =~ $test->{pattern});
ok @captured == $test->{ncaptures};
}
done_testing;
sub number_of_capturing_groups {
"" =~ /|$_[0]/;
return $#+;
}
Output:
ok 1
1..1

Perl Grepping from an Array

I need to grep a value from an array.
For example, I have these values:
@a=('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl');
@Array = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl','branches/Soft/B2/c.tct', 'branches/Docs/A1/b.txt');
Now I need to loop over @a and check whether each value matches an element of @Array. For example
It works for me with grep. You'd do it the exact same way as in the List::MoreUtils example below, except with grep instead of any. You can also shorten it to
my $got_it = grep { /$str/ } @paths;
my @matches = grep { /$str/ } @paths;
By default the match is tested against $_, which is set to each element of the list in turn. The $str and @paths are the same as below.
You can use the module List::MoreUtils as well. Its function any returns true/false depending on whether the condition in the block is satisfied for any element in the list, i.e. whether there was a match in this case.
use warnings;
use strict;
use List::MoreUtils qw(any);
my $str = 'branches/Soft/a.txt';
my @paths = ('branches/Soft/a.txt', 'branches/Soft/b.txt',
'branches/Docs/A1/b.txt', 'branches/Soft/B2/c.tct');
my $got_match = any { $_ =~ m/$str/ } @paths;
With the list above, containing the $str, the $got_match is 1.
Or you can roll it by hand and catch the match as well
foreach my $p (@paths) {
print "Found it: $1\n" if $p =~ m/($str)/;
}
This does print out the match.
Note that the strings you show in your example do not contain the one to match. I added it to my list for a test. Without it in the list no match is found in either of the examples.
To test for more than one string, with the added sample
my @strings = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl');
my @paths = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl',
'branches/Soft/B2/c.tct', 'branches/Docs/A1/b.txt');
foreach my $str (@strings) {
foreach my $p (@paths) {
print "Found it: $1\n" if $p =~ m/($str)/;
}
# Or, instead of the foreach loop above use
# my $match = grep { /$str/ } @paths;
# print "Matched for $str\n" if $match;
}
This prints
Found it: branches/Soft/a.txt
Found it: branches/Soft/h.cpp
Found it: branches/Main/utils.pl
When the lines with grep are uncommented and foreach ones commented out I get the corresponding prints for the same strings.
The dots in $a are regex metacharacters and will pose a problem, so you either have to escape them when doing a regex match or use a simple eq to find the matches:
Regex match with $a escaped:
my @matches = grep { /\Q$a\E/ } @array;
Simple comparison with "equals":
my @matches = grep { $_ eq $a } @array;
With your sample data both will give an empty array @matches because there is no match.
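To make the difference concrete, here is a small sketch with made-up paths (they are not the data from the question). It compares an unescaped pattern, a \Q...\E pattern, and eq; the variable is called $wanted rather than $a, since $a is also used by sort.

```perl
use strict;
use warnings;

# Illustrative paths, chosen so the three tests give different answers.
my @paths = (
    'branches/Soft/a.txt',
    'branches/Soft/a_txt.bak',
    'old/branches/Soft/a.txt.orig',
    'branches/Docs/A1/b.txt',
);
my $wanted = 'branches/Soft/a.txt';

my @loose   = grep { /$wanted/ }     @paths;  # "." matches any character
my @literal = grep { /\Q$wanted\E/ } @paths;  # literal substring match
my @exact   = grep { $_ eq $wanted } @paths;  # whole-string equality

printf "unescaped: %d, quoted: %d, eq: %d\n",
    scalar @loose, scalar @literal, scalar @exact;
```

The unescaped pattern also accepts a_txt.bak because the dot matches the underscore, and the quoted pattern still accepts the .orig path because it is a substring match; only eq demands the whole string.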
This Solved My Question. Thanks to all especially @zdim for the valuable time and support
my @SVNFILES = ('branches/Soft/a.txt', 'branches/Soft/b.txt');
my @paths = ('branches/Soft/a.txt', 'branches/Soft/b.txt',
'branches/Docs/A1/b.txt', 'branches/Soft/B2/c.tct');
foreach my $svn (@SVNFILES)
{
chomp ($svn);
my $m = grep { /$svn/ } (@paths);
if ( $m eq '0' ) {
print "Files Mismatch\n";
exit 1;
}
}
You should escape characters like '/' and '.' in any regex when you need them to match as literal characters.
For example (in single quotes, so that the backslashes are kept):
$a='branches\/Soft\/a\.txt';
Retry whatever you did with either grep or perl with that. If it still doesn't work, tell us precisely what you tried.

How to replace a variable with another variable in PERL?

I am trying to replace all words from a text except some that I have in an array. Here's my code:
my $text = "This is a text!And that's some-more text,text!";
while ($text =~ m/([\w']+)/g) {
next if $1 ~~ @ignore_words;
my $search = $1;
my $replace = uc $search;
$text =~ s/$search/$replace/e;
}
However, the program doesn't work. Basically I am trying to make all words uppercase but skip the ones in @ignore_words. I know it's a problem with the variables being used in the regular expression, but I can't figure the problem out.
#!/usr/bin/perl
my $text = "This is a text!And that's some-more text,text!";
my @ignorearr=qw(is some);
my %h1=map{$_ => 1}@ignorearr;
$text=~s/([\w']+)/($h1{$1})?$1:uc($1)/ge;
print $text;
On running this,
THIS is A TEXT!AND THAT'S some-MORE TEXT,TEXT!
You can avoid the problem in your code if, instead of modifying the same variable that controls the while loop, you just let s/../../eg do it globally for you:
my $text = "This is a text!And that's some-more text,text!";
my @ignore_words = qw{ is more };
$text =~ s/([\w']+)/$1 ~~ @ignore_words ? $1 : uc($1)/eg;
print $text;
And on running:
THIS is A TEXT!AND THAT'S SOME-more TEXT,TEXT!

Regex and the characters case

Okay, I got a rather simple one (at least seems simple). I have a multi lined string and I am just playing around with replacing different words with something else. Let me show you...
#!/usr/bin/perl -w
use strict;
$_ = "That is my coat.\nCoats are very expensive.";
s/coat/Hat/igm;
print;
The output would be
That is my Hat.
Hats are very expensive.
The "hat" on the first line shouldn't be capitalized. Are there any tricks that can make the casing compliant with how English is written? Thanks :)
see how-to-replace-string-and-preserve-its-uppercase-lowercase
For more detail go to How do I substitute case insensitively on the LHS while preserving case on the RHS?
You can use the e modifier to s/// to do the trick:
s/(coat)/ucfirst($1) eq $1 ? 'Hat' : 'hat'/igme;
For one, you should use \b (word boundary) to match only the whole word. For example s/hat/coat/ would change That to Tcoat without leading \b. Now for your question. With the flag /e you can use Perl code in the replacement part of the regex. So you can write a Perl function that checks the case of the match and then set the case of the replacement properly:
my $s = "That is my coat.\nCoats are very expensive.";
$s =~ s/(\bcoat)/&same_case($1, "hat")/igme;
print $s, "\n";
sub same_case {
my ($match, $replacement) = @_;
# if match starts with uppercase character, apply ucfirst to replacement
if($match =~ /^[A-Z]/) {
return ucfirst($replacement);
}
else {
return $replacement;
}
}
Prints:
That is my hat.
Hats are very expensive.
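The same idea can be stretched to cover all-uppercase matches too. The following is a sketch under the assumption that only three shapes matter (COAT, Coat, coat); the function name match_case and the extra COATS! in the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical variant of same_case that also handles all-uppercase matches.
sub match_case {
    my ($match, $replacement) = @_;
    return uc($replacement)      if $match eq uc($match);       # COAT -> HAT
    return ucfirst($replacement) if $match eq ucfirst($match);  # Coat -> Hat
    return $replacement;                                        # coat -> hat
}

my $s = "That is my coat.\nCoats are very expensive. COATS!";
$s =~ s/\b(coat)/match_case($1, 'hat')/gie;
print $s, "\n";
```

Mixed-case input such as cOaT falls through to the plain replacement, which may or may not be what you want.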
This may solve your problem:
#!/usr/bin/perl -w
use strict;
sub smartSubstitute {
my $target = shift;
my $pattern = shift;
my $replacement = shift;
$pattern = ucfirst $pattern;
$replacement = ucfirst $replacement;
$target =~ s/$pattern/$replacement/gm;
$pattern = lcfirst $pattern;
$replacement = lcfirst $replacement;
$target =~ s/$pattern/$replacement/gm;
return $target;
}
my $x = "That is my coat.\nCoats are very expensive.";
my $y = smartSubstitute($x, "coat", "Hat");
print $y, "\n";