Issue with regex matching in Perl - regex

I am new to Perl and any help will be appreciated. I have 2 variables: $release and $env_type. I want to check if a string contains $release_$env_type, then do something. For example,
$release="beta";
$env_type="testing";
so string is beta_testing
Code snippet:
if ( $_ =~ /${release}_${env_type}/ ) {
#do Something
}
This if condition doesn't get resolved. Kindly let me know what is the correct syntax to make this check? I searched on Google but didn't get any good post..
Kindly help!
I have a file with contents:
admin_vh_c9_simv2_edg=/console,/consolehelp
idminternal_vh_c9_simv2_edg=/oim,/soa-infra
sso_vh_c9_simv2_edg=/oim,/soa-infra,/odsm
my $env_type = "edg";
my $release = "c9_simv2";
#Input file containing contexts
my $idmInternal = "./IdmContexts.conf";
if ( !-e $idmInternal ) {
die "Unable to find the file $idmInternal!\n";
}
open( MYFILE, $idmInternal );
while (<MYFILE>) {
chomp;
if ( $_ =~ /${release}_${env_type}/ ) {
push( #filtered, $_ );
}
}

Your code is fine. The problem is elsewhere. The following prints match.
my $release="beta";
my $env_type="testing";
$_ = "so string is beta_testing";
if ( $_ =~ /${release}_${env_type}/ ) {
print "match\n";
}
Note: /\Q${release}_${env_type}/ would be better. It'll make sure that special characters in the interpolated variables match themselves.
Most likely problem: You read the value of $release and/or $env_type from a file, and forgot to chomp the trailing newline.

If you are using $_ then this will work.
if (m/${release}_${env_type}/)
{
# Do something
}
The m// match operator binds automatically to $_. There is no need to bind it explicitly.

To really tell what is going on, you can inject code before the test. For example compile a regex first and then print it.
# compile the regex first
my $regex = qr/${release}_${env_type}/;
say qq{\$regex="$regex"};
# then print your scanned text
say qq{\$_="$_"};
if ( m/$regex/ ) {
# do something
}
If you're going to explicitly bind to a regex, then use variables:
my $string = $_;
if ( $string =~ m/$regex/ ) {
}
Otherwise, simply match the "context variable" ($_).
if ( m/$regex/ ) {
}
Also, USUW would help spot a few problems, proactively:
# Before everything else
use strict;
use warnings;

Related

Dynamically capture regular expression match in Perl

I'm trying to dynamically catch regex matching in Perl. I've known that eval will help me do this but I may be doing something wrong.
Code:
use strict;
use warnings;
my %testHash = (
'(\d+)\/(\d+)\/(\d+)' => '$1$2$3'
);
my $str = '1/12/2016';
foreach my $pattern (keys (%testHash)) {
my $value = $testHash{$pattern};
my $result;
eval {
local $_ = $str;
/$pattern/;
print "\$1 - $1\n";
print "\$2 - $2\n";
print "\$3 - $3\n";
eval { print "$value\n"; }
}
}
Is it also possible to store captured regex patterns in an array?
I believe what you really want is a dynamic version of the following:
say $str =~ s/(\d+)\/(\d+)\/(\d+)/$1$2$3/gr;
String::Substitution provides what we need to achieve that.
use String::Substitution qw( gsub_copy );
for my $pattern (keys(%testHash)) {
my $replacement = $testHash{$pattern};
say gsub_copy($str, $pattern, $replacement);
}
Note that $replacement can also be a callback. This permits far more complicated substitutions. For example, if you wanted to convert 1/12/2016 into 2016-01-12, you could use the following:
'(\d+)/(\d+)/(\d+)' => sub { sprintf "%d-%02d-%02d", #_[3,1,2] },
To answer your actual question:
use String::Substitution qw( interpolate_match_vars last_match_vars );
for my $pattern (keys(%testHash)) {
my $template = $testHash{$pattern};
$str =~ $pattern # Or /$pattern/ if you prefer
or die("No match!\n");
say interpolate_match_vars($template, last_match_vars());
}
I am not completely sure what you want to do here, but I don't think your program does what you think it does.
You are useing eval with a BLOCK of code. That's like a try block. If it dies inside of that eval block, it will catch that error. It will not run your string like it was code. You need a string eval for that.
Instead of explaining that, here's an alternative.
This program uses sprintf and numbers the parameters. The %1$s syntax in the pattern says _take the first argument (1$) and format it as a string (%s). You don't need to localize or assign to $_ to do a match. The =~ operator does that on other variables for you. I also use qr{} to create a quoted regular expression (essentially a variable containing a precompiled pattern) that I can use directly. Because of the {} as delimiter, I don't need to escape the slashes.
use strict;
use warnings;
use feature 'say'; # like print ..., "\n"
my %testHash = (
qr{(\d+)/(\d+)/(\d+)} => '%1$s.%2$s.%3$s',
qr{(\d+)/(\d+)/(\d+) nomatch} => '%1$s.%2$s.%3$s',
qr{(\d+)/(\d+)/(\d\d\d\d)} => '%3$4d-%2$02d-%1$02d',
qr{\d} => '%s', # no capture group
);
my $str = '1/12/2016';
foreach my $pattern ( keys %testHash ) {
my #captures = ( $str =~ $pattern );
say "pattern: $pattern";
if ($#+ == 0) {
say " no capture groups";
next;
}
unless (#captures) {
say " no match";
next;
}
# debug-output
for my $i ( 1 .. $#- ) {
say sprintf " \$%d - %s", $i, $captures[ $i - 1 ];
}
say sprintf $testHash{$pattern}, #captures;
}
I included four examples:
The first pattern is the one you had. It uses %1$s and so on as explained above.
The second one does not match. We check the number of elements in #captured by looking at it in scalar context.
The third one shows that you can also reorder the result, or even use the sprintf formatting.
The last one has no capture group. We check by looking at the index of the last element ($# as the sigil for arrays that usually have an # sigil) in #+, which holds the offsets of the ends of the last successful submatches in the currently active dynamic scope. The first element is the end of the overall match, so if this only has one element, we don't have capture groups.
The output for me is this:
pattern: (?^:(\d+)/(\d+)/(\d\d\d\d))
$1 - 1
$2 - 12
$3 - 2016
2016-12-01
pattern: (?^:(\d+)/(\d+)/(\d+) nomatch)
no match
pattern: (?^:\d)
no capture groups
pattern: (?^:(\d+)/(\d+)/(\d+))
$1 - 1
$2 - 12
$3 - 2016
1.12.2016
Note that the order in the output is mixed up. That's because hashes are not ordered in Perl, and if you iterate over the keys in a hash without sort the order is random.
Apologies! I realized both my question and sample code were both vague. But after reading your suggestions I came of with the following code.
I haven't optimized this code yet and there is a limit to the replacement.
foreach my $key (keys %testHash) {
if ( $str =~ $key ) {
my #matchArr = ($str =~ $key); # Capture all matches
# Search and replace (limited from $1 to $9)
for ( my $i = 0; $i < #matchArr; $i++ ) {
my $num = $i+1;
$testHash{$key} =~ s/\$$num/$matchArr[$i]/;
}
$result = $testHash{$key};
last;
}
}
print "$result\n";
Evaluing the regexp in list context returns the matches. so in your example:
use Data::Dumper; # so we can see the result
foreach my $pattern (keys (%testHash)) {
my #a = ($str =~/$pattern/);
print Dumper(\#a);
}
would do the job.
HTH
Georg
Is it also possible to store captured regex patterns in an array?
Of course it is possible to store captured substrings in an array:
#!/usr/bin/env perl
use strict;
use warnings;
my #patterns = map qr{$_}, qw{
(\d+)/(\d+)/(\d+)
};
my $str = '1/12/2016';
foreach my $pattern ( #patterns ) {
my #captured = ($str =~ $pattern)
or next;
print "'$_'\n" for #captured;
}
Output:
'1'
'12'
'2016'
I do not quite understand what you are trying to do with combinations of local, eval EXPR and eval BLOCK in your code and the purpose of the following hash:
my %testHash = (
'(\d+)\/(\d+)\/(\d+)' => '$1$2$3'
);
If you are trying to codify that this pattern should result in three captures, you can do that like this:
my #tests = (
{
pattern => qr{(\d+)/(\d+)/(\d+)},
ncaptures => 3,
}
);
my $str = '1/12/2016';
foreach my $test ( #tests ) {
my #captured = ($str =~ $test->{pattern})
or next;
unless (#captured == $test->{ncaptures}) {
# handle failure
}
}
See this answer to find out how you can automate counting the number of capture groups in a pattern. Using the technique in that answer:
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;
my #tests = map +{ pattern => qr{$_}, ncaptures => number_of_capturing_groups($_) }, qw(
(\d+)/(\d+)/(\d+)
);
my $str = '1/12/2016';
foreach my $test ( #tests ) {
my #captured = ($str =~ $test->{pattern});
ok #captured == $test->{ncaptures};
}
done_testing;
sub number_of_capturing_groups {
"" =~ /|$_[0]/;
return $#+;
}
Output:
ok 1
1..1

How can I match these function calls, and extract the nmae of the function and the first argument?

I am trying to parse an array of elements. Those who match a pattern like the following:
Jim("jjanson", Customer.SALES);
I want to create a hash table like Jim => "jjanson"
How can I do this?
I can not match the lines using:
if($line =~ /\s*[A-Za-z]*"(.*),Customer.*\s*/)
You're not matching either the '(' after the name, nor the ' ' after the comma, before "Customer.".
I can get 'jjanson"' using this expression:
/\s*[A-Za-z]\(*"(.*), Customer.*\s*/
But I assume you don't want jjanson", so we need to modify it like so. (I tend to include the negative character class when I'm looking for simply-delimited stuff. So, in this case I'll make the expression "[^"]*"
/\s*[A-Za-z]\(*"([^"]+)", Customer.*\s*/
Also, I try not to depend upon whitespace, presence or number, I'm going to replace the space with \s*. That you didn't notice that you skipped the whitespace is a good illustration of the need to say "ignore a bunch of whitespace".
/\s*[A-Za-z]\(*"([^"]+)",\s*Customer.*\s*/
Now it's only looking for the sequence ',' + 'Customer' in the significant characters. Functionally, the same, if more flexible.
But since you only do one capture, I can't see what you'd map to what. So I'll do my own mapping:
my %records;
while ( my $line = $source->()) { # simply feed for a source of lines.
my ( $first, $user, $tag )
= $line = m/\s*(\p{Alpha}+)\s*\(\s*"([^"]+)",\s*Customer\.(\S+?)\)\/
;
$records{ $user }
= { first => $first
, username => $user
, tag => $tag
};
}
This is much more than you would tend to need in a one-off, quick solution. But I like to store as much of my input as seems significant.
Note that Jim("jjanson", Customer.SALES); matches the syntax of a function call with two arguments. You can thus abuse string eval:
#!/usr/bin/env perl
use strict;
use warnings;
use YAML::XS;
my $info = extract_first_arg(q{ Jim("jjanson", Customer.SALES);} );
print Dump $info;
sub extract_first_arg {
my $call = shift;
my ($name) = ($call =~ m{ \A \s* (\w+) }x);
unless ($name) {
warn "Failed to find function name in '$call'";
return;
}
my $username = eval sprintf(q{
package My::DangerZone;
no strict;
local *{ %s } = sub { $_[0] };
%s
}, $name, $call);
return { $name => $username };
}
Output:
---
Jim: jjanson
Or, you can abuse autoloading:
our $AUTOLOAD;
print Dump eval 'no strict;' . q{ Jim("jjanson", Customer.SALES); };
sub AUTOLOAD {
my ($fn) = ($AUTOLOAD =~ /::(\w+)\z/);
return { $fn => $_[0] };
}
I would not necessarily recommend using these methods, especially on input that is not in your control, and in a situation where this script has access to sensitive facilities.
On the other hand, I have, in the right circumstances, utilized this kind of thing in transforming one given set of information into something that can be used elsewhere.
Try this:
$line = 'Jim("jjanson", Customer.SALES)';
my %hashStore = (); #Jim("jjanson"
if($line=~m/^\s*([^\(\)]*)\(\"([^\"]*)\"/g) { $hashStore{$1} = $2; }
use Data::Dumper;
print Dumper \%hashStore;
Output:
$VAR1 = {
'Jim' => 'jjanson'
};

Matching Regex in Perl

my program contains ascii.txt to match patterns from it.
my program is to implement sed command, just to try write perl code because I am studying perl.
#!/usr/bin/perl
# sed command implementation
use strict;
use warnings;
use subs qw(read_STDIN read_FILE usage);
use IO::File;
use constant {
SEARCH_PRINT => 0,
};
our $proj_name = $0;
main(#ARGV);
sub main
{
if(scalar #_ == 2) {
read_FILE #_;
}
else {
usage
}
}
sub read_FILE {
my ($sed_script, $file_name) = #_;
my $parsed_val = parse_sed_script($sed_script);
if( $parsed_val == SEARCH_PRINT ) {
search_print_lines($sed_script, $file_name);
}
}
sub parse_sed_script {
my $command = shift or return;
if($command =~ /^\/([^\/].)*\/$/) {
return SEARCH_PRINT;
}
}
sub search_print_lines {
my ($script, $file) = #_;
my $fh = IO::File->new($file, "r") or error("no file found $file");
while( $_ = $fh->getline ) {
print if $_ =~ $script
}
}
sub usage {
message("Usage: $proj_name sed-script [file]")
}
sub error
{
my $e = shift || 'unkown error';
print("$0: $e\n");
exit 0;
}
When I execute from the shell: sed.pl /Test/ ascii.txt
I found that print if $_ =~ $script, doesn't execute because of the REGEX is stored in scalar variable
the ascii.txt contains.
Test 1
REGEX TEST
When I use print $script in search_print_lines subroutine it prints the regex sent by the user
When you pass something in on the command line and use it in your script, the entire literal text is used. So if you pass in /Test/, it will see those slashes as literals, so the "real" regular expression it's looking at is something like \/Test\/ (escaping the slashes, because now it's looking for them. Try passing in the regex without the // surrounding it.
If your goal is to allow the // to show that it's a regular expression, I would remove them when the program starts.
One more edit: If you want to be able to pass in flags, you'd need to eval the input somehow.
$script = '/Test/i';
eval { "\$regex = $script" };
and then
"REGEX TEST" =~ $regex
should return true. Doing an eval like this is highly insecure, though.
edit: what happens in eval is that whatever's in the block is executed. So what happens in the eval above is that you're dynamically creating a regular expression and setting it to a variable. That allows you to use regular expression flags like i without having to do any special parsing of the command-line input. When the eval is executed, it will be as if you had typed in $regex = /Test/i. Then you can compare your text to $regex and it will work. I thought about this because your example would not work unless you had the i flag set to make the comparison case-insensitive.
You didn't remove the slashes from $sed_script variable. After I modified your read_FILE function, it started to work:
sub read_FILE {
my ($sed_script, $file_name) = #_;
my $parsed_val = parse_sed_script($sed_script);
if( $parsed_val == SEARCH_PRINT ) {
$sed_script =~ s/^\/(.*)\/$/$1/;
#you can also parse the regexp
#$sed_script = qr/$sed_script/;
search_print_lines($sed_script, $file_name);
}
}

using perl's File::Xcopy to copy a directory ignoring files using regex

Perl's xcopy has the method fn_pat to specify a regular expression for the pattern matching and I want to use this to recursively copy a directory ignoring all files/folders that any of these strings:
.svn
build
test.blah
I am stumbling with the syntax to do that, I have looked over many perl regular expression guides but for the life of me I just can not get the hang of it. I appreciate any help.
Thanks.
... update ...
I found a perl regex that seems to be working, just not with xcopy's fn_pat. Not sure if this is a bug with xcopy or if my expression is not correct, however my tests show its ok.
$exp = '^(?!.*(\.svn|build|test\.blah)).*$';
if( '/dev/bite/me/.svn' =~ $exp ){ print "A\n"; }
if( '/dev/bite/me/.svn/crumbs' =~ $exp ){ print "B\n"; }
if( '/dev/build/blah.ext' =~ $exp ){ print "C\n"; }
if( '/dev/crap/test.blah/bites' =~ $exp ){ print "D\n"; }
if( '/dev/whats/up.h' =~ $exp ){ print "E\n"; }
only E prints as I was hoping. I'm curious to know if this is correct or not as well as to any ideas why its not working with xcopy.
Here is where File::Xcopy calls File::Find::finddepth:
sub find_files {
my $self = shift;
my $cls = ref($self)||$self;
my ($dir, $re) = #_;
my $ar = bless [], $cls;
my $sub = sub {
(/$re/)
&& (push #{$ar}, {file=>$_, pdir=>$File::Find::dir,
path=>$File::Find::name});
};
finddepth($sub, $dir);
return $ar;
}
Here $re is your regexp.
According to the File::Find docs, $_ will be set to just the leaf name of the file being visited unless the no_chdir option used.
The only way I can see to get the no_chdir option passed to finddepth is to monkey-patch File::Xcopy::finddepth:
use File::Xcopy;
*{"File::Xcopy::finddepth"} = sub {
my ($sub, $dir) = #_;
File::Find::finddepth({ no_chdir => 1, wanted => $sub}, $dir);
};

Perl regexp use of uninitialized pattern patch

My script loads some stuff from some files in some arrays, you enter a text from the keyboard, the script searches the relevant part of the text in those arrays, if it finds it, it does something, if not, well, another thing, at least in theory.
I get the following errors:
Use of uninitialized value in pattern match (m//) at emo_full_dynamic.pl line 120, <STDIN> chunk 2.
Modification of a read-only value attempted at emo_full_dynamic.pl line 121, <STDIN> chunk 2.
line 120 = $plm3 =~ /arr_(\w+.txt)/;
My problem, I think, is at $plm3 =~ /arr_(\w+.txt)/;. I used it so that I can store the name of an array in $1.
Here's my code:
#!/usr/bin/perl
use warnings;
$idx = 0;
$oldsep = $/;
opendir(DIR, 'c:/downloads/text_files/arrs/');
#files = readdir(DIR);
while ($idx <= $#files )
{
$value = $files[$idx];
if ( $value !~ m/^arr/i)
{
splice #files, $idx, 1;
}
else
{
$idx++;
}
}
foreach $plm (#files)
{
if($plm =~ m/txt$/)
{
open(ARR, "C:/downloads/text_files/arrs/$plm") or die $!;
while(<ARR>)
{ {
chomp($_);
$plm =~ m/arr_(\w+).txt/;
push(#{$1}, $_);
}
close ARR;
}
}
$plm = 0;
$idx = 0;
$stare = <STDIN>;
chomp($stare);
while($stare)
{
foreach $plm2 (#files)
{
if($plm2 =~ m/txt$/)
{
$plm2 =~ m/arr_(\w+).txt/;
if(grep $stare =~ m/$_/i, #{$1})
{
$flag = 1;
}
else
{
$flag = 0;
}
}
}
if($flag == 1)
{
$/ = "%\n";
$plm3 =~ /arr_(\w+.txt)/;
open SUPARARE, "C:/downloads/text_files/replies/$1" or die $!;
etc etc....
First of all, it's always a good idea to use strict pragma -- unless you have a valid reason to avoid it --.
Second, I don't see $plm3 initialized anywhere in your code. You have probably forgot to initialize it.
I think you are assigning something to variable $1 on line 121
Apparently there are some copy/paste issues which negates my initial answer.
Other mistakes, great and small:
You don't use strict. (fatal flaw)
Your opendir is used once, then never closed.
You use global filehandles, instead of lexical (e.g. open my $fh, ...)
Using a complext loop + splice instead of grep (#files=grep /^arr/i, #files)
Using chomp($_) when chomp per default chomps the $_ variable
I don't even know what this line means:
if(grep $stare =~ m/$_/i, #{$1}) {
You seem to be using a pattern match, where $_ is the pattern (which in this case is.. what? Nothing? Anything?), whose return value is used as a grep pattern for an array reference, that may or may not be initialized. A very horrible statement. If it indeed works as intended, the readability is very low.
Redeclaring $/ seems like a frivolous thing to do in this context, but I can't really tell, as the script ends there.