Trying to match two variables that both contain special characters in Perl - regex

So here is a weird problem. I have a ton of scripts that are executed by "master" scripts and I need to verify that what is in the "master" is valid. Problem is, these scripts contain special characters and I need to match them to make sure the "Master" is referencing the correct scripts.
An example of one file might be
Example's of file names (20160517) [test].sh
Here is what my code looks like. @MasterScipt is an array where each element is a filename of what I expect the sub-scripts to be named.
opendir( DURR, $FileLocation ); # I'm looking in a directory where the subscripts reside
foreach ( readdir(DURR) ) {
for ( my $j = 0; $j != $MasterScriptlength; $j++ ) {
$MasterScipt[$j] =~ s/\r//g;
print "DARE TO COMPARE\n";
print "$MasterScipt[$j]\n";
print "$_\n";
#I added the \Q to quotemeta, but I think the issue is with $_
#I've tried variations like
#if(quotemeta($_) =~/\Q$MasterScipt[$j]/){
#To no avail, I also tried using eq operator and no luck :(
if ( $_ =~ /\Q$MasterScipt[$j]/ ) {
print "WE GOOD VINCENT\n";
}
}
}
closedir(DURR);
No matter what I seem to do, my output will always look like this
DARE TO COMPARE
Example's of file names (20160517) [test].sh
Example's of file names (20160517) [test].sh

OK, I was staring at this thing for too long, and I think writing this question out helped me answer it.
Not only did I need to add \Q in my regex, but there was also a trailing whitespace character. I did a chomp on both $_ and $MasterScipt[$j] and now it's working.
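For anyone hitting the same thing, here is a minimal sketch of why the trailing whitespace mattered. The two strings are made-up sample data (the "\r\n" is an assumption about how the stray character got there), not the real contents of the master script.

```perl
use strict;
use warnings;

# Hypothetical sample data: the expected name carries a stray "\r\n",
# as it might after being read from a file with Windows line endings.
my $expected = "Example's of file names (20160517) [test].sh\r\n";
my $on_disk  = "Example's of file names (20160517) [test].sh";

# Strip all trailing whitespace (covers \r, \n, spaces and tabs).
( my $clean = $expected ) =~ s/\s+\z//;

# Once both sides are clean, a plain string comparison is enough.
my $same_before = $on_disk eq $expected ? 1 : 0;
my $same_after  = $on_disk eq $clean    ? 1 : 0;

# \Q...\E is only needed for a "contains" test, because ( ) [ ] and .
# would otherwise be treated as regex metacharacters.
my $contains = $on_disk =~ /\Q$clean\E/ ? 1 : 0;

print "before trim: $same_before, after trim: $same_after, contains: $contains\n";
```

Note that chomp only removes the current value of $/ (a newline by default), so s/\s+\z// is the safer choice when a carriage return may be involved.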

I suggest that your code should look more like this. The main changes are that I have used a named variable $file for the values returned by readdir, and I iterate over the contents of the array @MasterScipt instead of its indexes, because $j is never used in your own code except to access the array elements.
s/\s+\z// for @MasterScipt;
opendir DURR, $FileLocation or die qq{Unable to open directory "$FileLocation": $!};
while ( my $file = readdir DURR ) {
    for my $pattern ( @MasterScipt ) {
        print "DARE TO COMPARE\n";
        print "$pattern\n";
        print "$file\n";
        if ( $file =~ /\Q$pattern/ ) {
            print "WE GOOD VINCENT\n";
        }
    }
}
closedir(DURR);
But this is a simple grep operation and it can be written as such. This alternative builds a single regular expression that will match any of the items in @MasterScipt and uses grep to build a list of all values returned by readdir that match it.
s/\s+\z// for @MasterScipt;
my @matches = do {
    my $re = join '|', map quotemeta, @MasterScipt;
    opendir my $dh, $FileLocation or die qq{Unable to open directory "$FileLocation": $!};
    grep { /$re/ } readdir $dh;
};
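One thing to be aware of: an unanchored alternation also matches names that merely contain one of the expected names. Here is a small sketch of an anchored variant, assuming you want whole-name matches only. The two arrays are invented stand-ins for @MasterScipt and the readdir results.

```perl
use strict;
use warnings;

# Hypothetical data standing in for @MasterScipt and the readdir results.
my @expected = ( "Example's of file names (20160517) [test].sh", "other [x].sh" );
my @on_disk  = ( "Example's of file names (20160517) [test].sh", "other [x].sh.bak", "." );

# Quote each name, then anchor the alternation so only whole names match.
my $alternatives = join '|', map { quotemeta } @expected;
my $re = qr/\A(?:$alternatives)\z/;

my @matches = grep { /$re/ } @on_disk;
print "$_\n" for @matches;
```

Here "other [x].sh.bak" is rejected because of the \z anchor; without the anchors it would be reported as a match.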

Related

how to include grep in a regex in perl

so i'm currently stuck on this problem:
1. i declare a constant list, say LIST
2. i want to read through a file, which i do so line by line in a while loop, and if the line has a keyword from LIST, i print the line, or do something with it.
this is what i have currently:
use constant LIST => ('keyword1', 'keyword2', 'keyword3');
sub main{
unless(open(MYFILE, $file_read)){die "Error\n"};
while(<MYFILE>){
my $line = $_;
chomp($line);
if($line =~ m//){#here is where i'm stuck, i want is if $line has either of the keywords
print $line;
}
}
}
What should i do in that if statement to match what i want the program to do? and can i do so without having the $line variable and simply using $_? i only used $line because i thought grep would automatically place the constants in LIST into $_.
Thanks!
The easiest way is to define a quoted regular expression as your constant instead of a list:
use strict;
use warnings;
use autodie; # Will kill program on bad opens, closes, and writes
use feature qw(say); # Better than "print" in most situations
use constant {
LIST => qr/keyword1|keyword2|keyword3/, # Now a regular expression.
FILE_READ => 'file.txt', # You're defining constants, make this one too.
};
open my $read_fh, "<", FILE_READ; # Use scalars for file handles
# This isn't Java. You don't have to define "main" subroutine
while ( my $line = <$read_fh> ) {
chomp $line;
if ( $line =~ LIST ) { #Now I can use the constant as a regex
say $line;
}
}
close $read_fh;
By the way, if you don't use autodie, the standard way of opening a file and failing if it doesn't open is to use the or syntax:
open my $fh, "<", $file_name or die qq(Can't open file "$file_name": $!);
If you have to use a list as a constant, then you can use join to make the regular expression:
use constant LIST => qw( keyword1 keyword2 keyword3 );
...
my $regex = join "|", LIST;
while ( my $line = <$file_fh> ) {
chomp $line;
if ( $line =~ /$regex/ ) {
say $line;
}
}
The join takes a list (in this case, a constant list), and separates each member by the string or character you give it. I hope your keywords contain no special regular expression characters. Otherwise, you need to quote those special characters.
Addendum
my $regex = join '|' => map +quotemeta, LIST; – Zaid
Thanks Zaid. I didn't know about the quotemeta command before. I had been trying various things with \Q and \E, but it started getting too complex.
Another way to do what Zaid did:
my @list = map { quotemeta } LIST;
my $regex = join "|", @list;
The map is a bit difficult for beginners to understand. map takes each element in LIST and runs the quotemeta command against it. This returns a list, which I assign to @list.
Imagine:
use constant LIST => qw( periods.are special.characters in.regular.expressions );
When I run:
my @list = map { quotemeta } LIST;
This returns the list:
my @list = ( 'periods\.are', 'special\.characters', 'in\.regular\.expressions' );
Now, the periods are literal periods instead of special characters in the regular expression. When I run:
my $regex = join "|", @list;
I get:
$regex = 'periods\.are|special\.characters|in\.regular\.expressions';
And that's a valid regular expression.
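To see the difference in practice, here is a short sketch comparing the unquoted and quoted versions. The keywords and the test line are made up for illustration.

```perl
use strict;
use warnings;

my @keywords = qw( periods.are special.characters );

my $raw    = join '|', @keywords;                    # "." still means "any character"
my $quoted = join '|', map { quotemeta } @keywords;  # "." escaped to "\."

my $line = 'periodsXare not what we asked for';

my $raw_hit    = $line =~ /$raw/    ? 1 : 0;  # matches, because "." matches "X"
my $quoted_hit = $line =~ /$quoted/ ? 1 : 0;  # no match, the period is literal

print "raw: $raw_hit, quoted: $quoted_hit\n";
```

The unquoted pattern gives a false positive on "periodsXare", which is exactly the kind of surprise quotemeta avoids.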

Efficiently matching a set of filenames with regex in Perl

I'm using Perl to capture the names of files in some specified folders that have certain words in them. The keywords in those filenames are "offers" or "cleared" and "regup" or "regdn". In other words, one of "offers" or "cleared" AND one of "regup" or "regdn" must appear in the filename to be a positive match. The two words could be in any order and there are characters/words that will appear in front of and behind them. A sample matching filename is:
2day_Agg_AS_Offers_REGDN-09-JUN-11.csv
I have a regex that successfully captures each of the matching filenames as a full path, which is what I wanted, but it seems inelegant and inefficient. Attempts at slightly better code have all failed.
Working approach:
# Get the folder names
my @folders = grep /^\d{2}-/, readdir DIR;
foreach my $folder ( @folders ) {
    # glob the contents of the folder (to get the file names)
    my @contents = <$folder/*>;
    # For each filename in the list, if it matches, print it
    foreach my $item ( @contents ) {
        if ($item =~ /^$folder(?=.*(offers|cleared))(?=.*(regup|regdn)).*csv$/i){
            print "$item\n";
        }
    }
}
Attempt at something shorter/cleaner:
foreach my $folder ( @folders ) {
    # glob the contents of the folder (to get the file names)
    my @contents = <$folder/*>;
    # Seems to determine that there are four matches in each folder
    # but then prints the first matching filename four times
    my $single = join("\n", @contents);
    for ($single =~ /^$folder(?=.*(offers|cleared))(?=.*(regup|regdn)).*csv$/im) {
        print "$&\n";#"Matched: |$`<$&>$'|\n\n";
    }
}
I've tried other formatting with the regex, using other options (/img, /ig, etc.), and sending the output of the regex to an array, but nothing has worked properly. I'm not great with Perl, so I'm positive I'm missing some big opportunities to make this whole procedure more efficient. Thanks!
Collect only these file names which contain offers or cleared AND regup or regdn
my @contents = grep { /offers|cleared/i && /regup|regdn/i } <$folder/*>;
Why would it be shorter or cleaner to use join instead of a loop? I'd say it makes it more complicated. What you seem to be doing is just matching loosely based on the conditions
name contains offers or cleared
name contains regup or regdn
name ends with .csv.
So why not just do this:
if ( $file =~ /offers|cleared/i and
$file =~ /regup|regdn/i and
$file =~ /csv$/i)
You might be interested in something like this:
use strict;
use warnings;
use File::Find;
my $dir = "/some/dir";
my @files;
find(sub { /offers|cleared/i &&
           /regup|regdn/i &&
           /csv$/i && push @files, $File::Find::name }, $dir);
Which would completely exclude the use of readdir and other loops. File::Find is recursive.
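If you want to try the "several simple regexes joined with &&" idea without touching the file system, here is a small sketch on an in-memory list. Only the first file name comes from the question; the other three are invented to show what gets rejected.

```perl
use strict;
use warnings;

# Hypothetical file names; only the first comes from the question.
my @names = (
    '2day_Agg_AS_Offers_REGDN-09-JUN-11.csv',
    'Cleared_regup_summary.csv',
    'Offers_only-09-JUN-11.csv',
    'offers_regdn_notes.txt',
);

# Keep a name only if all three independent conditions hold.
my @wanted = grep {
       /offers|cleared/i
    && /regup|regdn/i
    && /\.csv\z/i
} @names;

print "$_\n" for @wanted;
```

This sketch uses /\.csv\z/ instead of /csv$/ so that the dot is required and a trailing newline is not tolerated; that is a slightly stricter choice than the answers above, not something the question asked for.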

perl hash array reading from files

I'm trying to read multiple files that have the same format and want to make some statistics based on regex.
i.e I want to count similar items that are within the []
NC_013618 NC_013633 ([T(nad6 trnE ,cob trnT ,)])
C_013481 NC_013479 ([T(trnP ,rrnS trnF trnV rrnL nad1 trnI ,)])
NC_013485 NC_003159 ([T(trnC ,trnY ,)])
NC_013554 NC_013254 ([T(trnR ,trnN ,)])
NC_013607 NC_013618 ([T(nad6 trnE ,cob trnT ,)])
the problem is that i'm not getting right values, below is my code:
use strict;
use warnings;
my %data;
@FILES = glob("../mitos-crex/*.out");
foreach my $file (@FILES) {
    local $/ = undef;
    open my $fh, '<', $file;
    $data{$file} = <$fh>;
}
my @t;
my $c = 0;
foreach my $line (keys %data) {
    foreach my $l ($data{$line}) {
        print $l."\n";
        ($t[$c]) = $l =~ m/(\[.*\])/;
        $c++;
    }
}
#the problem is here the counter is not giving the right value
print $c;
my %counts;
$counts{$_}++ for @t;
thanks in advance
First of all, always use strict and use warnings. This measure is vital for all programming, as it will quickly reveal simple problems that you may otherwise overlook or waste time on debugging. This is especially true, and a simple courtesy, if you are asking for others' help with your program.
You seem to have become confused between slurping an entire file into a single string, and into an array of lines. The way you have written it, each element $data{$file} is a single scalar value containing all of the file's data, and then you try to iterate over it with foreach my $l ($data{$line}) { ... }, which executes just once and so only finds the first [...] string in the file.
Ordinarily I would say that you shouldn't read in all of your file data in this way, as the problem is likely to have a better streamed solution, but I don't know what else you want to use the captured data for, so my solution follows your own design.
I think you need to slurp the data into an array, instead of a scalar, and then iterate over that in your loops. You must leave $/ defined so that the file is read in lines, and build an anonymous array with [ <$fh> ]. Then you can iterate over the lines with foreach my $line (@{ $data{$file} }) { ... }
use strict;
use warnings;
my %data;
my @files = glob("../mitos-crex/*.out");
foreach my $file (@files) {
    open my $fh, '<', $file or die $!;
    $data{$file} = [ <$fh> ];
}
my $c = 0;
my @t;
foreach my $file (keys %data) {
    foreach my $line (@{ $data{$file} }) {
        ($t[$c]) = $line =~ /(\[.*\])/;
        $c++;
    }
}
print $c;
my %counts;
$counts{$_}++ for @t;
The counter is giving a correct value. Your problem is that you are slurping the file (reading it all in at once), but then only storing the first value found:
($t[$c]) = $data{$line} =~ m/(\[.*\])/; # only finds first value in file
Either loop over each file properly, and use the above regex for each line, or do something like:
push @t, ($data{$line} =~ m/(\[.*\])/g);
You should always use
use strict;
use warnings;
And solve the errors/warnings that result. Not doing so is a bad idea, and is only hiding the problems in your code -- not solving them.
Also, you should be aware that this statement:
foreach $l ($data{$line}) {
Only iterates once, because each "line" here is an entire file, and $data{$line} is a single scalar value anyway. Moreover, you iterate using $l as an alias, but you still use $data{$line} inside the loop, which makes the loop completely redundant.
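Here is a minimal sketch of the difference between matching a slurped string with and without /g. The heredoc is a stand-in for one slurped file, built from three of the sample lines in the question.

```perl
use strict;
use warnings;

# Hypothetical file contents, slurped into one string as in the question.
my $slurped = <<'END';
NC_013618 NC_013633 ([T(nad6 trnE ,cob trnT ,)])
NC_013485 NC_003159 ([T(trnC ,trnY ,)])
NC_013607 NC_013618 ([T(nad6 trnE ,cob trnT ,)])
END

# Without /g, a list-context match returns only the first capture.
my @first = $slurped =~ /(\[.*\])/;

# With /g it returns every capture; "." does not cross newlines, so each
# line contributes one bracketed item.
my @all = $slurped =~ /(\[.*\])/g;

# Count how often each bracketed item occurs.
my %counts;
$counts{$_}++ for @all;

printf "%d without /g, %d with /g\n", scalar @first, scalar @all;
print "$counts{$_} x $_\n" for sort keys %counts;
```

With the sample above, the first match finds one item and the /g match finds three, two of which are identical and so share one key in %counts.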

String replace in Perl

I am trying to deobfuscate code. This code uses a lot of long variable names which are substituted with meaningful names at the time of running the code.
How do I preserve the state while searching and replacing?
For instance, with an obfuscated line like this:
${${"GLOBALS"}["ttxdbvdj"]}=_hash(${$urqboemtmd}.substr(${${"GLOBALS"}["wkcjeuhsnr"]},${${"GLOBALS"}["gjbhisruvsjg"]}-${$rrwbtbxgijs},${${"GLOBALS"}["ibmtmqedn"]}));
There are multiple mappings in mappings.txt which match above obfuscated line like:
$rrwbtbxgijs = hash_length;
$urqboemtmd = out;
At the first run, it will replace $rrwbtbxgijs with hash_length in the obfuscated line above. Now, when it comes across the second mapping during the next iteration of the outer while loop, it will replace $urqboemtmd with out in the obfuscated line.
The problem is:
When it comes across first mapping, it does the substitution. However, when it comes across next mapping in the same line for a different matching string, the previous search/replace result is not there.
It should preserve the previous substitution. How do I do that?
I wrote a Perl script, which would pick one mapping from mapping.txt and search the entire obfuscated code for all the occurrences of this mapping and replace it with the meaningful text.
Here is the code I wrote:
#! /usr/bin/perl
use warnings;
($mapping, $input) = @ARGV;
open MAPPING, '<', $mapping
or die "couldn't read from the file, $mapping with error: $!\n";
while (<MAPPING>) {
chomp;
$line = $_;
($key, $value) = split("=", $line);
open INPUT, '<', $input;
while (<INPUT>) {
chomp;
if (/$key/) {
$_=~s/\Q$key/$value/g;
print $_,"\n";
}
}
close INPUT;
}
close MAPPING;
To match the literal meta characters inside your string, you can use quotemeta or:
s/\Q$key\E/$replace/
Just tell Perl not to interpret the characters in $key:
s/\Q$key/$value/g
Consider using B::Deobfuscate and gradually enter variable names into its configuration file as you figure out what they do.
I'm a little confused about your request to save state. What exactly are you doing/do you intend to do with the output? Here's an (untested) example of doing all the substitutions in one pass, if that helps?
my %map;
while ( my $line = <MAPPING> ) {
chomp $line;
my ($key, $value) = split("=", $line);
$map{$key} = $value;
}
close MAPPING;
my $search = qr/(@{[ join '|', map quotemeta, sort { length $b <=> length $a } keys %map ]})/;
while ( my $line = <INPUT> ) {
$line =~ s/$search/$map{$1}/g;
print OUTPUT $line;
}
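Since the example above is untested and relies on filehandles that are opened elsewhere, here is a self-contained sketch of the same single-pass idea on an in-memory string. The hash is an assumption about what the mappings look like once parsed (whitespace and the trailing semicolon trimmed), and the obfuscated line is a shortened, made-up version of the one in the question.

```perl
use strict;
use warnings;

# Hypothetical mappings, as they might look after parsing mappings.txt
# and trimming whitespace and the trailing semicolon.
my %map = (
    '$rrwbtbxgijs' => 'hash_length',
    '$urqboemtmd'  => 'out',
);

# Longest keys first, so a short key cannot pre-empt a longer one that
# starts with the same characters.
my $alternatives = join '|', map { quotemeta }
    sort { length $b <=> length $a } keys %map;
my $search = qr/($alternatives)/;

my $line = '_hash(${$urqboemtmd}.substr($x,$y-${$rrwbtbxgijs}));';

# Every key is replaced in the same pass, so no substitution is lost.
( my $clean = $line ) =~ s/$search/$map{$1}/g;

print "$clean\n";
```

Because the whole line is rewritten once and printed once, there is no "state" to preserve between mappings, which sidesteps the problem of re-reading the input file for every key.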

How to Circumvent Perl's string escaping the replacement string in s/// when it's read from a file?

This question is similar to my last one, with one difference to make the toy script more similar to my actual one.
Here is the toy script, replace.pl (Edit: now with 'use strict;', etc)
#! /usr/bin/perl -w
use strict;
open(REPL, "<", $ARGV[0]) or die "Couldn't open $ARGV[0]: $!!";
my %replacements;
while(<REPL>) {
chomp;
my ($orig, $new, @rest) = split /,/;
# Processing+sanitizing of orig/new here
$replacements{$orig} = $new;
}
close(REPL) or die "Couldn't close '$ARGV[0]': $!";
print "Performing the following replacements\n";
while(my ($k,$v) = each %replacements) {
print "\t$k => $v\n";
}
open(IN, "<", $ARGV[1]) or die "Couldn't open $ARGV[1]: $!!";
while ( <IN> ) {
while(my ($k,$v) = each %replacements) {
s/$k/$v/gee;
}
print;
}
close(IN) or die "Couldn't close '$ARGV[1]': $!";
So, now lets say I have two files, replacements.txt (using the best answer from the last question, plus a replacement pair that doesn't use substitution):
(f)oo,q($1."ar")
cat,hacker
and test.txt:
foo
cat
When I run perl replace.pl replacements.txt test.txt I would like the output to be
far
hacker
but instead it's '$1."ar"' (too much escaping), and the results are anything but what I want (even with the other suggestions from that answer for the replacement string). The foo turns into ar, and the cat/hacker is eval'd to the empty string, it seems.
So, what changes do I need to make to replace.pl and/or replacements.txt? Other people will be creating the replacements.txt's, so I'd like to make that file as simple as possible (although I acknowledge that I'm opening the regex can of worms on them).
If this isn't possible to do in one step, I'll use macros to enumerate all possible replacement pairs for this particular file, and hope the issue doesn't come up again.
Please don't give us non-working toy scripts that don't use strict and warnings. Because one of the first things people will do in debugging is to turn those on, and you've just caused work.
Second tip, use the 3-argument version of open rather than the 2-argument version. It is safer. Also in your error checking do as perlstyle says (see http://perldoc.perl.org/perlstyle.html for the full advice) and include the file name and $!.
Anyways your problem is that the code you were including was q($1."ar"). When executed this returns the string $1."ar". Get rid of the q() and it works fine. BUT it causes warnings. That can be fixed by moving the quoting into the replace script, and out of the original script.
Here is a fixed script for you:
#! /usr/bin/perl -w
use strict;
open(REPL, "<", $ARGV[0]) or die "Couldn't open '$ARGV[0]': $!!";
my %replacements;
while(<REPL>) {
chomp;
my ($orig, $new) = split /,/;
# Processing+sanitizing of orig/new here
$replacements{$orig} = '"' . $new . '"';
}
close(REPL) or die "Couldn't close '$ARGV[0]': $!";
print "Performing the following replacements\n";
while(my ($k,$v) = each %replacements) {
print "\t$k => $v\n";
}
open(IN, "<", $ARGV[1]) or die "Couldn't open '$ARGV[1]': $!!";
while ( <IN> ) {
while(my($k,$v) = each %replacements) {
s/$k/$v/gee;
}
print;
}
close(IN) or die "Couldn't close '$ARGV[1]': $!";
And the modified replacements.txt is:
(f)oo,${1}ar
cat,hacker
You have introduced one more level of interpolation since the last question.
You can get the right result by either:
Lay a 3rd "e" modifier on your substitution
s/$k/$v/geee; # eeek
Remove a layer of interpolation in replacements.txt by making the first line
(f)oo,$1."ar"
Get rid of the q() in the replacement string;
Should be just
(f)oo,$1."ar"
as in ($k,$v) = split /,/, $_;
Warning: using external input data in evals is very, very dangerous
Or, just make it
(f)oo,"${1}ar"
No modification to the code is necessary either way e.g. s///gee.
Edit @drhorrible, if it doesn't work then you have other problems.
use strict;use warnings;
my $str = "foo";
my $repl = '(f)oo,q(${1}."ar")';
my ($k,$v) = split /,/, $repl;
$str =~ s/$k/$v/gee;
print $str,"\n";
$str = "foo";
$repl = '(f)oo,$1."ar"';
($k,$v) = split /,/, $repl;
$str =~ s/$k/$v/gee;
print $str,"\n";
$str = "foo";
$repl = '(f)oo,"${1}ar"';
($k,$v) = split /,/, $repl;
$str =~ s/$k/$v/gee;
print $str,"\n";
output:
${1}."ar"
far
far
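To make the layers of evaluation a bit more visible, here is a small sketch that runs the same substitution with no /e and with /ee. The pattern and replacement are hard-coded here as an assumption about what one line of replacements.txt would contain after the split.

```perl
use strict;
use warnings;

my $pattern     = '(f)oo';
my $replacement = '"${1}ar"';   # text as it would arrive from replacements.txt

# No /e: $replacement is interpolated once and inserted as-is.
( my $plain = 'foo' ) =~ s/$pattern/$replacement/;

# /ee: the first "e" evaluates the expression $replacement, giving the
# string "${1}ar" (with quotes); the second "e" evals that string as Perl
# code, which is where $1 finally gets interpolated.
( my $twice = 'foo' ) =~ s/$pattern/$replacement/ee;

print "plain: $plain\n";
print "twice: $twice\n";
```

The second eval runs whatever code the replacement text contains, so this is only reasonable when the replacements file comes from someone you trust, as the warning above says.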