how to include grep in a regex in perl - regex

so i'm currently stuck on this problem:
1. i declare a constant list, say LIST
2. i want to read through a file, which i do line by line in a while loop, and if the line has a keyword from LIST, i print the line, or do something with it.
this is what i have currently:
use constant LIST => ('keyword1', 'keyword2', 'keyword3');
sub main{
unless(open(MYFILE, $file_read)){die "Error\n"};
while(<MYFILE>){
my $line = $_;
chomp($line);
if($line =~ m//){#here is where i'm stuck, i want is if $line has either of the keywords
print $line;
}
}
}
What should i do in that if statement to match what i want the program to do? and can i do so without having the $line variable and simply using $_? i only used $line because i thought grep would automatically place the constants in LIST into $_.
Thanks!

The easiest way is to define a quoted regular expression as your constant instead of a list:
use strict;
use warnings;
use autodie; # Will kill program on bad opens, closes, and writes
use feature qw(say); # Better than "print" in most situations
use constant {
LIST => qr/keyword1|keyword2|keyword3/, # Now a regular expression.
FILE_READ => 'file.txt', # You're defining constants, make this one too.
};
open my $read_fh, "<", FILE_READ; # Use scalars for file handles
# This isn't Java. You don't have to define "main" subroutine
while ( my $line = <$read_fh> ) {
chomp $line;
if ( $line =~ LIST ) { #Now I can use the constant as a regex
say $line;
}
}
close $read_fh;
By the way, if you don't use autodie, the standard way of opening a file and failing if it doesn't open is to use the or syntax:
open my $fh, "<", $file_name or die qq(Can't open file "$file_name": $!);
If you have to use a list as a constant, then you can use join to make the regular expression:
use constant LIST => qw( keyword1 keyword2 keyword3 );
...
my $regex = join "|", LIST;
while ( my $line = <$file_fh> ) {
chomp $line;
if ( $line =~ /$regex/ ) {
say $line;
}
}
The join takes a list (in this case, a constant list), and separates each member by the string or character you give it. I hope your keywords contain no special regular expression characters. Otherwise, you need to quote those special characters.
Addendum
my $regex = join '|' => map +quotemeta, LIST; – Zaid
Thanks Zaid. I didn't know about the quotemeta command before. I had been trying various things with \Q and \E, but it started getting too complex.
Another way to do what Zaid did:
my @list = map { quotemeta } LIST;
my $regex = join "|", @list;
The map is a bit difficult for beginners to understand. map takes each element in LIST and runs the quotemeta command against it. This returns a list, which I assign to @list.
Imagine:
use constant LIST => qw( periods.are special.characters in.regular.expressions );
When I run:
my @list = map { quotemeta } LIST;
This returns the list:
my @list = ( 'periods\.are', 'special\.characters', 'in\.regular\.expressions' );
Now, the periods are literal periods instead of special characters in the regular expression. When I run:
my $regex = join "|", @list;
I get:
$regex = 'periods\.are|special\.characters|in\.regular\.expressions';
And that's a valid regular expression.
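Putting the pieces together, here is a minimal sketch of the whole approach: escape each keyword, join with "|", compile once with qr//, and filter lines with grep (which sets $_ to each line, so no $line variable is needed). The keyword list and the sample lines are made up for illustration; in a real script the lines would come from the file handle.

```perl
use strict;
use warnings;

# Hypothetical keyword list; note the literal period in the second one.
use constant LIST => qw( keyword1 key.word2 keyword3 );

# Escape each keyword, join with "|", and compile the result once.
my $alternation = join '|', map { quotemeta($_) } (LIST);
my $regex       = qr/$alternation/;

# Hypothetical sample lines standing in for the file contents.
my @lines = (
    "nothing to see here\n",
    "this line has keyword1 in it\n",
    "key.word2 appears literally\n",
    "keyXword2 must not match\n",
);

# grep aliases $_ to each line in turn and keeps the ones that match.
my @matched = grep { /$regex/ } @lines;
print @matched;
```

Because of quotemeta, "keyXword2" is not matched: the period in key.word2 is treated as a literal period rather than "any character".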

Related

Load regex from file and match groups with it in Perl

I have a file containing regular expressions, e.g.:
City of (.*)
(.*) State
Now I want to read these (line by line), match them against a string, and print out the extraction (matched group). For example: The string City of Berlin should match with the first expression City of (.*) from the file, after that Berlin should be extracted.
This is what I've got so far:
use warnings;
use strict;
my @pattern;
open(FILE, "<pattern.txt"); # open the file described above
while (my $line = <FILE>) {
push @pattern, $line; # store it inside the @pattern variable
}
close(FILE);
my $exampleString = "City of Berlin"; # line that should be matched and
# Berlin should be extracted
foreach my $p (@pattern) { # try each pattern
if (my ($match) = $exampleString =~ /$p/) {
print "$match";
}
}
I want Berlin to be printed.
What happens with the regex inside the foreach loop?
Is it not compiled? Why?
Is there even a better way to do this?
Your patterns contain a newline character which you need to chomp:
while (my $line = <FILE>) {
chomp $line;
push @pattern, $line;
}
First off - the missing chomp is the root of your problem.
However secondly - your code is also very inefficient. Rather than checking patterns in a foreach loop, consider instead compiling a regex in advance:
#!/usr/bin/env perl
use strict;
use warnings;
# open ( my $pattern_fh, '<', "pattern.txt" ) or die $!;
my @patterns = <DATA>;
chomp(@patterns);
my $regex = join( '|', @patterns );
$regex = qr/(?:$regex)/;
print "Using regex of: $regex\n";
my $example_str = 'City of Berlin';
if ( my ($match) = $example_str =~ m/$regex/ ) {
print "Matched: $match\n";
}
__DATA__
City of (.*)
(.*) State
Why is this better? Well, because it scales more efficiently. With your original algorithm - if I have 100 lines in the patterns file, and 100 lines to check as example str, it means making 10,000 comparisons.
With a single regex, you're making one comparison on each line.
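Note - normally you'd run quotemeta over strings read from a file before interpolating them into a regex, which escapes any 'meta' characters. We don't want to do that here, because the file contains regular expressions rather than literal strings.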
If you're looking for even more concise, you can use map to avoid needing an intermediate array:
my $regex = join( '|', map { chomp; $_ } <$pattern_fh> );
$regex = qr/(?:$regex)/;
print "Using regex of: $regex\n";
my $example_str = 'City of Berlin';
if ( my ($match) = $example_str =~ m/$regex/ ) {
print "Matched: $match\n";
}
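One caveat with joining patterns that each contain a capture group: inside (?:A|B) the groups are numbered left to right across all alternatives, so a match on the second pattern puts its capture in $2 and leaves $1 undefined. A possible fix, sketched here under the assumption that Perl 5.10 or later is available, is a branch-reset group (?|...), which restarts the numbering in every alternative. The patterns and test strings below are illustrative.

```perl
use strict;
use warnings;

# Hypothetical patterns, as they would look after being read from
# pattern.txt and chomped.
my @patterns = ( 'City of (.*)', '(.*) State' );

# (?|...) is a branch-reset group: every alternative numbers its
# captures from 1, so $1 is set whichever branch matched.
my $joined = join '|', @patterns;
my $regex  = qr/(?|$joined)/;

my @found;
for my $str ( 'City of Berlin', 'Texas State' ) {
    if ( $str =~ $regex ) {
        push @found, $1;
        print "Matched: $1\n";
    }
}
```

This only works cleanly when every pattern in the file has the same number of capture groups; otherwise the per-pattern loop from the question is easier to reason about.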

Pattern match in perl

my $line = "Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,";
my $name = "";
@name = ( $line =~ m/Name:([\w\s\_\,/g );
foreach (@name) {
print $name."\n";
}
I want to capture the word between Name: and ,Region wherever it occurs in the whole line. The main loophole is that the name can be of any format
Amanda_Marry_Rose
Amanda.Marry.Rose
Amanda Marry Rose
Amanda/Marry/Rose
I need a help in capturing such a pattern every time it occurs in the line. So for the line I provided, the output should be
Amanda_Marry_Rose
Raghav.S.Thomas
Does anyone have any idea how to do this? I tried the line below, but it gives me the wrong output.
@name=($line=~m/Name:([\w\s\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~\´]+)\,/g);
Output
Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE
To capture between Name: and the first comma, use a negated character class:
/Name:([^,]+)/g
This says to match one or more characters following Name: which isn't a comma:
while (/Name:([^,]+)/g) {
print $1, "\n";
}
This is more efficient than a non-greedy quantifier, e.g:
/Name:(.+?),/g
because the negated character class never has to backtrack.
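As a quick check, the following sketch runs both forms against a shortened version of the sample line (the shortening is mine) and collects the captures in list context. Both return the same names; the difference is only in how the regex engine gets there.

```perl
use strict;
use warnings;

# Shortened, illustrative version of the line from the question.
my $line = 'Name:Amanda_Marry_Rose,Region:US,Name:Raghav.S.Thomas,Region:UAE,';

# Negated class: consume everything up to, but not including, a comma.
my @negated = $line =~ /Name:([^,]+)/g;

# Non-greedy: grow the capture one character at a time until a comma follows.
my @lazy = $line =~ /Name:(.+?),/g;

print "$_\n" for @negated;
```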
Reg-ex corrected:
my $line = "Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,";
my @name = ($line =~ /Name\:([\w\s_.\/]+)\,/g);
foreach my $name (@name) {
print $name."\n";
}
What you have there is comma separated data. How you should parse this depends a lot on your data. If it is full-fledged csv data, the safest approach is to use a proper csv parser, such as Text::CSV. If it is less strict data, you can get away with using the light-weight parser Text::ParseWords, which also has the benefit of being a core module in Perl 5. If what you have here is rather basic, user entered fields, then I would recommend split -- simply because when you know the delimiter, it is easier and safer to define it, than everything else inside it.
use strict;
use warnings;
use Data::Dumper;
my $line = "Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,";
# Simple split
my @fields = split /,/, $line;
print Dumper map /^Name:(.*)/, @fields;
use Text::ParseWords;
print Dumper map /^Name:(.*)/, quotewords(',', 0, $line);
use Text::CSV;
my $csv = Text::CSV->new({
binary => 1,
});
$csv->parse($line);
print Dumper map /^Name:(.*)/, $csv->fields;
Each of these options gives the same output, save for the one that uses Text::CSV, which also issues an undefined warning, quite correctly, because your data has a trailing comma (meaning an empty field at the end).
Each of these has different strengths and weaknesses. Text::CSV can choke on data that does not conform with the CSV format, and split cannot handle embedded commas, such as Name:"Doe, John",....
The regex we use to extract the names very simply just captures the entire rest of the lines that begin with Name:. This also allows you to perform sanity checks on the field names, for example issue a warning if you suddenly find a field called Doe;Name:
The simple way is to look for all sequences of non-comma characters after every instance of Name: in the string.
use strict;
use warnings;
my $line = 'Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,';
my @names = $line =~ /Name:([^,]+)/g;
print "$_\n" for @names;
output
Amanda_Marry_Rose
Raghav.S.Thomas
However, it may well be useful to parse the data into an array of hashes so that related fields are gathered together.
use strict;
use warnings;
my $line = 'Name:Amanda_Marry_Rose,Region:US,host:USE,cardType:DebitCard,product:Satin,Name:Raghav.S.Thomas,Region:UAE,';
my %info;
my @persons;
while ( $line =~ / ([a-z]+) : ([^:,]+) /gix ) {
my ($key, $val) = (lc $1, $2);
if ($info{$key}) {
push @persons, { %info };
%info = ();
}
$info{$key} = $val;
}
push @persons, { %info };
use Data::Dump;
dd \@persons;
print "\nNames:\n";
print "$_\n" for map $_->{name}, @persons;
output
[
{
cardtype => "DebitCard",
host => "USE",
name => "Amanda_Marry_Rose",
product => "Satin",
region => "US",
},
{
name => "Raghav.S.Thomas",
region => "UAE",
},
]
Names:
Amanda_Marry_Rose
Raghav.S.Thomas

Pass regex into perl subroutine

The Situation
I am in the process of creating a simple template file that will aid in creating future scripts for doing various tasks via command line on *nix systems. As part of this, I might like to ask the user to input data which needs to be validated against a regular expression that is supplied in the source code.
The Issue
Errors are being generated when I attempt to run the Perl code via command line. I am attempting to pass a regular expression into the repeat subroutine and I'm not sure exactly how to do this. I am aware that I can execute a string using eval, however this is something that I would like to avoid due to convention.
The errors:
Use of uninitialized value $_ in pattern match (m//) at scripts/template line 40.
Use of uninitialized value $resp in concatenation (.) or string at scripts/template line 37.
The code:
#!/usr/bin/env perl
use strict;
use warnings;
use Cwd;
use Term::ANSIColor;
use Data::Dumper;
my $log = "template.log";
my $task = "template";
my $cwd = getcwd();
my $logPath = $cwd . "/". $log;
print ucfirst($task) . " utility starting...\n";
system("cd ~/Desktop");
system("touch " . $log);
&writeLog("Test");
sub writeLog {
open(my $fh, '>>', $logPath) or die "Could not open file '$log' $!";
print $fh $_[0] . localtime() . "\n";
close $fh;
return 1;
}
sub ask {
my $question = $_[0];
my $input = $_[1];
my $resp = <>;
chomp($resp);
}
sub repeat {
my $pat = $_[0];
my $resp = $_[1];
print $pat . "\n";
print $resp . "\n";
}
&repeat(/foo|bar/i, "y");
What I have tried:
Based on these sources:
Match regex and assign results in single line of code
How to assign result of a regex match to a new variable, in a single line?
sub repeat {
my $pat =~ $_[0];
my $resp = $_[1];
if($pat !~ $resp) {
print "foo\n";
} else {
print "bar\n";
}
}
Any help is appreciated!
To create a regular expression for use later, we use qr//:
my $regexp = qr/^Perl$/;
This compiles the regular expression for use later. If there's a problem with your regular expression, you'll hear about it immediately. To use this pre-compiled regular expression you can use any of the following:
# See if we have a match
$string =~ $regexp;
# A simple substitution
$string =~ s/$regexp/Camel/;
# Comparing against $_
/$regexp/;
A bare regex literal like /.../ matches against $_. To create an independent regex object, use qr// quotes:
repeat(qr/foo|bar/i, "y");
(and please don't invoke subs like &sub unless you know when and why this is necessary.)
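For the receiving side, here is a minimal sketch of a subroutine that takes a compiled pattern as its first argument and a response as its second. The name is_valid and the yes/no pattern are hypothetical, not part of the original template.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when $resp matches the compiled
# pattern passed in as the first argument, 0 otherwise.
sub is_valid {
    my ( $pat, $resp ) = @_;
    return $resp =~ $pat ? 1 : 0;
}

# qr// builds a regex object that can be stored and passed around
# like any other scalar.
my $yes_no = qr/^(?:y|n)$/i;

print is_valid( $yes_no, "Y" )     ? "ok\n" : "rejected\n";
print is_valid( $yes_no, "maybe" ) ? "ok\n" : "rejected\n";
```

Inside the sub the pattern is used on the right-hand side of =~, exactly as it would be with a literal regex, so no eval is needed.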

String replace in Perl

I am trying to deobfuscate code. This code uses a lot of long variable names which are substituted with meaningful names at the time of running the code.
How do I preserve the state while searching and replacing?
For instance, with an obfuscated line like this:
${${"GLOBALS"}["ttxdbvdj"]}=_hash(${$urqboemtmd}.substr(${${"GLOBALS"}["wkcjeuhsnr"]},${${"GLOBALS"}["gjbhisruvsjg"]}-${$rrwbtbxgijs},${${"GLOBALS"}["ibmtmqedn"]}));
There are multiple mappings in mappings.txt which match above obfuscated line like:
$rrwbtbxgijs = hash_length;
$urqboemtmd = out;
At the first run, it will replace $rrwbtbxgijs with hash_length in the obfuscated line above. Now, when it comes across the second mapping during the next iteration of the outer while loop, it will replace $urqboemtmd with out in the obfuscated line.
The problem is:
When it comes across first mapping, it does the substitution. However, when it comes across next mapping in the same line for a different matching string, the previous search/replace result is not there.
It should preserve the previous substitution. How do I do that?
I wrote a Perl script, which would pick one mapping from mapping.txt and search the entire obfuscated code for all the occurrences of this mapping and replace it with the meaningful text.
Here is the code I wrote:
#! /usr/bin/perl
use warnings;
($mapping, $input) = @ARGV;
open MAPPING, '<', $mapping
or die "couldn't read from the file, $mapping with error: $!\n";
while (<MAPPING>) {
chomp;
$line = $_;
($key, $value) = split("=", $line);
open INPUT, '<', $input;
while (<INPUT>) {
chomp;
if (/$key/) {
$_=~s/\Q$key/$value/g;
print $_,"\n";
}
}
close INPUT;
}
close MAPPING;
To match the literal meta characters inside your string, you can use quotemeta or:
s/\Q$key\E/$replace/
Just tell Perl not to interpret the characters in $key:
s/\Q$key/$value/g
Consider using B::Deobfuscate and gradually enter variable names into its configuration file as you figure out what they do.
I'm a little confused about your request to save state. What exactly are you doing/do you intend to do with the output? Here's an (untested) example of doing all the substitutions in one pass, if that helps?
my %map;
while ( my $line = <MAPPING> ) {
chomp $line;
my ($key, $value) = split("=", $line);
$map{$key} = $value;
}
close MAPPING;
my $search = qr/(@{[ join '|', map quotemeta, sort { length $b <=> length $a } keys %map ]})/;
while ( my $line = <INPUT> ) {
$line =~ s/$search/$map{$1}/g;
print OUTPUT $line;
}
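Here is a self-contained sketch of the same single-pass idea that can be run without any files. The hash contents and the sample line are illustrative, and it assumes the keys have already been trimmed of surrounding whitespace and the trailing ";" found in mappings.txt.

```perl
use strict;
use warnings;

# Hypothetical mappings, already split and trimmed.
my %map = (
    '$rrwbtbxgijs' => 'hash_length',
    '$urqboemtmd'  => 'out',
);

# Longest keys first, so a key that is a prefix of another cannot win.
# quotemeta escapes the "$" so it is matched literally.
my $alternation = join '|', map { quotemeta($_) }
                  sort { length $b <=> length $a } keys %map;
my $search = qr/($alternation)/;

# Illustrative obfuscated line containing both variables.
my $line = 'x = ${$urqboemtmd} . ${$rrwbtbxgijs};';

# One substitution replaces every key, so nothing is lost between passes.
( my $clean = $line ) =~ s/$search/$map{$1}/g;
print "$clean\n";
```

Because every mapping is applied to the line in a single s///g, there is no earlier result to preserve: each line is read once and written once.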

Perl substitution using a hash

open (FH,"report");
read(FH,$text,-s "report");
$fill{"place"} = "Dhahran";
$fill{"wdesc:desc"} = "hot";
$fill{"dayno.days"} = 4;
$text =~ s/%(\w+)%/$fill{$1}/g;
print $text;
This is the content of the "report" template file
"I am giving a course this week in %place%. The weather is %wdesc:desc%
and we're now onto day no %dayno.days%. It's great group of blokes on the
course but the room is like the weather - %wdesc:desc% and it gets hard to
follow late in the day."
For reasons that I won't go into, some of the keys in the hash I'll be using will have dots (.) or colons (:) in them, but the regex stops working for these, so for instance in the example above only %place% gets correctly replaced. By the way, my code is based on this example.
Any help with the regex greatly appreciated, or maybe there's a better approach...
You could loosen it right up and use "any sequence of anything that isn't a %" for the replaceable tokens:
$text =~ s/%([^%]+)%/$fill{$1}/g;
Good answers so far, but you should also decide what you want to do with %foo% if foo isn't a key in the %fill hash. Plausible options are:
Replace it with an empty string (that's what the current solutions do, since undef is treated as an empty string in this context)
Leave it alone, so "%foo%" stays as it is.
Do some kind of error handling, perhaps printing a warning on STDERR, terminating the translation, or inserting an error indicator into the text.
Some other observations, not directly relevant to your question:
You should use the three-argument version of open.
That's not the cleanest way to read an entire file into a string. For that matter, for what you're doing you might as well process the input one line at a time.
Here's how I might do it (this version leaves unrecognized "%foo%" strings alone):
#!/usr/bin/perl
use strict;
use warnings;
my %fill = ( place => 'Dhahran',
'wdesc:desc' => 'hot',
'dayno.days' => 4 );
my $filename = 'report';
open my $FH, '<', $filename or die "$filename: $!\n";
while (my $line = <$FH>) {
foreach my $key (keys %fill) {
$line =~ s/\Q%$key%/$fill{$key}/g;
}
print $line;
}
And here's a version that dies with an error message if there's an unrecognized key:
#!/usr/bin/perl
use strict;
use warnings;
my %fill = ( place => 'Dhahran',
'wdesc:desc' => 'hot',
'dayno.days' => 4 );
my $filename = 'report';
open my $FH, '<', $filename or die "$filename: $!\n";
while (my $line = <$FH>) {
$line =~ s/%([^%]*)%/Replacement($1)/eg;
print $line;
}
sub Replacement {
my($key) = @_;
if (exists $fill{$key}) {
return $fill{$key};
}
else {
die "Unrecognized key \"$key\" on line $.\n";
}
}
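A middle ground between the two versions is to warn about an unrecognized key and leave the placeholder in the text. The sketch below assumes that behaviour is acceptable; the sample text, the replacement sub name and the @unknown array are made up for illustration.

```perl
use strict;
use warnings;

# Illustrative subset of the hash; 'dayno.days' is left out on purpose.
my %fill = ( place => 'Dhahran', 'wdesc:desc' => 'hot' );

my @unknown;    # keys seen in the text but missing from %fill

sub replacement {
    my ($key) = @_;
    return $fill{$key} if exists $fill{$key};
    push @unknown, $key;
    warn "Unrecognized key \"$key\"\n";    # goes to STDERR
    return "%$key%";                       # put the placeholder back unchanged
}

my $text = "In %place% it is %wdesc:desc%, day %dayno.days%.\n";
$text =~ s/%([^%]*)%/replacement($1)/eg;
print $text;
```

Collecting the unknown keys in an array makes it possible to report them all at the end instead of stopping at the first one.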
http://codepad.org/G0WEDNyH
$text =~ s/%([a-zA-Z0-9_\.\:]+)%/$fill{$1}/g;
By default \w equates to [a-zA-Z0-9_], so you'll need to add in the \. and \:.