Pass regex into perl subroutine - regex

The Situation
I am in the process of creating a simple template file that will aid in creating future scripts for doing various tasks via command line on *nix systems. As part of this, I might like to ask the user to input data which needs to be validated against a regular expression that is supplied in the source code.
The Issue
Errors are being generated when I attempt to run the Perl code via command line. I am attempting to pass a regular expression into the repeat subroutine and I'm not sure exactly how to do this. I am aware that I can execute a string using eval, however this is something that I would like to avoid due to convention.
The errors:
Use of uninitialized value $_ in pattern match (m//) at scripts/template line 40.
Use of uninitialized value $resp in concatenation (.) or string at scripts/template line 37.
The code:
#!/usr/bin/env perl
use strict;
use warnings;
use Cwd;
use Term::ANSIColor;
use Data::Dumper;
my $log = "template.log";
my $task = "template";
my $cwd = getcwd();
my $logPath = $cwd . "/". $log;
print ucfirst($task) . " utility starting...\n";
system("cd ~/Desktop");
system("touch " . $log);
&writeLog("Test");
sub writeLog {
open(my $fh, '>>', $logPath) or die "Could not open file '$log' $!";
print $fh $_[0] . localtime() . "\n";
close $fh;
return 1;
}
sub ask {
my $question = $_[0];
my $input = $_[1];
my $resp = <>;
chomp($resp);
}
sub repeat {
my $pat = $_[0];
my $resp = $_[1];
print $pat . "\n";
print $resp . "\n";
}
&repeat(/foo|bar/i, "y");
What I have tried:
Based on these sources:
Match regex and assign results in single line of code
How to assign result of a regex match to a new variable, in a single line?
sub repeat {
my $pat =~ $_[0];
my $resp = $_[1];
if($pat !~ $resp) {
print "foo\n";
} else {
print "bar\n";
}
}
Any help is appreciated!

To create a regular expression for use later, we use qr//:
my $regexp = qr/^Perl$/;
This compiles the regular expression for use later. If there's a problem with your regular expression, you'll hear about it immediately. To use this pre-compiled regular expression you can use any of the following:
# See if we have a match
$string =~ $regexp;
# A simple substitution
$string =~ s/$regexp/Camel/;
# Comparing against $_
/$regexp/;

A bare regex literal like /.../ matches against $_. To create an independent regex object, use qr// quotes:
repeat(qr/foo|bar/i, "y");
(and please don't invoke subs like &sub unless you know when and why this is necessary.)
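As a minimal sketch of how a compiled pattern can travel into a subroutine: the helper name matches and the yes/no pattern below are illustrative assumptions, not part of the original template.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when $resp matches the compiled pattern.
sub matches {
    my ($pat, $resp) = @_;
    return $resp =~ $pat ? 1 : 0;
}

# Compiled once; the /i flag travels with the regex object.
my $yes_no = qr/^(?:y|n)$/i;

for my $resp ('y', 'N', 'maybe') {
    printf "%-5s => %s\n", $resp, matches($yes_no, $resp) ? 'ok' : 'rejected';
}
```

Because qr// returns an ordinary scalar, it is passed through @_ like any other argument and no string eval is involved.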

Related

How to match strings with regex pattern like [aaa-bbb.com] in perl

I have file which have domain name inside "[" "]" brackets. I want to check whether a particular domain name is present or not.
sub main
{
my $file = '/home/deeps/sample.txt';
open(FH, $file) or die("File not found");
my $host = "deeps-cet.helll.com";
my $match_patt = "\[$host\]";
while (my $String = <FH>)
{
if($String =~ $match_patt)
{
print "match";
}
}
close(FH);
}
main();
The above code throws error - Invalid [] range "s-c" in regex. help to resolve it.
Use quotemeta to escape ASCII non-"word" characters, that could have a special meaning in the regex
my $match_patt = quotemeta "[$host]";
Or use \Q escape right in the regex, implemented using quotemeta. See docs.
What happens in your code is that the well-meant escape of the bracket, \[, is evaluated already under the double quotes when you form the pattern, so after "\[$host\]" is assigned to $match_patt then that variable ends up with the string [deeps-cet.helll.com]. These [] are treated as the range operator in the regex and fail because of the "backwards" s-c range.†
This can be seen with the pattern built using non-interpolating single quotes for \[
my $match_patt = '\[' . $host . '\]';
which now works. But of course it is in principle better to use quotemeta.
† This is really lucky -- if the range were valid, like ac-sb.etc, then this would be a legitimate pattern inside [] which would silently do completely wrong things.
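The difference can be seen by printing both strings; this is only a sketch using the host name from the question, with eval used to trap the compile error so the script keeps running.

```perl
use strict;
use warnings;

my $host = 'deeps-cet.helll.com';

my $unescaped = "[$host]";            # what "\[$host\]" collapses to in double quotes
my $escaped   = quotemeta "[$host]";  # every non-word character gets a backslash

print "$unescaped\n";
print "$escaped\n";

# Compiling the unescaped string fails at run time; eval traps the error.
my $ok = eval { my $re = qr/$unescaped/; 1 };
print "unescaped pattern: ", ($ok ? "compiled\n" : "failed: $@");

print "escaped pattern matches\n" if '[deeps-cet.helll.com]' =~ /$escaped/;
```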
Below is the corrected code. If you plan to keep a regular expression in a variable, compile it with my $regex = qr/..../, and when you match, use the construction $variable =~ /$regex/;
use strict;
use warnings;
use feature 'say';
my $fname = shift || '/home/deeps/sample.txt';
my $host = shift || 'deeps-cet.helll.com';
search($fname,$host);
sub search {
my $fname = shift;
my $host = shift;
my $regex = qr/\[\Q$host\E\]/; # \Q...\E keeps the dots in $host literal
open my $fh, '<', $fname
or die "Can't open $fname";
/$regex/ && say "match" while <$fh>;
close $fh;
}

Simple perl regex replacement

Here is my perl code:
my $var="[url=/jobs/]click here[/url]";
$var =~ /\[url=(.+?)\](.+?)\[\/url\]/\2/g
I'm very new to Perl, so I am aware that it's incorrect, but how do I perform this regex replacement correctly?
The end result would be a transformation of $var to <a href="/jobs/">click here</a>
So, with all the answers you know the substitute form is s///
However, with something this big you should break it up into parts
to make it easier to maintain. And also helps to get out of the
quagmire of delimiter hell.
This uses a pre-compiled regex and a callback function invoked with s///e
use strict;
use warnings;
# Pre-compiled regex
my $rx = qr{\[url=(.+?)\](.+?)\[/url\]};
# Callback
sub MakeAnchor {
my ($href,$text) = @_;
return '<a href="' . $href . '">' . $text . '</a>';
}
my $input = '[url=/jobs/]click here[/url]';
$input =~ s/$rx/MakeAnchor($1,$2)/eg;
print $input;
Output
<a href="/jobs/">click here</a>

RegEx in perl that Uses Groups to Extract Information From A Filepath

So I need to take something in this format: 2015-08-15_15-41-32_44100_logo.txt and extract the date, time, and frequency from it, using these two pieces of code. Right now it's in the form <date>_<time>_<frequency>_logo.txt. Below is my attempt to make it a regex, but I know I'm missing something. How do I use groups in Perl to do this?
The code below searches through a directory for every filepath that follows the pattern, and returns those files in a list. What I need help with is the regex itself. I need to be able to get the frequency.
$pattern =qr/^(\d+)-(\d+)-(\d+)_(\d+)-(\d+)-(\d+)_44100_(\w+).(\w+)$/;
@listFiles = grep_files($bee_music_dir,$pattern);
print join(",",@listFiles);
sub grep_files {
my ($dir, $pat) = @_;
opendir(my $dir_handle, $dir) or die $!;
my @files = grep { $_ =~ /$pat/ } readdir($dir_handle);
closedir($dir_handle);
return \@files;
}
Regular expression groups in perl are used like this:
my ($a, $b, $c) = $somestring=~ /(\d+)-(\d+)-(\d+)/;
Here, each variable in the list ($a, $b, $c) gets assigned the value of the matching groups, which are also available as $1, $2, and $3. So the above line is equivalent to:
$somestring =~ /(\d+)-(\d+)-(\d+)/;
my ($a, $b, $c) = ($1, $2, $3);
(you could even do my $a = $1; my $b = $2; my $c = $3).
If you want to declare a $pattern variable you should do it like this:
my $pattern = qr/(\d+)-(\d+)-(\d+)_(\d+)-(\d+)-(\d+)_(\d+)_(\w+).(\w+)/;
where qr is the quote-regexp operator, pre-compiling the regular expression for optimisation. You shouldn't use the =~ operator here, because it would apply the regular expression to $pattern rather than defining $pattern as that regular expression.
Defining a pattern this way allows you to just
$stringtomatch =~ $pattern;
(but =~ /$pattern/ will also work).
The regular expression to match files formatted like 2015-08-15_15-41-32_44100_logo.txt or <date>_<time>_<frequency>_logo.txt looks like this:
/^(\d\d\d\d)-(\d\d)-(\d\d)_(\d\d)-(\d\d)-(\d\d)_(\d+)_logo\.txt$/
You could use \d+ but it won't necessarily match a date. Also, . in regular expressions means 'any character', so if you really mean . you should escape it: \..
Here's a more verbose version of part of your sub illustrating access to the groups:
my @files = ();
while ( my $file = readdir($dir_handle) ) {
if ( my ($year,$month,$day,$hour,$minute,$second,$freq) = $file =~ $pattern ) {
# do something with $freq
push @files, $file;
}
}
If all you are after is a list of the frequencies, it would suffice to only 'group' the wanted field:
my $pattern = qr/^\d+-\d+-\d+_\d+-\d+-\d+_(\d+)_logo\.txt$/;
You were close, just a few changes. Here's the script and a test run:
$ cat freq.pl
#!/usr/bin/perl --
use strict;
use warnings;
my $pattern = qr/^(\d+)-(\d+)-(\d+)_(\d+)-(\d+)-(\d+)_(\d+)_(\w+).(\w+)$/;
sub grep_files {
my ($dir, $pat) = @_;
opendir(my $dir_handle, $dir) or die $!;
my @files = grep { $_ =~ /$pat/ } readdir($dir_handle);
s/$pat/$7/ foreach @files;
closedir($dir_handle);
\@files;
}
print join("\n", @{grep_files '.', $pattern}), "\n";
$ ls
2015-08-15_15-41-32_44100_logo.txt freq.pl
2015-08-25_25-41-32_48000_logo.txt
$ ./freq.pl
44100
48000
freq.pl extracts the frequency from the filenames in the current directory. It's based on yours, with some key differences:
You're matching the pattern against an undefined variable. You really want to store the pattern in the variable. I also anchor the pattern at the beginning and end, so that in the (admittedly unlikely) event that you have other files with extra text at the start or end, it won't match those by accident. You were also missing a semicolon at the end of the line.
You were selecting the files that match the pattern, but then not extracting the frequency. The s/$pat/$7/ foreach @files; loops over all the files matching the pattern and replaces everything with just the 7th group, which is the frequency. You could also select files and extract the frequency in one step by using map instead of grep.
I added the last line for testing.
While not directly related, always use use strict and use warnings at the top of your scripts. use strict makes some questionable constructs errors and use warnings warns about some possible problems with the script.
The ls shows the example files in the current directory, and freq.pl runs the script showing the output.
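The map variant mentioned above might look like the following sketch. The @names list stands in for readdir() output and is illustrative only, and the pattern captures just the frequency field.

```perl
use strict;
use warnings;

# Sample names standing in for readdir() output (illustrative only).
my @names = (
    '2015-08-15_15-41-32_44100_logo.txt',
    '2015-08-25_25-41-32_48000_logo.txt',
    'freq.pl',
);

# Only the frequency is captured; the other fields are matched but not grouped.
my $pattern = qr/^\d{4}-\d\d-\d\d_\d\d-\d\d-\d\d_(\d+)_logo\.txt$/;

# For each name, return the captured frequency on a match and an
# empty list otherwise, so non-matching names simply drop out.
my @freqs = map { $_ =~ $pattern ? $1 : () } @names;

print "$_\n" for @freqs;
```

This selects and extracts in a single pass, and unlike the s/// loop it leaves the original file names untouched.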

How to pass a replacing regex as a command line argument to a perl script

I am trying to write a simple perl script to apply a given regex to a filename among other things, and I am having trouble passing a regex into the script as an argument.
What I would like to be able to do is something like this:
> myscript 's/hi/bye/i' hi.h
bye.h
>
I have produced this code
#!/utils/bin/perl -w
use strict;
use warnings;
my $n_args = $#ARGV + 1;
my $regex = $ARGV[0];
for(my $i=1; $i<$n_args; $i++) {
my $file = $ARGV[$i];
$file =~ $regex;
print "OUTPUT: $file\n";
}
I cannot use qr because apparently it cannot be used on replacing regexes (although my source for this is a forum post so I'm happy to be proved wrong).
I would rather avoid passing the two parts in as separate strings and manually doing the regex in the perl script.
Is it possible to pass the regex as an argument like this, and if so what is the best way to do it?
There's more than one way to do it, I think.
The Evial Way:
As you basically send in a regex expression, it can be evaluated to get the result. Like this:
my @args = ('s/hi/bye/', 'hi.h');
my ($regex, @filenames) = @args;
for my $file (@filenames) {
eval("\$file =~ $regex");
print "OUTPUT: $file\n";
}
Of course, following this way will open you to some very nasty surprises. For example, consider passing this set of arguments:
...
my @args = ('s/hi/bye/; print qq{MINE IS AN EVIL LAUGH!\n}', 'hi.h');
...
Yes, it will laugh at you most evailly.
The Safe Way:
my ($regex_expr, @filenames) = @args;
my ($substr, $replace) = $regex_expr =~ m#^s/((?:[^/]|\\/)+)/((?:[^/]|\\/)+)/#;
for my $file (@filenames) {
$file =~ s/$substr/$replace/;
print "OUTPUT: $file\n";
}
As you can see, we parse the expression given to us into two parts, then use these parts to build a full operator. Obviously, this approach is less flexible, but, of course, it's much more safe.
The Easiest Way:
my ($search, $replace, @filenames) = @args;
for my $file (@filenames) {
$file =~ s/$search/$replace/;
print "OUTPUT: $file\n";
}
Yes, that's right - no regex parsing at all! What happens here is that we decided to take two arguments - 'search pattern' and 'replacement string' - instead of a single one. Will it make our script less flexible than the previous one? No, as we still had to parse the regex expression more-or-less regularly. But now the user clearly understands all the data that is given to the command, which is usually quite an improvement.
@args in these examples corresponds to the @ARGV array.
The s/a/b/i is an operator, not simply a regular expression, so you need to use eval if you want it to be interpreted properly.
#!/usr/bin/env perl
use warnings;
use strict;
my $regex = shift;
my $sub = eval "sub { \$_[0] =~ $regex; }";
foreach my $file (@ARGV) {
&$sub($file);
print "OUTPUT: $file\n";
}
The trick here is that I'm substituting this "bit of code" into a string to produce Perl code that defines an anonymous subroutine $_[0] =~ s/a/b/i; (or whatever code you pass it), then using eval to compile that code and give me a code reference I can call from within the loop.
$ test.pl 's/foo/bar/' foo nicefood
OUTPUT: bar
OUTPUT: nicebard
$ test.pl 'tr/o/e/' foo nicefood
OUTPUT: fee
OUTPUT: nicefeed
This is more efficient than putting an eval "\$file =~ $regex;" inside the loop as then it'll get compiled and eval-ed at every iteration rather than just once up-front.
A word of warning about eval - as raina77ow's answer explains, you should avoid eval unless you're 100% sure you are always getting your input from a trusted source...
s/a/b/i is not a regex. It is a regex plus substitution. Unless you use the string eval, making this work might be pretty tough (consider s{a}<b>e and so on).
The trouble is that you are trying to pass a perl operator when all you really need to pass is the arguments:
myscript hi bye hi.h
In the script:
my ($find, $replace, @files) = @ARGV;
...
$file =~ s/$find/$replace/i;
Your code is a bit clunky. This is all you need:
use strict;
use warnings;
my ($find, $replace, @files) = @ARGV;
for my $file (@files) {
$file =~ s/$find/$replace/i;
print "$file\n";
}
Note that this way allows you to use meta characters in the regex, such as \w{2}foo?. This can be both a good thing and a bad thing. To make all characters interpreted literally (disable meta characters), you can use \Q ... \E like so:
... s/\Q$find\E/$replace/i;
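To illustrate the difference, here is a small sketch; the helper name substitute and its $literal switch are assumptions made for the example, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: apply one case-insensitive substitution to $name,
# treating $find either as a regex or, when $literal is true, as plain text.
sub substitute {
    my ($find, $replace, $name, $literal) = @_;
    my $pattern = $literal ? qr/\Q$find\E/i : qr/$find/i;
    $name =~ s/$pattern/$replace/;
    return $name;
}

for my $name ('a.c', 'abc') {
    printf "%s: regex => %s, literal => %s\n",
        $name,
        substitute('a.c', 'X', $name, 0),   # "." matches any character
        substitute('a.c', 'X', $name, 1);   # "." matches only a dot
}
```

Note that $replace is inserted as a plain string in both cases: a $1 typed on the command line would appear literally in the result rather than being expanded.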

Using a regular expression in Perl to list variables from another Perl script

My thoughts on how to grab all scalars and arrays out of a Perl file went along the lines of:
open (InFile, "SomeScript.pl");
@InArray = <InFile>;
@OutArray = {};
close (InFile);
$ArrayCount = @InArray;
open (OutFile, ">outfile.txt");
for ($x=0; $x<=$ArrayCount; $x++){
$Testline = @InArray[$x];
if($Testline =~ m/((@|\$)[A-Z]+)/i){
$Outline = "$1\n";
push @OutArray, $Outline;
}
}
print OutFile @OutArray;
close(OutFile);
...and this works fairly well. The problem is that if multiple variables appear on a line it will only grab the first variable. An example might be:
$FirstVar = $SecondVar + $ThirdVar;
The script would only grab $FirstVar and output to a file. This might still work though because $SecondVar and $ThirdVar have to be initialized somewhere else before the proceeding line has any meaning. I guess the exception to the rule would be a line in which multiple variables are initialized at the same time.
Could an example in real Perl code break this script?
Also, how to grab multiple items that match my regular expression's criteria from the same line?
Don't do that
You can't really parse Perl with regexes, so I wouldn't even try.
You can't even properly parse it without actually running it, but you can get close with PPI.
perl-variables.pl
#! /usr/bin/env perl
use strict;
use warnings;
use 5.10.1;
use PPI;
use PPI::Find;
my($filename) = (@ARGV, $0); # checks itself by default
my $Doc = PPI::Document->new($filename);
my $Find = PPI::Find->new( sub{
return 0 unless $_[0]->isa('PPI::Token::Symbol');
return 1;
});
$Find->start($Doc);
while( my $symbol = $Find->match ){
my $raw = $symbol->content;
my $var = $symbol->symbol;
if( $raw eq $var ){
say $var;
} else {
say "$var\t($raw)";
}
}
print "\n";
my @found = $Find->in($Doc);
my %found;
$found{$_}++ for @found;
say for sort keys %found;
Running it against itself, produces:
$filename
@ARGV
$0
$Doc
$filename
$Find
@_ ($_)
$Find
$Doc
$symbol
$Find
$raw
$symbol
$var
$symbol
$raw
$var
$var
@found
$Find
$Doc
%found
%found ($found)
$_
@found
%found
$0
$Doc
$Find
$_
$filename
$found
$raw
$symbol
$var
%found
@ARGV
@found
It looks like this will miss fully qualified variable names ($My::Package::Foo) and the rare but valid variable names enclosed with braces (${variable}, ${"varname!with#special+chars"}). Your script will also match element accesses of hashes and arrays ($array[4] ==> $array, $hash{$key} ==> $hash), and object method calls ($object->method() ==> $object), which may or may not be what you want.
You also mismatch variables with underscores ($my_var) and numbers ($var3), and you could get false positives from comments, quoted strings, pod, etc. (# report bugs to bob@company.org).
Matching multiple expressions is a matter of using the /g modifier, which will return a list of matches:
@vars = $Testline =~ /[@\$]\w+/gi;
if (@vars > 0) {
push @OutArray, @vars;
}
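A self-contained sketch of the list-context /g behaviour follows. The sample line is made up for illustration, and the character class also accepts % so that hashes are picked up.

```perl
use strict;
use warnings;

# Single quotes keep the sample line from being interpolated.
my $line = '$FirstVar = $SecondVar + $ThirdVar; push @list, $hash{key};';

# With /g in list context, the match returns every non-overlapping hit.
my @vars = $line =~ /[\$\@%]\w+/g;

print "$_\n" for @vars;
```

The same caveats apply as for the original script: the element access $hash{key} is reported as $hash, and text inside strings or comments would be matched too.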
The simple-minded answer is to use the /g flag on your regexp.
The complex answer is that this sort of code analysis is very difficult for perl. Look at the module PPI for a better, more full featured, semantic analysis of perl code.
I can't answer either of your questions directly, but I will offer this: I don't know why you're trying to extract scalars, but the debugger package that comes with perl has to "know" about all variables, and the last time I looked it was written in Perl. You may be better off trying to evaluate a perl script using the debugger package or techniques borrowed from that package rather than reinventing the wheel.
Despite the limitations with the method, here is a slightly simpler version of the script above that reads from stdin.
#!/usr/bin/perl
use strict;
use warnings;
my %vars;
while (<>) {
$vars{$_}++ for (m'([$@]\w+)'g);
}
my @vars = keys %vars;
print "@vars\n";