Matching Regex in Perl - regex

my program contains ascii.txt to match patterns from it.
my program is to implement sed command, just to try write perl code because I am studying perl.
#!/usr/bin/perl
# sed command implementation
use strict;
use warnings;
use subs qw(read_STDIN read_FILE usage);
use IO::File;
use constant {
SEARCH_PRINT => 0,
};
our $proj_name = $0;
main(#ARGV);
sub main
{
if(scalar #_ == 2) {
read_FILE #_;
}
else {
usage
}
}
sub read_FILE {
my ($sed_script, $file_name) = #_;
my $parsed_val = parse_sed_script($sed_script);
if( $parsed_val == SEARCH_PRINT ) {
search_print_lines($sed_script, $file_name);
}
}
sub parse_sed_script {
my $command = shift or return;
if($command =~ /^\/([^\/].)*\/$/) {
return SEARCH_PRINT;
}
}
sub search_print_lines {
my ($script, $file) = #_;
my $fh = IO::File->new($file, "r") or error("no file found $file");
while( $_ = $fh->getline ) {
print if $_ =~ $script
}
}
sub usage {
message("Usage: $proj_name sed-script [file]")
}
sub error
{
my $e = shift || 'unkown error';
print("$0: $e\n");
exit 0;
}
When I execute from the shell: sed.pl /Test/ ascii.txt
I found that print if $_ =~ $script, doesn't execute because of the REGEX is stored in scalar variable
the ascii.txt contains.
Test 1
REGEX TEST
When I use print $script in search_print_lines subroutine it prints the regex sent by the user

When you pass something in on the command line and use it in your script, the entire literal text is used. So if you pass in /Test/, it will see those slashes as literals, so the "real" regular expression it's looking at is something like \/Test\/ (escaping the slashes, because now it's looking for them. Try passing in the regex without the // surrounding it.
If your goal is to allow the // to show that it's a regular expression, I would remove them when the program starts.
One more edit: If you want to be able to pass in flags, you'd need to eval the input somehow.
$script = '/Test/i';
eval { "\$regex = $script" };
and then
"REGEX TEST" =~ $regex
should return true. Doing an eval like this is highly insecure, though.
edit: what happens in eval is that whatever's in the block is executed. So what happens in the eval above is that you're dynamically creating a regular expression and setting it to a variable. That allows you to use regular expression flags like i without having to do any special parsing of the command-line input. When the eval is executed, it will be as if you had typed in $regex = /Test/i. Then you can compare your text to $regex and it will work. I thought about this because your example would not work unless you had the i flag set to make the comparison case-insensitive.

You didn't remove the slashes from $sed_script variable. After I modified your read_FILE function, it started to work:
sub read_FILE {
my ($sed_script, $file_name) = #_;
my $parsed_val = parse_sed_script($sed_script);
if( $parsed_val == SEARCH_PRINT ) {
$sed_script =~ s/^\/(.*)\/$/$1/;
#you can also parse the regexp
#$sed_script = qr/$sed_script/;
search_print_lines($sed_script, $file_name);
}
}

Related

How do I get perl to replace strtolower($value) with strtolower(substr($value, 0, 2))

I have several hundred PHP scripts that expect a language field to contain an ISO 639-1 2-character identifier. for example "en", which I now want to modify to support language codes qualified by country code, for example "fr-CA". In each of these scripts there is the following code:
case 'lang':
{ // language code
if (strlen($value) == 2)
$lang = strtolower($value);
break;
} // language code
which I want to modify to:
case 'lang':
{ // language code
if (strlen($value) >= 2)
$lang = strtolower(substr($value,0,2));
break;
} // language code
So I wrote a perl script to run over the entire directory tree and modify all of the matching scripts. For testing I have set the script up to create all of the modified scripts in a new directory structure:
use strict;
use warnings;
use 5.010;
use File::Find;
use File::Slurp;
my #content;
find( \&wanted, '/home/jcobban/public_html/');
exit;
sub wanted {
if (-f)
{
print "wanted: ", $File::Find::name, "\n";
my $odir = '/home/jcobban/testlang' . substr($File::Find::dir, 25);
if ((substr $odir, -1) ne "/"){
$odir = "$odir/";
}
if (! -d $odir){
mkdir $odir;
}
print "odir '$odir'\n";
my #lines = read_file($File::Find::name);
my $caselang = 0;
my $updated = 0;
foreach my $line (#lines){
if ($line =~ /\bcase\b/)
{
$caselang = $line =~ /\blang\b/i;
}
if ($line =~ /\bbreak\b/)
{
$caselang = 0;
}
if ($caselang)
{
print "old $line\n";
$line =~ s/ == 2/ >= 2/;
$line =~ s/strtolower(.value)/strtolower(substr(\$value,0,2))/;
$updated = 1;
print "new $line\n";
}
}
if ($updated)
{
# my $newfile = $File::Find::dir . "/" . $_;
my $newfile = $odir . $_;
print "alter \$lang to support ll-CC $newfile\n";
write_file($newfile, #lines);
}
else
{
print "did not find lang support in $_\n";
}
}
return;
}
The first match replace works, to change the == to >=, but the second match replace does not modify any lines and I do not understand well. I thought maybe there was a problem with matching to "\$" so I replaced it with "." but still no lines are changed. I applied the same command to other regex engines and they all worked. The output for a typical file is:
wanted: /home/jcobban/public_html/videoTutorials.php
odir '/home/jcobban/testlang/'
old case 'lang':
new case 'lang':
old {
new {
old if (strlen($value) == 2)
new if (strlen($value) >= 2)
old $lang = strtolower($value);
new $lang = strtolower($value);
alter $lang to support ll-CC /home/jcobban/testlang/videoTutorials.php
I have obviously been spending too much time using VIM. The problem with my code was that I needed to escape the round brackets so they were not interpreted as a subpattern.
$line =~ s/strtolower\(.value\)/strtolower(substr(\$value,0,2))/;
Just want to show some hacks, maybe it will be interesting for you:
s'strtolower\(\K\$value'substr($value,0,2)'
We can quote substitution with whatever we want:
s/foo/bar/;
s'foo'bar';
s(foo)(bar);
If we choose single quotes, variables will not be interpolated, but we still have to escape dollar sign in pattern side, because it will be treated as "end of line" by re engine.
\K Keep the stuff left of the \K
more information in perldoc perlre

Perl Grepping from an Array

I need to grep a value from an array.
For example i have a values
#a=('branches/Soft/a.txt', 'branches/Soft/h.cpp', branches/Main/utils.pl');
#Array = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', branches/Main/utils.pl','branches/Soft/B2/c.tct', 'branches/Docs/A1/b.txt');
Now, i need to loop #a and find each value matches to #Array. For Example
It works for me with grep. You'd do it the exact same way as in the More::ListUtils example below, except for having grep instead of any. You can also shorten it to
my $got_it = grep { /$str/ } #paths;
my #matches = grep { /$str/ } #paths;
This by default tests with /m against $_, each element of the list in turn. The $str and #paths are the same as below.
You can use the module More::ListUtils as well. Its function any returns true/false depending on whether the condition in the block is satisfied for any element in the list, ie. whether there was a match in this case.
use warnings;
use strict;
use Most::ListUtils;
my $str = 'branches/Soft/a.txt';
my #paths = ('branches/Soft/a.txt', 'branches/Soft/b.txt',
'branches/Docs/A1/b.txt', 'branches/Soft/B2/c.tct');
my $got_match = any { $_ =~ m/$str/ } #paths;
With the list above, containing the $str, the $got_match is 1.
Or you can roll it by hand and catch the match as well
foreach my $p (#paths) {
print "Found it: $1\n" if $p =~ m/($str)/;
}
This does print out the match.
Note that the strings you show in your example do not contain the one to match. I added it to my list for a test. Without it in the list no match is found in either of the examples.
To test for more than one string, with the added sample
my #strings = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl');
my #paths = ('branches/Soft/a.txt', 'branches/Soft/h.cpp', 'branches/Main/utils.pl',
'branches/Soft/B2/c.tct', 'branches/Docs/A1/b.txt');
foreach my $str (#strings) {
foreach my $p (#paths) {
print "Found it: $1\n" if $p =~ m/($str)/;
}
# Or, instead of the foreach loop above use
# my $match = grep { /$str/ } #paths;
# print "Matched for $str\n" if $match;
}
This prints
Found it: branches/Soft/a.txt
Found it: branches/Soft/h.cpp
Found it: branches/Main/utils.pl
When the lines with grep are uncommented and foreach ones commented out I get the corresponding prints for the same strings.
The slashes dot in $a will pose a problem so you either have to escape them it when doing regex match or use a simple eq to find the matches:
Regex match with $a escaped:
my #matches = grep { /\Q$a\E/ } #array;
Simple comparison with "equals":
my #matches = grep { $_ eq $a } #array;
With your sample data both will give an empty array #matches because there is no match.
This Solved My Question. Thanks to all especially #zdim for the valuable time and support
my #SVNFILES = ('branches/Soft/a.txt', 'branches/Soft/b.txt');
my #paths = ('branches/Soft/a.txt', 'branches/Soft/b.txt',
'branches/Docs/A1/b.txt', 'branches/Soft/B2/c.tct');
foreach my $svn (#SVNFILES)
{
chomp ($svn);
my $m = grep { /$svn/ } (#paths);
if ( $m eq '0' ) {
print "Files Mismatch\n";
exit 1;
}
}
You should escape characters like '/' and '.' in any regex when you need it as a character.
Likewise :
$a="branches\/Soft\/a\.txt"
Retry whatever you did with either grep or perl with that. If it still doesn't work, tell us precisely what you tried.

How can I match these function calls, and extract the nmae of the function and the first argument?

I am trying to parse an array of elements. Those who match a pattern like the following:
Jim("jjanson", Customer.SALES);
I want to create a hash table like Jim => "jjanson"
How can I do this?
I can not match the lines using:
if($line =~ /\s*[A-Za-z]*"(.*),Customer.*\s*/)
You're not matching either the '(' after the name, nor the ' ' after the comma, before "Customer.".
I can get 'jjanson"' using this expression:
/\s*[A-Za-z]\(*"(.*), Customer.*\s*/
But I assume you don't want jjanson", so we need to modify it like so. (I tend to include the negative character class when I'm looking for simply-delimited stuff. So, in this case I'll make the expression "[^"]*"
/\s*[A-Za-z]\(*"([^"]+)", Customer.*\s*/
Also, I try not to depend upon whitespace, presence or number, I'm going to replace the space with \s*. That you didn't notice that you skipped the whitespace is a good illustration of the need to say "ignore a bunch of whitespace".
/\s*[A-Za-z]\(*"([^"]+)",\s*Customer.*\s*/
Now it's only looking for the sequence ',' + 'Customer' in the significant characters. Functionally, the same, if more flexible.
But since you only do one capture, I can't see what you'd map to what. So I'll do my own mapping:
my %records;
while ( my $line = $source->()) { # simply feed for a source of lines.
my ( $first, $user, $tag )
= $line = m/\s*(\p{Alpha}+)\s*\(\s*"([^"]+)",\s*Customer\.(\S+?)\)\/
;
$records{ $user }
= { first => $first
, username => $user
, tag => $tag
};
}
This is much more than you would tend to need in a one-off, quick solution. But I like to store as much of my input as seems significant.
Note that Jim("jjanson", Customer.SALES); matches the syntax of a function call with two arguments. You can thus abuse string eval:
#!/usr/bin/env perl
use strict;
use warnings;
use YAML::XS;
my $info = extract_first_arg(q{ Jim("jjanson", Customer.SALES);} );
print Dump $info;
sub extract_first_arg {
my $call = shift;
my ($name) = ($call =~ m{ \A \s* (\w+) }x);
unless ($name) {
warn "Failed to find function name in '$call'";
return;
}
my $username = eval sprintf(q{
package My::DangerZone;
no strict;
local *{ %s } = sub { $_[0] };
%s
}, $name, $call);
return { $name => $username };
}
Output:
---
Jim: jjanson
Or, you can abuse autoloading:
our $AUTOLOAD;
print Dump eval 'no strict;' . q{ Jim("jjanson", Customer.SALES); };
sub AUTOLOAD {
my ($fn) = ($AUTOLOAD =~ /::(\w+)\z/);
return { $fn => $_[0] };
}
I would not necessarily recommend using these methods, especially on input that is not in your control, and in a situation where this script has access to sensitive facilities.
On the other hand, I have, in the right circumstances, utilized this kind of thing in transforming one given set of information into something that can be used elsewhere.
Try this:
$line = 'Jim("jjanson", Customer.SALES)';
my %hashStore = (); #Jim("jjanson"
if($line=~m/^\s*([^\(\)]*)\(\"([^\"]*)\"/g) { $hashStore{$1} = $2; }
use Data::Dumper;
print Dumper \%hashStore;
Output:
$VAR1 = {
'Jim' => 'jjanson'
};

Issue with regex matching in Perl

I am new to Perl and any help will be appreciated. I have 2 variables: $release and $env_type. I want to check if a string contains $release_$env_type, then do something. For example,
$release="beta";
$env_type="testing";
so string is beta_testing
Code snippet:
if ( $_ =~ /${release}_${env_type}/ ) {
#do Something
}
This if condition doesn't get resolved. Kindly let me know what is the correct syntax to make this check? I searched on Google but didn't get any good post..
Kindly help!
I have a file with contents:
admin_vh_c9_simv2_edg=/console,/consolehelp
idminternal_vh_c9_simv2_edg=/oim,/soa-infra
sso_vh_c9_simv2_edg=/oim,/soa-infra,/odsm
my $env_type = "edg";
my $release = "c9_simv2";
#Input file containing contexts
my $idmInternal = "./IdmContexts.conf";
if ( !-e $idmInternal ) {
die "Unable to find the file $idmInternal!\n";
}
open( MYFILE, $idmInternal );
while (<MYFILE>) {
chomp;
if ( $_ =~ /${release}_${env_type}/ ) {
push( #filtered, $_ );
}
}
Your code is fine. The problem is elsewhere. The following prints match.
my $release="beta";
my $env_type="testing";
$_ = "so string is beta_testing";
if ( $_ =~ /${release}_${env_type}/ ) {
print "match\n";
}
Note: /\Q${release}_${env_type}/ would be better. It'll make sure that special characters in the interpolated variables match themselves.
Most likely problem: You read the value of $release and/or $env_type from a file, and forgot to chomp the trailing newline.
If you are using $_ then this will work.
if (m/${release}_${env_type}/)
{
# Do something
}
The m// match operator binds automatically to $_. There is no need to bind it explicitly.
To really tell what is going on, you can inject code before the test. For example compile a regex first and then print it.
# compile the regex first
my $regex = qr/${release}_${env_type}/;
say qq{\$regex="$regex"};
# then print your scanned text
say qq{\$_="$_"};
if ( m/$regex/ ) {
# do something
}
If you're going to explicitly bind to a regex, then use variables:
my $string = $_;
if ( $string =~ m/$regex/ ) {
}
Otherwise, simply match the "context variable" ($_).
if ( m/$regex/ ) {
}
Also, USUW would help spot a few problems, proactively:
# Before everything else
use strict;
use warnings;

Perl Regex match works, but replace does not

I have put together a Perl script to go through a directory and match various keys in the source and output the results to a text file. The match operation works well, however the end goal is to perform a replace operation. The Perl script is as follows:
#!/usr/bin/perl
#use strict;
use warnings;
#use File::Slurp;
#declare variables
my $file = '';
my $verbose = 0;
my $logfile;
my #files = grep {/[.](pas|cmm|ptd|pro)$/i} glob 'C:\users\perry_m\desktop\epic_test\pascal_code\*.*';
#iterate through the files in input directory
foreach $file (#files) {
print "$file\n";
#read the file into a single string
open FILEHANDLE, $file or die $!;
my $string = do { local $/; <FILEHANDLE> };
#perfrom REGEX on this string
########################################################
#fix the include formats to conform to normal PASCAL
$count = 0;
while ($string =~ m/%INCLUDE/g)
{
#%include
$count++;
}
if ($count > 0)
{
print " $count %INCLUDE\n";
}
$count = 0;
while ($string =~ m/INCLUDE/g)
{
#%INCLUDE;
$count++;
}
if ($count > 0)
{
print " $count INCLUDE\n";
}
$count = 0;
while ($string =~ m/(%include\s+')[A-Za-z0-9]+:([A-Za-z0-9]+.[A-Za-z]+')/g)
{
#$1$2;
$count++;
}
if ($count > 0)
{
print " $count XXXX:include \n";
}
}
This produces output as desired, an example is below:
C:\users\perry_m\desktop\epic_test\pascal_code\BRTINIT.PAS
1 INCLUDE
2 XXXX:include
39 external and readonly
However if I change the regex operations to try and implement a replace, using the replacement operation shown in the commented lines above, the scripts hangs and never returns. I imagine it is somehow related to memory, but I am new to Perl. I was also trying to avoid parsing the file by line if possible.
Example:
while ($string =~ s/%INCLUDE/%include/g)
{
#%include
$count++;
}
and
while ($string =~ s/(%include\s+')[A-Za-z0-9]+:([A-Za-z0-9]+.[A-Za-z]+')/$1$2;/g)
{
#$1$2;
$count++;
}
Edit: simplified the examples
The problem is with your while loops. A loop like
while ($string =~ m/INCLUDE/g) { ... }
will execute once for each ocurrence of INCLUDE in the target string, but a subtitution like
$string =~ s/INCLUDE/%INCLUDE;/
will make all of the replacement in one go and retuen the number of replacements made. So a loop
while ($string =~ s/INCLUDE/%INCLUDE;/g) { ... }
will endlessly add more and more percentage signs before and semicolons after every INCLUDE.
To find the number of replacements made, change all your loops like this to just
$count = $string =~ s/INCLUDE/%INCLUDE;/g
the pattern in s/INCLUDE/%INCLUDE/g will match the replacement also, so if you're running it in a while loop it will run forever (until you run out of memory).
s///g will replace all matches in a single shot so you very rarely will need to put it in a loop. Same goes for m//g, it will do the counting in a single step if you put it in list context.