Searching for files with a specific regex in the filename in Perl

Hi all, I was wondering how I can go about searching for files in Perl.
Right now I have a line of information that I have tokenized (using split, with tab as the delimiter) and stored in an array. The array contains stub text of the filenames I want to search for in a directory. For example, Engineering_4.txt would just be "Engin" in my array.
If there are two different files, Engineering_4 and Engineering_5, the script would search the content of both and extract the information I need from one of them (only one contains the information I want). I imagine my script will have to find and store all matching file names and then search through each of those files.
My question is: how do I search a directory for files whose names match a regular expression in Perl? Also, is there a way to limit the file types I search for? For example, I only want to search ".txt" files.
Thanks everyone

Since you already know the directory, you could open it and read it while filtering the entries:
opendir my $dh, 'yourDirectory' or die "Could not open dir: $!\n";
my @filelist = grep { /yourRegex/i } readdir $dh;
closedir $dh;
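For the question's case, here is a slightly fuller sketch of the same opendir approach. It takes the stub from your tab-split array (e.g. "Engin"), escapes it with \Q...\E so any regex metacharacters in it are matched literally, keeps only names ending in .txt, and returns full paths because readdir yields bare file names. The helper name find_stub_files is hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full paths of the .txt files in $dir whose
# names start with $stub (compared literally and case-insensitively), sorted.
sub find_stub_files {
    my ($dir, $stub) = @_;
    opendir my $dh, $dir or die "Could not open dir $dir: $!\n";
    my @matches;
    while (defined(my $name = readdir $dh)) {
        next unless $name =~ /^\Q$stub\E.*\.txt\z/i;   # stub prefix, .txt suffix
        my $path = File::Spec->catfile($dir, $name);    # readdir gives bare names
        push @matches, $path if -f $path;               # skip directories etc.
    }
    closedir $dh;
    my @sorted = sort @matches;
    return @sorted;
}
```

Called as `my @files = find_stub_files('yourDirectory', 'Engin');`, it would return both Engineering_4.txt and Engineering_5.txt, which you can then open and search one by one.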

You can do this using the glob function or its <...> operator form:
while (<Engin*.txt>) {
print "$_\n";
}

The glob function returns a list of matching files when given a wildcard expression.
This means that the files can also be sorted before processing (here in natural order, via Sort::Key::Natural):
use Sort::Key::Natural 'natsort';
foreach my $file ( natsort glob "*.txt" ) { # Will loop over only txt files
open my $fh, '<', $file or die $!; # Open file and process
}
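To connect glob to the tab-separated stubs from the question, one possible sketch builds one pattern per stub and collects the matches. It assumes neither the directory nor the stubs contain whitespace or glob metacharacters (*, ?, [, {), since glob would interpret them; files_for_stubs is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: map each tab-separated stub in $line to the sorted
# list of "$dir/<stub>*.txt" files.
# Assumes $dir and the stubs contain no whitespace or glob metacharacters.
sub files_for_stubs {
    my ($dir, $line) = @_;
    chomp $line;
    my %files;
    for my $stub (split /\t/, $line) {
        next unless length $stub;                       # skip empty fields
        $files{$stub} = [ sort { $a cmp $b } glob("$dir/$stub*.txt") ];
    }
    return \%files;
}
```

For a line such as "Engin\tMech\n" it returns a hash reference like { Engin => [...], Mech => [...] }, so each stub's candidate files can be searched in turn.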

You can also use the File::Find module:
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
my @dirs = @ARGV ? @ARGV : ('.');
my @list;
find( sub {
    # plain files whose names end in .txt
    push @list, $File::Find::name if -f $_ && /\.txt\z/ },
    @dirs );
print "$_\n" for @list;


How to handle files that do not contain a pattern?

I need help with my Perl program. The idea is to pass in a pattern and a file list from the command line. If a file name matches the pattern, print the file name. If a file name doesn't match, the program should look for the pattern in the file's text and print "filename: first line containing the pattern".
However, if the user adds the -i option at the beginning, the opposite should happen: if the file name does not match, print it; then print any files whose text contains no instances of the pattern.
This last part is where I'm struggling: I'm not sure how to find files that don't have the pattern in their text. For example, in my code:
#!/usr/bin/perl -w
die("\n Usage: find.pl [-i] <perlRegexPattern> <listOfFiles>\n\n") if(@ARGV<2);
my (@array,$pattern,@filesmatch,@files);
#I can separate files based on name match
($pattern,@array) = ($ARGV[0] eq "-i") ? (@ARGV[1 .. $#ARGV]) : (@ARGV);
foreach(@array){
    ($_ =~ m/.*\/?$pattern/) ? (push @filesmatch,$_) : (push @files, $_);
}
#and I can get files that contain a pattern match in their text
if($ARGV[0] ne "-i"){
    for my $matches(@filesmatch){ #remove path print just file name
        $matches =~s/.*\///; #/
        print "$matches\n";
    }
    for my $file(@files){
        open(FILE,'<',$file) or die("\nCould not open file $file\n\n");
        while(my $line = <FILE>){
            if($line =~ m/$pattern/){
                $file =~ s/.*\///; #/ remove path print just file name
                print "$file: $line";
                last; # stop after the first matching line
            }
        }
    }
}
#however I'm not sure how to say this file doesn't have any matches so print it
else{
    for my $matches(@files){ #remove path print just file name
        $matches =~ s/.*\///;
        print "$matches\n";
    }
    for my $file(@filesmatch){
        open(FILE,'<',$file) or die("\nCould not open file $file\n\n");
        while(my $line = <FILE>){...
I'm not sure whether something like grep could be used for this, but I'm having a hard time working with Perl's grep.

To decide whether to print a file based on its content, you first have to read the file. With your criterion (that a phrase does not exist) you have to check the whole file.
A standard way is to use a separate variable (a "flag") to record the condition, then go back and print:
my $has_match;
while (<$fh>) {
if (/$pattern/) {
$has_match = 1;
last;
}
}
if (not $has_match) {
seek $fh, 0, 0; # rewind to the beginning
print while <$fh>;
}
This can be simplified by reading the file into an array first, and by using loop labels (see perlsyn):
FILE: foreach my $file (@filesmatch) {
    open my $fh, '<', $file or die "Can't open $file: $!";
    my @lines = <$fh>;
    for (@lines) {
        next FILE if /$pattern/;
    }
    print for @lines;
}
Note that jumping to the next iteration from the middle of a loop isn't the cleanest style, since one always has to keep in mind that the rest of the loop body may not run.
Each file is read into memory up front so that it doesn't have to be read twice; don't do this if any of the files can be huge.
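Putting the flag idea into small helpers, a minimal sketch (the names first_match and report are hypothetical): first_match reads a file line by line and stops at the first hit, so it never holds a whole file in memory, and a file with no match is simply one for which it returns undef. The pattern is assumed to be a precompiled qr// regex.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first line of $file matching $re, or undef.
sub first_match {
    my ($file, $re) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!";
    while (my $line = <$fh>) {
        return $line if $line =~ $re;   # stop at the first hit
    }
    return;                             # no line matched
}

# Hypothetical helper. Normal mode: "name: first matching line" for files
# that contain the pattern. Inverse mode: names of files with no match.
sub report {
    my ($inverse, $re, @files) = @_;
    my @out;
    for my $file (@files) {
        my $hit = first_match($file, $re);
        if ($inverse) {
            push @out, "$file\n" unless defined $hit;
        }
        elsif (defined $hit) {
            push @out, "$file: $hit";
        }
    }
    return @out;
}
```

report(0, qr/$pattern/, @files) gives the "name: first matching line" output, and report(1, qr/$pattern/, @files) lists the files with no match at all.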
If there is any command-line processing, it is better to use a module; Getopt::Long is nice.
use strict;
use warnings;
use feature 'say';    # needed for say
use Getopt::Long;
my ($inverse, $pattern);
GetOptions('inverse|i' => \$inverse, 'pattern=s' => \$pattern)
    or usage(), exit 1;
usage(), exit 1 if not $pattern or not @ARGV;
sub usage { say STDERR "Usage: $0 ... " }
Call the program as progname [-i] --pattern PATTERN files. The module provides a lot more, so please see its docs; for example, in this case you can also just use -p PATTERN.
As GetOptions parses the command line, it removes the options it recognizes from @ARGV, so what remains there are the file names. And you have the $inverse variable to make decisions with.
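To see what is left in @ARGV after parsing, here is a small sketch that runs GetOptions on a local copy of @ARGV; parse_args is a hypothetical wrapper, and the option names follow the example above.

```perl
use strict;
use warnings;
use Getopt::Long;

# Hypothetical wrapper: parse the given arguments as if they were @ARGV.
sub parse_args {
    local @ARGV = @_;                    # GetOptions works on @ARGV
    my ($inverse, $pattern);
    GetOptions('inverse|i' => \$inverse, 'pattern=s' => \$pattern)
        or return;                       # e.g. an unknown option
    return ($inverse ? 1 : 0, $pattern, @ARGV);   # @ARGV now holds only file names
}

my ($inv, $pat, @files) = parse_args('-i', '-p', 'foo', 'a.txt', 'b.txt');
print "inverse=$inv pattern=$pat files=@files\n";
```

The demo line should report inverse=1, pattern=foo and the files a.txt b.txt; -p is accepted because Getopt::Long allows unique abbreviations of option names by default.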
Please have use warnings; (not -w) and use strict; at the top of every program.

No such file or directory error: Perl

I am new to Perl. I have written the following code and have been banging my head against it for two days, because I get the following error when I try to open the file: No such file or directory at line 23 (open (FILE, "$config_file") or die $!;)
What I am doing is:
Open the folder and list all the files inside it.
Iterate over each file to look for a particular string.
Create a new file for each of them, with the matching string replaced by some other string.
I would really appreciate your help.
Following is my code:
#!/usr/bin/perl -w
#~ The perl script that changes the IP addresses in configuration files from 192.168.3.x into 192.168.31.x in any particular folder
use strict;
use warnings;
use diagnostics;
#~ Get list of files in the Firewall folder
my $directory = 'C:\Users\asura\Desktop\ConfigFiles\Firewall';
opendir (my $dir, $directory) or die $!;
my @list_of_files = readdir($dir);
my $file;
while ($file = readdir ($dir)) {
    push @list_of_files, $file;
}
closedir $dir;
print "@list_of_files\n";
#~ Iterate over each files to replace some strings
foreach my $config_file (@list_of_files) {
    next unless (($config_file !~ /^\.+$/));
    open (FILE, "$config_file") or die $!;
    my @original_array = <FILE>;
    close(FILE);
    my @new_array;
    foreach my $line (@original_array) {
        chomp($line);
        $line =~ s/192\.168\.3/192\.168\.31/g;
        push (@new_array, $line);
    }
    print @new_array;
    #~ Create a new files with modified strings
    my $new_config_file = $config_file.1;
    my $newfile = 'C:\Users\asura\Desktop\ConfigFiles\Firewall\$new_config_file';
    open (NEW_FILE, ">", "$newfile") or die $!;
    foreach (@new_array){
        print NEW_FILE "$_\n";
    }
    close(NEW_FILE);
}
exit 0;
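When you push items onto @list_of_files, you are pushing only the file name (the value returned from readdir). Unless your script is running in C:\Users\asura\Desktop\ConfigFiles\Firewall, the open (FILE, "$config_file") call, which uses just the file name (a relative path), will fail.
You need to push absolute paths onto @list_of_files inside the while (readdir) loop, like so (and declare the array as my @list_of_files; instead of initializing it with readdir($dir), because a list-context readdir already consumes every entry, so the loop never pushes anything):
push @list_of_files, $directory . "\\" . $file;
Also, as @Michael-sqlbot mentions, the $newfile string must be double-quoted for string interpolation to be performed (or built with concatenation). Inside double quotes, though, \$ escapes the sigil and \U starts uppercasing, so either double the backslashes or use forward slashes, which Perl on Windows accepts: "C:/Users/asura/Desktop/ConfigFiles/Firewall/$new_config_file".
Finally, the concatenation $config_file.1 should be written $config_file . '.1'; as written it appends only a "1" (fw.conf becomes fw.conf1).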
The following is a simplification of your code that removes the bugs.
First off, kudos for including use strict and use warnings in EVERY script. One additional tool you can use is use autodie; any time you're doing file processing.
The primary flaw in your code was the fact that you weren't including the path information when opening your files. There are two main ways to solve this. You can manually specify the path, like you did for your open to your output file handle, or you can use glob instead of opendir as that will automatically include the path in the returned results.
There was a secondary bug in your regex: it was missing a word boundary after .3. Without it, addresses in the thirties would match by mistake (192.168.30.5 would become 192.168.310.5).
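A quick way to check the boundary, as a sketch: with \b after .3, both 192.168.30.7 and the already-converted 192.168.31.7 are left alone, whereas the original s/192\.168\.3/192\.168\.31/g would have turned 192.168.30.7 into 192.168.310.7. fix_ip is a hypothetical name; the substitution itself is the one used in this answer's code.

```perl
use strict;
use warnings;

# Rewrite 192.168.3.x to 192.168.31.x. The \b requires a non-word character
# (here the following ".") right after the 3, so .30 - .39 are not touched.
sub fix_ip {
    my ($text) = @_;
    $text =~ s/(?<=192\.168)\.3\b/.31/g;
    return $text;
}

print fix_ip("host 192.168.3.7"), "\n";    # host 192.168.31.7
print fix_ip("host 192.168.30.7"), "\n";   # host 192.168.30.7 (unchanged)
```

Because 192.168.31.x no longer matches, running the script twice over the same file does not produce 192.168.311.x.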
To simplify your code, I removed all of the superfluous temporary variables and instead process things file by file and line by line. This has the benefit of making it clearer how the input and output file handles relate. Finally, if you actually want to edit the files in place, there are lots of methods demonstrated in perlfaq5.
#!/usr/bin/perl
#~ The perl script that changes the IP addresses in configuration files from 192.168.3.x into 192.168.31.x in any particular folder
use strict;
use warnings;
use autodie;
use diagnostics;
#~ Get list of files in the Firewall folder
my $dir = 'C:\Users\asura\Desktop\ConfigFiles\Firewall';
opendir my $dh, $dir;
#~ Read the whole listing first (plain files only, which also skips . and ..),
#~ so the .1 files created below are not picked up by readdir
my @files = grep { -f "$dir\\$_" } readdir $dh;
closedir $dh;
#~ Iterate over each file to replace some strings
for my $file (@files) {
    open my $infh,  '<', "$dir\\$file";
    open my $outfh, '>', "$dir\\${file}.1"; #~ Create a new file with modified strings
    while (<$infh>) {
        s/(?<=192\.168)\.3\b/.31/g;
        print $outfh $_;
    }
    close $infh;
    close $outfh;
}

What's wrong with my Perl substitution?

I have a directory of files I am trying to split down into subdirectories using perl due to the quantity of files. The filenames are formatted with dates at the start in the form YYYYMMDD and I'm trying to split on that. I am using the following code adapted from this StackOverflow Answer:
#!/usr/bin/perl -w
use strict;
opendir DIR, "." or die "opendir: $!";
my @files = readdir(DIR);
closedir DIR;
foreach my $f (@files) {
    -f $f or next;
    (my $new_name = $f) =~ s!^((....)(..)(..).*)$!$2/$3/$4/$1/;
    -e $new_name and die "$new_name already exists";
    rename($f, $new_name);
}
However, I get 'Substitution replacement not terminated at movefiles.pl line 10.' when I try to run this code. As far as I can see, I am escaping and terminating the substitution correctly?
You are using ! as the regular expression delimiter. You have one to start it and one to separate the match part from the replacement part, but none at the end. The trailing / in $2/$3/$4/$1/ was presumably meant to be that closing delimiter: s!^((....)(..)(..).*)$!$2/$3/$4/$1!
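Once the closing ! is in place the substitution compiles, but rename will still fail unless the YYYY/MM/DD directories already exist. Here is a sketch of the loop that creates them first with File::Path's make_path; it assumes the date prefix consists of eight digits (\d instead of the original .) and skips every other file. move_by_date is a hypothetical name.

```perl
use strict;
use warnings;
use File::Path qw(make_path);

# Hypothetical helper: move each YYYYMMDD* file in $dir into $dir/YYYY/MM/DD/.
sub move_by_date {
    my ($dir) = @_;
    opendir my $dh, $dir or die "opendir $dir: $!";
    my @files = grep { -f "$dir/$_" } readdir $dh;   # list first, then move
    closedir $dh;
    for my $f (@files) {
        # Note the closing "!" that the original substitution was missing.
        (my $new_name = $f) =~ s!^((\d{4})(\d\d)(\d\d).*)$!$2/$3/$4/$1! or next;
        my ($subdir) = $new_name =~ m!^(.*)/!;       # e.g. "2013/05/14"
        make_path("$dir/$subdir");                   # rename won't create it
        die "$dir/$new_name already exists" if -e "$dir/$new_name";
        rename "$dir/$f", "$dir/$new_name" or die "rename $f: $!";
    }
}
```

Calling move_by_date('.') mirrors the original script, which worked in the current directory.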

Replacing multiple strings recursively within all files in a directory using Perl

I'm new to Perl. I've seen many samples but had problems composing a solution.
I have a list of strings, each of which should be replaced with a different string: a->a2, b->b34, etc. The list of replacements is in a CSV file. I need to perform this replacement recursively on all files in a directory.
It might be any other language; I just thought Perl would be the quickest.
Your problem can be split into three steps:
Getting the search-and-replace strings from the CSV file,
Getting a list of all text files inside a given directory incl. subdirectories, and
Replacing all occurrences of the search strings with their replacements.
So let's do a countdown and see how we can do that :)
#!/usr/bin/perl
use strict; use warnings;
3. Search and replace
We will define a sub searchAndReplace. It takes a file name as argument and accesses an outside hash. We will call this hash %replacements. Each key is a string we want to replace, and the value is the replacement. This "imposes" the restriction that there can only be one replacement per search string, but that should seem natural. I will further assume that each file is reasonably small (i.e. fits into RAM).
my %replacements;   # search string => replacement, filled in step 1 (needed under use strict)
sub searchAndReplace {
    my ($filename) = @_;
    my $content = do {
        open my $file, "<", $filename or die "Can't open $filename: $!";
        local $/ = undef; # set slurp mode
        <$file>;
    };
    while (my ($string, $replacement) = each %replacements) {
        $content =~ s/\Q$string\E/$replacement/g;
    }
    open my $file, ">", $filename or die "Can't open $filename: $!";
    print $file $content; # no comma after the filehandle, on purpose
    close $file;
}
This code is pretty straightforward: I escape $string inside the regex (with \Q...\E) so that its contents aren't treated as a pattern. This implementation has the side effect of possibly replacing part of the $content string where something was already replaced, but one could work around that if it is absolutely necessary.
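If that side effect matters, one workaround (an alternative to the answer's loop, not part of it) is to join all the keys into a single alternation and substitute in one pass, looking each match up in the hash, so replaced text is never scanned again. Sorting the keys longest first lets a longer key win when one key is a prefix of another. replace_all is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every key of %$map in $text in a single pass,
# so text produced by one replacement is never rescanned by another.
sub replace_all {
    my ($text, $map) = @_;
    return $text unless %$map;          # an empty alternation would match everywhere
    my $alt = join '|',
        map  { quotemeta($_) }                       # keys are literal strings
        sort { length $b <=> length $a } keys %$map; # longest key wins
    my $re = qr/($alt)/;
    $text =~ s/$re/$map->{$1}/g;
    return $text;
}

# Swapping two words only works in one pass:
print replace_all('cat chases dog', { cat => 'dog', dog => 'cat' }), "\n";
```

Inside searchAndReplace, the each loop could then become $content = replace_all($content, \%replacements);.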
2. Traversing the file tree
We will define a sub called anakinFileWalker. It takes a file or directory name and the searchAndReplace sub as arguments. If the name is a plain file, it calls searchAndReplace on it; if it is a directory, it opens the directory and calls itself on each entry.
sub anakinFileWalker {
    my ($filename, $action) = @_;
    if (-d $filename) {
        opendir my $dir, $filename or die "Can't open $filename: $!";
        while (defined(my $entry = readdir $dir)) {
            next if $entry eq '.' or $entry eq '..';
            # come to the dark side of recursion
            anakinFileWalker("$filename/$entry", $action); # be sure to give full path
        }
    } else {
        # Houston, we have a plain file:
        $action->($filename);
    }
}
Of course, this sub blows up if you have looping symlinks.
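If looping symlinks are a concern, the core File::Find module can do the traversal instead; it does not follow symbolic links unless asked to (follow => 1). A minimal sketch, with walk_files as a hypothetical replacement for anakinFileWalker that takes the same code reference:

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: call $action->($path) for every file under $top.
# File::Find does not descend into symlinked directories by default.
sub walk_files {
    my ($top, $action) = @_;
    find({
        wanted   => sub { $action->($File::Find::name) if -f $_ },
        no_chdir => 1,   # keep $_ a full path, the same as $File::Find::name
    }, $top);
}
```

It is called the same way as the original: walk_files($topDirectory, \&searchAndReplace);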
1. Setting up the %replacements
There is a nice module, Text::CSV, which will help you with all your needs. Just make sure that %replacements meets the definition above; that isn't hard.
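A minimal sketch of that step, assuming the CSV has exactly two columns (search string, replacement) and no header row; load_replacements and the file name replacements.csv are hypothetical. Text::CSV is a CPAN module, not part of core Perl.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper: build a search => replacement hash from a two-column CSV.
sub load_replacements {
    my ($csv_file) = @_;
    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
    open my $fh, '<', $csv_file or die "Can't open $csv_file: $!";
    my %map;
    while (my $row = $csv->getline($fh)) {
        my ($search, $replacement) = @$row;
        next unless defined $search && length $search;   # skip blank lines
        $map{$search} = $replacement // '';
    }
    close $fh;
    return %map;
}
```

With the hash declared before searchAndReplace, filling it is then %replacements = load_replacements('replacements.csv');. Quoted fields such as "c,d" are handled by Text::CSV rather than split on the comma.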
Starting it all
When the %replacements are ready, we just do
anakinFileWalker($topDirectory, \&searchAndReplace);
and it should work. If not, this should at least give you an idea of how to solve such a problem.

Search, Create and Move in Perl

I have a directory of about 800 HTML files. I am trying to search each file and return the text between certain tags. Then I want to create a directory named after that text and move (or copy) the files there. This seemed like a pretty easy endeavor when I thought it up, but I am having a ton of problems even identifying the modules I would need. I have looked at File::Find and glob, but am not sure how I would combine them with a regex for text within the files (not the file name). I am basically a newbie to Perl, so any and all help would be appreciated. Thanks in advance.
EDIT
To clarify, here is what I am trying to accomplish:
Read directory = ~/me/project/
For ~/me/project/, find all the files =~ /\.html$/i
For each file, search the HTML for <div class="recip" id="objectTo">(.*)</div>
For every (.*), e.g. john@doewww.com or John Doe, create a directory with that same name
Loop back and move every file that has an instance of xxxxxxxx@xxxxx.com or John Doe to its corresponding directory.
I really appreciate the help!
You're on the right track with File::Find.
You will create a 'wanted()' function, and within that function the name of the file found will be $File::Find::name. If you call find with no_chdir => 1, you can use that name to open a file handle, read in the file, search for the tags and extract the data you're looking for, and close the file handle (without no_chdir, File::Find has already changed into the file's directory, so open $_ instead). File::Find will then move on to the next file.
#! /usr/bin/perl
use warnings;
use strict;
use File::Find;
sub wanted {
    my $file=$File::Find::name;
    # if the file has the extension '.html' (case insensitive) ...
    if( $file =~ /\.html$/i ) {
        my $FH;
        open( $FH, '<', $file) or die "Could not open '$file' for reading: $!";
        local $/;                # undef $/ means slurp mode ('' would be paragraph mode)
        my $contents = <$FH>;    # slurp file into $contents
        # search $contents for the tags that you're looking for,
        #
        close $FH;
    }
}
my @directories = (
    './htmlfiles'
  , './www'
  , './web'
);
# no_chdir keeps the working directory fixed, so $File::Find::name can be opened directly
find({ wanted => \&wanted, no_chdir => 1 }, @directories);
Warning: The code passes perl -c, but I haven't run it.
For the second part of your question, check out HTML::Strip for stripping HTML markup from text.
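For the extract-and-move part, here is a rough sketch. It assumes the <div class="recip" id="objectTo"> markup appears exactly as in the question and holds plain text. A regex is fragile for HTML in general, so treat this as a starting point. recipient_of and file_by_recipient are hypothetical names, and any / or \ in the extracted text is replaced with _ so it stays a single directory name.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Path qw(make_path);

# Hypothetical helper: text inside <div class="recip" id="objectTo">...</div>, or undef.
sub recipient_of {
    my ($file) = @_;
    open my $fh, '<', $file or die "Could not open '$file' for reading: $!";
    my $html = do { local $/; <$fh> } // '';   # slurp: $/ must be undef, not ''
    close $fh;
    return $html =~ m{<div\s+class="recip"\s+id="objectTo">\s*(.*?)\s*</div>}is
        ? $1 : undef;
}

# Hypothetical helper: move each .html file in $dir into $dir/<recipient>/.
sub file_by_recipient {
    my ($dir) = @_;
    opendir my $dh, $dir or die "opendir $dir: $!";
    my @pages = grep { /\.html\z/i && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    for my $name (@pages) {
        my $who = recipient_of("$dir/$name");
        next if !defined $who || $who eq '' || $who =~ /^\.+\z/;
        $who =~ s{[/\\]}{_}g;                  # keep it a single path component
        make_path("$dir/$who");
        move("$dir/$name", "$dir/$who/$name") or die "move $name: $!";
    }
}
```

For example, file_by_recipient("$ENV{HOME}/me/project") would create a "John Doe" directory there and move every page whose div holds "John Doe" into it; pages without the div are left where they are.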