What's wrong with my Perl substitution?

I have a directory of files that I am trying to split into subdirectories using Perl, because there are so many files. The filenames start with a date in the form YYYYMMDD and I'm trying to split on that. I am using the following code, adapted from this Stack Overflow answer:
#!/usr/bin/perl -w
use strict;
opendir DIR, "." or die "opendir: $!";
my @files = readdir(DIR);
closedir DIR;
foreach my $f (@files) {
-f $f or next;
(my $new_name = $f) =~ s!^((....)(..)(..).*)$!$2/$3/$4/$1/;
-e $new_name and die "$new_name already exists";
rename($f, $new_name);
}
However, I get 'Substitution replacement not terminated at movefiles.pl line 10.' when I try to run this code. As far as I can see, I am escaping and terminating the substitution correctly?

You are using ! as a regular expression delimiter. You have one to start it, one to separate the match part from the replace part, but don't have one at the end.
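A corrected version of the script could look like the sketch below, assuming the goal is `YYYY/MM/DD/original_name`. Besides adding the closing `!`, it makes a few changes of its own: it tightens `....` to `\d` so only names that start with eight digits are moved, it creates the target directories with `make_path` from File::Path (plain `rename` does not create missing directories; it just returns false), and it checks `rename`'s return value. The sub names `dated_path` and `move_dated_files` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);

# Map "YYYYMMDD<rest>" to "YYYY/MM/DD/YYYYMMDD<rest>".
# The third "!" terminates the replacement. Names that do not start
# with eight digits come back unchanged (an assumption of this sketch).
sub dated_path {
    my ($name) = @_;
    (my $new_name = $name) =~ s!^((\d{4})(\d\d)(\d\d).*)$!$2/$3/$4/$1!;
    return $new_name;
}

# Move each dated plain file in $dir into its YYYY/MM/DD subdirectory.
sub move_dated_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "opendir $dir: $!";
    my @files = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;

    for my $f (@files) {
        my $new_name = dated_path($f);
        next if $new_name eq $f;                  # not a dated name
        my $target = "$dir/$new_name";
        die "$target already exists" if -e $target;
        make_path(dirname($target));              # rename() won't create dirs
        rename "$dir/$f", $target or die "rename $f: $!";
    }
}

# Only touch the filesystem when a directory is given on the command line.
move_dated_files($ARGV[0]) if @ARGV;
```

Run it as `perl movefiles.pl .` from the directory holding the dated files.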

Related

Excluding a file with perl grep

I want to go over all of the files in the directory, except for files ending with '.py'.
The line in the existing script is:
my @files = sort(grep(!/^(\.|\.\.)$/, readdir($dir_h)));
And I want something like:
my @files = sort(grep(!/^(\.|\.\.|"*.py")$/, readdir($dir_h)));
Can you please help with the exact syntax?
grep uses regular expressions, not globs (aka wildcards). The correct syntax is
my @files = sort(grep(!/^(\.|\.\.|.*\.py)$/, readdir($dir_h)));
or, without the unnecessary parentheses:
my @files = sort grep ! /^(\.|\.\.|.*\.py)$/, readdir $dir_h;
As the parentheses in the regular expression aren't used for capturing, but only for precedence, you can make them non-capturing:
my @files = sort grep ! /^(?:\.|\.\.|.*\.py)$/, readdir $dir_h;
You can express the same in many different ways, e.g.
/^\.{1,2}$|\.py$/
i.e. dot once or twice with nothing around, or .py at the end.
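Because grep only needs a list, the pattern can be checked against a few made-up names without touching the filesystem. This sketch assumes the names came from `readdir`; `keep_non_py` is a hypothetical helper. Note that `\.py$` excludes `run.py` but keeps `cache.pyc`.

```perl
use strict;
use warnings;

# Drop "." and "..", plus anything ending in ".py"; keep everything else.
sub keep_non_py {
    return sort grep { !/^\.{1,2}$|\.py$/ } @_;
}

my @names = ('.', '..', 'run.py', 'notes.txt', 'cache.pyc', '.hidden');
print "$_\n" for keep_non_py(@names);
```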
Perl's built-in grep is actually very clever: it iterates over a list, evaluating a condition for each element in turn, with $_ set to the current element.
This condition can be a simple regular expression, but it doesn't have to be.
So you can, for example, write:
my @files = grep { -f $_ } readdir(DIR);
But because -f defaults to $_, you can also write:
my @files = grep { -f } readdir (DIR);
You can also apply a regular expression to $_:
my @files = grep { not m/\.py$/ } readdir (DIR);
(Note: this is the same as not $_ =~ m/\.py$/, because patterns apply to $_ by default.)
So you can do what you want with:
my @files = sort grep { not m/\.py$/ and -f } readdir (DIR);
Note, though, that readdir returns bare names and -f tests them relative to the current working directory, so this only works when DIR is the current directory. You can use readdir on other directories if you prepend the path, but personally I prefer glob, because it fills in the path as well:
my @files = sort grep { not m/\.py$/ and -f } glob ( "$dir/*" );
Check that the directory entries are files and then exclude those that end in .py:
#!/usr/bin/env perl
use warnings;
use strict;
my $dir = "/home/me/somedir";
# good examples in the perldoc:
# perldoc -f readdir
opendir(my $DIR, $dir) || die "Unable to open $dir : $!";
# -f checks that it is a plain file ( perldoc perlfunc )
# !~ means does not match ( perldoc perlre )
# m|\.py$| means a match string that ends in '.py'
my @files = sort grep { -f "$dir/$_" && $_ !~ m|\.py$| } readdir($DIR);
closedir($DIR);
print "$_\n" for @files;

How to update certain part of text file in perl

I have written the code but it is not working. I want to change every "/" to "\".
use strict;
use warnings;
open(DATA,"+<unix_url.txt") or die("could not open file!");
while(<DATA>){
s/\//\\/g;
s/\\/c:/;
print DATA $_;
}
close(DATA);
my original file is
/etc/passwd
/home/bob/bookmarks.xml
/home/bob/vimrc
expected output is
C:\etc\passwd
C:\home\bob\bookmarks.xml
C:\home\bob\vimrc
actual output is
/etc/passwd
/home/bob/bookmarks.xml
/home/bob/vimrc/etc/passwd
\etc\passwd
kmarks.xml
kmarks.xml
mrcmrc
Trying to read and write the same file, line by line, in a while loop that is reading till the end of that same file, seems very dicey and unpredictable. I'm not at all sure where your file pointers are going to end up each time you try to write. You would be much safer sending your output to a new file (and then moving it to replace your old file afterwards if you wish).
open(DATA,"<unix_url.txt") or die("could not open file for reading!");
open(NEWDATA, ">win_url.txt") or die ("could not open file for writing!");
while(<DATA>){
s/\//\\/g;
s/\\/c:\\/;
# ^ (note - from your expected output you also wanted to preserve this backslash)
print NEWDATA $_;
}
close(DATA);
close(NEWDATA);
rename("win_url.txt", "unix_url.txt");
See also this answer:
Perl Read/Write File Handle - Unable to Overwrite
If the point of the exercise is less about using regular expressions, and more about getting things done, I would consider using modules from the File::Spec family:
use warnings;
use strict;
use File::Spec::Win32;
use File::Spec::Unix;
while (my $unixpath = <>) {
chomp $unixpath;   # drop the trailing newline before splitting
my @pieces = File::Spec::Unix->splitpath($unixpath);
my $winpath = File::Spec::Win32->catfile('c:', @pieces);
print "$winpath\n";
}
You don't really need to write a program to achieve this. You can use a "Perl pie" one-liner (perl -p -i -e):
perl -pi -e 's|/|\\|g; s|\\|c:\\|;' unix_url.txt
However, if you are running on Windows under Cygwin, I would suggest using the cygpath tool, which converts POSIX paths into Windows paths.
Also, you need to quote your paths, since Windows paths are allowed to contain spaces. Alternatively, you can escape the space character:
perl -pi -e 's|/|\\|g; s|\\|c:\\|; s| |\\ |g;' unix_url.txt
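The same in-place edit can also be done from inside a script by setting `$^I`, the variable behind the `-i` switch (documented in perlvar and perlrun). This is a minimal sketch: the sub name `unix_to_windows_in_place` is hypothetical, it assumes every line is an absolute Unix path, and it keeps a `.bak` copy of the original.

```perl
use strict;
use warnings;

# Rewrite $file in place, like: perl -pi.bak -e 's|/|\\|g; s|^\\|C:\\|' $file
sub unix_to_windows_in_place {
    my ($file) = @_;
    # $^I = '.bak' turns on in-place editing for <> and keeps a backup;
    # local restores both variables when the sub returns.
    local ($^I, @ARGV) = ('.bak', $file);
    while (<>) {
        s{/}{\\}g;       # every slash becomes a backslash
        s{^\\}{C:\\};    # the leading backslash gets the drive letter
        print;           # goes to the new version of $file
    }
}

unix_to_windows_in_place($ARGV[0]) if @ARGV;
```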
As for your original script, if you still want to use it, you can do this (it keeps the original file as a backup):
use strict;
use warnings;
use autodie;
use File::Copy;
my $file = "unix_url.txt";
open my $fh, "<", $file;
open my $tmp, ">", "$file.tmp";
while (<$fh>) {
s/\//\\/g;
s/\\/c:\\/;
} continue { print $tmp $_ }
close $tmp;
close $fh;
move $file, "$file.bak";   # keep the original as a backup
move "$file.tmp", $file;

Bulk renaming files with bash and Perl based on file name

I'm looking to bulk rename files in the current directory only and remove certain strings from the end of file names.
Sample:
foo-bar-(ab-4529111094).txt
foo-bar-foo-bar-(ab-189534).txt
foo-bar-foo-bar-bar-(ab-24937932201).txt
the output should look like this:
foo-bar.txt
foo-bar-foo-bar.txt
foo-bar-foo-bar-bar.txt
I want to remove the string -(ab-2492201) at the end of each file name,
knowing that the number of digits can vary.
I'd prefer a plain Perl regex over modules or other utilities, and for Bash a one-liner is highly preferred.
How can I do this in both Perl and the Bash shell on Linux? I'm interested in both solutions.
Try:
$ rename 's/-\(ab-\d+\)(?=\.txt$)//' *.txt
There's a rename command written in Perl. Its first argument is Perl code describing how to transform a filename. You could use the same s/// command in your own Perl program or one-liner.
If that doesn't work, try prename instead of rename; there's a different, non-Perl-based, rename command installed on some systems, in which case the Perl one may be called prename.
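Because the expression given to `rename` is ordinary Perl, it can be tried on the sample names from the question before anything is renamed. The `(?=\.txt$)` lookahead requires `.txt` to follow the tag without removing it. A sketch; `strip_ab_tag` is a hypothetical helper.

```perl
use strict;
use warnings;

# Apply the same s/// that was passed to the rename command.
sub strip_ab_tag {
    my ($name) = @_;
    (my $new = $name) =~ s/-\(ab-\d+\)(?=\.txt$)//;
    return $new;
}

for my $old (
    'foo-bar-(ab-4529111094).txt',
    'foo-bar-foo-bar-(ab-189534).txt',
    'foo-bar-foo-bar-bar-(ab-24937932201).txt',
) {
    my $new = strip_ab_tag($old);
    print "$old -> $new\n";
    # rename $old, $new or warn "rename $old: $!\n";   # uncomment to apply
}
```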
Using Perl Regex to Rename Files
With find, perl, and xargs, you could use this one-liner
find . -type f | perl -pe 'print $_; s/input/output/' | xargs -n2 mv
Without the final | xargs -n2 mv, the output is just:
OldName NewName
OldName NewName
OldName NewName
How does it work?
find . -type f outputs file paths (or file names... you control what gets processed by the regex here!)
-p makes perl loop over each input line and print it after the script has run; -e supplies the script inline
print $_ prints the original file name first (in addition to what -p prints)
xargs -n2 passes two arguments per command invocation
so mv receives each OldName NewName pair (note that names containing spaces will be split apart by xargs)
In bash, you could write something like:
for file in *-\(ab-[0-9]*\)*; do
newfile="${file/-(ab-[0-9]*)/}"
mv "$file" "$newfile"
done
When you say "in the current directory", do you mean only the current directory, or anywhere in or beneath it?
File::Find is a simple way to do the latter, and is a core module so won't need installing. Like so:
use strict;
use warnings;
use autodie;
use File::Find;
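find(\&rename_file, '.');
# named rename_file so it doesn't clash with the builtin rename (which autodie also wraps)
sub rename_file {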
return unless -f;
my $newname = $_;
return unless $newname =~ s/-\(ab-[0-9]+\)(\.txt)$/$1/i;
print "rename $_, $newname\n";
}
Update
This program will rename all the files with the given filename pattern only within the current directory.
Note that the initial open loop is there only to create sample files for renaming.
use strict;
use warnings;
use autodie;
open my $fh, '>', $_ for qw(
foo-bar-(ab-4529111094).txt
foo-bar-foo-bar-(ab-189534).txt
foo-bar-foo-bar-bar-(ab-24937932201).txt
);
for (glob '*.txt') {
next unless -f;
my $newname = $_;
next unless $newname =~ s/-\(ab-[0-9]+\)(\.txt)$/$1/i;
print "rename $_, $newname\n";
rename $_, $newname;
}
output
rename foo-bar-(ab-4529111094).txt, foo-bar.txt
rename foo-bar-foo-bar-(ab-189534).txt, foo-bar-foo-bar.txt
rename foo-bar-foo-bar-bar-(ab-24937932201).txt, foo-bar-foo-bar-bar.txt
A simpler, shorter (better? :) ) rename regex:
rename 's#-\(.*?\)##' foo*.txt
Check this:
ls -1 | nawk '/foo-bar-/{old=$0;gsub(/-\(.*\)/,"",$0);system("mv \""old"\" "$0)}'
> ls -1 foo*
foo-bar-(ab-4529111094).txt
foo-bar-foo-bar-(ab-189534).txt
foo-bar-foo-bar-bar-(ab-24937932201).txt
> ls -1 | nawk '/foo-bar-/{old=$0;gsub(/-\(.*\)/,"",$0);system("mv \""old"\" "$0)}'
> ls -1 foo*
foo-bar-foo-bar-bar.txt
foo-bar-foo-bar.txt
foo-bar.txt
>
Another way using just perl:
perl -E'for (<*.*>){ ($new = $_) =~ s/(^.+?)(-\(.+)(\..*$)/$1$3/; say $_." -> ".$new}'
(say ... is nice for testing; just replace it with rename $_,$new or rename($_,$new).)
<*.*> globs every name in the current directory that contains a dot
($new = $_) =~ stores the result of the following substitution in $new and leaves $_ intact
(^.+?) save this match in $1 and non-greedy match from the beginning until...
(-\(.+) the sequence "-( ...anything..." is found. (this match would be saved in $2)
(\..*$) save everything from the last "." (period) before the end ($) of the line until and including the end of the line -> into $3
substitute the match with the string generated from $1$3
(You could also do it for a specific directory with perl -E'for (</tmp/my/directory/*.*>){ .....)
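To see what each group of that pattern captures, it can be run on one of the sample names from the question; the variable names here are illustrative.

```perl
use strict;
use warnings;

my $name = 'foo-bar-foo-bar-(ab-189534).txt';

# Same pattern as the one-liner; list assignment collects $1, $2, $3.
my ($base, $tag, $ext) = $name =~ /(^.+?)(-\(.+)(\..*$)/
    or die "no match for $name\n";

print "\$1 = $base\n";       # $1 = foo-bar-foo-bar
print "\$2 = $tag\n";        # $2 = -(ab-189534)
print "\$3 = $ext\n";        # $3 = .txt
print "new = $base$ext\n";   # new = foo-bar-foo-bar.txt
```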

Perl regexp how to get the file name out?

I have this directory path:
\main\ABC_PRD\ABC_QEM\1\testQEM.txt\main\ABC_QEM\1
How can I get the file name testQEM.txt from the above string?
I use this:
$file =~ /(.+\\)(.+\..+)(\\.+)/;
But get this result:
file = testQEM.txt\main\ABC_QEM
Thanks,
Jirong
I'm not sure I understand, as paths cannot have a file node half way through them! Have multiple paths got concatenated somehow?
Anyway, I suggest you work through the path looking for the first node that validates as a real file using -f.
Here is an example
use strict;
use warnings;
my $path = '\main\ABC_PRD\ABC_QEM\1\testQEM.txt\main\ABC_QEM\1';
my @path = split /\\/, $path;
my $file = shift @path;
$file .= '\\'.shift @path until -f $file or @path == 0;
print "$file\n";
/[^\\]+\.[^\\]+/
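This matches a single path component (a run of non-backslash characters) that contains a ".". Wrap it in parentheses to capture it, e.g. my ($file) = $path =~ /([^\\]+\.[^\\]+)/; Is this what you were looking for?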
This is a bit difficult, as directory names can contain periods. This is especially common on *nix systems, but is valid under Windows as well.
Therefore, each possible subpath has to be tested iteratively for file-ness.
I'd maybe try something like this:
my $file;
my $weirdPath = q(/main/ABC_PRD/ABC_QEM/1/testQEM.txt/main/ABC_QEM/1);
my @parts = split m{/}, $weirdPath;
for my $i (0 .. $#parts) {
my $path = join "/", @parts[0 .. $i];
if (-f $path) { # optionally "not -d $path"
$file = $parts[$i];
last;
}
}
print "file=$file\n"; # "file=testQEM.txt\n"
I split the weird path at all slashes (change to backslashes if interoperability is not an issue for you). Then I join the first $i+1 elements together and test if the path is a normal file. If so, I store the last part of the path and exit the loop.
If you can guarantee that the file is the only part of the path that contains periods, then using one of the other solutions will be preferable.
my $file = '\main\ABC_PRD\ABC_QEM\1\testQEM.txt\main\ABC_QEM\1';
my ($result) = $file =~ /\\([^\\]+\.[^\\]+)\\/;
The parentheses around $result put the match in list context, where it returns the captured groups instead of just true or false.
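Running the match in both contexts on the path from the question shows the difference; the variable names are illustrative.

```perl
use strict;
use warnings;

my $file = '\main\ABC_PRD\ABC_QEM\1\testQEM.txt\main\ABC_QEM\1';

# Scalar context: the match only reports success (1), not the capture.
my $scalar = $file =~ /\\([^\\]+\.[^\\]+)\\/;

# List context: the match returns the captured groups.
my ($result) = $file =~ /\\([^\\]+\.[^\\]+)\\/;

print "scalar: $scalar\n";   # scalar: 1
print "list:   $result\n";   # list:   testQEM.txt
```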
Use regex pattern /(?=[^\\]+\.)([^\\]+)/
my $path = '\main\ABC_PRD\ABC_QEM\1\testQEM.txt\main\ABC_QEM\1';
print $1 if $path =~ /(?=[^\\]+\.)([^\\]+)/;
>echo "\main\ABC_PRD\ABC_QEM\1\testQEM.txt\main\ABC_QEM\1"|perl -pe "s/.*([\\][a-zA-Z]*\.txt).*/\1/"
\testQEM.txt
I suggest reading up on regex backtracking, in particular how the greedy quantifiers * and + work.
You only need one small change to your regex, making the + after the dot non-greedy:
/(.+\\)(.+\..+?)(\\.+)/
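Comparing the two patterns on the path from the question shows the effect of that single `?`: the greedy version reproduces the result from the question, while the non-greedy one stops at the first backslash after `.txt`.

```perl
use strict;
use warnings;

my $path = '\main\ABC_PRD\ABC_QEM\1\testQEM.txt\main\ABC_QEM\1';

# Original pattern: the greedy .+ after the dot runs as far as it can,
# backing off only enough for (\\.+) to match the final "\1".
my $greedy = $path =~ /(.+\\)(.+\..+)(\\.+)/  ? $2 : undef;

# Non-greedy .+? stops at the first backslash after the dot.
my $lazy   = $path =~ /(.+\\)(.+\..+?)(\\.+)/ ? $2 : undef;

print "greedy: $greedy\n";   # greedy: testQEM.txt\main\ABC_QEM
print "lazy:   $lazy\n";     # lazy:   testQEM.txt
```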

Searching for Files with specific Regex in filename in Perl

Hi all, I was wondering how I can search for files in Perl.
Right now I have a line of information that I have split on tabs and stored in an array. The array contains stub text of the filenames I want to search for in a directory. For example, Engineering_4.txt would just be "Engin" in my array.
If there are two different files, e.g. Engineering_4 and Engineering_5, it should search both of them and extract the information I need from the one that contains it (only one of them does). I imagine my script will have to find and store all matching file names and then search through each of those files.
My question is how do I go about searching for files in a directory matching a regular expression in Perl? Also is there a way to limit the file types that I want to search for. For example, I just want to only search for ".txt" files.
Thanks everyone
Since you already know the directory, you could open it and filter the entries as you read them:
opendir D, 'yourDirectory' or die "Could not open dir: $!\n";
my @filelist = grep(/yourRegex/i, readdir D);
closedir D;
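Applied to the question, each stub from the split line can become a pattern anchored at the start of the name. `\Q...\E` makes any regex metacharacters in the stub match literally, and `\.txt$` restricts the search to .txt files. A sketch; `files_for_stub` is a hypothetical helper and the directory is whatever you pass in.

```perl
use strict;
use warnings;

# Return the .txt files in $dir whose names start with $stub,
# e.g. "Engin" finds Engineering_4.txt and Engineering_5.txt.
sub files_for_stub {
    my ($dir, $stub) = @_;
    opendir my $dh, $dir or die "Could not open $dir: $!\n";
    my @found = sort grep { /^\Q$stub\E.*\.txt$/ && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return map { "$dir/$_" } @found;   # readdir gives bare names; add the path
}

if (@ARGV >= 2) {
    my ($dir, @stubs) = @ARGV;
    print "$_\n" for map { files_for_stub($dir, $_) } @stubs;
}
```

Invoked with a directory followed by one or more stubs, it prints the matching paths, which can then be opened and searched one by one.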
You can do this using the glob function or the <glob> operator.
while (<Engin*.txt>) {
print "$_\n";
}
The glob function returns a list of matching file names when given a wildcard expression.
This means that the files can also be sort-ed before processing:
use Sort::Key::Natural 'natsort';   # from CPAN, not a core module
foreach my $file ( natsort glob "*.txt" ) { # Will loop over only txt files
open my $fh, '<', $file or die $!; # Open file and process
}
You can also use the File::Find module:
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
my @dirs = @ARGV ? @ARGV : ('.');
my @list;
find( sub{
push @list, $File::Find::name if -f $_ && $_ =~ m/\.txt$/ },
@dirs );
print "$_\n" for #list;