I have several files (*.txt) that I need to remove lines from. The files look like this:
This is a line to keep.
keep me too
START
some stuff to remove
other to remove
END
keep me!
And I want them to look like this:
This is a line to keep.
keep me too
keep me!
I've gotten this far:
perl -i -p0e 's/#START.*?END/ /s' file.txt
This removes the first instance from file.txt, but I can't figure out how to remove all instances from file.txt (and then how to apply this to all *.txt files).
If what you show works for the first instance, all you should need to add is the /g flag to do all instances, and a shell glob to pick out all .txt files:
perl -i -p0e 's/#START.*?END/ /gs' *.txt
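To see what the /g and /s flags do without touching any files, here is a minimal sketch that applies a variant of that substitution to an in-memory copy of the sample text, with a second block added. It assumes the markers are plain START/END lines as in the sample, rather than #START. The ^ anchors, the /m flag and the trailing \n are additions so that whole lines are removed instead of being replaced by a space.

```perl
use strict;
use warnings;

# Sample input: the question's text plus a second START..END block.
my $text = <<'END_TEXT';
This is a line to keep.
keep me too
START
some stuff to remove
other to remove
END
keep me!
START
more to remove
END
END_TEXT

# /s lets . match newlines, /m lets ^ match at the start of every line,
# and /g removes every START..END block instead of only the first one.
(my $cleaned = $text) =~ s/^START\n.*?^END\n//gms;

print $cleaned;
```

As a one-liner the same substitution would be perl -i -0777 -pe 's/^START\n.*?^END\n//gms' *.txt, where -0777 makes each file be read as a single string.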
This looks like a job for the flip-flop (range) operator:
#!/usr/bin/env perl
use strict;
use warnings;
while( <DATA> ) {
print unless (/^START/ .. /^END/);
}
__DATA__
This is a line to keep.
keep me too
START
some stuff to remove
other to remove
END
keep me!
Output:
This is a line to keep.
keep me too
keep me!
It can also be written as a one-liner:
perl -n -e 'print unless (/^START/ .. /^END/);' input.txt > output.txt
Or, to edit the files in-place:
perl -n -i -e 'print unless (/^START/ .. /^END/);' *.txt
The only bookkeeping to take care of here is opening and writing the individual files. The processing itself is handled by the range operator.
use warnings;
use strict;
my @files = @ARGV;
my ($fh_in, $fh_out);
foreach my $file (@files)
{
my $outfile = "new_$file";
open $fh_in, '<', $file or die "Can't open $file: $!";
open $fh_out, '>', $outfile or die "Can't open $outfile: $!";
print "Processing $file, writing to $outfile.\n";
while (<$fh_in>) {
print $fh_out $_ if not /^START$/ .. /^END$/;
}
}
This is invoked as script.pl file-list.
Since we use the same filehandle for reading (and the same one for writing), opening a new file closes the previous one (see perlopentut and open), so we don't have to close them explicitly. As the documentation for close puts it:
You don't have to close FILEHANDLE if you are immediately going to do another open on it, because open closes it for you. (See open.)
I name the new files new_$file just to provide a working sample. You could instead, for example, rename the old file to $file.orig and the new one to $file after the while loop. I'd use functions from the core File::Copy module for this. In that case we do need to close the files explicitly first.
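A minimal sketch of that rename variant, wrapping the loop body in a hypothetical strip_blocks sub. It writes to "$file.new" (an assumed name that, unlike new_$file, still works when the argument includes a directory), closes both handles, and then uses File::Copy's move to keep the original as $file.orig. Like the script above, it assumes every START has a matching END, because the flip-flop's state carries over from one file to the next.

```perl
use strict;
use warnings;
use File::Copy qw(move);

# Filter START..END blocks out of one file, keeping the original as "$file.orig".
sub strip_blocks {
    my ($file) = @_;
    my $outfile = "$file.new";    # hypothetical intermediate file name
    open my $fh_in,  '<', $file    or die "Can't open $file: $!";
    open my $fh_out, '>', $outfile or die "Can't open $outfile: $!";
    while (<$fh_in>) {
        print $fh_out $_ unless /^START$/ .. /^END$/;
    }
    # Close explicitly so the output is flushed before the files are renamed.
    close $fh_in;
    close $fh_out or die "Can't close $outfile: $!";
    move($file, "$file.orig") or die "Can't back up $file: $!";
    move($outfile, $file)     or die "Can't replace $file: $!";
}

strip_blocks($_) for @ARGV;
```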
Related
I'm a total noob at Perl, trying to learn some new code for a specific project. In short, I'm making a script (on OS X) that searches all XML files in a folder and censors specific numbers. I know a one-liner could have helped, but the number of files will be pretty huge (thousands of files) and this will happen regularly, so a script would be nicer. And besides, there is the learning-to-script part :)
I've managed to open my files, make the regex work on every line of the original for my specific needs, and generate a writable temp file for the new information. This is where things stop working. I've tried to copy the new file over the old file after the loop, but I end up with a blank(!) file. I suspected an error with the temp file, but that looks perfect. As a noob's way out, I even tried to reverse the process line by line from the temp file back to the original after changing the open mode (read) on them, but that ALSO gave an empty file.
And now my head is sort of empty. Any help would be appreciated :)
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy;
chdir "/perltest/test"; #debugsafety
#file
my $workingfiles = "*.XML";
my @files = glob("$workingfiles");
#process files
my $old;
my $tmpfile;
foreach my $file (@files) {
print "$file \n";
open ($old, "<", $file) or die "No file";
open ($tmpfile, ">", 'temp.tmp') or die;
while(my $line = <$old> ) {
my $subz = $line;
$subz =~ s/([[:upper:]]{2}[[:digit:]]{6})|([[:upper:]]{1}[[:digit:]]{7})|(?:(?<![[:digit:]])[[:digit:]]{8}(?![[:digit:]])|([[:upper:]]{2}[[:digit:]]{5}[AB]))/**CENS**/g;
print $subz;
print $tmpfile $subz;
}
print "Start copying.\n";
open (my $old, ">", $file) or die "No file";
open (my $tmpfile, "<", 'temp.tmp') or die;
#copy $tmpfile, $old or die "Couldn't copy";
my $y = 0; #debug
while (my $line = <$tmpfile> ) {
print $y++; #debug
my $subz = $line;
print $subz;
print $old $subz;
}
}
print "Complete.\n";
exit;
You re-open your file handles before closing them. I'm an Oracle DBA masquerading as a perl developer, so I can't give the why behind it. But I know if you close your file handles, your script should work as is.
close ($old); # add this line
close ($tmpfile); # add this line
print "Start copying.\n";
It would then be good practice to close them again when you are done "copying" back to them.
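The blank file is consistent with output buffering: what you print to $tmpfile can stay in Perl's buffer until that handle is flushed or closed, and declaring a second my $tmpfile does not close the first handle. Here is a minimal sketch that shows the effect, using a hypothetical scratch file in the current directory; the exact buffering behaviour assumes Perl's default I/O layers.

```perl
use strict;
use warnings;

my $path = "buffer_demo_$$.tmp";    # hypothetical scratch file in the current directory

open my $out, '>', $path or die "Can't write $path: $!";
print $out "censored data\n";       # usually still sitting in Perl's buffer

# Read the file through a second handle while $out is still open.
open my $peek, '<', $path or die "Can't read $path: $!";
my $before = do { local $/; <$peek> };
close $peek;

close $out or die "Can't close $path: $!";    # close flushes the buffer to disk

open $peek, '<', $path or die "Can't read $path: $!";
my $after = do { local $/; <$peek> };
close $peek;
unlink $path;

printf "before close: %d bytes, after close: %d bytes\n",
    length($before // ''), length($after // '');
```

With close $tmpfile before the second open, the reading handle sees the complete data.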
Explicitly close the filehandle when you're done writing to it. Things will still be buffered until you do that.
It would also make more sense to
rename($file, "$file.old");
rename("temp.tmp", $file);
rather than looping through the file (or using File::Copy::copy) to make a backup copy of it.
Lastly, for simple edits I'd suggest making an effort to get comfortable with doing them on the command line, so you don't need to scratch your head and wonder "now what did I do in that script last time?". It can be a big timesaver in the long run.
perl -p -i.bak -e 's/pattern/text/;' files*
is the general form.
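Putting that advice together, here is a minimal sketch of the censoring loop that closes the handles before renaming. The censor_file sub, the per-file "$file.tmp" name and the "$file.old" backup are hypothetical choices, and only the eight-digit alternative of the question's pattern is used to keep it short. Files are taken from the command line (for example script.pl *.XML).

```perl
use strict;
use warnings;

# Only the "exactly eight digits" alternative from the question's regex.
my $eight_digits = qr/(?<![[:digit:]])[[:digit:]]{8}(?![[:digit:]])/;

# Censor every match of $re in $file, keeping the original as "$file.old".
sub censor_file {
    my ($file, $re) = @_;
    my $tmp = "$file.tmp";    # hypothetical temp name, one per input file
    open my $in,  '<', $file or die "Can't read $file: $!";
    open my $out, '>', $tmp  or die "Can't write $tmp: $!";
    while (my $line = <$in>) {
        $line =~ s/$re/**CENS**/g;
        print $out $line;
    }
    close $in;
    # close flushes the buffered output; checking it catches e.g. a full disk
    close $out or die "Can't close $tmp: $!";
    rename $file, "$file.old" or die "Can't back up $file: $!";
    rename $tmp,  $file       or die "Can't replace $file: $!";
}

censor_file($_, $eight_digits) for @ARGV;
```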
I want to replace a string in a file. Of course I can use
perl -pi -e 's/pattern/replacement/g' file
but I want to do it with a script.
Is there any other way to do that instead of system("perl -pi -e 's/pattern/replacement/g' file")?
-i takes advantage of the fact that you can still read from a filehandle after its file has been unlinked; you can see the equivalent code in perlrun. You can do the same thing yourself.
use strict;
use warnings;
use autodie;
sub rewrite_file {
my $file = shift;
# You can still read from $in after the unlink, the underlying
# data in $file will remain until the filehandle is closed.
# The unlink ensures $in and $out will point at different data.
open my $in, "<", $file;
unlink $file;
# This creates a new file with the same name but points at
# different data.
open my $out, ">", $file;
return ($in, $out);
}
# Take the file name from the command line.
my ($in, $out) = rewrite_file(shift @ARGV);
# Read from $in, write to $out as normal.
while(my $line = <$in>) {
$line =~ s/foo/bar/g;
print $out $line;
}
You can duplicate what Perl does with the -i switch easily enough.
{
local ($^I, @ARGV) = ("", 'file');
while (<>) { s/foo/bar/; print; }
}
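Wrapped in a sub, that $^I approach answers the original question without calling system. A minimal sketch; the replace_in_file name and the '.bak' backup extension are assumptions (use '' for no backup). Note that $replacement is inserted as plain text, so $1-style backreferences in it are not expanded.

```perl
use strict;
use warnings;

# Substitute $pattern with $replacement throughout $file, in place,
# using the same mechanism as perl -i: while $^I is defined, whatever
# is printed inside the <> loop goes to a new copy of the file being read.
sub replace_in_file {
    my ($file, $pattern, $replacement) = @_;
    local ($^I, @ARGV) = ('.bak', $file);    # '.bak' keeps file.bak; '' keeps no backup
    while (my $line = <>) {
        $line =~ s/$pattern/$replacement/g;
        print $line;    # goes to the rewritten $file, not STDOUT
    }
}

replace_in_file(@ARGV) if @ARGV == 3;    # e.g. script.pl file.txt foo bar
```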
You can try the simple method below and see whether it suits your requirements.
use strict;
use warnings;
# Get file, pattern and replacement from the command line
my ($file, $pattern, $replacement) = @ARGV;
# Read file (keep the newlines so they are written back unchanged)
open my $FH, "<", $file or die "Unable to open $file for read: $!";
my @lines = <$FH>;
close $FH;
# Replace text and write every line back to the same file,
# not only the lines that matched
open $FH, ">", $file or die "Unable to open $file for write: $!";
for (@lines){
s/$pattern/$replacement/g;
print {$FH} $_;
}
close $FH;
1;
file.txt:
Hi Java, This is Java Programming.
Execution:
D:\swadhi\perl>perl module.pl file.txt Java Source
file.txt afterwards:
Hi Source, This is Source Programming.
You can handle the use case in the question without recreating the -i flag's functionality or creating throwaway variables. Add the flag to the shebang line of a Perl script and read the input with <>:
#!/usr/bin/env perl -i
while (<>) {
s/pattern/replacement/g;
print;
}
Usage: save the script, make it executable (with chmod +x), and run
path/to/the/regex-script test.txt
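(or regex-script test.txt if the script is saved to a directory in your $PATH.) On Linux the kernel passes everything after /usr/bin/env to env as a single argument, so a #!/usr/bin/env perl -i line may fail there; #!/usr/bin/perl -i avoids the problem.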
Going beyond the question:
If you need to run multiple sequential replacements, that's
#!/usr/bin/env perl -i
while (<>) {
s/pattern/replacement/g;
s/pattern2/replacement2/g;
print;
}
As in the question's example, the source file will not be backed up. Exactly as with an -e one-liner, you can keep a backup by adding an extension to the -i flag; the original is then saved under its name plus that extension. For example,
#!/usr/bin/env perl -i.bak
You can use
sed 's/pattern/replacement/g' file > /tmp/file$$ && mv /tmp/file$$ file
Some sed versions support the -i option, so you won't need a temp file. The -i option creates the temp file and does the move for you, so it is basically the same solution.
Another solution (on Solaris/AIX) is to use a here document in combination with vi:
vi file 2>&1 >/dev/null <<@
1,$ s/pattern/replacement/g
:wq
@
I do not like the vi solution. When your pattern contains a / or another special character, it will be hard to debug what went wrong. When the replacement comes from a shell variable, you might want to check its contents first.
I have a file place.txt with content like this:
I want to go Rome. Will Go.
I want to go Rome. Will Not Go.
I want to go Rome. Will Not Go.
I want to go Rome. Will Go.
I want to go India. Will Not Go.
I want to go India. Will Not Go.
I want to go Rome. Will Go.
I want to read this file, match the lines against the pattern "I want to go Rome." and remove the matching lines from the file, in Perl.
My sample code is:
use IO::File;
my $file = IO::File->new;
$file->open("<jobs.txt") or die "Cannot open jobs.txt";
while(my $line = $file->getline){
next if $line =~ m{I want to go Rome};
print $line;
}
close $file;
Note: My file would be a big one. Can we use sed or awk?
It's as simple as
perl -i -ne 'print unless /I want to go Rome/' jobs.txt
If you prefer a script:
use strict;
use warnings;
use autodie;
use constant FILENAME => 'jobs.txt';
open my $in, '<', FILENAME;
while (<$in>) {
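if ( $. == 1 ) { # you need to do this after reading the first line
# Unlink first: $in keeps reading the original i-node, and the open
# below creates a new file instead of truncating the one being read.
# (Unix only; on Windows a file that is still open can't be unlinked.)
unlink FILENAME;
open my $out, '>', FILENAME;
select $out;
}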
print unless /I want to go Rome/;
}
grep -v 'I want to go Rome' jobs.txt
Is much simpler still to write.
awk '!/I want to go Rome/' jobs.txt
Try this:
use strict;
use warnings;
use File::Copy;
open my $file, "<", "jobs.txt" or die "Cannot open jobs.txt : $!";
open my $outtemp, ">", "out.temp" or die "Unable to open file : $!\n"; # create a temporary file to write the lines which doesn't match pattern.
while(my $line = <$file>)
{
next if $line =~ m/I want to go Rome/; # ignore lines which matches pattern
print $outtemp $line; # write the rest lines in temp file.
}
close $file;
close $outtemp;
move("out.temp", "jobs.txt"); # move temp file into original file.
I have written the code, but it is not working correctly. I want to change "/" to "\".
use strict;
use warnings;
open(DATA,"+<unix_url.txt") or die("could not open file!");
while(<DATA>){
s/\//\\/g;
s/\\/c:/;
print DATA $_;
}
close(DATA);
My original file is:
/etc/passwd
/home/bob/bookmarks.xml
/home/bob/vimrc
The expected output is:
C:\etc\passwd
C:\home\bob\bookmarks.xml
C:\home\bob\vimrc
The actual output is:
/etc/passwd
/home/bob/bookmarks.xml
/home/bob/vimrc/etc/passwd
\etc\passwd
kmarks.xml
kmarks.xml
mrcmrc
Trying to read and write the same file, line by line, in a while loop that is reading till the end of that same file, seems very dicey and unpredictable. I'm not at all sure where your file pointers are going to end up each time you try to write. You would be much safer sending your output to a new file (and then moving it to replace your old file afterwards if you wish).
open(DATA,"<unix_url.txt") or die("could not open file for reading!");
open(NEWDATA, ">win_url.txt") or die ("could not open file for writing!");
while(<DATA>){
s/\//\\/g;
s/\\/c:\\/;
# ^ (note - from your expected output you also wanted to preserve this backslash)
print NEWDATA $_;
}
close(DATA);
close(NEWDATA);
rename("win_url.txt", "unix_url.txt");
See also this answer:
Perl Read/Write File Handle - Unable to Overwrite
If the point of the exercise is less about using regular expressions, and more about getting things done, I would consider using modules from the File::Spec family:
use warnings;
use strict;
use File::Spec::Win32;
use File::Spec::Unix;
while (my $unixpath = <>) {
chomp $unixpath; # drop the newline so it doesn't end up in the file name
my @pieces = File::Spec::Unix->splitpath($unixpath);
my $winpath = File::Spec::Win32->catfile('c:', @pieces);
print "$winpath\n";
}
You don't really need to write a program to achieve this. You can use a Perl pie (perl -pi -e):
perl -pi -e 's|/|\\|g; s|\\|c:\\|;' unix_url.txt
However, if you are running on Windows and use Cygwin, I would suggest using the cygpath tool, which converts POSIX paths into Windows paths.
Also, you need to quote your paths, since Windows paths may contain spaces. Or you can escape the space character:
perl -pi -e 's|/|\\|g; s|\\|c:\\|; s| |\\ |g;' unix_url.txt
As for your original question, if you still want to use your own script, you can write to a second file ($file.bak here) and then move it over the original:
use strict;
use autodie;
use File::Copy;
my $file = "unix_url.txt";
open my $fh, "<", $file;
open my $tmp, ">", "$file.bak";
while (<$fh>) {
s/\//\\/g;
s/\\/c:\\/;
} continue { print $tmp $_ }
close $tmp;
close $fh;
move "$file.bak", $file;
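Whichever route you take, the two substitutions used in this answer can be checked against the sample paths without touching any file. A minimal sketch; the unix_to_win name is hypothetical:

```perl
use strict;
use warnings;

# The same two substitutions as above, applied to a single path.
sub unix_to_win {
    my ($path) = @_;
    $path =~ s|/|\\|g;      # every forward slash becomes a backslash
    $path =~ s|\\|c:\\|;    # the first (leading) backslash gets the drive prefix
    return $path;
}

print unix_to_win($_), "\n" for qw(/etc/passwd /home/bob/bookmarks.xml /home/bob/vimrc);
```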
I am new to Perl. I have written the following code, and I have been breaking my head for two days because I get the following error when I try to open the file: No such file or directory at line 23 (open (FILE, "$config_file") or die $!;)
What I am doing is:
Open the folder and list all the files inside it.
Iterate over each file to look for a particular string.
Create a new file for each of the files, with the matching string replaced by some other string.
I would really appreciate your help.
Following is my code:
#!/usr/bin/perl -w
#~ The perl script that changes the IP addresses in configuration files from 192.168.3.x into 192.168.31.x in any particular folder
use strict;
use warnings;
use diagnostics;
#~ Get list of files in the Firewall folder
my $directory = 'C:\Users\asura\Desktop\ConfigFiles\Firewall';
opendir (my $dir, $directory) or die $!;
my @list_of_files = readdir($dir);
my $file;
while ($file = readdir ($dir)) {
push @list_of_files, $file;
}
closedir $dir;
print "@list_of_files\n";
#~ Iterate over each files to replace some strings
foreach my $config_file (@list_of_files) {
next unless (($config_file !~ /^\.+$/));
open (FILE, "$config_file") or die $!;
my @original_array = <FILE>;
close(FILE);
my @new_array;
foreach my $line (@original_array) {
chomp($line);
$line =~ s/192\.168\.3/192\.168\.31/g;
push (@new_array, $line);
}
print @new_array;
#~ Create a new files with modified strings
my $new_config_file = $config_file.1;
my $newfile = 'C:\Users\asura\Desktop\ConfigFiles\Firewall\$new_config_file';
open (NEW_FILE, ">", "$newfile") or die $!;
foreach (@new_array){
print NEW_FILE "$_\n";
}
}
close(NEW_FILE);
}
exit 0;
When you push items onto @list_of_files, you are pushing only the filename (the value returned from readdir). Unless your script is running in C:\Users\asura\Desktop\ConfigFiles\Firewall, the open at line 22 using just the filename (a relative path) will fail.
You need to push absolute paths onto @list_of_files at line 14, like so:
push @list_of_files, $directory . "\\" . $file;
Also, as @Michael-sqlbot mentions, you need to double-quote the string at line 35 for string interpolation to be performed (or use concatenation).
Finally, you should also properly quote the string concatenation on line 34.
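Instead of gluing "\\" in by hand, File::Spec can build the paths. A minimal sketch with a hypothetical file name fw1.cfg; it calls File::Spec::Win32 directly so the result looks the same on any OS, whereas in the real script File::Spec->catfile would use the separator of the platform it runs on. The double quotes make "$config_file.1" interpolate the variable and append a literal .1.

```perl
use strict;
use warnings;
use File::Spec::Win32;

my $directory   = 'C:\Users\asura\Desktop\ConfigFiles\Firewall';
my $config_file = 'fw1.cfg';    # hypothetical name, as readdir would return it

# Join directory and name so open() no longer depends on the current directory.
my $in_path = File::Spec::Win32->catfile($directory, $config_file);

# "$config_file.1" interpolates the variable and appends ".1".
my $out_path = File::Spec::Win32->catfile($directory, "$config_file.1");

print "$in_path\n$out_path\n";
```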
The following is a simplification of your code that removes the bugs.
First off, kudos for including use strict and use warnings in EVERY script. One additional tool worth using whenever you're doing file processing is use autodie.
The primary flaw in your code was the fact that you weren't including the path information when opening your files. There are two main ways to solve this. You can manually specify the path, like you did for your open to your output file handle, or you can use glob instead of opendir as that will automatically include the path in the returned results.
There was a secondary bug in your regex: you were missing a word boundary after .3. Without it, addresses whose third octet is in the thirties (such as 192.168.35.x) would also have matched by mistake.
To simplify your code, I removed all of the superfluous temporary variables and instead process things file by file and line by line. This has the benefit of making it clearer which input and output filehandles belong together. Finally, if you actually want to edit the files in place, there are lots of methods demonstrated in perlfaq5.
#!/usr/bin/perl -w
#~ The perl script that changes the IP addresses in configuration files from 192.168.3.x into 192.168.31.x in any particular folder
use strict;
use warnings;
use autodie;
use diagnostics;
#~ Get list of files in the Firewall folder
my $dir = 'C:\Users\asura\Desktop\ConfigFiles\Firewall';
opendir my $dh, $dir;
#~ Iterate over each files to replace some strings
while (my $file = readdir($dh)) {
next if $file =~ /^\.+$/;
open my $infh, '<', "$dir\\$file";
open my $outfh, '>', "$dir\\${file}.1"; #~ Create a new files with modified strings
while (<$infh>) {
s/(?<=192\.168)\.3\b/.31/g;
print $outfh $_;
}
close $infh;
close $outfh;
}
closedir $dh;
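The glob alternative mentioned above could look like the following minimal sketch. The rewrite_dir name, the -f filter and skipping names that already end in .1 (so a second run doesn't produce .1.1 files) are assumptions, not part of the original answer. Because glob splits its pattern on whitespace, a directory name containing spaces would need extra quoting inside the pattern.

```perl
use strict;
use warnings;
use autodie;

# Rewrite 192.168.3.x to 192.168.31.x in every plain file in $dir, writing "<name>.1".
sub rewrite_dir {
    my ($dir) = @_;
    # glob returns names that already include $dir, so open() needs no path work.
    for my $file (grep { -f } glob "$dir/*") {
        next if $file =~ /\.1\z/;    # skip output from an earlier run (assumption)
        open my $infh,  '<', $file;
        open my $outfh, '>', "$file.1";
        while (<$infh>) {
            s/(?<=192\.168)\.3\b/.31/g;
            print $outfh $_;
        }
        close $infh;
        close $outfh;
    }
}

rewrite_dir($_) for @ARGV;
```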