I'm a total noob at Perl, trying to learn some scripting for a specific project. In short, I'm making a script (on OS X) that searches all XML files in a folder and censors specific numbers. I know a one-liner could have done it, but the number of files will be pretty huge (thousands of files) and this will happen regularly, so a script to do it would be nicer. And besides, there is the learning-to-script part :)
I've managed to open my files, make the regex work on every line of the original for my specific needs, and generate a writable temp file for my new information. This is where things stop working. I've tried to copy the new file over the old file after the loop, but I end up with a blank(!) file. I suspected there was an error with the temp file, but that looks perfect. I even tried, as a noob's way out, to reverse the process line by line from the temp back to the original file after changing the open mode (read) on them, but that ALSO gave an empty file.
And now my head is sort of empty. Any help would be appreciated :)
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy;
chdir "/perltest/test"; #debugsafety
#file
my $workingfiles = "*.XML";
my @files = glob("$workingfiles");
#process files
my $old;
my $tmpfile;
foreach my $file (@files) {
print "$file \n";
open ($old, "<", $file) or die "No file";
open ($tmpfile, ">", 'temp.tmp') or die;
while(my $line = <$old> ) {
my $subz = $line;
$subz =~ s/([[:upper:]]{2}[[:digit:]]{6})|([[:upper:]]{1}[[:digit:]]{7})|(?:(?<![[:digit:]])[[:digit:]]{8}(?![[:digit:]])|([[:upper:]]{2}[[:digit:]]{5}[AB]))/**CENS**/g;
print $subz;
print $tmpfile $subz;
}
print "Start copying.\n";
open (my $old, ">", $file) or die "No file";
open (my $tmpfile, "<", 'temp.tmp') or die;
#copy $tmpfile, $old or die "Couldn't copy";
my $y = 0; #debug
while (my $line = <$tmpfile> ) {
print $y++; #debug
my $subz = $line;
print $subz;
print $old $subz;
}
}
print "Complete.\n";
exit;
You re-open your file handles before closing them. I'm an Oracle DBA masquerading as a perl developer, so I can't give the why behind it. But I know if you close your file handles, your script should work as is.
close ($old); # add this line
close ($tmpfile); # add this line
print "Start copying.\n";
It would then be good practice to close them again when you are done "copying" back to them.
Explicitly close the filehandle when you're done writing to it. Things will still be buffered until you do that.
Also, it would make more sense to
rename($file, "$file.old");
rename("temp.tmp", $file);
rather than looping through the file (or using File::Copy::copy) to make a backup copy of it.
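Putting those two suggestions together, the end of the loop body could look something like this sketch (keeping the variable names from the script in the question):
close $old;
close $tmpfile; # flush anything still buffered to temp.tmp
rename $file, "$file.old" or die "Can't back up $file: $!";
rename 'temp.tmp', $file or die "Can't replace $file: $!";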
Lastly, for simple edits I would suggest making an effort to get comfortable with doing it on the command line, so you don't need to scratch your head and wonder "now what did I do in that script last time?". It can be a big timesaver in the long run.
perl -p -i.bak -e 's/pattern/text/;' files*
is the general form.
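For the case in the question that would look something like the following, with the substitution taken verbatim from the script above; -i.bak keeps a backup copy of each original file:
perl -p -i.bak -e 's/([[:upper:]]{2}[[:digit:]]{6})|([[:upper:]]{1}[[:digit:]]{7})|(?:(?<![[:digit:]])[[:digit:]]{8}(?![[:digit:]])|([[:upper:]]{2}[[:digit:]]{5}[AB]))/**CENS**/g;' *.XML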
My overall perl script is trying to operate as follows:
Read source file for content
Find regex match and use that match for Destination filename
Open new Destination filename
Find regex match for C format comments /* */ that contain a keyword, i.e. abcd
Write string matches found to destination filename
#!/usr/bin/perl
use warnings;
use strict;
my $src = 'D:\\Scripts\\sample.c';
my $fileName;
# open source file for reading
open(SRC,'<',$src) or die $!;
while(my $row = <SRC>){
if ($row =~ /([0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{2}|[0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{3})/){
$fileName = $1;
}
}
my $des = "D:\\Scripts\\" . $fileName . ".txt";
# open destination file for writing
open(DES,'>',$des) or die $!;
print("copying content from $src to $des\n");
while(my $row = <SRC>){
if ($row =~ /(\/\*.*abcd.[\s\S]*?\*\/)/){
print DES $1;
}
}
# always close the filehandles
close(SRC);
close(DES);
print "File content copied successfully!\n";
I am running this from the Windows 10 command line with Perl 5.32.1. My problem is that no content is written to the destination file: the file gets created but stays empty. When I change:
print DES $1; -> print "$1\n";
I get no content in the command line window either. When I move the entire second if statement so it is nested inside the first while loop, after the 1st if statement, I do get output on the command line. But I cannot keep the 2nd if statement there, because I want it to write to the destination file.
As the $src file is read the first time, the SRC filehandle reaches the end of the file. So when you try to read the file again, there is nothing left to read on that filehandle (and it won't tell you).
After the first read is done, reposition the filehandle to the beginning of the file:
seek SRC, 0, 0;
There are nice symbolic constants that can be used with seek, see Fcntl
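With those constants the line reads, for example:
use Fcntl qw(:seek); # the :seek tag exports SEEK_SET, SEEK_CUR and SEEK_END
seek SRC, 0, SEEK_SET; # rewind to the start of the file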
Another option is to close and open the file again. (Or even just re-open the same filehandle to it, in which case it gets closed first.)
Note: it is just better to use lexical filehandles than typeglobs, like
open my $src_fh, '<', $src_file or die $!;
See a comment on it in perldata, and search SO posts (here is one for example).
I'd change the problem slightly. Instead of reading the file twice, read it once. Write to a temp file where the name is not important. Along the way, discover the final filename. After you are done, rename the temp file.
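A rough sketch of that single-pass approach, with the regexes condensed from the question (note that, reading line by line, the comment match only catches comments that open and close on the same line, just like the original loop):
#!/usr/bin/perl
use warnings;
use strict;

my $src = 'D:\\Scripts\\sample.c';
my $tmp = 'D:\\Scripts\\dest.tmp';   # scratch name, renamed once the real name is known
my $fileName;

open my $in,  '<', $src or die $!;
open my $out, '>', $tmp or die $!;

while (my $row = <$in>) {
    # remember the match that will become the destination filename
    if ($row =~ /([0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z0-9]{2,3})/) {
        $fileName = $1;
    }
    # copy out C comments that mention the keyword
    if ($row =~ /(\/\*.*abcd.*?\*\/)/) {
        print $out "$1\n";
    }
}

close $in;
close $out;

die "no filename pattern found in $src\n" unless defined $fileName;
rename $tmp, "D:\\Scripts\\$fileName.txt" or die "rename failed: $!";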
I have an XML file containing data like:
<get>9090</get>
<setId>setIdHere</set>
<mainId>121</mainId>
I'm not using any external libs/packages, so I need to make the change with plain file I/O.
I need to replace the string setIdHere with a new value. Please find the Perl code below:
my $filename="file1.xml";
my $idVal=3232;
open(my $fh , '>>' ,$filename);
select $fh or die $!;
s/setIdHere/$idVal/;
print;
select STDOUT;
close($fh);
The above code appends the value at the end of the file, but I want it to replace the string setIdHere instead.
I'm new to perl not sure what's wrong with the above code.
Thanks in advance.
First off, your code is using some unusually outdated techniques. select $fh has a global effect and is best avoided.
In general, to edit a file you need to open it for reading, read it in, alter it, and write it back out again. To avoid pulling the whole file into memory (it could be very big), you generally do this line by line.
You can't write to the same file you're reading from (well, you can, but it makes a mess), so instead you write to a temp file and then, when you're done, rename it over the original.
# This forces you to declare all variables protecting against typos
use strict;
# This lets you know when you've done something you probably shouldn't.
use warnings;
# This will error if file operations failed, no more "or die $!"
use autodie;
my $file = "file1.xml";
my $tmp = $file.".new"; # file1.xml.new
open my $in, "<", $file; # open the XML file for reading
open my $out, ">", $tmp; # open a temp file for writing
# Read the file line by line
while(my $line = <$in>) {
# Change the line.
$line =~ s{this}{that}g;
# Write it to the temp file.
print $out $line;
}
# If you don't do this, it might not have finished writing.
close $out;
close $in;
# Overwrite the old file with the new one.
rename $tmp, $file;
HOWEVER you're editing XML. XML is structured and you should not try to read and edit it with regexes. You instead need to parse it with an XML library like XML::LibXML or XML::Twig.
You say you can't use any external library, but I bet you can, it's just a matter of figuring out how. You'll have a much easier time of it if you do. Generally the reason is that you don't have admin privileges. The simplest solution is to install perlbrew and install your own copy of Perl that you can manage. Perlbrew makes this easy.
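If it does turn out that you can install modules, a minimal sketch with XML::LibXML might look like this (it assumes the file is well-formed XML containing a <setId> element, which the sample in the question is not quite):
#!/usr/bin/env perl
use strict;
use warnings;
use XML::LibXML;

my $file  = 'file1.xml';
my $idVal = 3232;

# parse the whole document
my $doc = XML::LibXML->load_xml(location => $file);

# replace the text content of every <setId> element
for my $node ($doc->findnodes('//setId')) {
    $node->removeChildNodes();
    $node->appendText($idVal);
}

# write the modified document back out
$doc->toFile($file);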
Please, never ever use regular expressions to parse XML. XML is contextual, and regular expressions are not. Therefore it's only ever going to be a dirty hack.
I would recommend XML::Twig if you need to modify an XML file. It supports xpath, which is like regular expressions, but inherently handles the context problem.
XML::Twig also does 'parsefile_inplace' for in place editing of your file:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
sub modify_setId {
my ( $twig, $setId ) = @_;
$setId -> set_text('3232');
$twig -> flush;
}
my $twig = XML::Twig -> new ( twig_handlers => { 'setId' => \&modify_setId } );
$twig -> set_pretty_print('indented');
$twig -> parsefile_inplace('test.xml');
I have several files (*.txt) that I need to remove lines from. The files look like this:
This is a line to keep.
keep me too
START
some stuff to remove
other to remove
END
keep me!
And I want them to look like this:
This is a line to keep.
keep me too
keep me!
I've gotten this far:
perl -i -p0e 's/#START.*?END/ /s' file.txt
Which will remove the first instance of that from file.txt, but I can't figure out how to remove all instances from file.txt (and then how to apply this to all *.txt files?)
If what you show works for the first instance, all you should need to add is the /g flag to do all instances, and a shell glob to pick out all .txt files:
perl -i -p0e 's/#START.*?END/ /gs' *.txt
This seems like the right job for the flip-flop operator:
#!/usr/bin/env perl
use strict;
use warnings;
while( <DATA> ) {
print unless (/^START/ .. /^END/);
}
__DATA__
This is a line to keep.
keep me too
START
some stuff to remove
other to remove
END
keep me!
Output:
This is a line to keep.
keep me too
keep me!
It can also be written as a one-liner:
perl -n -e 'print unless (/^START/ .. /^END/);' input.txt > output.txt
Or, to edit the files in-place:
perl -n -i -e 'print unless (/^START/ .. /^END/);' *.txt
A bookkeeping thing to take care of here is opening and writing of individual files. The processing itself is handled by the range operator.
use warnings;
use strict;
my @files = @ARGV;
my ($fh_in, $fh_out);
foreach my $file (@files)
{
my $outfile = "new_$file";
open $fh_in, '<', $file or die "Can't open $file: $!";
open $fh_out, '>', $outfile or die "Can't open $outfile: $!";
print "Processing $file, writing to $outfile.\n";
while (<$fh_in>) {
print $fh_out $_ if not /^START$/ .. /^END$/;
}
}
This is invoked as script.pl file-list.
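For example, to process every .txt file in the current directory:
perl script.pl *.txt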
Since we use the same filehandle for reading (and the same one for writing), when a new file is opened the previous one is closed; see perlopentut and open. So we don't have to close them explicitly:
You don't have to close FILEHANDLE if you are immediately going to do another open on it, because open closes it for you. (See open.)
I name the new files as new_$file, just to provide a working sample. You could, for example, rename the old one to $file.orig and new one to $file instead, after the while loop. I'd use functions from the core File::Copy module for this. In this case we do need to close files explicitly first.
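For example, the end of each loop iteration could become something like this (move is exported by the core File::Copy module; $file and $outfile are the variables from the script above):
use File::Copy qw(move);   # near the top of the script

# ... after the while loop, still inside the foreach:
close $fh_in;
close $fh_out;
move($file, "$file.orig") or die "Can't back up $file: $!";
move($outfile, $file)     or die "Can't rename $outfile: $!";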
I have written the code below, but it is not working correctly. I wish to change this "/" to this "\".
use strict;
use warnings;
open(DATA,"+<unix_url.txt") or die("could not open file!");
while(<DATA>){
s/\//\\/g;
s/\\/c:/;
print DATA $_;
}
close(DATA);
my original file is
/etc/passwd
/home/bob/bookmarks.xml
/home/bob/vimrc
expected output is
C:\etc\passwd
C:\home\bob\bookmarks.xml
C:\home\bob\vimrc
original output is
/etc/passwd
/home/bob/bookmarks.xml
/home/bob/vimrc/etc/passwd
\etc\passwd
kmarks.xml
kmarks.xml
mrcmrc
Trying to read and write the same file, line by line, in a while loop that is reading till the end of that same file, seems very dicey and unpredictable. I'm not at all sure where your file pointers are going to end up each time you try to write. You would be much safer sending your output to a new file (and then moving it to replace your old file afterwards if you wish).
open(DATA,"<unix_url.txt") or die("could not open file for reading!");
open(NEWDATA, ">win_url.txt") or die ("could not open file for writing!");
while(<DATA>){
s/\//\\/g;
s/\\/c:\\/;
# ^ (note - from your expected output you also wanted to preserve this backslash)
print NEWDATA $_;
}
close(DATA);
close(NEWDATA);
rename("win_url.txt", "unix_url.txt");
See also this answer:
Perl Read/Write File Handle - Unable to Overwrite
If the point of the exercise is less about using regular expressions, and more about getting things done, I would consider using modules from the File::Spec family:
use warnings;
use strict;
use File::Spec::Win32;
use File::Spec::Unix;
while (my $unixpath = <>) {
chomp $unixpath; # strip the trailing newline before splitting the path
my @pieces = File::Spec::Unix->splitpath($unixpath);
my $winpath = File::Spec::Win32->catfile('c:', @pieces);
print "$winpath\n";
}
You don't really need to write a program to achieve this. You can use a "perl pie" one-liner:
perl -pi -e 's|/|\\|g; s|\\|c:\\|;' unix_url.txt
However if you are running on windows and you use Cygwin, I would suggest to use the cygpath tool that convert POSIX paths into Windows paths.
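For example (the exact Windows path it prints depends on where Cygwin is installed):
cygpath -w /home/bob/vimrc
# prints something like C:\cygwin64\home\bob\vimrc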
Also, you need to quote your paths, since Windows paths are allowed to contain spaces. Or you can escape the space character:
perl -pi -e 's|/|\\|g; s|\\|c:\\|; s| |\\ |g;' unix_url.txt
Now concerning your initial question, if you still want to use your own script you can use this (if you want a backup):
use strict;
use autodie;
use File::Copy;
my $file = "unix_url.txt";
open my $fh, "<", $file;
open my $tmp, ">", "$file.bak";
while (<$fh>) {
s/\//\\/g;
s/\\/c:\\/;
} continue { print $tmp $_ }
close $tmp;
close $fh;
move "$file.bak", $file;
I am new to Perl. I have written the following code and have been breaking my head for two days, because I am getting the following error when I try to open the file: No such file or directory at line 23 (open (FILE, "$config_file") or die $!;)
What I am doing is:
Open the folder and list all the files inside it.
Iterate over each file to look for particular strings.
Create new files for each of them, with the matching string replaced by some other string.
I would really appreciate your help.
Following is my code:
#!/usr/bin/perl -w
#~ The perl script that changes the IP addresses in configuration files from 192.168.3.x into 192.168.31.x in any particular folder
use strict;
use warnings;
use diagnostics;
#~ Get list of files in the Firewall folder
my $directory = 'C:\Users\asura\Desktop\ConfigFiles\Firewall';
opendir (my $dir, $directory) or die $!;
my @list_of_files = readdir($dir);
my $file;
while ($file = readdir ($dir)) {
push @list_of_files, $file;
}
closedir $dir;
print "@list_of_files\n";
#~ Iterate over each files to replace some strings
foreach my $config_file (@list_of_files) {
next unless (($config_file !~ /^\.+$/));
open (FILE, "$config_file") or die $!;
my @original_array = <FILE>;
close(FILE);
my @new_array;
foreach my $line (@original_array) {
chomp($line);
$line =~ s/192\.168\.3/192\.168\.31/g;
push (@new_array, $line);
}
print @new_array;
#~ Create a new files with modified strings
my $new_config_file = $config_file.1;
my $newfile = 'C:\Users\asura\Desktop\ConfigFiles\Firewall\$new_config_file';
open (NEW_FILE, ">", "$newfile") or die $!;
foreach (@new_array){
print NEW_FILE "$_\n";
}
close(NEW_FILE);
}
exit 0;
When you push items onto @list_of_files, you are pushing only the filename (the value returned from readdir). Unless your script is running in C:\Users\asura\Desktop\ConfigFiles\Firewall, the open at line 22 using just the filename (a relative path) will fail.
You need to push absolute paths onto @list_of_files at line 14, like so:
push @list_of_files, $directory . "\\" . $file;
Also, as @Michael-sqlbot mentions, you need to double-quote the string at line 35 for string interpolation to be performed (or use concatenation).
Finally, you should also properly quote the string concatenation on line 34.
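Concretely, lines 34 and 35 of the script could become something like this:
my $new_config_file = $config_file . '1'; # quote the literal so the concatenation is explicit
my $newfile = "C:\\Users\\asura\\Desktop\\ConfigFiles\\Firewall\\$new_config_file"; # double quotes so the variable interpolates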
The following is a simplification of your code that removes the bugs.
First off, kudos for including use strict and use warnings in EVERY script. Another tool that is worth adding any time you're doing file processing is the autodie pragma (use autodie;).
The primary flaw in your code was the fact that you weren't including the path information when opening your files. There are two main ways to solve this. You can manually specify the path, like you did for your open to your output file handle, or you can use glob instead of opendir as that will automatically include the path in the returned results.
There was a secondary bug in your regex: it was missing a word boundary after the .3, which would have caused third octets in the thirties (192.168.30 through 192.168.39) to match mistakenly.
To simplify your code I removed all of the superfluous temporary variables and instead process things file by file and line by line. This has the benefit of making it clearer that the input and output filehandles are related. Finally, if you actually want to edit the files in place, there are lots of methods demonstrated in perlfaq4.
#!/usr/bin/perl -w
#~ The perl script that changes the IP addresses in configuration files from 192.168.3.x into 192.168.31.x in any particular folder
use strict;
use warnings;
use autodie;
use diagnostics;
#~ Get list of files in the Firewall folder
my $dir = 'C:\Users\asura\Desktop\ConfigFiles\Firewall';
opendir my $dh, $dir;
#~ Iterate over each files to replace some strings
while (my $file = readdir($dh)) {
next if $file =~ /^\.+$/;
open my $infh, '<', "$dir\\$file";
open my $outfh, '>', "$dir\\${file}.1"; #~ Create a new files with modified strings
while (<$infh>) {
s/(?<=192\.168)\.3\b/.31/g;
print $outfh $_;
}
close $infh;
close $outfh;
}
closedir $dh;