Perl in-place substitution - regex

In a Perl one-liner, we can use the -i switch to do an in-place substitution. What is the equivalent of -i when writing a Perl script in an IDE?
Consider the following code:
binmode(STDOUT, ':raw');
open my $fh, '<', $filename;
while (<$fh>) {
s/^/<rootroot>/ if $.==1;
if (/(<link rel[^<>\n]*?)(\/?)(>)/g) {
my ($p1, $p2, $p3) = ($1, $2, $3);
s/$p1$p2$p3/($p2 ? qq|$p1$p2$p3<span class="entry">| : qq|$p1\/$p3<span class="entry">|)/ge;
};
s/<\/>/<entry_end><\/entry_end>/;
s/$/<\/rootroot>/ if eof;
}
How can we save all the changed lines in place?
I need to run a quick validation of the HTML file with XML::LibXML right after the in-place change to the HTML source.
Thanks in advance.

You can try something like this:
my $filename = 'test.dat';
@ARGV = ($filename);
$^I = '';
while(<<>>) {
binmode(ARGV, ':raw');
# Do the substitution on $_ here ...
print;
}
I did not find a way to set binmode before the loop, since ARGV is only defined after the <> operator has been used.
The $^I and ARGV variables are described in perlvar.
See perlop for information about why you should use <<>> instead of <>.
Some notes:
The while (<>) { ... } loop:
According to perlop, the loop
while (<>) {
... # code for each line
}
is equivalent to the following Perl-like pseudo code:
unshift(@ARGV, '-') unless @ARGV;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}
Using in-place edit without a backup file: $^I="":
According to perlrun:
If no extension is supplied, and your system supports it, the original
file is kept open without a name while the output is redirected to a
new file with the original filename. When perl exits, cleanly or not,
the original file is unlinked.
and some more information in this blog:
Perl opens and immediately unlink()s the original file, then opens a
new file with the same name (new file descriptor and inode), and sends
output to this second file; at the end, the old file is closed and
thus deleted because it was unlinked, and what's left is a changed
file with the same name as the original.
See also doio.c for actual implementation.
According to the above, the following might work:
my $fn = 'test.dat';
open ( my $fh, '<:raw', $fn ) or die "Could not open file '$fn': $!";
unlink $fn or die "Could not unlink file '$fn': $!";
open ( my $fh2, '>:raw', $fn ) or die "Could not reopen file '$fn': $!";
while(<$fh>) {
# Do the substitutions on $_ here ...
print $fh2 $_;
}
close $fh;
close $fh2;
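To make the $^I approach concrete, here is a minimal sketch that can be run on its own. The temporary file and its two lines of content are made up for illustration, and only two of the substitutions from the question are shown. It assumes a system where in-place editing without a backup extension is supported; otherwise set $^I to something like '.bak'.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file so the sketch is self-contained (hypothetical content).
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
print {$tmp_fh} "first\nsecond\n";
close $tmp_fh or die "Cannot close '$filename': $!";

{
    # Localize both variables so in-place mode ends when this block is left.
    local $^I   = '';            # '' means no backup; '.bak' would keep one
    local @ARGV = ($filename);
    while (my $line = <>) {
        $line =~ s/^/<rootroot>/ if $. == 1;
        $line =~ s/$/<\/rootroot>/ if eof;
        print $line;             # goes to the rewritten file, not the terminal
    }
}

# The file on disk now holds the edited text and can be read or validated.
open my $in, '<', $filename or die "Cannot open '$filename': $!";
my $result = do { local $/; <$in> };
close $in;
print $result;
```

Because the loop runs to the end of the file before the block is left, the edited file is complete on disk by the time the following code opens it, which is the point where a validation step could go.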

Related

Perl: Regex not grabbing multiline C style comments in code

I have a Perl program that:
Reads a SRC file written in C
Uses a regex match from SRC file to find specific formatted data to use as the Destination filename
Opens new Destination file
Performs another regex match to find all C style comments /* */ that contain a keyword abcd. Note: these comments can be 1 line or more than 1 line so the regex is looking for the first /* and then the keyword abcd and then any amount of text and space before it encounters a closing */
Writes the regex matches to the destination file
#!/usr/bin/perl
use warnings;
use strict;
my $src = 'D:\\Scripts\\sample.c';
my $fileName;
# open source file for reading
open(SRC_FH,'<',$src) or die $!;
while(my $row = <SRC_FH>){
if ($row =~ /([0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{2}|[0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{3})/){
$fileName = $1;
}
}
my $des = "D:\\Scripts\\" . $fileName . ".txt";
# open destination file for writing
open(DES_FH,'>',$des) or die $!;
print("copying content from $src to $des\n");
seek SRC_FH, 0, 0;
while(my $row = <SRC_FH>){
if ($row =~ /(\/\*.*abcd.[\s\S]*?\*\/)/){
print DES_FH "$1\n";
}
}
# always close the filehandles
close(SRC_FH);
close(DES_FH);
print "File content copied successfully!\n";
My problem is that, I think because of the way the Perl code executes, my destination file only gets the one-line comments written to it, even though my regex is correct. C-style comments that span more than one line are not written to the destination file. What am I missing in my 2nd if statement?
I checked the regex from my 2nd if statement at https://regexr.com/ and it works as it is supposed to, capturing multiline C-style comments as well as single-line comments that contain the keyword abcd.
So I tried the 1st suggestion below by zdim. Here is what I used:
#!/usr/bin/perl
use warnings;
use strict;
my $src = 'D:\\Scripts\\sample.c';
my $fileName;
my @comments;
# open source file for reading
open(SRC_FH,'<',$src) or die $!;
while(my $row = <SRC_FH>){
if ($row =~ /([0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{2}|[0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{3})/){
$fileName = $1;
}
}
my $des = "D:\\Scripts\\" . $fileName . ".txt";
# open destination file for writing
open(DES_FH,'>',$des) or die $!;
print("copying content from $src to $des\n");
#seek SRC_FH, 0, 0;
my $content = do {
#read whole file at once
local $/;
open (SRC_FH,'<', $src) or die $!;
<SRC_FH>;
};
#if($content =~ /(\/\*.*abcd.[\s\S]*?\*\/)/sg){
# my @comments = $content;
# }
my @comments = $content =~ /(\/\*.*abcd.[\s\S]*?\*\/)/sg;
foreach (@comments){
print DES_FH "$1\n";
}
#while(my $row = <SRC_FH>){
# if ($row =~ /(\/\*.*abcd.[\s\S]*?\*\/)/){
# print DES_FH "$1\n";
# }
#}
# always close the filehandles
close(SRC_FH);
close(DES_FH);
print "File content copied successfully!\n";
The result is that all the content of sample.c is copied to the destination file, a full 1:1 copy, whereas I want to pull out only the comments, single-line and multiline, from the C file.
Example 1:
/* abcd */
Example 2:
/* some text
* some more comments
abcd and some more comments */
Final Solution
#!/usr/bin/perl
use warnings;
use strict;
my $src = 'D:\\Scripts\\sample.c';
my $fileName;
# open source file for reading
open(SRC_FH,'<',$src) or die $!;
while(my $row = <SRC_FH>){
if ($row =~ /([0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{2}|[0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{3})/){
$fileName = $1;
}
}
my $des = "D:\\Scripts\\" . $fileName . ".txt";
# open destination file for writing
open(DES_FH,'>',$des) or die $!;
print("copying content from $src to $des\n");
seek SRC_FH, 0, 0;
my $content = do{local $/; <SRC_FH>};
my @comments = $content =~ /(\/\*.*abcd.[\s\S]*?\*\/)/g;
for(@comments){
print DES_FH "$_\n";
}
# always close the filehandles
close(SRC_FH);
close(DES_FH);
print "File content copied successfully!\n";
What am I missing in my 2nd if statement?
Well, nothing. It's just that in a multiline C comment, no single line has both /* and */. Thus that regex cannot match a multiline comment when the file is read line by line.
To catch such comments either:
Read the whole file into a string ("slurp" it), and then add the /s modifier to the regex so that . matches a newline as well. Also use the /g modifier to capture all such patterns in the string. One way:
my $content = do {
local $/; # undef record separator so the whole file is read at once
open my $src_fh, '<', $src_file or die $!; # have to re-open
<$src_fh>; # reads it all
}; # lexical filehandle gets closed as we leave scope
# NOTE -- there may be difficulties in capturing comments in a C source file
my @comments = $content =~ /.../sg; # your regex
Or use a library to slurp a file, like
use Path::Tiny;
my $content = path($src_file)->slurp;
Or,
Set a flag when you see /*, get/print all lines until you hit the closing */, then unset the flag. Here is a rudimentary version of that
my $inside_comment = 0;
while (<$src_fh>) {
if (m{(/\*.*)}) { #/ fix syntax hilite
$inside_comment = 1; # opening line for the comment
say $des_fh $1;
}
elsif (m{(.*\*/)}) { # closing line for the comment
say $des_fh $1;
$inside_comment = 0;
}
elsif ($inside_comment) { say $des_fh $_}
}
I tested all this but please check and improve. For one, this plays funny with leading spaces.
Note: Getting all comments out of a C program in general may be rather tricky.
Here is a one-line version of slurping
my $file_content = do { local (@ARGV, $/) = $file_name; <> };
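As a rough illustration of the slurp-and-match approach, the sketch below works on a made-up string standing in for the file content. Note that it swaps in a different pattern from the one in the question: it matches every comment non-greedily and then filters by keyword, since a greedy .* combined with /s can run from the first /* to the last */ in the file. It is only a sketch and does not handle comment markers inside string literals.

```perl
use strict;
use warnings;

# Stand-in for the slurped file content (hypothetical sample).
my $content = <<'END_C';
/* abcd */
int x = 1; /* unrelated note */
/* some text
 * some more comments
abcd and some more comments */
END_C

# Match each comment non-greedily; /s lets . cross newlines, /g finds them all.
my @all_comments = $content =~ m{(/\*.*?\*/)}sg;

# Keep only the comments that mention the keyword.
my @comments = grep { /abcd/ } @all_comments;

print "$_\n" for @comments;
```

Splitting the work into "find all comments" and "filter by keyword" keeps each regex simple, at the cost of collecting comments that are then thrown away.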

Using perl to process list of name and details

There are two files. One file is a list of names. The other file is a list of names and details. I want to create a third file which contains the names from the first file and the details (for each name) from the second file. Can you please suggest how?
The details in the second file are delimited by the pattern "list[i]" (a name from the first file) and "</reg>".
#!/usr/intel/bin/perl
use warnings;
use strict;
use Data::Dumper;
my $handle;
unless (open $handle, "<:encoding(utf8)", "/nfs/fm/disks/fm_nvm_7138/WLRD_LOGIC_users/cgoudarx/willard_b02/chiplevel/verif/testsuites/upf/pss_ret_regs.txt") {
print STDERR "Could not open file '/nfs/fm/disks/fm_nvm_7138/WLRD_LOGIC_users/cgoudarx/willard_b02/chiplevel/verif/testsuites/upf/pss_ret_regs.txt': $!\n";
# we return 'undefined', we could also 'die' or 'croak'
return undef
}
chomp(my @list = <$handle>);
unless (close $handle) {
# what does it mean if close yields an error and you are just reading?
print STDERR "Don't care error while closing '/nfs/fm/disks/fm_nvm_7138/WLRD_LOGIC_users/cgoudarx/willard_b02/chiplevel/verif/testsuites/upf/pss_ret_regs.txt': $!\n";
}
open ( INPUT, "/nfs/fm/disks/fm_nvm_7138/WLRD_LOGIC_users/cgoudarx/willard_b02/chiplevel/verif/testsuites/upf/tet.xml" ) or die("Could not open xml file.");
my $outffile ="newlist.xml";
open(FILEOUT2, ">$outffile") || die "ERROR: Can't open the output file $outffile: $!";
my $size = @list;
for (my $i=0; $i < $size; $i++) {
while( my $line = <INPUT> )
{
if ( $line =~ m/$list[$i]/) {
print FILEOUT2 $line;
while( $line = <INPUT>) # print till empty line
{
last if ( $line =~ m/<\/reg>/);
print FILEOUT2 $line;
}
print FILEOUT2 $line;
};
};
};
close(INPUT);
One of your input files is an XML document. You shouldn't parse XML documents with regular expressions. It is a far better idea to use a proper XML parser (I'd recommend XML::LibXML).
If you insist on parsing XML using regexes, then you cannot process your input file a line at a time, as XML elements will often (usually?) span multiple lines.
Also, please update your file-handling code to use the three-arg version of open() and lexical filehandles.
open ( my $in_fh, '<', "...") or die("Could not open xml file.");
And
open( my $out_fh, '>', $outffile) || die "ERROR: Can't open the output file $outffile: $!";
Oh, and it's a good idea to standardise on using or or || in those commands.

How to handle files that do not contain a pattern?

I need help with my Perl program. The idea is to pass in a pattern and a file list from the command line. If the file name matches the pattern, print the file name. Then if the file name doesn't match, it should look for instances of the pattern in the text of the file and print filename : first line of text that contained occurrence.
However should the user add the -i option at the beginning the opposite should occur. If the filename does not match print it. Then print any files that do not contain any instances of the pattern in their text.
This last part is where I'm struggling. I'm not exactly sure how to get the files that don't have the pattern in their text. For example, in my code:
#!/usr/bin/perl -w
die("\n Usage: find.pl [-i] <perlRegexPattern> <listOfFiles>\n\n") if(@ARGV<2);
my (@array,$pattern,@filesmatch,@files);
#I can separate files based on name match
($pattern,@array) = ($ARGV[0] eq "-i") ? (@ARGV[1 .. $#ARGV]) : (@ARGV);
foreach(@array){
($_ =~ m/.*\/?$pattern/) ? (push @filesmatch,$_) : (push @files, $_);
}
#and I can get files that contain a pattern match in their text
if($ARGV[0] ne "-i"){
for my $matches(@filesmatch){ #remove path print just file name
$matches =~s/.*\///; #/
print "$matches\n";
}
for my $file(@files){
open(FILE,'<',$file) or die("\nCould not open file $file\n\n");
while(my $line = <FILE>){
if($line =~ m/$pattern/){
$file =~ s/.*\///; #/ remove path print just file name
print "$file: $line";
next;
}
}
}
}
#however I'm not sure how to say this file doesn't have any matches so print it
else{
for my $matches(@files){ #remove path print just file name
$matches =~ s/.*\///;
print "$matches\n";
}
for my $file(@filesmatch){
open(FILE,'<',$file) or die("\nCould not open file $file\n\n");
while(my $line = <FILE>){...
I'm not sure if something like grep could be used to do this but I'm having a hard time working with Perl's grep.
In order to decide whether to print or not a file based on its content you have to first read the file. With your criterion -- that a phrase does not exist -- you have to check the whole file.
A standard way is to use a separate variable ("flag") to record the condition then go back to print
my $has_match;
while (<$fh>) {
if (/$pattern/) {
$has_match = 1;
last;
}
}
if (not $has_match) {
seek $fh, 0, 0; # rewind to the beginning
print while <$fh>;
}
This can be simplified by reading the file into a variable first, and by using labels (also see perlsyn)
FILE: foreach my $file (@filesmatch) {
open my $fh, '<', $file or die "Can't open $file: $!";
my @lines = <$fh>;
for (@lines) {
next FILE if /$pattern/;
}
print for @lines;
}
Note that skipping an iteration in the middle of a loop isn't the cleanest way since one has to always keep in mind that the rest of the loop may not run.
Each file is read first so that we don't read it twice, but don't do that if any of the files can be huge.
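Since the question mentions grep, here is one way the flag idea could be packaged so that grep does the selection. This is only a sketch: file_has_match is a hypothetical helper name, and the two temporary files and the pattern 'beta' are made up so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: true if any line of the file matches the pattern.
sub file_has_match {
    my ($file, $pattern) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!";
    while (my $line = <$fh>) {
        return 1 if $line =~ /$pattern/;   # stop reading at the first hit
    }
    return 0;
}

# Two throwaway files to demonstrate with (made-up content).
my @files;
for my $text ("alpha\nbeta\n", "gamma\ndelta\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh or die "Can't close $name: $!";
    push @files, $name;
}

# Files whose text contains no match for the pattern.
my @without = grep { !file_has_match($_, 'beta') } @files;
print "$_\n" for @without;
```

Reading line by line and returning at the first match avoids loading a whole file into memory, which matters if any of the files can be huge.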
If there is any command line processing it is better to use a module; Getopt::Long is nice.
use feature 'say';
use Getopt::Long;
my ($inverse, $pattern);
GetOptions('inverse|i' => \$inverse, 'pattern=s' => \$pattern)
or usage(), exit;
usage(), exit if not $pattern or not @ARGV;
sub usage { say STDERR "Usage: $0 ... " }
Call the program as progname [-i] --pattern PATTERN files. The module provides a lot; please see the docs. For example, in this case you can also just use -p PATTERN.
As GetOptions parses the command line, the submitted options are removed from @ARGV, and what remains in it are the file names. And you have the $inverse variable to nicely make decisions.
Please have use warnings; (not -w) and use strict; at the top of every program.

help with perl code to parse a file

I am new to Perl and have a question about the syntax. I received this code for parsing a file containing specific information. I was wondering what the if (/DID/) part of the subroutine get_number is doing? Is this leveraging regular expressions? I'm not quite sure because regular-expression matches look like $_ =~ /some expression/. Finally, is the while loop in the get_number subroutine necessary?
#!/usr/bin/env perl
use Scalar::Util qw/ looks_like_number /;
use WWW::Mechanize;
# store the name of all the OCR file names in an array
my @file_list=qw{
blah.txt
};
# set the scalar index to zero
my $file_index=0;
# open the file titled 'outputfile.txt' and write to it
# (or indicate that the file can't be opened)
open(OUT_FILE, '>', 'outputfile.txt')
or die "Can't open output file\n";
while($file_index < 1){
# open the OCR file and store it in the filehandle IN_FILE
open(IN_FILE, '<', "$file_list[$file_index]")
or die "Can't read source file!\n";
print "Processing file $file_list[$file_index]\n";
while(<IN_FILE>){
my $citing_pat=get_number();
get_country($citing_pat);
}
$file_index=$file_index+1;
}
close IN_FILE;
close OUT_FILE;
The definition of get_number is below.
sub get_number {
while(<IN_FILE>){
if(/DID/){
my @fields=split / /;
chomp($fields[3]);
if($fields[3] !~ /\D/){
return $fields[3];
}
}
}
}
Perl has a variable $_ that is sort of the default dumping ground for a lot of things.
In get_number, while(<IN_FILE>){ is reading a line into $_, and the next line is checking if $_ matches the regular expression DID.
It's also common to see chomp; which also operates on $_ when no argument is given.
In that case, if (/DID/) by default searches the $_ variable, so it is correct. However, it is a rather loose regex, IMO.
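A small sketch may make the implicit $_ clearer. The two input strings are invented for illustration; the point is only that the short and long forms of the match do the same thing.

```perl
use strict;
use warnings;

my @matched;
for ("DID 1 2 345\n", "other line\n") {
    # These two tests are equivalent; the first uses $_ implicitly.
    my $implicit = /DID/       ? 1 : 0;
    my $explicit = $_ =~ /DID/ ? 1 : 0;
    die "forms disagree" unless $implicit == $explicit;

    next unless $implicit;
    my @fields = split ' ';     # same as split ' ', $_
    push @matched, $fields[3];  # fourth whitespace-separated field
}
print "@matched\n";
```

Here split ' ' is used instead of split / / so that the trailing newline is dropped along with the whitespace, which removes the need for chomp in this sketch.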
The while loop in the sub may be necessary, it depends on what your input looks like. You should be aware that the two while loops will cause some lines to get completely skipped.
The while loop in the main program will take one line, and do nothing with it. Basically, this means that the first line in the file, and every line directly following a matching line (e.g. a line that contains "DID" and the 4th field is a number), will also be discarded.
In order to answer that question properly, we'd need to see the input file.
There are a number of issues with this code, and if it works as intended, it's probably due to a healthy amount of luck.
Below is a cleaned up version of the code. I kept the modules in, since I do not know if they are used elsewhere. I also kept the output file, since it might be used somewhere you have not shown. This code will not attempt to use undefined values for get_country, and will simply do nothing if it does not find a suitable number.
use warnings;
use strict;
use Scalar::Util qw/ looks_like_number /;
use WWW::Mechanize;
my @file_list=qw{ blah.txt };
open(my $out_file, '>', 'outputfile.txt') or die "Can't open output file: $!";
for my $file (@file_list) {
open(my $in_file, '<', $file) or die "Can't read source file: $!";
print "Processing file $file\n";
while (my $citing_pat = get_number($in_file)) {
get_country($citing_pat);
}
}
close $out_file;
sub get_number {
my $fh = shift;
while(<$fh>) {
if (/DID/) {
my $field = (split)[3];
if($field =~ /^\d+$/){
return $field;
}
}
}
return undef;
}

Why is this substitution involving the end of each line adding more to the beginning of the line?

I have some document into which I want to add something at the beginning and at the end of each line. The original document looks like this:
firstLine
secondline
I want to turn it into this:
put 'firstLine';
put 'secondline';
By using the following Perl script, I can only turn it into this:
put 'firstLine';
';put 'secondline';
It seems that there is a $ at the end of the first line and at the beginning of the second line. Could someone help me to figure out what is wrong with the following Perl script?
use File::Find;
use strict;
my ($filename, @lines, $oldterm, $newterm); #,$File::Find::name);
my $dir = ".";
open MYFILE, ">error.txt" or die $!;
find(\&edits, $dir);
sub edits() {
$filename = $File::Find::name;
if (grep(/\.txt$/, $filename)) { #only process the perl files
# open the file and read data
# die with grace if it fails
open(FILE, "<$filename") or die "Can't open $filename: $!\n";
@lines = <FILE>;
close FILE;
# open same file for writing, reusing STDOUT
open(STDOUT, ">$filename") or die "Can't open $filename: $!\n";
# walk through lines, putting into $_, and substitute 2nd away
for (@lines) {
s/(&.+)/' "$1" '/ig;
s/^/put '/ig;
s/$/';/ig;
print;
}
#Finish up
close STDOUT;
}
}
Don't use regular expressions at all: you already have the lines separated in the @lines array:
for ( @lines ) {
chomp; # remove newline at the end of the implicit variable $_
print "put '$_';\n";
}
If you do it in one step you should have better luck. Something like:
s/^(&.+)$/put '$1';/im;
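As for why the extra '; shows up at the start of the next line: each element of the array still ends in a newline, and $ can match both just before that trailing newline and at the very end of the string. With the /g modifier, s/$/';/g therefore substitutes twice. A minimal sketch with a made-up one-line string:

```perl
use strict;
use warnings;

my $with_g = "firstLine\n";
$with_g =~ s/$/';/g;          # matches before the newline AND at end of string

my $without_g = "firstLine\n";
$without_g =~ s/$/';/;        # only the first match, before the newline

# Removing the newline first sidesteps the question entirely.
my $chomped = "firstLine\n";
chomp $chomped;
my $wanted = "put '$chomped';\n";

print $with_g, "\n---\n", $without_g, "---\n", $wanted;
```

So dropping /g from that substitution, or chomping the line and adding the newline back when printing, should both avoid the stray '; on the following line.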