Perl replace a string which is in a line above matching pattern - regex

If a string (e.g. "TODAY", "TOMORROW", "YESTERDAY") appears in any line of any *.java file in a directory (and its subdirectories), then replace a string in the line above it (e.g. "Raining", "Cloudy" or "Windy" becomes "Sunny"), and before replacing, print the matches to a CSV file (e.g. file1.java TODAY Raining, file2.java TOMORROW Cloudy).
My regex is not working as desired. Also, is there a better way to meet this requirement?
use strict;
use warnings;
use File::Find::Rule;
my @day = ("TODAY", "TOMORROW", "YESTERDAY");
my @weather = ("Raining", "Cloudy", "Windy");
my $dayregex = join "|", @day;
$dayregex = qr/\b($dayregex)\b/;
my $weatherregex = join "|", @weather;
$weatherregex = qr/\b($weatherregex)\b/;
my $output = 'output.csv';
#Getting list of files in dir and sub dirs
my @files = File::Find::Rule->file()
                            ->name( '*.java' )
                            ->in( 'C:/Users/path/to/folder/' );
for my $file (@files) {
    print "Opening file: $file\n";
    open(INPUT, $file) or die("Input file $file not found. \n");
    while (my $line = <INPUT>) {
        if ($line =~ m/$dayregex/) {
            print "There was a match on $1 from array day\n";
            #send the output to csv before replacing
            open(OUTPUT, '>'.$output) or die("Cannot create $output file. \n");
            print OUTPUT "$file $1 $weatherregex\n";
            close(OUTPUT);
            #Replace the matched weatherregex string with Sunny
            $line =~ s/$weatherregex(.*\n.*$1)/Sunny/g ;
        }
    }
    close(INPUT);
}

Try this:
weatherregex(.*\n)+
I think this will match the weatherregex text even when it spans multiple lines or is split across two lines.
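Note that the loop in the question reads one line at a time, so $line never contains a line break in the middle and a pattern such as `.*\n.*$1` cannot reach the line above. The modified $line is also never written back, and reopening output.csv with '>' for every match truncates it each time. One alternative is to keep the lines of a file and look at the line above explicitly. The sketch below assumes whole-word matches and rewrites each file in place; `process_lines` and `process_file` are hypothetical helper names, and the CSV rows are plain comma-joined text (a module such as Text::CSV would be safer if fields can contain commas).

```perl
use strict;
use warnings;

# Words from the question; quotemeta guards against regex metacharacters.
my @day     = qw(TODAY TOMORROW YESTERDAY);
my @weather = qw(Raining Cloudy Windy);
my $day_alt     = join '|', map { quotemeta($_) } @day;
my $weather_alt = join '|', map { quotemeta($_) } @weather;
my $day_re      = qr/\b($day_alt)\b/;
my $weather_re  = qr/\b($weather_alt)\b/;

# Hypothetical helper: for every line containing a day word, record
# "file,day,weather" for the weather word on the line ABOVE it, then
# replace that weather word with "Sunny". Returns (new lines, CSV rows).
sub process_lines {
    my ($file, @lines) = @_;
    my @csv_rows;
    for my $i (1 .. $#lines) {              # line 0 has no line above it
        next unless $lines[$i] =~ $day_re;
        my $day = $1;
        next unless $lines[$i - 1] =~ $weather_re;
        push @csv_rows, join(',', $file, $day, $1);   # log before replacing
        $lines[$i - 1] =~ s/$weather_re/Sunny/g;
    }
    return (\@lines, \@csv_rows);
}

# Hypothetical helper: rewrite one file in place and append its rows to an
# already-open CSV handle (open the CSV once, before looping over files).
sub process_file {
    my ($file, $csv_fh) = @_;
    open my $in, '<', $file or die "Cannot open $file: $!";
    my @lines = <$in>;
    close $in;
    my ($new_lines, $rows) = process_lines($file, @lines);
    return unless @$rows;                   # nothing matched: leave file alone
    print {$csv_fh} "$_\n" for @$rows;
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} @$new_lines;
    close $out;
}
```

With File::Find::Rule as in the question, a driver could be `open my $csv, '>', 'output.csv' or die "Cannot create output.csv: $!";` followed by `process_file($_, $csv) for File::Find::Rule->file->name('*.java')->in($dir);`, where $dir is your folder.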

Related

Perl apply partial match regex on a line in long text file using hash key

Input 1: I have a hash of chemical names. These names are short names and are the keys of the hash.
Input 2: I have a text book (I mean a very long text file) in which the short names above appear in full.
Task: Wherever a name appears in full in the text file, if the next line contains "", I have to replace that "" with the relevant hash value description, $hash{key}{description}.
Example: if the hash key is Y, it might appear in the text file as X.Y.Z, X.YZ, XYZ, XY2 or X_Y_Z02. It's unpredictable, but it appears somewhere in the middle or at the end.
That means the name in the text file is a partial match for the hash key.
What I tried: I loaded the whole file into an array, then looked for the lines where an empty "" appears. When one appears, I run a regex comparison of the previous line against each hash key. But this does not work :( and the process is also too slow. I have tried different techniques with help from experts, but none of them reduced the run time. Please help.
My program is as follows:
use strict;
use warnings;
my $file = "Chemicalbook.txt"; #In text file full name might appear as Dihydrogen.monoxide.hoax_C
my $previous_line = "";
my %hash;
$hash{'monoxide'}{description} = "accelerates corrosion";
open(my $FILE,'<',$file) or die "Cannot open input file";
open(my $output,'>',"outfile.txt") or die "Cannot open output file";
my @file_in_array = <$FILE>;
foreach my $line (@file_in_array) {
    my $name = $previous_line;
    if($line =~ /""/) {
        foreach my $shortname(keys %hash)
        {
            if($previous_line =~ /$shortname/) {
                $line =~ s/""/$hash{$shortname}{description}/;
            }
        }
    }
    $previous_line = $line;
    print {$output} $line ;
}
close($FILE);
close($output);
Looping over all keys for each line is hopelessly slow. Try replacing the entire inner foreach loop with this:
while ($previous_line =~ /(\w+)/g)
{
    if (my $s = $hash{$1})
    {
        $line =~ s/""/$$s{description}/;
        last;
    }
}
It will pick up short names as long as they are "standing alone" in the text, i.e. delimited by non-word characters such as dots; it will not find Y inside XYZ.
use strict;
use warnings;
my %hash;
my @arr=qw(X.Y.Z X.YZ XYZ XY2 ZZZ Chromium.trioxideChromic_02acid);
$hash{'Y'}='Hello';
$hash{'R'}='Hai';
$hash{'trioxide'}='Testing';
foreach my $line (@arr)
{
    if( my($key)= grep { $line =~ /\Q$_\E/ } keys(%hash)) {
        print "$line - $hash{$key} \n";
    }
    else {
        print "Unmatched $line\n";
    }
}
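The two answers above either miss partial matches (`\w+` only finds whole words) or still test every key on every line (the grep). One alternative, sketched here under the assumption that the keys are plain strings, is to join all keys into a single alternation and match each line once. Perl 5.10 and later optimize alternations of literal strings internally, so this is usually much faster than one regex per key, though it is worth benchmarking on the real file. `fill_descriptions` is a hypothetical name, and the sample data mirrors the question.

```perl
use strict;
use warnings;

# Data layout assumed from the question: $hash{shortname}{description}.
my %hash = (
    monoxide => { description => 'accelerates corrosion' },
    trioxide => { description => 'Testing' },
);

# One alternation of all keys, longest first, so a single regex match per
# line replaces the inner loop over keys. No \b anchors: partial matches
# such as "Chromium.trioxideChromic_02acid" are intended.
my $alt    = join '|', map { quotemeta($_) } sort { length($b) <=> length($a) } keys %hash;
my $key_re = qr/($alt)/;

# Replace "" on a line with the description of the key found on the line above.
sub fill_descriptions {
    my @lines = @_;
    for my $i (1 .. $#lines) {
        next unless $lines[$i] =~ /""/;
        next unless $lines[$i - 1] =~ $key_re;
        my $desc = $hash{$1}{description};
        $lines[$i] =~ s/""/$desc/;
    }
    return @lines;
}
```

Longest keys come first so that, if one key is a substring of another, the longer one wins. An empty %hash would produce a pattern that matches everything, so guard against that case if it can happen.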

Search for a pattern in a file

I have a file, eg.txt, with contents of this sort:
....text...
....text...
COMP1 = ../../path1/path2/path3
COMP2 = ../../path4/path5/path6
and so on, for a large number of application names (the "COMP"s). I need to get the path -- the stuff including and after the second slash -- for a user-specified application.
This is the code I've been trying :
use strict;
use warnings;
my $line = "";
my $app = "";
print "Enter the app";
$app = <STDIN>;
print $app;
open my $fh, '<', "eg.txt" or die "Cannot open $!";
while (<$fh>) {
$line = <$fh>;
if ( $line && $line =~ /($app)( = )(..\/)(..)(.*)/ ) {
print $5;
}
}
This prints the name of the user-input application, and does nothing else. Any help would be greatly appreciated!
There are two main problems with your program:
The $app variable contains a newline at the end from the Enter key you pressed when you typed it in. That will prevent the pattern from matching, so you need to use chomp to remove it. The same applies to lines read from your file.
The <$fh> in your while statement reads a line from your file into the default variable $_, and then $line = <$fh> reads another, so you are ignoring every other line of the file.
Here is a version of your program that I think should work, although I am unable to test it at present. I have dropped your $line variable altogether and hope that doesn't confuse you. $_ is the default variable for the pattern match, so it isn't mentioned explicitly anywhere.
use strict;
use warnings;
print "Enter the app: ";
my $app = <STDIN>;
chomp $app;
open my $fh, '<', 'eg.txt' or die "Cannot open: $!";
while ( <$fh> ) {
if ( /$app\s*=\s*(.+)/ ) {
my $path = $1;
$path =~ s/.*\.\.//;
print $path, "\n";
}
}
The input did not match the regex because the newlines came along with it, so use chomp to trim them. In the while loop you read from the file handle twice per iteration, which skips every other line. After those corrections this should work:
use strict;
use warnings;
my $line = "";
my $app = "";
print "Enter the app";
chomp($app = <STDIN>);
print "$app: ";
open my $fh, '<', "eg.txt" or die "Cannot open $!";
while($line = <$fh>)
{
chomp $line;
if($line && $line =~ /($app)( = )(..\/)(..)(.*)/)
{
print "$5 \n";
}
}
close($fh);
Try this code:
use strict;
use warnings;
my $line = "";
my $app = "";
print "Enter the app";
$app = <STDIN>;
print $app;
open my $fh, '<', "eg.txt" or die "Cannot open $!";
my @line = <$fh>;
my @fetch = map { /COMP\d+\s\=\s(\..\/\..\/.*)/g } @line ;
$, = "\n";
print #fetch;
Then let me know whether it works.
You are reading from <$fh> twice in your loop, so only every other line is processed. You might want to change the top of the loop to something like this:
while (defined(my $line = <$fh>)) {
and remove the my $line ... at the top of the program.
Also, you might want to consider chomping your input line so that you don't have to think about the trailing newline character:
while (defined(my $line = <$fh>)) {
chomp $line;
Your regular expression is also a bit dicey. You probably want to bind it to the beginning and end of the search space and escape the literal dots. You may also want $app to be interpreted as a string rather than a regexp, which can be done by wrapping it with \Q...\E. Also unless your file format specifies single spaces around the equals, I'd be tempted to make those flexible to zero or more occurrences. Also, if you aren't going to use the earlier captures, I would say don't do them, so:
if ($line && $line =~ /^\Q$app\E *= *\.\.\/\.\.(.*)$/)
{
print $1;
}
(Some may say you should use \A and \z rather than ^ and $. That choice is left as an exercise to the reader.)
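Putting the suggestions in this answer together (one read per iteration, chomp, \Q...\E, escaped dots and anchors), a complete version might look like the sketch below. `find_path` is a hypothetical helper name, and an in-memory filehandle stands in for eg.txt so the example runs on its own; with user input you would still need `chomp(my $app = <STDIN>);`.

```perl
use strict;
use warnings;

# Hypothetical helper: return the part after "../.." on the line for $app,
# or undef if no line matches.
sub find_path {
    my ($app, $fh) = @_;
    while (defined(my $line = <$fh>)) {   # exactly one read per iteration
        chomp $line;
        # \Q...\E treats $app as a literal string; the dots of "../.." are escaped
        return $1 if $line =~ /^\Q$app\E *= *\.\.\/\.\.(.*)$/;
    }
    return;
}

# An in-memory string stands in for eg.txt; with the real file use
# open my $fh, '<', 'eg.txt' or die "Cannot open eg.txt: $!";
my $data = "....text...\nCOMP1 = ../../path1/path2/path3\nCOMP2 = ../../path4/path5/path6\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $path = find_path('COMP2', $fh);
print "$path\n";   # prints /path4/path5/path6
```

Anchoring with ^ also keeps a request for COMP1 from matching a line such as COMP10 = ..., because the regex requires optional spaces and = right after the name.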

Search a pattern and replace the entire line of a perl module file

I have a Perl module file like this:
$release_name = 'Software Release';
$primary_version = '1';
$secondary_version = 'R00.0';
$Main_version = 'R00.0';
When I run the script, I want to find $Main_version and replace its line with
$Main_version = R00.1
I have tried the following, but it's not working:
#!/usr/bin/perl -w
use strict;
my $base;
my $file = "/main-dir/work/Myfile.pm";
open(FILE, $file) || die "File not found";
my @base = <FILE>;
close(FILE);
my $item = '$Main_version';
my $newitem="R00.1";
foreach $base (@base)
{
    if($base =~ /$item/){
        $base =~ s/$item/$item='$newitem'/gi;
        print ("Hello, world!\n");
    }
    #else { print $base;}
}
open (BASE, ">$file");
print BASE @base;
close (BASE);
How can I search for and replace an entire line in a Perl module? Thanks.
Had you used the standard $VERSION variable, you could have used the perl-reversion tool instead.
The dollar sign is special in regular expressions: it matches at the end of a line. Escape it with a backslash, or use quotemeta, which can be written as \Q inside a regex:
$base =~ /\Q$item/
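To replace the entire line rather than just the variable name, match the whole assignment and rewrite it. A minimal sketch, assuming each assignment sits on its own line; `replace_assignment` and `rewrite_file` are hypothetical names:

```perl
use strict;
use warnings;

# Replace the whole line that assigns to $item, where $item is a literal
# variable name such as '$Main_version' (\Q...\E escapes its "$").
sub replace_assignment {
    my ($item, $newvalue, @lines) = @_;
    for my $line (@lines) {
        # .*$ drops the old value; $ matches before the trailing newline,
        # so the newline itself is kept.
        $line =~ s/^\s*\Q$item\E\s*=.*$/$item = '$newvalue';/;
    }
    return @lines;
}

# Hypothetical driver: rewrite a module file in place.
sub rewrite_file {
    my ($file, $item, $newvalue) = @_;
    open my $in, '<', $file or die "Cannot open $file: $!";
    my @lines = replace_assignment($item, $newvalue, <$in>);
    close $in;
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} @lines;
    close $out or die "Cannot close $file: $!";
}
```

Called as `rewrite_file('/main-dir/work/Myfile.pm', '$Main_version', 'R00.1')`, this turns `$Main_version = 'R00.0';` into `$Main_version = 'R00.1';` and leaves the other lines untouched. Note the single quotes around '$Main_version' so that Perl does not interpolate it.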

Perl Regular Expression Question

I wrote a Perl program which reads text from text file and prints it out.
I want to print out only the lines that have a specific format.
For example, there are some lines like this:
information:
Ahmad.prn:592118:2001:7:5:/Essay
Ashford.rtf:903615:2001:6:28:/usr/Essay
Barger.doc:243200:2001:7:4:/home/dir
end of Information.
I want to read only these three lines:
Ahmad.prn:592118:2001:7:5:/Essay
Ashford.rtf:903615:2001:6:28:/usr/Essay
Barger.doc:243200:2001:7:4:/home/dir
I think that the meaning of the fields is:
Ahmad.prn <- file name
592118 <- size of file
2001:7:5 <- created date
/Essay <- path of file
My code is this:
#!/usr/bin/perl
use strict;
use warnings;
open (my $infh, "<", $file)||die "cant open";
while(my $line = <$infh>) {
chomp ($line);
if ($line =~ /(what regular expression do I have to put in here?)/) {
print "$line";
}
}
close ($infh);
If the lines you need always end with /Essay, you can use the following regex:
/:\/Essay$/
Edit 1: it looks like the middle parts are only numbers, so you can match them this way:
/:\d+:\d+:\d+:\d+:/
Since you have this format for Ahmad.prn:592118:2001:7:5:/Essay
Ahmad.prn <- file name
592118 <- size of file
2001:7:5 <- created date
/Essay <- path of file
you can use this regular expression
/^\s*(\S+):(\d+):(\d+:\d+:\d+):(\S+)\s*$/
With this you will have the file name in $1, the file size in $2, the creation date in $3 and the path to the file in $4.
I allowed optional whitespace at the start and end of the line; if you also want to allow optional spaces before or after each :, you can add \s* there.
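Applied to the sample input, the regex above picks out exactly the three data lines and skips the `information:` and `end of Information.` lines. A small sketch; `parse_records` is a hypothetical name, and the lines are hard-coded instead of read from $file:

```perl
use strict;
use warnings;

# Parse "name:size:year:month:day:path" lines with the regex above;
# returns one hashref per matching line and skips everything else.
sub parse_records {
    my @records;
    for my $line (@_) {
        next unless $line =~ /^\s*(\S+):(\d+):(\d+:\d+:\d+):(\S+)\s*$/;
        push @records, { name => $1, size => $2, date => $3, path => $4 };
    }
    return @records;
}

my @lines = (
    "information:",
    "Ahmad.prn:592118:2001:7:5:/Essay",
    "Ashford.rtf:903615:2001:6:28:/usr/Essay",
    "Barger.doc:243200:2001:7:4:/home/dir",
    "end of Information.",
);
for my $r (parse_records(@lines)) {
    print join("\t", @{$r}{qw(name size date path)}), "\n";
}
```

In the while loop from the question, the same regex can be used directly in the if condition, printing $line when it matches.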
#!/usr/bin/perl
use strict;
my $inputText = qq{
Ahmad.prn:592118:2001:7:5:/Essay
Ashford.rtf:903615:2001:6:28:/usr/Essay
Barger.doc:243200:2001:7:4:/home/dir
end of Information.
};
my @input = split /\n/, $inputText;
my $i = 0;
while ($input[$i] !~ /^end of Information\.$/) {
if ($input[$i] !~ /:/) {
$i++;
next;
}
my ($fileName, $fileSize, $year, $month, $day, $filePath) = split /:/, $input[$i];
print "$fileName\t $fileSize\t $month/$day/$year\t $filePath\n";
$i++;
}
$line =~ /([a-zA-Z.]+):(\d+):(\d+):(\d+):(\d+):([\/A-Za-z]+)/;
my $name = $1; #Ahmad.prn
my $id = $2; #592118
my $year = $3; #2001
my $dir = $6; #/Essay
Note: run this inside a loop over the input lines to handle multiple names.

How to match exactly two empty lines

I have a question about regular expressions. I have a file that I need to parse so that I can distinguish specific blocks of text in it. These blocks of text are separated by two empty lines (some blocks are separated by 3 or 1 empty lines, but I need exactly 2). Here is my code; I think the regular expression /^\s*$^\s*$/ should match, but it does not.
What is wrong?
$filename="yu";
open($in,$filename);
open(OUT,">>out.text");
while($str=<$in>)
{
unless($str = /^\s*$^\s*$/){
print "yes";
print OUT $str;
}
}
close($in);
close(OUT);
Cheers,
Yuliya
By default, Perl reads files one line at a time, so a single read never contains more than one newline. The following code instead reads text terminated by a double newline:
local $/ = "\n\n" ;
while (<> ) {
print "-- found $_" ;
}
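One thing to keep in mind: "\n\n" is a single empty line (the newline that ends a text line, followed by the empty line itself), so with this setting a record ends after any blank line, and a run of two empty lines leaves a stray "\n" at the start of the next record. The sketch below reads an in-memory string with a hypothetical `records` helper to make this visible:

```perl
use strict;
use warnings;

# Hypothetical helper: read every record from an in-memory string,
# using $separator as the input record separator $/.
sub records {
    my ($text, $separator) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory string: $!";
    local $/ = $separator;
    return <$fh>;    # all records when called in list context
}

# "a" and "b" are separated by two empty lines, "b" and "c" by one.
my $text = "a\n\n\nb\n\nc\n";
for my $record (records($text, "\n\n")) {
    (my $shown = $record) =~ s/\n/\\n/g;   # make newlines visible
    print "[$shown]\n";
}
# prints [a\n\n], then [\nb\n\n], then [c\n], one per line
```

So exactly two empty lines correspond to three consecutive newlines, which is what the split-based answers below look for.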
New Answer
After having problems excluding more than two empty lines, and a good night's sleep, here is a better method that doesn't even need to slurp the file.
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
my @blocks; #each element will be an arrayref, one per block
#that referenced array will hold lines in that block
open(my $fh, '<', $file) or die "Cannot open $file: $!";
my $empty = 0;
my $block_num = 0;
while (my $line = <$fh>) {
chomp($line);
if ($line =~ /^\s*$/) {
$empty++;
} elsif ($empty == 2) { #not blank and exactly 2 previous blanks
$block_num++; # move on to next block
$empty = 0;
} else {
$empty = 0;
}
push @{ $blocks[$block_num] }, $line;
}
#write out each block to a new file
my $file_num = 1;
foreach my $block (@blocks) {
open(my $out, '>', $file_num++ . ".txt") or die "Cannot write block file: $!";
print $out join("\n", @$block);
}
In fact, rather than storing the blocks and writing them later, you could simply write each block to its own file as you go:
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
open(my $fh, '<', $file);
my $empty = 0;
my $block_num = 1;
open(OUT, '>', $block_num . '.txt');
while (my $line = <$fh>) {
chomp($line);
if ($line =~ /^\s*$/) {
$empty++;
} elsif ($empty == 2) { #not blank and exactly 2 previous blanks
close(OUT); #just learned this line isn't necessary, perldoc -f close
open(OUT, '>', ++$block_num . '.txt');
$empty = 0;
} else {
$empty = 0;
}
print OUT "$line\n";
}
close(OUT);
use 5.012;
open my $fh,'<','1.txt';
#slurping file
local $/;
my $content = <$fh>;
close $fh;
for my $block ( split /(?<!\n)\n\n\n(?!\n)/,$content ) {
say 'found:';
say $block;
}
Deprecated in favor of new answer
justintime's answer works by telling Perl to treat "\n\n" as the end of a line (the input record separator $/), which is clever and will work well, with one caveat: the separator must match exactly. The regex you are using suggests that the "empty" lines might contain whitespace, in which case this will not work. His method will also split on more than two line breaks, which the OP does not want.
For completeness, to do it the way you were asking, you need to slurp the whole file into a variable (if the file is not so large as to use all your memory, probably fine in most cases).
I would then use the split function to split the text into an array of blocks. Your code would then look something like:
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
my $text;
open(my $fh, '<', $file) or die "Cannot open $file: $!";
{
local $/; # enables slurp mode inside this block
$text = <$fh>;
}
close($fh);
my @blocks = split(
/
(?<!\n)\n #check to make sure there isn't another \n behind this one
[ \t]*\n #first whitespace-only line
[ \t]*\n #second "
(?![ \t]*\n) #check to make sure another blank line doesn't follow
/x, # x flag allows comments and whitespace in regex
$text
);
You can then operate on the array. If I understand your comment on justintime's answer, you want to write each block out to a different file. That would look something like this:
my $file_num = 1;
foreach my $block (@blocks) {
open(my $out, '>', $file_num++ . ".txt");
print $out $block;
}
Notice that because $out is a lexical variable (declared with my), it goes out of scope when execution reaches the end of the foreach block. When that happens to a lexical filehandle, the file is automatically closed. You can do something similar with justintime's method as well:
local $/ = "\n\n" ;
my $file_num = 1;
while (<>) {
open(my $out, '>', $file_num++ . ".txt");
print $out $_;
}