Perl search and replace enters endless loop - regex

I am trying to match and replace in multiple files some string using
local $/;
open(FILE, "<error.c");
$document=<FILE>;
close(FILE);
$found=0;
while($document=~s/([a-z_]+)\.h/$1_new\.h/gs){
$found=$found+1;
};
open(FILE, ">error.c");
print FILE "$document";
close(FILE);'
It enters an endless loop, because the result of the substitution is matched again by the regular expression searched for. But shouldn't this be avoided by the s///g construct?
EDIT:
I found that also a foreach loop will not do exactly what I want (it will replace all occurrences, but print only one of them). The reason seems to be that the perl substitution and and search behave quite differently in the foreach() and while() constructs. To have a solution to replace in multiple files which outputs also all individual replacements, I came up with the following body:
# mandatory user inputs
my #files;
my $subs;
my $regex;
# additional user inputs
my $fileregex = '.*';
my $retval = 0;
my $opt_testonly=0;
foreach my $file (#files){
print "FILE: $file\n";
if(not($file =~ /$fileregex/)){
print "filename does not match regular expression for filenames\n";
next;
}
# read file
local $/;
if(not(open(FILE, "<$file"))){
print STDERR "ERROR: could not open file\n";
$retval = 1;
next;
};
my $string=<FILE>;
close(FILE);
my #locations_orig;
my #matches_orig;
my #results_orig;
# find matches
while ($string =~ /$regex/g) {
push #locations_orig, [ $-[0], $+[0] ];
push #matches_orig, $&;
my $result = eval("\"$subs\"");
push #results_orig, $result;
print "MATCH: ".$&." --> ".$result." #[".$-[0].",".$+[0]."]\n";
}
# reverse order
my #locations = reverse(#locations_orig);
my #matches = reverse(#matches_orig);
my #results = reverse(#results_orig);
# number of matches
my $length=$#matches+1;
my $count;
# replace matches
for($count=0;$count<$length;$count=$count+1){
substr($string, $locations[$count][0], $locations[$count][1]-$locations[$count][0]) = $results[$count];
}
# write file
if(not($opt_testonly) and $length>0){
open(FILE, ">$file"); print FILE $string; close(FILE);
}
}
exit $retval;
It first reads the file creates lists of the matches, their positions and the replacement text in each file (printing each match). Second it will replace all occurrences starting from the end of the string (in order not to change the position of previous messages). Finally, if matches were found, it writes the string back to the file. Can surely be more elegant, but it does what I want.

$1_new will still match ([a-z_]+). It enters an endless loop because you use while there. With the s///g construct, ONE iteration will replace EVERY occurence in the string.
To count the replacements use:
$replacements = () = $document =~ s/([a-z_]+)\.h/$1_new\.h/gs;
$replacements will contain the number of replaced matches.
If you essentially just want the matches, not the replacements:
#matches = $document =~ /([a-z_]+)\.h/gs;
You can then take $replacement = scalar #matches to obtain their count.

I'd say you're over-engineering this. I did this in the past with:
perl -i -p -e 's/([a-z_]+)\.h/$1_new\.h/g' error.c
This works correctly when the substituted string contains the matching pattern.

the /g option is like a loop in itself. I think you want this:
while($document=~s/([a-z_]+)(?!_new)\.h/$1_new\.h/s){
$found=$found+1;
};
Because you are replacing the match with itself and more, you need a negative lookahead assertion.

Related

Perl how do you assign a varanble to a regex match result

How do you create a $scalar from the result of a regex match?
Is there any way that once the script has matched the regex that it can be assigned to a variable so it can be used later on, outside of the block.
IE. If $regex_result = blah blah then do something.
I understand that I should make the regex as non-greedy as possible.
#!/usr/bin/perl
use strict;
use warnings;
# use diagnostics;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Outlook';
my #Qmail;
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
my $outlook = Win32::OLE->new('Outlook.Application')
or warn "Failed Opening Outlook.";
my $namespace = $outlook->GetNamespace("MAPI");
my $folder = $namespace->Folders("test")->Folders("Inbox");
my $items = $folder->Items;
foreach my $msg ( $items->in ) {
if ( $msg->{Subject} =~ m/^(.*test alert) / ) {
my $name = $1;
print " processing Email for $name \n";
push #Qmail, $msg->{Body};
}
}
for(#Qmail) {
next unless /$regex|^\s*description/i;
print; # prints what i want ie lines that start with owner and description
}
print $sentence; # prints ^\\s\*offense \ # not lines that start with owner.
One way is to verify a match occurred.
use strict;
use warnings;
my $str = "hello what world";
my $match = 'no match found';
my $what = 'no what found';
if ( $str =~ /hello (what) world/ )
{
$match = $&;
$what = $1;
}
print '$match = ', $match, "\n";
print '$what = ', $what, "\n";
Use Below Perl variables to meet your requirements -
$` = The string preceding whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already.
$& = Contains the string matched by the last pattern match
$' = The string following whatever was matched by the last pattern match, not counting patterns matched in nested blockes that have been exited already. For example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi
The match of a regex is stored in special variables (as well as some more readable variables if you specify the regex to do so and use the /p flag).
For the whole last match you're looking at the $MATCH (or $& for short) variable. This is covered in the manual page perlvar.
So say you wanted to store your last for loop's matches in an array called #matches, you could write the loop (and for some reason I think you meant it to be a foreach loop) as:
my #matches = ();
foreach (#Qmail) {
next unless /$regex|^\s*description/i;
push #matches_in_qmail $MATCH
print;
}
I think you have a problem in your code. I'm not sure of the original intention but looking at these lines:
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\s*owner #/";
I'll step through that as:
Assign $regexto the string ^\s*owner #.
Assign $sentence to value of running a match within $regex with the regular expression /^s*owner $/ (which won't match, if it did $sentence will be 1 but since it didn't it's false).
I think. I'm actually not exactly certain what that line will do or was meant to do.
I'm not quite sure what part of the match you want: the captures, or something else. I've written Regexp::Result which you can use to grab all the captures etc. on a successful match, and Regexp::Flow to grab multiple results (including success statuses). If you just want numbered captures, you can also use Data::Munge
You can do the following:
my $str ="hello world";
my ($hello, $world) = $str =~ /(hello)|(what)/;
say "[$_]" for($hello,$world);
As you see $hello contains "hello".
If you have older perl on your system like me, perl 5.18 or earlier, and you use $ $& $' like codequestor's answer above, it will slow down your program.
Instead, you can use your regex pattern with the modifier /p, and then check these 3 variables: ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} for your matching results.

Find multiline blocks

I'm trying to find occurrences of BLOB_SMUGHO, from the file test.out from the bottom of the file. If found, return a chunk of data which I'm interested in between the string "2014.10"
I'm getting Use of uninitialized value $cc in pattern match (m//) at
Whats is wrong with this script?
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);
use File::ReadBackwards;
my $find = "BLOB_SMUGHO";
my $chnkdelim = "\n[" . strftime "%Y.%m", localtime;
my $fh = File::ReadBackwards->new('test.out', $chnkdelim, 0) or die "err-file: $!\n";
while ( defined(my $line = $fh->readline) ) {
if(my $cc =~ /$find/){
print $cc;
}
}
close($fh);
In case if this helps, here is a sample content of test.out
2014.10.31 lots and
lots of
gibbrish
2014.10.31 which I'm not
interested
in. It
also
2014.10.31 spans
across thousands of
lines and somewhere in the middle there will be
2014.10.31
this precious word BLOB_SMUGHO and
2014.10.31 certain other
2014.10.31 words
2014.10.31
this precious word BLOB_SMUGHO and
2014.10.31
this precious word BLOB_SMUGHO and
which
I
will
be
interested
in.
And I'm expecting to capture all the multiple occurrences of the chunk of the text from bottom of the file.
2014.10.31
this precious word BLOB_SMUGHO and
First, you have written your match incorrectly due to misunderstanding the =~ operator:
if(my $cc =~ /$find/){ # incorrect, like saying if(undef matches something)
If you want to match what is in $line against the pattern between /.../ then do:
if($line =~ /$find/) {
The match operator expects a value on left side as well as right side. you were using it like an assignment operator.
If you need to capture the match(es) into a variable or list, then add it to the left of an equal sign:
if(my ($cc) = $line =~ /$find/) { <-- wrap $cc in () for list context
By the way, I think you are better off writing:
if($line =~ /$find/) {
print $line;
or if you want to print what you matched only
print $0;
Since you aren't capturing a substring, it doesnt really matter here.
Now, as to how to match everything between two patterns, the task is easier if you don't match line by line, but match across newlines using the /s modifier.
In Perl, you can set the record separator to undef and use slurp mode.
local $/ = undef;
my $s = <>; # read all lines into $s
Now to scan $s for patterns
while($s =~ /(START.*?STOP)/gsm) { print "$1\n"; } # print the pattern inclusive of START and STOP
Or to capture between START and STOP
while($s =~ /START(.*?)STOP/gsm) { print "$1\n"; } # print the pattern between of START and STOP
So in your case the start pattern is 2014.10.31 and stop is BLOB_SMUGHO
while($s =~ /(2014\.10\.31.*?BLOB_SMUGHO)/gsm) {
print "$1\n";
}
NOTE: Regex modifiers in Perl come after the last / so if you see I use /gsm for multiline, match newline, and global matching (get multiple matches in a loop by remembering the last location).

Print line of pattern match in Perl regex

I am looking for a keyword in a multiline input using a regex like this,
if($input =~ /line/mi)
{
# further processing
}
The data in the input variable could be like this,
this is
multi line text
to be matched
using perl
The code works and matches the keyword line correctly. However, I would also like to obtain the line where the pattern was matched - "multi line text" - and store it into a variable for further processing. How do I go about this?
Thanks for the help.
You can grep out the lines into an array, which will then also serve as your conditional:
my #match = grep /line/mi, split /\n/, $input;
if (#match) {
# ... processing
}
TLP's answer is better but you can do:
if ($input =~ /([^\n]+line[^\n]+)/i) {
$line = $1;
}
I'd look if the match is in the multiline-String and in case it is, split it into lines and then look for the correct index number (starting with 0!):
#!/usr/bin/perl
use strict;
use warnings;
my $data=<<END;
this is line
multi line text
to be matched
using perl
END
if ($data =~ /line/mi){
my #lines = split(/\r?\n/,$data);
for (0..$#lines){
if ($lines[$_] =~ /line/){
print "LineNr of Match: " . $_ . "\n";
}
}
}
Did you try his?
This works for me. $1 represents the capture of regex inside ( and )
Provided there is only one match in one of the lines.If there are matches in multiple lines, then only the first one will be captured.
if($var=~/(.*line.*)/)
{
print $1
}
If you want to capture all the lines which has the string line then use below:
my #a;
push #a,$var=~m/(.*line.*)/g;
print "#a";

Printing first instance of match in each line of file (Perl)

I have the following in an executable .pl file:
#!/usr/bin/env perl
$file = 'TfbG_peaks.txt';
open(INFO, $file) or die("Could not open file.");
foreach $line (<INFO>) {
if ($line =~ m/[^_]*(?=_)/){
#print $line; #this prints lines, which means there are matches
print $1; #but this prints nothing
}
}
Based on my reading at http://goo.gl/YlEN7 and http://goo.gl/VlwKe, print $1; should print the first match in each line, but it doesn't. Help!
No, $1 should print the string saved by so-called capture groups (created by the bracketing construct - ( ... )). For example:
if ($line =~ m/([^_]*)(?=_)/){
print $1;
# now this will print something,
# unless string begins from an underscore
# (which still matches the pattern, as * is read as 'zero or more instances')
# are you sure you don't need `+` here?
}
The pattern in your original code didn't have any capture groups, that's why $1 was empty (undef, to be precise) there. And (?=...) didn't count, as these were used to add a look-ahead subexpression.
$1 prints what the first capture ((...)) in the pattern captured.
Maybe you were thinking of
print $& if $line =~ /[^_]*(?=_)/; # BAD
or
print ${^MATCH} if $line =~ /[^_]*(?=_)/p; # 5.10+
But the following would be simpler (and work before 5.10):
print $1 if $line =~ /([^_]*)_/;
Note: You'll get a performance boost when the pattern doesn't match if you add a leading ^ or (?:^|_) (whichever is appropriate).
print $1 if $line =~ /^([^_]*)_/;

How to user Perl to find all instances of strings that match a regexp?

How to user Perl to find and print all strings that match a regexp?
The following only finds the first match.
$text="?Adsfsadfgaasdf.
?Bafadfdsaadsfadsf.
xcxvfdgfdg";
if($text =~ m/\\?([^\.]+\.)/) {
print "$1\n";
}
EDIT1: /g doesn't work
#!/usr/bin/env perl
$text="?Adsfsadfgaasdf.
?Bafadfdsaadsfadsf.
xcxvfdgfdg";
if($text =~ m/\\?([^\.]+\.)/g) {
print "$1\n";
}
$ ./test.pl
?Adsfsadfgaasdf.
The problem is that the /g modifier does not use capture groups for multiple matches. You need to either iterate over the matches in scalar context, or catch the returned list in list context. For example:
use v5.10; # required for say()
$text="?Adsfsadfgaasdf.
?Bafadfdsaadsfadsf.
xcxvfdgfdg";
while ($text =~ /\?([^.]+\.)/g) { # scalar context
say $1;
}
for ($text =~ /\?[^.]+\./g) { # list context
say; # match is held in $_
}
Note in the second case, I skipped the parens, because in list context the whole match is returned if there are no parens. You may add parens to select part of the string.
Your version, using if, uses scalar context, which saves the position of the most recent match, but does not continue. A way to see what happens is:
if($text =~ m/\?([^\.]+\.)/g) {
print "$1\n";
}
say "Rest of string: ", substr $text, pos;
pos gives the position of the most recent match.
In previous answer #TLP correctly wrote that matching should be in list context.
use Data::Dumper;
$text="?Adsfsadfgaasdf.
?Bafadfdsaa.
dsfadsf.
xcxvfdgfdg";
#arr = ($text =~ /\?([^\.]+\.)/g);
print Dumper(#arr);
Expected result:
$VAR1 = 'Adsfsadfgaasdf.';
$VAR2 = 'Bafadfdsaa.';
You seem to be missing the /g flag, which tells perl to repeat the match as many times as possible.