Perl regexp use of uninitialized pattern patch - regex

My script loads some stuff from some files in some arrays, you enter a text from the keyboard, the script searches the relevant part of the text in those arrays, if it finds it, it does something, if not, well, another thing, at least in theory.
I get the following errors:
Use of uninitialized value in pattern match (m//) at emo_full_dynamic.pl line 120, <STDIN> chunk 2.
Modification of a read-only value attempted at emo_full_dynamic.pl line 121, <STDIN> chunk 2.
line 120 = $plm3 =~ /arr_(\w+.txt)/;
My problem, I think, is at $plm3 =~ /arr_(\w+.txt)/;. I used it so that I can store the name of an array in $1.
Here's my code:
#!/usr/bin/perl
use warnings;
$idx = 0;
$oldsep = $/;
opendir(DIR, 'c:/downloads/text_files/arrs/');
#files = readdir(DIR);
while ($idx <= $#files )
{
$value = $files[$idx];
if ( $value !~ m/^arr/i)
{
splice #files, $idx, 1;
}
else
{
$idx++;
}
}
foreach $plm (#files)
{
if($plm =~ m/txt$/)
{
open(ARR, "C:/downloads/text_files/arrs/$plm") or die $!;
while(<ARR>)
{ {
chomp($_);
$plm =~ m/arr_(\w+).txt/;
push(#{$1}, $_);
}
close ARR;
}
}
$plm = 0;
$idx = 0;
$stare = <STDIN>;
chomp($stare);
while($stare)
{
foreach $plm2 (#files)
{
if($plm2 =~ m/txt$/)
{
$plm2 =~ m/arr_(\w+).txt/;
if(grep $stare =~ m/$_/i, #{$1})
{
$flag = 1;
}
else
{
$flag = 0;
}
}
}
if($flag == 1)
{
$/ = "%\n";
$plm3 =~ /arr_(\w+.txt)/;
open SUPARARE, "C:/downloads/text_files/replies/$1" or die $!;
etc etc....

First of all, it's always a good idea to use strict pragma -- unless you have a valid reason to avoid it --.
Second, I don't see $plm3 initialized anywhere in your code. You have probably forgot to initialize it.

I think you are assigning something to variable $1 on line 121

Apparently there are some copy/paste issues which negates my initial answer.
Other mistakes, great and small:
You don't use strict. (fatal flaw)
Your opendir is used once, then never closed.
You use global filehandles, instead of lexical (e.g. open my $fh, ...)
Using a complext loop + splice instead of grep (#files=grep /^arr/i, #files)
Using chomp($_) when chomp per default chomps the $_ variable
I don't even know what this line means:
if(grep $stare =~ m/$_/i, #{$1}) {
You seem to be using a pattern match, where $_ is the pattern (which in this case is.. what? Nothing? Anything?), whose return value is used as a grep pattern for an array reference, that may or may not be initialized. A very horrible statement. If it indeed works as intended, the readability is very low.
Redeclaring $/ seems like a frivolous thing to do in this context, but I can't really tell, as the script ends there.

Related

File handle failed to read second time

How can I read from a file handle for the second time inside foreach loop in Perl?
foreach $a (#b){
while(my $line = <IN>){
if($line = /$a/){
print $line;
}
}
}
The above code is not processing the second element from the list #b. How to make it possible?
Your inner loop, while(my $line = <IN>), extracts lines from the IN handle until it reaches the end of the file.
When your outer loop, foreach $a (#b), tries to read from IN again, it's still at end-of-file. The first iteration of the foreach loop consumes all lines from the file, leaving nothing for the other iterations.
There are several possible ways to fix this:
Seek back to the beginning of IN before you attempt to read from it again:
foreach $a (#b){
seek IN, 0, 0
or die "Cannot seek(): $!";
while (my $line = <IN>) {
...
}
}
However, this only works for real files, not pipes or sockets or terminals.
Read the whole file into memory up front, then iterate over a normal array:
my #lines = <IN>;
foreach $a (#b){
foreach my $line (#lines) {
...
}
}
However, if the file is big, this will use a lot of memory.
Switch the order of the two loops:
while (my $line = <IN>) {
foreach $a (#b) {
...
}
}
This is my favorite. Now you only need to read from the file once. #b is already in memory, so you can iterate over it as many times as you want.
Side notes:
Don't use bareword filehandles like IN. Normal variables (such as $IN) are pretty much superior in every way.
Don't use variables called $a or $b. They're a bit special because Perl uses them in sort.
My personal preference is to never use < >. It's weirdly overloaded (it can mean either readline or glob, depending on the exact syntax you use) and it isn't terribly intuitive. Using readline means there's never any syntactic ambiguity and even programmers with no Perl experience can figure out what it does.
With those changes:
while (my $line = readline $IN) {
foreach my $re (#regexes) {
if ($line =~ /$re/) {
print $line;
}
}
}
You read within a while loop until the filehandle is exhausted (reached EOF end of file).
If you don't close and reopen your filehandle you won't read anymore from it in the second iteration of the outer loop.
If the amount of data you read from the filehandle is no so large you could read the file into an array variable and iterate over the content of the array variable.
For example:
my #filecontent = <IN>;
foreach $item_of_b (#b){
foreach my $line_of_file (#filecontent){
if($line_of_file =~ /$item_of_b/){
print $line_of_file;
}
}
}
And $a and $b should not be used as variable names, they are special due to sorting.

The perfect regex to extract a particular string in perl

I have a text file abc.txt that looks like this:
dQdC(sA1B2C3,sC5) = A lot of stuff
a = b = c
Baseball
dQdC(sC2V3X1,sD5) = A lot of stuff again
Now I want create two arrays in perl, one of which will contain A1B2C3 and C2V3X1, the other array will contain C5 and D5. I don't care about the other intermediate lines. To achieve this goal, I am trying this perl script:
for (my $in=0;$in<=$#lines;$in++){
if ($lines[$in]=~/dQdC\(s([A-Z0-9]+?),s([A-Z0-9]+?)\)/) {
print "1111"; #this line is just to check if it is at all going inside the loop
#A = $1;
#B = $2;
}
However, it is not even going inside the loop. So I guess I did something wrong with the regex. Will someone please tell me what I am doing wrong here?
my (#a, #b);
while ($file =~ /^dQdC\(s(\w+),s(\w+)\)/mg) {
push #a, $1;
push #b, $2;
}
or
my (#a, #b);
while (<$fh>) {
if (/^dQdC\(s(\w+),s(\w+)\)/) {
push #a, $1;
push #b, $2;
}
}
Working with parallel arrays isn't nice.
Alternative 1: Hash
my %hash = $file =~ /^dQdC\(s(\w+),s(\w+)\)/mg;
or
my %hash;
while (<$fh>) {
if (/^dQdC\(s(\w+),s(\w+)\)/) {
$hash{$1} = $2;
}
}
Alternative 2: AoA
use List::Util qw( pairs ); # 1.29+
my #pairs = pairs( $file =~ /^dQdC\(s(\w+),s(\w+)\)/mg );
or
my #pairs;
while (<$fh>) {
if (/^dQdC\(s(\w+),s(\w+)\)/) {
push #pairs, [ $1, $2 ];
}
}
If the format of your target lines is always as shown
use warnings;
use strict;
my $file = ...
my (#ary_1, #ary_2);
open my $fh, '<', $file or die "Can't open $file: $!";
while (<$fh>)
{
my ($v1, $v2) = /dQdC\(s([^,]+),s([^\)]+)/ or next;
push #ary_1, $v1;
push #ary_2, $v2;
}
which captures between ( and a , and then between a , and ). The first pattern might as well be s(.*?), as there is no benefit of the negated character class since the following , still need be matched (but I left it with [^...] for consistency with the other one).
Comments
In general better process a file line-by-line, unless there are specific reasons to read it first
C-style loop is rarely needed. To iterate over array index use for my $i (0..$#ary)
Please use warnings; and use strict; always
Try this:
(?<=\(s)([A-Z0-9]+)(?=,)
It matches substrings that come between (s and , using lookbehind and lookahead.
Similarily, use (?<=,s)([A-Z0-9]+)(?=\)) to capture the substrings between ,s and ).
Putting them together, you can create two capturing groups, each containing the different kind of substrings: (A1B2C3, C2V3X1), (C5, D5)

Issue with regex matching in Perl

I am new to Perl and any help will be appreciated. I have 2 variables: $release and $env_type. I want to check if a string contains $release_$env_type, then do something. For example,
$release="beta";
$env_type="testing";
so string is beta_testing
Code snippet:
if ( $_ =~ /${release}_${env_type}/ ) {
#do Something
}
This if condition doesn't get resolved. Kindly let me know what is the correct syntax to make this check? I searched on Google but didn't get any good post..
Kindly help!
I have a file with contents:
admin_vh_c9_simv2_edg=/console,/consolehelp
idminternal_vh_c9_simv2_edg=/oim,/soa-infra
sso_vh_c9_simv2_edg=/oim,/soa-infra,/odsm
my $env_type = "edg";
my $release = "c9_simv2";
#Input file containing contexts
my $idmInternal = "./IdmContexts.conf";
if ( !-e $idmInternal ) {
die "Unable to find the file $idmInternal!\n";
}
open( MYFILE, $idmInternal );
while (<MYFILE>) {
chomp;
if ( $_ =~ /${release}_${env_type}/ ) {
push( #filtered, $_ );
}
}
Your code is fine. The problem is elsewhere. The following prints match.
my $release="beta";
my $env_type="testing";
$_ = "so string is beta_testing";
if ( $_ =~ /${release}_${env_type}/ ) {
print "match\n";
}
Note: /\Q${release}_${env_type}/ would be better. It'll make sure that special characters in the interpolated variables match themselves.
Most likely problem: You read the value of $release and/or $env_type from a file, and forgot to chomp the trailing newline.
If you are using $_ then this will work.
if (m/${release}_${env_type}/)
{
# Do something
}
The m// match operator binds automatically to $_. There is no need to bind it explicitly.
To really tell what is going on, you can inject code before the test. For example compile a regex first and then print it.
# compile the regex first
my $regex = qr/${release}_${env_type}/;
say qq{\$regex="$regex"};
# then print your scanned text
say qq{\$_="$_"};
if ( m/$regex/ ) {
# do something
}
If you're going to explicitly bind to a regex, then use variables:
my $string = $_;
if ( $string =~ m/$regex/ ) {
}
Otherwise, simply match the "context variable" ($_).
if ( m/$regex/ ) {
}
Also, USUW would help spot a few problems, proactively:
# Before everything else
use strict;
use warnings;

Perl Regex match works, but replace does not

I have put together a Perl script to go through a directory and match various keys in the source and output the results to a text file. The match operation works well, however the end goal is to perform a replace operation. The Perl script is as follows:
#!/usr/bin/perl
#use strict;
use warnings;
#use File::Slurp;
#declare variables
my $file = '';
my $verbose = 0;
my $logfile;
my #files = grep {/[.](pas|cmm|ptd|pro)$/i} glob 'C:\users\perry_m\desktop\epic_test\pascal_code\*.*';
#iterate through the files in input directory
foreach $file (#files) {
print "$file\n";
#read the file into a single string
open FILEHANDLE, $file or die $!;
my $string = do { local $/; <FILEHANDLE> };
#perfrom REGEX on this string
########################################################
#fix the include formats to conform to normal PASCAL
$count = 0;
while ($string =~ m/%INCLUDE/g)
{
#%include
$count++;
}
if ($count > 0)
{
print " $count %INCLUDE\n";
}
$count = 0;
while ($string =~ m/INCLUDE/g)
{
#%INCLUDE;
$count++;
}
if ($count > 0)
{
print " $count INCLUDE\n";
}
$count = 0;
while ($string =~ m/(%include\s+')[A-Za-z0-9]+:([A-Za-z0-9]+.[A-Za-z]+')/g)
{
#$1$2;
$count++;
}
if ($count > 0)
{
print " $count XXXX:include \n";
}
}
This produces output as desired, an example is below:
C:\users\perry_m\desktop\epic_test\pascal_code\BRTINIT.PAS
1 INCLUDE
2 XXXX:include
39 external and readonly
However if I change the regex operations to try and implement a replace, using the replacement operation shown in the commented lines above, the scripts hangs and never returns. I imagine it is somehow related to memory, but I am new to Perl. I was also trying to avoid parsing the file by line if possible.
Example:
while ($string =~ s/%INCLUDE/%include/g)
{
#%include
$count++;
}
and
while ($string =~ s/(%include\s+')[A-Za-z0-9]+:([A-Za-z0-9]+.[A-Za-z]+')/$1$2;/g)
{
#$1$2;
$count++;
}
Edit: simplified the examples
The problem is with your while loops. A loop like
while ($string =~ m/INCLUDE/g) { ... }
will execute once for each ocurrence of INCLUDE in the target string, but a subtitution like
$string =~ s/INCLUDE/%INCLUDE;/
will make all of the replacement in one go and retuen the number of replacements made. So a loop
while ($string =~ s/INCLUDE/%INCLUDE;/g) { ... }
will endlessly add more and more percentage signs before and semicolons after every INCLUDE.
To find the number of replacements made, change all your loops like this to just
$count = $string =~ s/INCLUDE/%INCLUDE;/g
the pattern in s/INCLUDE/%INCLUDE/g will match the replacement also, so if you're running it in a while loop it will run forever (until you run out of memory).
s///g will replace all matches in a single shot so you very rarely will need to put it in a loop. Same goes for m//g, it will do the counting in a single step if you put it in list context.

How to match exactly two empty lines

I have a question about regular expressions. I have a file and I need to parse it in such a way that I could distinguish some specific blocks of text in it. These blocks of text are separated by two empty lines (there are blocks which are separated by 3 or 1 empty lines but I need exactly 2). So I have a piece of code and this is \s*$^\s*$/ regular expression I think should match, but it does not.
What is wrong?
$filename="yu";
open($in,$filename);
open(OUT,">>out.text");
while($str=<$in>)
{
unless($str = /^\s*$^\s*$/){
print "yes";
print OUT $str;
}
}
close($in);
close(OUT);
Cheers,
Yuliya
By default, Perl reads files a line at a time, so you won't see multiple new lines. The following code selects text terminated by a double new line.
local $/ = "\n\n" ;
while (<> ) {
print "-- found $_" ;
}
New Answer
After having problems excluding >2 empty lines, and a good nights sleep here is a better method that doesn't even need to slurp.
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
my #blocks; #each element will be an arrayref, one per block
#that referenced array will hold lines in that block
open(my $fh, '<', $file);
my $empty = 0;
my $block_num = 0;
while (my $line = <$fh>) {
chomp($line);
if ($line =~ /^\s*$/) {
$empty++;
} elsif ($empty == 2) { #not blank and exactly 2 previous blanks
$block_num++; # move on to next block
$empty = 0;
} else {
$empty = 0;
}
push #{ $blocks[$block_num] }, $line;
}
#write out each block to a new file
my $file_num = 1;
foreach my $block (#blocks) {
open(my $out, '>', $file_num++ . ".txt");
print $out join("\n", #$block);
}
In fact rather than store and write later, you could simply write to one file per block as you go:
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
open(my $fh, '<', $file);
my $empty = 0;
my $block_num = 1;
open(OUT, '>', $block_num . '.txt');
while (my $line = <$fh>) {
chomp($line);
if ($line =~ /^\s*$/) {
$empty++;
} elsif ($empty == 2) { #not blank and exactly 2 previous blanks
close(OUT); #just learned this line isn't necessary, perldoc -f close
open(OUT, '>', ++$block_num . '.txt');
$empty = 0;
} else {
$empty = 0;
}
print OUT "$line\n";
}
close(OUT);
use 5.012;
open my $fh,'<','1.txt';
#slurping file
local $/;
my $content = <$fh>;
close $fh;
for my $block ( split /(?<!\n)\n\n\n(?!\n)/,$content ) {
say 'found:';
say $block;
}
Deprecated in favor of new answer
justintime's answer works by telling perl that you want to call the end of a line "\n\n", which is clever and will work well. One exception is that this must match exactly. By the regex you are using it makes it seem like there might be whitespace on the "empty" lines, in which case this will not work. Also his method will split even on more than 2 linebreaks, which was not allowed in the OP.
For completeness, to do it the way you were asking, you need to slurp the whole file into a variable (if the file is not so large as to use all your memory, probably fine in most cases).
I would then probably say to use the split function to split the block of text into an array of chunks. Your code would then look something like:
#!/usr/bin/perl
use strict;
use warnings;
my $file = 'yu';
my $text;
open(my $fh, '<', $file);
{
local $/; enables slurp mode inside this block
$text = <$fh>;
}
close($fh);
my #blocks = split(
/
(?<!\n)\n #check to make sure there isn't another \n behind this one
\s*\n #first whitespace only line
\s*\n #second "
(?!\n) #check to make sure there isn't another \n after this one
/x, # x flag allows comments and whitespace in regex
$text
);
You can then do operations on the array. If I understand your comment to justintime's answer, you want to write each block out to a different file. That would look something like
my $file_num = 1;
foreach my $block (#blocks) {
open(my $out, '>', $file_num++ . ".txt");
print $out $block;
}
Notice that since you open $out lexically (with my) when it reaches the end of the foreach block, the $out variable dies (i.e. "goes out of scope"). When this happens to a lexical filehandle, the file is automatically closed. And you can do a similar thing to that with justintime's method as well:
local $/ = "\n\n" ;
my $file_num = 1;
while (<>) {
open(my $out, '>', $file_num++ . ".txt");
print $out $block;
}