Put regex match only into array, not entire line - regex

I am trying to check each line of a document for a regex match.
If the line has a match, I want to push the match only into an array.
In the code below, I thought that using the g modifier at the end of the regex would make $line's value the regex match only. Instead, $line's value is the entire line of the document containing the match...
my $line;
my @table;
while($line = <$input>){
if($line =~ m/foo/g){
push (@table, $line);
}
}
print @table;
If any one could help me get my matches into an array, it is much appreciated.
Thanks.
p.s.
Still learning... so any explanations of concepts I may have missed are also much appreciated.

The g modifier means global matching: s///g replaces every occurrence, and m//g finds every match. It does not change the contents of $line.
If you just want to push the matched text into an array, capture it by enclosing the pattern in (). Captured text is stored in the variables $1, $2, etc.
Try following modification to your code:
my @table;
while(my $line = <$input>){
if($line =~ m/(foo)/){
push (@table, $1);
}
}
print @table;
Refer to this documentation for more details.
Or if you want to avoid needless use of global variables,
my @table;
while(my $line = <$input>){
if(my @captures = $line =~ m/(foo)/){
push @table, @captures;
}
}
which simplifies to
my @table;
while(my $line = <$input>){
push @table, $line =~ m/(foo)/;
}
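As a further sketch (not part of the answers above): in list context, a match with the /g modifier returns every match on the line rather than only the first, so its result can be passed straight to push. The sample lines and the names @lines and @table are illustrative assumptions; a real script would read from a filehandle instead.

```perl
use strict;
use warnings;

# Illustrative sample input; a real script would read from a filehandle.
my @lines = ("herp de derp foo\n", "yerp fool foo flerp\n", "heyhey\n");

my @table;
for my $line (@lines) {
    # In list context, m//g returns all matches found in $line.
    # With no capture groups, each returned element is the whole match.
    push @table, $line =~ m/foo/g;
}

print join(',', @table), "\n";    # prints "foo,foo,foo"
```

The second sample line contributes two elements, because "fool" also contains "foo". Add word boundaries (\bfoo\b) if only the standalone word should count.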

Expanding on jkshah's answer a little, I'm explicitly storing the matches in @matches instead of using the magic variable $1, which I find a little harder to read.
"__DATA__" is a simple way to store lines in a filehandle in a perl source file.
use strict;
use warnings;
my @table;
while(my $line = <DATA>){
my @matches = $line =~ m/(foo)/;
if(@matches) {
warn "found: " . join(',', @matches );
push(@table,@matches);
}
}
print @table;
__DATA__
herp de derp foo
yerp fool foo flerp
heyhey

If your file is not very big (100-500 MB should be fine with 2 GB of RAM), you can use the approach below, which reads the whole file into memory. Here I am extracting every number in the file. It should be much faster than the foreach loop.
#!/usr/bin/perl
open my $file_h, '<', 'abc' or die "ERROR-$!";
my @file = <$file_h>;
my $file_cont = join(' ',@file);
@file = ();
my @match = $file_cont =~ /\d+/g;
print "@match";

Related

how to solve "Use of uninitialized value $2 in concatenation (.) or string at"

Below is my code. I want to print the data $1 and $2 in one row and split it with ,. Why can't I print the data?
#!/usr/intel/bin/perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line =~ m/^\s+Timing Path Group \'(\S+)\'/) {
$line = $1;
if ($line =~ m/^\s+Levels of Logic:\s+(\S+)/) {
$line = $2;
}
}
print "$1,$2\n";
}
close (FILE);
The meat of your program is here:
if ($line =~ m/^\s+Timing Path Group \'(\S+)\'/) {
$line = $1;
if ($line =~ m/^\s+Levels of Logic:\s+(\S+)/) {
$line = $2;
}
}
The regex capturing variables ($1, $2, etc) are set when you match a string against a regex that contains sets of capturing parentheses. The first capturing parentheses set the value of $1, the second capturing parentheses set the value of $2, and so on. In order for $2 to be given a value, you need to match against a regex that contains two sets of capturing parentheses.
Both of your regexes only contain a single set of capturing parentheses. Therefore only $1 will be set on each of your matches. $2 will never be given a value - leading to the warning that you are seeing.
You need to rethink the logic in your code. I'm not sure why you think $2 will have a value here. Your code is a little confusing, so I'm unable to offer a more specific solution.
I can, however, give you some more general advice:
Use lexical filehandles and the three-arg version of open().
open my $fh, '<', "$output"
There is no need for the quotes around $output.
open my $fh, '<', $output
I know why you're doing it, but $output is a potentially confusing name for a file that you read from. Consider changing it.
Always include $! in an open() error message.
open my $fh, '<', $output or die "Cannot open '$output': $!\n";
Your $line variable seems unnecessary. Why not just keep the row data in $_, which will simplify your code:
while (<$fh>) {
chomp; # works on $_ by default
if (/some regex/) { # works on $_ by default
# etc...
}
}
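For illustration only, here is one possible reading of what the question is after: remember the path group from one line, then pair it with the "Levels of Logic" value from a later line. The layout of the report is an assumption, and the sample lines, @report, and @rows are hypothetical stand-ins for the real file.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the uncompressed report.
my @report = (
    "  Timing Path Group 'clk_a'\n",
    "  Levels of Logic:   12\n",
    "  Timing Path Group 'clk_b'\n",
    "  Levels of Logic:   7\n",
);

my @rows;
my $group;    # most recent path group seen, if any
for (@report) {
    chomp;
    if (/^\s+Timing Path Group '(\S+)'/) {
        $group = $1;    # copy $1 before the next match overwrites it
    }
    elsif (defined $group and /^\s+Levels of Logic:\s+(\S+)/) {
        push @rows, "$group,$1";    # one capture group, so it is $1 again
        undef $group;
    }
}
print "$_\n" for @rows;
```

Each regex has a single capture group, so both values arrive in $1. The first has to be copied into its own variable before the second match runs.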

Load regex from file and match groups with it in Perl

I have a file containing regular expressions, e.g.:
City of (.*)
(.*) State
Now I want to read these (line by line), match them against a string, and print out the extraction (matched group). For example: The string City of Berlin should match with the first expression City of (.*) from the file, after that Berlin should be extracted.
This is what I've got so far:
use warnings;
use strict;
my @pattern;
open(FILE, "<pattern.txt"); # open the file described above
while (my $line = <FILE>) {
push @pattern, $line; # store it inside the @pattern variable
}
close(FILE);
my $exampleString = "City of Berlin"; # line that should be matched and
# Berlin should be extracted
foreach my $p (@pattern) { # try each pattern
if (my ($match) = $exampleString =~ /$p/) {
print "$match";
}
}
I want Berlin to be printed.
What happens with the regex inside the foreach loop?
Is it not compiled? Why?
Is there even a better way to do this?
Your patterns contain a newline character which you need to chomp:
while (my $line = <FILE>) {
chomp $line;
push @pattern, $line;
}
First off, the missing chomp is the root of your problem.
Secondly, your code is also inefficient. Rather than checking patterns one at a time in a foreach loop, consider compiling a single regex in advance:
#!/usr/bin/env perl
use strict;
use warnings;
# open ( my $pattern_fh, '<', "pattern.txt" ) or die $!;
my @patterns = <DATA>;
chomp(@patterns);
my $regex = join( '|', @patterns );
$regex = qr/(?:$regex)/;
print "Using regex of: $regex\n";
my $example_str = 'City of Berlin';
if ( my ($match) = $example_str =~ m/$regex/ ) {
print "Matched: $match\n";
}
Why is this better? Well, because it scales more efficiently. With your original algorithm - if I have 100 lines in the patterns file, and 100 lines to check as example str, it means making 10,000 comparisons.
With a single regex, you're making one comparison on each line.
Note: normally you'd use quotemeta when interpolating strings read from a file into a regex, which escapes metacharacters. We don't want to do that here, because the lines are meant to be regular expressions.
If you're looking for even more concise, you can use map to avoid needing an intermediate array:
my $regex = join( '|', map { chomp; $_ } <$pattern_fh> );
$regex = qr/(?:$regex)/;
print "Using regex of: $regex\n";
my $example_str = 'City of Berlin';
if ( my ($match) = $example_str =~ m/$regex/ ) {
print "Matched: $match\n";
}
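One caveat with joining patterns by '|': each pattern keeps its own capture group number, so when the second pattern matches, its text lands in $2 and my ($match) receives undef. A possible workaround, sketched here under the assumption that every pattern has exactly one capture group, is a branch reset group, (?|...), available since Perl 5.10. The names @patterns, $joined, and @found are illustrative.

```perl
use strict;
use warnings;

# Patterns as they might appear in pattern.txt (taken from the question).
my @patterns = ('City of (.*)', '(.*) State');

# (?|...) is a branch reset group (Perl 5.10+): capture numbering restarts
# in each alternative, so whichever pattern matches fills $1.
my $joined = join '|', @patterns;
my $regex  = qr/(?|$joined)/;

my @found;
for my $str ('City of Berlin', 'Washington State') {
    if ( my ($match) = $str =~ $regex ) {
        push @found, $match;
    }
}
print "$_\n" for @found;    # prints "Berlin" then "Washington"
```

If the patterns can have differing numbers of groups, filtering the returned list with grep { defined } is another option.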

System command execution using Perl

I have a Perl script which runs a perforce command and stores the result in a variable $command.
Then it is stored in a file log.txt, and by using a regex the relevant data is taken out.
When I run that command alone the following things pop out:
4680 p4exp/v68 PJIANG-015394 25:34:19 IDLE none
8869 unnamed p4-python R integration semiconductor-project-trunktip turbolinuxclient 01:33:52 IDLE none
8870 unnamed p4-python R integration remote-trunktip-osxclient 01:33:52
The code goes as follows:
#! /usr/bin/env perl
use strict;
use warnings;
use autodie;
my $command = qx |p4 monitor show -ale|;
open FH, '>>', "log.txt";
print FH $command;
close FH;
open my $log_fh, '<', '/root/log.txt';
my %stat;
while ($line = <$log_fh>) {
chomp $line;
next if not $line =~ /(\d+)\s+/;
my $killid = $1;
if ($line =~ /R\s+integration/ and $line =~ /IDLE\s+none$/) {
my $killid_details = $line;
$stat{$killid} = $killid_details;
}
}
close $log_fh;
my $killpro;
foreach my $kill (keys %stat) {
print "$kill\n";
}
The code above gets the number 8869, but how can I do it without log.txt? Is an array a better way to do it, or is a hash fine?
Please correct me as I am still learning.
Seems like your main stumbling block is getting line-by-line input for your loop?
Splitting on newlines should do the trick:
my $killid;
my @lines = split("\n", $command); # split on newlines
for my $line (@lines) {
next if not $line =~ /(\d+)\s+/;
my $id = $1;
if ($line =~ /R\s+integration/ and $line =~ /IDLE\s+none$/){
$killid = $id;
}
}
One caveat: you mentioned an output of 8870, but I'm getting 8869. The regexps you gave are looking for a line with "integration" and "IDLE none", and for your example input that appears to match 8869.
A hash is fine, though if you're using only one key in it (which seems to be the case), you might as well just use a single variable.
If you assign the result of a qx construct to an array instead of a scalar, then it will be split into lines automatically for you. This code demonstrates.
use strict;
use warnings;
my @lines = qx|p4 monitor show -ale|;
my %stat;
for my $line (@lines) {
chomp $line;
next unless $line =~ /(\d+)\s+/;
my $killid = $1;
if ($line =~ /R\s+integration/ and $line =~ /IDLE\s+none$/) {
$stat{$killid} = $line;
}
}
print "$_\n" for keys %stat;

detecting specific text from a line and saving it in array

I have a text file which consists of different lines it looks like
Destination|203.190.242.69|reached|203.190.244.6
Destination|208.109.249.198|reached|212.142.1.1
Destination|94.75.253.170|reached|85.17.100.90
Destination|212.112.234.228|reached|4.69.143.210
Destination|80.146.246.42|reached|192.168.1.1
Destination|122.209.193.217|reached|59.128.3.65
Destination|66.77.197.179|reached|66.77.197.251
Destination|195.254.227.65|reached|213.21.128.141
Destination|125.208.8.253|reached|125.208.15.254
I need to extract both IPs from each line and save them in separate arrays, so there will be two arrays: one for Destination and one for reached. How can I do this? At the moment I have written some code to detect the IP, but it does not seem to work.
while (my $line = <$in>) {
my $traceroute;
if ($line =~ /(^Destination)/) {
print "DUDE\n";
my $ip = $line =~ /(\d+\.\d+\.\d+\.\d+)$/s;
#$traceroute = $2;
print "$ip\n";
}
}
This should do the trick:
while (my $line = <DATA>) {
if($line =~ /(^Destination)/){
my($dest, $reach) = $line =~ /(\d+\.\d+\.\d+\.\d+)/g;
print "$dest $reach\n";
}
}
__DATA__
Destination|203.190.242.69|reached|203.190.244.6
Destination|208.109.249.198|reached|212.142.1.1
Destination|94.75.253.170|reached|85.17.100.90
Destination|212.112.234.228|reached|4.69.143.210
Destination|80.146.246.42|reached|192.168.1.1
Destination|122.209.193.217|reached|59.128.3.65
Destination|66.77.197.179|reached|66.77.197.251
Destination|195.254.227.65|reached|213.21.128.141
Destination|125.208.8.253|reached|125.208.15.254
A match in scalar context just returns true (1) if it succeeds or false (an empty string) if it fails, so $ip in the question receives a true/false value rather than an address. Assigning the result of a match to an array or a list of variables puts it in list context, in which the match returns the captured values.
The /g modifier matches the regex not just once but as many times as it fits in the string. Read perldoc perlretut for more.
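The difference between the contexts can be seen in a short sketch. The sample line comes from the question; the variable names ($ip_re, $ok, $last_ip, @all_ips) are illustrative.

```perl
use strict;
use warnings;

my $line  = 'Destination|203.190.242.69|reached|203.190.244.6';
my $ip_re = qr/(\d+\.\d+\.\d+\.\d+)/;

# Scalar context: the match only reports success (1) or failure.
my $ok = $line =~ /$ip_re$/;

# List context (note the parentheses): the match returns the capture.
my ($last_ip) = $line =~ /$ip_re$/;

# List context with /g: every match in the string is returned.
my @all_ips = $line =~ /$ip_re/g;

print "scalar: $ok\n";         # scalar: 1
print "list:   $last_ip\n";    # list:   203.190.244.6
print "all:    @all_ips\n";    # all:    203.190.242.69 203.190.244.6
```

The only change between the first two assignments is the pair of parentheses around the variable on the left-hand side.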
Since you want to save the addresses in arrays, and none of the previous solutions explicitly does that, I'm posting another answer:
use strict;
my (@destination, @reached);
foreach my $line (<DATA>) {
chomp $line;
my @fields = split '\|', $line;
push @destination, $fields[1];
push @reached, $fields[3];
}
use Data::Dumper;
print "Destinations:\n".Dumper(@destination);
print "Reached:\n".Dumper(@reached);
__DATA__
Destination|203.190.242.69|reached|203.190.244.6
Destination|208.109.249.198|reached|212.142.1.1
Destination|94.75.253.170|reached|85.17.100.90
Destination|212.112.234.228|reached|4.69.143.210
Destination|80.146.246.42|reached|192.168.1.1
Destination|122.209.193.217|reached|59.128.3.65
Destination|66.77.197.179|reached|66.77.197.251
Destination|195.254.227.65|reached|213.21.128.141
Destination|125.208.8.253|reached|125.208.15.254
Split on the separator and save the items at indices 1 and 3.
perl -aF'\|' -lne 'next unless $F[0] eq "Destination"; print "$F[1]|$F[3]"' input >output
You may want a different output format and/or do something more, of course.
while (my $line = <$in>) {
chomp $line;
if ($line =~ /^Destination\|(\d+\.\d+\.\d+\.\d+)\|reached\|(\d+\.\d+\.\d+\.\d+)$/) {
my ($d, $r) = ($1,$2);
print "$d => $r\n";
}
}

How can I extract the substring in the last set of parentheses using Perl?

I am using Perl to parse out sizes in a string. What is the regex that I could use to accomplish this:
Example Data:
Sleepwell Mattress (Twin)
Magic Nite (Flip Free design) Mattress (Full XL)
Result:
Twin
Full XL
I know that I need to start at the end of the string and parse out the first set of parenthesis just not sure how to do it.
#!/usr/bin/perl
$file = 'input.csv';
open (F, $file) || die ("Could not open $file!");
while ($line = <F>)
{
($field1,$field2,$field3,$field4,$field5,$field6,$field7, $field8, $field9) = split ',', $line;
if ( $field1 =~ /^.*\((.*)\)/ ) {
print $1;
}
#print "$field1,$field2,$field3,$field4,$field5,$field6,$field7, $field8, $field9, $1\n";
}
close (F);
Not getting any results. Maybe I am not doing this right.
The answer depends on if the size information you are looking for always appears within parentheses at the end of the string. If that is the case, then your task is simple:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA> ) {
last unless /\S/;
my ($size) = /\( ( [^)]+ ) \)$/x;
print "$size\n";
}
__DATA__
Sleepwell Mattress (Twin)
Magic Nite (Flip Free design) Mattress (Full XL)
Output:
C:\Temp> xxl
Twin
Full XL
Note that the code you posted can be better written as:
#!/usr/bin/perl
use strict;
use warnings;
my ($input_file) = @ARGV;
open my $input, '<', $input_file
or die "Could not open '$input_file': $!";
while (my $line = <$input>) {
chomp $line;
my @fields = split /,/, $line;
if ($fields[0] =~ /\( ( [^)]+ ) \)$/x ) {
print $1;
}
print join('|', @fields), "\n";
}
close $input;
Also, you should consider using Text::xSV or Text::CSV_XS to process CSV files.
The following regular expression will match the content at the end of the string:
m/\(([^)]+)\)$/m
The m at the end enables multi-line mode, which changes $ to match at the end of each line, not only at the end of the string.
[edited to add the bit about multi-line strings]
Assuming your data arrives line by line, and you are only interested in the contents of the last set of parens:
if ( $string =~ /^.*\((.*)\)/ ) {
print $1;
}
A fancy regex is not really necessary here. Make it easier on yourself: you can split on "[space](" and take the last element. Of course, this only works when the data you want is always last and always in parentheses.
while(<>){
@a = split / \(/, $_;
print $a[-1]; # get the last element. do your own trimming
}
This is the answer as expressed in Perl5:
my $str = "Magic Nite (Flip Free design) Mattress (Full XL)";
$str =~ m/.*\((.*)\)/;
print "$1\r\n";
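As a variation on the answers above (a sketch, not taken from any of them), the last parenthesized group can be pinned to the end of the string while tolerating trailing whitespace, so the match still works on a line that has not been chomped. The names @names and @sizes are illustrative; the sample strings come from the question.

```perl
use strict;
use warnings;

my @names = (
    'Sleepwell Mattress (Twin)',
    'Magic Nite (Flip Free design) Mattress (Full XL)',
);

my @sizes;
for my $name (@names) {
    # [^()]* cannot cross a parenthesis, and \s*$ pins the group to the end
    # of the string, so only the final (...) group can match.
    if ( my ($size) = $name =~ /\(([^()]*)\)\s*$/ ) {
        push @sizes, $size;
    }
}
print "$_\n" for @sizes;    # prints "Twin" then "Full XL"
```

The greedy /^.*\((.*)\)/ form also finds the last group on this data, but it accepts text after the closing parenthesis, which may or may not be what you want.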