Using \Q...\E makes regex invalid Perl - regex

below is my script that scans each line of the input file against list of annotations. for every occurance
I tag the term from the line with the annotation tag. The regex works perfectly without \Q..\E operator, but if I don't include \Q..\E I get a range error. So In the situation below I have to keep substitution valid and at the same time take care of the range. Hope the question is clear.
while (<FILE>) {
chomp $_;
foreach $word (#array) {
#cells = split /\t/, $word;
$value = $cells[0];
$replace = $cells[1];
chomp $value;
chomp $replace;
$_=~s/\Q\b[\w\-]*$value[\w\-]*\b\E)/<$replace>$&<\\$replace>/ig;
}
print $_,"\n";
}

My guess is that your $value contains regex meta characters. This is easy enough to solve; either use $value = quotemeta $value; before matching and leave out \Q...\E completely, or put the \Q...\E around $value only: $_ =~ s/\b[\w\-]*\Q$value\E[\w\-]*\b/.../ig;

Related

Content of unmatched string

I am parsing input from stdin, which is in the following format:
1
2
3
done
My current code is:
while(<> =~ /(\d+)/) {
# do something with $1
}
# I want to access the line following the last number here
which is fine for the 1st part of the file (with the list of numbers).
However, I would like the content of the line immediately following the last number.
Is there any elegant way to do it?
You could use:
my $line;
while( ($line = <>) =~ /^(\d+)$/ ) {
# do something with $1
}
chomp $line;
# Do something with $line (done)
As another answer states, it is a good idea to use chomp with a line of input. It will deal with the newline char or chars in a platform-independent way. If you use a regex to match the number, $1 will be "clean".
Also as another answer states, /(\d+)/ will match a number anywhere in the line. /^(\d+)$/ will match a line that only contains a number.
Another way:
foreach my $line (<>) {
if( $line =~ /^(\d+)$/ ) {
# Do something with $1
}
else {
chomp $line;
# Do something with $line (done)
}
}
This will let you deal with multiple or even alternating lines of numbers and other text.
while (<>) {
chomp;
last if !/^\d+\z/;
... $_ ...
}
die("Premature EOF") if !defined($_);
... $_ ...
You can do it this way
my $var = <>;
chomp $var;
while($var =~ /^(\d+)$/)
{
#here do stuff with $var, containing only numbers
$var = <>;
chomp $var;
}
# here do stuff with $var, containing now a word
other thing: by changing /(\d+)/ for /^(\d+)$/ you make sure your entry contains ONLY numbers ;)

Load regex from file and match groups with it in Perl

I have a file containing regular expressions, e.g.:
City of (.*)
(.*) State
Now I want to read these (line by line), match them against a string, and print out the extraction (matched group). For example: The string City of Berlin should match with the first expression City of (.*) from the file, after that Berlin should be extracted.
This is what I've got so far:
use warnings;
use strict;
my #pattern;
open(FILE, "<pattern.txt"); # open the file described above
while (my $line = <FILE>) {
push #pattern, $line; # store it inside the #pattern variable
}
close(FILE);
my $exampleString = "City of Berlin"; # line that should be matched and
# Berlin should be extracted
foreach my $p (#pattern) { # try each pattern
if (my ($match) = $exampleString =~ /$p/) {
print "$match";
}
}
I want Berlin to be printed.
What happens with the regex inside the foreach loop?
Is it not compiled? Why?
Is there even a better way to do this?
Your patterns contain a newline character which you need to chomp:
while (my $line = <FILE>) {
chomp $line;
push #pattern, $line;
}
First off - chomp is the root of your problem.
However secondly - your code is also very inefficient. Rather than checking patterns in a foreach loop, consider instead compiling a regex in advance:
#!/usr/bin/env perl
use strict;
use warnings;
# open ( my $pattern_fh, '<', "pattern.txt" ) or die $!;
my #patterns = <DATA>;
chomp(#patterns);
my $regex = join( '|', #patterns );
$regex = qr/(?:$regex)/;
print "Using regex of: $regex\n";
my $example_str = 'City of Berlin';
if ( my ($match) = $example_str =~ m/$regex/ ) {
print "Matched: $match\n";
}
Why is this better? Well, because it scales more efficiently. With your original algorithm - if I have 100 lines in the patterns file, and 100 lines to check as example str, it means making 10,000 comparisons.
With a single regex, you're making one comparison on each line.
Note - normally you'd use quotemeta when reading in regular expressions, which will escape 'meta' characters. We don't want to do this in this case.
If you're looking for even more concise, you can use map to avoid needing an intermediate array:
my $regex = join( '|', map { chomp; $_ } <$pattern_fh> );
$regex = qr/(?:$regex)/;
print "Using regex of: $regex\n";
my $example_str = 'City of Berlin';
if ( my ($match) = $example_str =~ m/$regex/ ) {
print "Matched: $match\n";
}

Put regex match only into array, not entire line

I am trying to check each line of a document for a regex match.
If the line has a match, I want to push the match only into an array.
In the code below, I thought that using the g operator at the end of the regex delimiters would make $lines value the regex match only. Instead $lines value is the entire line of the document containing the match...
my $line;
my #table;
while($line = <$input>){
if($line =~ m/foo/g){
push (#table, $line);
}
}
print #table;
If any one could help me get my matches into an array, it is much appreciated.
Thanks.
p.s.
Still learning... so any explanations of concepts I may have missed is also much appreciated.
g modifier in s///g is for global search and replace.
If you just want to push matching pattern into an array, you need to capture matching pattern enclosed by (). Captured elements are stored in variable $1, $2, etc..
Try following modification to your code:
my #table;
while(my $line = <$input>){
if($line =~ m/(foo)/){
push (#table, $1);
}
}
print #table;
Refer to this documentation for more details.
Or if you want to avoid needless use of global variables,
my #table;
while(my $line = <$input>){
if(my #captures = $line =~ m/(foo)/){
push #table, #captures;
}
}
which simplifies to
my #table;
while(my $line = <$input>){
push #table, $line =~ m/(foo)/;
}
Expanding on jkshah's answer a little, I'm explicitly storing the matches in #matches instead of using the magic variable $1 which I find a little harder to read.
"__DATA__" is a simple way to store lines in a filehandle in a perl source file.
use strict;
use warnings;
my #table;
while(my $line = <DATA>){
my #matches = $line =~ m/(foo)/;
if(#matches) {
warn "found: " . join(',', #matches );
push(#table,#matches);
}
}
print #table;
__DATA__
herp de derp foo
yerp fool foo flerp
heyhey
If you file is not very big(100-500mb fine for 2 GB RAM) then you can use below.Here I am extracting numbers if matched in line.It will be much faster than the foreach loop.
#!/usr/bin/perl
open my $file_h,"<abc" or die "ERROR-$!";
my #file = <$file_h>;
my $file_cont = join(' ',#file);
#file =();
my #match = $file_cont =~ /\d+/g;
print "#match";

detecting specific text from a line and saving it in array

I have a text file which consists of different lines it looks like
Destination|203.190.242.69|reached|203.190.244.6
Destination|208.109.249.198|reached|212.142.1.1
Destination|94.75.253.170|reached|85.17.100.90
Destination|212.112.234.228|reached|4.69.143.210
Destination|80.146.246.42|reached|192.168.1.1
Destination|122.209.193.217|reached|59.128.3.65
Destination|66.77.197.179|reached|66.77.197.251
Destination|195.254.227.65|reached|213.21.128.141
Destination|125.208.8.253|reached|125.208.15.254
I need to save both the IPs and save them in different arrays. So there will
be two arrays, one Destination and one reached. How can I do this. At the moment I have written a code to detect the IP but that does not seem to work.
while (my $line = <$in>) {
my $traceroute;
if ($line =~ /(^Destination)/) {
print "DUDE\n";
my $ip = $line =~ /(\d+\.\d+\.\d+\.\d+)$/s;
#$traceroute = $2;
print "$ip\n";
}
}
This should do the trick:
while (my $line = <DATA>) {
if($line =~ /(^Destination)/){
my($dest, $reach) = $line =~ /(\d+\.\d+\.\d+\.\d+)/g;
print "$dest $reach\n";
}
}
__DATA__
Destination|203.190.242.69|reached|203.190.244.6
Destination|208.109.249.198|reached|212.142.1.1
Destination|94.75.253.170|reached|85.17.100.90
Destination|212.112.234.228|reached|4.69.143.210
Destination|80.146.246.42|reached|192.168.1.1
Destination|122.209.193.217|reached|59.128.3.65
Destination|66.77.197.179|reached|66.77.197.251
Destination|195.254.227.65|reached|213.21.128.141
Destination|125.208.8.253|reached|125.208.15.254
Performing a regex in scalar context, just returns the number of successful matches or 0 if it fails. Assigning the result of a RegEx to an array or a list of variables puts it in list context, in which the RegEx returns the captured values.
The /g modifier matches the RegEx not only once, but as often, as it fits in the string. Read perldoc perlretut for more
Since you want to save the addresses in arrays, and none of the previous solutions explicitly does that, I'm posting another answer:
use strict;
my (#destination, #reached);
foreach my $line (<DATA>) {
chomp $line;
my #fields = split '\|', $line;
push #destination, $fields[1];
push #reached, $fields[3];
}
use Data::Dumper;
print "Destinations:\n".Dumper(#destination);
print "Reached:\n".Dumper(#reached);
__DATA__
Destination|203.190.242.69|reached|203.190.244.6
Destination|208.109.249.198|reached|212.142.1.1
Destination|94.75.253.170|reached|85.17.100.90
Destination|212.112.234.228|reached|4.69.143.210
Destination|80.146.246.42|reached|192.168.1.1
Destination|122.209.193.217|reached|59.128.3.65
Destination|66.77.197.179|reached|66.77.197.251
Destination|195.254.227.65|reached|213.21.128.141
Destination|125.208.8.253|reached|125.208.15.254
Split on the separator and save the items at indices 1 and 3.
perl -aF'\|' -lne 'next unless $F[0] eq "Destination"; print "$F[1]|$F[3]"' input >output
You may want a different output format and/or do something more, of course.
while (my $line = <$in>) {
chomp $line;
if ($line =~ /^Destination\|(\d+\.\d+\.\d+\.\d+)\|reached\|(\d+\.\d+\.\d+\.\d+)$/) {
my ($d, $r) = ($1,$2);
print "$d => $r\n";
}
}

Matching a regular expression multiple times with Perl

Noob question here. I have a very simple perl script and I want the regex to match multiple parts in the string
my $string = "ohai there. ohai";
my #results = $string =~ /(\w\w\w\w)/;
foreach my $x (#results){
print "$x\n";
}
This isn't working the way i want as it only returns ohai. I would like it to match and print out ohai ther ohai
How would i go about doing this.
Thanks
Would this do what you want?
my $string = "ohai there. ohai";
while ($string =~ m/(\w\w\w\w)/g) {
print "$1\n";
}
It returns
ohai
ther
ohai
From perlretut:
The modifier "//g" stands for global matching and allows the
matching operator to match within a
string as many times as possible.
Also, if you want to put the matches in an array instead you can do:
my $string = "ohai there. ohai";
my #matches = ($string =~ m/(\w\w\w\w)/g);
foreach my $x (#matches) {
print "$x\n";
}
Or you could do this
my $string = "ohai there. ohai";
my #matches = split(/\s/, $string);
foreach my $x (#matches) {
print "$x\n";
}
The split function in this case splits on spaces and prints
ohai
there.
ohai