Capturing groups in a variable regexp in Perl - regex

I have a bunch of matches to make, and they all use the same code except for the name of the file to read from and the regexp itself. Therefore, I want to turn the match into a procedure that just accepts the filename and the regexp as a string. When I use the variable to match, though, the special capture variables stop being set.
$line =~ /(\d+)\s(\d+)\s/;
That code sets $1 and $2 correctly, but the following leaves them undefined:
$regexp = "/(\d+)\s(\d+)\s/";
$line =~ /$regexp/;
Any ideas how I can get around this?
Thanks,
Jared

Use qr instead of quotes:
$regexp = qr/(\d+)\s(\d+)\s/;
$line =~ /$regexp/;

Quote your string using the perl regex quote-like operator qr
$regexp = qr/(\d+)\s(\d+)\s/;
This operator quotes (and possibly compiles) its STRING as a regular expression.
See the perldoc page for more info:
http://perldoc.perl.org/functions/qr.html
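To see why the quoted version fails, it can help to print what the string actually contains. This is a minimal sketch with a made-up sample line: inside double quotes the backslashes of \d and \s are consumed, and the slashes become ordinary characters of the pattern, whereas qr keeps the escapes intact.

```perl
use strict;
use warnings;

my $line = "12 345 rest";   # hypothetical sample input

# Double quotes consume the backslashes: "\d" becomes "d", "\s" becomes "s",
# and the slashes are ordinary characters inside the string.
# (Warnings are silenced here only to hide "Unrecognized escape" messages.)
my $as_string = do { no warnings; "/(\d+)\s(\d+)\s/" };
print "string pattern: $as_string\n";

# qr// keeps the escapes and yields a compiled regex object.
my $regexp = qr/(\d+)\s(\d+)\s/;

my ($first, $second);
if ($line =~ $regexp) {
    ($first, $second) = ($1, $2);   # copy the captures right away
    print "first=$first second=$second\n";
}
```

With warnings enabled, Perl reports the lost escapes at compile time, which is usually the first hint that a pattern was built with the wrong kind of quotes.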

Quote your regex using qr:
my $regex = qr/(\d+)\s(\d+)\s/;
my $file =q!/path/to/file!;
foo($file, $regex);
then in the sub :
sub foo {
my $file = shift;
my $regex = shift;
open my $fh, '<', $file or die "can't open '$file' for reading: $!";
while (my $line = <$fh>) {
if ($line =~ $regex) {
# do stuff with the captures in $1 and $2
}
}
}
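If the caller needs the captured values rather than side effects inside the loop, one option is to return them. The following is only a sketch: collect_captures is a hypothetical name, and the "file" is an in-memory string opened through a scalar reference so the example runs without touching disk.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one array ref of captured groups per matching line.
sub collect_captures {
    my ($file, $regex) = @_;
    open my $fh, '<', $file or die "can't open '$file' for reading: $!";
    my @rows;
    while (my $line = <$fh>) {
        # In list context a match returns the captured groups ($1, $2, ...);
        # an empty list means the line did not match.
        my @groups = $line =~ $regex or next;
        push @rows, \@groups;
    }
    close $fh;
    return @rows;
}

# A scalar reference stands in for a filename here (in-memory file).
my $data = "10 20 x\nno numbers here\n3 4 \n";
my @rows = collect_captures(\$data, qr/(\d+)\s(\d+)\s/);
print join(',', @$_), "\n" for @rows;
```

Returning the captures avoids relying on $1 and $2 after the sub has finished, since those variables are reset by later matches.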

Related

How to match strings with regex pattern like [aaa-bbb.com] in perl

I have file which have domain name inside "[" "]" brackets. I want to check whether a particular domain name is present or not.
sub main
{
my $file = '/home/deeps/sample.txt';
open(FH, $file) or die("File not found");
my $host = "deeps-cet.helll.com";
my $match_patt = "\[$host\]";
while (my $String = <FH>)
{
if($String =~ $match_patt)
{
print "match";
}
}
close(FH);
}
main();
The above code throws error - Invalid [] range "s-c" in regex. help to resolve it.
Use quotemeta to escape the ASCII non-"word" characters that could have a special meaning in the regex:
my $match_patt = quotemeta "[$host]";
Or use \Q escape right in the regex, implemented using quotemeta. See docs.
What happens in your code is that the well-meant escape of the bracket, \[, is already evaluated under the double quotes when you form the pattern, so after "\[$host\]" is assigned to $match_patt that variable ends up holding the string [deeps-cet.helll.com]. In the regex these [] are then treated as a bracketed character class, which fails to compile because of the "backwards" s-c range.†
This can be seen with the pattern built using non-interpolating single quotes for \[
my $match_patt = '\[' . $host . '\]';
which now works. But of course it is in principle better to use quotemeta.
† This is really lucky -- if the range were valid, like ac-sb.etc, then this would be a legitimate pattern inside [] which would silently do completely wrong things.
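The difference between the approaches can be checked with a small sketch. The sample lines are made up; the point is that the single-quoted pattern still treats each dot in the host name as "any character", while quotemeta (or \Q...\E) escapes the dots as well.

```perl
use strict;
use warnings;

my $host = 'deeps-cet.helll.com';
my $line = "name = [deeps-cet.helll.com]\n";   # hypothetical file line

# Inside single quotes \[ survives, but the dots in $host still mean "any character".
my $loose = '\[' . $host . '\]';

# quotemeta escapes every ASCII non-word character, brackets and dots included.
my $strict = quotemeta "[$host]";
print "escaped pattern: $strict\n";

my $hit_strict = $line =~ /$strict/          ? 1 : 0;
my $hit_inline = $line =~ /\[\Q$host\E\]/    ? 1 : 0;   # same idea, written inline

# A near miss that only the loose pattern accepts ("x" where a dot should be).
my $near        = "[deeps-cetxhelll.com]";
my $loose_near  = $near =~ /$loose/  ? 1 : 0;
my $strict_near = $near =~ /$strict/ ? 1 : 0;
print "strict=$hit_strict inline=$hit_inline loose_near=$loose_near strict_near=$strict_near\n";
```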
Below is the corrected code. If you plan to keep a regular expression in a variable, build it with my $regex = qr/..../ and then match with $variable =~ /$regex/;
use strict;
use warnings;
use feature 'say';
my $fname = shift || '/home/deeps/sample.txt';
my $host = shift || 'deeps-cet.helll.com';
search($fname,$host);
sub search {
my $fname = shift;
my $host = shift;
my $regex = qr/\[\Q$host\E\]/; # \Q...\E keeps the dots and hyphen in $host literal
open my $fh, '<', $fname
or die "Can't open $fname: $!";
/$regex/ && say "match" while <$fh>;
close $fh;
}

how to solve "Use of uninitialized value $2 in concatenation (.) or string at"

Below is my code. I want to print the data $1 and $2 in one row and split it with ,. Why can't I print the data?
#!/usr/intel/bin/perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line =~ m/^\s+Timing Path Group \'(\S+)\'/) {
$line = $1;
if ($line =~ m/^\s+Levels of Logic:\s+(\S+)/) {
$line = $2;
}
}
print "$1,$2\n";
}
close (FILE);
The meat of your program is here:
if ($line =~ m/^\s+Timing Path Group \'(\S+)\'/) {
$line = $1;
if ($line =~ m/^\s+Levels of Logic:\s+(\S+)/) {
$line = $2;
}
}
The regex capturing variables ($1, $2, etc) are set when you match a string against a regex that contains sets of capturing parentheses. The first capturing parentheses set the value of $1, the second capturing parentheses set the value of $2, and so on. In order for $2 to be given a value, you need to match against a regex that contains two sets of capturing parentheses.
Both of your regexes only contain a single set of capturing parentheses. Therefore only $1 will be set on each of your matches. $2 will never be given a value - leading to the warning that you are seeing.
You need to rethink the logic in your code. I'm not sure why you think $2 will have a value here. Your code is a little confusing, so I'm unable to offer a more specific solution.
I can, however, give you some more general advice:
Use lexical filehandles and the three-arg version of open().
open my $fh, '<', "$output"
There is no need for the quotes around $output.
open my $fh, '<', $output
I know why you're doing it, but $output is a potentially confusing name for a file that you read from. Consider changing it.
Always include $! in an open() error message.
open my $fh, '<', $output or die "Cannot open '$output': $!\n";
Your $line variable seems unnecessary. Why not just keep the row data in $_, which will simplify your code:
while (<$fh>) {
chomp; # works on $_ by default
if (/some regex/) { # works on $_ by default
# etc...
}
}
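As for the logic itself, one possible reading of the intent is that the group name and the logic level sit on different lines of the report, and that each pair should be printed on one row. Under that assumption (the report excerpt below is invented, and the real layout may differ), the first value has to be saved in its own variable before the second match runs:

```perl
use strict;
use warnings;

# Hypothetical excerpt; the real report layout may differ.
my $report = <<'END';
  Timing Path Group 'clk_core'
  Levels of Logic:     12
  Timing Path Group 'clk_io'
  Levels of Logic:     7
END

open my $fh, '<', \$report or die "Cannot open in-memory report: $!\n";

my $group;   # most recent group name seen
my @pairs;   # collected "group,levels" rows
while (<$fh>) {
    chomp;
    if (/^\s+Timing Path Group '(\S+)'/) {
        $group = $1;               # copy $1 before the next match overwrites it
    }
    elsif (defined $group and /^\s+Levels of Logic:\s+(\S+)/) {
        push @pairs, "$group,$1";  # each regex has one group, so both use $1
        undef $group;
    }
}
close $fh;
print "$_\n" for @pairs;
```

Both regexes have a single set of parentheses, so both results arrive in $1; $2 is never involved.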

Perl: Empty $1 regex value when matching?

Readers,
I have the following regex problem:
code
#!/usr/bin/perl -w
use 5.010;
use warnings;
my $filename = 'input.txt';
open my $FILE, "<", $filename or die $!;
while (my $row = <$FILE>)
{ # take one input line at a time
chomp $row;
if ($row =~ /\b\w*a\b/)
{
print "Matched: |$`<$&>$'|\n"; # the special match vars
print "\$1 contains '$1' \n";
}
else
{
#print "No match: |$row|\n";
}
}
input.txt
I like wilma.
this line does not match
output
Matched: |I like <wilma>|
Use of uninitialized value $1 in concatenation (.) or string at ./derp.pl line 14, <$FILE> line 22.
$1 contains ''
I am totally confused. It is matching, and I am checking things in a conditional, so why am I getting an empty result for $1? This isn't supposed to be happening. What am I doing wrong? How can I get 'wilma' to be in $1?
I looked here but this didn't help because I am getting a "match".
You don't have any parentheses in your regex. No parentheses, no $1.
I'm guessing you want the "word" value that ends in -a, so that would be /\b(\w*a)\b/.
Alternatively, since your whole regex only matches the bit you want, you can just use $& instead of $1, like you did in your debug output.
Another example:
my $row = 'I like wilma.';
$row =~ /\b(\w+)\b\s*\b(\w+)\b\s*(\w+)\b/;
print join "\n", "\$&='$&'", "\$1='$1'", "\$2='$2'", "\$3='$3'\n";
The above code produces this output:
$&='I like wilma'
$1='I'
$2='like'
$3='wilma'
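Two other ways of getting at the captured text, sketched here with the same sample row: a match in list context returns the captures directly, and a named group (Perl 5.10 or later) can be read back through the %+ hash. The group name "name" is just an illustration.

```perl
use strict;
use warnings;

my $row = 'I like wilma.';

# List-context match: returns the captures, or an empty list on failure.
my ($word) = $row =~ /\b(\w*a)\b/;
print defined $word ? "captured '$word'\n" : "no match\n";

# Named capture (Perl 5.10+), read back through %+.
my $named;
if ($row =~ /\b(?<name>\w*a)\b/) {
    $named = $+{name};
    print "named capture '$named'\n";
}
```

Either form avoids depending on $1 holding a stale value from an earlier successful match.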

Perl -- regex issue

I have the following code to get a substring from inside a string. I'm using regular expressions, but they don't seem to work properly. How can I do it?
I have this string:
vlex.es/jurisdictions/ES/search?textolibre=transacciones+banco+de+bogota&translated_textolibre=,300,220,00:00:38,2,0.00%,38.67%,€0.00
and I want to get this substring:
transacciones+banco+de+bogota
The code:
open my $info, $myfile or die "Could not open $myfile: $!";
while (my $line = <$info>) {
if ($line =~ m/textolibre=/) {
my $line =~ m/textolibre=(.*?)&translated/g;
print $1;
}
last if $. == 3521239;
}
close $info;
The errors:
Use of uninitialized value $line in pattern match (m//) at classifier.pl line 10, <$info> line 20007.
Use of uninitialized value $1 in print at classifier.pl line 11, <$info> line 20007.
You are using the wrong tool for the job. You can use the URI module and its URI::QueryParam module to extract the parameters:
use strict;
use warnings;
use URI;
use URI::QueryParam;
my $str = "ivlex.es/jurisdictions/ES/search?textolibre=transacciones+banco+de+bogota&translated_textolibre=,300,220,00:00:38,2,0.00%,38.67%,0.00";
my $u = URI->new($str);
print $u->query_param('textolibre');
Output:
transacciones banco de bogota
The second declaration of $line is erroneous; drop the my:
$line =~ m/textolibre=(.*?)&translated/g;
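The reason for both warnings is that my $line creates a fresh, undefined variable, so the match runs against undef and $1 is never set. A minimal sketch of the corrected match, using a shortened copy of the line from the question as a stand-in for file input:

```perl
use strict;
use warnings;

# Shortened copy of the line from the question; stands in for a line read from the file.
my $line = 'vlex.es/jurisdictions/ES/search?textolibre=transacciones+banco+de+bogota&translated_textolibre=,300';

my $query;
# Match against the existing $line (no "my") and use $1 only if the match succeeded.
if ($line =~ m/textolibre=(.*?)&translated/) {
    $query = $1;
    print "$query\n";
}
```

Folding the capture into the if condition also makes the separate m/textolibre=/ pre-check unnecessary.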
Sorry, but I've found the answer. At line 10, my $line =~ m/textolibre=(.*?)&translated/g; declares the same variable a second time, which is why the errors are thrown.
Thanks!

How do I substitute with an evaluated expression in Perl?

There's a file dummy.txt
The contents are:
9/0/2010
9/2/2010
10/11/2010
I have to increment the month portion (0, 2, 11) by 1, i.e., to (1, 3, 12).
I wrote the substitution regex as follows
$line =~ s/\/(\d+)\//\/\1+1\//;
It is printing
9/0+1/2010
9/2+1/2010
10/11+1/2010
How do I make it add numerically (giving 3) rather than perform string concatenation (giving 2+1)?
Three changes:
You'll have to use the e modifier to allow an expression in the replacement part.
To make the replacement global, you should use the g modifier. This is not needed if you have one date per line.
You use $1 on the replacement side, not a backreference.
This should work:
$line =~ s{/(\d+)/}{'/'.($1+1).'/'}eg;
Also, if your regex contains the delimiter you're using (/ in your case), it's better to choose a different delimiter ({} above). That way you don't have to escape the delimiter inside the regex, which keeps it readable.
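Applied to the three dates from dummy.txt, the substitution can be sketched as follows. The array stands in for the file contents, and each line is copied before being modified so the input stays untouched:

```perl
use strict;
use warnings;

my @dates = ('9/0/2010', '9/2/2010', '10/11/2010');   # contents of dummy.txt

my @bumped;
for my $line (@dates) {
    # Copy first, then substitute on the copy; /e evaluates the replacement as code.
    (my $copy = $line) =~ s{/(\d+)/}{ '/' . ($1 + 1) . '/' }e;
    push @bumped, $copy;
}
print "$_\n" for @bumped;
```

Because $1 + 1 is evaluated as Perl code, the addition is numeric and the surrounding slashes are put back by string concatenation.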
This works (e evaluates the replacement as an expression; see the perlrequick documentation):
$line = '8/10/2010';
$line =~ s!/(\d+)/!('/'.($1+1).'/')!e;
print $line;
It helps to use ! or some other character as the delimiter if your regular expression has / itself.
You can also use the following, taken from the question "Can Perl string interpolation perform any expression evaluation?":
$line = '8/10/2010';
$line =~ s!/(\d+)/!("/@{[$1+1]}/")!e;
print $line;
But if this is a homework question, be ready to explain how you reached this solution when the teacher asks.
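For what it's worth, the @{[ ... ]} construct does its work through ordinary string interpolation, so it should also function without the e modifier. A small sketch of that variant:

```perl
use strict;
use warnings;

my $line = '8/10/2010';
# The replacement is interpolated like a double-quoted string; @{[ ... ]}
# builds an anonymous array from the expression and interpolates its contents.
$line =~ s!/(\d+)/!/@{[ $1 + 1 ]}/!;
print "$line\n";
```

Whether this reads better than the e form is a matter of taste; the e version states the intent more directly.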
How about this?
$ cat date.txt
9/0/2010
9/2/2010
10/11/2010
$ perl chdate.pl
9/1/2010
9/3/2010
10/12/2010
$ cat chdate.pl
use strict;
use warnings;
open my $fp, '<', "date.txt" or die $!;
while (<$fp>) {
chomp;
my @arr = split (/\//, $_);
my $temp = $arr[1]+1;
print "$arr[0]/$temp/$arr[2]\n";
}
close $fp;
$