Perl string replace not working with $1 and $2 - regex

Search and replace is not working when I use $1 and $2 defined earlier.
It works when I store it in a new variable.
This does not work as intended:
perl -e'
my $name = "start middle end";
my $rep = "";
my $orig = "";
if ($name =~ /sta(.*?)\s\w+\s(.*)/) {
$orig = $1;
$rep = $2;
$name =~ s/$1/$2/;
print "$name\n";
}
'
sta middle end
Is it because $1 and $2 are getting replaced in the new $name =~ I am doing?
This works as intended:
perl -e'
my $name = "start middle end";
my $rep = "";
my $orig = "";
if ($name =~ /sta(.*?)\s\w+\s(.*)/) {
$orig = $1;
$rep = $2;
$name =~ s/$orig/${rep}/;
print "$name\n";
}
'
staend middle end
Is there a better one liner to do this? I do not want to define new variables.

A successful match in the pattern part of the s/// operator replaces the capture variables, so by the time the replacement part is evaluated, $1 and $2 refer to that new match. The m// operator in list context returns the captured values, so you can easily assign them there. Also, you may want to use \Q (quotemeta) if your search string is not meant to be a regex.
perl -e'
my $name = "start middle end";
if (my ($orig, $rep) = $name =~ /sta(.*?)\s\w+\s(.*)/) {
$name =~ s/\Q$orig/$rep/;
print "$name\n";
}
'
staend middle end
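A small sketch of the reset behaviour described above, using only core Perl; the sub names are hypothetical. A successful match replaces $1 and $2 (with undef when the new pattern has fewer groups), while a failed match leaves them untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: shows what $2 looks like inside the replacement
# part of an s/// whose own pattern has no capture groups.
sub dollar2_inside_subst {
    my $name = "start middle end";
    $name =~ /sta(.*?)\s\w+\s(.*)/ or die "no match";
    my $before = $2;    # "end", from the first match
    my $seen;
    # The pattern "rt" matches but has no groups, so $2 is now undef.
    $name =~ s/rt/do { $seen = defined $2 ? $2 : 'undef'; '' }/e;
    return ($before, $seen, $name);
}

# Hypothetical helper: a failed match does not reset the capture variables.
sub dollar1_after_failed_match {
    my $s = "abc";
    $s =~ /(b)/;        # succeeds, $1 is "b"
    $s =~ /(z)/;        # fails, so $1 keeps its previous value
    return $1;
}

my ($before, $seen, $result) = dollar2_inside_subst();
print "before: $before, inside s///: $seen, result: '$result'\n";
print "after failed match: ", dollar1_after_failed_match(), "\n";
```

This is exactly why the question's s/$1/$2/ deletes "rt": the pattern still sees the old $1 during interpolation, but the replacement sees the new, empty $2.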

Yes, the new successful regex match replaces $1 and $2.
You could avoid the global vars entirely as follows:
perl -e'
my $name = "start middle end";
if ( my ($orig, $rep) = $name =~ /sta(.*?)\s\w+\s(.*)/ ) {
$name =~ s/\Q$orig/$rep/;
CORE::say $name;
}
'
Better yet, you could avoid doing two matches as follows:
perl -e'
my $name = "start middle end";
if ( $name =~ s/sta\K.*?(?=\s\w+\s(.*))/$1/ ) {
CORE::say $name;
}
'
However, I'd use the following:
perl -e'
my $name = "start middle end";
if ( my ($prefix, $suffix, $foo) = $name =~ /^(.*?sta).*?(\s\w+\s(.*))/ ) {
CORE::say "$prefix$foo$suffix";
}
'
Note that your code suffered from a code injection bug which I fixed using quotemeta (as \Q).
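To see why \Q matters, here is a minimal sketch with a hypothetical search string containing a regex metacharacter; replace_first is a hypothetical name. Without quotemeta the dot matches any character, so the wrong text gets replaced.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the first occurrence of $search in $text,
# either treating $search as a regex or as a literal string via \Q...\E.
sub replace_first {
    my ($text, $search, $replacement, $literal) = @_;
    if ($literal) {
        $text =~ s/\Q$search\E/$replacement/;   # metacharacters are escaped
    } else {
        $text =~ s/$search/$replacement/;       # $search is interpreted as a regex
    }
    return $text;
}

print replace_first("abc a.c", "a.c", "X", 0), "\n";   # prints "X a.c" (dot matched "b")
print replace_first("abc a.c", "a.c", "X", 1), "\n";   # prints "abc X" (literal "a.c")
```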

In case the input has unexpected extra spaces, we could also try this expression:
(sta)([a-z]*)\s+(\w+)\s+(.+)
It's just another option.
TEST
perl -e'
my $name = "start middle end";
$name =~ s/(sta)([a-z]*)\s+(\w+)\s+(.+)/$1$4 $3 $4/;
print "$name\n";
'
OUTPUT
staend middle end

$2 in the replacement part refers to the capture group from the pattern part of the same substitution. Therefore, you only need one variable to remember $2.
perl -lwe '$_ = "start middle end" ; if (/sta(.*?)\s\w+\s(.*)/) {my $rep = $2; s/$1/$rep/; print}'
staend middle end

You can avoid other variables by using @- and @+, the global arrays holding the start and end offsets of the last match, and just doing a substring replacement:
my $name = "start middle end";
if ($name =~ /sta(.*?)\s\w+\s(.*)/) {
substr($name, $-[1], $+[1]-$-[1], $2);
print "$name\n";
}
See the entry for @- in perldoc perlvar.
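The same @-/@+ idea can be wrapped in a small helper. This is a sketch; replace_group is a hypothetical name. $-[N] is the offset where group N started and $+[N] the offset where it ended, so their difference is the group's length.

```perl
use strict;
use warnings;

# Hypothetical helper: replace capture group $n of the first match of $re
# in $str with $new, using the offsets stored in @- and @+.
sub replace_group {
    my ($str, $re, $n, $new) = @_;
    return $str unless $str =~ $re;        # no match: leave the string alone
    return $str unless defined $-[$n];     # group did not participate
    substr($str, $-[$n], $+[$n] - $-[$n], $new);
    return $str;
}

my $name = "start middle end";
print replace_group($name, qr/sta(.*?)\s\w+\s(.*)/, 1, 'end'), "\n";
```

Because the helper works on offsets rather than on the text of the group, it never needs to interpolate the captured string back into a second regex.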

The regex capture variables behave in surprising ways depending on code flow, function calls, and so on: they are dynamically scoped and are overwritten by every successful match. Fully explaining this would take a few pages. For now, avoid the whole mess and just use a single regex:
perl -e'
my $name = "start middle end";
$name =~ s/^(sta)(.*?)(\s\w+\s)(.*)/$1$4$3$4/;
print "$name\n";
'

Related

match using regex in perl

Hi, I am trying to extract some data from a text file in Perl. My file looks like this:
Name:John
FirstName:Smith
Name:Alice
FirstName:Meyers
....
I want my string to look like "John Smith" and "Alice Meyers".
I tried something like this, but I'm stuck and don't know how to continue:
while (<INPUT>) {
if (/^[Name]/) {
$match =~ /(:)(.*?)(\n) /
$string = $string.$2;
}
if (/^[FirstName]/) {
$match =~ /(:)(.*?)(\n)/
$string = $string.$2;
}
}
What I am trying to do is, when a line matches Name or FirstName, copy the content between : and \n, but I get confused about which is $1 and which is $2.
This will put your first and last names in a hash:
use strict;
use warnings;
use Data::Dumper;
open my $in, '<', 'in.txt' or die "Cannot open in.txt: $!";
my (%data, $names, $firstname);
while(<$in>){
    chomp;
    ($names) = /Name:(.*)/ if /^Name/;
    if (/^FirstName/) {
        ($firstname) = /FirstName:(.*)/;
        $data{$names} = $firstname;    # store the pair once both halves are known
    }
}
print Dumper \%data;
With a Perl one-liner:
$ perl -0777 -pe 's/(?m).*?Name:([^\n]*)\nFirstName:([^\n]*).*/$1 $2/g' file
John Smith
Alice Meyers
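my $surname;
while (<INPUT>) {
    chomp;
    # Capture the whole key: ([A-Za-z])+ would capture only its last letter.
    next unless /^([A-Za-z]+):\s*(.*)$/;
    if ($1 eq 'Name') {
        $surname = $2;
    } elsif ($1 eq 'FirstName') {
        my $completeName = $2 . " " . $surname;
        print "$completeName\n";
    } else {
        # Error: unexpected key
    }
}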
You might want to add some error handling, e.g. make sure that a Name is always followed by a FirstName and so on.
$1, $2, $3 ... $N hold the results of the capturing groups () inside the regex.
To avoid $1-like variables, assign the match result in list context:
my ($matched1, $matched2) = $text =~ /(.*):(.*)/;
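Building on that list-assignment idea, here is a minimal sketch that pairs each Name line with the following FirstName line without touching $1 or $2 directly. It assumes the lines are already in an array (the sample data from the question is inlined); pair_names is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "Name:X" / "FirstName:Y" line pairs into "X Y".
sub pair_names {
    my @lines = @_;
    my ($name, @result);
    for my $line (@lines) {
        # List-context match: returns (key, value) or an empty list.
        my ($key, $value) = $line =~ /^(\w+):\s*(.*?)\s*$/ or next;
        if ($key eq 'Name') {
            $name = $value;
        } elsif ($key eq 'FirstName' and defined $name) {
            push @result, "$name $value";
            undef $name;               # wait for the next Name line
        }
    }
    return @result;
}

my @lines = ("Name:John\n", "FirstName:Smith\n", "Name:Alice\n", "FirstName:Meyers\n");
print "$_\n" for pair_names(@lines);
```

With a real file you would pass the lines read from the filehandle instead of the inlined array.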
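my $names = [];
my $name = '';
while(my $row = <>){
    next unless $row =~ /:(.*)/;             # skip lines without a colon
    $name = $name eq '' ? $1 : "$name $1";   # first value, then "first second"
    if ($name =~ / /) {                      # both halves collected
        push(@$names, $name);
        $name = '';
    }
}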
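use Data::Dumper;
open(my $fh, '<', 'abc.txt') or die "Cannot open abc.txt: $!";
# Strip everything up to the colon from each line, then pair the values up.
my @array = map { chomp; s/.*?://g; $_ } <$fh>;
my %hash = @array;
print Dumper \%hash;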

How to replace a variable with another variable in PERL?

I am trying to replace all words from a text except some that I have in an array. Here's my code:
my $text = "This is a text!And that's some-more text,text!";
while ($text =~ m/([\w']+)/g) {
next if $1 ~~ @ignore_words;
my $search = $1;
my $replace = uc $search;
$text =~ s/$search/$replace/e;
}
However, the program doesn't work. Basically I am trying to make all words uppercase but skip the ones in @ignore_words. I know it's a problem with the variables being used in the regular expression, but I can't figure the problem out.
#!/usr/bin/perl
my $text = "This is a text!And that's some-more text,text!";
my @ignorearr=qw(is some);
my %h1=map{$_ => 1}@ignorearr;
$text=~s/([\w']+)/($h1{$1})?$1:uc($1)/ge;
print $text;
On running this,
THIS is A TEXT!AND THAT'S some-MORE TEXT,TEXT!
Instead of modifying $text with s/// inside a while loop that is also iterating over $text with m//g, just let s/.../.../eg do the whole job globally:
my $text = "This is a text!And that's some-more text,text!";
my @ignore_words = qw{ is more };
# Note: smartmatch (~~) is experimental, so recent Perls warn about it.
$text =~ s/([\w']+)/$1 ~~ @ignore_words ? $1 : uc($1)/eg;
print $text;
And on running:
THIS is A TEXT!AND THAT'S SOME-more TEXT,TEXT!

How do I use perl regex to extract the digit value from '[1]'?

My code...
$option = "[1]";
if ($option =~ m/^\[\d\]$/) {print "Activated!"; $str=$1;}
I need a way to drop off the square brackets from $option. $str = $1 does not work for some reason. Please advise.
To get $1 to work you need to capture the value inside the brackets using parentheses, i.e:
if ($option =~ m/^\[(\d)\]$/) {print "Activated!"; $str=$1;}
Or
if (my ($str) = $option =~ m/^\[(\d)\]$/) { print "Activated!" }
Or
if (my ($str) = $option =~ /(\d)/) { print "Activated!" }
...and a bunch of others. You forgot to capture your match with parentheses.
EDIT:
if ($option =~ /(?<=^\[)\d(?=\]$)/p && (my $str = ${^MATCH})) { print "Activated!" }
Or
my $str;
if ($option =~ /^\[(\d)(?{$str = $^N})\]$/) { print "Activated!" }
Or
if ($option =~ /^\[(\d)\]$/ && ($str = $+)) { print "Activated!" }
For ${^MATCH}, $^N, and $+, see perlvar.
I love these questions : )
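One caveat with the `&& ($str = ...)` variants: the value of the assignment is the digit itself, so "[0]" would be treated as not activated. A sketch that tests the match rather than the captured value, using list-context capture; extract_digit is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the digit inside "[N]", or undef if the
# string is not in that form.
sub extract_digit {
    my ($option) = @_;
    # A list assignment in scalar context yields the number of captured
    # values, so "[0]" still counts as a successful match here.
    if (my ($digit) = $option =~ /^\[(\d)\]$/) {
        return $digit;
    }
    return undef;
}

for my $option ("[1]", "[0]", "[12]", "1") {
    my $digit = extract_digit($option);
    print "$option: ", (defined $digit ? "Activated! ($digit)" : "no match"), "\n";
}
```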

How can I count the amount of spaces at the start of a string in Perl?

I now have:
$temp = rtrim($line[0]);
$count = ($temp =~ tr/^ //);
But that gives me the count of all spaces.
$str =~ /^(\s*)/;
my $count = length( $1 );
If you just want actual spaces (instead of whitespace), then that would be:
$str =~ /^( *)/;
Edit: The reason tr doesn't work is that it's not a regular expression operator. What you're doing with $count = ( $temp =~ tr/^ // ); is replacing every ^ and every space with itself, then counting how many replacements you've done. tr doesn't see ^ as "hey, this is the beginning-of-string pseudo-character"; it sees it as "hey, this is a ^".
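A small side-by-side sketch of that difference: tr/// counts every occurrence of the listed characters anywhere in the string, while an anchored regex capture only sees the leading run. The sub names are hypothetical.

```perl
use strict;
use warnings;

# tr/ // with an empty replacement list counts every space in the string.
sub count_all_spaces {
    my ($s) = @_;
    return ($s =~ tr/ //);
}

# An anchored capture only sees the spaces at the start of the string.
sub count_leading_spaces {
    my ($s) = @_;
    $s =~ /^( *)/;          # always matches, possibly with an empty capture
    return length $1;
}

my $line = "   foo bar baz";
print count_all_spaces($line), " spaces in total, ",
      count_leading_spaces($line), " at the start\n";
```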
You can get the offset of a match using @-. If you search for a non-whitespace character, this will be the number of whitespace characters at the start of the string:
#!/usr/bin/perl
use strict;
use warnings;
for my $s ("foo bar", " foo bar", "  foo bar", "   ") {
my $count = $s =~ /\S/ ? $-[0] : length $s;
print "'$s' has $count whitespace characters at its start\n";
}
Or, even better, use @+ to find the end of the whitespace:
#!/usr/bin/perl
use strict;
use warnings;
for my $s ("foo bar", " foo bar", "  foo bar", "   ") {
$s =~ /^\s*/;
print "$+[0] '$s'\n";
}
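Here's a script that does this for every line of stdin. The relevant statement is the one that assigns $s in the body of the loop.
#!/usr/bin/perl
use strict;
use warnings;
while (my $x = <>) {
    chomp $x;
    # ( *) also matches zero spaces, so the list slice is never empty
    my $s = length(($x =~ m/^( *)/)[0]);
    print $s, ":", $x, "\n";
}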
tr/// is not a regex operator. However, you can use s///:
use strict; use warnings;
my $t = (my $s = " \t\n sdklsdjfkl");
my $n = 0;
++$n while $s =~ s{^\s}{};
print "$n \\s characters were removed from \$s\n";
$n = ( $t =~ s{^(\s*)}{} ) && length $1;
print "$n \\s characters were removed from \$t\n";
Since the regex matcher returns the parenthesized matches when called in list context, CanSpice's answer can be written as a single statement:
$count = length( ($line[0] =~ /^( *)/)[0] );
This prints the number of leading whitespace characters:
echo "   hello" | perl -lne 's/^(\s+).*$/length($1)/e; print'
3

What is the scope of $1 through $9 in Perl?

What is the scope of $1 through $9 in Perl? For instance, in this code:
sub bla {
my $x = shift;
$x =~ s/(\d*)/$1 $1/;
return $x;
}
my $y;
# some code that manipulates $y
$y =~ /(\w*)\s+(\w*)/;
my $z = &bla($2);
my $w = $1;
print "$1 $2\n";
What will $1 be? Will it be the first \w* from $x or the first \d* from the second \w* in $x?
From perldoc perlre:
The numbered match variables ($1, $2, $3, etc.) and the related punctuation set ($+, $&, $`, $', and $^N) are all dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first. (See "Compound Statements" in perlsyn.)
This means that the first time you run a regex or substitution in a scope, a new localized copy is created, and the original value is restored (à la local) when the scope ends. So, if $y is "10 20" as in the code below, $1 will be 10 up until the regex inside bla is run, 20 after it, and 10 again once the subroutine has finished.
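A minimal sketch of that restore-on-scope-exit behaviour, assuming $y is "10 20"; show_scope is a hypothetical name, and a bare inner block stands in for the subroutine call. The match inside the inner block sets $1 to 20, and leaving the block brings back the outer 10.

```perl
use strict;
use warnings;

# Hypothetical helper: records $1 before, inside and after an inner block.
sub show_scope {
    my $y = "10 20";
    my @seen;
    $y =~ /(\w*)\s+(\w*)/;           # outer match: $1 = "10", $2 = "20"
    push @seen, $1;
    {
        my $x = $2;
        $x =~ /(\d+)/;               # inner match: $1 = "20" inside this block
        push @seen, $1;
    }
    push @seen, $1;                  # block has ended: $1 is "10" again
    return @seen;
}

print join(", ", show_scope()), "\n";
```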
But I don't use regex variables outside of substitutions. I find it much clearer to write things like
#!/usr/bin/perl
use strict;
use warnings;
sub bla {
my $x = shift;
$x =~ s/(\d*)/$1 $1/;
return $x;
}
my $y = "10 20";
my ($first, $second) = $y =~ /(\w*)\s+(\w*)/;
my $z = &bla($second);
my $w = $first;
print "$first $second\n";
where $first and $second have better names that describe their contents.
By making a couple of small alterations to your example code:
sub bla {
my $x = shift;
print "$1\n";
$x =~ s/(\d+)/$1 $1/;
return $x;
}
my $y = "hello world9";
# some code that manipulates $y
$y =~ /(\w*)\s+(\w*)/;
my $z = &bla($2);
my $w = $1;
print "$1 $2\n$z\n";
we get the following output:
hello
hello world9
world9 9
showing that $1 is limited to its dynamic scope: the $1 assigned within bla ceases to exist at the end of that function, but the $1 assigned by the match on $y is visible inside bla until it is overwritten.
The variables will be valid until the next successful match overwrites them in the flow of execution.
But really, you should be using something like:
my ($match1, $match2) = $var =~ /(\d+)\D(\d+)/;
Then use $match1 and $match2 instead of $1 and $2, it's much less ambiguous.
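Another way to avoid the positional ambiguity is named capture groups (available since Perl 5.10), read back through the %+ hash. A sketch; parse_pair is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: split "10 20" style input into named parts.
sub parse_pair {
    my ($var) = @_;
    return unless $var =~ /(?<first>\d+)\D(?<second>\d+)/;
    # %+ holds the named captures of the last successful match.
    return ($+{first}, $+{second});
}

my ($match1, $match2) = parse_pair("10 20");
print "$match1 $match2\n";
```

The names document what each group holds, so adding or reordering groups later does not silently shift which value ends up in which variable.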