How does =~ behave in Perl? - regex

I have the following perl script:
$myVariable = "some value";
# ... some code ...
$myVariable =~ s/\+/ /g;
$myVariable =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/seg;
# ... some more code ...
Reading the Perl documentation, I think that the =~ operator returns a boolean value, namely true or false.
My question is: do the above operations on $myVariable affect its value or not?
N.B. In my script, $myVariable is set to a string that results from splitting $_. I don't think that should affect how these operations behave.
P.S. If you need more code from my script, just let me know.

$myVariable is changed, but that is because you are doing substitutions (s///), not because =~ assigns anything: =~ is not like =, it is closer to ==. If you want the result of the operation (for s///, the number of substitutions made, or false if there were none), you need to assign it:
$result = $myVariable =~ s/\+/ /g;
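A minimal sketch showing both effects at once, using a made-up input string "a+b+c" in place of the real data: the substitution edits the variable in place, and the value of the =~ expression is the substitution count, not the string.

```perl
use strict;
use warnings;

my $myVariable = "a+b+c";    # hypothetical sample input

# s/// edits $myVariable in place; the expression's value is the
# number of substitutions made, not a copy of the string.
my $result = $myVariable =~ s/\+/ /g;

print "result: $result\n";        # result: 2
print "value:  $myVariable\n";    # value:  a b c

# With nothing left to replace, s/// returns false (the empty string)
# and leaves the target untouched.
my $none = $myVariable =~ s/\+/ /g;
print "second pass replaced nothing\n" unless $none;
```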

We are talking about
leftvalue =~ rightvalue
rightvalue must be one of these things:
m/regexp/
s/regexp/replacement/
tr/searchlist/replacementlist/
For m//, leftvalue can be any expression; for s/// and tr///, it must be something that can be modified (an lvalue).
In scalar context, the expression leftvalue =~ rightvalue evaluates to a true or false value (for s/// and tr///, a count), but this value is not assigned to leftvalue! It is the value of the expression itself, so you can use it directly in an if clause:
if (leftvalue =~ rightvalue) {
# do something
}
m/regexp/ never changes leftvalue. It just tests whether regexp matches leftvalue.
s/regexp/replacement/ also tests whether regexp matches leftvalue and, if so, replaces the matching part with replacement. If regexp matched, leftvalue =~ rightvalue is true; otherwise it is false.
tr/searchlist/replacementlist/ works analogously to s///, but it translates individual characters instead of replacing a regexp match, and its value is the number of characters translated.
So this will work fine:
my @a = ('acbc123', 'aubu123');
foreach (@a) {
    if ($_ =~ s/c(\d)/x$1/g) {
        $_ .= 'MATCHED!';
    }
}
The results will be (foreach aliases $_ to each element, so @a itself is modified):
$a[0] = 'acbx123MATCHED!'
The 'c' followed by a digit matched the regular expression, so it was replaced by 'x' and that digit. Because it matched, the if condition is true, and 'MATCHED!' is appended to the string.
$a[1] = 'aubu123'
The regular expression did not match. Nothing was replaced, and the if condition was false.

The binding operator just "binds" a target variable to one of the operators; by itself it doesn't affect the value. The substitution operator, s///, normally changes the target value and returns the number of substitutions it made.
my $count = $target =~ s/.../.../;
my $count = ( $target =~ s/.../.../ ); # same thing, clarified with ()
Starting with Perl v5.14, there's an /r flag for the substitution operator that leaves the target value alone and, instead of returning a count, returns the modified value:
my $modified = $target =~ s/.../.../r;
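A small side-by-side sketch of the two behaviours, with a made-up target string; it assumes Perl 5.14 or later for /r.

```perl
use v5.14;    # /r needs 5.14+; this also turns on strict
use warnings;

my $target = "hello world";    # hypothetical sample input

# Without /r: the copy is modified in place and the count is returned.
my $copy  = $target;
my $count = $copy =~ s/o/0/g;

# With /r: $target is left alone and the modified string is returned.
my $modified = $target =~ s/o/0/gr;

print "$count\n";       # 2
print "$copy\n";        # hell0 w0rld
print "$modified\n";    # hell0 w0rld
print "$target\n";      # hello world
```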

=~ doesn't mean anything by itself; it needs an operator on its right to apply to the variable on its left.
To see whether a variable matches a pattern, you use m// on the right. You'll probably want to use this as a boolean, but you can also use it in other ways (in list context, for example, it returns the captured groups). This does not alter $foo:
$foo =~ m/pattern/;
To substitute a replacement for a pattern, you use s/// on the right. This alters $foo:
$foo =~ s/pattern/replacement/;
To translate single characters within $foo, you use tr/// on the right. This alters $foo:
$foo =~ tr/abc/def/;
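To make the three cases concrete, here is a sketch that runs each operator on the same made-up string and records the value of each =~ expression alongside what happens to $foo.

```perl
use strict;
use warnings;

my $foo = "aabbcc";    # hypothetical sample input

# m// only tests; $foo is unchanged.
my $matched = $foo =~ m/bb/;           # true (1)

# tr/// changes $foo character by character and returns how many
# characters it translated (two a's and two b's: 4).
my $translated = $foo =~ tr/ab/xy/;    # $foo is now "xxyycc"

# s/// without /g replaces the first match and returns 1.
my $substituted = $foo =~ s/cc/zz/;    # $foo is now "xxyyzz"

print "$matched $translated $substituted $foo\n";    # 1 4 1 xxyyzz
```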

Related

extract string between two dots

I have a string of the following format:
word1.word2.word3
What are the ways to extract word2 from that string in perl?
I tried the following expression but it assigns 1 to sub:
#perleval $vars{sub} = $vars{string} =~ /.(.*)./; 0#
EDIT:
I have tried several suggestions, but still get the value of 1. I suspect that the entire expression above has a problem in addition to parsing. However, when I do simple assignment, I get the correct result:
#perleval $vars{sub} = $vars{string} ; 0#
assigns word1.word2.word3 to variable sub
. has a special meaning in regular expressions, so it needs to be escaped.
.* could match more than intended. [^.]* is safer.
The match operator (//) simply returns true/false in scalar context.
You can use any of the following:
$vars{sub} = $vars{string} =~ /\.([^.]*)\./ ? $1 : undef;
$vars{sub} = ( $vars{string} =~ /\.([^.]*)\./ )[0];
( $vars{sub} ) = $vars{string} =~ /\.([^.]*)\./;
The first one allows you to provide a default if there's no match.
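A runnable sketch of the three forms above, using the sample string from the question; %vars is built by hand here as an assumption, since the question's templating setup isn't shown.

```perl
use strict;
use warnings;

my %vars = (string => 'word1.word2.word3');    # stand-in for the real %vars

# 1. Ternary: scalar-context match, then read $1 (undef if no match).
my $first = $vars{string} =~ /\.([^.]*)\./ ? $1 : undef;

# 2. List slice: in list context the match returns the captures; take [0].
my $second = ( $vars{string} =~ /\.([^.]*)\./ )[0];

# 3. List assignment: parentheses on the left give the match list context.
my ($third) = $vars{string} =~ /\.([^.]*)\./;

print "$first $second $third\n";    # word2 word2 word2
```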
Try:
/\.([^\.]+)\./
. has a special meaning and needs to be escaped. Then you want to capture the values between the dots, so use a negated character class like ([^\.]+), meaning at least one non-dot character. If you use (.*), then
word1.stuff1.stuff2.stuff3.word2
results in:
stuff1.stuff2.stuff3
But maybe you want that?
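The difference is easy to see side by side. This sketch uses the example string from the answer and captures with both patterns.

```perl
use strict;
use warnings;

my $s = 'word1.stuff1.stuff2.stuff3.word2';

# Greedy .* runs to the LAST dot, so everything in between is captured.
my ($greedy) = $s =~ /\.(.*)\./;

# [^.]+ cannot cross a dot, so only the first dotted field is captured.
my ($negated) = $s =~ /\.([^.]+)\./;

print "$greedy\n";     # stuff1.stuff2.stuff3
print "$negated\n";    # stuff1
```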
Here is my little example. I find Perl one-liners a little harder to read at times, so I break it out:
use strict;
use warnings;
if ("stuff1.stuff2.stuff3" =~ m/\.([^.]+)\./) {
my $value = $1;
print $value;
}
else {
print "no match";
}
result
stuff2
. has a special meaning: any character except a newline (see the expression between your parentheses).
Therefore you have to escape it (\.) if you search a literal dot:
/\.(.*)\./
You've got to make sure you're asking for a list when you do the search.
my $x= $string =~ /look for (pattern)/ ;
sets $x to 1
my ($x)= $string =~ /look for (pattern)/ ;
sets $x to pattern.
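A self-contained version of those two lines, with a made-up string standing in for $string, shows the context difference directly.

```perl
use strict;
use warnings;

my $string = 'please look for treasure';    # hypothetical sample input

# Scalar assignment: the match returns true (1), not the capture.
my $x = $string =~ /look for (\w+)/;

# List assignment: the match returns its captures, so $y gets the word.
my ($y) = $string =~ /look for (\w+)/;

print "$x\n";    # 1
print "$y\n";    # treasure
```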

Why can't I store a regexp in a variable?

Given the following code,
my $string = "foo";
my $regex = s/foo/bar/;
$string =~ $regex;
print $string, "\n";
I would have expected the output to be bar, however it is foo. Why is that the case, and how can I solve that problem?
Note that in my actual case, the regex is more complicated, and I actually want to store several of them in a hash (so I can write something like $string =~ $rules{$key}).
You're looking for a substitution, not only the regex part, so I guess a compiled regex (qr//) alone is not what you're looking for. Wrap the whole substitution in a subroutine instead:
use strict;
use warnings;
my $string = "foo";
my $regex = sub { $_[0] =~ s/foo/bar/ };
$regex->($string);
print $string, "\n";
Your statement
my $regex = s/foo/bar/
is equivalent to
my $regex = $_ =~ s/foo/bar/
s/// returns the number of substitutions made, or it returns false (specifically, the empty string). So $regex is now '' or 1 (it could be more if the /g modifier was in effect) and
$string =~ $regex
is doing 'foo' =~ // or 'foo' =~ /1/ depending on what $_ contained originally.
You can store a regex pattern in a variable, but in your example the regex is just foo, and there is a lot more going on than just that pattern.
The statement s/foo/bar/ is more complex than it seems: it is a full operation that applies a regex pattern to a target string and substitutes a replacement string if the pattern is found. In this case the target string is the default variable $_ and the replacement string is bar. You could think of it as a call to a subroutine
substitute($_, 'foo', 'bar')
where the regex pattern is only the second parameter.
What you can do is store a regex pattern. The regex part of that substitution is foo, and you can say
my $pattern = qr/foo/;
s/$pattern/bar/;
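For the hash-of-rules case in the question, one option is to store each rule as a pair of a qr// pattern and a plain replacement string. This is only a sketch: %rules, its keys, and apply_rule are hypothetical names, and because the replacement is a plain string it cannot refer to captures like $1 (for that, the subroutine approach is the better fit).

```perl
use strict;
use warnings;

# Hypothetical rule table: each entry pairs a compiled pattern (qr//)
# with a plain replacement string.
my %rules = (
    greet => [ qr/foo/, 'bar' ],
    digit => [ qr/\d+/, 'N'   ],
);

# Apply one rule to $_[0], which aliases the caller's variable, so the
# substitution changes it in place. Returns the count, as s/// does.
sub apply_rule {
    my $key = $_[1];
    my ($pattern, $replacement) = @{ $rules{$key} };
    return $_[0] =~ s/$pattern/$replacement/g;
}

my $string = "foo 42 foo";
apply_rule($string, 'greet');
apply_rule($string, 'digit');
print "$string\n";    # bar N bar
```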
But you really should explain the problem you're trying to solve so that we can help you better.
In the assignment, you need to tell Perl not to evaluate the regular expression but just to keep it. This is what qr is for.
But you can't do this with whole substitutions, which is why Сухой27 suggests using a subroutine.

Perl, Assign regex match to scalar

There's an example snippet in Mail::POP3Client with a piece of syntax that I don't understand. Why and how does it work?
foreach ( $pop->Head( $i ) ) {
/^(From|Subject):\s+/i and print $_, "\n";
}
The regex bit in particular. $_ remains the same after that line but only the match is printed.
An additional question: how could I assign the match of that regex to a scalar of my own, so I can use it instead of just printing it?
This is actually pretty tricky. It uses Perl's short-circuiting "and" operator to build a conditional statement. It is the same as saying this:
if (/^(From|Subject):\s+/i) {
    print $_, "\n";
}
It works because Perl stops evaluating an "and" expression as soon as its left side is false, so print only runs when the match succeeds. And unless otherwise specified, a regex written as /regex/ instead of $somevar =~ /regex/ applies to the default variable, $_.
You can store it like this:
my $var;
if (/^(From|Subject):\s+/i) {
$var = $_;
}
or you could use a capture group
/^((?:From|Subject):\s+)/i
which stores the matched header prefix (for example "Subject: ") in $1.
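If you want the header's value rather than its prefix, a second capture group plus list-context assignment does it. The sketch below uses made-up header lines in place of $pop->Head($i), so it runs without Mail::POP3Client; %header is a hypothetical name.

```perl
use strict;
use warnings;

# Stand-in for the lines $pop->Head($i) would return (made-up data).
my @head = (
    'Received: from mx.example.org',
    'From: alice@example.org',
    'Subject: Hello',
);

my %header;
foreach (@head) {
    # List-context match: $name gets the header name, $value the rest.
    # The list assignment is false (0 elements) when the match fails.
    if ( my ($name, $value) = /^(From|Subject):\s+(.*)/i ) {
        $header{ lc $name } = $value;
    }
}

print "$header{from}\n";       # alice@example.org
print "$header{subject}\n";    # Hello
```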

Detect exact string value of scalar in regex matching

Say I have $foo = "bar.baz"
I want to use the scalar $foo to find strings that contain "bar.baz" (anywhere in the string), treating $foo as a literal string rather than as a regex.
So the line: if( $other =~ m/$foo/ ) ...
isn't working, because $foo is being evaluated such that the '.' is evaluated to any character. How do I stop that?
Pick one:
$foo = quotemeta("bar.baz");
if ($other =~ m/\Q$foo/)
(Both are actually the same thing, just done at different times.)
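A short sketch comparing the three behaviours on made-up strings: plain interpolation treats the dot as a metacharacter, while \Q and quotemeta both escape it.

```perl
use strict;
use warnings;

my $foo   = "bar.baz";
my $other = "xxbarXbazxx";    # made-up: "." as a regex would match the "X"

my $as_regex = $other =~ m/$foo/   ? 1 : 0;    # 1: "." matches any char
my $literal  = $other =~ m/\Q$foo/ ? 1 : 0;    # 0: "." must be a real dot

# quotemeta does the same escaping up front, giving "bar\.baz".
my $quoted = quotemeta($foo);
my $same   = "see bar.baz here" =~ m/$quoted/ ? 1 : 0;    # 1

print "$as_regex $literal $quoted $same\n";
```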

Match regex and assign results in single line of code

I want to be able to do a regex match on a variable and assign the results to the variable itself. What is the best way to do it?
I want to essentially combine lines 2 and 3 in a single line of code:
$variable = "some string";
$variable =~ /(find something).*/;
$variable = $1;
Is there a shorter/simpler way to do this? Am I missing something?
my($variable) = "some string" =~ /(e\s*str)/;
This works because
If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1, $2, $3 …).
and because my($variable) = ... (note the parentheses around the scalar) supplies list context to the match.
If the pattern fails to match, $variable gets the undefined value.
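Both cases can be checked with a small sketch; the second pattern, /(xyz)/, is a made-up one chosen so that it fails.

```perl
use strict;
use warnings;

# Parentheses around $variable give the match list context,
# so it receives $1 directly.
my ($variable) = "some string" =~ /(e\s*str)/;
print "$variable\n";    # e str

# On failure the match returns an empty list, so $missing is undef.
my ($missing) = "some string" =~ /(xyz)/;
print defined $missing ? "defined\n" : "undef\n";    # undef
```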
Why do you want it to be shorter? Does it really matter?
$variable = $1 if $variable =~ /(find something).*/;
If you are worried about the variable name or doing this repeatedly, wrap the thing in a subroutine and forget about it:
some_sub( $variable, qr/pattern/ );
sub some_sub { $_[0] = $1 if $_[0] =~ m/$_[1]/; $1 };
However you implement it, the point of the subroutine is to make it reuseable so you give a particular set of lines a short name that stands in their place.
Several other answers mention a destructive substitution:
( my $new = $variable ) =~ s/pattern/replacement/;
I tend to keep the original data around, and Perl v5.14 has an /r flag that leaves the original alone and returns a new string with the replacement (instead of the count of replacements):
my $match = $variable =~ s/pattern/replacement/r;
Well, you could say
my $variable;
($variable) = ($variable = "find something soon") =~ /(find something).*/;
or
(my $variable = "find something soon") =~ s/^.*?(find something).*/$1/;
You can do substitution as:
$a = 'stackoverflow';
$a =~ s/(\w+)overflow/$1/;
$a is now "stack"
From the Perl Cookbook, 2nd ed., recipe 6.1, "Copying and Substituting Simultaneously":
$dst = $src;
$dst =~ s/this/that/;
becomes
($dst = $src) =~ s/this/that/;
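A runnable sketch of the Cookbook idiom with a made-up source string: the parenthesized assignment yields $dst as an lvalue, so the substitution edits the copy and $src survives.

```perl
use strict;
use warnings;

my $src = "this and this";    # hypothetical sample input

# The assignment in parentheses returns $dst itself, so s/// edits
# the copy; $src is left unchanged. No /g: only the first match.
(my $dst = $src) =~ s/this/that/;

print "$src\n";    # this and this
print "$dst\n";    # that and this
```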
I just assumed everyone did it this way, amazed that no one gave this answer.
Almost...
You can combine the match and retrieve the matched value with a substitution.
$variable =~ s/.*(find something).*/$1/;
AFAIK, you will always have to copy the value, though, unless you don't mind clobbering the original.
$variable2 = "stackoverflow";
(my $variable1) = ($variable2 =~ /stack(\w+)/);
$variable1 now equals "overflow".
I do this:
#!/usr/bin/perl
$target = "n: 123";
my ($target) = $target =~ /n:\s*(\d+)/g;
print $target; # the var $target now is "123"
Also, to amplify the accepted answer using the ternary operator to allow you to specify a default if there is no match:
my $match = $variable =~ /(pattern).*/ ? $1 : $default_value;