How does =~ behave in matching? - regex

I am confused by the =~ operator. It seems to return a true/false value indicating whether there was a match, but when used with the /g modifier it returns the actual matches.
Example:
$ perl -e '
my $var = "03824531449411615213441829503544272752010217443235";
my @zips = $var =~ /\d{5}/g;
print join "--", @zips;
'
03824--53144--94116--15213--44182--95035--44272--75201--02174--43235
$ perl -e '
my $var = "03824531449411615213441829503544272752010217443235";
my @zips = $var =~ /\d{5}/;
print join "--", @zips;
'
1
$ perl -e '
my $var = "03824531449411615213441829503544272752010217443235";
my $zips = $var =~ /\d{5}/;
print join "--", $zips;
'
1
So how does this work? Why does it return true/false in non-g mode? Or is it something else?

perlop already gives a pretty clear explanation of this, so I will just copy and paste the relevant parts:
For =~ operator:
Binary "=~" binds a scalar expression to a pattern match. ... When used in scalar context, the return value generally indicates the success of the operation. ... Behavior in list context depends on the particular operator. See Regexp Quote-Like Operators for details and perlretut for examples using these operators.
For m// operator:
Searches a string for a pattern match, and in scalar context returns true if it succeeds, false if it fails.
For m// without /g modifier in list context:
If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, that is, ($1, $2, $3 ...). When there are no parentheses in the pattern, the return value is the list (1) for success. With or without parentheses, an empty list is returned upon failure.
For m// with /g modifier in list context:
The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match.
Context of the expressions in the OP:
@zips = $var =~ /\d{5}/g;
m//g in list context;
@zips = $var =~ /\d{5}/;
m// in list context;
$zips = $var =~ /\d{5}/;
m// in scalar context.
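As a rough sketch of the four cases quoted from perlop above, using the string from the question (the variable names @all, @flag, $first and @seen are made up for illustration):

```perl
use strict;
use warnings;

my $var = "03824531449411615213441829503544272752010217443235";

# List context with /g: every match is returned.
my @all = $var =~ /\d{5}/g;

# List context without /g and without parentheses: the list (1) on success.
my @flag = $var =~ /\d{5}/;

# List context without /g but with parentheses: the captured groups.
my ($first) = $var =~ /(\d{5})/;

# Scalar context with /g: each evaluation finds the next match.
my @seen;
while ($var =~ /(\d{5})/g) {
    push @seen, $1;
}

print scalar(@all), " matches, first capture $first, flag list (@flag)\n";
print "loop saw the same matches\n" if "@seen" eq "@all";
```

The while loop is the scalar-context form of m//g: it visits the same substrings as the list-context assignment, one per iteration.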

$var =~ /(\d{5})/ also returns the match in list context; the only difference is that /g behaves as if the whole pattern were wrapped in () when it contains no capturing parentheses.

Related

What is the meaning of qr// in perl

I am completely new to perl and trying to design a lexer where I have come across:
my @token_def =
(
[Whitespace => qr{\s+}, 1],
[Comment => qr{#.*\n?$}m, 1],
);
and even after going through multiple sites I did not understand the meaning.
qr// is one of the quote-like operators that apply to pattern matching and related activities.
From perldoc:
This operator quotes (and possibly compiles) its STRING as a regular expression. STRING is interpolated the same way as PATTERN in m/PATTERN/. If ' is used as the delimiter, no interpolation is done.
From modern_perl:
The qr// operator creates first-class regexes. Interpolate them into the match operator to use them:
my $hat = qr/hat/;
say 'Found a hat!' if $name =~ /$hat/;
... or combine multiple regex objects into complex patterns:
my $hat = qr/hat/;
my $field = qr/field/;
say 'Found a hat in a field!'
if $name =~ /$hat$field/;
like( $name, qr/$hat$field/,
'Found a hat in a field!' );
qr// is documented in perlop in the "Regexp Quote-Like Operators" section.
Just like qq"..." aka "..." allows you to construct a string, qr/.../ allows you to construct a regular expression.
$s = "abc"; # Creates a string and assigns it to $s
$s = qq"abc"; # Same as above.
print("$s\n");
$re = qr/abc/; # Creates a compiled regex pattern and assigns it to $re
print "match\n" if $s =~ /$re/;
The quoting rules for qr/.../ are very similar to those of qq"...". The only difference is that a \ followed by a non-word character is passed through unchanged.
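To connect this back to the lexer in the question, here is a minimal sketch of how such a table of qr// objects might be consumed. The meaning of the third field (assumed here to be a "skip this token" flag), the extra Word entry, and the sample input are all assumptions made for illustration; the question does not specify them.

```perl
use strict;
use warnings;

# Token table in the style of the question. The third field is ASSUMED
# to mean "skip this token"; the Word entry is a hypothetical addition.
my @token_def = (
    [ Whitespace => qr{\s+},      1 ],
    [ Comment    => qr{#.*\n?$}m, 1 ],
    [ Word       => qr{\w+},      0 ],
);

my $input = "foo bar # trailing comment\n";
my @tokens;

pos($input) = 0;
TOKEN: while (pos($input) < length $input) {
    for my $def (@token_def) {
        my ($name, $re, $skip) = @$def;
        # \G anchors at the current position; /c keeps pos() on failure.
        if ($input =~ /\G($re)/gc) {
            push @tokens, [ $name, $1 ] unless $skip;
            next TOKEN;
        }
    }
    die "Unexpected character at offset " . pos($input);
}

print join(", ", map { "$_->[0]=$_->[1]" } @tokens), "\n";
```

Each qr// object keeps its own modifiers (the /m on Comment) when it is interpolated into the larger /\G(...)/gc pattern, which is what makes a table like this practical.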

Swapping letters with regexp

How can I swap the letter o with the letter e and e with o?
I just tried this but I don't think this is a good way of doing this. Is there a better way?
my $str = 'Absolute force';
$str =~ s/e/___eee___/g;
$str =~ s/o/e/g;
$str =~ s/___eee___/o/g;
Output: Abseluto ferco
Use the transliteration operator:
$str =~ y/oe/eo/;
E.g.
$ echo "Absolute force" | perl -pe 'y/oe/eo/'
Abseluto ferco
As has already been said, the way to do this is the transliteration operator
tr/SEARCHLIST/REPLACEMENTLIST/cdsr
y/SEARCHLIST/REPLACEMENTLIST/cdsr
Transliterates all occurrences of the characters found in the search list with the corresponding character in the replacement list. It returns the number of characters replaced or deleted. If no string is specified via the =~ or !~ operator, the $_ string is transliterated.
However, I want to commend you on your creative use of regular expressions. Your solution works, although the placeholder string _ee_ would've been sufficient.
tr is only going to help you with character replacements though, so I'd like to quickly show you how to use regular expressions for a more complicated mass replacement. Basically, you just use the /e modifier to execute code in the RHS. The following will also do the replacement you were aiming for:
my $str = 'Absolute force';
$str =~ s/([eo])/$1 eq 'e' ? 'o' : 'e'/eg;
print $str;
Outputs:
Abseluto ferco
Note how the LHS (left hand side) matches both o and e, and then the RHS (right hand side) tests which one matched and returns the opposite for replacement.
Now, it's common to have a list of words that you want to replace, so it's convenient to just build a hash of your from/to values and then dynamically build the regular expression. The following does that:
my $str = 'Hello, foo. How about baz? Never forget bar.';
my %words = (
foo => 'bar',
bar => 'baz',
baz => 'foo',
);
my $wordlist_re = '(?:' . join('|', map quotemeta, keys %words) . ')';
$str =~ s/\b($wordlist_re)\b/$words{$1}/eg;
Outputs:
Hello, bar. How about foo? Never forget baz.
The above could've worked for your e and o case as well, but would've been overkill. Note how I use quotemeta to escape the keys in case they contain a regular expression special character. I also intentionally used a non-capturing group around them in $wordlist_re so that the variable could be dropped into any regex and behave as desired. I then put the capturing group inside the s/// because it's important to be able to see what's being captured in a regex without having to backtrack to the value of an interpolated variable.
The tr/// operator is best. However, if you wanted to use the s/// operator (to handle more than just single letter substitutions), you could write
$ echo 'Absolute force' | perl -pe 's/(e)|o/$1 ? "o" : "e"/eg'
Abseluto ferco
The capturing parentheses avoid the redundant $1 eq 'e' test in @Miller's answer.
from man sed:
y/source/dest/
Transliterate the characters in the pattern space which appear in source to the corresponding character in dest.
The tr command can do this too:
$ echo "Absolute force" | tr 'oe' 'eo'
Abseluto ferco

Perl regular expression with "[]"

My data looks like:
NC_004415 NC_010199 ([T(trnH ,trnS1 trnL1 ,)])
NC_006131 NC_010199 ([T(trnH ,trnS1 trnL1 ,)])
NC_006355 NC_007231 ([T(trnM ,trnQ ,)])
I want to capture everything between []:
while( my $line = <crex> )
{ $t=$line=~m/(\[.*\])/;
print $t;
}
The output of $t is 1. Why is it not working?
Since you're using a capturing group, you can just use $1 after the match succeeds:
if($line =~ m/(\[.*\])/) {
print $1;
}
$line =~ m/(\[.*\])/ returns a list of the matches in a list context, but you are using it in a scalar context. In a scalar context, the match operator returns a Boolean that indicates whether the match was successful or not. Therefore you get 1. You can use
my ($t) = $line =~ m/(\[.*\])/;
to create a list context, or you can use $1 instead of using $t.
Use parentheses around $t:
($t) = $line =~m/(\[.*\])/;
Refer to perldoc perlretut (Extracting matches).
I believe you are using the match operator (m//) in a scalar context and storing the result in $t. Since the match is successful, m// returns 1. Refer to perldoc perlop.

How does =~ behave in Perl?

I have the following perl script:
$myVariable = "some value";
# ... some code ...
$myVariable =~ s/\+/ /g;
$myVariable =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/seg;
# ... some more code ...
Reading the Perl documentation, I think that the =~ operator returns a boolean value, namely true or false.
My question is: do the above operations on $myVariable change its value or not?
N.B. $myVariable's value is set in my script to a string that results from splitting $_. I think that should not affect the behavior of these operations.
P.S. If you need more code from my script, just let me know.
$myVariable is changed, but because you are doing substitutions (s///), not because the result of a match or substitution is a boolean; =~ is not like =, it is like ==. If you want the boolean result of the action, you need to assign it.
$result = $myVariable =~ s/\+/ /g;
We are talking about
leftvalue =~ rightvalue
rightvalue must be one of these things:
m/regexp/
s/regexp/replacement/
tr/searchlist/replacementlist/
leftvalue can be anything that is a left value.
The expression leftvalue =~ rightvalue evaluates, in scalar context, to a value that can be tested as a boolean, but this value is not assigned to leftvalue! It is the value of the expression itself! So you can use it directly in an if-clause:
if (leftvalue =~ rightvalue) {
# do something
}
m/regexp/ will never change leftvalue. It just tests whether regexp matches leftvalue.
s/regexp/replacement/ also tests whether regexp matches leftvalue, and if so, it replaces the matching part with replacement. If regexp did match, leftvalue =~ rightvalue is true, otherwise it is false.
tr/searchlist/replacementlist/ works analogously to s///, but it transliterates individual characters rather than matching a regexp. It returns the number of characters replaced or deleted, so the expression is true if at least one character was affected.
So this will work fine:
my @a = ('acbc123', 'aubu123');
foreach (@a) {
    # s/// returns true only if at least one substitution was made
    if ($_ =~ s/c(\d)/x$1/g) {
        $_ .= 'MATCHED!';
    }
}
The results will be:
$a[0] = 'acbx123MATCHED!'
The 'c' followed by a digit matched the regular expression, so it was replaced by 'x' and that digit. Because it matched, the if-condition was true and 'MATCHED!' was appended to the string.
$a[1] = 'aubu123'
The regular expression did not match. Nothing was replaced and the if-condition was false.
The binding operator just "binds" a target variable to one of the operators. It doesn't affect the value by itself. The substitution operator, s///, normally changes the target value and returns the number of substitutions it made.
my $count = $target =~ s/.../.../;
my $count = ( $target =~ s/.../.../ ); # same thing, clarified with ()
Starting with Perl v5.14, there's a /r flag for the substitution operator that leaves alone the target value, and, instead of returning a count, returns the modified value:
my $modified = $target =~ s/.../.../r;
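A small sketch of the difference, with a made-up sample string (the variable names are illustrative only; the /r part needs Perl 5.14 or later):

```perl
use strict;
use warnings;

my $target = "a+b+c";

# Plain s///g modifies $target and returns the number of substitutions.
my $count = ($target =~ s/\+/ /g);

# With /r the original is left untouched and the new string is returned.
my $original = "a+b+c";
my $modified = $original =~ s/\+/ /gr;

# Running the substitution again finds nothing to replace and returns false.
my $none = ($target =~ s/\+/ /g);

print "count=$count target='$target'\n";
print "original='$original' modified='$modified'\n";
print "second pass made no substitutions\n" unless $none;
```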
=~ doesn't mean anything by itself; it needs an operator on its right to apply to the variable on its left.
To see if a variable matches a pattern, use m// on the right. You'll probably want to use the result as a boolean, but you can also use it in other ways. This does not alter $foo:
$foo =~ m/pattern/
To substitute a replacement for a pattern, use s/// on the right. This alters $foo:
$foo =~ s/pattern/replacement/;
To translate single characters within $foo, use tr/// on the right. This alters $foo:
$foo =~ tr/abc/def/;

Why are the parentheses so important when assigning this regex match?

I have a piece of code:
$s = "<sekar kapoor>";
($name) = $s =~ /<([\S\s]*)>/;
print "$name\n"; # Output is 'sekar kapoor'
If the parentheses around $name are removed in the second line, like this:
$name = $s =~ /<([\S\s]*)>/; # $name is now '1'
I don't understand why it behaves like this. Can anyone please explain why it is so?
In your first example you have a list context on the left-hand side (you used parentheses); in the second you have a scalar context - you just have a scalar variable.
See the Perl docs for quote-like ops, Matching in list context:
If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1, $2, $3 ...). (Note that here $1 etc. are also set, and that this differs from Perl 4's behavior.) When there are no parentheses in the pattern, the return value is the list (1) for success. With or without parentheses, an empty list is returned upon failure.