Detect exact string value of scalar in regex matching - regex

Say I have $foo = "bar.baz"
I want to use the scalar $foo to find strings that contain "bar.baz" (anywhere in the string), but not the regex-evaluated version of $foo.
So the line: if( $other =~ m/$foo/ ) ...
isn't working, because $foo is being evaluated such that the '.' is evaluated to any character. How do I stop that?

Pick one:
$foo = quotemeta("bar.baz");
if ($other =~ m/\Q$foo/)
(Both are actually the same thing, just done at different times.)
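To see that the two forms behave the same, here is a minimal sketch. The sample strings in @candidates are made up for illustration; only $foo comes from the question.

```perl
use strict;
use warnings;

my $foo = "bar.baz";

# Hypothetical sample inputs: only the first contains a literal "bar.baz".
my @candidates = ("xx bar.baz yy", "barXbaz");

# Unescaped: '.' matches any character, so both strings match.
my @loose = grep { /$foo/ } @candidates;

# \Q...\E escapes metacharacters when the variable is interpolated.
my @strict_q = grep { /\Q$foo\E/ } @candidates;

# quotemeta() does the same escaping ahead of time.
my $escaped = quotemeta($foo);
my @strict_qm = grep { /$escaped/ } @candidates;

print scalar(@loose), " ", scalar(@strict_q), " ", scalar(@strict_qm), "\n";
```

The unescaped pattern matches both strings, while either escaped form matches only the one containing the literal text.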

Related

extract string between two dots

I have a string of the following format:
word1.word2.word3
What are the ways to extract word2 from that string in perl?
I tried the following expression but it assigns 1 to sub:
#perleval $vars{sub} = $vars{string} =~ /.(.*)./; 0#
EDIT:
I have tried several suggestions, but still get the value of 1. I suspect that the entire expression above has a problem in addition to parsing. However, when I do simple assignment, I get the correct result:
#perleval $vars{sub} = $vars{string} ; 0#
assigns word1.word2.word3 to variable sub
. has a special meaning in regular expressions, so it needs to be escaped.
.* could match more than intended. [^.]* is safer.
The match operator (//) simply returns true/false in scalar context.
You can use any of the following:
$vars{sub} = $vars{string} =~ /\.([^.]*)\./ ? $1 : undef;
$vars{sub} = ( $vars{string} =~ /\.([^.]*)\./ )[0];
( $vars{sub} ) = $vars{string} =~ /\.([^.]*)\./;
The first one allows you to provide a default if there's no match.
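As a rough illustration of the difference between scalar and list context described above, the following sketch uses the string from the question. The "nodots" input and the "default" fallback value are made up for the example.

```perl
use strict;
use warnings;

my $string = "word1.word2.word3";

# Scalar context: the match only reports success or failure.
my $flag = $string =~ /\.([^.]*)\./;

# List context: the match returns the captured groups.
my ($middle) = $string =~ /\.([^.]*)\./;

# Ternary form: supply a fallback when nothing matches.
my $fallback = "nodots" =~ /\.([^.]*)\./ ? $1 : "default";

print "$flag $middle $fallback\n";
```

Here $flag holds a true value rather than the text, $middle holds word2, and $fallback holds the default because the input has no dots.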
Try:
/\.([^\.]+)\./
. has a special meaning and would need to be escaped. Then you would want to capture the values between the dots, so use a negated character class like ([^\.]+), meaning at least one non-dot character. If you use (.*) you will get:
word1.stuff1.stuff2.stuff3.word2 to result in:
stuff1.stuff2.stuff3
But maybe you want that?
Here is my little example, I do find the perl one liners a little harder to read at times so I break it out:
use strict;
use warnings;
if ("stuff1.stuff2.stuff3" =~ m/\.([^.]+)\./) {
    my $value = $1;
    print $value;
}
else {
    print "no match";
}
result
stuff2
. has a special meaning: any character (see the expression between your parentheses)
Therefore you have to escape it (\.) if you search a literal dot:
/\.(.*)\./
You've got to make sure you're asking for a list when you do the search.
my $x= $string =~ /look for (pattern)/ ;
sets $x to 1
my ($x)= $string =~ /look for (pattern)/ ;
sets $x to pattern.

perl regular expression using repetition/quantifier

Simple, I am trying to see if a field has 9 digits and nothing else.
my $var = 123456789;
( my $nine ) = ( $var =~ /\d{9}/ );
from my understanding this says, "match a digit 9 times and nothing else"
this outputs 1 but not the 123456789 that i was expecting. Why?
Your pattern matches a sequence of nine (international) digit characters anywhere in the string. The 1 you are seeing is a true value that the pattern match returns to say that the match was successful.
If you just want to verify that the contents of a variable are exactly nine ASCII digits, then you should write
if ( $var =~ /\A[0-9]{9}\z/ ) { ... }
or, if you have the ASCII /a modifier available (Perl 5.14 or later), then you can say
if ( $var =~ /\A\d{9}\z/a ) { ... }
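A small sketch of the anchored check, assuming a helper named is_nine_digits (a hypothetical name, not from the question) and a few made-up inputs:

```perl
use strict;
use warnings;

# Hypothetical helper: true only if the value is exactly nine ASCII digits.
sub is_nine_digits {
    my ($value) = @_;
    return $value =~ /\A[0-9]{9}\z/ ? 1 : 0;
}

# Made-up inputs: exact, too long, contains a letter, trailing newline.
my @inputs  = ("123456789", "1234567890", "12345678a", "123456789\n");
my @results = map { is_nine_digits($_) } @inputs;

print "@results\n";
```

Only the first input passes. The last one fails because \z matches only at the very end of the string, so a trailing newline is rejected.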
The pattern has no capture groups, so there is nothing for it to return
in list context except a success flag. A successful match returns the list (1),
and that is what ends up in your variable $nine.
If you were to change that to (my $nine) = ($var =~ /\d/g);
$nine would contain 1, the first digit matched, because with /g a match in list context returns every matched substring.
To count the digits instead, use my $count = () = $var =~ /\d/g; which gives 9.
Add a capture buffer around the digits.
ie:
( my $nine ) = ( $var =~ /(\d{9})/ );
However, even though your syntax for your assignment will work,
it's conventional to write it as
my ($nine) = $var =~ /(\d{9})/;
You really want this:
( my $nine ) = ( $var =~ /(\d{9})/ );
The problem is that =~ only binds $var to the match, and a match with no capture groups returns just a true value (1) on success. Adding the parentheses inside the pattern in my example makes the regex capture what it matched, so in list context it returns your 123456789.

How does =~ behave in Perl?

I have the following perl script:
$myVariable = "some value";
# ... some code ...
$myVariable =~ s/\+/ /g;
$myVariable =~s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/seg;
# ... some more code ...
Reading from perl documentation, I think that =~ operator returns a boolean value, namely true or false.
My question is: Do the above operations where $myVariable is involved affect its value, or not?
N.B. The $myVariable's value is set in my script to a string that is a result from the $_ variable's split. I think that should not affect the behavior of that operation.
P.S. If you need more code from my script just let me know.
$myVariable is changed, but because you are doing substitutions (s///), not because the result of a match or substitution is a boolean; =~ is not like =, it is like ==. If you want the boolean result of the action, you need to assign it.
$result = $myVariable =~ s/\+/ /g;
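A minimal sketch of both effects at once, using the substitutions from the question on a made-up sample input:

```perl
use strict;
use warnings;

my $myVariable = "a+b%21+c";   # made-up sample input

# s/// modifies $myVariable in place and returns the number of substitutions.
my $result = $myVariable =~ s/\+/ /g;

# Decode %XX escapes, as in the question; the return value is ignored here.
$myVariable =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/seg;

print "$result [$myVariable]\n";
```

The variable ends up changed by both substitutions, while $result only records how many replacements the first one made.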
We are talking about
leftvalue =~ rightvalue
rightvalue must be one of these things:
m/regexp/
s/regexp/replacement/
tr/searchlist/replacementlist/
leftvalue can be anything that is a left value.
The expression leftvalue =~ rightvalue always evaluates to a value that can be tested as a boolean, but this value is not assigned to leftvalue! It is the value of the expression itself, so you can use it directly in an if-clause:
if (leftvalue =~ rightvalue) {
    # do something
}
m/regexp/ will never change anything. It just tests whether regexp matches leftvalue.
s/regexp/replacement/ also tests whether regexp matches leftvalue, and if so, it replaces the matching part with replacement. If regexp did match, leftvalue =~ rightvalue is true, otherwise it is false.
tr/searchlist/replacementlist/ works analogously to s///, but it translates individual characters rather than matching a regexp. It returns the number of characters it matched, which is true if there was at least one.
So this will work fine:
my @a=('acbc123','aubu123');
foreach (@a) {
    if ($_ =~ s/c(\d)/x$1/g) {
        $_ .= 'MATCHED!';
    }
}
The results will be:
$a[0] = 'acbx123MATCHED!'
The 'c' followed by a digit did match the regular expression, so it was replaced by 'x' and that digit. And because it matched, the if-statement is true, and 'MATCHED!' is appended to the string.
$a[1] = 'aubu123'
The regular expression did not match. Nothing was replaced and the if-statement was false.
The binding operator just "binds" a target variable to one of the operators. It doesn't affect the value by itself. The substitution operator, s///, normally changes the target value and returns the number of substitutions it made.
my $count = $target =~ s/.../.../;
my $count = ( $target =~ s/.../.../ ); # same thing, clarified with ()
Starting with Perl v5.14, there's a /r flag for the substitution operator that leaves alone the target value, and, instead of returning a count, returns the modified value:
my $modified = $target =~ s/.../.../r;
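The two return behaviours can be compared side by side. This is a small sketch with a made-up target string, and it assumes Perl 5.14 or later for /r:

```perl
use v5.14;      # /r requires Perl 5.14 or later
use warnings;

my $target = "a-b-c";   # made-up sample value

# With /r the target is left alone and the modified copy is returned.
my $modified  = $target =~ s/-/+/gr;
my $untouched = $target;

# Without /r the target is changed and the substitution count is returned.
my $count = $target =~ s/-/_/g;

print "$modified $untouched $target $count\n";
```

After the /r form $target still holds its original value; after the plain form it holds the changed string and $count holds the number of replacements.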
=~ doesn't quite mean anything by itself, it also needs something on its right to do to the variable on its left.
To see if a variable matches a pattern, you use m// on the right. You'll probably want to use the result as a boolean, but you can also use it in other ways. This does not alter $foo:
$foo =~ m/pattern/
To substitute a replacement for a pattern, you use s/// on the right, this alters $foo:
$foo =~ s/pattern/replacement/;
To translate single characters within $foo, you use tr/// on the right, this alters $foo:
$foo =~ tr/abc/def/;

Why does my regular expression fail with certain substitutions?

I am new to perl and not sure how to achieve the following.
I am reading a file and putting the lines in a variable called $tline. Next, I am trying to replace some character from the $tline.
This substitution fails if $tline has some special characters like (, ?,= etc in it. How to escape the special characters from this variable $tline?
if ($tline ne "") {
$tline =~ s/\//\%;
}
EDIT
Sorry for the confusions. Here is what I am trying to do.
$tline =~ s/"\//"\<\%\=request\.getContextPath\(\)\%\>\//;
This is working for most of the cases. But when the input file has ? in it, it is failing.
How about:
$tline =~ s/\Q$var\E/replacement/;
That will cause quotemeta to be applied to the contents of $var, which is being used as the pattern.
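A short sketch of why the escaping matters when the pattern comes from a variable. The values of $tline and $var and the replacement text are made up for illustration:

```perl
use strict;
use warnings;

my $tline = 'href="/page?id=(1)"';   # made-up line containing ?, = and ()
my $var   = '?id=(1)';               # literal text we want to replace

# Without \Q the contents of $var are compiled as a regex; the leading '?'
# is a quantifier with nothing to quantify, so the pattern fails to compile.
my $compiled = eval { my $copy = $tline; $copy =~ s/$var/?id=2/; 1 };

# With \Q...\E every metacharacter in $var is treated literally.
(my $fixed = $tline) =~ s/\Q$var\E/?id=2/;

print defined $compiled ? "compiled" : "failed to compile", "\n";
print "$fixed\n";
```

The replacement side needs no escaping here, because characters such as ? and = have no special meaning there.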
This isn't a valid regex:
$tline =~ s/\//\%;
It gets read like this by Perl:
$tline =~ s/a/%;
Where a = /
If what you wanted to do is replace a forward-slash with a percent sign, you probably want
$tline =~ s/\//%/;
Which is better written like this:
$tline =~ s,/,%,;
You probably also want to replace more than just the first forward-slash, so you want the /g flag:
$tline =~ s,/,%,g;
And this is exactly what tr (transliteration) does:
$tline =~ tr,/,%,;
UPDATE I think what you want is a simple quotemeta() which takes your input, and regex-escapes the meta characters
$ perl -e'print quotemeta("</foo?>")'
\<\/foo\?\>
You could place all your special characters between square brackets (called a "character class"). The following will replace all left parentheses, question marks and equal signs in your string with percent signs:
my $tline = 'fo(?=o';
$tline =~ s/[(?=]/%/g;
print "$tline\n";
Prints:
fo%%%o
quotemeta is a good function for getting an exact literal with special characters into a regex. And \Q and \E are good operators for doing the same thing inside the regex.
However, your search expression is not that complex. In your edit, you're simply looking for a double quote and a slash. In fact, I've quite simplified your expression so it contains not a single backslash. So it's not a problem for quotemeta nor for that matter \Q and \E.
Once pared down, I don't see anything in your revised substitution that would cause a problem with '?' in $tline.
Key to the simplification is that '.', '(', and ')' mean nothing special to the replacement section of your expression, so this is equivalent:
$tline =~ s/"\//"<%=request.getContextPath()%>\//;
Not to mention easier to read. Of course this is even easier:
$tline =~ s|"/|"<%=request.getContextPath()%>/|;
Because in Perl, you can choose the delimiter you wish with the s operator.
But with any of these, this works:
use Test::More tests => 1;
my $tline = '"/?"';
$tline =~ s|"/|"<%=request.getContextPath()%>/|;
ok( $tline =~ /getContextPath/ );
It passes the test. Perhaps you're having a problem with more than one substitution on a line. That can be fixed with:
$tline =~ s|"/|"<%=request.getContextPath()%>/|g;
That g is the global switch on the end, saying make this substitution for as many times as it occurs in the input.
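To make the effect of the g switch concrete, here is a minimal sketch on a made-up line that contains two quoted paths:

```perl
use strict;
use warnings;

my $tline = '<a href="/a"><img src="/b">';   # made-up input with two matches

# Without /g only the first occurrence is replaced.
(my $first_only = $tline) =~ s|"/|"<%=request.getContextPath()%>/|;

# With /g every occurrence is replaced.
(my $all = $tline) =~ s|"/|"<%=request.getContextPath()%>/|g;

# Count how many times the inserted text appears in each result.
my $n_first = () = $first_only =~ /getContextPath/g;
my $n_all   = () = $all        =~ /getContextPath/g;

print "$n_first $n_all\n";
```

The first result contains the inserted text once and the second contains it twice.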
However, since I can see what you are doing, I suggest an even tighter specification of what you want to search:
$tline =~ s~\b(href|link|src)="/~$1="<%=request.getContextPath()%>/~g;
And when I run this:
use Test::More tests => 2;
my $tline = '"/?"';
$tline =~ s/"\//"<%=request.getContextPath()%>\//;
ok( $tline =~ /getContextPath/ );
$tline = 'src="/?/?/beer"';
ok( $tline =~ s~\b(href|link|src)="/~$1="<%=request.getContextPath()%>/~g
);
I get two successes.
Your true problem is yet unspecified.
Well, one way to do it is to put all the characters you want to replace in square brackets. Like so:
$string =~ s/[,?=\/]//; # This will remove the first ',', '?', '=', or '/' from your string.
If you want to remove all the '?' in a string, for example, use a g on the end of it like so:
$string =~ s/[?]//g;
I'm a little rusty, but I believe that you only need a '\' in front of \ or /, (and of course the other special characters like \n,\t, etc...). Like so:
$string =~ s/[\\]/\//g; # Switch from DOS to Unix delimiters.
$string =~ s/[\n\t]//g; # Remove all newlines and tabs
As others have said, the code you've posted isn't going to work since you forgot the last /. That's another nice reason to keep the "weird" characters in a box.
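As a sketch of the delimiter switch mentioned above, the following converts backslashes to forward slashes on a made-up path. Note that square brackets are only special on the pattern side; in the replacement they would be inserted literally.

```perl
use strict;
use warnings;

my $path = 'C:\dir\file.txt';   # made-up DOS-style path

# Substitution: the backslash and the delimiter both need escaping.
(my $via_s = $path) =~ s/\\/\//g;

# Transliteration with different delimiters avoids escaping the slash.
(my $via_tr = $path) =~ tr{\\}{/};

print "$via_s\n$via_tr\n";
```

Both forms produce the same converted path.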

How do I remove all hyphens with a Perl regex?

I thought this would have done it...
$rowfetch = $DBS->{Row}->GetCharValue("meetdays");
$rowfetch = /[-]/gi;
printline($rowfetch);
But it seems that I'm missing a small yet critical piece of the regex syntax.
$rowfetch is always something along the lines of:
------S
-M-W---
--T-TF-
etc... to represent the days of the week a meeting happens
$rowfetch =~ s/-//gi
That's what you need for your second line there. You're just finding stuff, not actually changing it without the "s" prefix.
You also need to use the regex operator "=~" for this.
Here is what your code presently does:
# Assign to 'rowfetch' the value fetched from:
# The function 'GetCharValue', which is a method of:
# A value in a hash identified by the key "Row" in:
# Either a hash-ref or a blessed hash-ref
# Where 'GetCharValue' is given the parameter "meetdays"
$rowfetch = $DBS->{Row}->GetCharValue("meetdays");
# Assign to $rowfetch whether the default variable ( $_ )
# matched the expression /[-]/ (true or false, not the text)
$rowfetch = /[-]/gi;
# Print that true/false result.
printline($rowfetch);
Which is equivalent to having written the following code:
$rowfetch = ( $_ =~ /[-]/ );
printline( $rowfetch );
The magic you are looking for is the
=~
Token instead of
=
The former is a Regex operator, and the latter is an assignment operator.
There are many different regex operators too:
if( $subject =~ m/expression/ ){
}
Will make the given codeblock execute only if $subject matches the given expression, and
$subject =~ s/foo/bar/gi
Replaces (s/) all instances of "foo" with "bar", case-insensitively (/i), and repeating the replacement more than once (/g), on the variable $subject.
Using the tr operator is faster than using a s/// regex substitution.
$rowfetch =~ tr/-//d;
Benchmark:
use Benchmark qw(cmpthese);
my $s = 'foo-bar-baz-blee-goo-glab-blech';
cmpthese(-5, {
trd => sub { (my $a = $s) =~ tr/-//d },
sub => sub { (my $a = $s) =~ s/-//g },
});
Results on my system:
         Rate  sub  trd
sub  300754/s   -- -79%
trd 1429005/s 375%   --
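The benchmark above compares speed only. As a quick check that the two approaches give the same result, here is a minimal sketch using the sample rows from the question:

```perl
use strict;
use warnings;

my @rows = ('------S', '-M-W---', '--T-TF-');   # sample values from the question

# Remove hyphens with a substitution.
my @via_s  = map { my $copy = $_; $copy =~ s/-//g;  $copy } @rows;

# Remove hyphens with tr and the /d (delete) flag.
my @via_tr = map { my $copy = $_; $copy =~ tr/-//d; $copy } @rows;

# tr with an empty replacement list and no /d only counts, without modifying.
my $hyphens = ($rows[0] =~ tr/-//);

print "@via_s\n@via_tr\n$hyphens\n";
```

Both lists come out identical, and the counting form of tr leaves the original string unchanged.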
Off-topic, but without the hyphens, how will you know whether a "T" is Tuesday or Thursday?