How to use a variable in a regex substitution with special characters [duplicate] - regex

$text_to_search = "example text with [foo] and more";
$search_string = "[foo]";
if ($text_to_search =~ m/$search_string/) {
    print "wee";
}
Please observe the above code. For some reason I would like to find the text "[foo]" in the $text_to_search variable and print "wee" if I find it. To do this I would have to ensure that [ and ] are replaced with \[ and \] so that Perl treats them as characters instead of operators.
How can I do this without having to first replace [ and ] with \[ and \] using a s/// expression?

Use \Q to autoescape any potentially problematic characters in your variable.
print "wee" if $text_to_search =~ m/\Q$search_string/;
Update: To clarify how this works...
The \Q will turn on "autoescaping" of special characters in the regex. That means that any characters which would otherwise have a special meaning inside the match operator (for example, *, ^ or [ and ]) will have a \ inserted before them so their special meaning is switched off.
The autoescaping is in effect until one of two situations occurs. Either a \E is found in the string or the end of the string is reached.
In my example above, there was no need to turn off the autoescaping, so I omitted the \E. If you need to use regex metacharacters later in the regex, then you'll need to use \E.
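As a rough illustration of when \E matters, the sketch below reuses the question's variables and adds a regex fragment (\s+and) after the quoted part. The second pattern, without \Q, is included only to show the failure mode; the names $found and $unquoted are made up for this example.

```perl
use strict;
use warnings;

my $text_to_search = "example text with [foo] and more";
my $search_string  = "[foo]";

# \Q...\E quotes only the interpolated variable; \s+ after \E is still a regex.
my $found = $text_to_search =~ /\Q$search_string\E\s+and/ ? 1 : 0;

# Without \Q, "[foo]" is read as the character class [fo], so the literal
# "[" in the text is never matched at this position.
my $unquoted = $text_to_search =~ /^example text with $search_string and/ ? 1 : 0;

print "quoted: $found, unquoted: $unquoted\n";    # quoted: 1, unquoted: 0
```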

Use the quotemeta function:
$text_to_search = "example text with [foo] and more";
$search_string = quotemeta "[foo]";
print "wee" if ($text_to_search =~ /$search_string/);
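To see what quotemeta actually produces, a small sketch like the following may help. It only prints the escaped string and checks that \Q...\E in a double-quoted string gives the same result; the variable names are illustrative.

```perl
use strict;
use warnings;

# quotemeta puts a backslash in front of every non-word character.
my $quoted = quotemeta "[foo]";
print "$quoted\n";    # prints \[foo\]

# \Q...\E inside a double-quoted string applies the same quoting.
my $via_q = "\Q[foo]\E";
print "same\n" if $via_q eq $quoted;
```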

quotemeta (and \Q ... \E) is available in every Perl 5 release; version 5.16 only changed how non-ASCII characters are quoted. If all you need is a literal substring test, you can simply avoid using a regular expression at all.
For example, by using the index command:
if (index($text_to_search, $search_string) > -1) {
print "wee";
}

Related

Extract first word after specific word

I'm having difficulty writing a Perl program to extract the word following a certain word.
For example:
Today i'm not going anywhere except to office.
I want the word after anywhere, so the output should be except.
I have tried this
my $words = "Today i'm not going anywhere except to office.";
my $w_after = ( $words =~ /anywhere (\S+)/ );
but it seems this is wrong.
Very close:
my ($w_after) = ($words =~ /anywhere\s+(\S+)/);
   ^        ^                       ^^^
   +--------+                        |
     Note 1                       Note 2
Note 1: in list context, =~ returns the list of captured items, so the assignment target needs to be a list.
Note 2: allow one or more blanks after anywhere
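The difference between the two assignments can be checked with a short sketch along these lines. It reuses the sentence from the question; $flag is a name introduced here just to hold the scalar-context result.

```perl
use strict;
use warnings;

my $words = "Today i'm not going anywhere except to office.";

# Scalar context: the match returns true (1) or false, not the capture.
my $flag = ( $words =~ /anywhere\s+(\S+)/ );

# List context: the match returns the list of captured groups.
my ($w_after) = ( $words =~ /anywhere\s+(\S+)/ );

print "scalar: $flag, list: $w_after\n";    # scalar: 1, list: except
```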
In Perl v5.22 and later, you can use \b{wb} to get better results for natural language. The pattern could be
/anywhere\b{wb}.+?\b{wb}(.+?\b{wb})/
"wb" stands for word break, and it will account for words that have apostrophes in them, like "I'll", that plain \b doesn't.
.+?\b{wb}
matches the shortest non-empty sequence of characters that don't have a word break in them. The first one matches the span of spaces in your sentence; and the second one matches "except". It is enclosed in parentheses, so upon completion $1 contains "except".
\b{wb} is documented most fully in perlrebackslash
First, you have to write parentheses around the left-hand side of the = operator to force list context for the regexp evaluation. See m// and // in the perlop documentation.[1] You can also write parentheses around the =~ binding operator to improve readability, but it is not necessary because =~ has higher precedence than =.
Use the word character class \w:
my ($w_after) = ($words =~ / \b anywhere \W+ (\w+) \b /x);
Note that I'm using the /x modifier, so whitespace in the regexp is ignored. Also use the \b word boundary to anchor the regexp correctly.
[1]: I write my ($w_after) just for convenience because you can write my ($a, $b, $c, @rest) as an equivalent of (my $a, my $b, my $c, my @rest), but you can also control the scope of your variables like (my $a, our $UGLY_GLOBAL, local $_, @_).
This regex will match:
my ($expect) = ($words=~m/anywhere\s+([^\s]+)\s+/);
[^\s]+ matches the word between two runs of whitespace.
Thanks.
If you want to also take into consideration the punctuation marks, like in:
my $words = "Today i'm not going anywhere; except to office.";
Then try this:
my ($w_after) = ($words =~ /anywhere[[:punct:]\s]+(\S+)/);

Why `stoutest` is not a valid regular expression?

From perlop:
If "/" is the delimiter then the initial m is optional. With the m you can use any pair of non-whitespace characters as delimiters. This is particularly useful for matching path names that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is the delimiter, then the match-only-once rule of ?PATTERN? applies. If "'" is the delimiter, no interpolation is performed on the PATTERN. When using a character valid in an identifier, whitespace is required after the m.
So I can pick up any letter as a delimiter. Eventually this regex should be fine:
stoutest
That can be rewritten
s/ou/es/
However, it does not seem to work in Perl. Why?
$ perl -e '$_ = qw/ou/; stoutest; print'
ou
Because Perl can't pick out the operator s
perldoc perlop says this
Any non-whitespace delimiter may replace the slashes. Add space after the s when using a character allowed in identifiers.
This program works fine
use feature 'say';
my $s = 'bout';
$s =~ s toutest;
say $s;
output
best
Because stoutest, or any other string of alphanumeric characters, is a single token in the eyes of the Perl parser. Otherwise we couldn't use any barewords that begin with s (or m, or q, or y).
This works, though
$_ = "ou";
s toutest;
print
The substitute operator starts with an s identifier, and your code doesn't have one. Gotta use
s toutest
If it worked the way you think, we couldn't have any operators or subroutines that start with m, s, tr, q or y since all of them can be followed by any non-whitespace delimiter.
Ironically, your very own code demonstrates why it can't be the way you think. If it worked the way you think
$_ = qw/ou/; stoutest; print
wouldn't be equivalent to
$_ = qw/ou/; s/ou/es/; print
It would be equivalent to
$_ = q'/ou/; stoutest; print
aka
$_ = '/ou/; stoutest; print
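For comparison, here is a small sketch of the same substitution written with three different delimiters. The variable names are arbitrary; the point is only that the letter-delimited form needs whitespace after the s.

```perl
use strict;
use warnings;

my $x = 'bout'; $x =~ s/ou/es/;      # conventional slashes
my $y = 'bout'; $y =~ s{ou}{es};     # bracketing delimiters
my $z = 'bout'; $z =~ s toutest;     # "t" as delimiter, space after s required

print "$x $y $z\n";    # best best best
```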

tcl regexp from variable and special characters

I am a bit confused
my input string is " foo/1"
my motivation is to set foo as a variable and regexp it :
set line " foo/1"
set a foo
regexp "\s$a" $line does not work
I also noticed that it works only if I use curly braces and give the exact string:
regexp {\sfoo} $line works
regexp "\sfoo" $line doesn't work
can somebody explain why?
thanks
Quick answer:
"\\s" == {\s}
Long answer:
In Tcl, if you enclose a string in "", everything inside is substituted first and then used as the string. This means that \s is processed as a backslash escape sequence instead of being kept as two characters; since \s is not a recognized escape, the backslash is dropped and the regexp engine only sees s.
If you want to type \ character inside "" string you have to escape it as well: \\. In your case you would have to type "\\sfoo".
In case of {} enclosed strings, they are always quoted, no need for repeated backslash.
Using "" is good if you want to use variables or inline commands in the string, for example:
puts "The value $var and the command result: [someCommand $arg]"
The above will evaluate $var and [someCommand $arg] and put them into the string.
If you'd have used braces, for example:
puts {The value $var and the command result: [someCommand $arg]}
The string will not be evaluated. It will contain all the $ and [ characters, just like you typed them.

how do you match two strings in two different variables using regular expressions?

$a='program';
$b='programming';
if ($b=~ /[$a]/){print "true";}
this is not working
thanks every one i was a little confused
The [] in a regex denotes a character class, which matches any one of the characters listed inside it.
Your regex is equivalent to:
$b=~ /[program]/
which returns true as character p is found in $b.
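A quick way to see the difference is to test a string that shares letters with 'program' but does not contain it. The sketch below uses made-up names ($needle, $class_hit, $literal_hit) and the sample string 'zap'.

```perl
use strict;
use warnings;

my $needle = 'program';

# /[$needle]/ interpolates to /[program]/: any ONE of p, r, o, g, a, m.
my $class_hit = 'zap' =~ /[$needle]/ ? 1 : 0;    # 'a' and 'p' are in the class

# /$needle/ requires the whole substring "program".
my $literal_hit = 'zap' =~ /$needle/ ? 1 : 0;

print "class: $class_hit, literal: $literal_hit\n";    # class: 1, literal: 0
```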
You print true to show whether the match happened; if true is written as an unquoted bareword, nothing will be shown. Try printing a quoted string instead.
But if you wanted to see if one string is present inside another you have to drop the [..] as:
if ($b=~ /$a/) { print 'true';}
If variable $a contains any regex metacharacter, the above match may fail. To fix that, place the variable between \Q and \E so that any metacharacters in it are escaped:
if ($b=~ /\Q$a\E/) { print 'true';}
Assuming either variable may come from external input, please quote the variables inside the regex:
if ($b=~ /\Q$a\E/){print "true";}
You then won't get burned when the pattern you'll be looking for will contain "reserved characters" like any of -[]{}().
(Apart from the missing semicolons:) Why do you put $a in square brackets? This makes it a list of possible characters. Try:
$b =~ /\Q${a}\E/
Update
To answer your remarks regarding = and =~:
=~ is the matching operator, and specifies the variable to which you are applying the regex ($b) in your example above. If you omit =~, then Perl will automatically use an implied $_ =~.
In list context, a match returns a list containing the captured groups. You usually assign this to a list, such as in ($match1, $match2) = $b =~ /.../;. If, on the other hand, you assign the result to a scalar, the match is evaluated in scalar context and the scalar simply receives true (1) or false (the empty string).
So if you write $b = /\Q$a\E/, you'll end up with $b = $_ =~ /\Q$a\E/.
$a='program';
$b='programming';
if ( $b =~ /\Q$a\E/) {
print "match found\n";
}
If you're just looking for whether one string is contained within another and don't need to use any character classes, quantifiers, etc., then there's really no need to fire up the regex engine to do an exact literal match. Consider using index instead:
#!/usr/bin/env perl
use strict;
use warnings;
my $target = 'program';
my $string = 'programming';
if (index($string, $target) > -1) {
print "target is in string\n";
}

How can I convert a string into a regular expression that matches itself in Perl?

How can I convert a string to a regular expression that matches itself in Perl?
I have a set of strings like these:
Enter your selection:
Enter Code (Navigate, Abandon, Copy, Exit, ?):
and I want to convert them to regular expressions so I can match something else against them. In most cases the string is the same as the regular expression, but not in the second example above because the ( and ? have meaning in regular expressions. So that second string needs to become an expression like:
Enter Code \(Navigate, Abandon, Copy, Exit, \?\):
I don't need the matching to be too strict, so something like this would be fine:
Enter Code .Navigate, Abandon, Copy, Exit, ..:
My current thinking is that I could use something like:
s/[\?\(\)]/./g;
but I don't really know what characters will be in the list of strings and if I miss a special char then I might never notice the program is not behaving as expected. And I feel that there should exist a general solution.
Thanks.
As Brad Gilbert commented use quotemeta:
my $regex = qr/^\Q$string\E$/;
or
my $quoted = quotemeta $string;
my $regex2 = qr/^$quoted$/;
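As a sketch of how this behaves on the prompt from the question, the following compares the quoted pattern with a naive one built without \Q. The names $exact and $naive are introduced only for this illustration.

```perl
use strict;
use warnings;

my $string = 'Enter Code (Navigate, Abandon, Copy, Exit, ?):';

# Quoted: every metacharacter in $string is matched literally.
my $regex = qr/^\Q$string\E$/;
my $exact = $string =~ $regex ? 1 : 0;

# Naive: "(" opens a group and "?" becomes a quantifier, so the
# string no longer matches a pattern built from itself.
my $naive = $string =~ /^$string$/ ? 1 : 0;

print "quoted: $exact, naive: $naive\n";    # quoted: 1, naive: 0
```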
There is a function for that: quotemeta.
quotemeta EXPR
Returns the value of EXPR
with all non-"word" characters
backslashed. (That is, all characters
not matching /[A-Za-z_0-9]/ will be
preceded by a backslash in the
returned string, regardless of any
locale settings.) This is the internal
function implementing the \Q escape in
double-quoted strings.
If EXPR is omitted, uses $_.
From http://www.regular-expressions.info/characters.html :
there are 11 characters with special meanings: the opening square bracket [, the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening round bracket ( and the closing round bracket )
In Perl (and PHP) there is a special function quotemeta that will escape all these for you.
To put Brad Gilbert's suggestion into an answer instead of a comment, you can use the quotemeta function. All credit to him.
Why use a regular expression at all? Since you aren't doing any capturing and it seems you will not allow for any variations, why not simply use the index builtin?
$s1 = 'hello, (world)?!';
$s2 = 'he said "hello, (world)?!" and nothing else.';
if ( -1 != index $s2, $s1 ) {
print "we've got a match\n";
}
else {
print "sorry, no match.\n";
}