Tcl regexp from variable and special characters

I am a bit confused
my input string is " foo/1"
my motivation is to set foo as a variable and regexp it :
set line " foo/1"
set a foo
regexp "\s$a" $line does not work
Also, I noticed that it only works if I use curly braces and give the exact string:
regexp {\sfoo} $line works
regexp "\sfoo" $line doesn't work
can somebody explain why?
thanks

Quick answer:
"\\s" == {\s}
Long answer:
In Tcl, if you enclose a string in "", everything inside is evaluated first (variable, command and backslash substitution) and then used as a string. This means that \s is interpreted as an escape sequence instead of two literal characters; since \s is not an escape Tcl recognizes, the backslash is dropped and the regexp engine only sees s.
If you want to type a \ character inside a "" string you have to escape it as well: \\. In your case you would have to type "\\sfoo", or "\\s$a" with the variable.
Strings enclosed in {} are always taken literally, so there is no need to double the backslash.
Using "" is good if you want to use variables or inline commands in the string, for example:
puts "The value $var and the command result: [someCommand $arg]"
The above will evaluate $var and [someCommand $arg] and put them into the string.
If you had used braces, for example:
puts {The value $var and the command result: [someCommand $arg]}
The string will not be evaluated. It will contain all the $ and [ characters, just like you typed them.

Related

How to use a variable in regex substitution with special characters [duplicate]

$text_to_search = "example text with [foo] and more";
$search_string = "[foo]";
if ($text_to_search =~ m/$search_string/) {
print "wee";
}
Please observe the above code. For some reason I would like to find the text "[foo]" in the $text_to_search variable and print "wee" if I find it. To do this I would have to ensure that the [ and ] are substituted with \[ and \] to make Perl treat them as characters instead of operators.
How can I do this without having to first replace [ and ] with \[ and \] using a s/// expression?
Use \Q to autoescape any potentially problematic characters in your variable.
print "wee" if $text_to_search =~ m/\Q$search_string/;
Update: To clarify how this works...
The \Q will turn on "autoescaping" of special characters in the regex. That means that any characters which would otherwise have a special meaning inside the match operator (for example, *, ^ or [ and ]) will have a \ inserted before them so their special meaning is switched off.
The autoescaping is in effect until one of two situations occurs. Either a \E is found in the string or the end of the string is reached.
In my example above, there was no need to turn off the autoescaping, so I omitted the \E. If you need to use regex metacharacters later in the regex, then you'll need to use \E.
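As a rough illustration of the point about \E, the sketch below reuses the question's two variables and adds a hypothetical trailing \s+and so that something regex-like follows the quoted part. The capture groups and the $unquoted / $quoted names are only there to make the difference visible.

```perl
use strict;
use warnings;

my $text_to_search = "example text with [foo] and more";
my $search_string  = "[foo]";

# Without \Q, [foo] is a character class: it matches a single f or o.
my $unquoted = $text_to_search =~ /($search_string)/ ? $1 : '';

# With \Q...\E the variable is taken literally, while the \s+ after \E
# (an assumption added for this example) still works as a regex.
my $quoted = $text_to_search =~ /(\Q$search_string\E\s+and)/ ? $1 : '';

print "unquoted matched: $unquoted\n";   # f
print "quoted matched:   $quoted\n";     # [foo] and
```

Leaving out the \E here would also escape the backslash in \s+, so the rest of the pattern would be searched for literally.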
Use the quotemeta function:
$text_to_search = "example text with [foo] and more";
$search_string = quotemeta "[foo]";
print "wee" if ($text_to_search =~ /$search_string/);
You can use quotemeta (or \Q ... \E), but for a plain substring search you can simply avoid using a regular expression at all.
For example, by using the index function:
if (index($text_to_search, $search_string) > -1) {
print "wee";
}

What is the type of argument "replacement" in gensub() of GAWK?

The prototype of the function gensub() in GAWK is
gensub(regexp, replacement, how [, target])
According to my observations from examples, regexp is a regular expression enclosed in slashes.
In the examples I saw, a quoted string is provided as replacement (see the example below). But it can contain back-references to groups in the matched substring (see the example below), which suggests to me that the type of replacement is a regular expression, and that the quoted string provided as replacement is coerced into a regular expression.
Now I am confused: what is the type of replacement, a string or a regular expression?
Can I give a regular expression enclosed in slashes as replacement?
E.g., from the same link:
$ gawk '
> BEGIN {
> a = "abc def"
> b = gensub(/(.+) (.+)/, "\\2 \\1", "g", a)
> print b
> }'
-| def abc
Can I replace b = gensub(/(.+) (.+)/, "\\2 \\1", "g", a) with b = gensub(/(.+) (.+)/, /\2 \1/, "g", a)?
Btw, what does -| def abc mean?
Primarily, replacement is a string with a limited set of metacharacters.
If using a regex as the replacement compiles, then it may be accepted; I'd hate to have to work out what it does.
The -| def abc is mostly just the output from the preceding (illustrative) command. The role of the -| is explained in typographical conventions as a glyph marking output to standard output; most of the other example outputs have that marker before the output. It is not a part of the awk command, anyway. The awk command would generate def abc.
What characters are treated specially?
The manual says (at gensub()):
This is done by using parentheses in the regexp to mark the components and then specifying ‘\N’ in the replacement text, where N is a digit from 1 to 9.
It also mentions providing more features than sub() and gsub(), so looking at gsub(), it says:
As in sub(), the characters ‘&’ and ‘\’ are special
and sub() says:
If the special character ‘&’ appears in replacement, it stands for the precise substring that was matched by regexp. … The effect of this special character (‘&’) can be turned off by putting a backslash before it in the string. As usual, to insert one backslash in the string, you must write two backslashes. Therefore, write ‘\\&’ in a string constant to include a literal ‘&’ in the replacement.

how do you match two strings in two different variables using regular expressions?

$a='program';
$b='programming';
if ($b=~ /[$a]/){print "true";}
This is not working.
Thanks everyone, I was a little confused.
The [] in a regex denotes a character class, which matches any one of the characters listed inside it.
Your regex is equivalent to:
$b=~ /[program]/
which returns true as character p is found in $b.
To see whether the match happens you are printing true, and printing a bare true will not show anything. Try printing a quoted string instead.
But if you wanted to see whether one string is present inside another, you have to drop the [..]:
if ($b=~ /$a/) { print 'true';}
If the variable $a contains any regex metacharacter, the above match may fail. To fix that, place the variable between \Q and \E so that any metacharacters in it are escaped:
if ($b=~ /\Q$a\E/) { print 'true';}
Assuming either variable may come from external input, please quote the variables inside the regex:
if ($b=~ /\Q$a\E/){print "true";}
You then won't get burned when the pattern you are looking for contains "reserved characters" such as any of -[]{}().
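To make the "getting burned" part concrete, here is a small sketch with two made-up inputs (the strings 'a.c' and 'elem(' and all variable names are hypothetical, chosen only for illustration). It shows one silent false positive and one pattern that fails to compile, next to their \Q...\E counterparts.

```perl
use strict;
use warnings;

my $string = 'abc';

# Hypothetical "external" inputs containing regex metacharacters.
my $dotted   = 'a.c';
my $unclosed = 'elem(';

# Unquoted, the dot matches any character, so 'a.c' wrongly matches 'abc'.
my $false_positive = ($string =~ /$dotted/)     ? 1 : 0;
my $literal_match  = ($string =~ /\Q$dotted\E/) ? 1 : 0;

# Unquoted, the unmatched ( makes the pattern die when it is compiled.
my $compiled_raw    = eval { my $re = qr/$unclosed/;     1 } ? 1 : 0;
my $compiled_quoted = eval { my $re = qr/\Q$unclosed\E/; 1 } ? 1 : 0;

print "unquoted 'a.c' matches 'abc': $false_positive\n";
print "quoted   'a.c' matches 'abc': $literal_match\n";
print "unquoted 'elem(' compiles:    $compiled_raw\n";
print "quoted   'elem(' compiles:    $compiled_quoted\n";
```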
(Apart from the missing semicolons:) Why do you put $a in square brackets? This makes it a list of possible characters. Try:
$b =~ /\Q${a}\E/
Update
To answer your remarks regarding = and =~:
=~ is the matching operator, and specifies the variable to which you are applying the regex ($b) in your example above. If you omit =~, then Perl will automatically use an implied $_ =~.
In list context, the result of a match is a list containing the captured groups. You usually assign this to a list, such as in ($match1, $match2) = $b =~ /.../;. If, on the other hand, you assign the result to a scalar, then the scalar will be set to a true value (1) if the match succeeded and a false value (the empty string) if it did not.
So if you write $b = /\Q$a\E/, you'll end up with $b = $_ =~ /\Q$a\E/.
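A short sketch of how the match operator behaves in list context, in scalar context, and when =~ is omitted. The variable names and the (pro)(gram) pattern are made up for the example; in scalar context a plain (non-/g) match yields 1 or the empty string.

```perl
use strict;
use warnings;

my $word = 'programming';

# List context: the captured groups are returned.
my ($first, $second) = $word =~ /(pro)(gram)/;

# Scalar context: 1 on success, the empty string on failure.
my $matched = $word =~ /(pro)(gram)/;
my $missed  = $word =~ /xyz/;

# Without =~, the match is applied to $_ instead.
my $implicit;
for ('program') {
    $implicit = /gram/ ? 1 : 0;
}

print "captures: $first, $second\n";   # captures: pro, gram
print "matched:  '$matched'\n";        # matched:  '1'
print "missed:   '$missed'\n";         # missed:   ''
print "implicit: $implicit\n";         # implicit: 1
```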
$a='program';
$b='programming';
if ( $b =~ /\Q$a\E/) {
print "match found\n";
}
If you're just looking for whether one string is contained within another and don't need to use any character classes, quantifiers, etc., then there's really no need to fire up the regex engine to do an exact literal match. Consider using index instead:
#!/usr/bin/env perl
use strict;
use warnings;
my $target = 'program';
my $string = 'programming';
if (index($string, $target) > -1) {
print "target is in string\n";
}

TCL regsub isn't working when the expression has [0]

I tried the following code:
set exp {elem[0]}
set temp {elem[0]}
regsub $temp $exp "1" exp
if {$exp} {
puts "######### 111111111111111 ################"
} else {
puts "########### 0000000000000000 ############"
}
Of course, this is the easiest regsub possible (the words match completely), and still it doesn't work, and no substitution is done. If I write elem instead of elem[0], everything works fine.
I tried using {elem[0]}, elem[0], "elem[0]" etc, and none of them worked.
Any clue anyone?
This is the easiest regsub possible (the words match completely)
Actually, no, the words don't match. You see, in a regular expression, square brackets have meaning. Your expression {elem[0]} actually means:
match the sequence of letters 'e'
followed by 'l'
followed by 'e'
followed by 'm'
followed by '0' (the character for the number zero)
So it would match the string "elem0" but not "elem[0]", since in "elem[0]" the character after 'm' is '[', not '0'.
What you want is {elem\[0\]} <-- backslash escapes special meaning.
Read the manual for tcl's regular expression syntax, re_syntax, for more info on how regular expressions work in tcl.
In addition to @slebetman's answer, if you want any special characters in your regular expression to be treated like plain text, there is special syntax for that:
set word {abd[0]}
set regex $word
regexp $regex $word ;# => returns 0, did not match
regexp "(?q)$regex" $word ;# => returns 1, matched
That (?q) marker must be the first part of the RE.
Also, if you're really just comparing literal strings, consider the simpler if {$str1 eq $str2} ... or the glob-style matching of [string match]

How can I convert a string into a regular expression that matches itself in Perl?

How can I convert a string to a regular expression that matches itself in Perl?
I have a set of strings like these:
Enter your selection:
Enter Code (Navigate, Abandon, Copy, Exit, ?):
and I want to convert them to regular expressions so I can match something else against them. In most cases the string is the same as the regular expression, but not in the second example above because the ( and ? have meaning in regular expressions. So that second string needs to become an expression like:
Enter Code \(Navigate, Abandon, Copy, Exit, \?\):
I don't need the matching to be too strict, so something like this would be fine:
Enter Code .Navigate, Abandon, Copy, Exit, ..:
My current thinking is that I could use something like:
s/[\?\(\)]/./g;
but I don't really know what characters will be in the list of strings and if I miss a special char then I might never notice the program is not behaving as expected. And I feel that there should exist a general solution.
Thanks.
As Brad Gilbert commented use quotemeta:
my $regex = qr/^\Q$string\E$/;
or
my $quoted = quotemeta $string;
my $regex2 = qr/^$quoted$/;
There is a function for that: quotemeta.
quotemeta EXPR
Returns the value of EXPR with all non-"word" characters backslashed. (That is, all characters not matching /[A-Za-z_0-9]/ will be preceded by a backslash in the returned string, regardless of any locale settings.) This is the internal function implementing the \Q escape in double-quoted strings.
If EXPR is omitted, uses $_.
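As a quick check of the above, the following sketch runs quotemeta on the second prompt from the question and verifies that the resulting pattern matches the original string. The second test string ('Enter Code: Navigate') and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $string = 'Enter Code (Navigate, Abandon, Copy, Exit, ?):';

# Backslash every non-word character, then anchor the result.
my $quoted = quotemeta $string;
my $regex  = qr/^$quoted$/;

my $self_match  = ($string =~ $regex)                 ? 1 : 0;
my $other_match = ('Enter Code: Navigate' =~ $regex)  ? 1 : 0;

print "$quoted\n";
print "matches itself:          $self_match\n";
print "matches a different one: $other_match\n";
```

Note that quotemeta escapes more than the regex metacharacters (spaces, commas and the colon get a backslash too), which is harmless inside a pattern.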
From http://www.regular-expressions.info/characters.html :
there are 11 characters with special meanings: the opening square bracket [, the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening round bracket ( and the closing round bracket )
In Perl (and PHP) there is a special function quotemeta that will escape all these for you.
To put Brad Gilbert's suggestion into an answer instead of a comment, you can use the quotemeta function. All credit to him.
Why use a regular expression at all? Since you aren't doing any capturing and it seems you are not going to allow for any variations, why not simply use the index builtin?
$s1 = 'hello, (world)?!';
$s2 = 'he said "hello, (world)?!" and nothing else.';
if ( -1 != index $s2, $s1 ) {
print "we've got a match\n";
}
else {
print "sorry, no match.\n";
}