regexp tcl to search for variables

I am trying to find a matching pattern using the regexp command inside an if statement. I'm still a newbie in Tcl. The code is shown below:
set A 0;
set B 2;
set address {my_street[0]_block[2]_road};
if {[regexp {street\[$A\].*block\[$B\]} $address]} {
puts "the location is found"
}
I am expecting it to print "the location is found", as $address contains the matching A and B values. I am hoping to be able to change the A and B numbers for a list of addresses, but I cannot get it to print "the location is found".
Thank you.

Tcl's regular expression engine doesn't do variable interpolation. (Should it? Perhaps. It doesn't though.) Because your pattern is in braces, Tcl itself doesn't substitute $A and $B either, so the RE engine sees them as literal text. That means that you need to do the substitution at the generic Tcl level, which is in general quite annoying but OK here, as the variables only contain digits, which are never RE metacharacters by themselves.
Basic version (with SO. MANY. BACKSLASHES.):
if {[regexp "street\\\[$A\\\].*block\\\[$B\\\]" $address]} {
Nicer version with format:
if {[regexp [format {street\[%d\].*block\[%d\]} $A $B] $address]} {
You could also use subst -nocommands -nobackslashes but that's getting less than elegant.
If you need to support general substitutions, it's sufficient to use regsub to do the protection.
proc protect {string} {
regsub -all {\W} $string {\\&}
}
# ...
if {[regexp [format {street\[%s\].*block\[%s\]} [protect $A] [protect $B]] $address]} {
It's overkill when you know you're working with alphanumeric substitutions into the RE.
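For what it's worth, here is a minimal sketch that runs the braced pattern and the format-built pattern side by side. It assumes the address is written in braces, so that Tcl doesn't try to run [0] and [2] as commands; the variable name pattern is only for illustration.

```tcl
set A 0
set B 2
set address {my_street[0]_block[2]_road}

# Braced pattern: $A and $B reach the RE engine as literal text, so no match.
puts [regexp {street\[$A\].*block\[$B\]} $address]   ;# 0

# Pattern built with format: the values are filled in before regexp runs.
set pattern [format {street\[%d\].*block\[%d\]} $A $B]
puts $pattern                                         ;# street\[0\].*block\[2\]
puts [regexp $pattern $address]                       ;# 1
```

Printing the pattern before using it is a cheap way to check how many backslashes actually survived.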

Related

Replacing with Named Captures and Precompiled Regular Expressions in Perl

I'm trying to compile a set of substitution regexes but I can't figure out how to delay interpolation of the capture variables in the replacement scalar I'm setting aside; here's a simple contrived example:
use strict;
use warnings;
my $from = "quick";
my $to = "zippy";
my $find = qr/${from} (?<a>(fox|dog))/;
my $repl = "$to $+{a}"; # Use of uninitialized value in concatenation (.) or string
my $s0 = "The quick fox...\n";
$s0 =~ s/${find}/${repl}/;
print($s0);
This doesn't work because $repl is interpolated immediately and elicits "Use of uninitialized value in concatenation (.) or string".
If I use non-interpolating '' quotes it doesn't interpolate in the actual substitution so I get "The zippy $+{a}..."
Is there a trick to setting aside a replacement scalar that contains capture references?

You are getting the warning because you are using $+{a} before performing the match. qr// doesn't perform any matching; it simply compiles the pattern. It's s/// that performs the match.
You presumably meant to use
my $repl = "$to \$+{a}";
But that simply outputs
The zippy $+{a}...
You could use the following:
my $find = qr/quick (?<a>fox|dog)/;
my $s0 = "The quick fox...\n";
$s0 =~ s/$find/zippy $+{a}/;
print($s0);
But that hard codes the replacement expression. If you want this code to be dynamic, then what you are building is a template system.
I don't know of any template system with your specific desired syntax.
If you're ok with using the positional variables ($1) instead of named ones ($+{a}), you can use String::Substitution.
use String::Substitution qw( sub_modify );
my $find = qr/quick (?<a>fox|dog)/; # Or simply qr/\Q$from\E (fox|dog)/
my $repl = "zippy \$1";
my $s0 = "The quick fox...\n";
sub_modify($s0, $find, $repl);
print($s0);

The qr// only compiles a pattern. It does not perform a match, so it does not set anything in %+. Hence, the uninitialized warnings.
However, you can do that in the substitution so you don't need to prepare the replacement ahead of time:
s/$find/$to $+{a}/;
However, if you don't know what you want your replacement to be, you can eval code in the replacement side of the substitution that will then be the replacement. Here's a simple addition:
s/$find/ 2 + 2 /e;
You'd get the sum as the replacement:
The 4 jumped over the lazy dog
But here's the rub: that's code, and it can do whatever code can do. How you construct it is very important, and it should never include unsanitized user input.
If you don't know in advance the string you want to put in there, you can construct it beforehand and store it in the variable you use in the replacement side. However, you are making Perl code to eval, so it needs to be a valid Perl expression. The double quotes are part of the code that you will eval later:
my $replacement = '"$to $+{a}"';
s/$find/$replacement/;
Like that, you get the literal string value from $replacement:
The "$to $+{a}" jumped over the lazy dog
Adding the /e means that we evaluate the replacement side as code:
s/$find/$replacement/e;
But, that code is $replacement, and ends up giving us the same result because it's just its string value:
The "$to $+{a}" jumped over the lazy dog
Now here's the fun part. We can eval again! Add another /e and the substitution will eval the first time, then take that result and eval it again:
$s0 =~ s/${find}/$replacement/ee;
The first round of the eval gets the literal text value of $replacement, which is "$to $+{a}" (including the double quotes). The second round takes "$to $+{a}" and evals that, filling in the variables with the values in the current lexical scope. The %+ is populated by the substitution already. Now you have your result:
The zippy fox jumped over the lazy dog
However, this isn't a trick you should pull out lightly. There might be a better way to attack your problem. You do this sort of thing when you can't bend anything else to your will.
You also have to be very careful that you do what you intend in the string that you construct. You are creating new Perl code. If you are using any sort of outside data that you didn't supply, someone can trick your program into running code that you didn't intend.

There are three good ways to do dynamic regex substitution at runtime:
String interpolation of variables s///
Callback for code execution s///e
Embedded code constructs in the regex.
See the examples below.
Normally a callback form, either via a function or embedded regex code, is used when logic is required to construct the replacement.
Otherwise, use a simple string interpolation on the replacement side.
use strict;
use warnings;
my $s0 = "";
my ($from, $to) = ("quick", "zippy") ;
sub getRepl {
my ($grp1, $grp2) = @_;
if ( $grp1 eq $from ) {
return "<$to $grp2>";
}
else {
return "< $grp2>";
}
}
my $find = qr/(\Q${from}\E) (fox|dog)/;
# ======================================
# Substitution via string interpolation
$s0 = "The quick dog...\n";
$s0 =~ s/$find/[$to $2]/;
print $s0;
# ======================================
# Substitution via callback (eval)
$s0 = "The quick dog...\n";
$s0 =~ s/$find/ getRepl($1,$2) /e;
print $s0;
# ==================================================
# Substitution via regex embedded code constructs
my $repl = "";
my $RxCodeEmbed = qr/(\Q${from}\E)(?{$repl = '(' . $to}) (fox|dog)(?{$repl .= ' ' . $^N . ')'})/;
$s0 = "The quick dog...\n";
$s0 =~ s/$RxCodeEmbed/$repl/;
print $s0;
Outputs
The [zippy dog]...
The <zippy dog>...
The (zippy dog)...

other file with regexp

I have many Tcl scripts, and all of them contain the same long list of regexp entries.
One example regexp:
if {[regexp -nocase {outl} $cat]} { set cat "outlook" }
How can I put all my regexps in a file and load it in a proc?
Example:
proc pub:mapping {nick host handle channel text} {
set cat [lindex [split $text] 1];
# here I want to load the file with the regexps
if {[regexp -nocase {outl} $cat]} { set cat "outlook" }
putnow "PRIVMSG $channel :new $cat"
}
Regards

If I understand you correctly, you now have a bunch of Tcl scripts with large portions of code being repeated among them (in your case, various regex comparisons). In that case, it makes a lot of sense to extract that code into a separate unit.
This could, as you suggest, become a sort of text file in which you list regular expressions and their results in some format, and then load them when needed in your Tcl scripts. But I feel this would be too complicated and ungainly.
Might I suggest you simply create a regex checking proc and save that into a .tcl file. If you need regex checking in any of your other scripts, you can simply source that file and have the proc available.
From your question I'm not quite sure how you plan on using those regex comparisons, but maybe this example can be of some help:
# This is in regexfilter.tcl
proc regexfilter {text} {
if {[regexp -nocase {outl} $text]} { return "Outlook" }
if {[regexp -nocase {exce} $text]} { return "Excel" }
if {[regexp -nocase {foo} $text]} { return "Bar" }
# You can have as many options here as you like.
# In fact, you should consider changing all this into a switch.
# The main thing is to have all your filters in one place and
# no code duplication.
return $text ;# no filter matched: pass the input through unchanged
}
#
# This can then be in other Tcl scripts
#
source /path_to_filter_script/regexfilter.tcl
proc pub:mapping {nick host handle channel text} {
set cat [lindex [split $text] 1]
set cat [regexfilter $cat]
putnow "PRIVMSG $channel :new $cat"
}
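The switch idea mentioned in the comments could look roughly like the sketch below. It is only one possible shape, not the only way to do it: the default branch, which returns the input unchanged when nothing matches, is an assumption based on how the original pub:mapping leaves $cat alone in that case.

```tcl
# Sketch of regexfilter.tcl using switch instead of a chain of ifs.
proc regexfilter {text} {
    switch -regexp -nocase -- $text {
        outl    { return "Outlook" }
        exce    { return "Excel" }
        foo     { return "Bar" }
        default { return $text }
    }
}

puts [regexfilter "OUTLOOK2010"]   ;# Outlook
puts [regexfilter "something"]     ;# something
```

The patterns are tried top to bottom and the first match wins, so more specific patterns should come first.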

If you just want to expand abbreviations, then you can use string map:
proc expand_abbreviations {string} {
# this is an even-length list mapping each abbreviation to its expansion
set abbreviations {
outl outlook
foo foobar
ms Microsoft
}
return [string map $abbreviations $string]
}
This approach will be quite fast. However, string map replaces substrings anywhere in the input, so if the string already contains "outlook", it will be turned into "outlookook".
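A quick sketch of that pitfall, plus one possible way around it. The expand_word helper is hypothetical (it is not part of the answer above) and assumes the abbreviation contains no regular expression metacharacters; it uses the \m and \M word-boundary escapes so that only whole words are expanded.

```tcl
set abbreviations {outl outlook foo foobar ms Microsoft}

puts [string map $abbreviations "outl"]      ;# outlook
puts [string map $abbreviations "outlook"]   ;# outlookook

# Hypothetical helper: expand an abbreviation only when it is a whole word.
proc expand_word {abbr expansion string} {
    regsub -all "\\m$abbr\\M" $string $expansion
}

puts [expand_word outl outlook "outl and outlook"]   ;# outlook and outlook
```

This trades some of string map's speed for safety, which probably only matters if you are processing a lot of text.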

Passing a match in regsub with & to a procedure (Tcl is being used)

I want to go through a comma-separated string and replace matches with more comma-separated elements.
For example, 5-A,B should give me 1-A,2-A,3-A,4-A,5-A,B after the regsub.
The following is not working for me, as & is being passed as a literal & instead of the actual match:
regsub -all {\d+\-\w+} $string [myConvertProc &]
However not attempting to pass the & and using it directly works:
regsub -all o "Hello World" &&&
> Hellooo Wooorld
Not sure what I am doing wrong in attempting to pass the value & holds to myConvertProc
Edit: I think my initial problem is the [myConvertProc &] is getting evaluated first, so I am actually passing '&' to the procedure.
How do I get around this within the regex realm? Is it possible?
Edit 2: I've already solved it using a foreach on a split list, so I'm just looking to see if this is possible within a regsub. Thanks!

You are correct in your first edit: the problem is that each argument to regsub is fully evaluated before executing the command.
One solution is to insert a command substitution string into the string, and then use subst on it:
set string [regsub -all {\d+\-\w+} $string {[myConvertProc &]}]
# -> [myConvertProc 5-A],B
set string [subst $string]
# -> 1-A,2-A,3-A,4-A,5-A,B
This will only work if there is nothing else in string that is subject to substitution (but you can of course turn off variable and backslash substitution).
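To try the regsub and subst combination end to end, here is a small sketch. The body of myConvertProc is a hypothetical stand-in (the question never shows it) that expands N-X into 1-X through N-X; variable and backslash substitution are switched off in subst, as suggested above.

```tcl
# Hypothetical stand-in for the asker's myConvertProc.
proc myConvertProc {item} {
    lassign [split $item -] count suffix
    set parts {}
    for {set i 1} {$i <= $count} {incr i} {
        lappend parts "$i-$suffix"
    }
    return [join $parts ,]
}

set string "5-A,B"
# Step 1: wrap every match in a command substitution.
set marked [regsub -all {\d+-\w+} $string {[myConvertProc &]}]
puts $marked   ;# [myConvertProc 5-A],B
# Step 2: let subst run only the command substitutions.
set result [subst -novariables -nobackslashes $marked]
puts $result   ;# 1-A,2-A,3-A,4-A,5-A,B
```

Any other square brackets already present in the input would still be executed by subst, so this is only safe on data you control.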
The foreach solution is much better. An alternative foreach solution is to iterate over the result of regexp -indices -inline -all, but iterating over the parts of a split list is preferable if it works.
Update:
A typical foreach solution goes like this:
set res {}
foreach elem [split $string ,] {
if {[regexp -- {^\d+-\w+$} $elem]} {
lappend res [myConvertProc $elem]
} else {
lappend res $elem
}
}
join $res ,
That is, you collect a result list by looking at each element in the raw list. If the element matches your requirement, you convert it and add the result to the result list. If the element doesn't match, you just add it to the result list.
It can be simplified somewhat in Tcl 8.6:
join [lmap elem [split $string ,] {
if {[regexp -- {^\d+-\w+$} $elem]} {
myConvertProc $elem
} else {
set elem
}
}] ,
Which is the same thing, but the lmap command handles the result list for you.
Documentation: foreach, lappend, lmap, regexp, regsub, set, split, subst

Tcl adds curly braces when using `$` sign

set B {pc_0::!mx_0 pi::$mx_0}
puts $B
set A ""
foreach x $B {
lappend A $x
}
puts $A
The output of this program is
pc_0::!mx_0 pi::$mx_0
pc_0::!mx_0 {pi::$mx_0}
It is strange that Tcl adds curly braces in the second output. I guess it is because of the $ symbol. But I really need to use it, and I don't want the braces to be inserted. How can this be explained, and how can I avoid the braces?

As a general rule, don't treat lists as strings. Pretend that they don't have a string representation. (The string representation is only useful for serialization and debugging, not for showing to the user.)
To convert text (especially user input) to a list, use split.
To convert it back, use join.
So you want:
puts [join $A]
Background:
A list has the side effect of escaping all metacharacters used by Tcl, so no further substitution takes place when you eval the list. This is a very important property for generating callbacks/code that will be executed later:
set userinput [gets stdin]
set code [list puts $userinput]
eval $code
No matter what the user enters here, the output is always the same as the user entered, without any substitution.
If the $ were not escaped, an evaluation would try to substitute $mx_0, which would most likely fail.
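A short sketch of the difference, using the values from the question. The braces only show up when the list as a whole is printed; each element, and the joined text, come out unchanged.

```tcl
set A {}
foreach x {pc_0::!mx_0 pi::$mx_0} {
    lappend A $x
}

puts $A              ;# the list's string form, with the quoting braces
puts [join $A]       ;# pc_0::!mx_0 pi::$mx_0
puts [lindex $A 1]   ;# pi::$mx_0
puts [llength $A]    ;# 2
```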

Why not print the list in a way similar to how it was created?
As an experienced programmer new to Tcl, this seems much more intuitive to me:
foreach x $A {
puts $x
}

Case matching regexp

I have been wondering about a regexp matching pattern in Tcl for some time and I've remained stumped as to how it was working. I'm using Wish and Tcl/Tk 8.5 by the way.
I have a random string MmmasidhmMm stored in $line and the code I have is:
while {[regexp -all {[Mm]} $line match]} {
puts $data $match
regsub {[Mm]} $line "" line
}
$data is a text file.
This is what I got:
m
m
m
m
m
m
While I was expecting:
M
m
m
m
M
m
I was trying some things to see how changing a bit would affect the results when I got this:
while {[regexp -all {^[Mm]} $line match]} {
puts $data $match
regsub {[Mm]} $line "" line
}
I get:
M
m
m
Surprisingly, $match keeps the case.
I was wondering why in the first case, $match automatically becomes lowercase for some reason. Unless I am not understanding how the regexp actually is working, I'm not sure what I could be doing wrong. Maybe there's a flag that fixes it that I don't know about?
I'm not sure I'll really use this kind of code some day, but I guess learning how it works might help me in other ways. I hope I didn't miss anything in there. Let me know if you need more information!

The key here is your -all flag. The documentation for it says:
-all -- Causes the regular expression to be matched as many times as possible in the string, returning the total number of matches found. If this is specified with match variables, they will contain information for the last match only.
That means the variable match contains the very last match, which is a lower case 'm'. Drop the -all flag and you will get what you want.
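Here is a minimal sketch of the loop without -all. It collects the matches in a list (the name found is just for illustration) instead of writing them to $data. If all you need is the list of matches, regexp -all -inline returns them in one call.

```tcl
set line MmmasidhmMm
set found {}
while {[regexp {[Mm]} $line match]} {
    lappend found $match          ;# the first remaining M or m
    regsub {[Mm]} $line "" line   ;# remove that first occurrence
}
puts $found   ;# M m m m M m
puts $line    ;# asidh

# Same matches without a loop:
puts [regexp -all -inline {[Mm]} MmmasidhmMm]   ;# M m m m M m
```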
Update
If your goal is to remove all 'm' regardless of case, that whole block of code can be condensed into just one line:
regsub -all {[Mm]} $line "" line
Or, more intuitively:
set line [string map -nocase {m ""} $line]; # Map all M's into nothing
