How do I use a regex capture group as an array index?

I'm trying to use regsub in Tcl to replace a string with the corresponding value from an array.
array set myArray "
one 1
two 2
"
set myString "\[%one%\],\[%two%\]"
regsub -all "\[%(.+?)%\]" $myString "$myArray(\\1)" newString
My goal is to convert a string from "[%one%],[%two%]" to "1,2". The problem is that the capture group index is not resolved. I get the following error:
can't read "myArray(\1)": no such element in array
while executing
"regsub -all "\[%(.+?)%\]" $myString "$myArray(\\1)" newString"

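This is a two-step process in Tcl. Your main mistake is using double quotes everywhere: inside double quotes, Tcl substitutes $myArray(\\1) before regsub even runs, which is why it complains that the element myArray(\1) does not exist. Use braces instead: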
array set myArray {one 1 two 2}
set myString {[%one%],[%two%]}
regsub -all {\[%(.+?)%\]} $myString {$myArray(\1)} new
puts $new
# → $myArray(one),$myArray(two)
puts [subst -nobackslashes -nocommands $new]
# → 1,2
So we use regsub to find each match and replace it with the text of the variable reference we want to expand ($myArray(one) and so on). Then we use the rarely used subst command to perform variable substitution only.
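The two steps can be wrapped in a reusable proc. This is a minimal sketch: the proc name expandFromArray is hypothetical, and it assumes the template contains no other $ characters (subst would try to expand those too). upvar makes the caller's array visible under the local name arr, so the $arr(...) references produced by regsub resolve inside the proc.

```tcl
# Hypothetical helper: expand [%key%] placeholders from an array named by the caller.
proc expandFromArray {arrName template} {
    # Link the caller's array to the local name "arr"
    upvar 1 $arrName arr
    # Step 1: turn each [%key%] into the literal text $arr(key)
    set withRefs [regsub -all {\[%(.+?)%\]} $template {$arr(\1)}]
    # Step 2: perform variable substitution only (no commands, no backslashes)
    return [subst -nobackslashes -nocommands $withRefs]
}

array set myArray {one 1 two 2}
puts [expandFromArray myArray {[%one%],[%two%]}]
# → 1,2
```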

Apart from using regsub+subst (which is a decidedly tricky pair of commands to use safely in general), you can also do relatively simple transformations using string map. The trick is in how you prepare the mapping:
# It's conventional to use [array set] like this…
array set myArray {
one 1
two 2
}
set myString "\[%one%\],\[%two%\]"
# Build the transform
set transform {}
foreach {from to} [array get myArray] {
lappend transform "\[%$from%\]" $to
}
# Apply the transform
set changedString [string map $transform $myString]
puts "transformed from '$myString' to '$changedString'"
As long as each individual thing you want to map from and to is a constant string at the time of application, you can use string map to do it. The advantage? It's obviously correct. It's very hard to make a regsub+subst transform obviously correct (but that pairing is necessary for more complex transforms; it's the correct way to do %XX encoding and decoding of URLs, for example).
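If the keys are not known in advance, the two ideas can be combined: regexp -all -inline finds the placeholders that are actually present, and string map does the replacement, so no subst is involved. This is a sketch with a hypothetical proc name (expandPlaceholders) and the values passed as a dict; placeholders with no matching key are simply left in place.

```tcl
# Hypothetical helper: find placeholders with regexp, replace them with string map.
# "values" is assumed to be a dict of key -> replacement.
proc expandPlaceholders {template values} {
    set map {}
    # With -all -inline and one capture group, regexp returns: whole key whole key ...
    foreach {whole key} [regexp -all -inline {\[%(.+?)%\]} $template] {
        # Only map keys we know; unknown placeholders stay untouched
        if {[dict exists $values $key]} {
            lappend map $whole [dict get $values $key]
        }
    }
    return [string map $map $template]
}

puts [expandPlaceholders {[%one%],[%two%],[%three%]} {one 1 two 2}]
# → 1,2,[%three%]
```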

Related

Tcl: Regsub does not substitute a string while parsing HTML snippet

I'm trying to find a specific string within an array element. Since an array element is a string that can contain multiple occurrences of the pattern, I repeatedly substitute on the result. The algorithm works on a simple example, but when I use it with HTML (which is the purpose of the program), it gets stuck in an infinite while loop.
Here is an (ugly) expression that I'm using:
set expression {\<div\sclass\=\"fileText\"\sid\=\"[^\"]+\"\>File\:\s\<a\s(title\=\"[^\"]+\"\s)?href\=\"([^\"]+)\"\starget\=\"\_blank\"\>([^\<]+)\<\/a\>[^\<]+\<\/div\>};
Here is an element of the array from which I want to extract strings (it contains two occurrences of the given expression):
set htmlForParse(0) {file" id="f51456520"><div class="fileText" id="fT51456520">File: 48912-arduinouno_r3_front.jpg (1022 KB, 1800x1244)</div><a class="fileThumb" href="//example.com" target="_blank"><img " title="Reply to this post">YesNo?</a></span></div><div class="file" id="f51456769"><div class="fileText" id="fT51456769">File: 892991578.jpg (32 KB, 400x422)</div><a class="fileThumb" href="//example.com" target="_blank"><img src};
And here are the loops that I'm using to achieve this:
for {set k 0} {$k < [array size htmlForParse]} {incr k} {
while {[regexp $expression $htmlForParse($k) exString]} {
regsub -- $exString $htmlForParse($k) {} htmlForParse($k);
puts $htmlForParse($k);
} }
The purpose of the regsub is to remove one regexp hit at a time, until no hits are left and regexp returns 0. At that point the while loop finishes and the next element of the array can be examined. But that doesn't happen: it loops forever, and it seems that regsub does not replace the found string with an empty string (nor with anything else). Why?
The problem is that the string you matched, which you then reuse as the pattern for regsub, contains unquoted RE metacharacters. The ones I notice are the parentheses around the sizes:
% regexp $expression $htmlForParse($k) exString
1
% puts $exString
<div class="fileText" id="fT51456520">File: 48912-arduinouno_r3_front.jpg (1022 KB, 1800x1244)</div>
This means that the substring you extract doesn't match itself when it is used as a regular expression in the regsub, so no change is made. Next time round the loop, you match everything exactly as it was once again. Not what you want!
The easiest fix is to tell the regsub that the string it is using as a pattern is a literal string. This is done by preceding the RE with ***=, like this:
while {[regexp $expression $htmlForParse($k) exString]} {
regsub -- ***=$exString $htmlForParse($k) {} htmlForParse($k)
puts $htmlForParse($k)
}
With your sample text, this will perform two replacements. I hope that's what you want.
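A small, made-up demonstration of the difference (the text is invented to resemble the question's sample): the parentheses in the extracted text turn into a capture group when it is reused as a pattern, so it no longer matches the text it came from, while the ***= prefix makes regsub match it literally.

```tcl
set text {<div>File: photo.jpg (32 KB, 400x422)</div>}
set found {File: photo.jpg (32 KB, 400x422)}

# As a regular expression, "(...)" is a group, not literal parentheses,
# so the pattern no longer matches its own source text: nothing is replaced.
set n [regsub -- $found $text {} asRegex]
puts "count=$n result=$asRegex"
# → count=0 result=<div>File: photo.jpg (32 KB, 400x422)</div>

# With the ***= prefix the pattern is matched as a literal string.
set n [regsub -- ***=$found $text {} asLiteral]
puts "count=$n result=$asLiteral"
# → count=1 result=<div></div>
```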
Also, your initial RE has far too many backslashes in it. None of /, <, >, =, :, " and _ are RE metacharacters. It's not harmful to quote them, but I hope you are generating that RE from something, not writing it by hand!
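For illustration, here is the question's pattern with the unnecessary backslashes removed, run against a hypothetical snippet built to fit it (not the question's real data). If the goal is only to delete every match, a single regsub -all removes them all without the while loop, so the matched text never has to be reused as a pattern. The -> argument is just a throwaway variable name for the whole match.

```tcl
# The question's pattern without the redundant backslashes
set expression {<div\sclass="fileText"\sid="[^"]+">File:\s<a\s(title="[^"]+"\s)?href="([^"]+)"\starget="_blank">([^<]+)</a>[^<]+</div>}

# Hypothetical sample text, made up to fit the pattern
set html {<p>x</p><div class="fileText" id="fT1">File: <a href="//example.com/a.jpg" target="_blank">a.jpg</a> (32 KB, 400x422)</div><p>y</p>}

# Extract the captured parts of the first match
regexp $expression $html -> title href name
puts "$href $name"
# → //example.com/a.jpg a.jpg

# Delete every match in one pass, no loop needed
regsub -all $expression $html {} stripped
puts $stripped
# → <p>x</p><p>y</p>
```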

tcl regex inside string map

I have a template that I want to transform into a TeX file. For example, wherever a chapter or section command starts with a dot followed by a letter, I want to replace the dot with a backslash. I know I could use regsub, but I really want to try it with string map. Here is the code, which is not working:
set main \
{
.chapter{Assignment 1}
.section{1a}
}
set main [string map {.({^[A-Za-z]+$})\1 \\\1} $main]
It would be easier to just do:
set main [string map {.c \\c} $main]
set main [string map {.s \\s} $main]
But I want to try the "any letter" approach and find out whether it is possible with the string map command.
As the string map manual page says, this command does not take regular expressions. It takes a plain list of strings to match, each paired with the value to substitute in its place. It has a -nocase option for case-insensitive matching, but that is all. However, you can have multiple pairs, e.g.:
string map {.c \\c .s \\s} $value
You can also use normal Tcl scripting to build up a more complex list of pairs if you want, since the mapping is just a list. If you want or need to use regular expressions, then you must use regsub.
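As a sketch of building the pairs with ordinary scripting, the loop below generates a .x to \x pair for every ASCII letter, and the equivalent regsub one-liner is shown for comparison. Both assume that every dot followed by a letter should be rewritten, so text such as "e.g." elsewhere in the template would be changed too.

```tcl
set main {
.chapter{Assignment 1}
.section{1a}
}

# Build the pairs .a -> \a, .b -> \b, ... for every ASCII letter
set map {}
foreach letter [split abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ""] {
    lappend map .$letter \\$letter
}
set viaMap [string map $map $main]

# The regsub equivalent: \\ in the replacement is a literal backslash, \1 the letter
set viaRegsub [regsub -all {\.([A-Za-z])} $main {\\\1}]

puts $viaMap
puts [expr {$viaMap eq $viaRegsub}]
# → 1
```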

splitting a formula and again regenerating and re-evaluating the formula

I'm splitting a formula string on the characters "*/+-()" (for example, the string a*b+c), and I get the list (a b c) as output, where a, b and c are variables containing values such as 5, 10 and 15.
What I need is to substitute the variables' values directly into the formula and evaluate the expression.
The formula is taken from the user and changes from time to time, so if the user enters (a/b), something should automatically replace it with the real values (5/10) and then return the result 0.5.
The formula is built from a limited number of variables (e.g. a, b, c) and can use +, -, *, /, ( and ) as operators.
The problem is that after splitting out the variables, I'm not able to replace them with their values or evaluate the expression. Please help me do this as concisely as possible. Thanks in advance.
It is not at all that complicated:
First, replace each variable name with a Tcl variable reference (prepend a $).
You have to be careful not to turn sin(a) into $sin($a) or similar.
regsub -all {[a-z]+(?![a-z\(])} $input {$&}
Example:
set input {a*b+c+sin(d)}
regsub -all {[a-z]+(?![a-z\(])} $input {$&}
would yield $a*$b+$c+sin($d), which can be passed to expr.
If you need the variable names, just use regexp -all -inline with this expression.
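A worked version of the regsub approach, with assumed variable values. Note that Tcl's / on two integers does integer division, so with a set to 5 and b to 10 the formula a/b would give 0; one of the values has to be a float (or wrapped in double()) to get 0.5. The expr call is deliberately unbraced because the expression is generated at run time.

```tcl
set a 5.0; set b 10; set c 3; set d 0

# Prefix every name not followed by a letter or "(" with $,
# so variables become $a, $b, ... while function names like sin( are left alone
set input {a/b}
set value [expr [regsub -all {[a-z]+(?![a-z(])} $input {$&}]]
puts $value
# → 0.5

set input {a*b+c+sin(d)}
set script [regsub -all {[a-z]+(?![a-z(])} $input {$&}]
puts $script
# → $a*$b+$c+sin($d)
puts [expr $script]
# → 53.0
```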
If you know the names of the variables and none of them are prefixes of anything else you use, you can easily transform the expression like this:
set a 1; set b 2; set c 3
set e "a*b+c"
set value [expr [string map {a $a b $b c $c} $e]]
puts "$e = $value"
Note that there are no braces around the expression on the third line. Normally you should brace expressions for safety, but here you deliberately skip that because the expression is generated at run time.
That mapping can be generated automatically:
set a 1; set b 2; set c 3
set e "a*b+c"
set vars {a b c}
set value [expr [string map [regsub -all {\w+} $vars {& $&}] $e]]
puts "$e = $value"
However, if you've got prefixes and other things like that, you need a more complex transform:
# Omitting the variable setup and print at the end...
proc replIfRight {vars word} {
if {$word in $vars} {return \$$word} else {return $word}
}
# Escape [, $ and \ in the input so that subst only runs the replIfRight calls
set value [expr [subst [regsub -all {\w+} [string map {[ \\[ $ \\$ \\ \\\\} $e] {[replIfRight $vars &]}]]]
You're absolutely right to not expect to come up with such a horrible thing yourself!
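Another option, sketched under the assumption that the values are plain numbers held in a dict: split the expression into word and non-word tokens with regexp -all -inline and replace only whole tokens. This sidesteps both the prefix problem (a versus ab) and subst. The proc name substituteVars is hypothetical.

```tcl
# Hypothetical helper: replace whole-word variable names with their values
proc substituteVars {expression values} {
    set out ""
    # Split into alternating runs of word characters and everything else
    foreach token [regexp -all -inline {\w+|\W+} $expression] {
        if {[dict exists $values $token]} {
            append out [dict get $values $token]
        } else {
            # Operators, parentheses, numbers and function names pass through
            append out $token
        }
    }
    return $out
}

set e {ab*a+sin(b)}
set expanded [substituteVars $e {a 2.0 ab 3 b 0}]
puts "$expanded = [expr $expanded]"
# → 3*2.0+sin(0) = 6.0
```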

Is there a way to do multiple substitutions using regsub?

Is it possible to do different substitutions in an expression using regsub?
example:
set a ".a/b.c..d/e/f//g"
Now, in this expression, is it possible to substitute
"." as "yes"
".." as "no"
"/" as "true"
"//" as "false" in a single regsub command?
With a regsub, no. There's a long-standing feature request for this sort of thing (which requires substitution with the result of evaluating a command on the match information) but it's not been acted on to date.
But you can use string map to do what you want in this case:
set a ".a/b.c..d/e/f//g"
set b [string map {".." "no" "." "yes" "//" "false" "/" "true"} $a]
puts "changed $a to $b"
# changed .a/b.c..d/e/f//g to yesatruebyescnodtrueetrueffalseg
Note that when building the map, if any from-value is a prefix of another, the longer from-value should be put first. (This is because, at each position, string map tries the keys in the order you list them and uses the first one that matches.)
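A quick check of why the order matters, using made-up input:

```tcl
# Shorter key first: each "." is consumed before ".." is ever considered
puts [string map {. yes .. no} a..b]
# → ayesyesb

# Longer key first: ".." wins wherever it applies
puts [string map {.. no . yes} a..b]
# → anob
```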
It's possible to use regsub and subst to do multiple-target replacements in a two-step process, but I don't advise it for anything other than very complex cases! A nice string map is far easier to work with.
You may also try to do it yourself. This is a draft proc that you could use as a starting point. It is not production ready, and you must be careful because every substitution after the first one works on the already-substituted string.
These are the parameters:
options is a list of options that will be passed to every call to regsub
resubList is a list of key/value pairs, where the key is a regular expression and the value is a substitution
string is the string you want to substitute
This is the procedure. It simply calls regsub once for every pair in resubList and returns the final string.
proc multiregsub {options resubList string} {
foreach {re sub} $resubList {
set string [regsub {*}$options -- $re $string $sub]
}
return $string
}
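A usage sketch for multiregsub. The first call reproduces the string map result from the previous answer with regex patterns (the dots must be escaped); the second shows the caveat that later pairs see the output of earlier ones. The proc is repeated so the snippet runs on its own.

```tcl
# The draft proc from above
proc multiregsub {options resubList string} {
    foreach {re sub} $resubList {
        set string [regsub {*}$options -- $re $string $sub]
    }
    return $string
}

# Longer patterns first, as with string map
puts [multiregsub -all {{\.\.} no {\.} yes // false / true} .a/b.c..d/e/f//g]
# → yesatruebyescnodtrueetrueffalseg

# The caveat in action: "a" becomes "b", then the next pair rewrites that "b" too
puts [multiregsub -all {a b b c} ab]
# → cc
```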

TCL, regexp function and parameter substitution

I have the following code where I'm trying to match data on a single line into different variables via the regexp function.
The number of data items on the input line, and therefore the number of variable names passed to regexp, can vary; that's why I use $varLine (which is built earlier in my real code).
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}
set varLine "sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig8"
regexp $regex $in_stim whole sig0 $varLine
puts "sig0: $sig0"
puts $sig1
When I execute it, I get the following error ($sig0 is displayed correctly):
sig0: 13
can't read "sig1": no such variable
while executing
"puts $sig1"
If I manually substitute $varLine into the regexp line, the error disappears:
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}
regexp $regex $in_stim whole sig0 sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig8
puts $sig0
puts $sig1
I get the following correct output:
13
1
Does anyone see a mistake in my code, or can anyone help?
Thanks!
The issue is that the regexp command doesn't take the list of variables for storing submatches as one argument, but rather as many separate arguments.
The simplest method of working around this is to expand the variable list:
regexp $regex $in_stim whole sig0 {*}$varLine
When you do
set varLine "sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig8"
regexp $regex $in_stim whole sig0 $varLine
the Tcl parser passes the regexp command exactly five arguments, the fifth being the contents of the variable varLine, which the regexp command then treats as a single word. A single word obviously denotes a single variable (one with a rather complex name in your case, as it happens).
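You can see this directly with a tiny made-up example: the whole value of $varLine, spaces included, becomes one variable name.

```tcl
set varLine "sigA sigB"
# $varLine is one word, so regexp receives a single variable name: "sigA sigB"
regexp {(a)(b)} ab whole $varLine

# No variable called sigA was created...
puts [info exists sigA]
# → 0

# ...but one called "sigA sigB" holds the first submatch
puts [set {sigA sigB}]
# → a
```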
To do what you need, you have to resort to dynamic scripting, which can be done in two ways:
"Classic" approach using eval:
eval [concat [list regexp $regex $in_stim whole sig0] $varLine]
Using the {*} syntactic sugar from Tcl 8.5 onwards:
regexp $regex $in_stim whole sig0 {*}$varLine
The classic approach first constructs a list of words by concatenating two lists: the "static" part of the command and then the list of variables to pass to it. Then the constructed list is evaluated as a command. You can read more on this in this classic book.
In the new-style approach, the {*} thingy expands the contents of $varLine in place, so each list element becomes a separate argument; refer to the 5th rule here.
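If the number of fields varies, another option is to skip the individual variables altogether: regexp -inline returns the submatches as a list, which can be paired with a list of names in a dict. This is a sketch, not taken from the answers above; the names sig0 to sig9 are assumptions (the question's varLine repeats sig8, which looks like a typo for sig9). If the regexp does not match, every name ends up mapped to an empty string.

```tcl
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}
# Assumed names: one per capture group (ten groups)
set names {sig0 sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig9}

# -inline returns: whole group1 group2 ...; drop the whole match
set values [lrange [regexp -inline $regex $in_stim] 1 end]

# Pair each name with its submatch
set sigs [dict create]
foreach name $names value $values {
    dict set sigs $name $value
}
puts [dict get $sigs sig1]
# → 1
puts [dict get $sigs sig9]
# → 03
```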