Tcl regsub will not work - regex

I'm trying to write an extremely simple piece of code and Tcl is not cooperating. I can only imagine there is a very simple error I am missing, but I have absolutely no idea what it could be. Please help, I'm so frustrated!
My code is the following:
proc substitution {stringToSub} {
set afterSub $stringToSub
regsub {^.*?/projects} "$stringToSub" "//Path/projects" afterSub
regsub {C:/projects} "$stringToSub" "//Path/projects" afterSub
return $afterSub
}
puts "[substitution /projects] "
puts "[substitution C:/projects] "
The substitution works fine for the second expression but not for the first one. Why is that?
I have also tried using
regsub {^/projects} "$stringToSub" "//Path/projects" afterSub
and
regsub {/projects} "$stringToSub" "//Path/projects" afterSub
but neither is working. What is going on?

Your two regsub calls don't change the input string ($stringToSub); each one stores its result in the variable afterSub, which the procedure returns. Because both calls read the unchanged $stringToSub, you always get the result of the last regsub call: whatever the first call put into $afterSub is overwritten.
Note that the first pattern is more general and matches every string matched by the second (assuming that $stringToSub is always a path). If you want to obtain "//Path/projects" for both sample strings, you can simply remove the second regsub call:
proc substitution {stringToSub} {
set afterSub $stringToSub
regsub {^.*?/projects} "$stringToSub" "//Path/projects" afterSub
return $afterSub
}
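If you do want two separate patterns, a minimal sketch (assuming only a leading /projects or C:/projects prefix needs rewriting) is to feed the result of the first regsub into the second, instead of having both read $stringToSub:

```tcl
proc substitution {stringToSub} {
    # Each regsub works on the output of the previous one,
    # so an earlier substitution is never thrown away.
    set afterSub [regsub {^/projects} $stringToSub "//Path/projects"]
    set afterSub [regsub {^C:/projects} $afterSub "//Path/projects"]
    return $afterSub
}

puts [substitution /projects]    ;# //Path/projects
puts [substitution C:/projects]  ;# //Path/projects
```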

The first two lines in your procedure effectively do nothing, since regsub always overwrites the destination variable (afterSub), even when no matches are found and no substitutions are made. From the regsub manual:
This command matches the regular expression exp against string, and either copies string to the variable whose name is given by varName or returns string if varName is not present. (Regular expression matching is described in the re_syntax reference page.) If there is a match, then while copying string to varName (or to the result of this command if varName is not present) the portion of string that matched exp is replaced with subSpec.
There's no need to match C:/projects specifically, because ^.*?/projects will match that text too.
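A quick way to see this behaviour for yourself (a small sketch; the starting value of afterSub is made up): with a variable name, regsub returns the number of substitutions, and the variable is still written when that number is 0.

```tcl
set afterSub "previous value"
# The pattern does not occur in the input, so no substitution happens...
set count [regsub {C:/projects} "/projects" "//Path/projects" afterSub]
puts $count     ;# 0
# ...but afterSub has still been overwritten with a copy of the input.
puts $afterSub  ;# /projects
```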

The issue is that your second use of the regsub operation is overwriting the substituted value from the first regsub use.
We could simplify the code to just this:
proc substitution {stringToSub} {
return [regsub {^.*?/projects} $stringToSub "//Path/projects"]
}
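The ? in .*? matters when a path contains /projects more than once. A sketch with a made-up path: the non-greedy pattern stops at the first /projects, while a greedy .* runs to the last one.

```tcl
proc substitution {stringToSub} {
    return [regsub {^.*?/projects} $stringToSub "//Path/projects"]
}

set path C:/projects/foo/projects
# Non-greedy: only "C:/projects" is replaced.
puts [substitution $path]                             ;# //Path/projects/foo/projects
# Greedy: everything up to the last "/projects" is replaced.
puts [regsub {^.*/projects} $path "//Path/projects"]  ;# //Path/projects
```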

Related

Tcl: Regsub does not substitute a string while parsing HTML snippet

I'm trying to find a specific string within an array element. Since an array element is a string that can contain multiple occurrences of the pattern, I repeatedly substitute on the result. The algorithm works on a simple example, but when I use it with HTML (which is the purpose of the program) it gets stuck in an infinite while loop.
Here is an (ugly) expression that I'm using:
set expression {\<div\sclass\=\"fileText\"\sid\=\"[^\"]+\"\>File\:\s\<a\s(title\=\"[^\"]+\"\s)?href\=\"([^\"]+)\"\starget\=\"\_blank\"\>([^\<]+)\<\/a\>[^\<]+\<\/div\>};
Here is an element of the array from which I want to extract strings (it contains 2 occurrences of the given expression):
set htmlForParse(0) {file" id="f51456520"><div class="fileText" id="fT51456520">File: 48912-arduinouno_r3_front.jpg (1022 KB, 1800x1244)</div><a class="fileThumb" href="//example.com" target="_blank"><img " title="Reply to this post">YesNo?</a></span></div><div class="file" id="f51456769"><div class="fileText" id="fT51456769">File: 892991578.jpg (32 KB, 400x422)</div><a class="fileThumb" href="//example.com" target="_blank"><img src};
And here are the loops that I'm using to achieve this:
for {set k 0} {$k < [array size htmlForParse]} {incr k} {
while {[regexp $expression $htmlForParse($k) exString]} {
regsub -- $exString $htmlForParse($k) {} htmlForParse($k);
puts $htmlForParse($k);
} }
The purpose of the regsub is to remove one regexp hit at a time, until no hits are left and regexp returns 0. At that point the while loop finishes and the next element of the array can be examined. But that doesn't happen: it loops forever, and it seems that regsub does not replace the found string with an empty string (nor with anything else). Why?
The problem is that the matched text, which you then pass to regsub as a pattern, contains unquoted RE metacharacters. The ones I notice are the parentheses (around the sizes):
% regexp $expression $htmlForParse($k) exString
1
% puts $exString
<div class="fileText" id="fT51456520">File: 48912-arduinouno_r3_front.jpg (1022 KB, 1800x1244)</div>
This means that the substring you extract doesn't match itself when regsub treats it as a regular expression, so no change is made. Next time round the loop, you match exactly the same text again. Not what you want!
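A reduced sketch of the same effect, using a made-up fragment of the file-size text: as a pattern, the parentheses form a capture group instead of matching literal parentheses, so the string does not match itself; the ***= prefix makes the pattern literal.

```tcl
set s {File: photo.jpg (1022 KB)}
# As an RE, "(1022 KB)" is a group that matches "1022 KB" without the
# parentheses, so the text no longer matches its own pattern.
puts [regexp -- $s $s]        ;# 0
# With ***= the rest of the pattern is taken as a literal string.
puts [regexp -- ***=$s $s]    ;# 1
```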
The easiest fix is to tell the regsub that the string it is using as a pattern is a literal string. This is done by preceding the RE with ***=, like this:
while {[regexp $expression $htmlForParse($k) exString]} {
regsub -- ***=$exString $htmlForParse($k) {} htmlForParse($k)
puts $htmlForParse($k)
}
With your sample text, this will perform two replacements. I hope that's what you want.
Also, your initial RE has far too many backslashes in it. None of /, < and > are RE metacharacters. It's not harmful to quote them, but I hope you are generating that RE from something, not writing it by hand!
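As a possible simplification (not part of the answer above): if the goal is only to collect or remove every match, regexp -all -inline and regsub -all handle all occurrences in a single call, so no while loop is needed. A sketch with a made-up HTML fragment and pattern:

```tcl
# Hypothetical sample and pattern, for illustration only.
set html {<p>keep</p><b>drop (1)</b><p>also keep</p><b>drop (2)</b>}
set expression {<b>[^<]*</b>}

# Collect every match (the pattern has no capture groups,
# so the list holds just the whole matches).
set found [regexp -all -inline -- $expression $html]
puts [llength $found]   ;# 2

# Remove every match; with a variable name, regsub returns the count.
set count [regsub -all -- $expression $html {} html]
puts $count             ;# 2
puts $html              ;# <p>keep</p><p>also keep</p>
```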

How do I use regex capture group as array index?

I'm trying to use regsub in Tcl to replace a string with the value from an array.
array set myArray "
one 1
two 2
"
set myString "\[%one%\],\[%two%\]"
regsub -all "\[%(.+?)%\]" $myString "$myArray(\\1)" newString
My goal is to convert a string from "[%one%],[%two%]" to "1,2". The problem is that the capture group index is not resolved. I get the following error:
can't read "myArray(\1)": no such element in array
while executing
"regsub -all "\[%(.+?)%\]" $myString "$myArray(\\1)" newString"
This is a 2 step process in Tcl. Your main mistake here is using double quotes everywhere:
array set myArray {one 1 two 2}
set myString {[%one%],[%two%]}
regsub -all {\[%(.+?)%\]} $myString {$myArray(\1)} new
puts $new
puts [subst -nobackslashes -nocommands $new]
This prints:
$myArray(one),$myArray(two)
1,2
So we use regsub to search for the expression and replace it with the string representation of the variable we want to expand. Then we use the rarely-used subst command to perform the variable (only) substitution.
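The switches passed to subst are what keep this reasonably contained. A sketch with a made-up input that contains a bracketed command: because of -nocommands, [clock seconds] stays as plain text instead of being run. Variable substitution has to stay enabled here, so a $ in the input would still be expanded; that is one reason this pair is hard to use safely.

```tcl
array set myArray {one 1 two 2}
set myString {[%one%],[clock seconds]}
# Only [%...%] placeholders are rewritten into variable references.
regsub -all {\[%(.+?)%\]} $myString {$myArray(\1)} new
puts $new                                     ;# $myArray(one),[clock seconds]
# -nocommands leaves the bracketed text alone; only $myArray(...) expands.
puts [subst -nobackslashes -nocommands $new]  ;# 1,[clock seconds]
```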
Apart from using regsub+subst (which is a decidedly tricky pair of commands to use safely in general) you can also do relatively simple transformations using string map. The trick is in how you prepare the mapping:
# It's conventional to use [array set] like this…
array set myArray {
one 1
two 2
}
set myString "\[%one%\],\[%two%\]"
# Build the transform
set transform {}
foreach {from to} [array get myArray] {
lappend transform "\[%$from%\]" $to
}
# Apply the transform
set changedString [string map $transform $myString]
puts "transformed from '$myString' to '$changedString'"
As long as each individual thing you want to go from and to is a constant string at the time of application, you can use string map to do it. The advantage? It's obviously correct. It's very hard to make a regsub+subst transform obviously correct (but necessary if you need a more complex transform; that's the correct way to do %XX encoding and decoding in URLs for example).
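For the %XX decoding case, a sketch of the usual regsub+subst idiom (urlDecode is a hypothetical name). It decodes each %XX to the character with that code, so UTF-8 input would additionally need encoding convertfrom utf-8 on the result. The string map step escapes the characters subst would otherwise interpret, which is exactly the part that is easy to get wrong:

```tcl
proc urlDecode {str} {
    # "+" means space; escape \ [ ] $ so subst treats them as plain text.
    set str [string map {+ { } \\ \\\\ [ \\[ ] \\] $ \\$} $str]
    # Rewrite each %XX into a \u00XX backslash escape...
    regsub -all -- {%([0-9A-Fa-f]{2})} $str {\\u00\1} str
    # ...and let subst expand only the backslash escapes.
    return [subst -novariables -nocommands $str]
}

puts [urlDecode a%20b%2Fc]             ;# a b/c
puts [urlDecode x%5Bclock+seconds%5D]  ;# x[clock seconds]
```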

Is there a way to do multiple substitutions using regsub?

Is it possible to do different substitutions in an expression using regsub?
example:
set a ".a/b.c..d/e/f//g"
Now, in this expression, is it possible to substitute
"." as "yes"
".." as "no"
"/" as "true"
"//" as "false" in a single regsub command?
With a regsub, no. There's a long-standing feature request for this sort of thing (which requires substitution with the result of evaluating a command on the match information) but it's not been acted on to date.
But you can use string map to do what you want in this case:
set a ".a/b.c..d/e/f//g"
set b [string map {".." "no" "." "yes" "//" "false" "/" "true"} $a]
puts "changed $a to $b"
# changed .a/b.c..d/e/f//g to yesatruebyescnodtrueetrueffalseg
Note that when building the map, if any from-value is a prefix of another, the longer from-value should be put first. (This is because the string map implementation checks which change to make in the order you list them in…)
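A tiny sketch of why the order matters (the inputs are made up): at each position string map tries the keys in the order given, so a shorter key listed first wins.

```tcl
# "." is tried first, so ".." is never seen as a unit.
puts [string map {. yes .. no} ".."]   ;# yesyes
# Longer key first: ".." is replaced as a whole.
puts [string map {.. no . yes} ".."]   ;# no
```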
It's possible to use regsub and subst to do multiple-target replacements in a two-step process, but I don't advise it for anything other than very complex cases! A nice string map is far easier to work with.
You may also try to do it yourself. This is a draft proc which you could use as a starting point. It is not production ready, and you must be careful because every substitution after the first one works on the already-substituted string.
These are the parameters:
options is a list of options that will be passed to every call to regsub
resubList is a list of key/value pairs, where the key is a regular expression and the value is a substitution
string is the string on which to perform the substitutions
This is the procedure, and it simply calls regsub multiple times, once for every element in resubList and, at the end, it returns the final string.
proc multiregsub {options resubList string} {
foreach {re sub} $resubList {
set string [regsub {*}$options -- $re $string $sub]
}
return $string
}
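A usage sketch for multiregsub with the example string from the question. As with string map, order matters: the patterns for .. and // are listed before those for . and /, and the replacement words happen not to contain . or /, so later passes leave earlier results alone.

```tcl
proc multiregsub {options resubList string} {
    foreach {re sub} $resubList {
        set string [regsub {*}$options -- $re $string $sub]
    }
    return $string
}

set a ".a/b.c..d/e/f//g"
set b [multiregsub -all {{\.\.} no {\.} yes // false / true} $a]
puts $b   ;# yesatruebyescnodtrueetrueffalseg
```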

regsub -all and proc

I'm writing a recursive procedure called "subliner3". Put simply, it replaces:
(1) [object method] with object->method()
(2) [object method:attr1 attr2 ...] with object->method(attr1,attr2,...)
It is recursive so that it can replace (1) and (2) inside (2): any attr may itself look like (1) or (2).
This code causes a problem:
while {[regsub -all {\[([^\[\]:]+)[:]([^\[\]]+)\]} $subline "[subliner3 "\\1" "\\2"]" subline]} {}
This is supposed to find exactly (2) in subline (subline is an attribute list) and call the function again on it. The problem is that when subliner3 is called with regsub's \1 and \2, it really gets the literal "\1" and "\2", so it looks like they are interpreted after the subliner3 call. How can I call [subliner3 "\1" "\2"] with \1 and \2 already substituted?
Sample Input:
[self runAction:[CCSequence actions:[CCDelayTime actionWithDuration:5], [CCCallFunc actionWithTarget:self selector:#selector(resetMessage)], nil]];
Output:
self->runAction(CCSequence::actions(CCDelayTime::actionWithDuration(5), CCCallFunc::actionWithTarget(self, #selector(resetMessage)), nil));
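You can do it (under some assumptions, such as no use of arrays), but you need to work from the inside out and put your substitution code in a loop. Your version fails because of evaluation order: Tcl performs the [subliner3 "\\1" "\\2"] command substitution while it is still assembling the arguments for regsub, so subliner3 runs once, before any match has been made, and only ever sees the literal \1 and \2.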
# Your sample as input
set string {[self runAction:[CCSequence actions:[CCDelayTime actionWithDuration:5], [CCCallFunc actionWithTarget:self selector:#selector(resetMessage)], nil]];}
# Do most of the replacements, recursively.
#
# [^][]+ excludes both brackets, so each pass of the loop only rewrites the
# innermost [...] groups, working from the inside out.
#
# Note that some parts are changed to \001stripExtra stuff\002, because we need
# to do further processing on the arguments which can't quite be done in this
# looped [regsub].
while {[regsub -all {\[(\w+) (\w+):([^][]+)\]} $string \
        "\\1::\\2(\u0001stripExtra \\3\u0002)" string]} {}
# The further processing to do on arguments (removing selectors)
proc stripExtra args {
foreach s $args {
# The lack of a fourth argument means [regsub] returns the string
lappend t [regsub {^\w+:(?!:)} [string trimright $s ","] {}]
}
return [join $t ","]
}
# Apply the further processing by converting to a substitutable string
set string [subst [string map {\u0001 "\[" \u0002 "\]"} $string]]
# Now transformed...
puts $string
The code above is rather brittle as it doesn't actually understand Objective-C, so you should check that its output is reasonable on your real input data…
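To see the evaluation-order problem in isolation, here is a minimal sketch using a hypothetical shout procedure in place of subliner3, and assuming the captured text is plain words. The first regsub calls shout before anything is matched; the two-step version first escapes the input, turns each match into command text, and only then runs it with subst.

```tcl
# Hypothetical helper standing in for subliner3: upper-cases one word.
proc shout {word} {
    return [string toupper $word]
}

set input {hello [world] $x}

# Wrong: [shout \\1] runs once, before regsub is called, and returns the
# literal two characters \1, so every word is replaced by itself.
set wrong [regsub -all {(\w+)} $input "[shout \\1]"]

# Step 1: escape characters that subst would otherwise interpret.
set safe [string map {\\ \\\\ [ \\[ ] \\] $ \\$} $input]
# Step 2: braces keep [shout \1] as text for regsub; subst then runs the calls.
set right [subst [regsub -all {(\w+)} $safe {[shout \1]}]]

puts $wrong   ;# hello [world] $x
puts $right   ;# HELLO [WORLD] $X
```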
One possible solution is:
expr \"[regsub -all {(1)} "number 1" {[getSp \1]}]\"
First, regsub puts the text captured by (1) in the \1 position; then expr calls getSp with that captured text rather than with a literal \1.
But for this solution to work, I need to make sure that the string returned by regsub contains [ ] only around procedure calls, and that is not always the case. For example, the string after the regsub call may look like this:
[self runAction:[CCSequence actions:[subliner3 "CCDelayTime actionWithDuration" "5"], [subliner3 "CCCallFunc actionWithTarget" "self selector:#selector(resetMessage)"], nil]];
Here only subliner3 is a procedure.

TCL, regexp function and parameter substitution

I have the following code where I'm trying to match data on a single line into different variables via the regexp function.
The number of data items on the input line, and therefore the number of variable names passed to regexp, can vary; that's why I use $varLine (which is built earlier in my real code).
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}
set varLine "sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig9"
regexp $regex $in_stim whole sig0 $varLine
puts "sig0: $sig0"
puts $sig1
When I am executing it, I get the following error ($sig0 is correctly displayed):
sig0: 13
can't read "sig1": no such variable
while executing
"puts $sig1"
If I manually substitute $varLine into the regexp line, the error disappears:
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}
regexp $regex $in_stim whole sig0 sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig9
puts $sig0
puts $sig1
I get the following correct output:
13
1
Does anyone see a mistake in my code, or can anyone help?
Thanks!
The issue is that the regexp command doesn't take the submatch variables as a single list argument, but as separate arguments, one per variable.
The simplest method of working around this is to expand the variable list:
regexp $regex $in_stim whole sig0 {*}$varLine
When you do
set varLine "sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig9"
regexp $regex $in_stim whole sig0 $varLine
the Tcl parser passes the regexp command exactly five arguments, the fifth being the contents of the variable "varLine", which the regexp command then treats as a single word. A single word denotes a single variable (with a somewhat complex name in your case, as it happens).
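You can observe this directly. A small sketch using the question's data (with the last name written as sig9): after the call, one variable whose name is the whole string in $varLine holds the second submatch, and sig1 does not exist.

```tcl
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}
set varLine "sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig9"
# $varLine is passed as ONE argument, i.e. one variable name.
regexp $regex $in_stim whole sig0 $varLine
# [set $varLine] reads the variable whose name is the whole string.
puts [set $varLine]      ;# 1
puts [info exists sig1]  ;# 0
```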
To do what you need, you have to resort to dynamic scripting which can be done in two ways:
"Classic" approach using eval:
eval [concat [list regexp $regex $in_stim whole sig0] $varLine]
Using the {*} syntactic sugar from Tcl 8.5 onwards:
regexp $regex $in_stim whole sig0 {*}$varLine
The classic approach first constructs a list of words by concatenating two lists: the "static" part of the command and the list of variables to pass to it. The constructed list is then evaluated as a command. You can read more on this in this classic book.
In the new-style approach, {*} expands the list in $varLine in place, so each element becomes a separate argument to regexp; see rule [5], "Argument expansion", in the Tcl(n) manual page.
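Since the question says $varLine is computed in the real code, one possible way to build it (a sketch, assuming one capture group per whitespace-separated field of the input, as in the sample) is to generate the names from the field count and pass them with {*}:

```tcl
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}

# Assumption: one capture group per field, so the names are sig0 .. sig9 here.
set varNames {}
for {set i 0} {$i < [llength $in_stim]} {incr i} {
    lappend varNames sig$i
}

# {*} expands the list so regexp receives each name as its own argument.
if {[regexp $regex $in_stim whole {*}$varNames]} {
    puts "sig0=$sig0 sig1=$sig1 sig8=$sig8 sig9=$sig9"   ;# sig0=13 sig1=1 sig8=2 sig9=03
}
```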