TCL, regexp function and parameter substitution - regex

I have the following code, in which I'm trying to match data on a single line into different variables using the regexp command.
The number of data items on the input line, and therefore the number of variable names passed to regexp, can vary; that's why I use $varLine (which is built earlier in my real code).
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}
set varLine "sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig9"
regexp $regex $in_stim whole sig0 $varLine
puts "sig0: $sig0"
puts $sig1
When I run it, I get the following error (although $sig0 is displayed correctly):
sig0: 13
can't read "sig1": no such variable
while executing
"puts $sig1"
If I manually substitute $varLine into the regexp line, the error disappears:
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}
regexp $regex $in_stim whole sig0 sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig9
puts $sig0
puts $sig1
I get the following correct output:
13
1
Does anyone see a mistake in my code, or can anyone help?
Thanks!

The issue is that the regexp command doesn't take the variables to store submatches in as a single list argument; it expects each variable name as a separate argument.
The simplest method of working around this is to expand the variable list:
regexp $regex $in_stim whole sig0 {*}$varLine
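Putting this together, the question's script with the fix applied might look like the sketch below. It uses sig1 through sig9, one name per remaining capture group, so that no submatch overwrites another.

```tcl
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}
# One variable name per remaining capture group (sig1..sig9)
set varLine "sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig9"
# {*} splits $varLine into separate words, so regexp sees 10 submatch variables
if {[regexp $regex $in_stim whole sig0 {*}$varLine]} {
    puts "sig0: $sig0"   ;# sig0: 13
    puts "sig1: $sig1"   ;# sig1: 1
    puts "sig9: $sig9"   ;# sig9: 03
}
```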

When you do
set varLine "sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig9"
regexp $regex $in_stim whole sig0 $varLine
the Tcl parser passes exactly five arguments to the regexp command, the fifth being the contents of the variable varLine, which regexp then treats as a single word. A single word denotes a single variable (one with a rather complex name in your case, as it happens).
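A quick way to see this is to check which variables exist after the unexpanded call. This sketch uses sig1 through sig9 as the variable list; everything else is taken from the question.

```tcl
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}
set varLine "sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig9"
# Without {*}, the whole string is used as ONE variable name
regexp $regex $in_stim whole sig0 $varLine
puts [info exists sig1]       ;# 0: no variable called sig1 was set
puts [info exists $varLine]   ;# 1: a variable whose name contains spaces exists
puts [set $varLine]           ;# 1: it received the second submatch
```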
To do what you need, you have to resort to dynamic scripting which can be done in two ways:
"Classic" approach using eval:
eval [concat [list regexp $regex $in_stim whole sig0] $varLine]
Using the {*} syntactic sugar from Tcl 8.5 onwards:
regexp $regex $in_stim whole sig0 {*}$varLine
The classic approach first constructs a list of words by concatenating two lists: the "static" part of the command and then the list of variables to pass to it. Then the constructed list is evaluated as a command. You can read more on this in this classic book.
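To see what eval actually receives, it can help to build the command list first and inspect it before evaluating. A minimal sketch, again assuming sig1 through sig9 as the variable names:

```tcl
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}
set varLine "sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig9"
# [list ...] quotes the static words safely (the regex contains backslashes)
set cmd [concat [list regexp $regex $in_stim whole sig0] $varLine]
puts [llength $cmd]   ;# 14 words: regexp, 4 fixed arguments, 9 variable names
eval $cmd
puts "sig1: $sig1, sig9: $sig9"
```

Using list for the static part matters: building the command with plain string concatenation would let the regex and the input string be re-parsed by eval.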
In the new-style approach, the {*} prefix expands the contents of $varLine in place, so that each element of the list becomes a separate argument; see rule [5], "Argument expansion", in the Tcl(n) manual page.
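The difference is easy to observe with a small argument-counting proc. countArgs is a hypothetical helper written only for this illustration:

```tcl
# Hypothetical helper: reports how many arguments it received
proc countArgs {args} {
    return [llength $args]
}
set varLine "sig1 sig2 sig3"
puts [countArgs whole sig0 $varLine]      ;# 3: $varLine is passed as one word
puts [countArgs whole sig0 {*}$varLine]   ;# 5: each list element becomes its own word
```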

Related

Is there a Perl regex metacharacter or a way to have specify a default value, if a subpattern capture does not match?

Here's the idea. I am parsing command line options, but across the entire command line, not by each @ARGV element separately.
program --format="%H:%M:%S" --timeout 12 --nofail
I want the parsing to work with these cases.
--name=value: easy to parse
--name value: pretty easy
--name with no value: default the value to 1
Here is the regex that works, except that it cannot handle the missing-value case:
%options = "@ARGV" =~ /--([A-Za-z]+)[= ]([^-]\S*)/g;
That is, match --name=value or --name value, but not --name --name: --name --name is two names, not a name=value pair.
If a --name has no following value that matches the second capture group in the regex, is there a way, within the regex, to specify a default (in my case 1) to indicate "true"? That is, if a --name has no argument, like --nofail, set its value to 1, meaning true.
Actually, while asking this I figured out a workaround using separate match statements, which is fine. However, just out of curiosity, the question still stands: is there a Perl regex way to supply a default when a submatch fails?
I don't see how to return a list reflecting a changed input from a regex alone. To change the input we need the s{}{}er operator, since we need code in its replacement part to analyze the captures and decide what to change; and we then get a string, not a list, which needs to be processed further (split).
Here is one such take, with minimal intrusion of code.
Match the name and value, with = or whitespace between them, and if the value ($2) is undefined give it a default; we need /e to implement that.† While we're at it, put a space between all name-value pairs. This all goes under /r, so that the changed string is returned and passed through split:
my %arg = split ' ',
$args =~ s{ --(\w+) (?: =|\s+|\z) ([^-]\S*)? }{ $1.' '.($2//'1 ') }ergx;
The split could be done by another regex instead, but that is still extra processing.
A complete program (with more flags added to the input):
use warnings;
use strict;
use feature 'say';
my $args = shift // q(--fmt="%H:%M" --f1 --time 12 --f2 --f3);
say $args;
my %arg = split ' ',
$args =~ s{ --(\w+) (?: =|\s+|\z) ([^-]\S*)? }{ $1 . ' ' . ($2//'1 ') }ergx;
say "$_ => $arg{$_}" for keys %arg;
This prints as expected. But note that there may be edge cases, and in particular having a space inside (a quoted) argument value, like "%H %M", would require a far more complex pattern.
I presume that the regex question is for play/study. Normally this is done with a library such as Getopt::Long. If that is somehow not possible, then processing @ARGV term by term is nice and easy, and fast.
† To actually do "if the value ($2) is undefined, give it a value" we need to run code in the replacement part, which is what the /e modifier does.

value of binding operator expression in perl

I'm unsure what a binding-operator expression in Perl evaluates to. I mean an expression like
string =~ /pattern/
I ran some simple tests:
$ss="a1b2c3";
say $ss=~/a/; # 1
say $ss=~/[a-z]/g; # abc
@aa=$ss=~/[a-z]/g;say @aa; # abc
$aa=@aa;say $aa; # 3
$aa=$ss=~/[a-z]/g;say $aa; # 1
Note that the comments above show the output of each line.
So here is the question: what exactly does $ss=~/[a-z]/g return? Judging by code lines 3, 4 and 5, it seems to return an array. But then why does the last line give 1 instead of 3, the length of that array?
The return value of the match operator depends on context: in list context it returns all captured matches, in scalar context true or false. say imposes list context, but in the first example nothing is captured by the regex, so you only get "success" (1).
Next, the behavior of /g modifier also differs across contexts. In list context, with it the string keeps being scanned with the given pattern until all matches are found, and a list with them is returned. These are your second and third examples.
In scalar context, however, its behavior is special: with /g, the next search continues from the position where the last match ended. One typical use is in a loop condition:
while (/(\w+)/g) { ... }
This works a bit like a tokenizer: each time the loop body has run, the condition finds the next word, and so on.
The last example, then, doesn't really make sense: you get the "normal" scalar-context match success/failure, and /g has no visible effect until you match on $ss the next time:
perl -wE'
$s=shift||q(abc);
for (1..2) { $m = $s=~/(.)/g; say "$m: $1"; }
'
prints the lines "1: a" and then "1: b".
Outside of iterative constructs (like a while condition), /g in scalar context is usually a mistake: pointless at best, a quiet bug at worst.
See "Global matching" under "Using regular expressions" in perlretut for /g.
See "Regexp Quote-Like Operators" in perlop for the regex operators in general and /g in particular. A useful tool for exploring how /g works is pos.

Is it possible to match only the subexpression in Tcl regexp?

$report contains the following text:
// Command : generate report
Report 123
------------------------------
status Names
------------------------------
Flat : Module1
Flat : Module2
------------------------------
Total Flattened = 2
I want to extract the module names only. There is an unknown number of modules. It would be really nice if I could do something like this:
set modules [regexp -all -inline {Flat\s+:\s+(\S+)} $report]
but that puts a bunch of extra junk that I don't care about into $modules. Am I missing something? I know there are ways around this; it just seems strange that there is no way to turn off matching of the full expression, especially since there is syntax, (?:), for turning off subexpression matching.
No, there is no way to suppress the full match strings.
lmap {full capture} $modules {set capture}
This (Tcl 8.6 and later) picks out the captured strings for you.
# for Tcl 8.5 and earlier
set res [list]
foreach {full capture} $modules {lappend res $capture}
You get all that stuff because it could be relevant, and if not it's very easy to pick out the bits you do want.
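An end-to-end sketch using the report text from the question, showing both the raw -inline result and the filtered list (requires Tcl 8.6 or later for lmap):

```tcl
# Sample report text from the question
set report {// Command : generate report
Report 123
------------------------------
status Names
------------------------------
Flat : Module1
Flat : Module2
------------------------------
Total Flattened = 2}

# -all -inline returns {full-match capture full-match capture ...}
set matches [regexp -all -inline {Flat\s+:\s+(\S+)} $report]
# prints: {Flat : Module1} Module1 {Flat : Module2} Module2
puts $matches

# Keep only the capture from each pair
set modules [lmap {full capture} $matches {set capture}]
puts $modules   ;# Module1 Module2
```

Note that "Total Flattened" is not matched, because the pattern requires whitespace right after "Flat".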
Documentation: foreach, lmap, set
Getting lmap for Tcl 8.5 and earlier

How do I use regex capture group as array index?

I'm trying to use regsub in TCL to replace a string with the value from an array.
array set myArray "
one 1
two 2
"
set myString "\[%one%\],\[%two%\]"
regsub -all "\[%(.+?)%\]" $myString "$myArray(\\1)" newString
My goal is to convert a string from "[%one%],[%two%]" to "1,2". The problem is that the capture group index is not resolved. I get the following error:
can't read "myArray(\1)": no such element in array
while executing
"regsub -all "\[%(.+?)%\]" $myString "$myArray(\\1)" newString"
This is a two-step process in Tcl. Your main mistake here is using double quotes everywhere:
array set myArray {one 1 two 2}
set myString {[%one%],[%two%]}
regsub -all {\[%(.+?)%\]} $myString {$myArray(\1)} new
puts $new
puts [subst -nobackslashes -nocommands $new]
Output:
$myArray(one),$myArray(two)
1,2
So we use regsub to search for the expression and replace it with the string representation of the variable we want to expand. Then we use the rarely used subst command to perform variable substitution only.
Apart from using regsub+subst (which is a decidedly tricky pair of commands to use safely in general) you can also do relatively simple transformations using string map. The trick is in how you prepare the mapping:
# It's conventional to use [array set] like this…
array set myArray {
one 1
two 2
}
set myString "\[%one%\],\[%two%\]"
# Build the transform
set transform {}
foreach {from to} [array get myArray] {
lappend transform "\[%$from%\]" $to
}
# Apply the transform
set changedString [string map $transform $myString]
puts "transformed from '$myString' to '$changedString'"
As long as each individual thing you want to map from and to is a constant string at the time of application, you can use string map to do it. The advantage? It's obviously correct. It's very hard to make a regsub+subst transform obviously correct (but that pair is necessary if you need a more complex transform; it's the correct way to do %XX encoding and decoding in URLs, for example).
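To illustrate one way regsub+subst can go wrong: subst also expands any $ that was already present in the input. The sketch below uses a hypothetical variable price and an input containing a literal "$price". The mitigation shown (escaping existing backslashes and dollar signs before regsub, then letting subst undo that escaping) is one possible approach, not a complete solution; it assumes plain placeholder names such as one and two.

```tcl
array set myArray {one 1 two 2}
set price 100
# Input that happens to contain a literal "$price"
set myString {[%one%] costs $price}

# Naive two-step approach: subst also expands the literal $price
regsub -all {\[%(.+?)%\]} $myString {$myArray(\1)} naive
set naiveResult [subst -nobackslashes -nocommands $naive]
puts $naiveResult   ;# 1 costs 100

# Mitigation sketch: escape existing \ and $ first, then let subst undo the escaping
set escaped [string map [list \\ \\\\ \$ \\\$] $myString]
regsub -all {\[%(.+?)%\]} $escaped {$myArray(\1)} safe
set safeResult [subst -nocommands $safe]
puts $safeResult    ;# 1 costs $price
```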

Is there a way to do multiple substitutions using regsub?

Is it possible to do several different substitutions on a string using regsub?
example:
set a ".a/b.c..d/e/f//g"
Now, in this string, is it possible to substitute
"." as "yes"
".." as "no"
"/" as "true"
"//" as "false" in a single regsub command?
With a regsub, no. There's a long-standing feature request for this sort of thing (which requires substitution with the result of evaluating a command on the match information) but it's not been acted on to date.
But you can use string map to do what you want in this case:
set a ".a/b.c..d/e/f//g"
set b [string map {".." "no" "." "yes" "//" "false" "/" "true"} $a]
puts "changed $a to $b"
# changed .a/b.c..d/e/f//g to yesatruebyescnodtrueetrueffalseg
Note that when building the map, if any from-value is a prefix of another, the longer from-value should be put first. (This is because the string map implementation checks which change to make in the order you list them in…)
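A tiny demonstration of why the order matters, using the "." and ".." keys from the question:

```tcl
set s ".."
# Shorter key first: "." matches at each position before ".." is ever tried
puts [string map {"." "yes" ".." "no"} $s]   ;# yesyes
# Longer key first: ".." wins
puts [string map {".." "no" "." "yes"} $s]   ;# no
```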
It's possible to use regsub and subst to do multiple-target replacements in a two-step process, but I don't advise it for anything other than very complex cases! A nice string map is far easier to work with.
You can also try to do it yourself. Here is a draft proc you could use as a starting point. It is not production-ready, and you must be careful, because each substitution after the first one works on the already-substituted string.
These are the parameters:
options is a list of options that will be passed to every call to regsub
resubList is a list of key/value pairs, where the key is a regular expression and the value is a substitution
string is the string you want to substitute
This is the procedure, and it simply calls regsub multiple times, once for every element in resubList and, at the end, it returns the final string.
proc multiregsub {options resubList string} {
foreach {re sub} $resubList {
set string [regsub {*}$options -- $re $string $sub]
}
return $string
}
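A usage sketch for the question's input (the proc is repeated so the block runs on its own). Because the substitutions run one after another, this only works here because the longer patterns come first and none of the replacement words contain "." or "/"; with other inputs the order would need the same care.

```tcl
proc multiregsub {options resubList string} {
    foreach {re sub} $resubList {
        set string [regsub {*}$options -- $re $string $sub]
    }
    return $string
}
set a ".a/b.c..d/e/f//g"
# Longer patterns first; the regex metacharacter "." is escaped
set result [multiregsub {-all} {{//} false / true {\.\.} no {\.} yes} $a]
puts $result   ;# yesatruebyescnodtrueetrueffalseg
```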