Is it possible to match only the subexpression in a Tcl regexp?

$report contains the following text:
// Command : generate report
Report 123
------------------------------
status Names
------------------------------
Flat : Module1
Flat : Module2
------------------------------
Total Flattened = 2
I want to extract the module names only. There is an unknown number of modules. It would be really nice if I could do something like this:
set modules [regexp -all -inline {Flat\s+:\s+(\S+)} $report]
but that puts a bunch of extra junk in $modules that I don't care about. Am I missing something? I know there are ways of getting around this. It just seems strange that there doesn't seem to be a way to turn off matching the full expression. Especially since there is syntax for turning off subexpression matching (?:).

No, there is no way to suppress the full-match strings: regexp -all -inline always returns each full match followed by its captures. With Tcl 8.6, lmap picks out the captured strings for you:
lmap {full capture} $modules {set capture}
# for Tcl 8.5 and earlier
set res [list]
foreach {full capture} $modules {lappend res $capture}
You get all that stuff because it could be relevant, and if not it's very easy to pick out the bits you do want.
Documentation: foreach, lmap, set
Getting lmap for Tcl 8.5 and earlier
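To make the answer above concrete, here is a minimal, self-contained sketch that runs the asker's pattern against the sample report and keeps only the captures. It uses foreach so that it also works on Tcl 8.5; the variable name pairs is introduced here purely for illustration.

```tcl
# Sample report text, copied from the question.
set report {// Command : generate report
Report 123
------------------------------
status Names
------------------------------
Flat : Module1
Flat : Module2
------------------------------
Total Flattened = 2}

# -all -inline returns a flat list: fullMatch capture fullMatch capture ...
set pairs [regexp -all -inline {Flat\s+:\s+(\S+)} $report]

# Walk the list two elements at a time and keep only the capture.
set modules [list]
foreach {full capture} $pairs {
    lappend modules $capture
}
puts $modules   ;# prints: Module1 Module2
```

With one capture group the stride is two; a pattern with two capture groups would need a stride of three, such as foreach {full a b}.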

Related

Need help on a simple TCL script regarding regexp

I'm trying to write a tcl script basically to do following. Based on the syslog below,
LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/23, changed state to down
When that log is seen on the router, the script needs to push out a shell command which includes the interface index number, in this case it is 23. I can use regex to scrape the interface in the syslog by doing this below.
set interface ""
if {[regexp {.* (GigabitEthernet1/0/[0-9]*)} $syslog_msg match interface]} {
if [catch {cli_exec $cli1(fd) "show port_diag unit 1 port 23"} result] {
error $result $errorInfo
}
}
But how can I use only the interface index number (which is 23) in the command above? Do I need to extract [0-9]* from the regexp and store it as a variable or something like that?
Just enclose the expression [0-9]* in parentheses and append a variable name, say num, to receive the second capture group.
Here is a code snippet to demonstrate:
if {[regexp {.* (GigabitEthernet1/0/([0-9]*))} $syslog_msg match interface num]} {
puts $num
}
Output:
23
If the result looks okay, modify the command within the curly braces to perform your task as:
if {[regexp {.* (GigabitEthernet1/0/([0-9]*))} $syslog_msg match interface num]} {
if [catch {cli_exec $cli1(fd) "show port_diag unit 1 port $num"} result] {
error $result $errorInfo
}
}
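As a variation on the answer above, the following sketch captures only the port number and builds the command string from it. It is an illustration under a few assumptions: \d+ is used instead of [0-9]* so that an empty index cannot match, -> is just a conventional throwaway variable name for the full match, and cli_exec is left out because it is assumed to exist only in the router's own Tcl environment.

```tcl
# Syslog line from the question.
set syslog_msg "LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/23, changed state to down"

set cmd ""
# The single capture group holds the port index; "->" receives the full match.
if {[regexp {GigabitEthernet1/0/(\d+)} $syslog_msg -> num]} {
    set cmd "show port_diag unit 1 port $num"
}
puts $cmd   ;# prints: show port_diag unit 1 port 23
```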

How do I use regex capture group as array index?

I'm trying to use regsub in TCL to replace a string with the value from an array.
array set myArray "
one 1
two 2
"
set myString "\[%one%\],\[%two%\]"
regsub -all "\[%(.+?)%\]" $myString "$myArray(\\1)" newString
My goal is to convert a string from "[%one%],[%two%]" to "1,2". The problem is that the capture group index is not resolved. I get the following error:
can't read "myArray(\1)": no such element in array
while executing
"regsub -all "\[%(.+?)%\]" $myString "$myArray(\\1)" newString"
This is a two-step process in Tcl. Your main mistake here is using double quotes everywhere, so that $myArray(\\1) is substituted by the Tcl parser before regsub ever runs:
array set myArray {one 1 two 2}
set myString {[%one%],[%two%]}
regsub -all {\[%(.+?)%\]} $myString {$myArray(\1)} new
puts $new
puts [subst -nobackslash -nocommand $new]
Output:
$myArray(one),$myArray(two)
1,2
So we use regsub to search for the expression and replace it with the string representation of the variable we want to expand. Then we use the rarely-used subst command to perform the variable (only) substitution.
Apart from using regsub+subst (which is a decidedly tricky pair of commands to use safely in general) you can also do relatively simple transformations using string map. The trick is in how you prepare the mapping:
# It's conventional to use [array set] like this…
array set myArray {
one 1
two 2
}
set myString "\[%one%\],\[%two%\]"
# Build the transform
set transform {}
foreach {from to} [array get myArray] {
lappend transform "\[%$from%\]" $to
}
# Apply the transform
set changedString [string map $transform $myString]
puts "transformed from '$myString' to '$changedString'"
As long as each individual thing you want to go from and to is a constant string at the time of application, you can use string map to do it. The advantage? It's obviously correct. It's very hard to make a regsub+subst transform obviously correct (but necessary if you need a more complex transform; that's the correct way to do %XX encoding and decoding in URLs for example).
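To illustrate why regsub+subst needs care, here is one possible sketch of a safer variant: it first backslash-escapes the characters that subst acts on, then rewrites the markers, then runs subst. The sample string and the variable names escaped, template and result are made up for this example. It is one way of doing it, not the only one.

```tcl
array set myArray {one 1 two 2}

# Input that also contains characters subst would normally act on.
set myString {[%one%] costs $5, [%two%] is in [brackets]}

# Step 1: backslash-escape every character that subst treats specially.
set escaped [string map {\\ \\\\ $ \\$ [ \\[ ] \\]} $myString]

# Step 2: turn each (now escaped) marker \[%name%\] into $myArray(name).
regsub -all {\\\[%(.+?)%\\\]} $escaped {$myArray(\1)} template

# Step 3: subst expands the array references and removes the escapes again.
set result [subst $template]
puts $result   ;# prints: 1 costs $5, 2 is in [brackets]
```

Without step 1, subst would try to read a variable named 5 and to run a command named brackets.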

Tcl switch statement and -regexp quoting?

There must be an extra level of evaluation when using the Tcl switch command. Here's an example session at the repl to show what I mean:
$ tclsh
% regexp {^\s*foo} " foo"
1
% regexp {^\\s*foo} " foo"
0
% switch -regexp " foo" {^\\s*foo {puts "match"}}
match
% switch -regexp " foo" {{^\s*foo} {puts "match"}}
match
...There needed to be an extra backslash added inside the first "switch" version. This is consistent between 8.5.0 and 8.6.0 on Windows. Can someone point me to the section of the manual where this and similar instances of extra levels of unquoting are described? My first thought was that the curly brackets in the first "switch" version would have protected the backslash, but "switch" itself must be applying an extra level of backslash substitution to the patterns. Unless I'm misunderstanding the nuances of something else.
Edit:
...hmmm... Like Johannes Kuhn says below, backslash substitution apparently depends on the dynamic context of use, and not the lexical context of creation...
% set t {\s*foo}
\s*foo
% proc test_s x {set x}
% test_s $t
\s*foo
% proc test_l x {lindex $x 0}
% test_l $t
s*foo
% puts $t
\s*foo
...that seems to be quite the interesting design choice.
The problem you describe here is simple to explain:
The difference between switch and regexp is that switch actually takes a list.
So if we print the first element of the list {^\s*foo {puts "match"}} with
% puts [lindex {^\s*foo {puts "match"}} 0]
^s*foo
the result is not what we want: list parsing has removed the backslash.
Constructing lists by hand is a little bit complex; if you are not sure, use an interactive Tcl shell and let the list command construct one for you.
Edit: Indeed, it is an interesting design choice, but it applies to everything in Tcl. For example, expr uses a mini-language designed for arithmetic expressions. It is up to each command what it does with its arguments. Even language constructs like if, for, and while are just commands that treat one of their arguments as an expression and the others as scripts. This design makes it possible to create new control structures, like sqlite's eval, which takes an SQL statement and a script that it evaluates for each result.
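The list-parsing effect described above can be checked with a short sketch. The variable names handWritten, built and outcome are only for illustration; the point is that building the pattern/body list with the list command keeps the backslash, while a hand-written unbraced element loses it.

```tcl
# Written by hand: list parsing strips the backslash from an unbraced element.
set handWritten {^\s*foo {set outcome match}}
puts [lindex $handWritten 0]    ;# prints: ^s*foo

# Built with [list]: each argument is quoted so that it survives intact.
set built [list {^\s*foo} {set outcome match}]
puts [lindex $built 0]          ;# prints: ^\s*foo

# switch accepts the pattern/body pairs as a single list argument.
set outcome none
switch -regexp -- " foo" $built
puts $outcome                   ;# prints: match
```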

Match Different Word Order Regex

I have really been struggling trying to match a relatively simple set of possible word orders in a single Regex line.
Basically, I want to match these (among other grammatically similar) possibilities:
"set the var on"
"set the var off"
"set var on"
"set var off"
"set off the var"
"set on the var"
"set on var"
"set off var"
The only groups I need are "var" (which can be any single word) and the value, which will always be either on or off. That's the basic idea.
With that in mind, there are two possible grammar structures:
(on/off) (perhaps a word) (a word)
(a word) (on/off)
I have been able to independently match these possibilities with the following regex:
/((on |off )([a-z]{1,})? ([a-z]{2,}))/i
/([a-z]{2,}) (on|off)/i
So, I figured I could do this:
/(((on |off )([a-z]{1,})? ([a-z]{2,})))|(([a-z]{2,}) (on|off))/i
Which is just (phrase 1)|(phrase 2), but phrase two will always match against "set off" thinking that "set" is the name. I also tried:
/((?!set)) (((on |off )([a-z]{1,})? ([a-z]{2,})))|(([a-z]{2,}) (on|off))/i
With no success.
EDIT 1: Also, I neglected to mention that these phrases can be found anywhere in the file; they are not on independent lines.
E.g.: "this is the way to set the var on" is as likely as "set the var on"
Questions:
What is the best way that I can do this together without having to separately match?
Is there a way to force a matching order for regex OR statements?
'the' may optionally appear before 'var':
((the )?var)
'set' always begins the expression:
^set
'on' and 'off' are mutually exclusive but one is required:
(on|off)
'var' and 'on'/'off' appear one after the other in no particular order. All together now:
^set ((the )?var (on|off)|(on|off) (the )?var)$
Note: I'm a .NET developer. Regexes are pretty standard, and the above should work, but there may be a more efficient way to write this in Perl.
Whenever you try to match complex data, you should probably try to create a grammar. Perl regexes allow you to specify a recursive grammar via (?(DEFINE)...).
use strict; use warnings; use feature 'say';
my $grammar = qr(
set \s+ (?:the \s+)? (?<variable>(?&VAR)) \s+ (?:to \s+)? (?<value>(?&VAL))
| set \s+ (?<value>(?&VAL)) \s+ (?:the \s+)? (?<variable>(?&VAR))
(?(DEFINE)
(?<VAL> on | off) # edit only here to add new values
(?<VAR> (?!the|(?&VAL)) \w+)
)
)x; # /x -- whitespace is irrelevant
while(<>){
if (/$grammar/) { say "> val: $+{value} var: $+{variable}" }
else { say "> no match" }
}
Syntax to note: (?&rule) calls a named rule. (?<name>pattern) is a named capture, accessible via the %+ hash; the same syntax is also used to declare rules in the (DEFINE) block.
Example session:
set the switch to off!
> val: off var: switch
I would like to set something on fire...
> val: on var: something
set on the set!
> val: on var: set
set on the set off something
> val: on var: set
set on off
> no match
Do note that I made the grammar fairly unambiguous by asserting that a variable cannot also match a value. However, the examples above do show some interesting cases that may not be parsed the way you would expect.
For a more powerful way to write grammars inside regexes, look at Regexp::Grammars.
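For readers following the Tcl threads on this page, the same two word orders can be approximated in Tcl, which has no (?(DEFINE)...) grammars but does support negative lookahead and the \m and \M word-boundary escapes. This is only a rough sketch: the proc name parseSetPhrase is hypothetical, and trying the two orders one after the other is a design choice made here, not something taken from the answers above.

```tcl
# Returns {variable value} for the first "set ..." phrase found, or {} if none.
proc parseSetPhrase {text} {
    # Order 1: set [the] VAR [to] on|off
    set order1 {\mset\s+(?:the\s+)?(?!(?:the|on|off)\M)(\w+)\s+(?:to\s+)?(on|off)\M}
    # Order 2: set on|off [the] VAR
    set order2 {\mset\s+(on|off)\s+(?:the\s+)?(?!(?:the|on|off)\M)(\w+)}

    if {[regexp -nocase $order1 $text -> var val]} {
        return [list $var $val]
    }
    if {[regexp -nocase $order2 $text -> val var]} {
        return [list $var $val]
    }
    return {}
}

puts [parseSetPhrase "set the switch to off!"]   ;# prints: switch off
puts [parseSetPhrase "set on the set!"]          ;# prints: set on
```

The negative lookahead plays the role of the VAR rule in the Perl grammar: a variable name may not be the word the, on or off.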

TCL, regexp function and parameter substitution

I have the following code where I'm trying to match data on a single line into different variables via the regexp function.
The number of values on the input line, and therefore the number of variable names passed to regexp, can vary; that's why I use $varLine (which is built earlier in my real code).
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}
set varLine "sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig8"
regexp $regex $in_stim whole sig0 $varLine
puts "sig0: $sig0"
puts $sig1
When I am executing it, I get the following error ($sig0 is correctly displayed):
sig0: 13
can't read "sig1": no such variable
while executing
"puts $sig1"
If I manually substitute $varLine into the regexp line, the error disappears:
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}
regexp $regex $in_stim whole sig0 sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig8
puts $sig0
puts $sig1
I get the following correct output:
13
1
Does anyone see mistakes in my code or could help?
Thanks!
The issue is that the regexp command doesn't take a list of variables to store submatches into as one argument, but rather as many arguments.
The simplest method of working around this is to expand the variable list:
regexp $regex $in_stim whole sig0 {*}$varLine
When you do
set varLine "sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig8"
regexp $regex $in_stim whole sig0 $varLine
the Tcl parser passes the regexp command exactly five arguments, with the fifth argument being the contents of the variable "varLine", which is then treated by the regexp command as a single word. A single word obviously denotes a single variable (with a somewhat complex name in your case, as it happens).
To do what you need, you have to resort to dynamic scripting which can be done in two ways:
"Classic" approach using eval:
eval [concat [list regexp $regex $in_stim whole sig0] $varLine]
Using the {*} syntactic sugar from Tcl 8.5 onwards:
regexp $regex $in_stim whole sig0 {*}$varLine
The classic approach first constructs a list of words by concatenating two lists: the "static" part of the command and then the list of variables to pass to it. Then the constructed list is evaluated as a command. You can read more on this in this classic book.
In the new-style approach, the {*} prefix expands the contents of $varLine into separate words in place; refer to rule [5] (argument expansion) of the Tcl syntax manual page.
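The difference in argument count can be observed directly with a small sketch. The helper proc countArgs is hypothetical and exists only to show how many words a command receives; the shortened pattern and variable list are simplified stand-ins for the ones in the question.

```tcl
# Hypothetical helper: reports how many arguments it was called with.
proc countArgs {args} {
    return [llength $args]
}

set varLine "sig1 sig2 sig3"

puts [countArgs whole sig0 $varLine]       ;# prints: 3  (varLine is one word)
puts [countArgs whole sig0 {*}$varLine]    ;# prints: 5  (varLine is expanded)

# Without {*}, regexp stores the second capture in ONE variable whose
# name is the whole string "sig1 sig2 sig3".
regexp {(\d+)\s(\d+)} "13 1" whole sig0 $varLine
puts [info exists sig1]                    ;# prints: 0
puts [set {sig1 sig2 sig3}]                ;# prints: 1
```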