Get multiple matches with tcl regexp

How do I get all the matches in Tcl using the regexp command? For example, I have the following string:
set line "foo \"aaa\"zz bar \"aaa:ccc\" ccc"
puts $line
foo "aaa"zz bar "aaa:ccc" ccc
I now want to get aaa and aaa:ccc. There could be any number of such matches.
I tried
regexp -all -inline {"(.*)"} $line
{"aaa"zz bar "aaa:ccc"} {aaa"zz bar "aaa:ccc}
But as seen this didn't work. What's the right way to get multiple matches and match everything within double quotes?

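You can capture everything between two quotes with the "([^"]*)" pattern. The negated class [^"] cannot cross a quote, so each match stops at the nearest closing quote, unlike the greedy .* in the question. When the pattern is used with regexp -all -inline: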
set matches [regexp -all -inline {"([^"]*)"} $line]
you get a flat list that alternates each overall match with its captured substring.
Post-process that list to keep only the captures:
set line "foo \"aaa\"zz bar \"aaa:ccc\" ccc"
set matches [regexp -all -inline {"([^"]*)"} $line]
set res {}
foreach {match capture} $matches {
    lappend res $capture
}
puts $res
Output:
aaa aaa:ccc

An alternate method to extract the quoted strings is:
set unquoted [lmap quoted [regexp -all -inline {".*?"} $line] {string trim $quoted {"}}]
Or, split the string using quote as the delimiter, and take every second field.
set unquoted [lmap {a b} [split $line {"}] {set b}]
That gives you a trailing empty element since this split invocation results in a list with an odd number of elements.
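To drop that trailing empty element, one option is to discard the last field before pairing. This is only a minimal sketch that assumes the quotes in $line are balanced; the variable names fields, outside and inside are illustrative.

```tcl
set line "foo \"aaa\"zz bar \"aaa:ccc\" ccc"

# Splitting on the quote character alternates outside/inside fields.
set fields [split $line {"}]

# With balanced quotes the field count is odd, so drop the last
# (outside) field to make the list pair up cleanly.
set unquoted {}
foreach {outside inside} [lrange $fields 0 end-1] {
    lappend unquoted $inside
}
puts $unquoted
```

If the quotes are not balanced, the pairing shifts and the result is no longer meaningful, so the regexp approach is probably safer for untrusted input.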

Related

Using a capturing group of the match pattern of a regsub in the substitution itself?

In the code below, $html is a string of HTML. The code captures the full match and the capturing group for each match in a list. Then, if the list is not empty, it iterates through the list to replace the span tags with em tags along with the original text that was between them.
For example, if the HTML is:
This is <span class='add'>span 1</span> and this is <span class='add'>span 2</span>.
then $a would be a list of length 4:
{<span class='add'>span 1</span>} {span 1} {<span class='add'>span 2</span>} {span 2}
The sample code generates:
This is <em>span 1</em> and this is <em>span 2</em>.
as expected, but this seems an inefficient way to do it, and I suspect the capturing group should be usable directly within the regsub expression.
Is this true and how is it done?
Something like:
set html [regsub "<span class='add'>(.+?)</span>" $html "<em>...</em>"]
where the ... is something that points to the captured group.
Thank you.
set a [regexp -all -inline -- {<span class='add'>(.+?)</span>} $html]
if { [llength $a] > 0 } {
    foreach {x y} $a {
        set html [regsub "<span class='add'>${y}</span>" $html "<em>${y}</em>"]
    }
}

You can use a backreference in the replacement:
regsub -all {<span class='add'>(.+?)</span>} $html {<em>\1</em>}
EDIT: To trim leading and trailing spaces from the captured string, you can simply leave them out by matching leading and trailing spaces outside the parentheses:
regsub -all {<span class='add'>\s*(.+?)\s*</span>} $html {<em>\1</em>}
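As a quick check, the backreference form can be run against the sample HTML from the question. This is a sketch only; the variable names result and count are illustrative.

```tcl
set html "This is <span class='add'>span 1</span> and this is <span class='add'>span 2</span>."

# \1 in the replacement stands for the first capturing group;
# & would stand for the whole match.
set result [regsub -all {<span class='add'>(.+?)</span>} $html {<em>\1</em>}]
puts $result

# With a variable name as the last argument, regsub stores the result
# there and returns the number of replacements instead.
set count [regsub -all {<span class='add'>(.+?)</span>} $html {<em>\1</em>} html]
puts "$count replacements"
```

The first puts prints the two spans rewritten as em tags, and the second reports 2 replacements. Keeping the replacement in braces avoids having to double the backslash.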

Regex a var that contains square brackets in tcl

I'm trying to edit a Verilog file by finding a match in the lines of the file and replacing it with "1'b1". The problem is that the match is a bus with square brackets, in the form "busname[0-9]".
For example, in this line:
XOR2X1 \S12/gen_fa[8].fa_i/x0/U1 ( .A(\S12/bcomp [8]), .B(abs_gx[8]), .Y(
I need to replace "abs_gx[8]" by "1'b1".
So I tried to find a match by using this code:
#gets abs_gx[8]
set net "\{[lindex $data 0]\}"
#gets 1'b1
set X [lindex $data 1]
#open and read lines of file
set netlist [open "./$circuit\.v" r]
fconfigure $netlist -buffering line
gets $netlist line
#let's assume the line is XOR2X1 \S12/gen_fa[8].fa_i/x0/U1 ( .A(\S12/bcomp [8]), .B(abs_gx[8]), .Y(
if {[regexp "(.*.\[A-X\]\()$net\(\).*)" $line -inline]} {
puts $new "$1 1'b$X $2" }
elseif {[regexp "(.*.\[Y-Z\]\()$net(\).*)" $line]} {
puts $new "$1$2" }
else {puts $new $line}
gets $netlist line
I have tried many things, and either nothing matches or I get an error that 8 is not a command, because [8] gets interpreted as a command substitution.
Is there a trick for placing a variable in a regex without its contents being interpreted as regular expression syntax?

If you have an arbitrary string that you want to match exactly as part of a larger regular expression, precede every non-alphanumeric character in it with a backslash (\). Fortunately, _ is not special in Tcl's REs either, so you can use \W (equivalent to [^\w]) to match the characters that need escaping:
set reSafe [regsub -all {\W} $value {\\&}]
If you're going to be doing that a lot, make a helper procedure.
proc reSafe {value} {
    regsub -all {\W} $value {\\&}
}
(Yes, I'd like a way of substituting variables more directly, but the RE engine's internals are code I don't want to touch…)
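Applied to the question's data, the helper might be used as follows. This is a sketch under assumptions: the net name and the netlist line are copied from the question, while the variable names pattern and newLine are illustrative.

```tcl
proc reSafe {value} {
    # Prefix every non-word character with a backslash.
    regsub -all {\W} $value {\\&}
}

# Braces keep the brackets and backslashes literal.
set net {abs_gx[8]}
set line {XOR2X1 \S12/gen_fa[8].fa_i/x0/U1 ( .A(\S12/bcomp [8]), .B(abs_gx[8]), .Y(}

set pattern [reSafe $net]
puts $pattern

# The escaped pattern now matches the net name literally.
set newLine [regsub -all -- $pattern $line {1'b1}]
puts $newLine
```

The first puts shows abs_gx\[8\], and in the second only the .B( connection changes, because gen_fa[8] and bcomp [8] do not contain the full net name.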

If I understand correctly, you want to substitute $X for $net, except when $net is preceded by Y( or Z(, in which case you just delete $net. You could avoid the complications of regexp by using string map, which does only literal substitutions; see https://www.tcl-lang.org/man/tcl8.6/TclCmd/string.htm#M34 . You would then need to specify the Y( and Z( cases separately, but that's easy enough when there are only two. So instead of the regexp if/elseif block you would do:
set line [string map [list Y($net Y( Z($net Z( $net $X] $line]
puts $new $line
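To see the ordering of the map at work, here is a small sketch on a made-up line; the line contents and the variable name map are hypothetical, and $net is assumed to hold the bare net name without the extra braces the question adds.

```tcl
set net {abs_gx[8]}
set X "1'b1"

# Hypothetical line with the net on an input pin and on the Y output.
set line {.A(abs_gx[8]), .B(abs_gx[8]), .Y(abs_gx[8])}

# string map tries the keys in order at each position, so the more
# specific Y( and Z( entries must come before the bare net name.
set map [list "Y($net" "Y(" "Z($net" "Z(" $net $X]
set line [string map $map $line]
puts $line
```

Because string map treats its keys as literal text, the square brackets in the net name need no escaping here.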

Match any repetitive pattern using tcl

I have a binary file whose contents I don't know. I convert it to a hex string using binary scan $bin_data "H*" hex_data. The problem is how to match any repeated pattern (byte).
Example 1:
In the file: 0cabab79
Expected Output: abab
Example 2:
In the file: 0c1f1f03035d
Expected Output: 1f1f0303
Example 3:
In the file: 0c678967895d13
Expected Output: 67896789

You can use a more flexible regexp to get all the repeated patterns (at least 2 characters):
set inputs [list 0cabab79 0c1f1f03035d 0c678967895d13]
foreach input $inputs {
    set out ""
    foreach {whole sub} [regexp -all -inline {(..+)\1} $input] {
        append out $whole
    }
    puts $out
}
# Output:
# abab
# 1f1f0303
# 67896789
If you want the repeated unit to contain an even number of characters, i.e. whole bytes (so that aaaaaa gives aaaa and not aaaaaa), build the captured group from pairs:
set inputs [list 0cabab79 0c1f1f03035d 0c678967895d13 aaaaaa]
foreach input $inputs {
    set out ""
    foreach {whole sub} [regexp -all -inline {((?:..)+)\1} $input] {
        append out $whole
    }
    puts $out
}
# Output:
# abab
# 1f1f0303
# 67896789
# aaaa

You can do this with a regexp using a backreference:
regexp -all -inline {(..)\1} 0c1f1f03035d
This returns a list containing, for each match, the full repetition followed by the repeated element.
For this input you would get
1f1f 1f 0303 03
By looping through these you can build your expected output, e.g.
set op {}
foreach {rep sing} [regexp -all -inline {(..)\1} 0c1f1f03035d] {
    append op $rep
}
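Putting this together with the binary scan step from the question might look like the sketch below. The sample bytes are built with binary format as a stand-in for data read from a file, and the variable names out, whole and unit are illustrative.

```tcl
# Build sample binary data from a hex string (stand-in for a file read).
set bin_data [binary format H* 0c678967895d13]

# Convert the bytes to a lowercase hex string, as in the question.
binary scan $bin_data H* hex_data

# Collect every run in which a group of hex-digit pairs repeats itself.
set out ""
foreach {whole unit} [regexp -all -inline {((?:..)+)\1} $hex_data] {
    append out $whole
}
puts $out
```

For this input the script prints 67896789. Note that the regular expression works on hex digits, not bytes, so a match is not guaranteed to start on a byte boundary; if that matters, an explicit loop over two-character chunks may be a better fit.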

Getting value from a string in tcl

I have to get a pattern from the specified string.
This is the first time I'm using Tcl. In Perl, I can simply get the grouped values with $1 $2 ... $n. In Tcl I've tried it this way, but it didn't even work:
while { [gets $LOG_FILE line] >= 0 } {
if {[regexp -inline {Getting available devices: (/.+$)} $line]} {
puts {group0}
}
}

With regexp, you have two ways to get submatches out.
Without -inline, you have to supply variables sufficient to get the submatch you care about (with the first such variable being for the whole matched region, like $& in Perl):
if {[regexp {Getting available devices: (/.+$)} $line a b]} {
puts $b
}
It's pretty common to use -> as an overall-match variable. It's totally non-special to Tcl, but it makes the script mnemonically easier to grok:
if {[regexp {Getting available devices: (/.+$)} $line -> theDevices]} {
puts $theDevices
}
With -inline, regexp returns a list of things that were matched instead of assigning them to variables.
set matched [regexp -inline {Getting available devices: (/.+$)} $line]
if {[llength $matched]} {
set group1 [lindex $matched 1]
puts $group1
}
The -inline form works very well with multi-variable foreach and lassign, especially in combination with -all.
foreach {-> theDevices} [regexp -inline -all {Getting available devices: (/.+$)} $line] {
puts $theDevices
}
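The lassign combination mentioned above could look like this. It is a sketch only: the log line is a made-up example, and the variable names whole and theDevices are illustrative.

```tcl
# Hypothetical log line; the real file contents are not shown in the question.
set line "Getting available devices: /dev/sda /dev/sdb"

# regexp -inline returns {wholeMatch group1}; lassign unpacks that list.
lassign [regexp -inline {Getting available devices: (/.+$)} $line] whole theDevices

# When nothing matches, the list is empty and both variables are empty strings.
if {$whole ne ""} {
    puts $theDevices
}
```

Testing $whole rather than $theDevices keeps the check valid even for patterns whose capturing group may legitimately match an empty string.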

variable number of match in a regexp

I have this file :
#2/1/1/21=p1 5/1/1/21=p1 isid=104
3/1/1/9=p1 4/1/1/4=p1 5/1/1/17=p1 6/1/1/4=p1 isid=100
1/1/1/4=p1 6/1/1/5=p1 isid=101
I want line 1 to be ignored (it is a commented line).
In line two I want to get "3/1/1/9" "4/1/1/4" "5/1/1/17" in three variables var1 var2 and var3
In line three I want to get "1/1/1/4" and "6/1/1/5" in two variables var1 and var2.
For the moment I can ignore line 1, and match what I want on line 2 OR line 3 :
if {[regexp {^(\d/\d/\d/\d{1,2})=p1.*(\d/\d/\d/\d{1,2})=p1} $line value match1 match2]} {
# This works for line 3 but not line 2
}
if {[regexp {^(\d/\d/\d/\d{1,2})=p1.*(\d/\d/\d/\d{1,2})=p1.*(\d/\d/\d/\d{1,2})=p1} $line value match1 match2 match3]} {
# This works for line 2 but not line 3
}
How can I get the right number of matches for both line 2 and line 3?

Try this regex:
[regexp {^(\d/\d/\d/\d{1,2})=p1.*?(\d/\d/\d/\d{1,2})=p1(?:.*?(\d/\d/\d/\d{1,2})=p1)?} $line value match1 match2 match3]
Wrapping the last part in (?: ... )? makes the third match optional.
I also turned your greedy .* into non-greedy .*?.
To get all matches, you could use something more like this:
if {[string range $line 0 0] ne "#"} {
set matches [regexp -all -inline -- {\d/\d/\d/\d{1,2}(?==p1)} $line]
}
You will then have the matches in the list $matches. You can access them with lindex, or use lassign to give them specific names.

This might be better:
lassign [regexp -all -inline {\d+(?:/\d+){3}(?==p1)} $line] var1 var2 var3
If there are only 2 matches, var3 will be empty.
If there are more matches, only the first 3 will be captured in variables.
If you really just want all the matches:
set vars [regexp -all -inline {\d+(?:/\d+){3}(?==p1)} $line]
To ignore comments:
while {[gets $fh line] != -1} {
    if {[regexp {^\s*#} $line]} continue
    # do your other stuff here ...
}
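Combining the comment check with the -all -inline pattern, the three sample lines from the question could be processed as in this sketch. The data is embedded in a string instead of being read from a file, and the variable names data and results are illustrative.

```tcl
# The three sample lines from the question, embedded as a string.
set data {#2/1/1/21=p1 5/1/1/21=p1 isid=104
3/1/1/9=p1 4/1/1/4=p1 5/1/1/17=p1 6/1/1/4=p1 isid=100
1/1/1/4=p1 6/1/1/5=p1 isid=101}

set results {}
foreach line [split $data \n] {
    # Skip commented lines.
    if {[regexp {^\s*#} $line]} continue
    # Collect every port that is followed by =p1, however many there are.
    lappend results [regexp -all -inline {\d+(?:/\d+){3}(?==p1)} $line]
}

foreach ports $results {
    puts "[llength $ports] ports: $ports"
}
```

Each element of results is itself a list, so the number of matches per line can vary freely; the second sample line actually yields four ports, not three.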

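Try this (using \d{1,2} for the last field, since the sample data contains two-digit values such as 21 and 17):
(^|\s)(\d/\d/\d/\d{1,2}=p1)
or more compactly:
(^|\s)((\d/){3}\d{1,2}=p1)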
