Tcl: explanation of the following regexp and regsub commands

OK, so a Tcl expert here (Brad Lanam) wrote the following regexp and regsub commands in a Tcl script to parse my file format (Liberty, .lib, used in chip design). I just want to know what they mean (if not why they were used, since you don't have the context). I have read the references on the Tcl wiki but simply cannot seem to connect the dots. Here's the snippet of his code:
set fh [open z.lib r]
set inval false
while { [gets $fh line] >= 0 } {
    if { [regexp {\);} $line] } {
        set inval false
    }
    if { [regexp {index_(\d+)} $line all idx] } {
        regsub {^[^"]*"} $line {} d
        regsub {".*} $d {} d
        regsub -all {,} $d {} d
        dict set risedata constraints indexes $idx $d
    }
    if { $inval } {
        regsub {^[^"]*"} $line {} d
        regsub {".*} $d {} d
        regsub -all {[ ,]+} $d { } d
        set row [expr {$rcount % 5}]
        set column [expr {$rcount / 5}]
        set i 0
        foreach {v} [split $d { }] {
            set c [lindex [dict get $risedata constraints indexes 3] $i]
            dict set risedata constraints constraint $c $row $column $v
            incr i
        }
        incr rcount
    }
    if { [regexp {values} $line] } {
        set inval true
        set row 0
        set rcount 0
    }
}
close $fh
Especially, what does
if { [regexp {index_(\d+)} $line all idx] } {
    regsub {^[^"]*"} $line {} d
    regsub {".*} $d {} d
    regsub -all {,} $d {} d
mean? Does \d+ search the line variable for more than one digit and match it against the string in line? And what does regsub {^[^"]*"} $line {} d do?
Big thanks for helping a noob like me understand.
Reference: Brad Lanam

I'll take it line-by-line and explain what it appears to be doing.
if { [regexp {index_(\d+)} $line all idx] } {
This first line checks to see if the string stored in line includes
a substring of index_ followed by 1 or more digits. If so, it
stores the matching substring in all (which the rest of the code
appears to ignore) and stores the digits found in the variable idx.
So if line were set to "stuff index_123 more stuff", you would end
up with all set to index_123 and idx set to 123.
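A quick way to check this reading (and the question of whether \d+ needs more than one digit) is to run the same regexp on a few strings. The sample strings below are made up for illustration. Note that \d+ means "one or more digits", so a single digit is enough for a match:

```tcl
# Hypothetical sample strings, chosen only to exercise the pattern.
foreach line {"stuff index_123 more stuff" "index_7 only one digit" "no index here"} {
    if {[regexp {index_(\d+)} $line all idx]} {
        # all receives the whole match, idx receives capture group 1
        puts "matched: all=$all idx=$idx"
    } else {
        puts "no match: $line"
    }
}
```

This should print a match for the first two strings (idx=123 and idx=7) and "no match" for the third.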
regsub {^[^"]*"} $line {} d
This regsub will remove everything from the beginning of line up
to and including the first double-quote. It stores the result in d.
regsub {".*} $d {} d
The next regsub operates on the value now in d. It looks for a
double-quote and removes that character and everything afterward,
storing the result again in d.
regsub -all {,} $d {} d
Finally, this line deletes any commas found in d, storing the result
back in d.
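To see the three steps in sequence, here is a sketch that runs them on one index line. The line is a hypothetical Liberty-style example; the exact layout of the z.lib file is an assumption:

```tcl
# Hypothetical Liberty-style index line (the real file's layout is assumed).
set line {    index_1 ("0.01, 0.02, 0.03");}

regsub {^[^"]*"} $line {} d   ;# drop everything up to and including the first quote
puts $d
regsub {".*} $d {} d          ;# drop the next quote and everything after it
puts $d
regsub -all {,} $d {} d       ;# delete every comma
puts $d
```

The three puts calls should show `0.01, 0.02, 0.03");`, then `0.01, 0.02, 0.03`, then `0.01 0.02 0.03`. What remains is a space-separated string that can be treated as a Tcl list of index values.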
The next group of lines (inside the if { $inval } block) performs a similar set of operations, except for the last regsub in the group:
regsub -all {[ ,]+} $d { } d
After the previous lines have removed everything except the section that was inside double quotes, this line replaces each run of one or more spaces and/or commas with a single space.
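One likely reason for using [ ,]+ here rather than just deleting commas is that the values are later split on single spaces. This sketch uses a hypothetical row from a values(...) block, with an extra space added on purpose, and compares the two approaches:

```tcl
# Hypothetical values row, including a trailing backslash as in multi-line Liberty tables.
set line "    \"0.1, 0.2,  0.3\", \\"
regsub {^[^"]*"} $line {} d
regsub {".*} $d {} d
# d is now: 0.1, 0.2,  0.3  (note the double space before 0.3)

# Deleting commas alone leaves the double space, so split yields an empty element.
regsub -all {,} $d {} commasOnly
puts [llength [split $commasOnly { }]]   ;# 4 elements, one of them empty

# Collapsing each run of spaces/commas into one space gives a clean list.
regsub -all {[ ,]+} $d { } d
puts [split $d { }]
```

With the collapsed version, split returns exactly the three numbers, which is what the foreach loop over [split $d { }] in the original script expects.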
Let me know if that is clear.

Related

How to use regexp for matching against a list in Tcl

I am new to the Tcl world and I am using regexp in my code. I am reading a file and printing the lines which are not in a list. Could you please help me with this? I tried the code below, but it is not working.
set fp [open file r]
set pattern {pattern1 pattern2 pattern3...patternN}
while {[gets $fp line] >= 0 } {
if {![regexp -- "$pattern" $line] } {
puts $line
}
}
Thanks
Your problem statement says that you are looking to print lines not in a list.
If that's really what you need, then you shouldn't use regexp, but should use ni which means "not in":
set fp [open file r]
set pattern {pattern1 pattern2 pattern3...patternN}
while {[gets $fp line] >= 0 } {
if {$line ni $pattern} {
puts $line
}
}
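Note that ni (available since Tcl 8.5) compares whole strings against the list elements; it does no substring or regex matching. A small sketch with a made-up list:

```tcl
set pattern {apple banana}
puts [expr {"apple" ni $pattern}]        ;# 0: apple is an element of the list
puts [expr {"apple pie" ni $pattern}]    ;# 1: compared as a whole string, not as a pattern
puts [expr {"app" ni $pattern}]          ;# 1: no partial matching either
```

So this version prints a line only when the entire line differs from every list element.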
If this is not what you need, then you'll need to define your regex as several patterns alternating with the | character. For example:
tclsh> regexp {ab xy} "ab"
0
tclsh> regexp {ab|xy} "ab"
1
set fp [open file r]
set pattern {pattern1|pattern2|pattern3|...|patternN}
while {[gets $fp line] >= 0 } {
if {![regexp -- $pattern $line]} {
puts $line
}
}
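If the patterns already live in a Tcl list, one option (not taken from the answers here) is to build the alternation with join. This is a sketch that assumes each list element is itself a valid regular expression, so characters such as . or * keep their regex meaning. The patterns and lines are hypothetical:

```tcl
set patterns {foo bar baz}
set re [join $patterns |]          ;# builds foo|bar|baz
set lines {{a foo line} {nothing here} {bar again}}
set kept {}
foreach line $lines {
    if {![regexp -- $re $line]} {
        lappend kept $line
        puts $line                 ;# prints: nothing here
    }
}
```

If the list items are meant to be literal strings rather than regexes, they would need escaping before being joined.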
Another option would be to continue defining $pattern as a list of patterns, but then you'd need to iterate through all the patterns and print the line if all the patterns failed to match.
set fp [open file r]
set pattern {pattern1 pattern2 pattern3 ... patternN}
while {[gets $fp line] >= 0 } {
set match 0
foreach p $pattern {
if {[regexp -- $p $line]} {
set match 1
break
}
}
if {$match == 0} {
puts $line
}
}
While I like @ChrisHeithoff's answer, I think it is clearer when you use a helper procedure:
proc anyPatternMatches {patternList stringToCheck} {
foreach pattern $patternList {
if {[regexp -- $pattern $stringToCheck]} {
return true
}
}
return false
}
set fp [open file r]
set patterns {pattern1 pattern2 pattern3 ... patternN}
while {[gets $fp line] >= 0 } {
if {![anyPatternMatches $patterns $line]} {
puts $line
}
}

Matching pattern and adding values to a list in Tcl

I wonder if I could get some advice. I'm trying to create a list by reading a file with the input shown below.
Example from the input file:
Parameter.Coil = 1
Parameter.Coil.Info = Coil
BaseUTC.TimeSet_v0 = 1
BaseUTC.TimeSet_v0.Info = TimeSet_v0
BaseUTC.TimeSet_v1 = 1
BaseUTC.TimeSet_v1.Info = TimeSet_v1
BaseUTC.TimeSet_v14 = 1
BaseUTC.TimeSet_v14.Info = TimeSet_v14
BaseUTC.TimeSet_v32 = 1
BaseUTC.TimeSet_v32.Info = TimeSet_v32
VC.MILES = 1
VC.MILES.Info = MILES_version1
I am interested in any line that starts with "BaseUTC." and whose key ends with ".Info", and I would like to save the value after "=" in a list.
Desired:
output = TimeSet_v0 TimeSet_v1 TimeSet_v14 TimeSet_v32
I've tried the following but not getting the desired output.
set input [open "[pwd]/Data/Input" r]
set list ""
while { [gets $input line] >= 0 } {
incr number
set sline [split $line "\n"]
if {[regexp -- {BaseUTC.} $sline]} {
#puts "lines = $sline"
if {[regexp -- {.Info} $sline]} {
set output [lindex [split $sline "="] 1]
lappend list $output
}}}
puts "output = $list"
close $input
I get this output:
output = \ TimeSet_v0\} \ TimeSet_v1\} \ TimeSet_v14\} \ TimeSet_v32\}
Can anyone help identify my mistake, please?
Thank you in advance.
A lot of your code doesn't seem to match your description (Steering? I thought you were looking for BaseUTC lines?), or just makes no sense (Why split what you already know is a single line on newline characters?), but one thing that will really help simplify things is a regular expression capture group. Something like:
#!/usr/bin/env tclsh
proc process {filename} {
set in [open $filename]
set list {}
while {[gets $in line] >= 0} {
if {[regexp {^BaseUTC\..*\.Info\s+=\s+(.*)} $line -> arg]} {
lappend list $arg
}
}
close $in
return $list
}
puts [process Data/Input]
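To try the pattern without creating Data/Input, here is a sketch that applies the same regexp to the question's sample lines held in a string:

```tcl
set sample {Parameter.Coil = 1
Parameter.Coil.Info = Coil
BaseUTC.TimeSet_v0 = 1
BaseUTC.TimeSet_v0.Info = TimeSet_v0
BaseUTC.TimeSet_v1 = 1
BaseUTC.TimeSet_v1.Info = TimeSet_v1
BaseUTC.TimeSet_v14 = 1
BaseUTC.TimeSet_v14.Info = TimeSet_v14
BaseUTC.TimeSet_v32 = 1
BaseUTC.TimeSet_v32.Info = TimeSet_v32
VC.MILES = 1
VC.MILES.Info = MILES_version1}

set result {}
foreach line [split $sample \n] {
    # -> is just a variable name that receives (and discards) the whole match
    if {[regexp {^BaseUTC\..*\.Info\s+=\s+(.*)} $line -> arg]} {
        lappend result $arg
    }
}
puts $result   ;# TimeSet_v0 TimeSet_v1 TimeSet_v14 TimeSet_v32
```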
Or using wildcards instead of regular expressions:
proc process {filename} {
set in [open $filename]
set list {}
while {[gets $in line] >= 0} {
lassign [split $line =] var arg
if {[string match {BaseUTC.*.Info} [string trim $var]]} {
lappend list [string trim $arg]
}
}
close $in
return $list
}
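As for where the stray backslashes and braces in the question's output come from: split $line "\n" turns the line into a one-element list, and the later split on "=" then operates on that list's string form, which Tcl wraps in braces because the element contains spaces. A sketch reproducing this on one sample line:

```tcl
set line "BaseUTC.TimeSet_v0.Info = TimeSet_v0"
set sline [split $line "\n"]      ;# one-element list whose string form is brace-wrapped
puts $sline
set output [lindex [split $sline "="] 1]
puts "<$output>"                  ;# leading space plus a trailing close brace
# lappend must backslash-quote such an element, hence the backslashes in the question's output
set list {}
lappend list $output
puts $list

# Splitting the raw line and trimming avoids both problems
set fixed [string trim [lindex [split $line "="] 1]]
puts $fixed                       ;# TimeSet_v0
```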

How to extract only two fields using Regular Expression

In Tcl I am writing a regular expression for the output below.
The value of args is:
packet-filter 0
identifier 0
direction bidirectional
network-ip 10.7.98.231/32
ue-port-start 0
ue-port-end 0
nw-port-start 0
nw-port-end 0
protocol 1
precedence 0
packet-filter 1
identifier 1
direction uplink
network-ip 10.7.98.231/32
ue-port-start 0
ue-port-end 0
nw-port-start 0
nw-port-end 0
protocol 1
precedence 0
The output of my regular expression, regexp -all -inline {direction\s+(\S+)} $args, is:
{direction bidirectional} bidirectional {direction uplink} uplink
I need to extract only the direction values, which are bidirectional and uplink.
Any suggestions?
For the current case, where the captured substrings are chunks of non-whitespace text, you may rebuild the output by keeping only the items whose list length is 1 (the full matches, such as {direction uplink}, have length 2):
set results [regexp -all -inline {direction\s+(\S+)} $args]
set res {}
foreach item $results {
if {[llength $item] == 1} {
lappend res $item
}
}
Then, $res will only hold bidirectional and uplink.
For a more generic case, you may use:
set res {}
foreach {whole capture1} $results {
lappend res $capture1
}
You may add more captureX arguments to accommodate all the capturing group values returned by your regex.
You simply need a loop or something equivalent. If you need to work on each direction individually, a foreach loop is appropriate:
set results [regexp -all -inline {direction\s+(\S+)} $args]
foreach {main sub} $results {
puts $sub
}
# bidirectional
# uplink
Or if you need the list of directions, then lmap sounds appropriate:
set directions [lmap {main sub} $results {set sub}]
# bidirectional uplink
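A sketch putting the regexp and the lmap together on a shortened copy of the question's args (only a few keys are kept; lmap needs Tcl 8.6 or later):

```tcl
# Shortened copy of the question's args value
set args {
packet-filter 0
direction bidirectional
packet-filter 1
direction uplink
}
set results [regexp -all -inline {direction\s+(\S+)} $args]
# results alternates: full match, capture, full match, capture
puts $results
set directions [lmap {main sub} $results {set sub}]
puts $directions   ;# bidirectional uplink
```

Because -all -inline returns the full match followed by each capture group, stepping through the list two items at a time and keeping the second item picks out just the captures.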
The regexp is not strictly necessary; you may instead process the value of args into a dictionary:
set d [dict create]
foreach {k v} $args {
dict lappend d $k $v
}
puts [dict get $d direction]
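A runnable sketch of the dictionary approach on a shortened copy of the data. It relies on args being a well-formed Tcl list with an even number of words; a value containing spaces or unbalanced braces would break the key/value pairing:

```tcl
set args {
packet-filter 0
identifier 0
direction bidirectional
packet-filter 1
identifier 1
direction uplink
}
set d [dict create]
foreach {k v} $args {
    # repeated keys accumulate their values into a list
    dict lappend d $k $v
}
puts [dict get $d direction]    ;# bidirectional uplink
puts [dict get $d identifier]   ;# 0 1
```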

regexp loop to find the first instance of each query in Tcl

I have a list variable containing some values:
lappend list {query1} {query2} {query3}
And some data in file1, parts of which match the values above:
query1 first data
query1 different data
query1 different data
query2 another data
query2 random data
query3 data something
query3 last data
How do I create a regexp loop that catches only the first instance found of each query and prints them out? In this case the output would be:
query1 first data
query2 another data
query3 data something
Attempted code to produce the output:
set readFile1 [open file1.txt r]
while { [gets $readFile1 data] > -1 } {
for { set n 0 } { $n < [llength $list] } { incr n } {
if { [regexp "[lindex $list $n]" $data] } {
puts $data
}
}
}
close $readFile1
I tried using a for loop while reading the data from a file, but it seems to catch all values even if the -all option is not used.
If the text file is small, you can read the whole file into a variable using the read command, then apply regexp to the content to extract the required data.
set list {query1 query2 query3}
set fp [open file1.txt r]
set data [read $fp]
close $fp
foreach elem $list {
# '-line' flag will enable the line sensitive matching
if {[regexp -line "$elem.+" $data line]} {
puts $line
}
}
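The -line flag matters here because, by default, . in a Tcl regex also matches a newline. A sketch on a few of the sample lines held in a string:

```tcl
set data "query1 first data\nquery1 different data\nquery2 another data"

# With -line, . does not match a newline, so the match stops at the end of the line
regexp -line {query1.+} $data withLine
puts $withLine                            ;# query1 first data

# Without -line, .+ runs on to the end of the data
regexp {query1.+} $data withoutLine
puts [llength [split $withoutLine \n]]    ;# 3 lines swallowed
```

Since regexp without -all reports only the leftmost match, each query yields just its first line.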
If the file is too large to hold in memory, or if you are concerned about run-time memory usage, read the content line by line instead. You then need to keep track of what has already matched; for that, you can keep an array recording whether the first occurrence of each query has been found.
set list {query1 query2 query3}
set fp [open file1.txt r]
array set first_occurence {}
while {[gets $fp line]!=-1} {
foreach elem $list {
if {[info exists first_occurence($elem)]} {
continue
}
if {[regexp $elem $line]} {
set first_occurence($elem) 1
puts $line
}
}
}
close $fp
Reference: regexp
package require fileutil
set queries {query1 query2 query3}
set result {}
::fileutil::foreachLine line file1.txt {
foreach query $queries {
if {![dict exists $result $query]} {
if {[regexp $query $line]} {
dict set result $query $line
puts $line
}
}
}
}
The trick here is to store the findings in a dictionary. If there is a value corresponding to the query in the dictionary already, we don’t search for it again. This also has the advantage that the found lines are available to the script after the search and aren’t just printed out. The regexp search looks for the query string anywhere in the line: if it should only be in the beginning of the line, use regexp ^$query $line instead.
Documentation: dict, fileutil package, foreach, if, package, puts, regexp, set
Try this:
set fd [open "query_file.txt" r]
set data [read $fd]
set uniq_list ""
foreach l [split $data "\n"] {
lappend uniq_list [lindex $l 0]
}
set uniq_list [lsort -unique $uniq_list]
foreach l $uniq_list {
if {[string equal $l ""]} {
continue
}
foreach line [split $data "\n"] {
if {[regexp $l $line]} {
puts "$line"
break
}
}
}
close $fd
References: file, list, regexp
Not using regexp at all (this assumes your queries do not contain whitespace):
set list [list query1 query2 query3]
array set seen {}
set fh [open file1]
while {[gets $fh line] != -1} {
    set query [lindex [split $line] 0]
    if {$query in $list && ![info exists seen($query)]} {
        set seen($query) 1
        puts $line
    }
}
close $fh
Output:
query1 first data
query2 another data
query3 data something
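The same first-occurrence idea can be tried without a file. This sketch reads the sample lines from a string and uses dict exists in place of the array, which also keeps the found lines available afterwards:

```tcl
set queries {query1 query2 query3}
set data {query1 first data
query1 different data
query1 different data
query2 another data
query2 random data
query3 data something
query3 last data}

set firsts {}
foreach line [split $data \n] {
    # first whitespace-separated word is the query name (assumes no spaces in queries)
    set query [lindex [split $line] 0]
    if {$query in $queries && ![dict exists $firsts $query]} {
        dict set firsts $query $line
    }
}
# dicts keep insertion order, so this prints the lines in file order
dict for {q l} $firsts { puts $l }
```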

How do I get the text that is present under the matched string in Tcl

I have a string value in Tcl:
set out " ABC CDE EFG
          123     456"
I want to get the text that is present below text "EFG".
Right now it is "456", but it can be anything, so I need a way to grep for "EFG" and get the text below it.
This answer takes some inspiration from Johannes Kuhn's answer, but I use regexp to get the word indices from the "keys" line.
# this is as close as I can get to a here-doc in Tcl
set out [string trim {
ABC DEF GHI
123     456
}]
# map the words in the first line to the values in the 2nd line
lassign [split $out \n] keys values
foreach range [regexp -all -inline -indices {\S+} $keys] {
set data([string range $keys {*}$range]) [string range $values {*}$range]
}
parray data
This outputs:
data(ABC) = 123
data(DEF) =
data(GHI) = 456
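The key piece is -indices: instead of the matched text, regexp returns start/end index pairs, and {*} (Tcl 8.5+) expands each pair into the two index arguments of string range. A small sketch on the keys line alone:

```tcl
set keys "ABC DEF GHI"
set ranges [regexp -all -inline -indices {\S+} $keys]
puts $ranges    ;# one {start end} pair per word
foreach range $ranges {
    puts "[string range $keys {*}$range] spans columns $range"
}
```

Any value in the second line that occupies the same columns as a key is then picked up by the same string range call.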
I suggest splitting the string into keys and values with
lassign [split $out \n] keys values
and then looking up the string's position in the keys line and taking the same range from the values line:
set start [string first "EFG" $keys]
set value [string range $values $start [expr {${start}+[string length "EFG"]-1}]]
Wrapping it in a proc, we get:
proc getValue {input lookFor} {
    lassign [split $input \n] keys values
    set start [string first $lookFor $keys]
    set value [string range $values $start \
        [expr {${start}+[string length $lookFor]-1}]]
}
Invoke it like this:
getValue $out "EFG"
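Here is a self-contained copy of the getValue idea for trying it out. It uses one spelling, lookFor, for the parameter throughout (Tcl variable names are case-sensitive). The empty-string return for a missing key and the aligned sample input are assumptions added for illustration:

```tcl
proc getValue {input lookFor} {
    lassign [split $input \n] keys values
    set start [string first $lookFor $keys]
    if {$start < 0} {
        return ""   ;# assumption: empty result when the key is not found
    }
    string range $values $start [expr {$start + [string length $lookFor] - 1}]
}

# Hypothetical input where each value sits directly under its key
set out "ABC CDE EFG\n123 456 789"
puts [getValue $out EFG]   ;# 789
```

This only works when the values are aligned character-for-character under the keys, which is why the alignment question below matters.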
Edit: how is the 2nd line aligned? With a tab (\t) or spaces?
In this case what you actually have is two lines with groups of 3 alphanumeric characters separated by spaces, with a large amount of leading whitespace prefixing the second line ("\x20ABC\x20CDE\x20EFG\n[string repeat \x20 10]123[string repeat \x20 5]456" will reproduce what you posted). In your example, [string range $line end-2 end] would give you what you need. I'd suggest reading the file line by line and, each time you see EFG, extracting the part you need from the next line (maybe using string range) and emitting it.
For example (untested):
set state 0
set f [open $inputfile r]
while {[gets $f line] != -1} {
if {$state} {
puts [string range $line end-2 end]
set state 0
} else {
if {[string match "*EFG" $line]} { set state 1 }
}
}
close $f
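The same state-machine idea can be checked on a string instead of $inputfile. This is a sketch: valuesBelowEFG is a hypothetical helper name, and the sample text is built with the expression quoted above (10 and 5 spaces):

```tcl
# Hypothetical helper: returns the last 3 characters of each line that follows a line ending in EFG.
proc valuesBelowEFG {text} {
    set found {}
    set state 0
    foreach line [split $text \n] {
        if {$state} {
            lappend found [string range $line end-2 end]
            set state 0
        } elseif {[string match "*EFG" $line]} {
            set state 1
        }
    }
    return $found
}

set out " ABC CDE EFG\n[string repeat { } 10]123[string repeat { } 5]456"
puts [valuesBelowEFG $out]   ;# 456
```

Like the original, this assumes the wanted value is always the last three characters of the following line.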