regexp loop to find first instance of each query TCL

I have a list variable containing some values:
lappend list {query1} {query2} {query3}
And some data in file1, where part of each line matches one of the values above:
query1 first data
query1 different data
query1 different data
query2 another data
query2 random data
query3 data something
query3 last data
How do I create a regexp loop that catches only the first instance found of each query and prints them out? In this case the output would be:
query1 first data
query2 another data
query3 data something
Attempted code to produce the output
set readFile1 [open file1.txt r]
while { [gets $readFile1 data] > -1 } {
for { set n 0 } { $n < [llength $list] } { incr n } {
if { [regexp "[lindex $list $n]" $data] } {
puts $data
}
}
}
close $readFile1
I tried using a for loop while reading the data from a file, but it seems to catch all values even if the -all option is not used.

If the text file is small, you can read it as a whole into a variable with the read command, then apply regexp to the content to extract the required data.
set list {query1 query2 query3}
set fp [open file1.txt r]
set data [read $fp]
close $fp
foreach elem $list {
# '-line' flag will enable the line sensitive matching
if {[regexp -line "$elem.+" $data line]} {
puts $line
}
}
If the file is too large to hold in memory, or if run-time memory usage is a concern, read the content line by line instead. In that case you need to keep track of what has already matched; an array can record whether the first occurrence of each query has been seen.
set list {query1 query2 query3}
set fp [open file1.txt r]
array set first_occurence {}
while {[gets $fp line]!=-1} {
foreach elem $list {
if {[info exists first_occurence($elem)]} {
continue
}
if {[regexp $elem $line]} {
set first_occurence($elem) 1
puts $line
}
}
}
close $fp
Reference : regexp
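The line-by-line approach can also stop reading as soon as every query has been found, which helps with large inputs. A minimal sketch, assuming the lines are already in a list rather than read from a channel; the proc name firstMatches and its arguments are illustrative and not part of the answer above:

```tcl
# Hypothetical helper: return the first line matching each query.
# queries: list of regular expressions; lines: list of strings.
proc firstMatches {queries lines} {
    set found [dict create]
    foreach line $lines {
        foreach q $queries {
            if {![dict exists $found $q] && [regexp -- $q $line]} {
                dict set found $q $line
            }
        }
        # Stop early once every query has its first match.
        if {[dict size $found] == [llength $queries]} {
            break
        }
    }
    return [dict values $found]
}

set lines {
    "query1 first data"
    "query1 different data"
    "query2 another data"
    "query2 random data"
    "query3 data something"
    "query3 last data"
}
puts [join [firstMatches {query1 query2 query3} $lines] \n]
```

The results come back in the order the matches were found in the input, not in the order of the query list. The early exit assumes the query list has no duplicates.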

package require fileutil
set queries {query1 query2 query3}
set result {}
::fileutil::foreachLine line file1.txt {
foreach query $queries {
if {![dict exists $result $query]} {
if {[regexp $query $line]} {
dict set result $query $line
puts $line
}
}
}
}
The trick here is to store the findings in a dictionary. If there is a value corresponding to the query in the dictionary already, we don’t search for it again. This also has the advantage that the found lines are available to the script after the search and aren’t just printed out. The regexp search looks for the query string anywhere in the line: if it should only be in the beginning of the line, use regexp ^$query $line instead.
Documentation: dict, fileutil package, foreach, if, package, puts, regexp, set
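If the queries may contain characters that are special in regular expressions (for example . or +), a literal prefix test avoids surprises. A small sketch, assuming the query must appear at the very start of the line; startsWith is a hypothetical helper name:

```tcl
# Hypothetical helper: true if $line begins with the literal text $prefix.
proc startsWith {prefix line} {
    return [expr {[string first $prefix $line] == 0}]
}

puts [startsWith "a.b" "a.b first data"]   ;# literal text found at position 0
puts [startsWith "a.b" "axb first data"]   ;# regexp ^a.b would match here; the literal test does not
```

In the dictionary loop, this test could stand in for the regexp call when the queries are plain strings rather than patterns.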

Try this:
set fd [open "query_file.txt" r]
set data [read $fd]
set uniq_list ""
foreach l [split $data "\n"] {
lappend uniq_list [lindex $l 0]
}
set uniq_list [lsort -unique $uniq_list]
foreach l $uniq_list {
if {[string equal $l ""]} {
continue
}
foreach line [split $data "\n"] {
if {[regexp $l $line]} {
puts "$line"
break
}
}
}
close $fd
References: file , list , regexp

Not using regexp at all (I assume your queries do not contain whitespace):
set list [list query1 query2 query3]
array set seen {}
set fh [open file1]
while {[gets $fh line] != -1} {
set query [lindex [split $line] 0]
if {$query in $list && $query ni [array names seen]} {
set seen($query) 1
puts $line
}
}
close $fh
Output:
query1 first data
query2 another data
query3 data something

Related

How to use regexp from matching against a list in tcl

I am new to Tcl and am using regexp in my code. I am reading a file and printing the lines which are not in a list. Could you please help me with this? I tried the code below, but it is not working.
set fp [open file r]
set pattern {pattern1 pattern2 pattern3...patternN}
while {[gets $fp line] >= 0 } {
if {![regexp -- "$pattern" $line] } {
puts $line
}
}
Thanks

Your problem statement says that you are looking to print lines not in a list.
If that's really what you need, then you shouldn't use regexp, but should use ni which means "not in":
set fp [open file r]
set pattern {pattern1 pattern2 pattern3...patternN}
while {[gets $fp line] >= 0 } {
if {$line ni $pattern} {
puts $line
}
}
If this is not what you need, then you'll need to define your regex as several patterns alternating with the | character. For example:
tclsh> regexp {ab xy} "ab"
0
tclsh> regexp {ab|xy} "ab"
1
set fp [open file r]
set pattern {pattern1|pattern2|pattern3|...|patternN}
while {[gets $fp line] >= 0 } {
if {![regexp -- $pattern $line]} {
puts $line
}
}
Another option would be to continue defining $pattern as a list of patterns, but then you'd need to iterate through all the patterns and print the line if all the patterns failed to match.
set fp [open file r]
set pattern {pattern1 pattern2 pattern3 ... patternN}
while {[gets $fp line] >= 0 } {
set match 0
foreach p $pattern {
if {[regexp -- $p $line]} {
set match 1
break
}
}
if {$match == 0} {
puts $line
}
}
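When the patterns are kept in a Tcl list, the alternation can also be built from that list with join instead of being typed by hand. A minimal sketch, assuming each list element is a valid regular expression on its own; the variable names and sample lines are illustrative:

```tcl
set patterns {pattern1 pattern2 pattern3}

# Wrap each pattern in a non-capturing group so its own operators
# cannot interfere with the | that joins the alternatives.
set parts {}
foreach p $patterns {
    lappend parts "(?:$p)"
}
set re [join $parts |]

# Print only the lines that match none of the patterns.
foreach line {"has pattern2 inside" "nothing relevant here"} {
    if {![regexp -- $re $line]} {
        puts $line
    }
}
```

This keeps the list form for maintenance while still doing a single regexp call per line.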

While I like @ChrisHeithoff's answer, I think it is clearer when you use a helper procedure:
proc anyPatternMatches {patternList stringToCheck} {
foreach pattern $patternList {
if {[regexp -- $pattern $stringToCheck]} {
return true
}
}
return false
}
set fp [open file r]
set patterns {pattern1 pattern2 pattern3 ... patternN}
while {[gets $fp line] >= 0 } {
if {![anyPatternMatches $patterns $line]} {
puts $line
}
}

Matching pattern and adding values to a list in Tcl

I wondered if I could get some advice. I'm trying to create a list by reading a file with the input shown below.
Example from input file
Parameter.Coil = 1
Parameter.Coil.Info = Coil
BaseUTC.TimeSet_v0 = 1
BaseUTC.TimeSet_v0.Info = TimeSet_v0
BaseUTC.TimeSet_v1 = 1
BaseUTC.TimeSet_v1.Info = TimeSet_v1
BaseUTC.TimeSet_v14 = 1
BaseUTC.TimeSet_v14.Info = TimeSet_v14
BaseUTC.TimeSet_v32 = 1
BaseUTC.TimeSet_v32.Info = TimeSet_v32
VC.MILES = 1
VC.MILES.Info = MILES_version1
I am interested in any line with prefix of "BaseUTC." and ".Info" and would like to save value after "=" in a list
Desired:
output = TimeSet_v0 TimeSet_v1 TimeSet_v14 TimeSet_v32
I've tried the following but not getting the desired output.
set input [open "[pwd]/Data/Input" r]
set list ""
while { [gets $input line] >= 0 } {
incr number
set sline [split $line "\n"]
if {[regexp -- {BaseUTC.} $sline]} {
#puts "lines = $sline"
if {[regexp -- {.Info} $sline]} {
set output [lindex [split $sline "="] 1]
lappend list $output
}}}
puts "output = $list"
close $input
I get output as
output = \ TimeSet_v0\} \ TimeSet_v1\} \ TimeSet_v14\} \ TimeSet_v32\}
Can anyone help identify my mistake, please?
Thank you in advance.

A lot of your code doesn't seem to match your description (Steering? I thought you were looking for BaseUTC lines?), or just makes no sense (Why split what you already know is a single line on newline characters?), but one thing that will really help simplify things is a regular expression capture group. Something like:
#!/usr/bin/env tclsh
proc process {filename} {
set in [open $filename]
set list {}
while {[gets $in line] >= 0} {
if {[regexp {^BaseUTC\..*\.Info\s+=\s+(.*)} $line -> arg]} {
lappend list $arg
}
}
close $in
return $list
}
puts [process Data/Input]
Or using wildcards instead of regular expressions:
proc process {filename} {
set in [open $filename]
set list {}
while {[gets $in line] >= 0} {
lassign [split $line =] var arg
if {[string match {BaseUTC.*.Info} [string trim $var]]} {
lappend list [string trim $arg]
}
}
close $in
return $list
}
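For a file small enough to read at once, the same capture group can be applied to the whole content in a single call. A sketch under that assumption, using an in-memory string (a shortened, made-up excerpt of the input) in place of the file:

```tcl
set content "Parameter.Coil.Info = Coil
BaseUTC.TimeSet_v0 = 1
BaseUTC.TimeSet_v0.Info = TimeSet_v0
BaseUTC.TimeSet_v1.Info = TimeSet_v1
VC.MILES.Info = MILES_version1"

set list {}
# -line makes ^ and $ match at line boundaries and keeps . from crossing newlines.
# -all -inline returns the full match and the capture for every hit, in pairs.
foreach {whole value} [regexp -all -inline -line {^BaseUTC\..*\.Info\s+=\s+(.*)$} $content] {
    lappend list $value
}
puts "output = $list"
```

With a real file, the string would come from read on the opened channel instead of the literal above.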

Parsing the data from a verilog file

I am a beginner in Tcl scripting. I am working on parsing the data from a Verilog file into an xls file, as below.
Input Verilog file contains following data:
Inferred components
Operator Signedness Inputs Outputs CellArea Line Col Filename
=====================================================================================================
apn
u_apn_ttp_logic
abc
u_apn_wgt_op_rd_u_apn_sca_u_part_sel1_sl_69_33
module:shift_right_vlog_unsigned_4662_7709
very_fast/barrel >> x 25x5 25 223.02 69 33 part_select.v
=====================================================================================================
apn
u_apn_ttp_logic
u_apn_wgt_op_rd_u_apn_scale_u_part_sel1_sub00283545
module:sub_signed_4513_5538
very_fast - signed 11x24 25 152.80 0 0 a
=====================================================================================================
(This is a long file…)
The parsing will end after the last section:
=====================================================================================================
apn
u_apn_start_ctrl_final_adder_add_212_41
module:add_unsigned_carry_882
very_fast + unsigned 32x32x1 32 120.39 212 41 feu_start_ctrl.v
=====================================================================================================
I want to extract the data as below. Consider the first section:
Top name=apn
Instance=u_apn_ttp_logic/abc/u_apn_wgt_op_rd_u_feu_scale_u_part_select1_srl_69_33
Module = shift_right_vlog_unsigned_4662_7709
Architecture=very_fast/barrel
Operator = >>
Sign=x
Input Size = 25x5
Output = 25
Area = 223.02
Column = 69
Row = 33
File Name = part_select.v
I am stuck at a point while implementing this.
Below is my approach:
set fd "[open "path_data.v" r]"
set flag 0
while {[gets $fd line] !=-1} {
if {[regexp "\===*" $line]} {
while {[gets $fd line] >= 0} {
append data "$line "
if {[regexp "\====*" $line]} {
break
} }
set topname [lindex $data 0]
regexp {(^[a-z]*) (.*) (.*module)} $data match topname instance sub3
puts "top name :$topname "
puts "instance: $instance"
}
close $fd
I am able to output the top name and instance name only, not the other values.
Please also help me extract these values.

With this sort of task, it really helps if you put parts of the task into procedures that do just a simpler bit. For example, suppose we were to split the processing of a single section into its own procedure. Since it is only going to do one section (presumably a lot shorter than the overall file), it can work on a string (or list of strings) instead of having to process by lines, which will make things greatly easier to comprehend.
For example, it would handle just this input text:
apn
u_apn_ttp_logic
abc
u_apn_wgt_op_rd_u_apn_sca_u_part_sel1_sl_69_33
module:shift_right_vlog_unsigned_4662_7709
very_fast/barrel >> x 25x5 25 223.02 69 33 part_select.v
We might handle that like this:
proc processSection {sectionText} {
set top ""
set instance ""
set module ""
set other {}
foreach line [split $sectionText "\n"] {
if {$top eq ""} {
set top [string trim $line]
continue
}
if {$module eq ""} {
# This regular expression matches lines starting with “module:” and
# extracts the rest of the line
if {[regexp {^module:(.*)} $line -> tail]} {
set module [string trim $tail]
} else {
append instance [string trim $line] "/"
}
continue
}
# This regular expression matches a sequence of non-space characters, and
# the -all -inline options make regexp return a list of all such matches.
lappend other {*}[regexp -all -inline {\S+} $line]
}
# Remember to remove trailing “/” character of the instance
set instance [string trimright $instance /]
# Note that this splits apart the list in $other
outputSectionInfo $top $instance $module {*}$other
}
We also need something to produce the output. I've split it into its own procedure as it is often nice to keep parsing separate from output.
proc outputSectionInfo {top instance module arch op sgn in out area col row file} {
# Output the variables
foreach {label varname} {
"Top name" top
"Instance" instance
"Module" module
"Architecture" arch
"Operator" op
"Sign" sgn
"Input Size" in
"Output" out
"Area" area
"Column" col
"Row" row
"File Name" file
} {
# Normally, best to avoid using variables whose names are in variables, but
# for printing out like this, this is actually really convenient.
puts "$label = [set $varname]"
}
}
Now that we've got a section handler and output generator (and you can verify for yourself that these do sensible things, as they're quite a bit simpler than what you were trying to do in one go), we just need to feed the sections from the file into it, skipping over the header. The code does that, and just does that.
set fd [open "path_data.v"]
set flag 0
while {[gets $fd line] >= 0} {
if {[regexp {^=====+$} $line]} {
if {$flag} {
processSection [string trimright $accumulator "\n"]
}
set flag 1
set accumulator ""
} else {
append accumulator $line "\n"
}
}
close $fd
Your immediate problem was that your code was closing the channel too early, but that was in turn caused by your confusion over indentation, and that was in turn caused by you trying to do too much in one place. Splitting things up to make the code more comprehensible is the fix for this sort of issue, as it makes it much easier to tell that the code is definitely correct (or definitely wrong).
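Since the stated goal was a spreadsheet-friendly file, the output step could produce one tab-separated row per section instead of labelled lines. A sketch, assuming the same twelve fields in the same order as outputSectionInfo; the name formatSectionRow is hypothetical:

```tcl
# Hypothetical alternative to outputSectionInfo: build one tab-separated row.
proc formatSectionRow {top instance module arch op sgn in out area col row file} {
    return [join [list $top $instance $module $arch $op $sgn $in $out $area $col $row $file] \t]
}

puts [formatSectionRow apn u_apn_ttp_logic/abc shift_right_vlog_unsigned_4662_7709 \
    very_fast/barrel >> x 25x5 25 223.02 69 33 part_select.v]
```

Because parsing and output are separate procedures, swapping the output format this way leaves processSection untouched apart from the name of the procedure it calls.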

I worked on the above script, and here is my code. This code won't work if there is an empty line after the "========" line.
But I would like to explore your code, as it is well organised.
set fd "[open "path_data.v" r]"
set fd1 "[open ./data_path_rpt.xls w+]"
puts $fd1 "Top Name\tInstance\tModule\tArchitecture\tOperator\tSign\tInput Size\tOutput size\tArea\tLine number\tColumn number\tFile Name"
set data {}
while {[gets $fd line] !=-1} {
if {[regexp "\===*" $line]} {
set data {}; set flag 0
while {[gets $fd line] >= 0} { append data "$line "
if {[regexp {[a-z]\.v} $line]} { set flag 1;break} }
puts "$data\n\n"
if {$flag} {
set topname [lindex $data 0]
regexp {(^[a-z]*) (.*) (.*module\:)(.*)} $data match topname instance sub3 sub4
set inst_name {} ;
foreach txt $instance {
append inst_name "$txt\/"
}
set instance [string trim $inst_name "\/"]
set module [lindex $sub4 0]
set architecture [lindex $sub4 1]
set operator [lindex $sub4 2]
set sign [lindex $sub4 3]
set input_size [lindex $sub4 4]
set output_size [lindex $sub4 5]
set area [lindex $sub4 6]
set linenum [lindex $sub4 7]
set col_num [lindex $sub4 8]
set file_name [lindex $sub4 9]
puts $fd1 "$topname\t$instance\t$module\t$architecture\t$operator\t$sign\t$input_size\t$output_size\t$area\t$linenum\t$col_num\t$file_name"
set data {} ; set flag 0
}}
}
close $fd1
close $fd

In Tcl, how to read in file by line and find a list of string (from another file) then append the line with a ##

I need to be able to read in a file line by line, find lines that start with one of a set of strings (from another file) followed by a block like ({ somedata }), and then prefix the lines of that block with ##.
Here is what I have so far:
set mydir <path to my dir>
#file name file.txt with content:
~>cat file.txt
Strng00 {
some data
}
Strng021 {
some data
}
Strng02 {
some data
}
Strng03 {
some data
}
Strng_dt {
some data
}
Strng01 {
some data
}
Strng02 {
some data
}
Strng03 {
some data
}
Strng_dt {
some data
}
Strng42 {
some data
}
Strng412
--
set list { Strng01 Strng02 Strng03 Strng_dt Strng42 } ;# May be read in from another file; these are the names to be matched
set fileIn [lindex $argv 0]
set fileInId [open $mydir/file.txt r]
set appendLine 0
foreach item $list {
set j 0
while {[gets $fileInId line ] != -1} {
if [regexp -all -line $item $line] { set appendLine 1 }
if $appendLine {
if [regexp {^\s*\}\s*$} $line] { set appendLine 0 }
set line "## $line"
}
puts $line
}
set j 1
}
The result only shows the first entry of the list:
Strng00
Strng021
Strng02
Strng03
Strng_dt
##Strng01 {
## some data
##}
Strng02
Strng03
Strng_dt
Strng42
Strng412
I'd like to get the ## prefix on each of the items listed.
Thanks in advance.

Here's another take:
while {[gets $fh line] != -1} {
set first [lindex [split [string trimleft $line]] 0]
puts [format "%s%s" [expr {$first in $list ? "##" : ""}] $line]
}
That finds the first word in the line, and checks if it is an element in $list.

Does this solve your problem?
set list { Strng01 Strng02 Strng03 Strng_dt Strng42 }
set fileInId [open $mydir/file.txt r]
while {[gets $fileInId line ] != -1} {
if {[regexp -line [join $list |] $line]} {
set line "## $line"
}
puts $line
}
edit: dealing with the updated specification.
This is one way to do it; it takes advantage of the fact that the lines in the file match Tcl command invocation syntax.
proc unknown {cmd args} {
set list { Strng01 Strng02 Strng03 Strng_dt Strng42 }
if {$cmd in $list} {
foreach line [split [info level 0] \n] {
puts "## $line"
}
} else {
puts [info level 0]
}
}
source file.txt
It works like this: using source on the file means that the Tcl interpreter tries to use the keywords on each line as command names, with the { ... } parts as arguments. Since the keywords aren't existing commands, the interpreter hands the invocations over to the unknown command, which we have redefined to recognize the relevant keywords and print the complete invocation ([info level 0]) with a ## prefix if they are in the list, or else just print the invocation as it is.
Documentation: chan, foreach, if, info, join, open, proc, puts, regexp, set, source, split, unknown, while
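Without relying on source, a single pass with a state flag can also prefix whole blocks. A minimal sketch, assuming each block starts with its keyword as the first word on a line and ends at a line containing only a closing brace; commentBlocks is a hypothetical name, and the input is a list of lines rather than a file:

```tcl
# Hypothetical helper: prefix every line of the named blocks with "## ".
proc commentBlocks {names lines} {
    set inBlock 0
    set result {}
    foreach line $lines {
        set first [lindex [split [string trimleft $line]] 0]
        if {!$inBlock && $first in $names} {
            set inBlock 1
        }
        if {$inBlock} {
            # A lone closing brace ends the block; that line is still prefixed.
            if {[regexp {^\s*\}\s*$} $line]} {
                set inBlock 0
            }
            lappend result "## $line"
        } else {
            lappend result $line
        }
    }
    return $result
}

set lines [list "Strng00 \{" " some data" "\}" "Strng01 \{" " some data" "\}" "Strng412"]
puts [join [commentBlocks {Strng01 Strng42} $lines] \n]
```

The file is read once and every keyword is checked on each line, which avoids exhausting the channel inside an outer loop over the list. A listed keyword that appears without a block would keep the prefix active until the next lone closing brace, so that case would need extra handling.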

How do I get the text that present under the matched string in tcl

I have a string value in tcl as
set out " ABC CDE EFG
          123     456"
I want to get the text that is present below text "EFG".
Right now it is "456", but it can be anything, so I need a way through which I can grep for "EFG" and get the text below it.

This answer takes some inspiration from Johannes Kuhn's answer, but I use regexp to get the word indices from the "keys" line.
# this is as close as I can get to a here-doc in Tcl
set out [string trim {
ABC DEF GHI
123     456
}]
# map the words in the first line to the values in the 2nd line
lassign [split $out \n] keys values
foreach range [regexp -all -inline -indices {\S+} $keys] {
set data([string range $keys {*}$range]) [string range $values {*}$range]
}
parray data
outputs
data(ABC) = 123
data(DEF) =
data(GHI) = 456

I suggest splitting the string into the keys and values with
lassign [split $out \n] keys values
and then looking up the position of the string in the keys and taking the same range from the values:
set start [string first "EFG" $keys]
set value [string range $values $start [expr {$start + [string length "EFG"] - 1}]]
Wrapping it in a proc, we get
proc getValue {input lookFor} {
lassign [split $input \n] keys values
set start [string first $lookFor $keys]
return [string range $values $start \
[expr {$start + [string length $lookFor] - 1}]]
}
Invoke it like this:
getValue $out "EFG"
Edit: how is the 2nd line aligned? With a tabulator (\t), spaces?

In this case what you actually have is two lines with groups of 3 alphanumeric characters separated by spaces, with a large amount of leading whitespace prefixing the second line ("\x20ABC\x20CDE\x20EFG\n[string repeat \x20 10]123[string repeat \x20 5]456" will reproduce what you posted). In your example, [string range $line end-2 end] applied to the second line would give you what you need. I'd suggest reading the file line by line and, each time you see EFG, extracting the part you need from the next line (maybe using string range) and emitting it.
For example (untested):
set state 0
set f [open $inputfile r]
while {[gets $f line] != -1} {
if {$state} {
puts [string range $line end-2 end]
set state 0
} else {
if {[string match "*EFG" $line]} { set state 1 }
}
}
close $f
