Getting value from a string in tcl - regex

I have to get a pattern from the specified string.
This is the first time I'm using Tcl. In Perl, I can simply get the grouped values with $1 $2 ... $n. In Tcl I've tried it this way, but it didn't work:
while { [gets $LOG_FILE line] >= 0 } {
    if {[regexp -inline {Getting available devices: (/.+$)} $line]} {
        puts {group0}
    }
}

With regexp, you have two ways to get submatches out.
Without -inline, you have to supply variables sufficient to get the submatch you care about (with the first such variable being for the whole matched region, like $& in Perl):
if {[regexp {Getting available devices: (/.+$)} $line a b]} {
    puts $b
}
It's pretty common to use -> as an overall-match variable. It's totally non-special to Tcl, but it makes the script mnemonically easier to grok:
if {[regexp {Getting available devices: (/.+$)} $line -> theDevices]} {
    puts $theDevices
}
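To mirror Perl's $1 and $2, add one variable per capture group after the overall-match variable. A minimal sketch, assuming a made-up log line with a device path and a size (neither the line format nor the names dev and size come from the question):

```tcl
# Hypothetical log line with two fields of interest.
set line "Getting available devices: /dev/sda (8 GB)"

# Group 1 lands in dev and group 2 in size; -> receives the overall match.
if {[regexp {devices: (\S+) \((\d+) GB\)} $line -> dev size]} {
    puts "$dev $size"
}
```

With this sample line the script prints /dev/sda 8.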
With -inline, regexp returns a list of things that were matched instead of assigning them to variables.
set matched [regexp -inline {Getting available devices: (/.+$)} $line]
if {[llength $matched]} {
    set group1 [lindex $matched 1]
    puts $group1
}
The -inline form works very well with multi-variable foreach and lassign, especially in combination with -all.
foreach {-> theDevices} [regexp -inline -all {Getting available devices: (/.+$)} $line] {
    puts $theDevices
}
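The lassign combination can be sketched like this. The sample line is an assumption made for illustration. When nothing matches, regexp -inline returns an empty list and lassign sets the variables to empty strings, which is what the test below relies on:

```tcl
# Hypothetical sample line; the real log format is an assumption.
set line "Getting available devices: /dev/sda /dev/sdb"

# lassign unpacks the -inline result: overall match first, then group 1.
lassign [regexp -inline {Getting available devices: (/.+$)} $line] -> theDevices

if {$theDevices ne ""} {
    puts $theDevices
}
```

This needs Tcl 8.5 or later for lassign.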

Related

Get multiple matches with tcl regexp

How do I get all the matches in tcl using regexp command? For example I have a string as following
set line "foo \"aaa\"zz bar \"aaa:ccc\" ccc"
puts $line
foo "aaa"zz bar "aaa:ccc" ccc
I now want to get aaa and aaa:ccc. There could be any number of such matches.
I tried
regexp -all -inline {"(.*)"} $line
{"aaa"zz bar "aaa:ccc"} {aaa"zz bar "aaa:ccc}
But as seen this didn't work. What's the right way to get multiple matches and match everything within double quotes?
You can capture everything between two quotes with the pattern "([^"]*)". When using the pattern with regexp -all -inline:
set matches [regexp -all -inline {"([^"]*)"} $line]
you will get all overall match values and all the captured substrings.
You will have to post-process the matches in order to get the final list of captures:
set line "foo \"aaa\"zz bar \"aaa:ccc\" ccc"
set matches [regexp -all -inline {"([^"]*)"} $line]
set res {}
foreach {match capture} $matches {
    lappend res $capture
}
puts $res
Output:
aaa aaa:ccc
An alternate method to extract the quoted strings is:
set unquoted [lmap quoted [regexp -all -inline {".*?"} $line] {string trim $quoted {"}}]
Or, split the string using quote as the delimiter, and take every second field.
set unquoted [lmap {a b} [split $line {"}] {set b}]
That gives you a trailing empty element since this split invocation results in a list with an odd number of elements.
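One way to avoid the trailing empty element is to walk the split fields by index. This is a sketch, not part of the original answer: quoted text sits at the odd indices, and a field only counts as quoted when another field follows it, which also skips an unterminated quote at the end of the string.

```tcl
set line "foo \"aaa\"zz bar \"aaa:ccc\" ccc"
set fields [split $line {"}]

set unquoted {}
# Odd indices hold quoted text; requiring a following field means the
# closing quote was actually present.
for {set i 1} {$i < [llength $fields] - 1} {incr i 2} {
    lappend unquoted [lindex $fields $i]
}
puts $unquoted
```

For the sample line this prints aaa aaa:ccc with no empty element at the end.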

How do I check if a particular value in a series of values has already been added to XML prop?

Consider
set data {<prop>red;blue;green</prop>}
I can add a new color using
incr count [regsub -all -- \
[appendArgs (< $name >)(.*?)(</ $name >)] $data [appendArgs \
\\1 $newValue \\3] data]
where newValue is defined by
set newValue [join \
[list \\2 [string map [list \\ \\\\] $value]] $separator]
If value is "pink", I'll end up with
<prop>red;blue;green;pink</prop>
If I run it again, I get
<prop>red;blue;green;pink;pink</prop>
Is it possible to rewrite the regex to check for $value and only add it if it is missing? Also, it should be able to handle
<prop>red;blue;pink;green</prop>
I tried ((?!$value)) but it didn't really work. Any help much appreciated.
proc addToContent {data color} {
    lassign [split $data <>] -> tag content endtag
    if {$color ni [split $content \;]} {
        return "<$tag>$content;$color<$endtag>"
    } else {
        return $data
    }
}
addToContent {<prop>red;blue;green</prop>} pink
# -> <prop>red;blue;green;pink</prop>
addToContent {<prop>red;blue;pink;green</prop>} pink
# -> <prop>red;blue;pink;green</prop>
If your Tcl version doesn't have lassign, use foreach {-> tag content endtag} [split $data <>] break. If you don't have the ni operator, use [lsearch -exact [split $content \;] $color] < 0. In both cases, you should upgrade.
But for real XML processing you should use something like tDOM.
Documentation: if, lassign, proc, return, split, tDOM
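The question uses a variable tag name and separator, so the same membership check can be generalised. What follows is only a sketch: the proc name addValue and its parameters are hypothetical, it assumes $name contains no regular-expression metacharacters, and it only touches the first matching element.

```tcl
# Append value to the separator-delimited content of the first
# <name>...</name> element in data, unless it is already present.
proc addValue {data name value {separator ";"}} {
    set re "<$name>(.*?)</$name>"
    # span receives the start and end indices of the element content.
    if {![regexp -indices $re $data -> span]} {
        return $data
    }
    lassign $span first last
    set values [split [string range $data $first $last] $separator]
    if {$value in $values} {
        return $data
    }
    lappend values $value
    # Rebuild around the content; this also works when the content is empty.
    set prefix [string range $data 0 [expr {$first - 1}]]
    set suffix [string range $data [expr {$last + 1}] end]
    return $prefix[join $values $separator]$suffix
}

puts [addValue {<prop>red;blue;green</prop>} prop pink]
puts [addValue {<prop>red;blue;pink;green</prop>} prop pink]
```

The first call appends pink; the second returns its input unchanged because pink is already in the list.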

Match any repetitive pattern using tcl

I have a binary file whose contents I don't know. I convert it to a hex string using binary scan $bin_data "H*" hex_data. The problem is how to match any repeated pattern (byte).
Example 1:
In the file: 0cabab79
Expected Output: abab
Example 2:
In the file: 0c1f1f03035d
Expected Output: 1f1f0303
Example 3:
In the file: 0c678967895d13
Expected Output: 67896789
You can use a more flexible regexp to get all the repeated patterns (at least 2 characters):
set inputs [list 0cabab79 0c1f1f03035d 0c678967895d13]
foreach input $inputs {
    set out ""
    foreach {whole sub} [regexp -all -inline {(..+)\1} $input] {
        append out $whole
    }
    puts $out
}
# Output:
# abab
# 1f1f0303
# 67896789
If you want the repeated unit to consist of whole pairs of characters, i.e. an even number of characters (so that aaaaaa gives aaaa and not aaaaaa), then...
set inputs [list 0cabab79 0c1f1f03035d 0c678967895d13 aaaaaa]
foreach input $inputs {
    set out ""
    foreach {whole sub} [regexp -all -inline {((?:..)+)\1} $input] {
        append out $whole
    }
    puts $out
}
# Output:
# abab
# 1f1f0303
# 67896789
# aaaa
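One caveat when matching on the hex string: a match can start at an odd character offset, i.e. in the middle of a byte. For example, 012120 contains 1212 starting at offset 1. If only byte-aligned repeats are wanted (an assumption; the question does not say so), a sketch using -indices and -start could look like this. The proc name alignedRepeats is hypothetical.

```tcl
# Collect repeats that begin on a byte boundary (even character offset).
proc alignedRepeats {hex} {
    set out ""
    set start 0
    while {[regexp -indices -start $start {((?:..)+)\1} $hex whole]} {
        lassign $whole from to
        if {$from % 2 == 0} {
            # Byte-aligned match: keep it and continue after it.
            append out [string range $hex $from $to]
            set start [expr {$to + 1}]
        } else {
            # Match begins mid-byte: skip one character and search again.
            set start [expr {$from + 1}]
        }
    }
    return $out
}

foreach input {0cabab79 0c1f1f03035d 0c678967895d13 012120} {
    puts "$input -> [alignedRepeats $input]"
}
```

The three inputs from the question give the same results as before, while 012120 yields an empty string because its only repeat is not byte-aligned.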
You can do this with a regexp using a backreference:
regexp -all -inline {(..)\1} 0c1f1f03035d
This returns a list containing, for each match, the full-length repetition followed by the repeated element.
So for this one you would get
1f1f 1f 0303 03
Looping through these you can build your Expected Output e.g.
set op {}
foreach {rep sing} [regexp -all -inline {(..)\1} 0c1f1f03035d] {
    append op $rep
}

Print lines before and after matching regexp in TCL

I want to be able to print 10 lines before and 10 lines after I come across a matching pattern in a file. I'm matching the pattern via regex. I would need a TCL specific solution. I basically need the equivalent of the grep -B 10 -A 10 feature.
Thanks in advance!
If the data is “relatively small” (which can actually be 100MB or more on modern computers) then you can load it all into Tcl and process it there.
# Read in the data (-nonewline avoids a phantom empty last line)
set f [open "datafile.txt"]
set lines [split [read -nonewline $f] "\n"]
close $f
# Find which lines match; adjust to taste
set matchLineNumbers [lsearch -all -regexp $lines $YourPatternHere]
# Note that the matches are already in order
# Handle overlapping ranges! Start from an empty list so that a file with no matches prints nothing
set ranges {}
unset -nocomplain prev
foreach n $matchLineNumbers {
    set from [expr {max(0, $n - 10)}]
    set to [expr {min($n + 10, [llength $lines] - 1)}]
    if {[info exists prev] && $from <= $prev} {
        lset ranges end $to
    } else {
        lappend ranges $from $to
    }
    set prev $to
}
# Print out the ranges
foreach {from to} $ranges {
    puts "=== $from - $to ==="
    puts [join [lrange $lines $from $to] "\n"]
}
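If the file is too large to load comfortably, the same effect can be had while reading line by line. The following is a sketch rather than a drop-in grep replacement: the proc name grepContext and its parameters are hypothetical, and it collects the selected lines into a list for simplicity (replace the lappend out calls with puts to stream the output instead).

```tcl
# Return matching lines from chan with up to `before` lines of leading
# and `after` lines of trailing context, without duplicating overlaps.
proc grepContext {chan pattern {before 10} {after 10}} {
    set buffer {}      ;# most recent lines not yet emitted
    set remaining 0    ;# trailing-context lines still owed
    set out {}
    while {[gets $chan line] >= 0} {
        if {[regexp -- $pattern $line]} {
            # Emit the saved leading context, then the match itself.
            lappend out {*}$buffer $line
            set buffer {}
            set remaining $after
        } elseif {$remaining > 0} {
            lappend out $line
            incr remaining -1
        } else {
            # Remember this line in case a match follows soon.
            lappend buffer $line
            if {[llength $buffer] > $before} {
                set buffer [lrange $buffer 1 end]
            }
        }
    }
    return $out
}

# Usage (file name and pattern are placeholders):
#   set f [open "datafile.txt"]
#   puts [join [grepContext $f {some pattern}] "\n"]
#   close $f
```

Unlike the in-memory version, this does not print separators between ranges; it only keeps each selected line once.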
The only mechanism that springs to mind is for you to split the input data into a list of lines. You'd then need to sweep through the list and whenever you found a match output a suitable collection of entries from the list.
To the best of my knowledge there's no built-in, easy way of doing this.
There might be something useful in tcllib.
I'd use grep myself.

My regexp is not working

I want to extract the error_name, Severity and Occurrences.
Here is the snippet of my report:
error_name: xxxxxxxxxx
Severity: Warning Occurrence: 2
error_name2:xxxxxxxxxxx.
Severity: Warning Occurrence: 16
error_name3:xxxxxxxxxxxxx
Severity: Warning Occurrence: 15
I am trying
while { [ gets $fp line ] >= 0 } {
if { [ regexp {^([^:\s]):.+^Severity:\s+Warning\s+Occurrence:\s+\d+} $line match errName count] } {
puts $errName
puts $count
incr errCount $count
}
But it does not write anything.
I would write this:
set fid [open filename r]
while {[gets $fid line] != -1} {
    foreach {match label value} [regexp -inline -all {(\w+):\s*(\S*)} $line] {
        switch -exact -- $label {
            Severity {set sev $value}
            Occurrence {set count $value}
            default {set err $label}
        }
    }
    if {[info exists err] && [info exists sev] && [info exists count]} {
        puts $err
        puts $count
        incr errCount $count
        unset err count sev
    }
}
puts $errCount
Output:
error_name
2
error_name2
16
error_name3
15
33
If you can hold the entire file in memory at once (depends on how big it is relative to how much memory you've got) then you can use a piece of clever RE trickery to pick everything out:
# Load the whole file into $data
set f [open $filename]
set data [read $f]
close $f
# Store the RE in its own variable for clarity
set RE {^(\w+):.*\nSeverity: +(\w+) +Occurrence: +(\d+)$}
foreach {- name severity occur} [regexp -all -inline -line $RE $data] {
    # Do something with each thing found
    puts "$name - $severity - $occur"
}
OK, now to explain. The key is that we're parsing the whole string at once, but we're using the -line option so that ^ and $ become line-anchors and . won't match a newline. Apart from that, the -all -inline does what it says: returns a list of everything found, matches and submatches. We then iterate over that with foreach (the - is an odd variable name, but it's convenient for a “dummy discard”). This keeps the majority of the complicated string parsing in the RE engine rather than trying to do stuff in script.
You'll get better performance if you can constrain the start of the RE better than “word starting at line start” (as you can stop parsing a line sooner and continue to the next one) but if that's what your data is, that's what your data is.
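To get the running total the question asks for, the same loop can accumulate the occurrence counts. A self-contained sketch, with the report snippet from the question embedded as a string instead of being read from a file:

```tcl
# The report snippet from the question, joined into one string.
set data [join {
    "error_name: xxxxxxxxxx"
    "Severity: Warning Occurrence: 2"
    "error_name2:xxxxxxxxxxx."
    "Severity: Warning Occurrence: 16"
    "error_name3:xxxxxxxxxxxxx"
    "Severity: Warning Occurrence: 15"
} \n]

set RE {^(\w+):.*\nSeverity: +(\w+) +Occurrence: +(\d+)$}
set errCount 0
foreach {- name severity occur} [regexp -all -inline -line $RE $data] {
    puts "$name - $severity - $occur"
    incr errCount $occur
}
puts $errCount
```

For this sample the final line printed is 33, the sum of 2, 16 and 15.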