I have a binary file whose contents I don't know. I convert it to a hex string using binary scan $bin_data "H*" hex_data. The problem is how to match any repeated pattern (of bytes).
Example 1:
In the file: 0cabab79
Expected Output: abab
Example 2:
In the file: 0c1f1f03035d
Expected Output: 1f1f0303
Example 3:
In the file: 0c678967895d13
Expected Output: 67896789
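The conversion step described above can be sketched as follows. The sample bytes are built in memory with binary format so that the snippet is self-contained; this stands in for data that would normally be read from a file opened in binary mode. The variable names follow the question.

```tcl
# Build four sample bytes in memory (a stand-in for data read from a
# file opened in binary mode, e.g. with [open $path rb]).
set bin_data [binary format H* 0cabab79]

# H* converts every byte to two hex digits, high nibble first.
binary scan $bin_data H* hex_data

puts $hex_data
puts [string length $bin_data]
```

Each byte becomes exactly two hex digits, so a repeated byte shows up as a repeated pair of characters in hex_data.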
You can use a more flexible regexp to get all the repeated patterns (at least 2 characters):
set inputs [list 0cabab79 0c1f1f03035d 0c678967895d13]
foreach input $inputs {
    set out ""
    foreach {whole sub} [regexp -all -inline {(..+)\1} $input] {
        append out $whole
    }
    puts $out
}
# Output:
# abab
# 1f1f0303
# 67896789
If you want the repeated unit to consist of an even number of characters, i.e. whole bytes (so that aaaaaa gives aaaa and not aaaaaa), use a group of character pairs instead:
set inputs [list 0cabab79 0c1f1f03035d 0c678967895d13 aaaaaa]
foreach input $inputs {
    set out ""
    foreach {whole sub} [regexp -all -inline {((?:..)+)\1} $input] {
        append out $whole
    }
    puts $out
}
# Output:
# abab
# 1f1f0303
# 67896789
# aaaa
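One caveat: both patterns work on hex digits, not on bytes, so a match can start in the middle of a byte (in 01212b the digits 1212 repeat, although no byte does). If the repetitions must be byte-aligned, a possible sketch is to try the pattern only at even offsets. The proc name alignedRepeats is hypothetical, and the byte alignment requirement is an assumption that the question does not state.

```tcl
# Hypothetical helper: collect repetitions that start on a byte boundary.
proc alignedRepeats {hex} {
    set out ""
    set i 0
    set n [string length $hex]
    while {$i < $n} {
        # With -start, \A anchors the match at offset $i.
        if {[regexp -start $i -- {\A((?:..)+)\1} $hex whole]} {
            append out $whole
            # Continue after the repetition that was just consumed.
            incr i [string length $whole]
        } else {
            # No repetition here; move on to the next byte.
            incr i 2
        }
    }
    return $out
}

puts [alignedRepeats 0c1f1f03035d]
puts [alignedRepeats 0c678967895d13]
puts [alignedRepeats 01212b]
```

The last call prints an empty line, because the only repetition in 01212b straddles byte boundaries.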
You can do this with a regexp using a backreference:
regexp -all -inline {(..)\1} 0c1f1f03035d
This returns a list that contains, for each match, the full repetition followed by the repeated element.
So for this one you would get
1f1f 1f 0303 03
Looping through these, you can build your expected output, e.g.
set op {}
foreach {rep sing} [regexp -all -inline {(..)\1} 0c1f1f03035d] {
    append op $rep
}
# $op now holds 1f1f0303
puts $op
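If the position of each repetition matters as well, the same loop can be combined with the -indices switch, which makes regexp return index pairs instead of substrings. This is only a sketch; the variable names hex and found are made up for the example.

```tcl
set hex 0c1f1f03035d
set found {}

# With -indices, every element is a {first last} pair of character offsets.
foreach {rep sing} [regexp -all -inline -indices {(..)\1} $hex] {
    lassign $rep first last
    lappend found $first [string range $hex $first $last]
    puts "offset $first: [string range $hex $first $last]"
}
```

The offsets count hex digits, so dividing them by two gives the byte offset in the original binary data.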
How do I get all the matches in tcl using regexp command? For example I have a string as following
set line "foo \"aaa\"zz bar \"aaa:ccc\" ccc"
puts $line
foo "aaa"zz bar "aaa:ccc" ccc
I now want to get aaa and aaa:ccc. There could be any number of such matches.
I tried
regexp -all -inline {"(.*)"} $line
{"aaa"zz bar "aaa:ccc"} {aaa"zz bar "aaa:ccc}
But as seen this didn't work. What's the right way to get multiple matches and match everything within double quotes?
You can capture everything between two quotes with the "([^"]*)" pattern. When using the pattern with regexp -all -inline:
set matches [regexp -all -inline {"([^"]*)"} $line]
you will get all overall match values and all the captured substrings.
You will have to post-process the matches in order to get the final list of captures:
set line "foo \"aaa\"zz bar \"aaa:ccc\" ccc"
set matches [regexp -all -inline {"([^"]*)"} $line]
set res {}
foreach {match capture} $matches {
lappend res $capture
}
puts $res
Output:
aaa aaa:ccc
An alternate method to extract the quoted strings is:
set unquoted [lmap quoted [regexp -all -inline {".*?"} $line] {string trim $quoted {"}}]
Or, split the string using quote as the delimiter, and take every second field.
set unquoted [lmap {a b} [split $line {"}] {set b}]
That gives you a trailing empty element since this split invocation results in a list with an odd number of elements.
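One way to avoid the trailing empty element, assuming the quotes in the line are balanced, is to drop the last field before pairing them up. Balanced quotes always produce an odd number of fields, so the last field is text outside any quotes. A minimal sketch (the variable names fields, outside and inside are made up for the example):

```tcl
set line "foo \"aaa\"zz bar \"aaa:ccc\" ccc"

# Five fields: text outside and inside the quotes, alternating.
set fields [split $line {"}]

set unquoted {}
# Dropping the last field leaves complete {outside inside} pairs.
foreach {outside inside} [lrange $fields 0 end-1] {
    lappend unquoted $inside
}
puts $unquoted
```

Using foreach instead of lmap also keeps the snippet usable on Tcl 8.5.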
I have the following code in a tcl script
set a_list {Hello1.my.name.is.not.adam.go.away,
Hello2.my.name.is.not.adam,
Hello3.my.name.is.not.adam.leave.me}
foreach a $a_list {if {[regexp adam [regsub {.*\.} $a {}]] == 1} {puts $a} }
My understanding is that this looks for the string adam in $a_list and it matches when adam is the last string.
For example
Hello1.my.name.is.not.adam.go.away ---> NO MATCH
Hello2.my.name.is.not.adam ---> MATCH
Hello3.my.name.is.not.adam.leave.me ---> NO MATCH
The problem I am facing is that I want to match adam anywhere in the string and then strip away everything after it, including adam itself.
For example
Hello1.my.name.is.not.adam.go.away ---> MATCH
Hello2.my.name.is.not.adam ---> MATCH
Hello3.my.name.is.not.adam.leave.me ---> MATCH
In all cases above, it should change the string to
Hello1.my.name.is.not ---> MATCH
Hello2.my.name.is.not ---> MATCH
Hello3.my.name.is.not ---> MATCH
Your help is appreciated.
Thanks
Method 1 :
With simple string commands, we can get the desired result.
set input {Hello1.my.name.is.not.adam.go.away, Hello2.my.name.is.not.adam, Hello3.my.name.is.not.adam.leave.me noobuntu dinesh}
foreach elem $input {
    # Getting the index of the word 'adam' in each element
    set idx [string first "adam" $elem]
    # If the word is not available, then 'idx' will have the value as '-1'
    if {$idx!=-1} {
        # string range will give the substring for the given indices
        puts "->[string range $elem 0 [expr {$idx-1}]]"
    }
}
will give output as follows,
->Hello1.my.name.is.not.
->Hello2.my.name.is.not.
->Hello3.my.name.is.not.
Method 2:
If you are interested only in regex patterns, the same result can be obtained with the regsub command:
set input {Hello1.my.name.is.not.adam.go.away, Hello2.my.name.is.not.adam, Hello3.my.name.is.not.adam.leave.me noobuntu dinesh}
foreach elem $input {
    if {[regsub {(.*?)adam.*$} $elem {\1} result]} {
        puts "->$result"
    }
}
will produce output as
->Hello1.my.name.is.not.
->Hello2.my.name.is.not.
->Hello3.my.name.is.not.
Reference : string, regsub
The simplest approach to strip the word adam and everything after it from each element of a list is a simple regsub combined with lmap:
% lmap s $a_list {regsub {\madam\M.*} $s ""}
Hello1.my.name.is.not. Hello2.my.name.is.not. Hello3.my.name.is.not.
The \m only matches at the start of a word, and the \M only matches at the end of a word. It works because if the word isn't there, regsub does nothing.
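The effect of the word anchors can be seen on a string where adam also occurs inside a longer word. The sample string below is made up for illustration.

```tcl
set s "say.hello.to.madame.adam.now"

# Without anchors the match starts inside "madame".
set plain [regsub {adam.*} $s ""]

# With \m and \M only the standalone word "adam" matches.
set anchored [regsub {\madam\M.*} $s ""]

puts $plain
puts $anchored
```

The first result is cut off in the middle of madame, while the second keeps that word intact.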
Using Tcl 8.5? You won't have lmap, and will need to do this instead:
set result {}
foreach s $a_list {
lappend result [regsub {\madam\M.*} $s ""]
}
# The altered list is now in $result
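Note that the results above keep the dot in front of adam, whereas the expected output in the question ends in not without a dot. A possible variation, sketched here with a list built without the commas from the question, is to let the pattern consume an optional preceding dot as well:

```tcl
set a_list [list Hello1.my.name.is.not.adam.go.away \
                 Hello2.my.name.is.not.adam \
                 Hello3.my.name.is.not.adam.leave.me]

set result {}
foreach s $a_list {
    # \.? also removes the dot that precedes the word, if there is one.
    lappend result [regsub {\.?\madam\M.*} $s ""]
}
puts $result
```

Elements that do not contain the word adam are passed through unchanged, as before.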
I have the following code:
set a "10.20.30.40"
regsub -all {.([0-9]+).([0-9]+).} $a {\2 \1} b
I am trying to grep 2nd and 3rd octet of the IP address.
Expected output:
20 30
Actual output:
20 04 0
What is my mistake here?
I'd stay away from regular expressions altogether:
set b [join [lrange [split $a .] 1 2]]
Split the value on dots, take the 2nd and 3rd elements, and join them with a space.
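If the octets are needed individually, the same split can feed lassign (available since Tcl 8.5). This is a small sketch; the variable names first, second, third and fourth are chosen for the example.

```tcl
set a "10.20.30.40"

# Assign the four octets to separate variables.
lassign [split $a .] first second third fourth

set b "$second $third"
puts $b
```

This avoids regular expressions entirely, at the cost of not validating that the input really is a dotted quad.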
You need to set the variables for the match and captured groups, then you can access them. Here is an example:
set a "10.20.30.40"
set rest [regexp {[0-9]+\.([0-9]+)\.([0-9]+)\.[0-9]+} $a match submatch1 submatch2]
puts $submatch1
puts $submatch2
Output of the demo
20
30
EDIT:
You can use regsub and backreferences this way (I am now swapping the 2nd and 3rd octets, just for demonstration). Note that a literal dot must be escaped:
set a "10.20.30.40"
regsub -all {\.([0-9]+)\.([0-9]+)\.} $a {.\2.\1.} b
puts $b
Output of the demo:
10.30.20.40
To obtain a "20 30" string, you need to use
regsub -all {^[0-9]+\.([0-9]+)\.([0-9]+)\.[0-9]+$} $a {\1 \2} b
I have to get a pattern from the specified string.
This is the first time I'm using Tcl. In Perl, I can simply get the grouped values with $1 $2 ... $n. In Tcl I've tried it this way, but it didn't even work:
while { [gets $LOG_FILE line] >= 0 } {
if {[regexp -inline {Getting available devices: (/.+$)} $line]} {
puts {group0}
}
}
With regexp, you have two ways to get submatches out.
Without -inline, you have to supply variables sufficient to get the submatch you care about (with the first such variable being for the whole matched region, like $& in Perl):
if {[regexp {Getting available devices: (/.+$)} $line a b]} {
puts $b
}
It's pretty common to use -> as an overall-match variable. It's totally non-special to Tcl, but it makes the script mnemonically easier to grok:
if {[regexp {Getting available devices: (/.+$)} $line -> theDevices]} {
puts $theDevices
}
With -inline, regexp returns a list of things that were matched instead of assigning them to variables.
set matched [regexp -inline {Getting available devices: (/.+$)} $line]
if {[llength $matched]} {
set group1 [lindex $matched 1]
puts $group1
}
The -inline form works very well with multi-variable foreach and lassign, especially in combination with -all.
foreach {-> theDevices} [regexp -inline -all {Getting available devices: (/.+$)} $line] {
puts $theDevices
}
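As for why the attempt in the question fails: with -inline, regexp returns a list rather than a match count, and that list is not a valid boolean for if. The sketch below shows this on a made-up log line (the device paths are hypothetical).

```tcl
set line "Getting available devices: /dev/sda /dev/sdb"
set matched [regexp -inline {Getting available devices: (/.+$)} $line]

# Two elements: the whole match and the first capture group.
puts [llength $matched]

# The list is not a valid boolean, so using it directly as an if
# condition raises an error; catch returns 1 in that case.
set failed [catch {if {$matched} {puts "never reached"}}]
puts $failed

# Testing the list length (or dropping -inline) avoids the problem.
if {[llength $matched] > 0} {
    puts [lindex $matched 1]
}
```

Also note that puts {group0} in the question prints the literal text group0, because braces prevent substitution.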
I want to extract the error_name, Severity and Occurrences.
Here is the snippet of my report:
error_name: xxxxxxxxxx
Severity: Warning Occurrence: 2
error_name2:xxxxxxxxxxx.
Severity: Warning Occurrence: 16
error_name3:xxxxxxxxxxxxx
Severity: Warning Occurrence: 15
I am trying
while { [ gets $fp line ] >= 0 } {
if { [ regexp {^([^:\s]):.+^Severity:\s+Warning\s+Occurrence:\s+\d+} $line match errName count] } {
puts $errName
puts $count
incr errCount $count
}
But it does not write anything.
I would write this:
set fid [open filename r]
while {[gets $fid line] != -1} {
    foreach {match label value} [regexp -inline -all {(\w+):\s*(\S*)} $line] {
        switch -exact -- $label {
            Severity {set sev $value}
            Occurrence {set count $value}
            default {set err $label}
        }
    }
    if {[info exists err] && [info exists sev] && [info exists count]} {
        puts $err
        puts $count
        incr errCount $count
        unset err count sev
    }
}
close $fid
puts $errCount
Output:
error_name
2
error_name2
16
error_name3
15
33
If you can hold the entire file in memory at once (depends on how big it is relative to how much memory you've got) then you can use a piece of clever RE trickery to pick everything out:
# Load the whole file into $data
set f [open $filename]
set data [read $f]
close $f
# Store the RE in its own variable for clarity
set RE {^(\w+):.*\nSeverity: +(\w+) +Occurrence: +(\d+)$}
foreach {- name severity occur} [regexp -all -inline -line $RE $data] {
# Do something with each thing found
puts "$name - $severity - $occur"
}
OK, now to explain. The key is that we're parsing the whole string at once, but we're using the -line option so that ^ and $ become line-anchors and . won't match a newline. Apart from that, the -all -inline does what it says: returns a list of everything found, matches and submatches. We then iterate over that with foreach (the - is an odd variable name, but it's convenient for a “dummy discard”). This keeps the majority of the complicated string parsing in the RE engine rather than trying to do stuff in script.
You'll get better performance if you can constrain the start of the RE better than “word starting at line start” (as you can stop parsing a line sooner and continue to the next one) but if that's what your data is, that's what your data is.
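To try the whole-file approach without a file, the report from the question can be put into a string. The sketch below does that and also sums the occurrence counts; the variable names names and total are made up for the example.

```tcl
# The sample report from the question, joined into one string.
set data [join {
    "error_name: xxxxxxxxxx"
    "Severity: Warning Occurrence: 2"
    "error_name2:xxxxxxxxxxx."
    "Severity: Warning Occurrence: 16"
    "error_name3:xxxxxxxxxxxxx"
    "Severity: Warning Occurrence: 15"
} \n]

set RE {^(\w+):.*\nSeverity: +(\w+) +Occurrence: +(\d+)$}

set names {}
set total 0
# Each match yields four values: whole match, name, severity, count.
foreach {- name severity occur} [regexp -all -inline -line -- $RE $data] {
    lappend names $name
    incr total $occur
}
puts $names
puts $total
```

For the sample data this collects the three error names and a total of 33 occurrences.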