Having an issue with backreferences in Tcl regex

I have the following code:
set a "10.20.30.40"
regsub -all {.([0-9]+).([0-9]+).} $a {\2 \1} b
I am trying to extract the 2nd and 3rd octets of the IP address.
Expected output:
20 30
Actual output:
20 04 0
What is my mistake here?

I'd stay away from regular expressions altogether:
set b [join [lrange [split $a .] 1 2]]
Split the value on dots, take the 2nd and 3rd elements (indices 1 and 2), and join them with a space.
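Here is a minimal sketch of the split approach, with puts added so you can see the result. The lassign variant (Tcl 8.5 or later) and the variable names first, second, third and fourth are illustrative additions, not part of the original answer.

```tcl
set a "10.20.30.40"
# split on "." gives {10 20 30 40}; lrange keeps indices 1 and 2;
# join glues them back together with its default separator, a space
set b [join [lrange [split $a .] 1 2]]
puts $b
# Alternatively, unpack every octet into its own (illustrative) variable
lassign [split $a .] first second third fourth
puts "$second $third"
```

Both puts calls print 20 30. No regex is involved, so there are no dots to escape and no backreferences to get wrong.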

With regexp, you pass variable names for the overall match and for each captured group. regexp stores the matched text in them, and you can then read them. Here is an example:
set a "10.20.30.40"
set rest [regexp {[0-9]+\.([0-9]+)\.([0-9]+)\.[0-9]+} $a match submatch1 submatch2]
puts $submatch1
puts $submatch2
Output of the demo:
20
30
EDIT:
You can use regsub and backreferences this way (here I swap the 2nd and 3rd octets, just for demonstration). Note that a literal dot must be escaped:
set a "10.20.30.40"
regsub -all {\.([0-9]+)\.([0-9]+)\.} $a {.\2.\1.} b
puts $b
Output of the demo:
10.30.20.40
To obtain the string "20 30", anchor the pattern to the whole address and replace it with the two captures:
regsub -all {^[0-9]+\.([0-9]+)\.([0-9]+)\.[0-9]+$} $a {\1 \2} b
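To check this, here is a small sketch that also prints regsub's return value, which is the number of substitutions made. -all is dropped because the anchored pattern can only match once. The second call uses a made-up input ("not-an-ip"). In that case regsub returns 0 and copies the input into the result variable unchanged, so testing the count is a cheap way to detect a string that is not shaped like an address.

```tcl
set a "10.20.30.40"
set re {^[0-9]+\.([0-9]+)\.([0-9]+)\.[0-9]+$}
# The ^...$ anchors make the single match span the whole address,
# so the replacement leaves only the two captured octets
set n [regsub $re $a {\1 \2} b]
puts "$n: $b"
# No match: the count is 0 and c receives the input unchanged
set n2 [regsub $re "not-an-ip" {\1 \2} c]
puts "$n2: $c"
```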

Related

Get multiple matches with tcl regexp

How do I get all the matches in Tcl using the regexp command? For example, I have the following string:
set line "foo \"aaa\"zz bar \"aaa:ccc\" ccc"
puts $line
foo "aaa"zz bar "aaa:ccc" ccc
I now want to get aaa and aaa:ccc. There could be any number of such matches.
I tried
regexp -all -inline {"(.*)"} $line
{"aaa"zz bar "aaa:ccc"} {aaa"zz bar "aaa:ccc}
But as seen this didn't work. What's the right way to get multiple matches and match everything within double quotes?
You can capture everything between two quotes with the "([^"]*)" pattern. When you use the pattern with regexp -all -inline:
set matches [regexp -all -inline {"([^"]*)"} $line]
you will get all overall match values and all the captured substrings.
You will have to post-process the matches in order to get the final list of captures:
set line "foo \"aaa\"zz bar \"aaa:ccc\" ccc"
set matches [regexp -all -inline {"([^"]*)"} $line]
set res {}
foreach {match capture} $matches {
lappend res $capture
}
puts $res
Output:
aaa aaa:ccc
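If you would rather not post-process the flat list, another option is to loop with regexp's -start and -indices switches, resuming each search just after the previous match. This is a sketch of that alternative technique, not the method from the answer above. It uses {*} expansion, so it assumes Tcl 8.5 or later.

```tcl
set line "foo \"aaa\"zz bar \"aaa:ccc\" ccc"
set res {}
set pos 0
# With -indices, the match variables receive {first last} index pairs
while {[regexp -start $pos -indices {"([^"]*)"} $line whole inner]} {
    # Extract the text of the capture group from its index pair
    lappend res [string range $line {*}$inner]
    # Resume the search just after the closing quote of this match
    set pos [expr {[lindex $whole 1] + 1}]
}
puts $res
```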
An alternate method to extract the quoted strings is:
set unquoted [lmap quoted [regexp -all -inline {".*?"} $line] {string trim $quoted {"}}]
Or, split the string using quote as the delimiter, and take every second field.
set unquoted [lmap {a b} [split $line {"}] {set b}]
That gives you a trailing empty element since this split invocation results in a list with an odd number of elements.
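One way to avoid the trailing empty element is to walk the split result by index and stop before the last field, which always comes after the final quote. The proc name quoted_fields is hypothetical. This sketch ignores any text after an unmatched opening quote, but it keeps a genuinely empty quoted string ("") as an empty element.

```tcl
# Hypothetical helper: return the text between each pair of double quotes
proc quoted_fields {line} {
    set parts [split $line {"}]
    set result {}
    # Quoted text sits at odd indices 1, 3, 5, ...; the last part always
    # follows the final quote, so it is never collected
    for {set i 1} {$i < [llength $parts] - 1} {incr i 2} {
        lappend result [lindex $parts $i]
    }
    return $result
}

set line "foo \"aaa\"zz bar \"aaa:ccc\" ccc"
puts [quoted_fields $line]
```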

Tcl Regexp confusion

I have the following code in a tcl script
set a_list {Hello1.my.name.is.not.adam.go.away,
Hello2.my.name.is.not.adam,
Hello3.my.name.is.not.adam.leave.me}
foreach a $a_list {if {[regexp adam [regsub {.*\.} $a {}]] == 1} {puts $a} }
My understanding is that this looks for the string adam in each element of $a_list and matches only when adam is the last dot-separated part.
For example
Hello1.my.name.is.not.adam.go.away ---> NO MATCH
Hello2.my.name.is.not.adam ---> MATCH
Hello3.my.name.is.not.adam.leave.me ---> NO MATCH
The problem I am facing is that I want to match adam wherever it appears and then strip away everything from adam onward, including adam itself.
For example
Hello1.my.name.is.not.adam.go.away ---> MATCH
Hello2.my.name.is.not.adam ---> MATCH
Hello3.my.name.is.not.adam.leave.me ---> MATCH
In all cases above, it should change the string to
Hello1.my.name.is.not ---> MATCH
Hello2.my.name.is.not ---> MATCH
Hello3.my.name.is.not ---> MATCH
Your help is appreciated.
Thanks
Method 1 :
With simple string commands, we can get the desired result.
set input {Hello1.my.name.is.not.adam.go.away, Hello2.my.name.is.not.adam, Hello3.my.name.is.not.adam.leave.me noobuntu dinesh}
foreach elem $input {
# Getting the index of the word 'adam' in each element
set idx [string first "adam" $elem]
# If the word is not present, 'idx' is -1
if {$idx!=-1} {
# string range will give the substring for the given indices
puts "->[string range $elem 0 [expr {$idx-1}]]"
}
}
This gives the following output:
->Hello1.my.name.is.not.
->Hello2.my.name.is.not.
->Hello3.my.name.is.not.
Method 2:
If you are interested only in regex patterns, the same result can be achieved with the regsub command:
set input {Hello1.my.name.is.not.adam.go.away, Hello2.my.name.is.not.adam, Hello3.my.name.is.not.adam.leave.me noobuntu dinesh}
foreach elem $input {
if {[regsub {(.*?)adam.*$} $elem {\1} result]} {
puts "->$result"
}
}
This produces the following output:
->Hello1.my.name.is.not.
->Hello2.my.name.is.not.
->Hello3.my.name.is.not.
Reference: string, regsub
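The non-greedy (.*?) in Method 2 matters when a string contains adam more than once. Here is a short sketch comparing it with a greedy (.*). The sample string Hello.adam.x.adam.y is made up for illustration.

```tcl
set s Hello.adam.x.adam.y
# Non-greedy group: stops before the FIRST "adam"
regsub {(.*?)adam.*$} $s {\1} lazy
# Greedy group: runs up to the LAST "adam"
regsub {(.*)adam.*$} $s {\1} greedy
puts $lazy
puts $greedy
```

The first puts prints Hello. and the second prints Hello.adam.x., so only the non-greedy version cuts at the first occurrence.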
The simplest way to strip the word adam and everything after it from each element of a list is to use a simple regsub inside lmap:
% lmap s $a_list {regsub {\madam\M.*} $s ""}
Hello1.my.name.is.not. Hello2.my.name.is.not. Hello3.my.name.is.not.
The \m only matches at the start of a word, and the \M only matches at the end of a word. This is safe for every element because, if the word isn't there, regsub returns the string unchanged.
Using Tcl 8.5? You won't have lmap, and will need to do this instead:
set result {}
foreach s $a_list {
lappend result [regsub {\madam\M.*} $s ""]
}
# The altered list is now in $result
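The question's expected output has no trailing dot (Hello1.my.name.is.not), but the answers above keep it. One way to drop the dot as well is to make it an optional part of the pattern. Here is a sketch of that approach. The proc name strip_from_adam is hypothetical, and it uses foreach so it also runs on Tcl 8.5.

```tcl
# Hypothetical helper: remove "adam", the dot before it (if any),
# and everything after it from each element; others pass through
proc strip_from_adam {items} {
    set result {}
    foreach s $items {
        # \.? takes the preceding dot; \m and \M keep "adam" a whole word
        lappend result [regsub {\.?\madam\M.*} $s ""]
    }
    return $result
}

puts [strip_from_adam {Hello1.my.name.is.not.adam.go.away, Hello2.my.name.is.not.adam, Hello3.my.name.is.not.adam.leave.me noobuntu}]
```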

Match any repetitive pattern using tcl

I have a binary file whose contents I don't know. I convert it to a hex string using binary scan $bin_data "H*" hex_data. The problem is how to match any repeated pattern (of bytes).
Example 1:
In the file: 0cabab79
Expected Output: abab
Example 2:
In the file: 0c1f1f03035d
Expected Output: 1f1f0303
Example 3:
In the file: 0c678967895d13
Expected Output: 67896789
You can use a more flexible regexp to get all the repeated patterns (at least 2 characters):
set inputs [list 0cabab79 0c1f1f03035d 0c678967895d13]
foreach input $inputs {
set out ""
foreach {whole sub} [regexp -all -inline {(..+)\1} $input] {
append out $whole
}
puts $out
}
# Output:
# abab
# 1f1f0303
# 67896789
If you want to make sure the repeated unit is made of pairs of characters, i.e. whole bytes (so aaaaaa should give aaaa and not aaaaaa), then:
set inputs [list 0cabab79 0c1f1f03035d 0c678967895d13 aaaaaa]
foreach input $inputs {
set out ""
foreach {whole sub} [regexp -all -inline {((?:..)+)\1} $input] {
append out $whole
}
puts $out
}
# Output:
# abab
# 1f1f0303
# 67896789
# aaaa
You can do this with a regexp using a backreference:
regexp -all -inline {(..)\1} 0c1f1f03035d
This returns a list containing, for each match, the full repetition followed by the repeated element.
So for this one you would get
1f1f 1f 0303 03
By looping through these, you can build your expected output, e.g.:
set op {}
foreach {rep sing} [regexp -all -inline {(..)\1} 0c1f1f03035d] {
append op $rep
}
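Here is a sketch that ties the question's pipeline together: raw bytes go through binary scan and then through the backreference regexp. The proc name repeated_hex is hypothetical, and the sample bytes are built with binary format only for illustration. Because the pattern works on hex digits, a match can in principle start in the middle of a byte. For example, the bytes 1a ba b0 give the hex string 1abab0, which contains abab at an odd offset.

```tcl
# Hypothetical helper: concatenate every immediately repeated
# two-hex-digit unit found in raw binary data
proc repeated_hex {bin_data} {
    # Convert raw bytes to a hex string, high nibble first
    binary scan $bin_data H* hex_data
    set out ""
    # Each match yields the full repetition and the repeated unit
    foreach {rep unit} [regexp -all -inline {(..)\1} $hex_data] {
        append out $rep
    }
    return $out
}

set bin [binary format H* 0c1f1f03035d]
puts [repeated_hex $bin]
```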

extracting digits using regex in tcl

I am new to regex and still trying to wrap my head around it. I am stuck at this one point, and none of the related questions seem to help with my problem.
I have a text variable
set text "/folders/beta_0_2_1"
I want to extract 0, 2, 1 and save them in three different variables using regex in tcl.
I tried to reach the beta part using
[regexp {/beta_} $text]
However, I am not able to figure out the part where I can extract each of those variables and then save them. Can you provide me some direction?
You can use it like this:
regexp {/beta_([0-9]+)_([0-9]+)_([0-9]+)} $text -> num1 num2 num3
Then you can use the variables:
puts "$num1 $num2 $num3"
# => 0 2 1
By the way, -> is just a convention: it is a variable (yes, it really is one!) that will contain the whole match.
And as a side note, you could also split it by underscore:
lassign [split $text "_"] - num1 num2 num3
puts "$num1 $num2 $num3"
# => 0 2 1
To extract all the sequences of digits:
% set text "/folders/beta_0_2_1"
/folders/beta_0_2_1
% regexp -inline -all {\d+} $text
0 2 1
% lassign [regexp -inline -all {\d+} $text] a b c
% puts $a; puts $b; puts $c
0
2
1
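If the path might not contain the beta_N_N_N suffix, check regexp's return value before using the variables. regexp returns 0 on failure, and in that case the capture variables are not assigned. Here is a sketch of that check. The proc name beta_numbers and the trailing $ anchor are assumptions about the input format.

```tcl
# Hypothetical helper: return the three numbers after "beta_" as a list
proc beta_numbers {path} {
    # \d+ matches one or more digits; -> receives the whole match
    if {![regexp {/beta_(\d+)_(\d+)_(\d+)$} $path -> a b c]} {
        error "no beta_N_N_N suffix in $path"
    }
    return [list $a $b $c]
}

lassign [beta_numbers /folders/beta_0_2_1] num1 num2 num3
puts "$num1 $num2 $num3"
```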

Case matching regexp

I have been wondering about a regexp matching pattern in Tcl for some time and I've remained stumped as to how it was working. I'm using Wish and Tcl/Tk 8.5 by the way.
I have a random string MmmasidhmMm stored in $line and the code I have is:
while {[regexp -all {[Mm]} $line match]} {
puts $data $match
regsub {[Mm]} $line "" line
}
$data is a text file.
This is what I got:
m
m
m
m
m
m
While I was expecting:
M
m
m
m
M
m
I was trying some things to see how changing a bit would affect the results when I got this:
while {[regexp -all {^[Mm]} $line match]} {
puts $data $match
regsub {[Mm]} $line "" line
}
I get:
M
m
m
Surprisingly, $match keeps the case.
I was wondering why in the first case, $match automatically becomes lowercase for some reason. Unless I am not understanding how the regexp actually is working, I'm not sure what I could be doing wrong. Maybe there's a flag that fixes it that I don't know about?
I'm not sure I'll really use this kind of code some day, but I guess learning how it works might help me in other ways. I hope I didn't miss anything in there. Let me know if you need more information!
The key here is your -all flag. The documentation for it says:
-all -- Causes the regular expression to be matched as many times as possible in the string, returning the total number of matches found. If this is specified with match variables, they will contain information for the last match only.
That means the variable match contains the very last match, which is a lowercase 'm'. Drop the -all flag and you will get what you want.
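Here is a sketch of the loop with -all dropped. It collects the matches in a list and prints them to stdout instead of writing to the question's $data file. The variable name found is made up.

```tcl
set line MmmasidhmMm
set found {}
# Without -all, regexp stops at the first match, so $match is that
# first character with its original case
while {[regexp {[Mm]} $line match]} {
    lappend found $match
    # Remove only the first M or m before searching again
    regsub {[Mm]} $line "" line
}
puts $found
```

This prints M m m m M m, and $line is left as asidh.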
Update
If your goal is to remove all 'm' regardless of case, that whole block of code can be condensed into just one line:
regsub -all {[Mm]} $line "" line
Or, more intuitively:
set line [string map -nocase {m ""} $line]; # Map all M's into nothing
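If you only need the matched characters, you can skip the loop entirely. -all combined with -inline returns every match in one list with its case preserved, and regsub's -nocase switch is another way to remove both cases. Here is a short sketch of both:

```tcl
set line MmmasidhmMm
# Every match, in order, with its original case
set found [regexp -all -inline {[Mm]} $line]
puts $found
# -nocase makes "m" match both M and m
set stripped [regsub -all -nocase {m} $line ""]
puts $stripped
```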