Variable number of matches in a regexp

I have this file :
#2/1/1/21=p1 5/1/1/21=p1 isid=104
3/1/1/9=p1 4/1/1/4=p1 5/1/1/17=p1 6/1/1/4=p1 isid=100
1/1/1/4=p1 6/1/1/5=p1 isid=101
I want line 1 to be ignored (it is a commented-out line).
In line 2, I want to get "3/1/1/9", "4/1/1/4" and "5/1/1/17" in three variables var1, var2 and var3.
In line 3, I want to get "1/1/1/4" and "6/1/1/5" in two variables var1 and var2.
For the moment I can ignore line 1 and match what I want on line 2 OR line 3, but not both:
if {[regexp {^(\d/\d/\d/\d{1,2})=p1.*(\d/\d/\d/\d{1,2})=p1} $line value match1 match2]} {
# This works for line 3 but not line 2
}
if {[regexp {^(\d/\d/\d/\d{1,2})=p1.*(\d/\d/\d/\d{1,2})=p1.*(\d/\d/\d/\d{1,2})=p1} $line value match1 match2 match3]} {
# This works for line 2 but not line 3
}
How can I get the right number of matches for both line 2 AND line 3?

Try this regex:
[regexp {^(\d/\d/\d/\d{1,2})=p1.*?(\d/\d/\d/\d{1,2})=p1(?:.*?(\d/\d/\d/\d{1,2})=p1)?} $line value match1 match2 match3]
The third match is wrapped in a non-capturing group followed by a question mark, (?: ... )?, which makes it optional.
I also turned your greedy .* into non-greedy .*?.
To get all matches, you could use something more like this:
if {[string range $line 0 0] ne "#"} {
set matches [regexp -all -inline -- {\d/\d/\d/\d{1,2}(?==p1)} $line]
}
The matches are returned in the list $matches. You can then access them with lindex, or use lassign to give them specific names.
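As a rough sketch of this list-based approach, the snippet below runs the same pattern over the three sample lines from the question and inspects each result with llength and lindex. The variable names lines and result are only illustrative.

```tcl
# Sample lines copied from the question; the first one is commented out.
set lines {
    {#2/1/1/21=p1 5/1/1/21=p1 isid=104}
    {3/1/1/9=p1 4/1/1/4=p1 5/1/1/17=p1 6/1/1/4=p1 isid=100}
    {1/1/1/4=p1 6/1/1/5=p1 isid=101}
}

set result {}
foreach line $lines {
    # Skip commented-out lines.
    if {[string index $line 0] eq "#"} continue
    # Collect every n/n/n/n field that is followed by "=p1".
    set matches [regexp -all -inline -- {\d/\d/\d/\d{1,2}(?==p1)} $line]
    lappend result $matches
    puts "[llength $matches] match(es), first is [lindex $matches 0]: $matches"
}
```

Note that line 2 of the sample file contains four =p1 fields, so its list has four elements; the number of variables no longer has to be known in advance.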

This might be better:
lassign [regexp -all -inline {\d+(?:/\d+){3}(?==p1)} $line] var1 var2 var3
If there are only 2 matches, var3 will be empty.
If there are more than 3 matches, only the first 3 will be captured in variables.
If you really just want all the matches:
set vars [regexp -all -inline {\d+(?:/\d+){3}(?==p1)} $line]
To ignore comments:
while {[gets $fh line] != -1} {
if {[regexp {^\s*#} $line]} continue
# do your other stuff here ...
}
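A small sketch of how lassign behaves with this pattern, using lines 2 and 3 from the question. It relies on lassign returning any list elements that were not assigned to a variable; the names re, line2, line3, extra and leftover are made up for the example.

```tcl
set re {\d+(?:/\d+){3}(?==p1)}
set line2 {3/1/1/9=p1 4/1/1/4=p1 5/1/1/17=p1 6/1/1/4=p1 isid=100}
set line3 {1/1/1/4=p1 6/1/1/5=p1 isid=101}

# Four matches, three variables: the fourth match is returned by lassign.
set extra [lassign [regexp -all -inline $re $line2] var1 var2 var3]
puts "var1=$var1 var2=$var2 var3=$var3 extra=$extra"

# Two matches, three variables: the third variable is set to the empty string.
set leftover [lassign [regexp -all -inline $re $line3] a b c]
puts "a=$a b=$b c='$c' leftover='$leftover'"
```

Checking whether the last variable is empty (or whether lassign returned anything) is one way to tell how many fields a line actually had.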

Try this (the last field may have one or two digits):
(^|\s)(\d/\d/\d/\d{1,2}=p1)
or more compactly:
(^|\s)((\d/){3}\d{1,2}=p1)

Related

Using a capturing group of the match pattern of a regsub in the substitution itself?

In the code below, $html is a string of HTML. The code captures the full match and the capturing group for each match in a list. Then, if the list is not empty, it iterates through the list to replace the span tags with em tags along with the original text that was between them.
For example, if the HTML is:
This is <span class='add'>span 1</span> and this is <span class='add'>span 2</span>.
then $a would be a list of length 4:
{<span class='add'>span 1</span>} {span 1} {<span class='add'>span 2</span>} {span 2}
The sample code generates:
This is <em>span 1</em> and this is <em>span 2</em>.
as expected, but this seems an inefficient way to do it; somehow, the capturing group should be usable directly within the regsub expression.
Is this true and how is it done?
Something like:
set html [regsub "<span class='add'>(.+?)</span>" $html "<em>...</em>"]
where the ... is something that points to the captured group.
Thank you.
set a [regexp -all -inline -- {<span class='add'>(.+?)</span>} $html]
if { [llength $a] > 0 } {
foreach {x y} $a {
set html [regsub "<span class='add'>${y}</span>" $html "<em>${y}</em>"]
}
}

You can use a backreference in the replacement:
regsub -all {<span class='add'>(.+?)</span>} $html {<em>\1</em>}
EDIT: To trim leading and trailing spaces from the captured string, you can simply leave them out by matching leading and trailing spaces outside the parentheses:
regsub -all {<span class='add'>\s*(.+?)\s*</span>} $html {<em>\1</em>}
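For illustration, here is the plain backreference form applied to the sample HTML from the question. When a variable name is given as the last argument, regsub stores the new string there and returns the number of replacements; the names count and result are only placeholders.

```tcl
set html "This is <span class='add'>span 1</span> and this is <span class='add'>span 2</span>."

# \1 in the replacement stands for whatever the first (...) group captured.
set count [regsub -all {<span class='add'>(.+?)</span>} $html {<em>\1</em>} result]

puts $count
puts $result
```

This replaces the regexp/foreach loop from the question with a single call.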

Get multiple matches with tcl regexp

How do I get all the matches in tcl using regexp command? For example I have a string as following
set line "foo \"aaa\"zz bar \"aaa:ccc\" ccc"
puts $line
foo "aaa"zz bar "aaa:ccc" ccc
I now want to get aaa and aaa:ccc. There could be any number of such matches.
I tried
regexp -all -inline {"(.*)"} $line
{"aaa"zz bar "aaa:ccc"} {aaa"zz bar "aaa:ccc}
But as seen this didn't work. What's the right way to get multiple matches and match everything within double quotes?

You can capture everything between two quotes with the "([^"]*)" pattern. When using the pattern with regexp -all -inline:
set matches [regexp -all -inline {"([^"]*)"} $line]
you will get all overall match values and all the captured substrings.
You will have to post-process the matches in order to get the final list of captures:
set line "foo \"aaa\"zz bar \"aaa:ccc\" ccc"
set matches [regexp -all -inline {"([^"]*)"} $line]
set res {}
foreach {match capture} $matches {
lappend res $capture
}
puts $res
Output:
aaa aaa:ccc

An alternate method to extract the quoted strings is:
set unquoted [lmap quoted [regexp -all -inline {".*?"} $line] {string trim $quoted {"}}]
Or, split the string using quote as the delimiter, and take every second field.
set unquoted [lmap {a b} [split $line {"}] {set b}]
That gives you a trailing empty element since this split invocation results in a list with an odd number of elements.
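One possible way to avoid that trailing empty element, sketched under the assumption that the quotes in the line are balanced (so split yields an odd number of fields): drop the last field before pairing them up. The variable names fields, outside and inside are illustrative.

```tcl
set line "foo \"aaa\"zz bar \"aaa:ccc\" ccc"
set fields [split $line {"}]

# With balanced quotes the last field is text after the final quote,
# so removing it leaves clean pairs of unquoted and quoted text.
set unquoted {}
foreach {outside inside} [lrange $fields 0 end-1] {
    lappend unquoted $inside
}
puts $unquoted
```

Because it uses foreach rather than lmap, this variant should also run on Tcl versions older than 8.6.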

Having issue with back reference in TCL

I have the following code:
set a "10.20.30.40"
regsub -all {.([0-9]+).([0-9]+).} $a {\2 \1} b
I am trying to extract the 2nd and 3rd octets of the IP address.
Expected output:
20 30
Actual output:
20 04 0
What is my mistake here?

I'd stay away from regular expressions altogether:
set b [join [lrange [split $a .] 1 2]]
Split the value on dots, take the 2nd and 3rd elements, and join them with a space.

You need to set the variables for the match and captured groups, then you can access them. Here is an example:
set a "10.20.30.40"
set rest [regexp {[0-9]+\.([0-9]+)\.([0-9]+)\.[0-9]+} $a match submatch1 submatch2]
puts $submatch1
puts $submatch2
Output of the demo
20
30
EDIT:
You can use regsub and backreferences this way (I am now swapping the 2nd and 3rd octets, just for demonstration). Note that a literal dot must be escaped:
set a "10.20.30.40"
regsub -all {\.([0-9]+)\.([0-9]+)\.} $a {.\2.\1.} b
puts $b
Output of the demo:
10.30.20.40
To obtain a "20 30" string, you need to use
regsub -all {^[0-9]+\.([0-9]+)\.([0-9]+)\.[0-9]+$} $a {\1 \2} b
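To make the difference concrete, the sketch below runs the original expression from the question next to the escaped and anchored one. In the original, each unescaped dot matches any character (including digits), and -all then finds a second match in the rest of the string. The variable names wrong and right are just for the example.

```tcl
set a "10.20.30.40"

# Original attempt: "." matches any character, so the match boundaries drift.
regsub -all {.([0-9]+).([0-9]+).} $a {\2 \1} wrong

# Escaped dots plus ^ and $ force one match over the whole address.
regsub {^[0-9]+\.([0-9]+)\.([0-9]+)\.[0-9]+$} $a {\1 \2} right

puts $wrong
puts $right
```

The first puts reproduces the "20 04 0" from the question; the second prints "20 30".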

Match any repetitive pattern using tcl

I have a binary file whose contents I don't know. I convert it to a hex string using binary scan $bin_data "H*" hex_data. The problem is how to match any repeated pattern (byte sequence).
Example 1:
In the file: 0cabab79
Expected Output: abab
Example 2:
In the file: 0c1f1f03035d
Expected Output: 1f1f0303
Example 3:
In the file: 0c678967895d13
Expected Output: 67896789

You can use a more flexible regexp to get all the repeated patterns (at least 2 characters):
set inputs [list 0cabab79 0c1f1f03035d 0c678967895d13]
foreach input $inputs {
set out ""
foreach {whole sub} [regexp -all -inline {(..+)\1} $input] {
append out $whole
}
puts $out
}
# Output:
# abab
# 1f1f0303
# 67896789
If you want to make sure the repeated unit always has an even number of characters (i.e. aaaaaa should give aaaa and not aaaaaa), then...
set inputs [list 0cabab79 0c1f1f03035d 0c678967895d13 aaaaaa]
foreach input $inputs {
set out ""
foreach {whole sub} [regexp -all -inline {((?:..)+)\1} $input] {
append out $whole
}
puts $out
}
# Output:
# abab
# 1f1f0303
# 67896789
# aaaa
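Tying this back to the binary data in the question, the following is a minimal sketch that wraps the hex conversion and the even-length pattern in one procedure. The procedure name repeatedBytes is hypothetical, and binary format is used here only to build sample input in place of data read from a file.

```tcl
# Hypothetical helper: returns the concatenated repeated runs found in binData.
proc repeatedBytes {binData} {
    # Convert the raw bytes to a lowercase hex string, two characters per byte.
    binary scan $binData H* hex
    set out ""
    foreach {whole unit} [regexp -all -inline {((?:..)+)\1} $hex] {
        append out $whole
    }
    return $out
}

puts [repeatedBytes [binary format H* 0cabab79]]
puts [repeatedBytes [binary format H* 0c1f1f03035d]]
puts [repeatedBytes [binary format H* 0c678967895d13]]
```

One caveat worth keeping in mind: the regexp works on hex characters, so nothing here forces a match to start on a byte boundary. If that matters for your data, the match offsets (regexp -indices) would need to be checked as well.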

You can do this with a regexp using a backreference:
regexp -all -inline {(..)\1} 0c1f1f03035d
This returns a list containing, for each match, the full-length repetition followed by the repeated element.
So for this one you would get
1f1f 1f 0303 03
Looping through these you can build your Expected Output e.g.
set op {};
foreach {rep sing} [regexp -all -inline {(..)\1} 0c1f1f03035d] {
append op $rep
}

Getting value from a string in tcl

I have to get a pattern from the specified string.
This is the first time I'm using Tcl. In Perl, I can simply get the grouped values with $1 $2 ... $n. In Tcl I've tried it this way, but it didn't even work:
while { [gets $LOG_FILE line] >= 0 } {
if {[regexp -inline {Getting available devices: (/.+$)} $line]} {
puts {group0}
}
}

With regexp, you have two ways to get submatches out.
Without -inline, you have to supply variables sufficient to get the submatch you care about (with the first such variable being for the whole matched region, like $& in Perl):
if {[regexp {Getting available devices: (/.+$)} $line a b]} {
puts $b
}
It's pretty common to use -> as an overall-match variable. It's totally non-special to Tcl, but it makes the script mnemonically easier to grok:
if {[regexp {Getting available devices: (/.+$)} $line -> theDevices]} {
puts $theDevices
}
With -inline, regexp returns a list of things that were matched instead of assigning them to variables.
set matched [regexp -inline {Getting available devices: (/.+$)} $line]
if {[llength $matched]} {
set group1 [lindex $matched 1]
puts $group1
}
The -inline form works very well with multi-variable foreach and lassign, especially in combination with -all.
foreach {-> theDevices} [regexp -inline -all {Getting available devices: (/.+$)} $line] {
puts $theDevices
}
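A short runnable sketch comparing the two styles on one matching and one non-matching line. Both input strings are invented for the example (the question does not show an actual log line), and the variable names re, hit and miss are placeholders.

```tcl
set re {Getting available devices: (/.+$)}
set hit  "Getting available devices: /dev/sda /dev/sdb"  ;# hypothetical log line
set miss "No devices found"                              ;# hypothetical log line

# Variable form: regexp returns 1 or 0 and only sets the variables on a match.
set theDevices "unchanged"
puts [regexp $re $miss -> theDevices]
puts $theDevices
puts [regexp $re $hit -> theDevices]
puts $theDevices

# -inline form: a list on a match, an empty list otherwise.
puts [llength [regexp -inline $re $miss]]
puts [lindex [regexp -inline $re $hit] 1]
```

The return value (or the list length) is what plays the role of Perl's match test; the capture variables are only meaningful after a successful match.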
