Extracting digits using regex in Tcl

I am new to regex and still trying to wrap my head around it. I am stuck on one point, and none of the related questions seem to help.
I have a text variable:
set text "/folders/beta_0_2_1"
I want to extract 0, 2 and 1 and save them in three different variables using regex in Tcl.
I tried to match the beta part using
[regexp {/beta_} $text]
However, I am not able to figure out the part where I can extract each of those variables and then save them. Can you provide me some direction?

You can use it like this:
regexp {/beta_([0-9]+)_([0-9]+)_([0-9]+)} $text -> num1 num2 num3
Then you can use the variables:
puts "$num1 $num2 $num3"
# => 0 2 1
By the way, -> is just a convention: it is an ordinary variable name (yes, it's a variable!) that receives the whole match.
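To see that -> really is a variable, here is a minimal sketch using the same text and pattern. It also checks the return value of regexp, so the variables are only read when the pattern actually matched; the output strings are just illustrative.

```tcl
set text "/folders/beta_0_2_1"

# regexp returns 1 on a match and 0 otherwise, so it can guard the body.
if {[regexp {/beta_([0-9]+)_([0-9]+)_([0-9]+)} $text -> num1 num2 num3]} {
    # ${->} reads the variable named "->", which holds the whole match.
    puts "whole match: ${->}"
    puts "parts: $num1 $num2 $num3"
} else {
    puts "no match"
}
# whole match: /beta_0_2_1
# parts: 0 2 1
```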
And as a side note, you could also split it by underscore:
lassign [split $text "_"] - num1 num2 num3
puts "$num1 $num2 $num3"
# => 0 2 1

To extract all the sequences of digits:
% set text "/folders/beta_0_2_1"
/folders/beta_0_2_1
% regexp -inline -all {\d+} $text
0 2 1
% lassign [regexp -inline -all {\d+} $text] a b c
% puts $a; puts $b; puts $c
0
2
1

Related

convert a string to a map or list in tcl

I have command output like the following example:
card-1-1-1 4 -Number 1 -type Eth -config -GEPorts 4
card-1-3-1 3 -Number 2 -type Eth -config Yes -GEPorts 3
I need this to be converted into a list like
card-1-1-1 4
-Number 1
-type Eth
-config if_empty_insert_null
-GEPorts 4
card-1-3-1 3
-Number 2
-type Eth
-config Yes
-GEPorts 3
Well, if it wasn't for the fact that you've got some options that are sometimes missing associated values, this would be pretty much trivial. As it is, we need to be more careful. The main tricky bits are using regexp -all -inline to parse to a Tcl list and using a for loop to iterate over everything when detecting absent parameters.
# Process each line
foreach row [split $inputData "\n"] {
# If there's a comment syntax or blank lines are allowed, you handle them here
# Safely convert to a Tcl list
set words [regexp -all -inline {\S+} $row]
# First two words are used "as is"
set pairs [lrange $words 0 1]
# Can't use foreach here; non-constant step size prevents it
for {set i 2} {$i < [llength $words]} {incr i} {
set paramName [lindex $words $i]
set next [lindex $words [expr {$i + 1}]]
# Set the default for if the option is value-less
set parameter "if_empty_insert_null"
# Look for a value; slightly complex as I'm allowing for negative numbers
if {$next ne "" && ![regexp {^-[a-zA-Z]} $next]} {
set parameter $next
incr i
}
# Now we can update the list as we know the pair of values to add
lappend pairs $paramName $parameter
}
# Now print everything out; we can use foreach for this as we're guaranteed to
# have an even number of values
foreach {a b} $pairs {
# Do more complex formatting if you want
puts "$a $b"
}
}
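As a usage sketch, the per-line logic above can be wrapped in a procedure and tried on one of the sample lines. The name parseCardLine and the placeholder parameter are hypothetical, introduced only for this illustration; the parsing itself follows the loop shown above.

```tcl
# Hypothetical helper: turn one line of output into a flat name/value list.
proc parseCardLine {row {placeholder "if_empty_insert_null"}} {
    # Split on runs of whitespace to get a proper Tcl list
    set words [regexp -all -inline {\S+} $row]
    # First two words are kept as they are
    set pairs [lrange $words 0 1]
    for {set i 2} {$i < [llength $words]} {incr i} {
        set name [lindex $words $i]
        set next [lindex $words [expr {$i + 1}]]
        set value $placeholder
        # A following word is a value unless it looks like another option
        if {$next ne "" && ![regexp {^-[a-zA-Z]} $next]} {
            set value $next
            incr i
        }
        lappend pairs $name $value
    }
    return $pairs
}

set result [parseCardLine "card-1-1-1 4 -Number 1 -type Eth -config -GEPorts 4"]
puts $result
# card-1-1-1 4 -Number 1 -type Eth -config if_empty_insert_null -GEPorts 4

# Everything after the first pair can be read as a dictionary of options.
puts [dict get [lrange $result 2 end] -config]
# if_empty_insert_null
```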

Having issue with back reference in TCL

I have the following code:
set a "10.20.30.40"
regsub -all {.([0-9]+).([0-9]+).} $a {\2 \1} b
I am trying to extract the 2nd and 3rd octets of the IP address.
Expected output:
20 30
Actual output:
20 04 0
What is my mistake here?
I'd stay away from regular expressions altogether:
set b [join [lrange [split $a .] 1 2]]
Split the value on dots, take the 2nd and 3rd elements, and join them with a space.
You need to pass variable names for the whole match and the captured groups; then you can access them. Here is an example:
set a "10.20.30.40"
set rest [regexp {[0-9]+\.([0-9]+)\.([0-9]+)\.[0-9]+} $a match submatch1 submatch2]
puts $submatch1
puts $submatch2
Output of the demo
20
30
EDIT:
You can use regsub and backreferences this way (I am now swapping the 2nd and 3rd octets, just for demonstration). Note that a literal dot must be escaped:
set a "10.20.30.40"
regsub -all {\.([0-9]+)\.([0-9]+)\.} $a {.\2.\1.} b
puts $b
Output of the demo:
10.30.20.40
To obtain a "20 30" string, you need to use
regsub -all {^[0-9]+\.([0-9]+)\.([0-9]+)\.[0-9]+$} $a {\1 \2} b
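To make the difference concrete, here is a small side-by-side sketch of the original pattern and the escaped, anchored one. The variable names loose and strict are only for this illustration.

```tcl
set a "10.20.30.40"

# Unescaped: "." matches any character, so digits are consumed as
# "separators" and -all finds two overlapping-looking matches.
regsub -all {.([0-9]+).([0-9]+).} $a {\2 \1} loose

# Escaped and anchored: the pattern consumes the whole address exactly once.
regsub {^[0-9]+\.([0-9]+)\.([0-9]+)\.[0-9]+$} $a {\1 \2} strict

puts $loose   ;# 20 04 0
puts $strict  ;# 20 30
```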

variable number of match in a regexp

I have this file :
#2/1/1/21=p1 5/1/1/21=p1 isid=104
3/1/1/9=p1 4/1/1/4=p1 5/1/1/17=p1 6/1/1/4=p1 isid=100
1/1/1/4=p1 6/1/1/5=p1 isid=101
I want line 1 to be ignored (it is a commented line).
In line two I want to get "3/1/1/9" "4/1/1/4" "5/1/1/17" in three variables var1, var2 and var3.
In line three I want to get "1/1/1/4" and "6/1/1/5" in two variables var1 and var2.
For the moment I can ignore line 1, and match what I want on line 2 OR line 3:
if {[regexp {^(\d/\d/\d/\d{1,2})=p1.*(\d/\d/\d/\d{1,2})=p1} $line value match1 match2]} {
# This works for line 3 but not line 2
}
if {[regexp {^(\d/\d/\d/\d{1,2})=p1.*(\d/\d/\d/\d{1,2})=p1.*(\d/\d/\d/\d{1,2})=p1} $line value match1 match2 match3]} {
# This works for line 2 but not line 3
}
How can I get the right number of matches for both line 2 AND line 3?
Try this regex:
[regexp {^(\d/\d/\d/\d{1,2})=p1.*?(\d/\d/\d/\d{1,2})=p1(?:.*?(\d/\d/\d/\d{1,2})=p1)?} $line value match1 match2 match3]
Wrapping the last part in (?: ... )? makes the third match optional.
I also turned your greedy .* into non-greedy .*?.
To get all matches, you could use something more like this:
if {[string range $line 0 0] ne "#"} {
set matches [regexp -all -inline -- {\d/\d/\d/\d{1,2}(?==p1)} $line]
}
You will get the matches in the list $matches. You can then access them with lindex, or use lassign to give them specific names.
This might be better:
lassign [regexp -all -inline {\d+(?:/\d+){3}(?==p1)} $line] var1 var2 var3
If there are only 2 matches, var3 will be empty.
If there are more matches, only the first 3 will be captured in variables.
If you really just want all the matches:
set vars [regexp -all -inline {\d+(?:/\d+){3}(?==p1)} $line]
To ignore comments:
while {[gets $fh line] != -1} {
if {[regexp {^\s*#} $line]} continue
# do your other stuff here ...
}
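Putting the pieces together, here is a self-contained sketch that runs the comment check and the lassign approach over the three sample lines. It assumes the lines are held in a string (the variable data is made up for this example) rather than read from a file handle.

```tcl
set data "#2/1/1/21=p1 5/1/1/21=p1 isid=104
3/1/1/9=p1 4/1/1/4=p1 5/1/1/17=p1 6/1/1/4=p1 isid=100
1/1/1/4=p1 6/1/1/5=p1 isid=101"

foreach line [split $data "\n"] {
    # Skip commented lines
    if {[regexp {^\s*#} $line]} continue
    # Collect every port that is followed by "=p1"
    set ports [regexp -all -inline {\d+(?:/\d+){3}(?==p1)} $line]
    # Missing values become empty strings; extra values are ignored here
    lassign $ports var1 var2 var3
    puts "[llength $ports] found: var1=$var1 var2=$var2 var3=$var3"
}
# 4 found: var1=3/1/1/9 var2=4/1/1/4 var3=5/1/1/17
# 2 found: var1=1/1/1/4 var2=6/1/1/5 var3=
```

Note that the second sample line actually contains four ports, so checking the list length is a reasonable way to decide how many variables are meaningful.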
Try this (the last field allows one or two digits, as in your data):
(^|\s)(\d/\d/\d/\d{1,2}=p1)
or more compactly:
(^|\s)((\d/){3}\d{1,2}=p1)

I am stuck with regexp only returning the match while I want the text that follows the match

TCL/TK:
Problem: I want to be able to get the string data that follows the match, but even though I provide
regexp with more than one variable, the variables after the match itself either turn out empty, or I get the same value in the first two.
E.g:
set args "!do dance"
regsub -all {(!do)} $args prefix command
puts $prefix   ;# "!do"
puts $command  ;# "!do"
What should I do? Thanks.
EDIT: I found the solution, inspired by your answer. Here's a snippet:
if { [ regsub {(!do\s+)} $args "" match ] >= 1 } {
if { $match == "{help}" }
Assuming you want to remove the "!do" then you can do the following:
set args "!do dance"
regsub -all {(!do)} $args "" output
puts $output
I'm not sure why you're using regexp here, and it seems like you're using eggdrop or something. You can easily use:
set prefix [lindex $args 0]
set command [lindex $args 1]
% puts $prefix
!do
% puts $command
dance
You should be careful with the name args, though. It's usually used in procs to mean all the other arguments passed to the proc aside from the already defined ones.
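If you do want regexp to hand back both the prefix and whatever follows it, one possible sketch is to capture each part in its own group. The variable name input is an assumption, chosen here to avoid the special name args.

```tcl
set input "!do dance"

# Group 1 captures the "!word" prefix, group 2 everything after the space.
if {[regexp {^(!\S+)\s+(.*)$} $input -> prefix command]} {
    puts "prefix:  $prefix"
    puts "command: $command"
}
# prefix:  !do
# command: dance
```

Unlike lindex, this does not treat the input as a Tcl list, so it also copes with text that contains unbalanced braces or quotes.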

Case matching regexp

I have been wondering about a regexp matching pattern in Tcl for some time and I've remained stumped as to how it was working. I'm using Wish and Tcl/Tk 8.5 by the way.
I have a random string MmmasidhmMm stored in $line and the code I have is:
while {[regexp -all {[Mm]} $line match]} {
puts $data $match
regsub {[Mm]} $line "" line
}
$data is a text file.
This is what I got:
m
m
m
m
m
m
While I was expecting:
M
m
m
m
M
m
I was trying some things to see how changing a bit would affect the results when I got this:
while {[regexp -all {^[Mm]} $line match]} {
puts $data $match
regsub {[Mm]} $line "" line
}
I get:
M
m
m
Surprisingly, $match keeps the case.
I was wondering why in the first case, $match automatically becomes lowercase for some reason. Unless I am not understanding how the regexp actually is working, I'm not sure what I could be doing wrong. Maybe there's a flag that fixes it that I don't know about?
I'm not sure I'll really use this kind of code some day, but I guess learning how it works might help me in other ways. I hope I didn't miss anything in there. Let me know if you need more information!
The key here is your -all flag. The documentation for it says:
-all -- Causes the regular expression to be matched as many times as possible in the string, returning the total number of matches found. If this is specified with match variables, they will contain information for the last match only.
That means the variable match contains the very last match, which is a lower case 'm'. Drop the -all flag and you will get what you want.
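A short sketch of the difference, using the string from the question: with -all and a match variable only the last match is kept, while adding -inline returns every match in order. The variable names count, last and all are only for this illustration.

```tcl
set line "MmmasidhmMm"

# With -all and a match variable, the variable ends up with the LAST match.
set count [regexp -all {[Mm]} $line last]
puts "$count matches, variable holds: $last"
# 6 matches, variable holds: m

# With -all -inline, every match is returned as a list, case preserved.
set all [regexp -all -inline {[Mm]} $line]
puts $all
# M m m m M m
```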
Update
If your goal is to remove all 'm' regardless of case, that whole block of code can be condensed into just one line:
regsub -all {[Mm]} $line "" line
Or, more intuitively:
set line [string map -nocase {m ""} $line]; # Map all M's into nothing