How to use regexp to grab elements of a tcl string


I have extracted some data from a tabular column using "lsearch" and now have a Tcl variable like this:
{ 1 no8 MASTER (UP-DOWN) ABCD 1456 /clown F right_left_123 /local/opt/data WXYZ (M5,N6) }
How can I now use "regexp" to grab each of these values into separate variables? I guess I will have to split on whitespace, but the amount of blank space between the values varies. Also, I am a "regexp" newbie.
I tried using "lindex", but it looks like the entire element is at index 0. Please let me know the easiest way.

lsearch has probably returned a list containing this one element. To get at the elements inside that element, use a second index to go one level deeper:
# suppose the list is in the variable $l
puts [lindex $l 0 0]
# => 1
puts [lindex $l 0 1]
# => no8
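To make the nested-index idea concrete, here is a minimal sketch. It assumes the lsearch result really is a one-element list wrapping the whole row (the variable names l, row, fields and the four field names are made up for illustration), and lassign needs Tcl 8.5 or later. It also shows that regexp with -all -inline copes with the variable amount of whitespace the question mentions.

```tcl
# Assumed shape of the lsearch result: a list with ONE element,
# where that element is itself a whitespace-separated record.
set l [list { 1 no8 MASTER (UP-DOWN) ABCD 1456 /clown F right_left_123 /local/opt/data WXYZ (M5,N6) }]

# Peel off the outer level once, then treat the record as a list.
set row [lindex $l 0]

# lassign (Tcl 8.5+) puts the leading fields into named variables;
# the remaining fields are simply returned and ignored here.
lassign $row id name role direction
puts "id=$id name=$name role=$role direction=$direction"
# => id=1 name=no8 role=MASTER direction=(UP-DOWN)

# The regexp route: pick out every run of non-whitespace characters.
set fields [regexp -all -inline {\S+} $row]
puts "field count: [llength $fields]"
# => field count: 12
puts "last field: [lindex $fields end]"
# => last field: (M5,N6)
```

If the record could ever contain backslashes or unbalanced braces, the regexp form is probably the safer of the two, since it never parses the text as a Tcl list.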

Related

Tcl multiple numerical starts within same string

How do I do this in Tcl?
Say I have a list $list1 containing 5 entries like these:
blah_1_X11Y2_R0/Isi_bl_X8Y0/wrap_bl_X0Y0_R0/Isine_MY/core
blah_1_X13Y2_R0/Isi_bl_X5Y0/wrap_bl_X0Y0_R0/Isine_MY/core
blah_1_X11Y2_R0/Isi_br_X7Y0/wrap_br_X1Y0_R0/Isine_R0/core
blah_1_X11Y2_R0/Isi_bl_X17Y0/wrap_bl_X0Y0_R0/Isine_MY/core
blah_1_X11Y2_R0/Isi_br_X15Y0/wrap_br_X1Y0_R0/Isine_R0/core
I want to sort them numerically so that the output looks like this:
blah_1_X11Y2_R0/Isi_br_**X7**Y0/wrap_br_X1Y0_R0/Isine_R0/core
blah_1_X11Y2_R0/Isi_bl_**X8**Y0/wrap_bl_X0Y0_R0/Isine_MY/core
blah_1_X11Y2_R0/Isi_br_**X15**Y0/wrap_br_X1Y0_R0/Isine_R0/core
blah_1_X11Y2_R0/Isi_bl_**X17**Y0/wrap_bl_X0Y0_R0/Isine_MY/core
blah_1_**X13**Y2_R0/Isi_bl_**X5**Y0/wrap_bl_X0Y0_R0/Isine_MY/core
Thanks

This sounds almost like a job for lsort -dictionary, yet we need a bit of work to extract the numeric parts because we don't seem to want to sort on the non-numeric parts. (I'm assuming you've got your data in a list variable called data.)
# Extract the parts we want to sort on
set nums [lmap item $data {
# The collation key is a list of all digit sequences in the input value
regexp -all -inline {\d+} $item
}]
# Sort and remap back onto the original data
set sorted_data [lmap idx [lsort -dictionary -indices $nums] {
lindex $data $idx
}]
The -indices option is very useful for when you have a collation key (something you've extracted from the data that you want to sort on) as it means that you don't need to zip that into the original data to do the sort. And lmap is just so useful for these sorts of things.
The collation key extracted from:
blah_1_X11Y2_R0/Isi_bl_X8Y0/wrap_bl_X0Y0_R0/Isine_MY/core
is:
1 11 2 0 8 0 0 0 0
And I think your data gets sorted as:
blah_1_X11Y2_R0/Isi_br_X7Y0/wrap_br_X1Y0_R0/Isine_R0/core
blah_1_X11Y2_R0/Isi_bl_X8Y0/wrap_bl_X0Y0_R0/Isine_MY/core
blah_1_X11Y2_R0/Isi_br_X15Y0/wrap_br_X1Y0_R0/Isine_R0/core
blah_1_X11Y2_R0/Isi_bl_X17Y0/wrap_bl_X0Y0_R0/Isine_MY/core
blah_1_X13Y2_R0/Isi_bl_X5Y0/wrap_bl_X0Y0_R0/Isine_MY/core
If that's not quite correct, a more complex method of picking out the collation key should do the trick.
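For anyone who wants to try this end to end, the snippet below wraps the two lmap steps in a small proc and runs it on the five entries from the question. The proc name sortByDigits is just an illustrative choice, and lmap means this needs Tcl 8.6 or later.

```tcl
# The five entries from the question, as a Tcl list
# (newlines count as list separators inside the braces).
set data {
    blah_1_X11Y2_R0/Isi_bl_X8Y0/wrap_bl_X0Y0_R0/Isine_MY/core
    blah_1_X13Y2_R0/Isi_bl_X5Y0/wrap_bl_X0Y0_R0/Isine_MY/core
    blah_1_X11Y2_R0/Isi_br_X7Y0/wrap_br_X1Y0_R0/Isine_R0/core
    blah_1_X11Y2_R0/Isi_bl_X17Y0/wrap_bl_X0Y0_R0/Isine_MY/core
    blah_1_X11Y2_R0/Isi_br_X15Y0/wrap_br_X1Y0_R0/Isine_R0/core
}

# Hypothetical helper: sort items by the digit sequences they contain.
proc sortByDigits {items} {
    # Collation key for each item: the list of all digit runs in it
    set keys [lmap item $items {regexp -all -inline {\d+} $item}]
    # Sort the keys, but get back positions rather than values,
    # then map those positions onto the original items
    return [lmap idx [lsort -dictionary -indices $keys] {lindex $items $idx}]
}

set sorted_data [sortByDigits $data]
foreach line $sorted_data {
    puts $line
}
```

With this input the X7 entry comes out first and the X13...X5 entry last, which matches the order the question asks for.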

Why does special characters in my variable disappear on doing an lindex in TCL?

I have a list in my application that I work on. It's basically like this:
$item = {text1 text2 text3}
Then I pick up the first member in the list with:
lindex $item 0
On doing this, text1, which used to be (say) abcdef\12345, becomes abcdef12345.
But it's very important for me not to lose this \. Why is it disappearing? There are other characters like - and > which don't disappear. Please note that I cannot escape the \ in the text beforehand. If there's anything I can do before operating on $item with lindex, please suggest it.

The problem is that \ is a Tcl list metasyntax character, unlike -, > or any alphanumeric. You need to convert your string into a proper Tcl list before using lindex (or any other list-consuming operation) on it. To do that, you need to understand exactly what you mean by “words” in your input data. If your input data is a sequence of non-whitespace characters separated by single whitespace characters, you can use split to do the conversion to a list:
set properList [split $item]
# Now we can use it...
set theFirstWord [lindex $properList 0]
If you've got a different separator, split takes an optional extra character to say what to split by. For example, to split by colons (:) you do:
set properList [split $item ":"]
However, if you have other sorts of splitting rules, this doesn't work so well. For example, if you can split by multiple whitespace characters, it's actually better to use regexp (with the -all -inline options) to do the word-identification:
# Strictly, this *chooses* all sequences of one or more non-whitespace characters
set properList [regexp -all -inline {\S+} $item]
You can also do splitting by multi-character sequences, though in that case it is most easily done by mapping (with string map) the multi-character sequence to a single rare character first. Unicode means that there are lots of such characters to pick…
# NUL, \u0000, is a great character to pick for text, and terrible for binary data
# For binary data, choose something beyond \u00ff
set properList [split [string map {"BOUNDARY" "\u0000"} $item] "\u0000"]
Even more complex options are possible, but that's when you use splitx from Tcllib.
package require textutil::split
# Regular expression to describe the separator; very sophisticated approach
set properList [textutil::split::splitx $item {SPL+I*T}]
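A short side-by-side may help show the difference. This is only a sketch with made-up sample text (path\dir stands in for the asker's real data); it compares using lindex directly on the raw string with going through split or regexp first.

```tcl
# Sample text with a backslash in the first word (hypothetical data)
set item {path\dir text2 text3}

# Treating the raw string as a list: the list parser consumes the backslash
set viaLindex [lindex $item 0]
puts $viaLindex
# => pathdir

# Converting to a proper list first keeps every character intact
set viaSplit [lindex [split $item] 0]
puts $viaSplit
# => path\dir

# With runs of whitespace, split produces empty elements between the
# separators, while the regexp approach returns only the words
set spaced "path\\dir    text2\ttext3"
set splitCount [llength [split $spaced]]
set wordCount  [llength [regexp -all -inline {\S+} $spaced]]
puts "$splitCount $wordCount"
# => 6 3
```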

In Tcl, lists can be created in several ways:
by setting a variable to be a list of values
set lst {{item 1} {item 2} {item 3}}
with the split command
set lst [split "item 1.item 2.item 3" "."]
with the list command.
set lst [list "item 1" "item 2" "item 3"]
And an individual list member can be accessed with the lindex command.
set x "a b c"
puts "Item 2 of the list {$x} is: [lindex $x 2]\n"
This will give output:
Item 2 of the list {a b c} is: c
With respect to the question asked, you need to define the variable like this: abcdef\\12345
To make this clear, try running the following commands:
puts "\nI gave $100.00 to my daughter."
and
puts "\nI gave \$100.00 to my daughter."
The first one fails because Tcl tries to substitute a variable named 100; the second one escapes the $ and gives you the proper result.
If you don't have the option to change the text, try to save the text in curly braces, as mentioned in the first example.
set x {abcd\12345}
puts "A simple substitution: $x\n"
Output:
A simple substitution: abcd\12345
set y [set x {abcdef\12345}]
And check for this output:
puts "Remember that set returns the new value of the variable: X: $x Y: $y\n"
Output:
Remember that set returns the new value of the variable: X: abcdef\12345 Y: abcdef\12345

lsearch does not match elements that require curly-braces (Tcl 8.4)

I'm dealing with a large number of signals. I've been able to store them in a list, but since their names contain brackets, the signals are stored in the list wrapped in braces. Later on, using regexp, I analyze some output and, if there's a match, I need to set a flag.
In the following example I show the element being added to the list and, later on, I try to check whether the same element is in the list using lsearch:
set mylist [list]
set element {aux[1]}
lappend mylist $element
puts "mylist: $mylist \nelement: $element\n\[list element\]: [list $element]"
The result of this puts is:
mylist: {aux[1]}
element: aux[1]
[list element]: {aux[1]}
Since my element is stored as {aux[1]}, I've not found a way to make lsearch return a match:
set result [lsearch $mylist $element]
set result2 [lsearch $mylist [list $element]]
puts $result
puts $result2
Both calls return -1.
I've seen solutions, but none of them work with Tcl 8.4, and I need to use that version for backwards compatibility.

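Use the -exact matching style. The default style is -glob, and in a glob pattern [1] is a character class that matches the single character 1, so the pattern aux[1] matches aux1 rather than the literal text aux[1].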
lsearch -exact $mylist $element
# => 0
Documentation: lsearch
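Here is a small sketch of the three behaviours. The extra list members aux[2] and aux1 are added purely for illustration, and everything used here should be available in Tcl 8.4.

```tcl
# A list with two bracketed names plus a plain "aux1" (illustrative data)
set mylist [list {aux[1]} {aux[2]} aux1]
set element {aux[1]}

# Default -glob matching: [1] is a character class, so the pattern
# matches the element "aux1" and not the literal text aux[1]
set globIdx [lsearch $mylist $element]
puts $globIdx
# => 2

# -exact compares the strings character by character
set exactIdx [lsearch -exact $mylist $element]
puts $exactIdx
# => 0

# Staying with glob matching also works if the brackets are escaped
set escapedIdx [lsearch -glob $mylist {aux\[1\]}]
puts $escapedIdx
# => 0
```

In the question's original one-element list there is no aux1 for the glob pattern to hit, which is why the default search came back with -1.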

How do I use regex capture group as array index?

I'm trying to use regsub in TCL to replace a string with the value from an array.
array set myArray "
one 1
two 2
"
set myString "\[%one%\],\[%two%\]"
regsub -all "\[%(.+?)%\]" $myString "$myArray(\\1)" newString
My goal is to convert a string from "[%one%],[%two%]" to "1,2". The problem is that the capture group index is not resolved. I get the following error:
can't read "myArray(\1)": no such element in array
while executing
"regsub -all "\[%(.+?)%\]" $myString "$myArray(\\1)" newString"

This is a two-step process in Tcl. Your main mistake here is using double quotes everywhere, which makes Tcl try to substitute $myArray(\\1) before regsub ever runs:
array set myArray {one 1 two 2}
set myString {[%one%],[%two%]}
regsub -all {\[%(.+?)%\]} $myString {$myArray(\1)} new
puts $new
puts [subst -nobackslash -nocommand $new]
This prints:
$myArray(one),$myArray(two)
1,2
So we use regsub to search for the expression and replace it with the string representation of the variable we want to expand. Then we use the rarely-used subst command to perform the variable (only) substitution.

Apart from using regsub+subst (which is a decidedly tricky pair of commands to use safely in general) you can also do relatively simple transformations using string map. The trick is in how you prepare the mapping:
# It's conventional to use [array set] like this…
array set myArray {
one 1
two 2
}
set myString "\[%one%\],\[%two%\]"
# Build the transform
set transform {}
foreach {from to} [array get myArray] {
lappend transform "\[%$from%\]" $to
}
# Apply the transform
set changedString [string map $transform $myString]
puts "transformed from '$myString' to '$changedString'"
As long as each individual thing you want to go from and to is a constant string at the time of application, you can use string map to do it. The advantage? It's obviously correct. It's very hard to make a regsub+subst transform obviously correct (but necessary if you need a more complex transform; that's the correct way to do %XX encoding and decoding in URLs for example).
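To illustrate why the string map route is easier to trust, here is a sketch using an input that happens to contain a stray $ (the text about $price is invented for the example, and it assumes no variable called price exists). The string map version leaves it alone, while the regsub+subst version trips over it.

```tcl
array set myArray {one 1 two 2}
# Input with a literal dollar sign that is NOT meant as a Tcl variable
set myString {[%one%] costs $price, [%two%] is free}

# string map: build the mapping once, apply it, nothing else is touched
set transform {}
foreach {from to} [array get myArray] {
    lappend transform "\[%$from%\]" $to
}
set viaMap [string map $transform $myString]
puts $viaMap
# => 1 costs $price, 2 is free

# regsub + subst: subst also sees the unrelated $price and fails on it
regsub -all {\[%(.+?)%\]} $myString {$myArray(\1)} tmp
set failed [catch {subst -nobackslashes -nocommands $tmp} msg]
puts "failed=$failed: $msg"
# => failed=1: can't read "price": no such variable
```

Making the regsub+subst version robust would mean escaping the input before the regsub step, which is the kind of extra care the answer above is warning about.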

TCL, regexp function and parameter substitution

I have the following code, where I'm trying to match data on a single line into different variables with the regexp command.
The number of fields on the input line, and therefore the number of variable names passed to regexp, can vary; that's why I use $varLine (which is built earlier in my real code).
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}
set varLine "sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig8"
regexp $regex $in_stim whole sig0 $varLine
puts "sig0: $sig0"
puts $sig1
When I execute it, I get the following error ($sig0 is correctly displayed):
sig0: 13
can't read "sig1": no such variable
while executing
"puts $sig1"
If I manually substitute $varLine into the regexp line, the error disappears:
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}
regexp $regex $in_stim whole sig0 sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig8
puts $sig0
puts $sig1
I get the following correct output:
13
1
Can anyone spot the mistake in my code, or help in some other way?
Thanks!

The issue is that the regexp command doesn't take a list of variables to store submatches into as one argument, but rather as many arguments.
The simplest method of working around this is to expand the variable list:
regexp $regex $in_stim whole sig0 {*}$varLine

When you do
set varLine "sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig8"
regexp $regex $in_stim whole sig0 $varLine
the Tcl parser passes the regexp command exactly five arguments, with the fifth being the contents of the variable "varLine", which regexp then treats as a single word. A single word denotes a single variable (one with a rather unusual name in your case: the whole string, spaces included).
To do what you need, you have to resort to dynamic scripting which can be done in two ways:
"Classic" approach using eval:
eval [concat [list regexp $regex $in_stim whole sig0] $varLine]
Using the {*} syntactic sugar from Tcl 8.5 onwards:
regexp $regex $in_stim whole sig0 {*}$varLine
The classic approach first constructs a list of words by concatenating two lists: the "static" part of the command and then the list of variables to pass to it. Then the constructed list is evaluated as a command. You can read more on this in this classic book.
In the new-style approach, the {*} prefix expands the contents of $varLine into separate words in place; see rule 5 (argument expansion) of the syntax rules in the Tcl manual page.
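Both variants can be tried with the data from the question. One assumption in this sketch: the last name in varLine is changed to sig9, since the original list names sig8 twice and the regular expression has ten capture groups.

```tcl
set in_stim "13 1 1 0 1 0 0 0 2 03"
set regex {^(\d+)\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s([01])\s(\d+)\s(\d+)}
# Assumption: nine distinct names (sig9 instead of a second sig8)
set varLine {sig1 sig2 sig3 sig4 sig5 sig6 sig7 sig8 sig9}

# Tcl 8.5+: {*} turns the one list into nine separate arguments
set matchedNew [regexp $regex $in_stim whole sig0 {*}$varLine]
puts "sig0=$sig0 sig1=$sig1 sig9=$sig9"
# => sig0=13 sig1=1 sig9=03

# Tcl 8.4 and earlier: build the whole command as a list, then eval it
set matchedOld [eval [concat [list regexp $regex $in_stim whole sig0] $varLine]]
puts "matched: $matchedNew $matchedOld"
# => matched: 1 1
```

Building the fixed part with list before concatenating keeps the regular expression and the input string quoted correctly when eval re-parses the command.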
