Tcl multiple numerical starts within same string - list

How do I do this in Tcl?
Say I have a list $list1 containing 5 entries like these:
blah_1_X11Y2_R0/Isi_bl_X8Y0/wrap_bl_X0Y0_R0/Isine_MY/core
blah_1_X13Y2_R0/Isi_bl_X5Y0/wrap_bl_X0Y0_R0/Isine_MY/core
blah_1_X11Y2_R0/Isi_br_X7Y0/wrap_br_X1Y0_R0/Isine_R0/core
blah_1_X11Y2_R0/Isi_bl_X17Y0/wrap_bl_X0Y0_R0/Isine_MY/core
blah_1_X11Y2_R0/Isi_br_X15Y0/wrap_br_X1Y0_R0/Isine_R0/core
I want to sort them numerically so that the output looks like this:
blah_1_X11Y2_R0/Isi_br_**X7**Y0/wrap_br_X1Y0_R0/Isine_R0/core
blah_1_X11Y2_R0/Isi_bl_**X8**Y0/wrap_bl_X0Y0_R0/Isine_MY/core
blah_1_X11Y2_R0/Isi_br_**X15**Y0/wrap_br_X1Y0_R0/Isine_R0/core
blah_1_X11Y2_R0/Isi_bl_**X17**Y0/wrap_bl_X0Y0_R0/Isine_MY/core
blah_1_**X13**Y2_R0/Isi_bl_**X5**Y0/wrap_bl_X0Y0_R0/Isine_MY/core
Thanks

This sounds almost like a job for lsort -dictionary, yet we need a bit of work to extract the numeric parts because we don't seem to want to sort on the non-numeric parts. (I'm assuming you've got your data in a list variable called data.)
# Extract the parts we want to sort on
set nums [lmap item $data {
# The collation key is a list of all digit sequences in the input value
regexp -all -inline {\d+} $item
}]
# Sort and remap back onto the original data
set sorted_data [lmap idx [lsort -dictionary -indices $nums] {
lindex $data $idx
}]
The -indices option is very useful for when you have a collation key (something you've extracted from the data that you want to sort on) as it means that you don't need to zip that into the original data to do the sort. And lmap is just so useful for these sorts of things.
The collation key extracted from:
blah_1_X11Y2_R0/Isi_bl_X8Y0/wrap_bl_X0Y0_R0/Isine_MY/core
is:
1 11 2 0 8 0 0 0 0
And I think your data gets sorted as:
blah_1_X11Y2_R0/Isi_br_X7Y0/wrap_br_X1Y0_R0/Isine_R0/core
blah_1_X11Y2_R0/Isi_bl_X8Y0/wrap_bl_X0Y0_R0/Isine_MY/core
blah_1_X11Y2_R0/Isi_br_X15Y0/wrap_br_X1Y0_R0/Isine_R0/core
blah_1_X11Y2_R0/Isi_bl_X17Y0/wrap_bl_X0Y0_R0/Isine_MY/core
blah_1_X13Y2_R0/Isi_bl_X5Y0/wrap_bl_X0Y0_R0/Isine_MY/core
If that's not quite correct, a more complex method of picking out the collation key should do the trick.
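As one possible shape for such a "more complex" collation key, the sketch below assumes that only the first two X numbers matter (the one in the first path component, then the one in the second). The regular expression and the names keyed, x1 and x2 are illustrative assumptions, not part of the answer above. It relies on lsort being a stable sort, so sorting on the secondary key first and the primary key second gives the combined order. It needs Tcl 8.6 for lmap.

```tcl
set data {
    blah_1_X11Y2_R0/Isi_bl_X8Y0/wrap_bl_X0Y0_R0/Isine_MY/core
    blah_1_X13Y2_R0/Isi_bl_X5Y0/wrap_bl_X0Y0_R0/Isine_MY/core
    blah_1_X11Y2_R0/Isi_br_X7Y0/wrap_br_X1Y0_R0/Isine_R0/core
    blah_1_X11Y2_R0/Isi_bl_X17Y0/wrap_bl_X0Y0_R0/Isine_MY/core
    blah_1_X11Y2_R0/Isi_br_X15Y0/wrap_br_X1Y0_R0/Isine_R0/core
}

# Build {x1 x2 item} triples; the pattern is an assumption about the name format
set keyed [lmap item $data {
    if {![regexp {_X(\d+)Y\d+_R\d+/[^/]*_X(\d+)Y} $item -> x1 x2]} {
        error "unexpected format: $item"
    }
    list $x1 $x2 $item
}]

# lsort is stable: sort by the secondary key first, then by the primary key
set keyed [lsort -integer -index 1 $keyed]
set keyed [lsort -integer -index 0 $keyed]

# Drop the keys again, keeping only the original strings
set sorted_data [lmap triple $keyed {lindex $triple 2}]
puts [join $sorted_data \n]
```

Using -integer on explicit keys avoids depending on how -dictionary compares whole key lists, at the cost of having to say exactly which numbers matter.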

Related

Why does special characters in my variable disappear on doing an lindex in TCL?

I have a list in my application that I work on. It's basically like this:
$item = {text1 text2 text3}
Then I pick up the first member in the list with:
lindex $item 0
On doing this, text1, which used to be (say) abcdef\12345, becomes abcdef12345.
But it's very important for me not to lose this \. Why is it disappearing? There are other characters like - and > which don't disappear. Please note that I cannot escape the \ in the text beforehand. If there's anything I can do before operating on $item with lindex, please suggest it.
The problem is that \ is a Tcl list metasyntax character, unlike -, > or any alphanumeric. You need to convert your string into a proper Tcl list before using lindex (or any other list-consuming operation) on it. To do that, you need to understand exactly what you mean by “words” in your input data. If your input data is a sequence of non-whitespace characters separated by single whitespace characters, you can use split to do the conversion to a list:
set properList [split $item]
# Now we can use it...
set theFirstWord [lindex $properList 0]
If you've got a different separator, split takes an optional extra character to say what to split by. For example, to split by colons (:) you do:
set properList [split $item ":"]
However, if you have other sorts of splitting rules, this doesn't work so well. For example, if you can split by multiple whitespace characters, it's actually better to use regexp (with the -all -inline options) to do the word-identification:
# Strictly, this *chooses* all sequences of one or more non-whitespace characters
set properList [regexp -all -inline {\S+} $item]
You can also do splitting by multi-character sequences, though in that case it is most easily done by mapping (with string map) the multi-character sequence to a single rare character first. Unicode means that there are lots of such characters to pick…
# NUL, \u0000, is a great character to pick for text, and terrible for binary data
# For binary data, choose something beyond \u00ff
set properList [split [string map {"BOUNDARY" "\u0000"} $item] "\u0000"]
Even more complex options are possible, but that's when you use splitx from Tcllib.
package require textutil::split
# Regular expression to describe the separator; very sophisticated approach
set properList [textutil::split::splitx $item {SPL+I*T}]
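To see the difference between treating a raw string as a list and splitting it first, here is a minimal sketch. The sample text abc\def and the variable names viaList and viaSplit are made up for illustration.

```tcl
# A raw string that happens to contain a backslash (braces keep it literal)
set item {abc\def text2 text3}

# Treating the string directly as a list: list parsing consumes the backslash
set viaList [lindex $item 0]

# Converting with split first keeps every character of each word
set viaSplit [lindex [split $item] 0]

puts "lindex on raw string: $viaList"
puts "lindex after split:   $viaSplit"
```

The first result has lost the backslash, while the second still contains it.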
In Tcl, lists can be created in several ways:
by setting a variable to be a list of values
set lst {{item 1} {item 2} {item 3}}
with the split command
set lst [split "item 1.item 2.item 3" "."]
with the list command.
set lst [list "item 1" "item 2" "item 3"]
And an individual list member can be accessed with the lindex command.
set x "a b c"
puts "Item 2 of the list {$x} is: [lindex $x 2]\n"
This will give output:
Item 2 of the list {a b c} is: c
With respect to the question asked, you need to define the variable like this:
abcdef\\12345
To make this clear, try running the following commands:
puts "\nI gave $100.00 to my daughter."
and
puts "\nI gave \$100.00 to my daughter."
The second one gives the proper result. The first fails because Tcl tries to substitute a variable named 100.
If you don't have the option to change the text, try to save the text in curly braces, as mentioned in the first example.
set x {abcd\12345}
puts "A simple substitution: $x\n"
Output:
A simple substitution: abcd\12345
set y [set x {abcdef\12345}]
And check for this output:
puts "Remember that set returns the new value of the variable: X: $x Y: $y\n"
Output:
Remember that set returns the new value of the variable: X: abcdef\12345 Y: abcdef\12345

How to use regexp to grab elements of a tcl string

I have extracted some data from a tabular column using "lsearch" and now have a TCL variable like this
{ 1 no8 MASTER (UP-DOWN) ABCD 1456 /clown F right_left_123 /local/opt/data WXYZ (M5,N6) }
How can I now use "regexp" to grab each of these values into separate variables? I guess I will have to split on spaces, but the amount of blank space between these values varies. Also, I am a "regexp" newbie.
I tried using "lindex", but it looks like the entire element is at index 0. Please let me know what the easiest way is.
lsearch has probably returned a list containing this one element. If you now want to get at the elements inside this element, use a second index to go one level deeper:
# suppose the list is in the variable $l
puts [lindex $l 0 0]
# => 1
puts [lindex $l 0 1]
# => no8
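If the goal is to get several values into separate variables in one step, something like the following sketch may help. It assumes Tcl 8.5 or later (for lassign), and the variable names id, name and role are invented here because the question does not say what the columns mean.

```tcl
# The value as lsearch returned it: a list holding one element
set l {{ 1 no8 MASTER (UP-DOWN) ABCD 1456 /clown F right_left_123 /local/opt/data WXYZ (M5,N6) }}

# Take the single element, then pick out every run of non-whitespace characters
set row [lindex $l 0]
set fields [regexp -all -inline {\S+} $row]

# Assign the first few fields to (hypothetical) variable names
lassign $fields id name role
puts "id=$id name=$name role=$role ([llength $fields] fields in total)"
```

Because {\S+} matches runs of non-whitespace, the varying amount of blank space between values does not matter.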

lsearch does not match elements that require curly-braces (Tcl 8.4)

I'm dealing with a large number of signals. I've been able to store them in a list, but since their names contain brackets, they are stored with curly braces around them. Later on, using regexp, I analyze some output and, if there's a match, I need to set a flag.
In the following example I show the element being added to the list and, later on, I try to check whether the same element is in the list using lsearch:
set mylist [list]
set element {aux[1]}
lappend mylist $element
puts "mylist: $mylist \nelement: $element\n\[list element\]: [list $element]"
The result of this puts is:
mylist: {aux[1]}
element: aux[1]
[list element]: {aux[1]}
Since my element is stored as {aux[1]}, I haven't found a way to make lsearch return a match:
set result [lsearch $mylist $element]
set result2 [lsearch $mylist [list $element]]
puts $result
puts $result2
Both results return '-1'.
I've seen solutions, but none of them work with Tcl 8.4, which I need to use for backwards compatibility.
Use the -exact matching style. The default style is -glob, which means that the substring [1] matches a single 1.
lsearch -exact $mylist $element
# => 0
Documentation: lsearch
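A small sketch of the difference between the matching styles, using a list that also contains the name aux1 (added here purely for illustration) so that the glob behaviour becomes visible:

```tcl
set mylist [list {aux[1]} aux1]
set element {aux[1]}

# Default -glob style: [1] is a character class, so the pattern matches "aux1"
set globIdx [lsearch $mylist $element]

# -exact compares the strings literally
set exactIdx [lsearch -exact $mylist $element]

# With -glob, the brackets can also be escaped in the pattern
set escapedIdx [lsearch -glob $mylist {aux\[1\]}]

puts "glob: $globIdx, exact: $exactIdx, escaped glob: $escapedIdx"
```

Here the glob search finds index 1 (the wrong element), while the -exact and escaped searches both find index 0.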

How do I use regex capture group as array index?

I'm trying to use regsub in TCL to replace a string with the value from an array.
array set myArray "
one 1
two 2
"
set myString "\[%one%\],\[%two%\]"
regsub -all "\[%(.+?)%\]" $myString "$myArray(\\1)" newString
My goal is to convert a string from "[%one%],[%two%]" to "1,2". The problem is that the capture group index is not resolved. I get the following error:
can't read "myArray(\1)": no such element in array
while executing
"regsub -all "\[%(.+?)%\]" $myString "$myArray(\\1)" newString"
This is a 2 step process in Tcl. Your main mistake here is using double quotes everywhere:
array set myArray {one 1 two 2}
set myString {[%one%],[%two%]}
regsub -all {\[%(.+?)%\]} $myString {$myArray(\1)} new
puts $new
puts [subst -nobackslash -nocommand $new]
Output:
$myArray(one),$myArray(two)
1,2
So we use regsub to search for the expression and replace it with the string representation of the variable we want to expand. Then we use the rarely-used subst command to perform the variable (only) substitution.
Apart from using regsub+subst (which is a decidedly tricky pair of commands to use safely in general) you can also do relatively simple transformations using string map. The trick is in how you prepare the mapping:
# It's conventional to use [array set] like this…
array set myArray {
one 1
two 2
}
set myString "\[%one%\],\[%two%\]"
# Build the transform
set transform {}
foreach {from to} [array get myArray] {
lappend transform "\[%$from%\]" $to
}
# Apply the transform
set changedString [string map $transform $myString]
puts "transformed from '$myString' to '$changedString'"
As long as each individual thing you want to go from and to is a constant string at the time of application, you can use string map to do it. The advantage? It's obviously correct. It's very hard to make a regsub+subst transform obviously correct (but necessary if you need a more complex transform; that's the correct way to do %XX encoding and decoding in URLs for example).
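If the input might contain placeholders with no matching array element, or stray $ characters that subst would try to expand, one alternative is to scan the matches by hand. This is only a sketch: the proc name expandPlaceholders is made up, unknown placeholders are simply left untouched (an arbitrary choice), and it assumes Tcl 8.5 or later.

```tcl
proc expandPlaceholders {text arrName} {
    upvar 1 $arrName arr
    set result ""
    set pos 0
    # Each match yields two index pairs: the whole placeholder and the key inside it
    foreach {whole name} [regexp -all -inline -indices {\[%(.+?)%\]} $text] {
        lassign $whole start end
        # Copy the literal text between the previous match and this one
        append result [string range $text $pos [expr {$start - 1}]]
        set key [string range $text {*}$name]
        if {[info exists arr($key)]} {
            append result $arr($key)
        } else {
            # Unknown key: keep the placeholder as it was
            append result [string range $text $start $end]
        }
        set pos [expr {$end + 1}]
    }
    # Copy whatever follows the last match
    append result [string range $text $pos end]
    return $result
}

array set myArray {one 1 two 2}
puts [expandPlaceholders {[%one%],[%two%]} myArray]
```

Nothing from the input is ever evaluated here, so no escaping is required.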

Is there a way to do multiple substitutions using regsub?

Is it possible to do different substitutions in an expression using regsub?
example:
set a ".a/b.c..d/e/f//g"
Now, in this expression, is it possible to substitute
"." as "yes"
".." as "no"
"/" as "true"
"//" as "false" in a single regsub command?
With a regsub, no. There's a long-standing feature request for this sort of thing (which requires substitution with the result of evaluating a command on the match information) but it's not been acted on to date.
But you can use string map to do what you want in this case:
set a ".a/b.c..d/e/f//g"
set b [string map {".." "no" "." "yes" "//" "false" "/" "true"} $a]
puts "changed $a to $b"
# changed .a/b.c..d/e/f//g to yesatruebyescnodtrueetrueffalseg
Note that when building the map, if any from-value is a prefix of another, the longer from-value should be put first. (This is because the string map implementation checks which change to make in the order you list them in…)
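To illustrate why the order matters, this sketch applies the same four replacements twice, once with the longer from-values first and once with the shorter ones first (the variable names good and bad are just for illustration):

```tcl
set a ".a/b.c..d/e/f//g"

# Longer from-values first: ".." and "//" get a chance to match
set good [string map {".." "no" "." "yes" "//" "false" "/" "true"} $a]

# Shorter from-values first: "." and "/" always win, so ".." and "//" never match
set bad [string map {"." "yes" ".." "no" "/" "true" "//" "false"} $a]

puts "longer first:  $good"
puts "shorter first: $bad"
```

In the second result every ".." has become "yesyes" and every "//" has become "truetrue".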
It's possible to use regsub and subst to do multiple-target replacements in a two-step process, but I don't advise it for anything other than very complex cases! A nice string map is far easier to work with.
You may also try to do it yourself. This is a draft proc which you could use as a starting point. It is not production ready, and you must be careful because every substitution after the first one works on the already substituted string.
These are the parameters:
options is a list of options that will be passed to every call to regsub
resubList is a list of key/value pairs, where the key is a regular expression and the value is a substitution
string is the string you want to substitute
This is the procedure, and it simply calls regsub multiple times, once for every element in resubList and, at the end, it returns the final string.
proc multiregsub {options resubList string} {
foreach {re sub} $resubList {
set string [regsub {*}$options -- $re $string $sub]
}
return $string
}
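A short usage sketch for the proc above, with a made-up input chosen to show the caveat about later substitutions seeing the output of earlier ones. The proc is repeated so that the example runs on its own; it needs Tcl 8.5 or later for {*}.

```tcl
proc multiregsub {options resubList string} {
    foreach {re sub} $resubList {
        set string [regsub {*}$options -- $re $string $sub]
    }
    return $string
}

# a -> b runs first, then b -> c also rewrites the b's that were just produced
set cascaded [multiregsub -all {a b b c} "aabb"]

# string map makes a single pass over the input, so nothing is rewritten twice
set singlePass [string map {a b b c} "aabb"]

puts "multiregsub: $cascaded"
puts "string map:  $singlePass"
```

The two results differ (cccc versus bbcc), which is exactly the effect to watch out for when choosing the order of the pairs in resubList.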