Create list from list items - list

I am trying to crate a new array/list from an existing list of items. I am wanting to check if the item exist first, if it does not, create it then add a value to it. If it already exist just append a value. I also need a way to get the length of the total.
set Area {23401 23402 23403}
foreach Item $Area {
set ElExist [info exist ${Item}lst]
if {$ElExist == 0} {
set ${Item}lst {};
lappend ${Item}lst $TotalRecords
} else {
lappend ${Item}lst $TotalRecords
}
set CurrentOptinon [llength ${Item}lst]
}

If I was writing that code, I'd do it like this:
set Area {23401 23402 23403}
foreach Item $Area {
upvar 0 ${Item}lst lst
lappend lst $TotalRecords
set CurrentOptinon [llength $lst]
}
This will behave the same as your code, but it's so much shorter. Here's the tricks in use:
lappend creates a variable if it didn't already exist.
upvar 0 makes a local alias to a variable. So much simpler.
The alias removes the need for magic with llength, but otherwise you could have done:
set CurrentOptinon [llength [set ${Item}lst]]
The $ syntax is in many ways just a short-cut for a call to the single-argument version of set, which reads the named variable. Except if you write set then you can use substitutions in your variable name. As a rule of thumb, if you're extensively using variable names in variables without aliasing, you're probably doing something wrong (unless you really do need the name).
You're using weird variable names. Much better would be an array.
set Area {23401 23402 23403}
foreach Item $Area {
lappend lst($Item) $TotalRecords
set CurrentOptinon [llength $lst($Item)]
}
However, this is likely to require you to change code elsewhere.

Related

Tcl: Persistent increment inside a list

I am trying to set-up multiple incr's for each entry in a list. I thought that I could assign an integer to each list entry...
set list {
{/run 00}
{/run/shm 00}
{/boot 00}
}
and use the following code as part of a foreach loop to increment the value...
lset list 1 [expr {[lindex $list 1] + 1}]
What I am finding is that the value increments correctly but when the code executes a second and third time the value has reset to 00, so it never increases past 1 on each pass.
If I set up a basic increment for a standard variable as part of the code..
set counter 00
incr counter
it quite happily increments on each run of the code and the counter increases by 1 until I break the code.
Any advise or help in getting this working would be much appreciated. I am definitely not a tcl expert so if I am trying to accomplish this the wrong way please let me know. :)
Thanks in advance for your help.
If you change your data structure slightly to flatten it out instead of using a list of pairs, it becomes usable as a dict. And there's a dict incr command:
This adds the given increment value (an integer that defaults to 1 if not specified) to the value that the given key maps to in the dictionary value contained in the given variable, writing the resulting dictionary value back to that variable. Non-existent keys are treated as if they map to 0. It is an error to increment a value for an existing key if that value is not an integer. The updated dictionary value is returned.
Example usage:
% set list {/run 0 /run/shm 0 /boot 0}
/run 0 /run/shm 0 /boot 0
% dict incr list /boot
/run 0 /run/shm 0 /boot 1
% puts $list
/run 0 /run/shm 0 /boot 1
If you want to do this in a command, you have to pass by name and use upvar so the changes are made in the right stack frame:
% proc demo {fstab_} {
upvar 1 $fstab_ fstab
dict incr fstab /run
}
% demo list
/run 1 /run/shm 0 /boot 1
% puts $list
/run 1 /run/shm 0 /boot 1
And to update every value:
% foreach dir [dict keys $list] { dict incr list $dir }
% puts $list
/run 2 /run/shm 1 /boot 2
I would expect that doing this:
for {set idx 0} {$idx < [llength $list]} {incr idx} {
lset list $idx 1 [expr {[lindex $list $idx 1] + 1}]
}
would increment every numeric value in that list, which is what I believe you want to do. However, doing this:
foreach pair $list {
lset pair 1 [expr {[lindex $pair 1] + 1}]
}
will not work. Tcl conceptually copies the sublist-items out of the main list in foreach so that the changes to the pairs aren't reflected back. Also, conceptually Tcl also copies the value to hand to foreach in the first place. Of course, these copies are not actually real, as that would be very expensive! Instead Tcl uses shared references with copy-on-write-to-shared semantics, a system that works very well given that we can check the sharing status very cheaply (which is enabled by Tcl's threading model; values are never shared between threads, so sharing-state decisions can be lock-free and local).
A consequence of this is that Tcl explicitly rejects weird at-a-distance state changes of the kinds that cause weird bugs sometimes in languages with different semantics. If you're changing something, it'd better be a variable (as those are the main mutable things) and you'll have it right there in front of you when you do the change.

Looping over many names which don't have rules

When there are just few names, looping in Stata is easy.
Also, when there is a rule as to how the names change (e.g. increment) I can do the following:
forval i = 1/5 {
...
}
However, there are cases where i have hundreds of names that I need to loop over, which don't have rules of increment.
For example:
48700 48900 48999 49020 49180 49340 ...
Is there some short-hand way of writing the loop?
Or do I just have to painstakingly list all of them?
The answer is it depends.
If these are part of variable names, you can do something like this:
clear
set obs 5
foreach var in 48700 48900 48999 49020 49180 49340 {
generate var`var' = runiform()
}
ds
var48700 var48900 var48999 var49020 var49180 var49340
ds var48*
var48700 var48900 var48999
local names `r(varlist)'
foreach var of local names {
display `var'
}
.41988069
.06420179
.36276805
If these are file names, a macro extended function can be handy:
dir, w
48700.rtf 48999.rtf 49180.rtf
48900.rtf 49020.rtf 49340.rtf
local list : dir . files "*"
display `list'
48700.rtf48900.rtf48999.rtf49020.rtf49180.rtf49340.rtf
local list : dir . files "48*"
display `list'
48700.rtf48900.rtf48999.rtf
foreach fil of local list {
display "`fil'"
}
48700.rtf
48900.rtf
48999.rtf
EDIT:
The above approaches are concerned with how to efficiently get all relevant names in a local macro.
If you already know the names and you merely want a cleaner way to write the loop (or want to re-use the names in several loops), you can simply assign these in a local macro yourself:
local names var48700 var48900 var48999 var49020 var49180 var49340
foreach var of local names {
display `var'
}
.41988069
.06420179
.36276805
.52763051
.16493952
.66403782
The local macro names will automatically expand during run time to include all the specified items.

How to avoid backslashitis?

I have the following code:
proc list_backslash {} {
array unset options
array set options {
-inputs {vdd}
-outputs {vss}
}
set inputs { vdd2 vdd dvdd }
set outputs { vss2 vss dvss }
set updateOptions [ list \
-inputs $inputs \
-outputs $outputs ]
array set options $updateOptions
foreach {k v} [array get options] {
puts "$k => $v"
}
}
Since I have a lot of key-value pairs in updateOptions, there is a severe backslashitis! Is there a better way to do code updateOptions? I tried subst + braces {} and realized it does not preserve the list structure thus dooming it.
Generally speaking, if you need to continue a line you have to use a quoting mechanism of some kind with Tcl. Otherwise, the command call ends when the line ends. The [brackets] can include multiple statements too; it's legal, but really not recommended.
But that does mean that sometimes you've got awkward alternatives. Perhaps you'll be best off with doing this:
set updateOptions {
-inputs $inputs
-outputs $outputs
}
foreach {key value} $updateOptions {
set options($key) [subst $value]
}
The array set command isn't especially efficient until you get to huge numbers of options (many thousands) when the code is inside a procedure.
Or if you've got Tcl 8.6, dict map is perhaps better:
array set options [dict map {key value} $updateOptions {subst $value}]
Be aware that subst is not a particularly efficient command in Tcl 8.6 except when used with a literal argument. That's because with variable arguments, it compiles them to bytecode at runtime.
option a) Put it all on one line.
option b) Structure the code as:
set options(-inputs) $inputs
set options(-outputs) $outputs
option c) Learn to like backslashes.

Error in regex hyphen usage inside brackets utilizing perl

I have this perl script that compares two arrays to give me back those results found in both of them. The problem arises I believe in a regular expression, where it encounters a hyphen ( - ) inside of brackets [].
I am getting the following error:
Invalid [] range "5-3" in regex; marked by <-- HERE in m/>gi|403163623|ref|XP_003323683.2| leucyl-tRNA synthetase [Puccinia graminis f. sp. tritici CRL 75-3 <-- HERE 6-700-3]
MAQSTPSSIQELMDKKQKEATLDMGGNFTKRDDLIRYEKEAQEKWANSNIFQTDSPYIENPELKDLSGEE
LREKYPKFFGTFPYPYMNGSLHLGHAFTISKIEFAVGFERMRGRRALFPVGWHATGMPIKSASDKIIREL
EQFGQDLSKFDSQSNPMIETNEDKSATEPTTASESQDKSKAKKGKIQAKSTGLQYQFQIMESIGVSRTDI
PKFADPQYWLQYFPPIAKNDLNAFGARVDWRRSFITTDINPYYDAFVRWQMNRLKEKGYVKFGERYTIYS
PKDGQPCMDHDRSSGERLGSQEYTCLKMKVLEWGPQAGDLAAKLGGKDVFFV at comparer line 21, <NUC> chunk 168.
I thought the error could be solved by just adding \Q..\E in the regex so as to bypass the [] but this has not worked. Here is my code, and thanks in advance for any and all help that you may offer.
#cyt = <CYT>;
#nuc = <NUC>;
$cyt = join ('',#cyt);
$cyt =~ /\[([^\]]+)\]/g;
#shared = '';
foreach $nuc (#nuc) {
if ($cyt =~ $nuc) {
push #shared, $nuc;
}
}
print #shared;
What I am trying to achieve with this code is compare two different lists loaded into the arrays #cyt and #nuc. I then compare the name in between the [] of one of the elements in list to to the name in [] of the other. All those finds are then pushed into #shared. Hope that clarifies it a bit.
Your question describes a set intersection, which is covered in the Perl FAQ.
How do I compute the difference of two arrays? How do I compute the intersection of two arrays?
Use a hash. Here's code to do both and more. It assumes that each
element is unique in a given array:
my (#union, #intersection, #difference);
my %count = ();
foreach my $element (#array1, #array2) { $count{$element}++ }
foreach my $element (keys %count) {
push #union, $element;
push #{ $count{$element} > 1 ? \#intersection : \#difference }, $element;
}
Note that this is the symmetric difference, that is, all elements in
either A or in B but not in both. Think of it as an xor operation.
Applying it to your problem gives the code below.
Factor out the common code to find the names in the data files. This sub assumes that
every [name] will be entirely contained within a given line rather than crossing a newline boundary
each line of input will contain at most one [name]
If these assumptions are invalid, please provide more representative samples of your inputs.
Note the use of the /x regex switch that tells the regex parser to ignore most whitespace in patterns. In the code below, this permits visual separation between the brackets that are delimiters and the brackets surrounding the character class that captures names.
sub extract_names {
my($fh) = #_;
my %name;
while (<$fh>) {
++$name{$1} if /\[ ([^\]]+) \]/x;
}
%name;
}
Your question uses old-fashioned typeglob filehandles. Note that the paramter extract_names expects is a filehandle. Convenient parameter passing is one of many benefits of indirect filehandles, such as those created below.
open my $cyt, "<", "cyt.dat" or die "$0: open: $!";
open my $nuc, "<", "nuc.dat" or die "$0: open: $!";
my %cyt = extract_names $cyt;
my %nuc = extract_names $nuc;
With the names from cyt.dat in the hash %cyt and likewise for nuc.dat and %nuc, the code here iterates over the keys of both hashes and increments the corresponding keys in %shared.
my %shared;
for (keys %cyt, keys %nuc) {
++$shared{$_};
}
At this point, %shared represents a set union of the names in cyt.dat and nuc.dat. That is, %shared contains all keys from either %cyt or %nuc. To compute the set difference, we observe that the value in %shared for a key present in both inputs must be greater than one.
The final pass below iterates over the keys in sorted order (because hash keys are stored internally in an undefined order). For truly shared keys (i.e., those whose values are greater than one), the code prints them and deletes the rest.
for (sort keys %shared) {
if ($shared{$_} > 1) {
print $_, "\n";
}
else {
delete $shared{$_};
}
}

Is there a data structure similar to std::set in TCL?

TCL has a data structure called dict which maintains a collection of key-value pairs.
Is there another data structure which maintains a collection of keys (with no values)?
If no, then maybe someone already wrote a simple adapter on dict with empty values?
You could use the tcllib package ::struct::set.
http://tcllib.sourceforge.net/doc/struct_set.html
Just use a single list.
set example [list "key1" "key2" "key3"]
if {[lsearch -exact $example "key3"] != -1} {
puts "found your key!"
} else {
puts "your key is not in the list"
}
Maybe you should ask a more specific question to get a more accurate answer.
An alternative for dict is array which doesn't preserve the order of keys.
Another approach would be to accumulate everything into say, $bucket.
Then do:
set uniqueItems [lsort -unique $bucket]