I'm familiar with finding the intersection of two lists, but I want to find the union of two lists in Tcl (while eliminating duplicates). I have working code, but I'm not sure it is robust enough for any kind or number of lists, so I'm looking for a better solution.
Any help or ideas are appreciated.
If you treat lists as sets, so that you don't care about the order of the items, you could just sort the joined list:
set union [lsort -unique [list {*}$list1 {*}$list2]]
Tclx provides a union command:
% info patchlevel
8.5.9
% set a [list a b c]
a b c
% set b [list a d e]
a d e
% package require Tclx
8.4
% union $a $b
a b c d e
%
% union
wrong # args: should be "union lista listb"
%
One way that doesn't need sorting is to use dictionary keys as sets:
% set a [list a b c]
a b c
% set b [list a d e]
a d e
% set d {}
% foreach k $a { dict set d $k . }
% foreach k $b { dict set d $k . }
% set c [dict keys $d]
a b c d e
This has the advantage of not needing to sort at all, which can help quite a lot with large input sets.
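Since the question asks about any number of lists, the dictionary approach above can be wrapped in a variadic procedure. This is only a minimal sketch, assuming Tcl 8.5 or later (for dict); the name lunion is made up for illustration and is not a built-in or Tclx command.

```tcl
# Hypothetical helper: order-preserving union of any number of lists.
# Dictionary keys are unique, so duplicates are dropped automatically,
# and keys come back in first-insertion order.
proc lunion {args} {
    set seen [dict create]
    foreach lst $args {
        foreach item $lst {
            dict set seen $item 1
        }
    }
    return [dict keys $seen]
}

puts [lunion {a b c} {a d e} {e f}]   ;# a b c d e f
```

Called with no arguments it returns an empty list, and duplicates inside a single input list are removed as well.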
Related
I have two lists
set a1 {a b c d e f}
set b1 {b f e}
I am trying to do remove_from_list $a1 $b1 >> {a c d}
Is there a function in Tcl that can do this kind of operation on lists?
To begin with, you can't use brackets for list literals. Brackets are used for command substitution in Tcl.
Instead, use the list-making command, list:
% set a1 [list a b c d e f]
a b c d e f
Or, more or less equivalently:
% set b1 {b f e}
b f e
There is no standard command to subtract one list from another. It's very easy to construct a filter, though.
You can filter out items in a1 that are in b1 by an lmap (list map) filter:
lmap item $a1 {
    if {$item ni $b1} {
        set item
    } else {
        continue
    }
}
# results in
a c d
The if command inside the lmap body determines whether an item is not in (ni) $b1. If this is true, the value of the item becomes a part of the result. If it is false, the item is skipped.
In Tcl 8.5 and earlier, there is no lmap. In that case, one can copy-and-paste an ersatz lmap (works the same) into the code: link below.
Or, one can use foreach instead. It's a little messier but works.
set res {}
foreach item $a1 {
    if {$item ni $b1} {
        lappend res $item
    }
}
% set res
a c d
Documentation:
continue,
foreach,
if,
lappend,
list,
lmap (for Tcl 8.5),
lmap,
ni (operator),
set
You can also use an array. It's easy to add and remove elements:
% foreach elem $a1 {set x($elem) 1}
% foreach elem $b1 {unset -nocomplain x($elem)}
% set result [array names x]
d a c
It's a pretty efficient approach too, only a single pass through each list.
Or use a dictionary to maintain the original insertion order:
% foreach elem $a1 {dict set y $elem 1}
% foreach elem $b1 {dict unset y $elem}
% set result [dict keys $y]
a c d
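The dictionary idea can also be packaged as the remove_from_list command the question asks for. The following is a sketch under assumptions, not a standard command: the procedure name is taken from the question, and the choice to keep duplicates and the original order of the first list is mine.

```tcl
# Hypothetical implementation of the remove_from_list command from the question.
# Builds a lookup dictionary from the unwanted items, then filters the
# first list in one pass, keeping its original order.
proc remove_from_list {items unwanted} {
    set skip [dict create]
    foreach u $unwanted {
        dict set skip $u 1
    }
    set result {}
    foreach item $items {
        if {![dict exists $skip $item]} {
            lappend result $item
        }
    }
    return $result
}

puts [remove_from_list {a b c d e f} {b f e}]   ;# a c d
```

Unlike the array version, this does not fail when the second list contains items that are missing from the first.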
# with struct::set from tcllib
package require struct::set
set a1 {a b c d e f}
set b1 {b f e}
struct::set difference $a1 $b1
# results in
d a c
Documentation:
struct::set
See code below:
set k [list]
foreach i [list 1 2] {
    lappend k [ list "hey" [ list "ho" [ list $i ] ] ]
}
puts [ join $k ",and,"]
exit
The result is:
hey {ho 1},and,hey {ho 2}
But I expected the result to look like:
hey {ho {1}},and,hey {ho {2}}
Any ideas why is that so?
Thanks.
If any one of the list command's arguments contains more than one element, then only the corresponding element of the result is written in braced form.
% list a b c; # All list args are having only single element
a b c
% list "a" "b" "c"; # Same as above
a b c
% list {a} {b} {c}; # Same again...
a b c
% list "a b" c d; # Here, 1st arg is having 2 elements.
{a b} c d
%
The Tcl wiki already mentions this bizarre behavior of nested lists, which shows up in only one case:
% list [list [list x]]
x
It means that Tcl lists alone cannot be used to represent ALL kinds of data structures, as Tcl lists magically collapse when it's a series of nested lists with the terminal list having only a single bare word that requires no escaping.
Update :
More importantly, if the argument has a space in it:
% list "x "
{x }
% list "x"
x
%
Since the space has to be preserved as well, Tcl has no choice but to enclose the value in braces.
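A short sketch of what this means for the code in the question: the braces are missing only from the printed string, while the list commands still see the same nesting. It uses only the built-in list, llength and lindex commands; the variable name nested is chosen for illustration.

```tcl
# Same construction as in the question, with i = 1.
set nested [list "hey" [list "ho" [list 1]]]

puts $nested                  ;# hey {ho 1}
puts [llength $nested]        ;# 2
puts [lindex $nested 1]       ;# ho 1
puts [lindex $nested 1 1 0]   ;# 1
```

A single-element list whose element needs no quoting has the same string form as the element itself, so {1} and 1 cannot be told apart once printed.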
I have a Tcl list as below.
set mylist [list a b c d e]; # could be more
Now I do some processing if the list contains the items "c", "d", "e". But I need to skip that processing if and only if the list has one of the values below:
set mylist [list a];
OR
set mylist [list b];
OR
set mylist [list a b];
So if mylist is any of the above three, I skip the processing. But if the list has any values other than those three combinations, I do the processing.
What is the most efficient way of checking whether the list is one of the three combinations?
I have basic code that fulfils my requirement, but I was looking for a more efficient way, as I am not very familiar with Tcl containers.
set mylist [list a];
if {[llength $mylist] == 2 && ([lindex $mylist 0] eq "a" || [lindex $mylist 0] eq "b") && ([lindex $mylist 1] eq "a" || [lindex $mylist 1] eq "b")} {
    puts "1. skip the processing"
} elseif {[llength $mylist] == 1 && ([lindex $mylist 0] eq "a" || [lindex $mylist 0] eq "b")} {
    puts "2. skip the processing"
} else {
    puts "Do the processing"
}
I was wondering if there is any other efficient way to perform the same.
if {$mylist in {a b {a b}}} {
    puts "skip the processing"
}
A list isn't a string, but we can usually compare lists to strings for equality and order. A list with a single element "a" is comparable to the string "a". If you want to know if a given string is equal to any of the lists in the question, the easiest way is to check if the value of the list is a member of the list {a b {a b}}.
Note: This particular solution does not solve all aspects of list equality in general. It works in those cases where it works.
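To make the note concrete, here is a small sketch of two cases where the membership test does not match. It assumes the in operator compares strings, as documented for expr; the variable names allowed and raw are only for illustration.

```tcl
set allowed {a b {a b}}

# A valid two-element list whose string form is not canonical.
set raw "a   b"
puts [expr {$raw in $allowed}]               ;# 0
# Rebuilding the list with list {*} gives the canonical string "a b".
puts [expr {[list {*}$raw] in $allowed}]     ;# 1

# Element order matters: {b a} is not the same string as {a b}.
puts [expr {[list b a] in $allowed}]         ;# 0
```

The original if/elseif code in the question accepts b a, a a and b b as well, so the two approaches are not strictly equivalent; which one is right depends on what the caller can actually pass in.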
Efficiency
Is it really efficient to compare a list to a string when this will cause automatic, repeated reconstruction of the internal representation of the data ("shimmering")? Actually, it is. If one compares the procedures
proc foo1 mylist {
    set a 0
    if {$mylist in {a b {a b}}} {set a 92}
    return $a
}
proc foo2 mylist {
    set a 0
    if {$mylist in [list [list a] [list b] [list a b]]} {set a 92}
    return $a
}
then foo1 seems to be faster than foo2 (different machines may produce different results).
Constructing a list inside the condition evaluation code does not seem to add very much time. This procedure
proc foo3 mylist {
    set a 0
    set x [list [list a] [list b] [list a b]]
    if {$mylist in $x} {set a 92}
    return $a
}
is somewhere in between foo1 and foo2 in speed, but not significantly faster than foo2.
One can also do this by invoking lsearch:
proc foo4 mylist {
    set a 0
    set x [list [list a] [list b] [list a b]]
    if {[lsearch $x $mylist] >= 0} {set a 92}
    return $a
}
proc foo5 mylist {
    set a 0
    set x [list [list a] [list b] [list a b]]
    set i [lsearch $x $mylist]
    if {$i >= 0} {set a 92}
    return $a
}
which is comparable to foo2 and foo3.
(In case it needs to be said, lsearch is more versatile than the in operator, offering e.g. case insensitive lookup, regex lookup, etc. If you need such things, lsearch is the best option.)
I've deleted most of my observations and theories on speed after timing the procedures on another machine, which showed quite different results. foo1 was consistently faster on both machines, though. Since that code is simpler than the other alternatives, I would say this is the way to do it. But to be sure, one needs to time the procedure with one's own machine, whitelist, and code to be performed.
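For timing on your own machine, the built-in time command is probably the simplest tool. The following is a sketch only: the iteration count of 10000 and the variable name timing are arbitrary choices, and the figure it prints will differ from machine to machine.

```tcl
proc foo1 mylist {
    set a 0
    if {$mylist in {a b {a b}}} {set a 92}
    return $a
}

# time runs the script the given number of times and returns a string
# of the form "N microseconds per iteration".
set timing [time {foo1 {a b}} 10000]
puts "foo1: $timing"
```

Running the same line for each of the other procedures, with the same argument and count, gives figures that can be compared directly.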
Finally, none of this really matters if I/O occurs inside the procedures, since the I/O will be so much slower than anything else.
Documentation: if, list, lsearch, proc, puts, return, set
I have a "flat" Tcl list. Now I want to append a new element as a child to one of the existing elements. How can I do this?
This is what I tried:
[ lindex $flights $i ] [ lindex $flight 0 ] ]
I am trying to add an element from the list "flight" to an element of the list "flights". The element $i in the flights list already exists.
I might be running against Tcl syntax as I'm new to Tcl.
Thanks for your help.
You can use lset to replace an element of your list with a new list. http://www.tcl.tk/man/tcl8.5/TclCmd/lset.htm The first element of the new list will be the old element, the 2nd element will be its child. Here's an example:
% set flights [list a b c d e]
a b c d e
% set i 1
1
% lset flights $i [list b child]
a {b child} c d e
% lindex $flights 1
b child
% lindex [lindex $flights 1] 1
child
% lindex [lindex $flights 1] 0
b
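Applied to the variables from the question, the same idea might look like the sketch below. The contents of flights and flight are invented for illustration; only lset, lindex and list are used.

```tcl
set flights [list a b c d e]
set flight  [list child1 child2]
set i 1

# Replace element i with a two-element list: the old value and the new child.
lset flights $i [list [lindex $flights $i] [lindex $flight 0]]

puts $flights                 ;# a {b child1} c d e
puts [lindex $flights $i 1]   ;# child1
```

Building the pair with list rather than string concatenation keeps the result a well-formed list even when the old element or the child contains spaces.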
I have a C function (dbread) that reads 'fields' from a 'database'. Most of those fields are single-valued, but sometimes they are multi-valued. So I had C code that said
if valcount == 1
    return string
else
    make list
    foreach item in vals
        append to list
    return list
because I thought most of the time people want a scalar.
However doing this leads to some odd parsing errors. Specifically if I want to add a value
set l [dbread x]          ;# get current c value
lappend l "extra value"   ;# add a value
dbwrite x {*}$l           ;# set it back to db
If x has single value and that value contains spaces the lappend parses wrong. I get a list with 3 items not 2. I see that this is because it is passed something that is not a list and it parses it to a list and sees 2 items.
set l "foo bar"
lappend l "next val"   ;# string l is parsed into list -> [list foo bar]
so I end up with [list foo bar {next val}]
Anyway, the solution is to make dbread always return a list, even if there is only one item. My question is: is there any downside to this? Are there surprises lurking for the 90% case where people would expect a scalar?
The alternative would be to write my own lappend that checks for llength == 1 and special-cases it.
I think it's cleaner to have an API which always returns a list of results, be it one result or many. Then there's no special casing needed.
No downside, only upside.
Think about it, what if you move away from returning a single scalar and have a case in the future where you're returning a single value that happens to be a string with a space in it. If you didn't construct a list of that single value, you'd treat it as two values (because Tcl would shimmer the string into a list of two things). By always constructing a list of return values, all the code using your API will handle this correctly.
Just because Tcl doesn't have strict typing doesn't mean it's good style to return different types at different times.
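The difference can be sketched in pure Tcl. The two procedures below are hypothetical stand-ins for the C dbread command, one returning a bare string and one returning a one-element list; they are not the asker's real API.

```tcl
# Stand-in for a dbread that returns a bare string for single values.
proc dbread_scalar {} { return "foo bar" }
# Stand-in for a dbread that always returns a list.
proc dbread_list {} { return [list "foo bar"] }

set l [dbread_scalar]
lappend l "next val"
set n_scalar [llength $l]
puts $n_scalar   ;# 3 (foo, bar, {next val})

set l [dbread_list]
lappend l "next val"
set n_list [llength $l]
puts $n_list     ;# 2 ({foo bar}, {next val})
```

A caller that really wants the scalar can always take lindex 0 of the result, which works regardless of what characters the value contains.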
One of the approaches I have taken in the past (when the data for each row could contain nulls or empty strings) was to use a list of lists of lists:
{{a b} {c d}} ;# two rows, each with two elements
{{{} b} {c d}} ;# two rows, first element of first row is null
;# llength [lindex [lindex {{{} b} {c d}} 0] 0] -> 0
{ { {{}} b } { c d } }
;# two rows, first element of first row is the empty string
;# llength [lindex [lindex {{{{}} b} {c d}} 0] 0] -> 1
It looks complicated, but it's really not if you treat the actual data items as an opaque data structure and add accessors to use it:
foreach row $db_result {
    foreach element $row {
        if {[db_isnull $element]} {
            puts "null"
        } elseif {![string length [db_value $element]]} {
            puts "empty string"
        } else {
            puts [db_value $element]
        }
    }
}
Admittedly, far more complicated than you're looking for, but I thought it worth mentioning.
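The answer does not show the accessors themselves. One possible sketch, assuming the encoding described above (a null is an empty list, and any real value is wrapped in a one-element list), would be the following; the bodies of db_isnull and db_value are my guess, not the original author's code.

```tcl
# Assumed encoding: {} means null, {value} means a real value
# (so {{}} is the empty string).
proc db_isnull {element} {
    expr {[llength $element] == 0}
}
proc db_value {element} {
    lindex $element 0
}

set db_result {{{} b} {{{}} d}}
set seen {}
foreach row $db_result {
    foreach element $row {
        if {[db_isnull $element]} {
            lappend seen null
        } elseif {![string length [db_value $element]]} {
            lappend seen empty
        } else {
            lappend seen [db_value $element]
        }
    }
}
puts $seen   ;# null b empty d
```

With this encoding a value that contains spaces has to be stored as a braced one-element list, for example {{c d}}, or db_value would return only its first word.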