What is Tuple? And tuple vs. List vs. Vector? - clojure

Can you give me a brief description of Tuple? And when to use it over List and Vector?

A tuple is usually represented in Clojure through an associative data structure, such as a map {:name "david" :age 35} or a record.
A vector ["david" 35] offers fast positional access (= 35 (nth ["david" 35] 1)), and it can hold values of different types.
A list ("david" 35) or ("david" "justin" "david") offers fast access from the head and fast forward traversal. Although it can hold different types, it most commonly contains a single type, possibly with duplicates, in a determined order. Contrast this with a set #{"david" "justin"}, which contains no duplicates and is optimised for checking membership/presence.
Sorted sets (sorted-set) and maps (sorted-map) maintain the order of objects using a comparator.
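For comparison, here is a rough Python sketch of the same choices. It is an analogy only: Python's built-ins are mutable, whereas Clojure's collections are persistent, and a Python tuple stands in here for the positional vector.

```python
# Rough Python analogues of the Clojure collections above.
# Assumption: Python's built-ins are mutable, unlike Clojure's persistent collections.

person_map = {"name": "david", "age": 35}   # named fields, like {:name "david" :age 35}
person_vec = ("david", 35)                  # positional fields, like ["david" 35]
names = ["david", "justin", "david"]        # ordered, duplicates allowed
name_set = {"david", "justin"}              # no duplicates, fast membership test

print(person_map["age"])       # 35, lookup by key
print(person_vec[1])           # 35, lookup by position, like (nth ["david" 35] 1)
print(names.count("david"))    # 2, duplicates are kept
print("justin" in name_set)    # True, membership check
print(sorted(name_set))        # ['david', 'justin'], like iterating a sorted-set
```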
Check out 4clojure and clojuredocs.org. Good luck!

If you will mostly insert and delete elements at the front of a collection, use a list. If you will frequently access elements by index, use a vector.
Tuples are objects that pack elements of different types together in a single object, just like pair objects do for pairs of elements, but generalized for any number of elements.
Conceptually, they are similar to plain old data structures (C-like structs), but instead of having named data members, their elements are accessed by their position in the tuple.
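A small Python sketch of that distinction: a plain tuple is accessed only by position, while a named tuple (Python's collections.namedtuple, used here as an illustration, not a Clojure feature) layers struct-like field names on top of positional access.

```python
from collections import namedtuple

# A plain tuple: elements packed together, accessed only by position.
pair = ("david", 35)
name, age = pair                  # unpack by position
print(pair[0], age)               # david 35

# A named tuple adds field names on top of positional access,
# which is closer to a C struct or a Clojure record.
Person = namedtuple("Person", ["name", "age"])
p = Person("david", 35)
print(p.age == p[1] == 35)        # True

# Tuples are immutable: item assignment raises TypeError.
try:
    pair[1] = 36
except TypeError:
    print("tuples cannot be modified in place")
```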

Related

Clojure: how to conj to front of hash map in swap! function

This works ok, except it adds the new value to the end of the hash map:
(swap! my-atom conj #new-fields)
I need for my-atom to be the first item in #new-fields. I have tried assoc-in, cons and pretty much everything that might "put things together". What can I do to swap! in my-atom to the front of #new-fields?
Hash-maps are unordered collections; logically they do not have a "beginning" or an "end". They have an iteration order, which is an implementation detail (based on the hashes of the keys) and which users should not rely upon. This iteration order will be consistent between readings of the same map because the map is an immutable value.
It sounds like you want a different datatype, to provide predictable ordering. Sorted maps are the easiest replacement. You can create them using sorted-map (which sorts using compare on the keys), or sorted-map-by (which takes a comparator function to compare keys with). conjing a key-value pair into one will put it first iff the new key is lowest according to the comparator.
Note that these are still logical maps: If the comparator says two keys equal one another then they are the same key and the resulting map will only have one value for them.
If you can't make that fit your requirements, it sounds like you're not actually using a logical map, since the values have both an index and a key. If you really need to set the order manually, a few alternatives might be:
A vector of [key value] tuples or maps with a single key/value pair.
A map with composite keys [index old-key], sorted on index, where old-key is whatever keys you're using now.
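A Python sketch of both alternatives, using hypothetical keys "a", "b" and "c". The standard library has no sorted map, so sorted() stands in for the comparator of a sorted map here.

```python
# Alternative 1: a sequence of (key, value) pairs; the order is explicit.
fields = [("b", 2), ("c", 3)]
fields.insert(0, ("a", 1))          # put the new pair first (O(n) for a Python list)
print([k for k, _ in fields])       # ['a', 'b', 'c']

# Alternative 2: composite keys (index, old_key), read back sorted on index.
by_index = {(1, "b"): 2, (2, "c"): 3}
by_index[(0, "a")] = 1              # a lower index sorts first
ordered = [(k, v) for (_, k), v in sorted(by_index.items())]
print(ordered)                      # [('a', 1), ('b', 2), ('c', 3)]
```

Since the indices are unique, tuple comparison never needs to fall back to comparing old_key.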

Is there a linked hash set in C++?

Java has a LinkedHashSet, which is a set with a predictable iteration order. What is the closest available data structure in C++?
Currently I'm duplicating my data by using both a set and a vector. I insert my data into the set. If the data inserted successfully (meaning data was not already present in the set), then I push_back into the vector. When I iterate through the data, I use the vector.
If you can use it, then a Boost.MultiIndex with sequenced and hashed_unique indexes is the same data structure as LinkedHashSet.
Failing that, keep an unordered_set (or hash_set, if that's what your implementation provides) of some type with a list node in it, and handle the sequential order yourself using that list node.
The problems with what you're currently doing (set and vector) are:
Two copies of the data. This might be a problem when the data type is large. It also means that your two different iterations return references to different objects, albeit with the same values. That would be a problem if someone wrote code that compared the addresses of the "same" elements obtained in the two different ways, expecting them to be equal, or if your objects have mutable data members that are ignored by the order comparison and someone writes code that expects to mutate an element via lookup and see the change when iterating in sequence.
Unlike LinkedHashSet, there is no fast way to remove an element in the middle of the sequence. And if you want to remove by value rather than by position, then you have to search the vector for the value to remove.
std::set (typically a balanced tree, O(log n) lookup) has different performance characteristics from a hash set (average O(1) lookup).
If you don't care about any of those things, then what you have is probably fine. If duplication is the only problem then you could consider keeping a vector of pointers to the elements in the set, instead of a vector of duplicates.
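The "hash set plus list node" structure described above can be sketched in Python as a dict mapping each value to a node of a circular doubly linked list. The class and method names are hypothetical; in real Python code, a plain dict (insertion-ordered since 3.7) or collections.OrderedDict already gives this behavior.

```python
class _Node:
    __slots__ = ("value", "prev", "next")

    def __init__(self, value=None):
        self.value = value
        self.prev = self
        self.next = self


class LinkedHashSet:
    """Hash lookup plus a doubly linked list that records insertion order."""

    def __init__(self, items=()):
        self._index = {}          # value -> list node (hash lookup)
        self._head = _Node()      # sentinel of a circular doubly linked list
        for item in items:
            self.add(item)

    def add(self, value):
        if value in self._index:  # already present: keep its original position
            return False
        node = _Node(value)
        tail = self._head.prev
        node.prev, node.next = tail, self._head
        tail.next = node
        self._head.prev = node
        self._index[value] = node
        return True

    def discard(self, value):
        node = self._index.pop(value, None)
        if node is None:
            return False
        node.prev.next = node.next  # O(1) unlink, even from the middle
        node.next.prev = node.prev
        return True

    def __contains__(self, value):
        return value in self._index

    def __len__(self):
        return len(self._index)

    def __iter__(self):
        node = self._head.next
        while node is not self._head:
            yield node.value
            node = node.next


s = LinkedHashSet([3, 1, 2, 1])
print(list(s))  # [3, 1, 2]
s.discard(1)
s.add(1)
print(list(s))  # [3, 2, 1]
```

Unlike the set-plus-vector approach, each value is stored once and removal from the middle of the sequence does not require a search.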
To replicate Java's LinkedHashSet in C++, I think you will need two plain std::maps. (Note that this gives you a "LinkedTreeSet" rather than a real LinkedHashSet, with O(log n) insert and delete.)
One uses actual value as key and insertion order (usually int or long int) as value.
The other is the reverse: it uses the insertion order as key and the actual value as value.
When inserting, use std::map::find on the first std::map to make sure no identical object already exists in it.
If one already exists, ignore the new one.
If not, map the object to the incremented insertion order in both std::maps.
When you want to iterate in insertion order, iterate through the second std::map, since it is sorted by insertion order (std::map and std::set keep their keys sorted automatically).
To remove an element, use std::map::find on the first std::map to get its insertion order. Use that insertion order to remove the element from the second std::map, then remove the object from the first one.
Note that this solution is not perfect. If you plan to use it long-term, you will need to "compact" the insertion order after a certain number of removals, since you will eventually run out of insertion-order values (2^32 for unsigned int, 2^64 for unsigned long long int).
To do this, copy all the values into a vector in insertion order (by iterating the second map), clear both maps, and then re-insert the values from the vector into both maps with fresh insertion orders. This procedure takes O(n log n) time.
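A Python sketch of the two-map scheme, including compaction. The class name is hypothetical, and because Python dicts are not sorted maps, sorted() stands in for std::map's key ordering when iterating (costing O(n log n) per traversal rather than the O(n) a std::map walk would take).

```python
class InsertionOrderedSet:
    """Two maps: value -> insertion order, and insertion order -> value."""

    def __init__(self):
        self._order_of = {}   # first map: value -> insertion order
        self._value_at = {}   # second map: insertion order -> value
        self._next = 0        # next insertion-order number to hand out

    def add(self, value):
        if value in self._order_of:        # identical value already present: ignore
            return False
        self._order_of[value] = self._next
        self._value_at[self._next] = value
        self._next += 1
        return True

    def remove(self, value):
        order = self._order_of.pop(value)  # raises KeyError if absent
        del self._value_at[order]

    def __iter__(self):
        # std::map keeps its keys sorted; here we sort the orders explicitly.
        for order in sorted(self._value_at):
            yield self._value_at[order]

    def compact(self):
        values = list(self)                # values in insertion order
        self._order_of.clear()
        self._value_at.clear()
        self._next = 0
        for v in values:                   # re-insert with fresh, gap-free orders
            self.add(v)


s = InsertionOrderedSet()
for v in ("x", "y", "z", "x"):
    s.add(v)
s.remove("y")
print(list(s))   # ['x', 'z']
s.compact()      # orders renumbered to 0, 1
s.add("y")
print(list(s))   # ['x', 'z', 'y']
```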
If you're using C++11, you can replace the first std::map with std::unordered_map to improve efficiency, but you can't replace the second one: std::unordered_map indexes by hash code, so it does not keep its keys in sorted order and you could no longer iterate in insertion order.
Note that std::map lookups are O(log n), not constant time. And using std::tr1::unordered_map is risky business, because it gives up any ordering to get constant lookup time.
Try bending a Boost multi-index container to your needs for more freedom.
The way you described combining std::set and std::vector sounds like what you should be doing, except with std::unordered_set (equivalent to Java's HashSet) and std::list (a doubly linked list). You could also use std::unordered_map to store the key (for lookup) along with an iterator into the list where the actual objects are stored, if the keys are different from the objects or only a part of them.
The Boost library does provide a number of these combinations of containers and look-up indices, for example a bidirectional list with fast look-ups.

Why retrieving elements from CMap is not ordered

In my application, I have a CMap of CString values. After adding the elements to the map, when I retrieve them somewhere else, I am not getting them in the order of insertion. For example, when I retrieve the third element, I get the fifth. Is this the expected behavior of CMap? Why does this happen?
You asked for "why", so here goes:
A map provides an efficient way to retrieve values by key. It does this by using a clever data structure that is faster for this than a list or an array would be (where you have to search through the whole list before you know whether an element is in there or not). There are trade-offs, such as increased memory usage and the inability to do some other things (such as knowing in which order things were inserted).
There are two common ways to implement this:
a hashmap, which puts keys into buckets by hash value.
a treemap, which arranges keys into a binary tree, according to how they are sorted
You can iterate over maps, but it will be according to how they are stored internally, either in key order (treemap) or completely unpredictable (hashmap). Your CMap seems to be a hashmap.
Either way, insertion order is not preserved. If you want that, you need an extra data structure (such as a list).
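A toy Python illustration of why a hash map's iteration order differs from insertion order. This is not CMap's actual implementation: it is a hypothetical separate-chaining map with 4 buckets, using small non-negative integer keys, whose Python hash is the integer itself. A separate list records insertion order, and sorting stands in for a tree map.

```python
class ToyHashMap:
    """Separate-chaining hash map with a fixed bucket count (illustration only)."""

    def __init__(self, n_buckets=4):
        self._buckets = [[] for _ in range(n_buckets)]

    def put(self, key, value):
        bucket = self._buckets[hash(key) % len(self._buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key in place
                return
        bucket.append((key, value))

    def keys(self):
        # Iteration walks the buckets in index order, not insertion order.
        return [k for bucket in self._buckets for k, _ in bucket]


m = ToyHashMap()
inserted = []                 # extra structure that remembers insertion order
for key in (7, 2, 4):
    m.put(key, str(key))
    inserted.append(key)

print(m.keys())               # [4, 2, 7]: bucket (hash) order
print(inserted)               # [7, 2, 4]: insertion order
print(sorted(inserted))       # [2, 4, 7]: the order a tree map would use
```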
Have you read the CMap documentation?
http://msdn.microsoft.com/ru-ru/library/s897094z%28v=vs.71%29.aspx
It really is an unordered map. How do you retrieve the elements? With GetStartPosition and GetNextAssoc? Read the Remarks section here: http://msdn.microsoft.com/ru-ru/library/d82fyybt%28v=vs.71%29.aspx
Remarks
The iteration sequence is not predictable; therefore, the "first element in the map" has no special significance.
CMap is a dictionary collection class that maps unique keys to values. Once you have inserted a key-value pair (element) into the map, you can efficiently retrieve or delete the pair using the key to access it. You can also iterate over all the elements in the map.

As a data container, what are the main differences between vector and list

Say we need a list of numbers, there are two definitions:
(def vector1 [1 2 3])
(def list2 '(1 2 3))
So what are the main differences?
The [1 2 3] is a vector, whereas '(1 2 3) is a list. There are different performance characteristics of these two data structures.
Vectors provide quick, indexed random access to their elements: (v 34) returns the element of vector v at index 34 in O(1) time. On the other hand, it is generally more expensive to modify vectors.
Lists are easy to modify at the head and/or tail (depending on implementation), but provide only linear access to elements: (nth (list 1 2 3 4 5) 3) requires a sequential scan of the list.
For more information on the performance tradeoffs you can google "vector vs. list performance" or something similar.
[FOLLOW-UP]
OK, let's get into some more detail. First of all, vectors and lists are concepts that are not specific to Clojure. Along with maps, queues, etc., they are abstract types of collections of data, and algorithms operating on data are defined in terms of those abstractions. Vectors and lists are defined by the behavior I briefly described above (i.e. something is a vector if it has a size, you can access its elements by an index in constant time, etc.).
Clojure, like any other language, wants to fulfill those expectations when providing data structures with these names. If you look at the basic implementation of nth in vector, you'll see a call to the arrayFor method, which has complexity O(log32 N), followed by a lookup in a Java array, which is O(1).
Why can we say that (v 34) is in fact O(1)? Because log32 of the largest Java int (2^31 - 1) is only about 6.2, so a lookup never descends more than about 7 levels. This means that random access to a vector is de facto constant time.
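The bound above can be checked with a small Python sketch. It uses a simplified model of a 32-way trie (an assumption: it ignores details of Clojure's PersistentVector such as its tail buffer) and counts how many levels are needed before 32^depth covers n elements.

```python
def trie_depth(n, branching=32):
    """Levels a branching-way trie needs to address n elements (simplified model)."""
    depth = 1
    capacity = branching
    while capacity < n:
        capacity *= branching
        depth += 1
    return depth


print(trie_depth(2**31 - 1))   # 7: every possible Java int index
print(trie_depth(1_000_000))   # 4: a million-element vector
print(trie_depth(32))          # 1
```

Because the depth is capped by the width of the index, the per-lookup cost is bounded by a small constant in practice.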
In summary, the main difference between vectors and lists really is the performance characteristics. Additionally, as Jeremy Heiler points out, in Clojure there are logical differences in behaviour, i.e. with respect to growing the collection with conj.
There are two main differences between lists and vectors.
Lists logically grow at the head, while vectors logically grow at the tail. You can see this in action with the conj function, which grows the collection according to the type of collection given to it. While you can grow either collection at the other end, growing each one this way is the efficient choice.
In order to retrieve the nth item in a list, the list needs to be traversed from the head. Vectors, on the other hand, are not traversed and return the nth item in O(1). (It really is O(log32n) but that is due to how the collections are implemented as persistent collections.)
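A Python sketch of the two growth patterns. The cons list below is a hypothetical immutable singly linked list built from (head, tail) pairs, used to mimic a Clojure list. Prepending shares the old list, and nth must walk from the head. A Python list stands in for appending at the tail of a vector, with the caveat that append mutates in place, unlike Clojure's conj.

```python
EMPTY = None  # the empty list; each cell is a (head, tail) pair


def cons(x, lst):
    """Prepend in O(1); the old list is shared, not copied (like conj on a list)."""
    return (x, lst)


def nth(lst, n):
    """Walk n cells from the head: O(n), like nth on a Clojure list."""
    while n > 0:
        if lst is None:
            raise IndexError("index out of range")
        lst = lst[1]
        n -= 1
    if lst is None:
        raise IndexError("index out of range")
    return lst[0]


def to_pylist(lst):
    out = []
    while lst is not None:
        out.append(lst[0])
        lst = lst[1]
    return out


base = cons(1, cons(2, cons(3, EMPTY)))   # like '(1 2 3)
grown = cons(0, base)                     # like (conj '(1 2 3) 0) => (0 1 2 3)
print(to_pylist(grown))                   # [0, 1, 2, 3]
print(to_pylist(base))                    # [1, 2, 3], unchanged and shared
print(grown[1] is base)                   # True: structural sharing

vec = [1, 2, 3]
vec.append(4)                             # like (conj [1 2 3] 4) => [1 2 3 4]
print(vec[3], nth(grown, 3))              # 4 3
```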
Vector access time is not O(1); it's O(log32 N).
Vectors (IPersistentVector)
A Vector is a collection of values indexed by contiguous integers. Vectors support access to items by index in log32N hops. count is O(1). conj puts the item at the end of the vector. Vectors also support rseq, which returns the items in reverse order. Vectors implement IFn, for invoke() of one argument, which they presume is an index and look up in themselves as if by nth, i.e. vectors are functions of their indices.
A vector holds all data items in adjacent areas of memory, making
transfer of the entire vector easy and insertion or deletion of items
expensive when compared with lists.
Lists hold items in disjoint areas of memory, making transfer of the
entire list expensive but insertion and deletion of individual items
relatively cheap.
Classic vectors are also fixed in size, and limited to N items, while
lists can dynamically grow and shrink.
Vectors also offer indexed access to the element items. Lists don't.
The classic vector's predecessor is an array.
Modern implementations of vectors often aim at providing similar
characteristics, but the underlying data structure may in fact be a
list or a hash, and those vectors typically support dynamic
re-sizing.

Python equivalent for C++ STL vector/list containers

Is there something similar in Python that I would use for a container that's like a vector and a list?
Any links would be helpful too.
You can use the built-in list; its underlying implementation is similar to a C++ vector. Some things differ, though: for example, you can put objects of different types in the same list.
http://effbot.org/zone/python-list.htm
N.B.: Please keep in mind that vector and list are two very different data structures. Python lists are heterogeneous, i.e. they can store objects of different types, while C++ vectors are homogeneous. A C++ vector stores its elements contiguously in memory, whereas a Python list stores references to objects that live elsewhere in memory.
Have a look at Python's datastructures page. Here's a rough translation:
() => boost::tuple (with one important distinction: you can't reassign values in a Python tuple)
[] => std::vector (as the comments have alluded to, it lacks some of the memory characteristics associated with vectors)
[] => std::list
{} => tr1::unordered_map or boost::unordered_map (essentially a hash table)
set() => std::unordered_set (a Python set is a hash set, whereas std::set is a sorted tree)
More Python => C++ equivalents:
deque => std::deque
PriorityQueue (or you may use heapq) => std::priority_queue
set => std::unordered_set
list => std::vector
defaultdict(int) => std::unordered_map
list => std::stack
deque => std::queue
dict .get(val, 0) => std::unordered_map
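A short Python sketch of several of these equivalents in use, with standard-library calls only. One caveat worth knowing: heapq is a min-heap, whereas std::priority_queue yields the largest element first by default.

```python
from collections import defaultdict, deque
import heapq

stack = []                      # list used as std::stack
stack.append(1)
stack.append(2)
print(stack.pop())              # 2 (LIFO)

queue = deque()                 # deque used as std::queue
queue.append("a")
queue.append("b")
print(queue.popleft())          # a (FIFO, O(1) at both ends)

heap = []                       # heapq used as a priority queue
for x in (5, 1, 3):
    heapq.heappush(heap, x)
print(heapq.heappop(heap))      # 1, the smallest element (min-heap)

counts = defaultdict(int)       # missing keys start at 0
for word in ("a", "b", "a"):
    counts[word] += 1
print(counts["a"])              # 2

plain = {}
print(plain.get("missing", 0))  # 0, the dict .get(val, 0) idiom
```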
In Python >= 3.7, dict remembers insertion order. https://stackoverflow.com/a/51777540/13040423
In case you need TreeMap / TreeSet
https://github.com/grantjenks/python-sortedcontainers
Lists are sequences.
see http://docs.python.org/tutorial/datastructures.html
append is like push_back; see the other methods as well.
Python's standard library also provides an array type (in the array module), which is more memory-efficient because its elements are constrained to a single primitive type.
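A minimal sketch of the array module: the typecode "i" fixes the element type to a C signed int, and values of the wrong type are rejected.

```python
from array import array

nums = array("i", [1, 2, 3])   # "i": C signed int elements, stored unboxed
nums.append(4)
print(nums[3])                 # 4

try:
    nums.append(2.5)           # a float is not accepted by an "i" array
except TypeError:
    print("array('i') only accepts integers")

print(nums.itemsize)           # bytes per element (platform-dependent, commonly 4)
```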
You may also look at numpy (not part of the standard library) if you need to get serious about efficient manipulation of large vectors/arrays.