Python equivalent for C++ STL vector/list containers - c++

Is there something similar in Python that I would use for a container that's like a vector and a list?
Any links would be helpful too.

You can use the inbuilt list - underlying implementation is similar to C++ vector. Although some things differ - for example, you can put objects of different type in one and the same list.
http://effbot.org/zone/python-list.htm
N.B.: Please keep in mind that vector and list are two very different data structures. List are heterogeneous, i.e. can store different object types, while C++ vectors are homogeneous. The data in vectors is stored in linear arrangement whereas in list is a collection of references to the type and the memory address of the variables.

Have a look at Python's datastructures page. Here's a rough translation:
() => boost::Tuple (with one important distinction, you can't reassign values in a Python tuple)
[] => std::vector (as the comments have aluded towards, lacks memory characteristics associated with vectors)
[] => std::list
{} => tr1::unordered_map or boost::unordered_map (essentially a hash table)
set() => std::set

py
cpp
deque
deque
PriorityQueue (or you may use heapq)
priorityqueue
set
unordered_set
list
vector
defaultdict(int)
unordered_map
list
stack
deque
queue
dict .get(val,0)
unordered_map
in py >= 3.7, dict remember insert order. https://stackoverflow.com/a/51777540/13040423
In case you need TreeMap / TreeSet
https://github.com/grantjenks/python-sortedcontainers

Lists are sequences.
see http://docs.python.org/tutorial/datastructures.html
append is like push_back, see the other methods as well.

Python also has as part of the standard library an array type which is more efficient and the member type is constrained.
You may also look at numpy (not part of the standard library) if you need to get serious about efficient manipulation of large vectors/arrays.

Related

C++ multi-index map implementation

I'm implementing a multi-index map in C++11, which I want to be optimized for specific features. The problem I'm currently trying to solve, is to not store key elements more then once. But let me explain.
The problem arose from sorting histograms to overlay them in different combinations. The histograms had names, which could be split into tokens (properties).
Here are the features I want my property map to have:
Be able to loop over properties in any order;
Be able to return container with unique values for each property;
Accumulate properties' values in the order they arrive, but to be able to sort properties using a custom comparison operator after the map is filled;
I have a working implementation in C++11 using std::unordered_map with std::tuple as key_type. I'm accumulating property values as they arrive into a tuple of forward_lists. The intended use, is to iterate over the lists to compose keys.
The optimization I would like to introduce, is to only store properties' value in the lists, and not store them in tuples used as keys in the map. I'd like to maintain ability to have functions returning const references to lists of property values, instead of lists of some wrappers.
I know that boost::multi_index has similar functionality, but I don't need the overhead of sorting as the keys arrive. I'd like to have new property values stored sequentially, and only be sortable postfactum. I've also looked at boost::flyweight, but in the simplest approach, the lists will then be of flyweight<T> instead of T, and I'd like to not do that. (If that IS the best solution, I could definitely live with it.)
I know that lists are stable, i.e. once an element is created, its pointer and iterator remain valid, even after invoking list::sort(). Knowing that, can something be done to the map, to eliminate redundant copies of tuple elements? Could a custom map allocator help here?
Thanks for suggestions.
Have your map be from tuples of iterators to your prop containers.
Write a hash the dereferences the iterators and combines the result.
Replace the forward list prop containers with sets that first order on hash, then contents.
Do lookup by first finding in set, then doing lookup in hash.
If you need a different order for props, have another container of set iterators.

Reference to a subset of a container object

i have a quick question about having a reference to a subset of a collection.
Consider i have a vector of objects. Now I want to create another vector which is a subset of this vector, and I dont want to create a copy of the subset of objects.
One of the ways I was thinking about is creating a vector<auto_ptr<MyClass> >. Is this a good approach?
Please suggest if you think any other containers or idioms or patterns would help in this case.
Thank you
No ! See : Why it is wrong to use std::auto_ptr<> with STL containers ?
Now, as an alternative, you could store raw pointers or boost::shared_ptr depending on your needs.
Another, possibly more STL way would be to just have the one vector but keep track of sub ranges using pairs of iterators (note all the algorithms use iterators for exactly this reason)
You can use a vector of indices: vector<int> (or vector<size_t> if you want to be pedantic). This is better than storing pointers (pointers in a general meaning: raw C/C++ pointers, shared_ptr, iterator, etc) if the containing vector is not constant.
Consider the following scenario: the "big" vector contains an apple, an orange and a lemon, while the "small" vector contains a pointer to the apple. If you add a bunch of other fruits to the big vector, the STL is going to reallocate storage for the vector, so the pointer to the apple will be invalid (point to deallocated memory).
If the above scenario is possible, use a vector of indices. If it is not possible, use the other techniques (e.g. a vector of raw pointers, or a vector of copies of the objects).
If the subsection is contiguous you can reference the subsection using an iterator and a count indicating how many items you are referencing.
The sane way to do this would be to create some sort of templated class which you could construct with a container reference and two indices, and let the class do all the bounds and error checking, though I'm unsure how you'd be able to tell if the underlying container still existed at some later time...

C++ deque vs vector and C++ map vs Set

Can some one please tell me what is the difference between vector vs deque. I know the implementation of vector in C++ but not deque. Also interfaces of map and set seem similar to me. What is the difference between the two and when to use one.
std::vector: A dynamic-array class. The internal memory allocation makes sure that it always creates an array. Useful when the size of the data is known and is known to not change too often. It is also good when you want to have random-access to elements.
std::deque: A double-ended queue that can act as a stack or queue. Good for when you are not sure about the number of elements and when accessing data-element are always in a serial manner. They are fast when elements are added/removed from front and end but not when they're added/removed to/from the middle.
std::list: A double-linked list that can be used to create a 'list' of data. The advantage of a list is that elements can be inserted or deleted from any part of the list without affecting an iterator that is pointing to a list member (and is still a member of the list after deletion). Useful when you know that elements will be deleted very often from any part of the list.
std::map: A dictionary that maps a 'key' to a 'value'. Useful for applications like 'arrays' whose index are not an integer. Basically can be used to create a map-list of name to an element, like a map that stores name-to-widget relationship.
std::set: A list of 'unique' data values. For e.g. if you insert 1, 2, 2, 1, 3, the list will only have the elements 1, 2, 3. Note that the elements in this list are always ordered. Internally, they're usually implemented as binary search trees (like map).
See here for full details:
What are the complexity guarantees of the standard containers?
vector Vs deque
A deque is the same as a vector but with the following addition:
It is a "front insertion sequence"
This means that deque is the same as a vector but provides the following additional gurantees:
push_front() O(1)
pop_front() O(1)
set Vs map
A map is a "Pair Associative Container" while set is a "Simple Associative Container"
This means they are exactly the same. The difference is that map holds pairs of items (Key/Value) rather than just a value.
std::vector
Your default sequential containers should be a std::vector. Generally, std::vector will provide you with the right balance of performance and speed. The std::vector container is similar to a C-style array that can grow or shrink during runtime. The underlying buffer is stored contiguously and is guaranteed to be compatible with C-style arrays.
Consider using a std::vector if:
You need your data to be stored contiguously in memory
Especially useful for C-style API compatibility
You do not know the size at compile time
You need efficient random access to your elements (O(1))
You will be adding and removing elements from the end
You want to iterate over the elements in any order
Avoid using a std::vector if:
You will frequently add or remove elements to the front or middle of the sequence
The size of your buffer is constant and known in advance (prefer std::array)
Be aware of the specialization of std::vector: Since C++98, std::vector has been specialized such that each element only occupies one bit. When accessing individual boolean elements, the operators return a copy of a bool that is constructed with the value of that bit.
std::array
The std::array container is the most like a built-in array, but offering extra features such as bounds checking and automatic memory management. Unlike std::vector, the size of std::array is fixed and cannot change during runtime.
Consider using a std::array if:
You need your data to be stored contiguously in memory
Especially useful for C-style API compatibility
The size of your array is known in advance
You need efficient random access to your elements (O(1))
You want to iterate over the elements in any order
Avoid using a std::array if:
You need to insert or remove elements
You don’t know the size of your array at compile time
You need to be able to resize your array dynamically
std::deque
The std::deque container gets its name from a shortening of “double ended queue”. The std::deque container is most efficient when appending items to the front or back of a queue. Unlike std::vector, std::deque does not provide a mechanism to reserve a buffer. The underlying buffer is also not guaranteed to be compatible with C-style array APIs.
Consider using std::deque if:
You need to insert new elements at both the front and back of a sequence (e.g. in a scheduler)
You need efficient random access to your elements (O(1))
You want the internal buffer to automatically shrink when elements are removed
You want to iterate over the elements in any order
Avoid using std::deque if:
You need to maintain compatibility with C-style APIs
You need to reserve memory ahead of time
You need to frequently insert or remove elements from the middle of the sequence
Calling insert in the middle of a std::deque invalidates all iterators and references to its elements
std::list
The std::list and std::forward_list containers implement linked list data structures. Where std::list provides a doubly-linked list, the std::forward_list only contains a pointer to the next object. Unlike the other sequential containers, the list types do not provide efficient random access to elements. Each element must be traversed in order.
Consider using std::list if:
You need to store many items but the number is unknown
You need to insert or remove new elements from any position in the sequence
You do not need efficient access to random elements
You want the ability to move elements or sets of elements within the container or between different containers
You want to implement a node-wise memory allocation scheme
Avoid using std::list if:
You need to maintain compatibility with C-style APIs
You need efficient access to random elements
Your system utilizes a cache (prefer std::vector for reduced cache misses)
The size of your data is known in advance and can be managed by a std::vector
A map is what is often refered to as an associative array, usually implemented using a binary tree (for example). A deque is a double-ended queue, a certain incarnation of a linked list.
That is not to say that the actual implementations of the container library uses these concepts - the containr library will just give you some guarantees about how you can access the container and at what (amortized) cost.
I suggest you take a look at a reference that will go into detail about what those guarantees are. Scott Meyers book "Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library" should talk a bit about those, if I remember correctly. Apart from that, the C++ standard is obviously a good choice.
What I really want to say is: containers really are described by their properties, not by the underlying implementation.
set: holds unique values. Put 'a' in twice, the set has one 'a'.
map: maps keys to values, e.g. 'name' => 'fred', 'age' => 40. You can look up 'name' and you'll get 'fred' out.
dequeue, like a vector but you can only add/remove at the ends. No inserts into the middle. http://en.wikipedia.org/wiki/Deque
edit: my dequeue description is lacking, see comments below for corrections

Indices instead of pointers in STL containers?

Due to specific requirements [*], I need a singly-linked list implementation that uses integer indices instead of pointers to link nodes. The indices are always interpreted with respect to a vector containing the list nodes.
I thought I might achieve this by defining my own allocator, but looking into the gcc's implementation of , they explicitly use pointers for the link fields in the list nodes (i.e., they do not use the pointer type provided by the allocator):
struct _List_node_base
{
_List_node_base* _M_next; ///< Self-explanatory
_List_node_base* _M_prev; ///< Self-explanatory
...
}
(For this purpose, the allocator interface is also deficient in that it does not define a dereference function; "dereferencing" an integer index always needs a pointer to the underlying storage.)
Do you know a library of STL-like data structures (i am mostly in need of singly- and doubly-linked list) that use indices (wrt. a base vector) instead of pointers to link nodes?
[*] Saving space: the lists will contain many 32-bit integers. With two pointers per node (STL list is doubly-linked), the overhead is 200%, or 400% on 64-bit platform, not counting the overhead of the default allocator.
EDIT: I'm looking for a SLL implementation that defines nodes in the following manner:
struct list_node
{
int _value; ///< The value in the list
int _next; ///< Next node in the list
...
}
_next is interpreted wrt. an implicit array or vector (must be provided externally to each method operating on the list).
EDIT2: After a bit more searching, I've found that the standard actually requires that allocators intended to be used with standard collections must define the pointer type to be equivalent with T*.
Why are you using the STL list? Unless you have very specific requirements, you should be using vector or deque instead. If your reason for using the list was to increase insertion efficiency, you should note that a deque offers most of the advantages of both list and vector because it is not required to maintain contiguous storage, but uses arrays as it's underlying storage media.
EDIT: And regarding your desire for a list that offers operator[], such a structure does not exist (at least, does not exist and still conform to the STL). One of the key design ideas of the STL is that algorithms and containers offer only what they can efficiently. Considering offering operator[] on a linked list requires linear time for each access, that's not efficient.
We had to write our own list containers to get exactly this. It's about a half day's work.
Boost.Interprocess (containers) provides slist container that uses the pointer type of the allocator. Maybe this is what you are looking for :)
Even if these containers are included in Boost.Interprocess they work perfectly on intraprocess memory. In addition the author has already made the separation and proposed to bots as Boost.Containers (Documentation/Source)
One option that is a bit out there is to use Judy Arrays. They provide highly efficient storage and are computationally efficient. They are good for storing sets of integers; I don't know if that suits your problem.
If you're concerned about the memory overhead of the linked list for storing a large number of int values, you should definitely consider a vector or deque as Billy ONeal suggested.
The drawback to either of these containers when compared to a linked list comes when inserting elements. Either of deque or vector will have to copy elements if you insert items into the middle of the container (deque has a big advantage over vector if you're going to insert at the beginning of the container, and even has an advantage when adding to the end of the container).
However, copying int elements after insertion is really not a whole lot more costly than scanning a linked list to find an element by index. Since deque and vector can locate an element by index in constant time and since data structures are generally read far more often than they're modified, you'll probably see a net gain using either of deque or vector over a linked list of int that you access by index (even a custom version).

Is the linked list only of limited use?

I was having a nice look at my STL options today. Then I thought of something.
It seems a linked list (a std::list) is only of limited use. Namely, it only really seems
useful if
The sequential order of elements in my container matters, and
I need to erase or insert elements in the middle.
That is, if I just want a lot of data and don't care about its order, I'm better off using an std::set (a balanced tree) if I want O(log n) lookup or a std::unordered_map (a hash map) if I want O(1) expected lookup or a std::vector (a contiguous array) for better locality of reference, or a std::deque (a double-ended queue) if I need to insert in the front AND back.
OTOH, if the order does matter, I am better off using a std::vector for better locality of reference and less overhead or a std::deque if a lot of resizing needs to occur.
So, am I missing something? Or is a linked list just not that great? With the exception of middle insertion/erasure, why on earth would someone want to use one?
Any sort of insertion/deletion is O(1). Even std::vector isn't O(1) for appends, it approaches O(1) because most of the time it is, but sometimes you are going to have to grow that array.
It's also very good at handling bulk insertion, deletion. If you have 1 million records and want to append 1 million records from another list (concat) it's O(1). Every other structure (assuming stadard/naive implementations) are at least O(n) (where n is the number of elements added).
Order is important very often. When it is, linked lists are good. If you have a growing collection, you have the option of linked lists, array lists (vector in C++) and double-ended queues (deque). Use linked lists if you want to modify (add, delete) elements anywhere in the list often. Use array lists if fast retrieval is important. Use double-ended queues if you want to add stuff to both ends of the data structure and fast retrieval is important. For the deque vs vector question: use vector unless inserting/removing things from the beginning is important, in which case use deque. See here for an in-depth look at this.
If order isn't important, linked lists aren't normally ideal.
std::list is notable for its splice() method, which allows you to move one more more elements from one list to another in constant time, without copying or allocating any elements or list nodes.
This question reminds me of this infamous one. Read it for the parallels as to why such simple data structures are important.
Linked List is a fundamental data structure. Other data structures, like hash maps, may use linked lists internally.
Two different algorithms may have O(1) time complexity, for a look up, but that doesn't mean they have the same performance. For example the first one may be 10 or 100 times faster than the second.
Whenever you need to store, iterate and do something with a bunch of data, the normal (and fast) data stucture for that task is the Linked List. More complex data structures are for special cases, ie Set is suitable when you don't want repeated values.
std::list has the following properties:
Sequence
Front Seuqence
Back Seuqence
Forward Container
Reverse Container
Of these properties std::vector does not have (Back Seuqence)
While std::set does not support any sequence properties or (Reverse Container)
So what does this mean?
Will a back sequence supports O(1) for rend() and rbegin() etc
For full information see:
What are the complexity guarantees of the standard containers?
Linked lists are immutable and recursive datastructures whereas arrays are mutable and imperative (=change-based). In functional programming, there are usually no arrays - You don't change elements but transform lists into new lists. While linked lists don't even need additional memory here, this isn't possible efficiently with arrays.
You can easily build or decompose lists without having to change any value.
double [] = []
double (head:rest) = (2 * head):(double rest)
In C++, which is an imperative language, you won't use lists that often. One example could be a list of spaceships in a game from which you can easily remove all spaceships that have been destroyed since the previous frame.