It is stated everywhere that the common property of all sequential containers is that the elements can be accessed sequentially. But we know that std::array, std::vector and std::deque all support fast random access to the elements. std::list supports bidirectional iteration, whereas std::forward_list supports only unidirectional iteration.
So what does "accessed sequentially" actually mean here?
A Sequence Container has the requirement that its elements are stored in a well-defined, determined order, such that a function like front() or a reference to its nth element is meaningful. The fact that sequential access is permitted does not preclude that random access is also allowed.
In contrast, there is no requirement that the elements of an Associative Container are stored in any particular order. So, for example, attempting to call front() on a std::set object is meaningless.
Sequential access has little to do with random access or iterators.
For example, std::set's iterator is a bidirectional iterator. You can iterate over elements of a std::set just like you would over elements of a std::vector.
Sequence containers have a front and a back, and all their elements are between those, in the order you inserted them in. Contrast this with a std::set, which conceptually does have a front and a back (minimum and maximum values), but that stores its elements in an order defined by a comparison function. Contrast this also with a std::unordered_set, which does not really have a front and a back, and stores its elements in an order determined by a hash function. Finally, contrast this with a std::stack, which only has a top (conceptually, a back, but no front).
The only other standard container that has a front and a back and stores its elements in the order you insert them in is std::queue. However, you cannot access any arbitrary element in a queue without accessing and removing all the elements in front of it.
So, if I had to give a definition of a sequence container, it would be one with sequential access, meaning access to any of its elements in the order you insert them in, without having to do anything other than iterate over it. As a result, you can sort a sequence container.
Not to be confused with contiguous (or random) access.
That said, it isn't a very useful categorization. More useful categories are those of iterators and of complexity of operations.
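A minimal sketch of the distinction drawn above. The helper names order_in_vector and order_in_set are made up for illustration; the point is only that a sequence container keeps elements in insertion order, while std::set keeps them in the order defined by its comparison function.

```cpp
#include <set>
#include <vector>

// Hypothetical helper: the order in which a std::vector holds the elements
// is simply the order in which they were inserted.
std::vector<int> order_in_vector(const std::vector<int>& inserted) {
    std::vector<int> v;
    for (int x : inserted) v.push_back(x);
    return v;
}

// Hypothetical helper: a std::set holds the same elements in the order
// defined by its comparison function (std::less<int> by default).
std::vector<int> order_in_set(const std::vector<int>& inserted) {
    std::set<int> s(inserted.begin(), inserted.end());
    return std::vector<int>(s.begin(), s.end());
}
```

Both containers can be iterated from begin() to end(); only the vector reflects the insertion order.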
How often do std::multimap and std::unordered_multimap shuffle entries around? I'm asking because my code passes around references to distinguish between entries with the same hash, and I want to know when to run the reference redirection function on them.
What happens if I do this:
std::multimap atable; //Type specification stuff left out
//Code that puts in two entries with the same key, call that key foo
int bar = atable[foo];
Is the result different if it's an unordered_multimap instead?
Back to passing references around to distinguish entries with the same hash. Is there a safer way to do that?
Do the entries move around if I remove one of the entries (That's what's suggested by a reading of the documentation for std::vector)?
NO, no elements will be harmed during any operation.
As is explained in this famous Q&A, for associative containers, there is no iterator invalidation upon insertion or erasure (except for the element being erased, of course). For unordered associative containers, there is iterator invalidation during rehashing, about which the Standard says (emphasis mine)
23.2.5 Unordered associative containers [unord.req]
9 The elements of an unordered associative container are organized into
buckets. Keys with the same hash code appear in the same bucket. The
number of buckets is automatically increased as elements are added to
an unordered associative container, so that the average number of
elements per bucket is kept below a bound. Rehashing invalidates
iterators, changes ordering between elements, and changes which
buckets elements appear in, but does not invalidate pointers or
references to elements. For unordered_multiset and unordered_multimap,
rehashing preserves the relative ordering of equivalent elements.
Again, this does not entail reshuffling of the actually stored elements (the Key and Value types in unordered_map<Key, Value>), because unordered maps have buckets that are organized as linked lists, and iterators to stored elements (key-value pairs) have both an element pointer and a bucket pointer. Rehashing shuffles the buckets, which invalidates the iterators (because their bucket pointer is invalidated) but not pointers or references to the elements themselves. This is explained in detail in another Q&A
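As a rough illustration of the guarantee quoted above, the following sketch takes a pointer to a stored value, forces the table to grow and rehash, and then checks that the pointer still refers to the same element. The function name pointer_survives_rehash is hypothetical.

```cpp
#include <string>
#include <unordered_map>

// Returns true if a pointer taken before rehashing still refers to the same
// stored value afterwards.
bool pointer_survives_rehash() {
    std::unordered_multimap<std::string, int> table;
    auto it = table.emplace("foo", 1);   // emplace returns an iterator here
    int* stored = &it->second;           // pointer to the stored value
    table.emplace("foo", 2);             // second entry with the same key

    // Grow the table so that it rehashes, then request one more rehash.
    for (int i = 0; i < 1000; ++i) table.emplace(std::to_string(i), i);
    table.rehash(table.bucket_count() * 2);

    // 'it' may be invalid by now, so look the entries up again.
    auto range = table.equal_range("foo");
    for (auto cur = range.first; cur != range.second; ++cur) {
        if (&cur->second == stored) return *stored == 1;
    }
    return false;
}
```

The iterator obtained before the rehash is deliberately not reused; only the pointer is.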
How often do std::multimap and std::unordered_multimap shuffle entries around?
Never. The iterators that point to elements of any associative container (including sets, maps, and their unordered or "multi" versions) are never invalidated (unless the specific element they point to is deleted). In other words, the actual elements are never "shuffled around". These are required to be implemented as linked structures (e.g., linked-tree), meaning they can be re-structured just by changing a few pointers, without having to physically move any element.
EDIT: Apparently (see TemplateRex' comment), this is not the case for unordered containers. In that case, the iterators can get invalidated, but the elements themselves do not move around. These requirements imply an indirect container with no back-pointer, which I guess is a reasonable choice, but not one I would have expected.
What happens if I do this: ... (get [] of multimap) ...
The operator[] is not defined for std::multimap (or unordered version). So, what would happen? A compiler error would happen.
Is the result different if it's an unordered_multimap instead?
No, it's the same, the operator[] does not exist.
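Since operator[] is unavailable, the usual way to read the entries for one key is equal_range. A minimal sketch, where values_for is a hypothetical helper name and the key and value types are assumptions (the question leaves them out):

```cpp
#include <map>
#include <string>
#include <vector>

// Collects every value stored under 'key' in a multimap.
std::vector<int> values_for(const std::multimap<std::string, int>& table,
                            const std::string& key) {
    std::vector<int> out;
    // equal_range returns a pair of iterators delimiting all matching entries.
    auto range = table.equal_range(key);
    for (auto it = range.first; it != range.second; ++it) {
        out.push_back(it->second);
    }
    return out;
}
```

The same member function exists on std::unordered_multimap, so the approach carries over with only the container type changed.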
Back to passing references around to distinguish entries with the same hash. Is there a safer way to do that?
Yes, the recommended practice is to refer to elements of the map / set / whatever using iterators, not references. The iterators to elements are guaranteed to remain valid, and they are copyable and have the right const-ness protection on them, and that makes them the perfect objects to "refer to an entry".
EDIT: As per the same comment, I would have to recommend using pointers to the elements if dealing with a hashed container (unordered containers), because iterators are not guaranteed (by the standard) to remain valid.
All of the associative containers in the C++ standard library are node based, i.e., their elements stay put. However, whether the hash is computed on the object after copying it or on a temporary object passed to the container isn't specified. I would guess that the hash is generally computed before the object is copied/moved.
To distinguish elements with the same hash you need to have an equality function anyway: if the location of the object caused it to be different, it would mean that all objects are different and you wouldn't be able to look them up at all. You need to have an equality function for the elements in an unordered container which defines equivalence of keys. For the ordered associative containers the equivalence class is based on the strict weak ordering, i.e., on an expression like this (using < rather than a binary predicate for readability; any binary predicate defining a strict weak order would work, too):
bool equivalent = !(a < b) && !(b < a);
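The expression above generalizes to any comparison predicate. A small sketch, assuming a made-up ordering by_length (not from the original answer) to show that two unequal objects can still be equivalent:

```cpp
#include <string>

// Hypothetical strict weak ordering: shorter strings come first.
bool by_length(const std::string& a, const std::string& b) {
    return a.size() < b.size();
}

// Two values are equivalent under 'comp' when neither orders before the other.
template <class T, class Compare>
bool equivalent(const T& a, const T& b, Compare comp) {
    return !comp(a, b) && !comp(b, a);
}
```

Under by_length, "ab" and "cd" are equivalent even though they are not equal, so a std::set ordered by this predicate would keep only one of them.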
To get the pointer to the data in a vector we can use
vector<double> Vec;
double* Array_Pointer = &(Vec[0]);
Function(Array_Pointer);
Is that possible to get the pointer to the data in a set? Can I use that as array pointer like above?
If not possible, what is the best way to make a vector out of set? I mean without loop over all elements.
No, this is not necessarily possible. The C++ ISO standard explicitly guarantees contiguous storage of elements in a std::vector, so you can safely take the address of the first element and then use that pointer as if you were pointing at a raw array. Other containers in the standard library do not have this guarantee.
The reason for this is that, to efficiently support most operations on a std::set, the implementation needs to use complex data structures like balanced binary search trees to store and organize the data. These structures are inherently nonlinear and require nodes to be allocated and linked together. Efficiently getting this to work with the elements in a flat array would be difficult, if not impossible, within the time constraints laid out by the standard (O(log n) for most operations).
EDIT: In response to your question - there is no way to build a std::vector from a std::set without some code somewhere iterating over the set and copying the elements over. You can do this without explicitly using any loops yourself by using the std::vector range constructor:
std::vector<T> vec(mySet.begin(), mySet.end());
Hope this helps!
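Putting the two pieces together, a sketch of copying a set into a vector and handing the vector's buffer to a C-style function. sum_array is a hypothetical stand-in for the question's Function, and sum_set is likewise a made-up name.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a C-style API that expects a raw array.
double sum_array(const double* data, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Copies the set into contiguous storage, then passes that storage on.
double sum_set(const std::set<double>& s) {
    std::vector<double> v(s.begin(), s.end());  // range constructor does the copy
    return sum_array(v.data(), v.size());       // data() is fine even when empty
}
```

Using data() instead of &v[0] avoids indexing into an empty vector.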
No. It's not possible to implement set in such a way that you can do this.
If you implement set in such a way that elements are stored in a single array, then when you add more elements, that array will inevitably need to be reallocated at some point. At that time, any references to existing elements will be invalidated.
One of features of set is that it guarantees that references to elements will never be invalidated if you add (or remove) other elements. As stated in [associative.reqmts]:
The insert and emplace members shall not affect the validity of iterators and references to the container, and the erase members shall invalidate only iterators and references to the erased elements.
So it's impossible to implement set in such a way that all of the elements of the set are stored in a single array.
Note that this has nothing to do with the efficiency requirements such as O(log n) insert/delete/lookup (if you squint really hard and allow for amortized O(log n) insertion time, at least), or maintaining sorted order, or anything like that. If it was just these, they could easily be handled with a data structure on top of the underlying elements, and the elements themselves could be stored in an array. It also doesn't even have anything to do with guarantees about iterator invalidation, since iterators are abstract.
No, the only thing holding you back is the reference invalidation requirement.
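The reference-stability requirement can be observed directly. In this sketch (the function name is hypothetical), a reference obtained from the first insertion is still usable after a thousand further insertions:

```cpp
#include <set>

// Returns true if a reference to an element stays valid while other
// elements are inserted into the set.
bool set_reference_survives_inserts() {
    std::set<int> s;
    const int& first = *s.insert(42).first;  // insert returns {iterator, bool}
    const int* address = &first;

    for (int i = 0; i < 1000; ++i) s.insert(i);  // inserting 42 again is a no-op

    // The element has not moved: same address, same value.
    return &*s.find(42) == address && first == 42;
}
```

An array-backed implementation could not promise this, because growing the array would relocate the elements.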
When talking about the STL, I have several schoolmates telling me that "vectors are linked lists".
I have another one arguing that if you call the erase() method with an iterator, it breaks the vector, since it's a linked list.
They also tend not to understand why I keep arguing that vectors are contiguous, just like any other array, and don't seem to understand what random access means. Are vectors strictly contiguous just like regular arrays, or just at most contiguous? (For example, would it allocate several contiguous segments if the whole array doesn't fit?)
I'm sorry to say that your schoolmates are completely wrong. If your schoolmates can honestly say that "vectors are linked lists" then you need to respectfully tell them that they need to pick up a good C++ book (or any decent computer science book) and read it. Or perhaps even the Wikipedia articles for vectors and lists. (Also see the articles for dynamic arrays and linked lists.)
Vectors (as in std::vector) are not linked lists. (Note that std::vector does not derive from std::list.) While they both can store a collection of data, how a vector does it is completely different from how a linked list does it. Therefore, they have different performance characteristics in different situations.
For example, insertion is a constant-time operation on linked lists, while it is a linear-time operation on vectors if the element is inserted anywhere other than the end. (However, it is amortized constant-time if you insert at the end of a vector.)
The std::vector class in C++ is required to be contiguous by the C++ standard:
23.2.4/1 Class template vector
A vector is a kind of sequence that supports random access iterators. In addition, it supports (amortized) constant time insert and erase operations at the end; insert and erase in the middle take linear time. Storage management is handled automatically, though hints can be given to improve efficiency. The elements of a vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().
Compare that to std::list:
23.2.2/1 Class template list
A list is a kind of sequence that supports bidirectional iterators and allows constant time insert and erase operations anywhere within the sequence, with storage management handled automatically. Unlike vectors (23.2.4) and deques (23.2.1), fast random access to list elements is not supported, but many algorithms only need sequential access anyway.
Clearly, the C++ standard stipulates that a vector and a list are two different containers that do things differently.
You can't "break" a vector (at least not intentionally) by simply calling erase() with a valid iterator. That would make std::vectors rather useless since the point of its existence is to manage memory for you!
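Both points can be checked in a few lines. This is only a sketch; is_contiguous and erase_at are hypothetical helper names. The first tests the identity quoted from the standard, the second shows that erasing through a valid iterator leaves a perfectly usable vector.

```cpp
#include <cstddef>
#include <vector>

// Checks the identity &v[n] == &v[0] + n for every valid index.
bool is_contiguous(const std::vector<int>& v) {
    for (std::size_t n = 0; n < v.size(); ++n) {
        if (&v[n] != &v[0] + n) return false;
    }
    return true;
}

// Erases the element at 'index' (which must be < v.size()) and returns the
// resulting vector; later elements are shifted down by one position.
std::vector<int> erase_at(std::vector<int> v, std::size_t index) {
    v.erase(v.begin() + static_cast<std::ptrdiff_t>(index));
    return v;
}
```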
vector will hold all of its storage in a single place. A vector is not even remotely like a linked list. In fact, if I had to pick two data structures that were most unlike each other, it would be vector and list. "At most contiguous" is how a deque operates.
Vector:
Guaranteed contiguous storage for all elements - will copy or move elements.
O(1) access time.
O(n) for insert or remove, except at the end (amortized O(1)).
Iterators at or after the point of insertion or removal are invalidated (all of them if an insertion causes reallocation).
List:
No contiguous storage at all - never copies or moves elements.
O(n) access time, plus all the nasty cache misses you're gonna get.
O(1) insert or remove.
Iterators valid as long as that specific element is not removed.
As you can see, they behave differently in every data structure use case.
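A short sketch of the iterator-validity difference listed above; both function names are made up for illustration. With a list, an iterator keeps referring to its element while other elements come and go. With a vector, a safe alternative across growth is to remember an index rather than an iterator.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// A list iterator stays valid while *other* elements are added or erased.
int read_through_old_list_iterator() {
    std::list<int> values{1, 2, 3};
    auto it = std::next(values.begin());  // refers to the element 2
    values.push_front(0);
    values.push_back(4);
    values.erase(values.begin());         // erases 0, a different element
    return *it;                           // still refers to 2
}

// A vector may reallocate when it grows, so keep an index, not an iterator.
int read_vector_by_index_after_growth() {
    std::vector<int> values{1, 2, 3};
    std::size_t index = 1;                               // position of 2
    for (int i = 0; i < 1000; ++i) values.push_back(i);  // may reallocate
    return values[index];                                // index is still valid
}
```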
By definition, vectors are contiguous blocks of memory like C arrays. See: http://en.wikipedia.org/wiki/Vector_(C%2B%2B)
Vectors allow random access; that is,
an element of a vector may be
referenced in the same manner as
elements of arrays (by array indices).
Linked-lists and sets, on the other
hand, do not support random access or
pointer arithmetic.
Vectors are not linked lists; they provide random access and are contiguous just like arrays. In order to achieve this they re-allocate memory under the hood.
List is designed to allow quick insertions and deletions, while not invalidating any references or iterators except the ones to the deleted element.
What are containers/adapters? I have basic knowledge of C++ and its sub-topics like (class/templates/STL).
Can anyone please explain in layman's language and give me a practical example of the application of containers/adapters?
A container is a specific data structure that contains data, usually in an unbounded amount. Each container type has limitations on how to access, add, or remove data efficiently.
Below are a few examples of containers using STL classes.
Sequence Containers
Here are the sequence containers, meaning the data is reliably ordered (that is, there is a front and a back to them. I do NOT mean that they automatically sort themselves!).
A vector is a bit like a flexibly-sized array. Vectors are random-access, meaning you can access any element with integer index in constant time (just like an array). You can add or remove from the back of the vector in amortized constant time as well. Anywhere else, though, and you're probably looking at having to recopy potentially all of the elements.
A deque, or double-ended queue, is like a vector but you can add to the front or the back in amortized constant time. You can still access elements in constant time, but deque elements are not guaranteed to be contiguous in memory like vectors or arrays.
A list is a linked list, meaning data which are linked together by pointers. You have constant-time access to the beginning and the end, but in order to get anywhere in the middle you need to iterate through the list. You can add elements anywhere in the list in constant time, though, if you already have a pointer to one of the nearby nodes.
Associative Containers
These are associative containers, meaning that elements are no longer kept in insertion order; instead they are organized by key (sorted by key, for the containers below), and the key is used for determining uniqueness or mappings:
A set is a container with unique elements. You can only add one of each element to a set; any other additions are ignored.
A multiset is like a set, but you can put more than one of an element in. The multiset keeps track of how many of each kind of element are in the structure.
A map, also known as an associative array, is a structure in which you insert key-value pairs; then you can look up any value by supplying the key. So it's a bit like an array that you can access with a string index (key), or any other kind of index. (If the key already exists, insert() leaves the existing value alone; assigning through operator[] overwrites it.)
A multimap is a map that allows for insertion of multiple values for the same key. When you do a key lookup with equal_range(), you get back a pair of iterators delimiting all the entries with that key.
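A few of the behaviours in this list, sketched with made-up helper names. In particular, std::map::insert leaves an existing key untouched, whereas assignment through operator[] overwrites the value.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// A set ignores repeated insertions of the same value.
std::size_t unique_count(const std::vector<int>& values) {
    std::set<int> s(values.begin(), values.end());
    return s.size();
}

// insert() does not overwrite the value of a key that is already present.
int insert_twice_then_read() {
    std::map<std::string, int> ages;
    ages.insert({"fred", 40});
    ages.insert({"fred", 41});  // key exists: no effect
    return ages.at("fred");
}

// operator[] followed by assignment does overwrite.
int assign_twice_then_read() {
    std::map<std::string, int> ages;
    ages["fred"] = 40;
    ages["fred"] = 41;
    return ages.at("fred");
}
```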
Container Adapters
Container adapters, on the other hand, are interfaces created by limiting functionality in a pre-existing container and providing a different set of functionality. When you declare the container adapters, you have an option of specifying which sequence containers form the underlying container. These are:
A stack is a container providing Last-In, First-Out (LIFO) access. Basically, you remove elements in the reverse order you insert them. It's difficult to get to any elements in the middle. Usually this goes on top of a deque.
A queue is a container providing First-In, First-Out (FIFO) access. You remove elements in the same order you insert them. It's difficult to get to any elements in the middle. Usually this goes on top of a deque.
A priority_queue is a container providing priority-order access to elements. You can insert elements in any order, and then retrieve the highest-priority one at any time (by default the largest value; supply a different comparison such as std::greater to retrieve the smallest instead). Priority queues in C++ STL use a heap structure internally, which in turn is basically array-backed; thus, usually this goes on top of a vector.
See this reference page for more information, including time complexity for each of the operations and links to detailed pages for each of the container types.
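To make the three adapters concrete, here is a sketch that pushes the same input into each and records the order in which elements come back out. The drain_* names are hypothetical. The stack is declared on top of a std::vector to show how the underlying container can be chosen; the other two use their defaults.

```cpp
#include <queue>
#include <stack>
#include <vector>

// LIFO: elements come out in reverse insertion order.
std::vector<int> drain_stack(const std::vector<int>& input) {
    std::stack<int, std::vector<int>> s;  // explicit underlying container
    for (int x : input) s.push(x);
    std::vector<int> out;
    while (!s.empty()) { out.push_back(s.top()); s.pop(); }
    return out;
}

// FIFO: elements come out in insertion order.
std::vector<int> drain_queue(const std::vector<int>& input) {
    std::queue<int> q;  // std::deque underneath by default
    for (int x : input) q.push(x);
    std::vector<int> out;
    while (!q.empty()) { out.push_back(q.front()); q.pop(); }
    return out;
}

// Priority order: with the default comparison the largest comes out first.
std::vector<int> drain_priority_queue(const std::vector<int>& input) {
    std::priority_queue<int> pq;  // std::vector underneath by default
    for (int x : input) pq.push(x);
    std::vector<int> out;
    while (!pq.empty()) { out.push_back(pq.top()); pq.pop(); }
    return out;
}
```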
<joke>C++ is technical and hard to understand :-D</joke>
Containers are data types from STL that can contain data.
Example: vector as a dynamic array
Adapters are data types from STL that adapt a container to provide specific interface.
Example: stack providing stack interface on top of the chosen container
(side note: both are actually templates not data types, but the definition looks better this way)
The technical definition of "container" from The SGI STL documentation is pretty good:
A Container is an object that stores other objects (its elements), and that has methods for accessing its elements. In particular, every type that is a model of Container has an associated iterator type that can be used to iterate through the Container's elements.
So, a container is a data structure that holds ("contains") a collection of objects of some type. The key idea is that there are different types of containers, each of which stores objects in a different way and provides different performance characteristics, but all of them have a standard interface so that you can swap one out for another easily and without modifying too much of the code that uses the container. The idea is that the containers are designed to be interchangeable as much as possible.
The container adapters are classes that provide a subset of a container's functionality but may provide additional functionality that makes it easier to use containers for certain scenarios. For example, you could easily use std::vector or std::deque for a stack data structure and call push_back, back, and pop_back as the stack interface; std::stack provides an interface that can use a std::vector or std::deque or other sequence container but provides the more standard push, top, and pop member functions for accessing members.
Can some one please tell me what is the difference between vector vs deque. I know the implementation of vector in C++ but not deque. Also interfaces of map and set seem similar to me. What is the difference between the two and when to use one.
std::vector: A dynamic-array class. The internal memory allocation makes sure that it always creates an array. Useful when the size of the data is known and is known to not change too often. It is also good when you want to have random-access to elements.
std::deque: A double-ended queue that can act as a stack or queue. Good for when you are not sure about the number of elements and when elements are mostly added or removed at the two ends (random access by index is also supported in constant time). It is fast when elements are added/removed at the front and the end, but not when they're added/removed in the middle.
std::list: A doubly-linked list that can be used to create a 'list' of data. The advantage of a list is that elements can be inserted or deleted from any part of the list without affecting an iterator that is pointing to a list member (and is still a member of the list after deletion). Useful when you know that elements will be deleted very often from any part of the list.
std::map: A dictionary that maps a 'key' to a 'value'. Useful for applications like 'arrays' whose index is not an integer. Basically can be used to create a map-list of name to an element, like a map that stores name-to-widget relationship.
std::set: A list of 'unique' data values. For example, if you insert 1, 2, 2, 1, 3, the list will only have the elements 1, 2, 3. Note that the elements in this list are always ordered. Internally, they're usually implemented as binary search trees (like map).
See here for full details:
What are the complexity guarantees of the standard containers?
vector Vs deque
A deque offers much the same interface as a vector but with the following addition:
It is a "front insertion sequence"
This means that deque supports the same sequence operations as a vector (though its storage is not contiguous) and provides the following additional guarantees:
push_front() O(1)
pop_front() O(1)
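A small sketch of the front-insertion guarantee; prepend_all is a made-up name. Each push_front is constant time on a deque, whereas the closest equivalent on a vector, insert(begin(), x), has to shift every existing element.

```cpp
#include <deque>
#include <vector>

// Pushes every input element onto the front of a deque, so the result
// holds the input in reverse order.
std::deque<int> prepend_all(const std::vector<int>& input) {
    std::deque<int> d;
    for (int x : input) d.push_front(x);  // O(1) per call
    return d;
}
```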
set Vs map
A map is a "Pair Associative Container" while set is a "Simple Associative Container"
This means they work in essentially the same way. The difference is that map holds pairs of items (Key/Value) rather than just a value.
std::vector
Your default sequential container should be a std::vector. Generally, std::vector will provide you with the right balance of performance and flexibility. The std::vector container is similar to a C-style array that can grow or shrink during runtime. The underlying buffer is stored contiguously and is guaranteed to be compatible with C-style arrays.
Consider using a std::vector if:
You need your data to be stored contiguously in memory
Especially useful for C-style API compatibility
You do not know the size at compile time
You need efficient random access to your elements (O(1))
You will be adding and removing elements from the end
You want to iterate over the elements in any order
Avoid using a std::vector if:
You will frequently add or remove elements to the front or middle of the sequence
The size of your buffer is constant and known in advance (prefer std::array)
Be aware of the specialization std::vector<bool>: Since C++98, std::vector<bool> has been specialized such that each element may occupy only one bit. When accessing individual boolean elements, operator[] returns a proxy object that stands in for that bit rather than a real bool&; converting it to bool yields a copy of the bit's value.
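The practical consequence of the std::vector<bool> specialization shows up with auto. In this sketch (function names are hypothetical), auto deduces the proxy type, so writing to it changes the vector, while an explicit bool is an independent copy.

```cpp
#include <type_traits>
#include <vector>

// The element reference type of std::vector<bool> is a proxy class, not bool&.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool>::reference is a proxy type");

// 'auto' captures the proxy, so assignment writes through to the vector.
bool write_through_proxy() {
    std::vector<bool> flags{false, false};
    auto ref = flags[0];  // proxy object standing in for the first bit
    ref = true;
    return flags[0];      // the stored bit was changed
}

// An explicit 'bool' copies the value, so the vector is left untouched.
bool write_to_copy() {
    std::vector<bool> flags{false, false};
    bool copy = flags[0];
    copy = true;
    (void)copy;           // silence unused-value warnings
    return flags[0];      // the stored bit is unchanged
}
```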
std::array
The std::array container is the most like a built-in array, but offers extra features such as bounds-checked access through at(), a size() member, and value semantics (it can be copied, assigned, and returned from functions). Unlike std::vector, the size of std::array is fixed at compile time and cannot change during runtime.
Consider using a std::array if:
You need your data to be stored contiguously in memory
Especially useful for C-style API compatibility
The size of your array is known in advance
You need efficient random access to your elements (O(1))
You want to iterate over the elements in any order
Avoid using a std::array if:
You need to insert or remove elements
You don’t know the size of your array at compile time
You need to be able to resize your array dynamically
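Two of the std::array properties mentioned in this section, sketched with hypothetical function names: at() reports an out-of-range index by throwing, and data() exposes the contiguous storage for code that expects a raw pointer.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// at() performs bounds checking and throws std::out_of_range on a bad index.
bool at_rejects_out_of_range() {
    std::array<int, 3> a{{1, 2, 3}};
    try {
        (void)a.at(3);  // valid indices are 0, 1, 2
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}

// data() returns a pointer to contiguous storage, usable like a C array.
int sum_via_pointer(const std::array<int, 3>& a) {
    const int* p = a.data();
    int total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) total += p[i];
    return total;
}
```

Note that operator[] does no such checking; only at() does.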
std::deque
The std::deque container gets its name from a shortening of “double ended queue”. The std::deque container is most efficient when appending items to the front or back of a queue. Unlike std::vector, std::deque does not provide a mechanism to reserve a buffer. The underlying buffer is also not guaranteed to be compatible with C-style array APIs.
Consider using std::deque if:
You need to insert new elements at both the front and back of a sequence (e.g. in a scheduler)
You need efficient random access to your elements (O(1))
You want the internal buffer to automatically shrink when elements are removed
You want to iterate over the elements in any order
Avoid using std::deque if:
You need to maintain compatibility with C-style APIs
You need to reserve memory ahead of time
You need to frequently insert or remove elements from the middle of the sequence
Calling insert in the middle of a std::deque invalidates all iterators and references to its elements
std::list
The std::list and std::forward_list containers implement linked list data structures. Where std::list provides a doubly-linked list, the std::forward_list only contains a pointer to the next object. Unlike the other sequential containers, the list types do not provide efficient random access to elements. Each element must be traversed in order.
Consider using std::list if:
You need to store many items but the number is unknown
You need to insert or remove new elements from any position in the sequence
You do not need efficient access to random elements
You want the ability to move elements or sets of elements within the container or between different containers
You want to implement a node-wise memory allocation scheme
Avoid using std::list if:
You need to maintain compatibility with C-style APIs
You need efficient access to random elements
Your system utilizes a cache (prefer std::vector for reduced cache misses)
The size of your data is known in advance and can be managed by a std::vector
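The ability to move elements between lists refers to splice, which relinks nodes instead of copying them. A minimal sketch with made-up function names:

```cpp
#include <iterator>
#include <list>

// Moves every node of 'src' to the end of 'dst' without copying elements.
std::list<int> move_all_to_back(std::list<int> dst, std::list<int> src) {
    dst.splice(dst.end(), src);  // src is left empty
    return dst;
}

// The spliced elements keep their addresses, since only links are changed.
bool splice_keeps_addresses() {
    std::list<int> a{1, 2};
    std::list<int> b{3, 4};
    const int* address = &b.front();
    a.splice(a.end(), b);
    return b.empty() && a.size() == 4 &&
           &*std::next(a.begin(), 2) == address;
}
```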
A map is what is often referred to as an associative array, usually implemented using a binary tree (for example). A deque is a double-ended queue, typically implemented as a sequence of fixed-size arrays rather than as a linked list.
That is not to say that the actual implementations of the container library use these concepts - the container library will just give you some guarantees about how you can access the container and at what (amortized) cost.
I suggest you take a look at a reference that will go into detail about what those guarantees are. Scott Meyers' book "Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library" should talk a bit about those, if I remember correctly. Apart from that, the C++ standard is obviously a good choice.
What I really want to say is: containers really are described by their properties, not by the underlying implementation.
set: holds unique values. Put 'a' in twice, the set has one 'a'.
map: maps keys to values, e.g. 'name' => 'fred', 'age' => 40. You can look up 'name' and you'll get 'fred' out.
deque: like a vector, but with fast add/remove at both ends. Inserts into the middle are possible but take linear time. http://en.wikipedia.org/wiki/Deque
edit: my dequeue description is lacking, see comments below for corrections