How are STL List and Vector implemented? - c++

How are STL List and Vector implemented?
I was just asked this in an interview.
For vector, I just said maybe a binary tree or a hash table; I wasn't sure about list...
Am I wrong? I guess so...
Please give me some ideas, thanks.

Hash table or binary tree? Why?
std::vector, as the name itself suggests, is implemented with a normal dynamically-allocated array that is reallocated when its capacity is exhausted (typically growing by a constant factor, such as 1.5 or 2).
std::list instead is (usually [1]) implemented with a doubly-linked list.
The binary tree you mentioned is the usual implementation of std::map; the hash table instead is generally used for the unordered_map container (available in the upcoming C++0x standard, later published as C++11).
[1] "Usually" because the standard does not mandate a particular implementation, but specifies the asymptotic complexity of its methods, and such constraints are met easily with a doubly-linked list.
On the other hand, for std::vector the "contiguous space" requirement is enforced by the standard (from C++03 onwards), so it must be some form of dynamically allocated array.
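Both properties can be observed through the public interface alone. The sketch below (the helper names is_contiguous and count_reallocations are illustrative) checks that element i lives at data() + i, and counts how often push_back changes the capacity. Because the exact growth factor is implementation-specific, it only shows that reallocations are rare, not what the capacities will be.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if every element of v sits at v.data() + i, i.e. the
// storage is one contiguous array (guaranteed since C++03).
bool is_contiguous(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (&v[i] != v.data() + i) return false;
    }
    return true;
}

// Pushes n elements and counts how many times the capacity changed.
// Capacity grows geometrically, so the count is far smaller than n.
std::size_t count_reallocations(std::size_t n) {
    std::vector<int> v;
    std::size_t changes = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++changes;
            last_capacity = v.capacity();
        }
    }
    return changes;
}
```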

std::vector uses a contiguously allocated array and placement new.
std::list uses dynamically allocated nodes, each with pointers to the next and previous element.
Nothing as fancy as binary trees (which can be used for std::map) or hash tables (which can be used for std::unordered_map).
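To make "contiguous array plus placement new" concrete, here is a heavily simplified sketch. TinyVector is a hypothetical name; a real std::vector also goes through an allocator and provides exception guarantees, copy and move support, and much more, so treat this only as an illustration of the storage scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical, simplified vector: raw storage comes from operator new,
// elements are constructed in place with placement new, and the capacity
// doubles when it runs out. No allocator support or exception safety.
template <typename T>
class TinyVector {
public:
    TinyVector() = default;
    TinyVector(const TinyVector&) = delete;
    TinyVector& operator=(const TinyVector&) = delete;
    ~TinyVector() {
        clear();
        ::operator delete(data_);
    }

    void push_back(const T& value) {
        T copy(value);                      // value may alias one of our own elements
        if (size_ == capacity_) grow();
        new (data_ + size_) T(std::move(copy));  // placement new into raw storage
        ++size_;
    }

    void clear() {
        for (std::size_t i = 0; i < size_; ++i) data_[i].~T();  // explicit destructor calls
        size_ = 0;
    }

    T& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

private:
    // Allocates a block twice as large, moves the elements over, frees the old block.
    void grow() {
        std::size_t new_capacity = capacity_ == 0 ? 1 : capacity_ * 2;
        T* new_data = static_cast<T*>(::operator new(new_capacity * sizeof(T)));
        for (std::size_t i = 0; i < size_; ++i) {
            new (new_data + i) T(std::move(data_[i]));
            data_[i].~T();
        }
        ::operator delete(data_);
        data_ = new_data;
        capacity_ = new_capacity;
    }

    T* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
};
```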

You can spend half a semester talking about either of the containers, but here are a few points:
std::vector is a contiguous container, which means every element follows right after the previous element in memory. It can grow at runtime, which means it allocates its storage in dynamic memory.
std::list is a bidirectional linked list. This means that the elements are scattered in memory in arbitrary layout, and that each element knows where the next and previous elements in sequence are.
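A minimal sketch of that node-based layout: each element gets its own heap-allocated node holding pointers to its neighbours. Node and TinyList are hypothetical names; real implementations typically add a sentinel node, iterators, and allocator support.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node: one heap allocation per element, linked both ways.
struct Node {
    int value;
    Node* prev;
    Node* next;
};

class TinyList {
public:
    TinyList() = default;
    TinyList(const TinyList&) = delete;
    TinyList& operator=(const TinyList&) = delete;
    ~TinyList() {
        while (head_) erase(head_);
    }

    // Appends a node at the back and returns it so callers can erase it later.
    Node* push_back(int value) {
        Node* n = new Node{value, tail_, nullptr};
        if (tail_) tail_->next = n; else head_ = n;
        tail_ = n;
        ++size_;
        return n;
    }

    // Unlinks and frees one node in O(1). Other nodes are never moved,
    // so pointers to them stay valid.
    void erase(Node* n) {
        if (n->prev) n->prev->next = n->next; else head_ = n->next;
        if (n->next) n->next->prev = n->prev; else tail_ = n->prev;
        delete n;
        --size_;
    }

    Node* front() const { return head_; }
    std::size_t size() const { return size_; }

private:
    Node* head_ = nullptr;
    Node* tail_ = nullptr;
    std::size_t size_ = 0;
};
```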
std::vector, std::list and the other containers own the elements they hold and clean up after themselves, but they don't take ownership of anything those elements point to. So, if the elements are pointers to dynamic memory, the user must free the pointed-to objects before the container is destroyed. But if the container holds objects by value, their destructors are called automatically when the container is cleaned up.
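A sketch of these ownership rules, assuming a hypothetical Tracked type that counts its live instances: elements stored by value are destroyed by the container, raw pointers must be deleted by hand, and std::unique_ptr elements (an alternative that needs C++14 for std::make_unique) clean up automatically.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical type that counts live instances so we can see destructors run.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Elements stored by value: the vector destroys them itself.
void by_value() {
    std::vector<Tracked> v(3);
}  // all three destructors run here

// Raw pointers: the vector destroys only the pointers, so the pointees
// must be deleted manually before the vector goes away.
void by_raw_pointer() {
    std::vector<Tracked*> v;
    for (int i = 0; i < 3; ++i) v.push_back(new Tracked);
    for (Tracked* p : v) delete p;
}

// unique_ptr elements: destroying the vector destroys each unique_ptr,
// which deletes its object, so no manual cleanup is needed.
void by_unique_ptr() {
    std::vector<std::unique_ptr<Tracked>> v;
    for (int i = 0; i < 3; ++i) v.push_back(std::make_unique<Tracked>());
}
```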
So far, very simple and roughly equivalent to any other language or toolset. What's unique about the STL is that the containers are generic and decoupled from the means of iterating over them and (for the most part) from the operations you can perform over them. Some operations can be done particularly efficiently with some containers, so the containers will provide member functions in these cases. For example, std::list has a sort() member function.
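For example, std::sort requires random-access iterators, so it works on a std::vector but not on a std::list; std::list offers sort() as a member instead, which can relink nodes rather than move values. The wrapper names sorted_vector and sorted_list are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// std::sort needs random-access iterators, which std::vector provides.
std::vector<int> sorted_vector(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}

// std::list only has bidirectional iterators, so std::sort cannot be used;
// the member function list::sort is used instead.
std::list<int> sorted_list(std::list<int> l) {
    l.sort();
    return l;
}
```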
The STL doesn't provide container classes (for the most part), but rather container templates. In other words, when the library talks about a container it only refers to the data type anonymously, say, as T, never by its true name. Never int or double or Car; always T, for any type. There are exceptions, like std::vector<bool>, but this is the general case. Then, when the user instantiates a container template, they specify a type, and the compiler creates a container class from the template for that type.
The STL also offers algorithms as free template functions. These algorithms work on iterators, themselves templates. Often iterators come in pairs that denote the beginning and end of a sequence, on which the algorithm operates. std::vector, std::list and other containers then expose their own iterators that can traverse and manipulate their data. So the same free algorithm can work on a std::vector and a std::list and other containers, provided the iterators conform with specific assumptions about the iterators' abilities.
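A small illustration of that decoupling: the same free algorithms, std::accumulate and std::find, are instantiated at compile time for the iterator types of both containers. The helper templates total and contains are hypothetical wrappers, not standard functions.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <numeric>
#include <vector>

// Works for any container whose begin()/end() yield at least input iterators.
template <typename Container>
long total(const Container& c) {
    return std::accumulate(c.begin(), c.end(), 0L);
}

// Same idea with std::find: one algorithm, many iterator types.
template <typename Container>
bool contains(const Container& c, int value) {
    return std::find(c.begin(), c.end(), value) != c.end();
}
```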
All this abstraction is done at compile-time, and that is the biggest difference when compared to other languages. This translates to outstanding performance with relatively short and concise code. In C, you would typically only get the same performance with lots of copy-pasting or hardcoding.

Related

Why does std::queue use std::deque as underlying default container?

As read on cplusplus.com, std::queue is implemented as follows:
queues are implemented as containers adaptors, which are classes that use an encapsulated object of a specific container class as its underlying container, providing a specific set of member functions to access its elements. Elements are pushed into the "back" of the specific container and popped from its "front".
The underlying container may be one of the standard container class template or some other specifically designed container class. This underlying container shall support at least the following operations:
......
The standard container classes deque and list fulfill these requirements. By default, if no container class is specified for a particular queue class instantiation, the standard container deque is used.
I am confused as to why deque (a double-ended-queue on steroids) is used as a default here, instead of list (which is a doubly-linked list).
It seems to me that std::deque is very much overkill: It is a double-ended queue, but also has constant-time element access and many other features; being basically a full-featured std::vector bar the 'elements are stored contiguously in memory' guarantee.
As a normal std::queue only has very few possible operations, it seems to me that a doubly-linked list should be much more efficient, as there is a lot less plumbing that needs to happen internally.
Why then is std::queue implemented using std::deque as default, instead of std::list?
Stop thinking of list as "This is awkward to use, and lacks a bunch of useful features, so it must be the best choice when I don't need those features".
list is implemented as a doubly-linked list with a cached count. There is a narrow set of situations where it is optimal: when you need really, really strong reference/pointer/iterator stability, or when you erase and insert in the middle of a container orders of magnitude more often than you iterate to the middle of it.
And that is about it.
The std datatypes were generally implemented, then their performance and other characteristics analyzed, then the standard was written saying "you gotta guarantee these requirements". A little bit of wiggle room was left.
So when they wrote queue, someone probably profiled how list and deque performed and discovered how much faster deque was, so used deque by default.
In practice, someone could ship a deque with horrible performance (for example, MSVC has a tiny block size), but making it worse than what is required for a std::list would be tricky. list basically mandates one-node-per-element, and that makes memory caches cry.
The reason is that deque is orders of magnitude faster than list. List allocates each element separately, while deque allocates large chunks of elements.
The advantage of list is that it is possible to delete elements in the middle, but a queue does not require this feature.
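A small sketch of the adaptor idea the question quotes: std::queue<int> uses std::deque<int> unless told otherwise, and std::list<int> can be plugged in instead because it provides the operations std::queue needs. The aliases DequeQueue and ListQueue and the helper front_after are illustrative names, not part of the standard library.

```cpp
#include <cassert>
#include <deque>
#include <list>
#include <queue>
#include <type_traits>

// Default: std::queue<int> is std::queue<int, std::deque<int>>.
using DequeQueue = std::queue<int>;
static_assert(std::is_same<DequeQueue::container_type, std::deque<int>>::value,
              "deque is the default underlying container");

// Any container with front(), back(), push_back() and pop_front() works.
using ListQueue = std::queue<int, std::list<int>>;

// Pushes 1..n, pops k of them (assumes k < n), and returns the new front.
template <typename Queue>
int front_after(int n, int k) {
    Queue q;
    for (int i = 1; i <= n; ++i) q.push(i);
    for (int i = 0; i < k; ++i) q.pop();
    return q.front();
}
```

Both aliases expose the same queue interface; only the storage underneath differs, which is exactly the performance trade-off the answers above discuss.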

What are the qualifications for a class in C++ to become a container?

I'm new to C++ programming and came across the term containers with examples such as vector, deque, map, etc.
What should be the minimum requirements that a class should fulfill to be called a container in C++?
I will start with the concept Range.
Range has only two methods -- begin and end. They both return iterators of the same type (note: there are proposals to permit end to return a Sentinel instead).
Iterators are presumed to be understood by the reader.
A high quality Range can also expose empty, size, front, back, and operator [] (if random access especially).
For a for(:) loop, you can qualify as a Range by being a raw C array, by having begin() and end() member functions, or by having free begin and end functions in the same namespace as your type that take your type as their argument (found through argument-dependent lookup) and return iterator-like things. As of this post, the only thing in the standard that consumes Ranges is for(:) loops. One could argue that this answer is the only practical definition of the concept Range in C++.
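A minimal sketch of a Range in this sense, assuming a hypothetical IntRange type that produces the integers in [lo, hi) without owning any storage. It only needs begin() and end() (plus an iterator supporting *, ++ and !=) to work in a for(:) loop; because it owns no elements, it is a Range but not a Container.

```cpp
#include <cassert>

// Hypothetical range over the integers [lo, hi); assumes lo <= hi.
class IntRange {
public:
    class iterator {
    public:
        explicit iterator(int v) : v_(v) {}
        int operator*() const { return v_; }
        iterator& operator++() { ++v_; return *this; }
        bool operator!=(const iterator& other) const { return v_ != other.v_; }
    private:
        int v_;
    };

    IntRange(int lo, int hi) : lo_(lo), hi_(hi) {}
    iterator begin() const { return iterator(lo_); }
    iterator end() const { return iterator(hi_); }

private:
    int lo_, hi_;
};

// Sums anything usable in for(:): IntRange, C arrays, standard containers.
template <typename R>
int sum(const R& r) {
    int total = 0;
    for (int x : r) total += x;
    return total;
}
```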
Next, Container.
A Container is a Range of at least forward iterators (input and output Ranges are usually not called Containers) that owns its elements. Sequential and Associative containers are different beasts, and both are defined in the standard.
Containers in the standard have a set of typedefs: value_type, iterator, const_iterator. Most also have allocator_type (except array). They have empty(), and most have size() (except forward_list).
Containers (other than array) can all be constructed from a pair of input or forward iterators to a compatible value type, and from an initializer list.
Sequential containers have push_back and emplace_back (except forward_list), and some have push_front and emplace_front; they also have insert and emplace at an iterator (insert_after and emplace_after for forward_list).
Associative containers have a key_type. Many of them are containers of pairs. The data stored is usually partially const (the "key" part of the data, be it the key or the entire element in the case of sets). They have insert and emplace both with and without hints (emplace_hint), since they manage their own order. They also have find and count member functions.
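The associative-container points above can be sketched with std::map; make_ages is an illustrative helper. The static_assert shows the partially const value_type, and the function uses plain, emplaced and hinted insertion before find and count are exercised.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <utility>

// The key half of a map's value_type is const: the container decides
// where each element lives, so users may not change keys in place.
static_assert(std::is_same<std::map<std::string, int>::value_type,
                           std::pair<const std::string, int>>::value,
              "map stores pair<const Key, T>");

std::map<std::string, int> make_ages() {
    std::map<std::string, int> ages;
    ages.insert({"fred", 40});
    ages.emplace("alice", 31);
    ages.insert(ages.end(), {"zoe", 25});  // hinted insert: "zoe" sorts last
    ages.insert({"fred", 99});             // ignored: key already present
    return ages;
}
```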
There are currently no functions in the std library that depend on Container-ness. And there is an active proposal to make Container-ness and Range-ness be formalized as a concept in C++17. The actual technical definition of Container is in the standard in case you need to create an actual container exactly; however, usually you really need a Range with a means to edit it and ownership mechanics. The Container concept is, in my experience, mostly there to make specifying behaviour easier in the standard.
After something like Ranges-v3 is added, the concepts of Range and Container will be actual things that exist in code, and there may be algorithms that depend on exactly those features. Prior to that, they are ad-hoc concepts more than anything.
The absolute minimum requirement should be that the container has a constant iterator class associated with it. Although a generator would satisfy that requirement as well. So it must be that there is a constant iterator and that the said container has begin and end values of the constant iterator type.
C++ concepts: Container
A Container is an object used to store other objects and taking care of the management of the memory used by the objects it contains.
http://en.cppreference.com/w/cpp/concept/Container
A container is a holder object that stores a collection of other objects (its elements). They are implemented as class templates, which allows a great flexibility in the types supported as elements.
http://www.cplusplus.com/reference/stl/
Given these definitions, I think we can say that containers should be able to store an arbitrary number of elements (although the number can be a compile-time constant). The container owns the objects it contains, including allocating space for them in the heap, or the stack (for the array container). Therefore, the programmer does not need to new or delete (allocate or free) the memory for the objects.
The following containers can be found in the STL: array, deque, vector, set, map, stack, queue, list, unordered_map, unordered_set
The container will usually allow you to access (or index) the elements it contains, although some only allow access to one or a few elements (e.g. queue or stack). The container will provide methods to add or remove objects, or to search for an object.
Requirements:
Must hold some arbitrary number of objects
The objects it holds are an arbitrary type (although it may have to satisfy certain requirements, e.g. sortable)
Possible features
A container may be allocator-aware, but it does not have to be.
A container may hold more than one type of object (e.g. map, although map can be considered to hold pairs of objects)
While containers may be iterable, this is not required, e.g. queue or stack.
Classes which are containers
std::string: this is a collection of characters. Although it is designed for characters or wide characters, it is a SequenceContainer.
Some class which would not be considered containers:
std::unique_ptr, std::shared_ptr: while these types have a concept of ownership, they only manage 1 object, so they are not a collection of objects
std::tuple, std::pair: while a tuple can hold an arbitrary number of objects, the type of each object needs to be specified, so it doesn't have the flexibility expected of a general container. A tuple can be more accurately categorized as a type of structure.

C++ deque vs vector and C++ map vs Set

Can someone please tell me the difference between vector and deque? I know how vector is implemented in C++ but not deque. Also, the interfaces of map and set seem similar to me. What is the difference between the two, and when should I use each?
std::vector: A dynamic-array class. The internal memory allocation makes sure that it always creates an array. Useful when the size of the data is known and is known to not change too often. It is also good when you want to have random-access to elements.
std::deque: A double-ended queue that can act as a stack or a queue. Good when you are not sure about the number of elements and you mostly access the data elements sequentially. It is fast when elements are added to or removed from the front and back, but not when they are added to or removed from the middle.
std::list: A doubly-linked list that can be used to create a 'list' of data. The advantage of a list is that elements can be inserted or deleted from any part of the list without affecting an iterator that is pointing to a list member (and is still a member of the list after deletion). Useful when you know that elements will be deleted very often from any part of the list.
std::map: A dictionary that maps a 'key' to a 'value'. Useful for applications like 'arrays' whose indices are not integers. Basically, it can be used to map names to elements, like a map that stores name-to-widget relationships.
std::set: A list of 'unique' data values. For example, if you insert 1, 2, 2, 1, 3, the set will only contain the elements 1, 2, 3. Note that the elements in a set are always ordered. Internally, sets are usually implemented as binary search trees (like map).
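A quick sketch of the set and map descriptions above; the helper names unique_sorted and lookup are illustrative.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Inserting values into a std::set keeps one copy of each value,
// and iteration visits them in ascending order.
std::vector<int> unique_sorted(const std::vector<int>& input) {
    std::set<int> s(input.begin(), input.end());
    return std::vector<int>(s.begin(), s.end());
}

// A std::map used like an "array" indexed by strings; find() avoids
// inserting a default value the way operator[] would.
std::string lookup(const std::map<std::string, std::string>& m, const std::string& key) {
    auto it = m.find(key);
    return it == m.end() ? std::string("<none>") : it->second;
}
```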
See here for full details:
What are the complexity guarantees of the standard containers?
vector Vs deque
A deque is the same as a vector but with the following addition:
It is a "front insertion sequence"
This means that deque is the same as a vector but provides the following additional guarantees:
push_front() O(1)
pop_front() O(1)
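A short illustration of the extra front operations, with random access still available; build is an illustrative helper.

```cpp
#include <cassert>
#include <deque>

// Pushes at both ends; std::vector has no push_front at all.
std::deque<int> build() {
    std::deque<int> d;
    d.push_back(2);
    d.push_back(3);
    d.push_front(1);
    d.push_front(0);
    return d;  // 0, 1, 2, 3, still indexable with operator[]
}
```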
set Vs map
A map is a "Pair Associative Container" while set is a "Simple Associative Container"
This means they work the same way; the difference is that map holds pairs of items (key/value) rather than just a value.
std::vector
Your default sequential container should be std::vector. Generally, std::vector will provide you with the right balance of performance and convenience. The std::vector container is similar to a C-style array that can grow or shrink during runtime. The underlying buffer is stored contiguously and is guaranteed to be compatible with C-style arrays.
Consider using a std::vector if:
You need your data to be stored contiguously in memory
Especially useful for C-style API compatibility
You do not know the size at compile time
You need efficient random access to your elements (O(1))
You will be adding and removing elements from the end
You want to iterate over the elements in any order
Avoid using a std::vector if:
You will frequently add or remove elements to the front or middle of the sequence
The size of your buffer is constant and known in advance (prefer std::array)
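Be aware of the std::vector<bool> specialization: since C++98, std::vector<bool> has been specialized so that each element may occupy only one bit. Accessing an individual element therefore cannot return a real bool&: the non-const operators return a proxy object (std::vector<bool>::reference) that reads and writes that bit, and the const operators return a bool constructed with the value of that bit.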
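The proxy behaviour can be checked at compile time. This sketch relies on the standard's declarations that std::vector<bool>::reference is a nested proxy class and std::vector<bool>::const_reference is bool; flip_first is an illustrative helper.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Non-const element access yields a proxy class, not bool&.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool>::reference is a proxy");
// Const element access yields a plain bool by value.
static_assert(std::is_same<std::vector<bool>::const_reference, bool>::value,
              "const access returns bool");

// Writing through the proxy still updates the packed bit.
std::vector<bool> flip_first(std::vector<bool> v) {
    auto ref = v[0];  // deduced as std::vector<bool>::reference, not bool
    ref.flip();
    return v;
}
```

A practical consequence: code such as `bool& b = v[0];` does not compile for std::vector<bool>, while it does for any other std::vector<T>.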
std::array
The std::array container is the most similar to a built-in array, but offers extra features such as bounds checking (through at()) and automatic memory management. Unlike std::vector, the size of std::array is fixed at compile time and cannot change during runtime.
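A sketch of the difference between checked and unchecked access; checked_get is an illustrative helper.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// The size is part of the type and is known at compile time.
static_assert(std::array<int, 4>{}.size() == 4, "size is fixed");

// at() checks the index and throws std::out_of_range; operator[] does not check.
int checked_get(const std::array<int, 3>& a, std::size_t i, int fallback) {
    try {
        return a.at(i);
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```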
Consider using a std::array if:
You need your data to be stored contiguously in memory
Especially useful for C-style API compatibility
The size of your array is known in advance
You need efficient random access to your elements (O(1))
You want to iterate over the elements in any order
Avoid using a std::array if:
You need to insert or remove elements
You don’t know the size of your array at compile time
You need to be able to resize your array dynamically
std::deque
The std::deque container gets its name from a shortening of “double ended queue”. The std::deque container is most efficient when appending items to the front or back of a queue. Unlike std::vector, std::deque does not provide a mechanism to reserve a buffer. The underlying buffer is also not guaranteed to be compatible with C-style array APIs.
Consider using std::deque if:
You need to insert new elements at both the front and back of a sequence (e.g. in a scheduler)
You need efficient random access to your elements (O(1))
You want the internal buffer to automatically shrink when elements are removed
You want to iterate over the elements in any order
Avoid using std::deque if:
You need to maintain compatibility with C-style APIs
You need to reserve memory ahead of time
You need to frequently insert or remove elements from the middle of the sequence
Calling insert in the middle of a std::deque invalidates all iterators and references to its elements
std::list
The std::list and std::forward_list containers implement linked list data structures. Where std::list provides a doubly-linked list, the std::forward_list only contains a pointer to the next object. Unlike the other sequential containers, the list types do not provide efficient random access to elements. Each element must be traversed in order.
Consider using std::list if:
You need to store many items but the number is unknown
You need to insert or remove elements at any position in the sequence
You do not need efficient access to random elements
You want the ability to move elements or sets of elements within the container or between different containers
You want to implement a node-wise memory allocation scheme
Avoid using std::list if:
You need to maintain compatibility with C-style APIs
You need efficient access to random elements
Cache performance matters for your workload (prefer std::vector for reduced cache misses)
The size of your data is known in advance and can be managed by a std::vector
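Moving elements between lists, mentioned above, is done with splice(). The sketch below (splice_keeps_address is an illustrative helper) moves one node and checks that the element's address did not change, which is the reference stability that node-based storage provides.

```cpp
#include <cassert>
#include <list>

// Moves the first element of `from` (assumed non-empty) to the end of `to`.
// No element is copied or reallocated; only node links change.
bool splice_keeps_address(std::list<int>& to, std::list<int>& from) {
    int* before = &from.front();
    to.splice(to.end(), from, from.begin());
    return &to.back() == before;
}
```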
A map is what is often referred to as an associative array, usually implemented using a binary tree (for example). A deque is a double-ended queue, a certain incarnation of a linked list.
That is not to say that the actual implementations of the container library use these concepts: the container library will just give you some guarantees about how you can access the container and at what (amortized) cost.
I suggest you take a look at a reference that will go into detail about what those guarantees are. Scott Meyers' book "Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library" should talk a bit about those, if I remember correctly. Apart from that, the C++ standard is obviously a good choice.
What I really want to say is: containers really are described by their properties, not by the underlying implementation.
set: holds unique values. Put 'a' in twice, the set has one 'a'.
map: maps keys to values, e.g. 'name' => 'fred', 'age' => 40. You can look up 'name' and you'll get 'fred' out.
deque: like a vector but you can only add/remove at the ends. No inserts into the middle. http://en.wikipedia.org/wiki/Deque
edit: my dequeue description is lacking, see comments below for corrections

Why are heaps in c++ implemented as algorithms instead of containers?

I was wondering why the heap concept is implemented as algorithms (make_heap, pop_heap, push_heap, sort_heap) instead of a container. I am especially interested in whether someone's answer can also explain why set and map are containers instead of similar collections of algorithms (make_set, add_set, rm_set, etc.).
STL does provide a heap in the form of a std::priority_queue. The make_heap, etc., functions are there because they have uses outside the realm of the data structure itself (e.g. sorting), and to allow heaps to be built on top of custom structures (like stack arrays for a "keep the top 10" container).
By analogy, you can use a std::set to store a sorted list, or you can use std::sort on a vector with std::adjacent_find; std::sort is the more general-purpose and makes few assumptions about the underlying data structure.
(As a note, the std::priority_queue implementation does not actually provide for its own storage; by default it creates a std::vector as its backing store.)
One obvious reason is that you can arrange elements as a heap inside another container.
So you can call make_heap() on a vector or a deque or even a C array.
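A sketch of that point: the heap algorithms run on whatever random-access range you give them, whether a std::vector or a plain C array. top_k and heap_sort_array are illustrative helpers, not library functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Returns the k largest values, largest first, using a heap built in place.
std::vector<int> top_k(std::vector<int> v, std::size_t k) {
    std::make_heap(v.begin(), v.end());      // max-heap: largest at v.front()
    std::vector<int> out;
    for (std::size_t i = 0; i < k && !v.empty(); ++i) {
        std::pop_heap(v.begin(), v.end());   // moves the max to v.back()
        out.push_back(v.back());
        v.pop_back();
    }
    return out;
}

// The same algorithms on a C array: sort_heap leaves it in ascending order.
void heap_sort_array(int (&a)[5]) {
    std::make_heap(std::begin(a), std::end(a));
    std::sort_heap(std::begin(a), std::end(a));
}
```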
A heap is a specific data structure. The standard containers have complexity requirements but don't specify how they are to be implemented. It's a fine but important distinction. You can make_heap on several different containers, including one you wrote yourself. But a set or map mean more than just a way of arranging the data.
Said another way, a standard container is more than just its underlying data structure.
Heaps* are almost always implemented using an array as the underlying data structure. As such it can be considered a set of algorithms that operate on the array data structure. This is the path that the STL took when implementing the heap - it will work on any data structure that has random access iterators (a standard array, vector, deque, etc).
You'll also notice that the STL priority_queue requires a container (which by default is a vector). This is essentially your heap container - it implements a heap on your underlying data structure and provides a wrapper container for all of the typical heap operations.
*Binary heaps in particular. Other forms of heaps (Binomial, Fibonacci, etc) are not.
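To illustrate, std::priority_queue is parameterized on its underlying container and comparator, and it drives the heap algorithms internally. The aliases MaxQueue and MinQueueOnDeque and the helper drain are illustrative names.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <queue>
#include <type_traits>
#include <vector>

// Defaults: std::vector<int> storage and std::less<int>, i.e. a max-heap.
using MaxQueue = std::priority_queue<int>;
static_assert(std::is_same<MaxQueue::container_type, std::vector<int>>::value,
              "vector is the default backing store");

// Any random-access container works; std::greater gives a min-heap.
using MinQueueOnDeque = std::priority_queue<int, std::deque<int>, std::greater<int>>;

// Builds the queue from values, then pops everything in priority order.
template <typename PQ>
std::vector<int> drain(const std::vector<int>& values) {
    PQ pq(values.begin(), values.end());
    std::vector<int> out;
    while (!pq.empty()) {
        out.push_back(pq.top());
        pq.pop();
    }
    return out;
}
```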
Well, heaps aren't really a generic container in the same sense as a set or a map. Usually, you use a heap to implement some other abstract data type. (The most obvious being a priority queue.) I suspect this is the reason for the different treatment.

Indices instead of pointers in STL containers?

Due to specific requirements [*], I need a singly-linked list implementation that uses integer indices instead of pointers to link nodes. The indices are always interpreted with respect to a vector containing the list nodes.
I thought I might achieve this by defining my own allocator, but looking into gcc's implementation of <list>, they explicitly use pointers for the link fields in the list nodes (i.e., they do not use the pointer type provided by the allocator):
struct _List_node_base
{
    _List_node_base* _M_next; ///< Self-explanatory
    _List_node_base* _M_prev; ///< Self-explanatory
    ...
};
(For this purpose, the allocator interface is also deficient in that it does not define a dereference function; "dereferencing" an integer index always needs a pointer to the underlying storage.)
Do you know a library of STL-like data structures (I am mostly in need of singly- and doubly-linked lists) that use indices (wrt. a base vector) instead of pointers to link nodes?
[*] Saving space: the lists will contain many 32-bit integers. With two pointers per node (STL list is doubly-linked), the overhead is 200%, or 400% on a 64-bit platform, not counting the overhead of the default allocator.
EDIT: I'm looking for a SLL implementation that defines nodes in the following manner:
struct list_node
{
    int _value; ///< The value in the list
    int _next;  ///< Next node in the list
    ...
};
_next is interpreted wrt. an implicit array or vector (must be provided externally to each method operating on the list).
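A minimal sketch of the layout described here, assuming all lists share one externally provided std::vector pool and use -1 as the "no next node" marker. Both are assumptions, as are the names kNone, pool_push_front and to_vector, and the field names value and next. Because nodes refer to each other by index, the links stay valid even when the vector reallocates. Removal and reuse of freed slots are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node: a 32-bit value plus a 32-bit index of the next node.
struct list_node {
    std::int32_t value;
    std::int32_t next;  // index into the shared pool, or kNone
};

constexpr std::int32_t kNone = -1;

// Pushes a value onto the front of the list starting at `head`;
// returns the new head index. The pool is passed in explicitly.
std::int32_t pool_push_front(std::vector<list_node>& pool, std::int32_t head, std::int32_t value) {
    pool.push_back(list_node{value, head});
    return static_cast<std::int32_t>(pool.size() - 1);
}

// Walks a list from `head`, collecting its values in order.
std::vector<std::int32_t> to_vector(const std::vector<list_node>& pool, std::int32_t head) {
    std::vector<std::int32_t> out;
    for (std::int32_t i = head; i != kNone; i = pool[static_cast<std::size_t>(i)].next) {
        out.push_back(pool[static_cast<std::size_t>(i)].value);
    }
    return out;
}
```

Several independent lists can share the same pool, each identified only by its head index.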
EDIT2: After a bit more searching, I've found that the standard actually requires that allocators intended to be used with standard collections must define the pointer type to be equivalent with T*.
Why are you using the STL list? Unless you have very specific requirements, you should be using vector or deque instead. If your reason for using the list was to increase insertion efficiency, you should note that a deque offers most of the advantages of both list and vector because it is not required to maintain contiguous storage, but uses arrays as its underlying storage media.
EDIT: And regarding your desire for a list that offers operator[], such a structure does not exist (at least, it does not exist and still conform to the STL). One of the key design ideas of the STL is that algorithms and containers offer only what they can do efficiently. Since operator[] on a linked list would require linear time for each access, it would not be efficient.
We had to write our own list containers to get exactly this. It's about a half day's work.
Boost.Interprocess (containers) provides slist container that uses the pointer type of the allocator. Maybe this is what you are looking for :)
Even though these containers are included in Boost.Interprocess, they work perfectly on intraprocess memory. In addition, the author has already separated them out and proposed them to Boost as Boost.Containers.
One option that is a bit out there is to use Judy Arrays. They provide highly efficient storage and are computationally efficient. They are good for storing sets of integers; I don't know if that suits your problem.
If you're concerned about the memory overhead of the linked list for storing a large number of int values, you should definitely consider a vector or deque as Billy ONeal suggested.
The drawback to either of these containers when compared to a linked list comes when inserting elements. Either of deque or vector will have to copy elements if you insert items into the middle of the container (deque has a big advantage over vector if you're going to insert at the beginning of the container, and even has an advantage when adding to the end of the container).
However, copying int elements after insertion is really not a whole lot more costly than scanning a linked list to find an element by index. Since deque and vector can locate an element by index in constant time and since data structures are generally read far more often than they're modified, you'll probably see a net gain using either of deque or vector over a linked list of int that you access by index (even a custom version).