Indices instead of pointers in STL containers? - c++

Due to specific requirements [*], I need a singly-linked list implementation that uses integer indices instead of pointers to link nodes. The indices are always interpreted with respect to a vector containing the list nodes.
I thought I might achieve this by defining my own allocator, but looking into the gcc's implementation of , they explicitly use pointers for the link fields in the list nodes (i.e., they do not use the pointer type provided by the allocator):
struct _List_node_base
{
_List_node_base* _M_next; ///< Self-explanatory
_List_node_base* _M_prev; ///< Self-explanatory
...
}
(For this purpose, the allocator interface is also deficient in that it does not define a dereference function; "dereferencing" an integer index always needs a pointer to the underlying storage.)
Do you know a library of STL-like data structures (i am mostly in need of singly- and doubly-linked list) that use indices (wrt. a base vector) instead of pointers to link nodes?
[*] Saving space: the lists will contain many 32-bit integers. With two pointers per node (STL list is doubly-linked), the overhead is 200%, or 400% on 64-bit platform, not counting the overhead of the default allocator.
EDIT: I'm looking for a SLL implementation that defines nodes in the following manner:
struct list_node
{
int _value; ///< The value in the list
int _next; ///< Next node in the list
...
}
_next is interpreted wrt. an implicit array or vector (must be provided externally to each method operating on the list).
EDIT2: After a bit more searching, I've found that the standard actually requires that allocators intended to be used with standard collections must define the pointer type to be equivalent with T*.

Why are you using the STL list? Unless you have very specific requirements, you should be using vector or deque instead. If your reason for using the list was to increase insertion efficiency, you should note that a deque offers most of the advantages of both list and vector because it is not required to maintain contiguous storage, but uses arrays as it's underlying storage media.
EDIT: And regarding your desire for a list that offers operator[], such a structure does not exist (at least, does not exist and still conform to the STL). One of the key design ideas of the STL is that algorithms and containers offer only what they can efficiently. Considering offering operator[] on a linked list requires linear time for each access, that's not efficient.

We had to write our own list containers to get exactly this. It's about a half day's work.

Boost.Interprocess (containers) provides slist container that uses the pointer type of the allocator. Maybe this is what you are looking for :)
Even if these containers are included in Boost.Interprocess they work perfectly on intraprocess memory. In addition the author has already made the separation and proposed to bots as Boost.Containers (Documentation/Source)

One option that is a bit out there is to use Judy Arrays. They provide highly efficient storage and are computationally efficient. They are good for storing sets of integers; I don't know if that suits your problem.

If you're concerned about the memory overhead of the linked list for storing a large number of int values, you should definitely consider a vector or deque as Billy ONeal suggested.
The drawback to either of these containers when compared to a linked list comes when inserting elements. Either of deque or vector will have to copy elements if you insert items into the middle of the container (deque has a big advantage over vector if you're going to insert at the beginning of the container, and even has an advantage when adding to the end of the container).
However, copying int elements after insertion is really not a whole lot more costly than scanning a linked list to find an element by index. Since deque and vector can locate an element by index in constant time and since data structures are generally read far more often than they're modified, you'll probably see a net gain using either of deque or vector over a linked list of int that you access by index (even a custom version).

Related

Collecting objects faster than std::vector

I have a function that stores a lot of small objects (~16 bytes) in a vector, but it doesn't know in advance how many objects will be stored (imagine a recursive descent parser storing tokens for example).
std::vector<SmallObject> getObjects();
This is quite slow because of all the reallocation and copying (and apparently C++ even has to invoke the copy constructors if you don't use an optimised version (see "Object Relocation").
There must be a better way to do things like this where all I am doing to construct the vector is appending things. For example I could have a singly linked list of blocks that are filled, and convert everything to a single vector at the end, so everything only has to be copied once.
Is there anything in Boost or the standard C++ library that would help with this? Or any particularly clever algorithms?
Edit: To be more concrete:
struct SmallObject {
unsigned id;
boost::icl::discrete_interval<unsigned> ival;
};
The question which container is most efficient is always best answered by "it depends" and "measure it!".
Without any more information about your specific situation, there are two 'obvious' possibilities:
Use a linked list
The STL has two linked lists by default: a singly linked list std::forward_list and a doubly linked list std::deque. Moreover there is std::list which is usually the doubly-linked variant. Some quotes from the documentation:
std::forward [...] is implemented as a singly-linked list and essentially does not have any overhead compared to its implementation in C. Compared to std::list this container provides more space efficient storage when bidirectional iteration is not needed.
std::list [...] is usually implemented as a doubly-linked list. Compared to std::forward_list this container provides bidirectional iteration capability while being less space efficient.
std::deque (double-ended queue) [..] insertion and deletion at either end of a deque never invalidates pointers or references to the rest of the elements.
As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays
Reserve space in a vector
If there is any way you can estimate an upper bound on the number of objects you will want to store, you can use that to reserve some space in advance.
For example, if you're reading these objects from a file, the number of objects may be at most the file size divided by 16, or the number of lines times two, or some other quick and easy calculation that you can do before constructing these objects.
In that case, if you reserve the capacity, you will allocate too much memory but prevent moves. Even if the upper bound is a bit too low, that's OK: you may still need to double the capacity once or twice but at least you prevent all the small increases (2 -> 4 -> 10 -> 16) at the start of the loop.

How are STL List and Vector implemented?

How are STL List and Vector implement?
I was just asked this in an an interview.
I just said maybe by using binary tree or hash table about vector. not sure about list...
Am I wrong, I guess so..
give some ideas thanks.
Hash table or binary tree? Why?
std::vector, as the name itself suggests, is implemented with a normal dynamically-allocated array, that is reallocated when its capacity is exhausted (usually doubling its size or something like that).
std::list instead is (usually1) implemented with a doubly-linked list.
The binary tree you mentioned is the usual implementation of std::map; the hash table instead is generally used for the unordered_map container (available in the upcoming C++0x standard).
"Usually" because the standard do not mandate a particular implementation, but specifies the asymptotic complexity of its methods, and such constraints are met easily with a doubly-linked list.
On the other hand, for std::vector the "contiguous space" requirement is enforced by the standard (from C++03 onwards), so it must be some form of dynamically allocated array.
std::vector uses a contiguously allocated array and placement new
std::list uses dynamically allocated chunks with pointer to the next and previous element.
nothing as fancy as binary trees or hash tables (which can be used for std::map)
You can spend half a semester talking about either of the containers, but here are a few points:
std::vector is a contiguous container, which means every element follows right after the previous element in memory. It can grow at runtime, which means it allocates its storage in dynamic memory.
std::list is a bidirectional linked list. This means that the elements are scattered in memory in arbitrary layout, and that each element knows where the next and previous elements in sequence are.
std::vector, std::list and the other containers don't take ownership of the elements they hold, but they do cleanup after themselves. So, if the elements are pointers to dynamic memory then the user must free the pointers before the container destructs. But if the container contains automatic data then the data's destructors will call automatically upon the container's cleanup.
So far, very simple and roughly equivalent to any other language or toolset. What's unique about the STL is that the containers are generic and decoupled from the means of iterating over them and (for the most part) from the operations you can perform over them. Some operations can be done particularly efficiently with some containers, so the containers will provide member functions in these cases. For example, std::list has a sort() member function.
The STL doesn't provide container classes (for the most part), but rather container templates. In other words, when the library talks about a container it only refers to the data type anonymously, say, as T, never by its true name. Never int or double or Car; always T, for any type. There are exceptions, like std::vector<bool>, but this is the general case. Then, when the user instantiates a container template, they specify a type, and the compiler creates a container class from the template for that type.
The STL also offers algorithms as free template functions. These algorithms work on iterators, themselves templates. Often iterators come in pairs that denote the beginning and end of a sequence, on which the algorithm operates. std::vector, std::list and other containers then expose their own iterators that can traverse and manipulate their data. So the same free algorithm can work on a std::vector and a std::list and other containers, provided the iterators conform with specific assumptions about the iterators' abilities.
All this abstraction is done at compile-time, and that is the biggest difference when compared to other languages. This translates to outstanding performance with relatively short and concise code. The same performance that in C you'd only get with lots of copy-pasting or hardcoding.

Maintaining an Ordered collection of objects

I have the following requirements for a collection of objects:
Dynamic size (in theory unlimited, but in practice a couple of thousand should be more than enough)
Ordered, but allowing reorder and insertion at arbitrary locations.
Allows for deletion
Indexed Access - Random Access
Count
The objects I am storing are not large, a couple of properties and a small array or two (256 booleans)
Is there any built in classes I should know about before I go writing a linked list?
Original answer: That sounds like std::list (a doubly linked list) from the Standard Library.
New answer:
After your change to the specs, a std::vector might work as long as there aren't more than a few thousand elements and not a huge number of insertions and deletions in the middle of the vector. The linear complexity of insertion and deletion in the middle may be outweighed by the low constants on the vector operations. If you are doing a lot of insertions and deletions just at the beginning and end, std::deque might work as well.
-Insertion and Deletion: This is possible for any STL container, but the question is how long it takes to do it. Any linked-list container (list, map, set) will do this in constant time, while array-like containers (vector) will do it in linear time (with constant-amortized allocation).
-Sorting: Considering that you can keep a sorted collection at all times, this isn't much of an issue, any STL container will allow that. For map and set, you don't have to do anything, they already take care of keeping the collection sorted at all times. For vector or list, you have to do that work, i.e. you have to do binary search for the place where the new elements go and insert them there (but STL Algorithms has all the pieces you need for that).
-Resorting: If you need to take the current collection you have sorted with respect to rule A, and resort the collection with respect to rule B, this might be a problem. Containers like map and set are parametrized (as a type) by the sorting rule, this means that to resort it, you would have to extract every element from the original collection and insert them in a new collection with a new sorting rule. However, if you use a vector container, you can just use the STL sort function anytime to resort with whatever rule you like.
-Random Access: You said you needed random access. Personally, my definition of random access means that any element in the collection can be accessed (by index) in constant time. With that definition (which I think is quite standard), any linked-list implementation does not qualify and it leaves you with the only option of using an array-like container (e.g. std::vector).
Conclusion, to have all those properties, it would probably be best to use a std::vector and implement your own sorted insertion and sorted deletion (performing binary search into the vector to find the element to delete or the place to insert the new element). If your objects that you need to store are of significant size, and the data according to which they are sorted (name, ID, etc.) is small, you might consider splitting the problem by holding a unsorted linked-list of objects (with full information) and keeping a sorted vector of keys along with a pointer to the corresponding node in the linked-list (in that case, of course, use std::list for the former, and std::vector for the latter).
BTW, I'm no grand expert with STL containers, so I might have been wrong in the above, just think for yourself. Explore the STL for yourself, I'm sure you will find what you need, or at least all the pieces that you need. Maybe, look at Boost libraries too.
You haven't specified enough of your requirements to select the best container.
Dynamic size (in theory unlimited, but in practice a couple of thousand should be more than enough)
STL containers are designed to grow as needed.
Ordered, but allowing reorder and insertion at arbitrary locations.
Allowing reorder? A std::map can't be reordered: you can delete from one std::map and insert into another using a different ordering, but as different template instantiations the two variables will have different types. std::list has a sort() member function [thanks Blastfurnace for pointing this out], particularly efficient for large objects. A std::vector can be resorted easily using the non-member std::sort() function, particularly efficient for tiny objects.
Efficient insertion at arbitrary locations can be done in a map or list, but how will you find those locations? In a list, searching is linear (you must start from somewhere you already know about and scan forwards or backwards element by element). std::map provides efficient searching, as does an already-sorted vector, but inserting into a vector involves shifting (copying) all the subsequent elements to make space: that's a costly operation in the scheme of things.
Allows for deletion
All containers allow for deletion, but you have the exact-same efficiency issues as you do for insertion (i.e. fast for list if you already know the location, fast for map, deletion in vectors is slow, though you can "mark" elements deleted without removing them, e.g. making a string empty, having a boolean flag in a struct).
Indexed Access - Random Access
vector is indexed numerically (but can be binary searched), map by an arbitrary key (but no numerical index). list is not indexed and must be searched linearly from a known element.
Count
std::list provides an O(n) size() function (so that it can provide O(1) splice), but you can easily track the size yourself (assuming you won't splice). Other STL containers already have O(1) time for size().
Conclusions
Consider whether using a std::list will result in lots of inefficient linear searches for the element you need. If not, then a list does give you efficient insertion and deletion. Resorting is good.
A map or hash map will allow quick lookup and easy insertion/deletion, can't be resorted, but you can easily move the data out to another map with another sort criteria (with moderate efficiency.
A vector allows fast searching and in-place resorting, but the worst insert/deletion. It's the fastest for random-access lookup using the element index.

Is there a QList type datastructure in STL or Boost?

just a brief QList intro, QList is similar to QVector and STL vector but in addition it also reserves some space at the beginning; (and in the end too)
I am looking for something similar in STL , if not atleast Boost
Reserving space at the beginning improves prepending or removing the first item (constant time), because the buffer can grow backwards. Insertion is improved depending on the position of insertion.
So does anyone know of a similar data structure in STL/C++??
Thanks.
std::deque provides front and back insertion removal at constant time, if it is what you are looking for.
If you want efficient insertion and removal at the beginning and end of a sequence, use a std::deque.
If you know an upper limit on container size and don't plan on mid-container inserts you could use boost::circular_buffer.
If you are doing a lot of mid-container inserts deque would be the a better choice than vector since it (typically) groups members into fixed-size blocks, rather than one contiguous block of memory.
Note - QList actually states that it uses an array of pointers to objects, if they are non-trivial. To simulate this in C++, you would use deque<MyClass*> or (better) some smart pointer wrapper on MyClass such as unique_ptr or shared_ptr, to prevent excessive copying of MyClass in housekeeping.
Internally, QList is represented as
an array of pointers to items of type
T. If T is itself a pointer type or a
basic type that is no larger than a
pointer, or if T is one of Qt's shared
classes, then QList stores the
items directly in the pointer array.
For lists under a thousand items, this
array representation allows for very
fast insertions in the middle, and it
allows index-based access.

C++ deque vs vector and C++ map vs Set

Can some one please tell me what is the difference between vector vs deque. I know the implementation of vector in C++ but not deque. Also interfaces of map and set seem similar to me. What is the difference between the two and when to use one.
std::vector: A dynamic-array class. The internal memory allocation makes sure that it always creates an array. Useful when the size of the data is known and is known to not change too often. It is also good when you want to have random-access to elements.
std::deque: A double-ended queue that can act as a stack or queue. Good for when you are not sure about the number of elements and when accessing data-element are always in a serial manner. They are fast when elements are added/removed from front and end but not when they're added/removed to/from the middle.
std::list: A double-linked list that can be used to create a 'list' of data. The advantage of a list is that elements can be inserted or deleted from any part of the list without affecting an iterator that is pointing to a list member (and is still a member of the list after deletion). Useful when you know that elements will be deleted very often from any part of the list.
std::map: A dictionary that maps a 'key' to a 'value'. Useful for applications like 'arrays' whose index are not an integer. Basically can be used to create a map-list of name to an element, like a map that stores name-to-widget relationship.
std::set: A list of 'unique' data values. For e.g. if you insert 1, 2, 2, 1, 3, the list will only have the elements 1, 2, 3. Note that the elements in this list are always ordered. Internally, they're usually implemented as binary search trees (like map).
See here for full details:
What are the complexity guarantees of the standard containers?
vector Vs deque
A deque is the same as a vector but with the following addition:
It is a "front insertion sequence"
This means that deque is the same as a vector but provides the following additional gurantees:
push_front() O(1)
pop_front() O(1)
set Vs map
A map is a "Pair Associative Container" while set is a "Simple Associative Container"
This means they are exactly the same. The difference is that map holds pairs of items (Key/Value) rather than just a value.
std::vector
Your default sequential containers should be a std::vector. Generally, std::vector will provide you with the right balance of performance and speed. The std::vector container is similar to a C-style array that can grow or shrink during runtime. The underlying buffer is stored contiguously and is guaranteed to be compatible with C-style arrays.
Consider using a std::vector if:
You need your data to be stored contiguously in memory
Especially useful for C-style API compatibility
You do not know the size at compile time
You need efficient random access to your elements (O(1))
You will be adding and removing elements from the end
You want to iterate over the elements in any order
Avoid using a std::vector if:
You will frequently add or remove elements to the front or middle of the sequence
The size of your buffer is constant and known in advance (prefer std::array)
Be aware of the specialization of std::vector: Since C++98, std::vector has been specialized such that each element only occupies one bit. When accessing individual boolean elements, the operators return a copy of a bool that is constructed with the value of that bit.
std::array
The std::array container is the most like a built-in array, but offering extra features such as bounds checking and automatic memory management. Unlike std::vector, the size of std::array is fixed and cannot change during runtime.
Consider using a std::array if:
You need your data to be stored contiguously in memory
Especially useful for C-style API compatibility
The size of your array is known in advance
You need efficient random access to your elements (O(1))
You want to iterate over the elements in any order
Avoid using a std::array if:
You need to insert or remove elements
You don’t know the size of your array at compile time
You need to be able to resize your array dynamically
std::deque
The std::deque container gets its name from a shortening of “double ended queue”. The std::deque container is most efficient when appending items to the front or back of a queue. Unlike std::vector, std::deque does not provide a mechanism to reserve a buffer. The underlying buffer is also not guaranteed to be compatible with C-style array APIs.
Consider using std::deque if:
You need to insert new elements at both the front and back of a sequence (e.g. in a scheduler)
You need efficient random access to your elements (O(1))
You want the internal buffer to automatically shrink when elements are removed
You want to iterate over the elements in any order
Avoid using std::deque if:
You need to maintain compatibility with C-style APIs
You need to reserve memory ahead of time
You need to frequently insert or remove elements from the middle of the sequence
Calling insert in the middle of a std::deque invalidates all iterators and references to its elements
std::list
The std::list and std::forward_list containers implement linked list data structures. Where std::list provides a doubly-linked list, the std::forward_list only contains a pointer to the next object. Unlike the other sequential containers, the list types do not provide efficient random access to elements. Each element must be traversed in order.
Consider using std::list if:
You need to store many items but the number is unknown
You need to insert or remove new elements from any position in the sequence
You do not need efficient access to random elements
You want the ability to move elements or sets of elements within the container or between different containers
You want to implement a node-wise memory allocation scheme
Avoid using std::list if:
You need to maintain compatibility with C-style APIs
You need efficient access to random elements
Your system utilizes a cache (prefer std::vector for reduced cache misses)
The size of your data is known in advance and can be managed by a std::vector
A map is what is often refered to as an associative array, usually implemented using a binary tree (for example). A deque is a double-ended queue, a certain incarnation of a linked list.
That is not to say that the actual implementations of the container library uses these concepts - the containr library will just give you some guarantees about how you can access the container and at what (amortized) cost.
I suggest you take a look at a reference that will go into detail about what those guarantees are. Scott Meyers book "Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library" should talk a bit about those, if I remember correctly. Apart from that, the C++ standard is obviously a good choice.
What I really want to say is: containers really are described by their properties, not by the underlying implementation.
set: holds unique values. Put 'a' in twice, the set has one 'a'.
map: maps keys to values, e.g. 'name' => 'fred', 'age' => 40. You can look up 'name' and you'll get 'fred' out.
dequeue, like a vector but you can only add/remove at the ends. No inserts into the middle. http://en.wikipedia.org/wiki/Deque
edit: my dequeue description is lacking, see comments below for corrections