Practical summary/reference of C++11 containers/adapters properties? [closed] - c++

I'm looking for a comprehensive summary/reference of important properties of the various C++11 standard containers and container adapters (optionally also including boost/Qt), but indexed by those properties rather than the usual per container documentation (more on that below).
The properties I have in mind include, off the top of my head:
Insertion capabilities (front / back / arbitrary).
Removal capabilities (front / back / arbitrary).
Access capabilities (front / back / uni/bi-directional traversal / random access).
Complexity of the aforementioned operations, and iterator invalidation rules.
Uniqueness? Ordered? Associative? Contiguous storage? Reservation ahead of time?
I may have forgotten some, in which case don't hesitate to comment/edit.
The goal is to use that document as an aid to choose the right container/adapter for the right job, without having to wade through the various individual documentations over and over every time (I have a terrible memory).
Ideally it should be indexed both by property and by container type (e.g. table-like) to allow for decision-making as well as for quick reference of the basic constraints. But really the per-property indexes are the most important for me, since they are the most painful thing to search for in the documentation.
I'd be very surprised if nobody had already produced such a document, but my Search-fu is failing me on this one.
NOTE: I'm not asking you to summarize all this information (I'll do that myself if I really have to, in which case I'll post the result here) but only whether you happen to know an existing document that fits those requirements. Something like this is a good start, but as you can see it still lacks much of the information I'd like to have, since it's restricted to member functions.
Thanks for your attention.

I am not aware of a single document that provides everything you need, but most of it has been catalogued somewhere.
This reference site has a large table with all the member functions of all the containers.
This SO question has a large table of the complexity guarantees.
This SO question gives you a decision tree to choose between containers.
The complexity requirements for container member functions are not too hard to memorize, since there are only 4 categories: (amortized) O(1), O(log N), O(N), and O(N log N) (only the member function std::list::sort(), which really crosses into the algorithms domain of the Standard Library), so if you want you could make a 4-color-coded version of the cppreference container table.
Choosing the right container can be as simple as always using std::vector unless your profiler indicates a bottleneck. After you reach that point, you have to make hard tradeoffs between space and time complexity, data locality, ease of lookup versus ease of insertion/modification, and extra invariants (sortedness, uniqueness, iterator invalidation rules).
The hardest part is that you have to balance your containers (space requirements) against the algorithms that you are using (time requirements). Containers can maintain invariants (e.g. std::map is sorted on its keys) that other containers can only mimic using algorithms (e.g. std::vector with std::sort, but without the same insertion complexity). So after you finish the container table, make sure to do something similar for the algorithms!
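As a rough illustration of that last point, here is a minimal sketch (the keys and values are made up) contrasting std::map, which maintains sortedness by itself, with a std::vector kept sorted via std::sort and std::lower_bound:

    #include <algorithm>
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    int main() {
        // std::map maintains sortedness as an invariant of the container itself.
        std::map<int, std::string> byKey;
        byKey[42] = "answer";
        byKey[7]  = "lucky";          // O(log N) insert, keys always sorted

        // A std::vector can only mimic that invariant with algorithms.
        std::vector<int> keys{42, 7, 19};
        std::sort(keys.begin(), keys.end());                      // O(N log N), done once
        auto pos = std::lower_bound(keys.begin(), keys.end(), 10);
        keys.insert(pos, 10);                                      // O(log N) search + O(N) shift

        for (int k : keys) std::cout << k << ' ';                  // 7 10 19 42
        std::cout << '\n';
    }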
Finally, no container summary would be complete without mentioning Boost.MultiIndex: because sometimes you don't have to choose!

Related

What are the time complexities for size?

I am studying the complexity of various operations of the different STL containers. Through a different question on this site I have found this chart.
website link
One operation I noticed was missing from this chart was the size operation.
I would suppose that if one knew the complexity of .begin and .end one could also calculate the complexity for size. But those are also missing.
I have found an answer similar to what I am looking for in this question, but that one is for Java, so it does not cover all the STL containers, and it only gives the big-O of size for a few of the listed data types.
Does anyone know the complexity of the .size operation for the various containers, or could someone give me a pointer to where I could find these complexities? Any help would be greatly appreciated.
Also, if my question is wrongly phrased and/or off-topic, do not hesitate to suggest an edit.
Since C++11, the complexity of the size member function is constant for all standard containers.
std::forward_list, which implements the singly linked list data structure, does not provide a size member function; the size can be computed in linear time from its iterators.
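For example, the size of a std::forward_list can be obtained in linear time with std::distance (a minimal sketch):

    #include <forward_list>
    #include <iostream>
    #include <iterator>

    int main() {
        std::forward_list<int> fl{1, 2, 3, 4};
        // No size() member; walk the range instead, which is O(N).
        auto n = std::distance(fl.begin(), fl.end());
        std::cout << n << '\n';   // prints 4
    }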
Aside from the standard C++ containers, any data structure can be augmented with a separately stored size variable to achieve constant complexity, at the cost of a small constant overhead on insert and delete operations. Arrays are special in that they do not require any additional overhead, assuming an iterator to the end is stored.

What hashing method is implemented in standard unordered containers? [closed]

Since language standards rarely mandate implementation methods, I'd like to know what real-world hashing methods are used by C++ standard library implementations (libc++, libstdc++ and Dinkumware).
In case it's not clear, I expect the answer to be a method like these:
Hashing with chaining
Hashing by Division / Multiplication
Universal hashing
Perfect hashing (static, dynamic)
Hashing with open addressing (linear/quadratic probing or double hashing)
Robin-Hood hashing
Bloom Filters
Cuckoo hashing
Knowing why a particular method was chosen over the others would be a good thing as well.
libstdc++: chaining, only power-of-two table size, default load threshold for rehashing (if it is even configurable) is 1.0, buckets are all separate allocations. This may be outdated; I don't know the current state of things. (A sketch of the bucket interface the standard containers expose follows after this list.)
Rust: Robin Hood, default load threshold for rehashing is 0.9 (too much for open addressing, BTW)
Go: table slots point to "bins" of 5 (7?) slots; I'm not sure what happens if a bin is full, but AFAIR it grows in a vector/ArrayList manner.
Java: chaining, only power-of-two table size, default load threshold is 0.75 (configurable), buckets (called entries) are all separate allocations. In recent versions of Java, above a certain threshold, chains are changed to binary search trees.
C#: chaining, buckets are allocated from a flat array of bucket structures. If this array is full, it is rehashed (with the table, I suppose) in a vector/ArrayList manner.
Python: open addressing, with its own unique collision-resolution scheme (not very fortunate, IMHO), only power-of-two table sizes, load threshold for rehashing is 0.666... (good). However, slot data is kept in a separate array of structures (as in C#), i.e. hash table operations touch at least two different random memory locations (in the table and in the array of slot data).
If some points are missing from these descriptions, it doesn't mean those features are absent; it means I don't know or don't remember the details.
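For the C++ standard unordered containers specifically, the chaining scheme shows through the public bucket interface; this sketch merely queries what that interface exposes and doesn't claim anything about a particular implementation's internals:

    #include <iostream>
    #include <unordered_map>

    int main() {
        std::unordered_map<int, int> m;
        for (int i = 0; i < 100; ++i) m[i] = i * i;

        std::cout << "buckets:         " << m.bucket_count()    << '\n';
        std::cout << "load factor:     " << m.load_factor()     << '\n';
        std::cout << "max load factor: " << m.max_load_factor() << '\n';  // 1.0 by default

        // The elements that hashed to one bucket form a chain you can measure:
        auto b = m.bucket(42);
        std::cout << "bucket of key 42 holds " << m.bucket_size(b) << " element(s)\n";
    }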

List of arrays vs list [closed]

A list uses a lot of memory since it adds a pointer to each node, and it is not contiguous; the memory ends up fragmented. A list of arrays, in my opinion, is a lot better. For example, if I am managing 100 objects, a list of 5 arrays of 20 is a lot better than a list of 100 nodes: only 5 pointers added versus 100 pointers, we gain locality when working within the same array, and we have less fragmentation.
I did some research on this, but I can't find any interesting article about it, so I thought I might be missing something.
What can be the benefit of using a list over a list of arrays?
EDIT: This is definitely not array vs. list... It is more like: why put only one element per list node when it's possible to put more?
I think this is a valid question, as memory layout might affect the performance of your program. You can try std::deque, as some suggested. However, the statement that a typical implementation uses chunks of memory is a general statement about implementations, not the standard; it is therefore not guaranteed to be so.
In C++, you can improve the locality of your data through custom allocators and memory pools. Every STL container takes an allocator as a template parameter. The default allocator is probably a simple wrapper around new and delete, but you can supply your own allocator that uses a memory pool. You can find more about allocators here, and here is a link to the C++ default allocator. Here is a Dr. Dobb's article about this topic. A quote from the article:
If you're using node-based containers (such as maps, sets, or lists), allocators optimized for smaller objects are probably a better way to go.
Only profiling will tell you what works best for your case. Maybe the combination of std::deque and a custom allocator will be the best option.
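As a very rough illustration of the allocator idea, here is a minimal sketch of a node-based container backed by a toy bump-pointer arena (the Arena and ArenaAllocator names are made up; a real pool would handle growth and reuse properly):

    #include <cstddef>
    #include <list>
    #include <new>
    #include <vector>

    // Toy arena: one big block up front, chunks handed out bump-pointer style.
    class Arena {
    public:
        explicit Arena(std::size_t bytes) : storage_(bytes), offset_(0) {}

        void* allocate(std::size_t bytes) {
            constexpr std::size_t align = alignof(std::max_align_t);
            offset_ = (offset_ + align - 1) / align * align;   // keep chunks max-aligned
            if (offset_ + bytes > storage_.size()) throw std::bad_alloc{};
            void* p = storage_.data() + offset_;
            offset_ += bytes;
            return p;
        }
        void deallocate(void*, std::size_t) {}  // toy: everything is freed when the Arena dies

    private:
        std::vector<unsigned char> storage_;
        std::size_t offset_;
    };

    template <typename T>
    struct ArenaAllocator {
        using value_type = T;
        Arena* arena;

        explicit ArenaAllocator(Arena& a) : arena(&a) {}
        template <typename U>
        ArenaAllocator(const ArenaAllocator<U>& other) : arena(other.arena) {}

        T* allocate(std::size_t n) { return static_cast<T*>(arena->allocate(n * sizeof(T))); }
        void deallocate(T* p, std::size_t n) { arena->deallocate(p, n * sizeof(T)); }
    };

    template <typename T, typename U>
    bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) { return a.arena == b.arena; }
    template <typename T, typename U>
    bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) { return !(a == b); }

    int main() {
        Arena arena(1 << 16);                              // 64 KiB pool
        ArenaAllocator<int> alloc(arena);
        std::list<int, ArenaAllocator<int>> l(alloc);
        for (int i = 0; i < 100; ++i) l.push_back(i);      // nodes end up close together in the arena
    }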
Some operations have guaranteed constant-time complexity with std::list and not with std::vector or std::deque. For example, with std::list you can move a subsequence ("splice") of 5000 consecutive elements from the middle of a list with a million items to the beginning (of the same list only!) in constant time. No way to do that with either std::vector or std::deque, and even for your list-of-arrays, it will be a fair bit more complicated.
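A minimal sketch of that constant-time same-list splice (scaled down, but the mechanism is the same):

    #include <iostream>
    #include <iterator>
    #include <list>

    int main() {
        std::list<int> l{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

        // Move elements 4, 5, 6 to the front of the SAME list.
        auto first = std::next(l.begin(), 4);
        auto last  = std::next(l.begin(), 7);
        l.splice(l.begin(), l, first, last);    // constant time: only node pointers are relinked

        for (int x : l) std::cout << x << ' ';  // 4 5 6 0 1 2 3 7 8 9
        std::cout << '\n';
    }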
This is a good read: Complexity of std::list::splice and other list containers

Why would I want to implement my own doubly linked list in C++? [closed]

Apart from the academic aspect of learning something from implementing my own doubly linked list in C++, is there any actual real-world advantage of implementing my own doubly linked list when there is already std::list? Can I make things more efficient on my own for a certain task, or has std::list been refined so much over the years that it is the optimal implementation of a doubly linked list in most cases?
is there any actual real-world advantage of implementing my own doubly linked list when there is already std::list?
Probably not.
Can I make things more efficient on my own for a certain task,
Maybe - depends on the task. You might only need a singly linked list, which might end up being faster, for example.
or has std::list been refined so much over the years that it is the optimal implementation of a doubly linked list in most cases?
Probably.
The best answer to all of this stuff is probably "use the standard implementation until it doesn't work, and then figure out what you're going to do about it."
"Why would I want to implement a doubly linked list in C++?"
You would not. Even if you needed a doubly linked list, you would avoid reinventing the wheel and use an existing implementation.
If you have to ask that question, the answer is no.
You would almost never want or need to implement your own list. For that matter, you would almost never want or need to reimplement your own anything if it is covered by the Standard Library.
There are exceptions. Those exceptions are, well... exceptional. Exceptions are made for performance reasons, typically when the performance requirements are extreme. But these exceptions can only be responsibly made when you have already proved that the Standard Library-provided functionality is a measurable problem for your specific use.
Yes, it is possible for a doubly-linked list to be more efficient than std::list for a certain task. There's an algorithmic complexity trade-off in std::list, and your task might benefit from the opposite decision.
Specifically, std::list is required in C++11, and recommended in C++03, to have constant-time size(). That is, the number of elements must be stored in the object.
This means that the 4-argument version of splice() necessarily has linear complexity, because it needs to count the number of elements moved from one list to another in order to update both sizes (except in some special cases, like the source list being empty after the splice, or the two lists being the same object).
However, if you're doing a lot of splicing then you might prefer to have these the other way around -- constant-time splice() and a size() function that is linear-time in the worst case (when a splice has been done on the list more recently than the last time the size was updated).
As it happens, GNU's implementation of std::list made this choice, and had to change for C++11 conformance. So even if this is what you want, you don't necessarily have to completely implement it yourself. Just dust off the old GNU code and change the names so that it's valid user code.

Super high performance C/C++ hash map (table, dictionary) [closed]

I need to map primitive keys (int, maybe long) to struct values in a high-performance hash map data structure.
My program will have a few hundred of these maps, and each map will generally have at most a few thousand entries. However, the maps will be "refreshing" or "churning" constantly; imagine processing millions of add and delete messages a second.
What libraries in C or C++ have a data structure that fits this use case? Or, how would you recommend building your own? Thanks!
I would recommend trying Google SparseHash (or the C++11 version, sparsehash-c11) and seeing if it suits your needs. They have a memory-efficient implementation as well as one optimized for speed.
In a benchmark I did a long time ago, it was the best hash table implementation available in terms of speed (however, with drawbacks).
What libraries in C or C++ have a data structure that fits this use case? Or, how would you recommend building your own? Thanks!
Check out the LGPL'd Judy arrays. I've never used them myself, but they were advertised to me on a few occasions.
You can also try to benchmark the STL containers (std::hash_map, etc.). Depending on the platform/implementation and source code tuning (preallocate as much as you can; dynamic memory management is expensive) they could be performant enough.
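As a minimal sketch of that preallocation advice, using the standard std::unordered_map as a stand-in (the sizes and the Value payload are made up):

    #include <cstdint>
    #include <unordered_map>

    struct Value { double a; std::int64_t b; };   // hypothetical payload

    int main() {
        std::unordered_map<std::int64_t, Value> m;
        m.max_load_factor(0.7f);   // shorter chains at the cost of some memory
        m.reserve(4096);           // expected max entries per map; avoids rehashing during churn

        // Constant churn of adds and deletes, as described in the question.
        for (std::int64_t i = 0; i < 1000000; ++i) {
            m[i % 4096] = Value{static_cast<double>(i), i};
            m.erase((i * 7) % 4096);
        }
    }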
Also, if the performance of the final solution trumps the cost of the solution, you can try to order the system with sufficient RAM to put everything into plain arrays. Performance of access by index is unbeatable.
The add/delete operations are much (100x) more frequent than the get operation.
That hints that you might want to concentrate on improving algorithms first. If data are only written, not read, then why write them at all?
Just use boost::unordered_map (or tr1, etc.) by default. Then profile your code and see if that code is the bottleneck. Only then would I suggest precisely analyzing your requirements to find a faster substitute.
If you have a multithreaded program, you can find some useful hash tables in the Intel Threading Building Blocks library. For example, tbb::concurrent_unordered_map has the same API as std::unordered_map, but its main functions are thread-safe.
Also have a look at Facebook's folly library; it has a high-performance concurrent hash table and skip list.
khash is very efficient. There is the author's detailed benchmark: https://attractivechaos.wordpress.com/2008/10/07/another-look-at-my-old-benchmark/ and it also shows khash beating many other hash libraries.
From the Android sources (thus Apache 2 licensed):
https://github.com/CyanogenMod/android_system_core/tree/ics/libcutils
Look at hashmap.c and include/cutils/hashmap.h; if you don't need thread safety you can remove the mutex code. A sample use is in libcutils/str_parms.c.
First check whether an existing solution like libmemcache fits your needs.
If not ...
A hash map seems to be the definite answer to your requirement. It provides O(1) lookup based on the keys. Most STL implementations provide some sort of hash map these days, so use the one provided by your platform.
Once that part is done, you have to test the solution to see if the default hashing algorithm is good enough performance-wise for your needs.
If it is not, you should explore some of the good, fast hashing algorithms found on the net:
good old prime number multiply algo
http://www.azillionmonkeys.com/qed/hash.html
http://burtleburtle.net/bob/
http://code.google.com/p/google-sparsehash/
If this is not good enough, you could roll your own hashing module that fixes the problems you saw with the STL containers you tested, using one of the hashing algorithms above. Be sure to post the results somewhere.
Oh, and it's interesting that you have multiple maps... Perhaps you can simplify by making your key a 64-bit number with the high bits used to distinguish which map it belongs to, and adding all key/value pairs to one giant hash. I have seen hashes with a hundred thousand or so symbols working perfectly well on the basic prime-number hashing algorithm.
You can check how that solution performs compared to hundreds of maps; I think that could be better from a memory-profiling point of view. Please do post the results somewhere if you get to do this exercise.
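A minimal sketch of that key-packing idea (the 32/32-bit split and names are arbitrary, just for illustration):

    #include <cstdint>
    #include <unordered_map>

    // Hypothetical packing: map id in the high 32 bits, original key in the low 32 bits.
    inline std::uint64_t make_key(std::uint32_t map_id, std::uint32_t key) {
        return (static_cast<std::uint64_t>(map_id) << 32) | key;
    }

    struct Value { double payload; };   // placeholder

    int main() {
        std::unordered_map<std::uint64_t, Value> giant;

        // What used to be maps[3][12345] and maps[7][12345] both live in one table:
        giant[make_key(3, 12345)] = Value{1.0};
        giant[make_key(7, 12345)] = Value{2.0};
    }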
I believe that, more than the hashing algorithm, it could be the constant allocation/deallocation of memory (can it be avoided?) and the CPU cache usage profile that are crucial for the performance of your application.
Good luck!
Try hash tables from Miscellaneous Container Templates. Its closed_hash_map is about the same speed as Google's dense_hash_map, but is easier to use (no restriction on contained values) and has some other perks as well.
I would suggest uthash. Just #include "uthash.h", then add a UT_hash_handle to the structure and choose one or more fields in your structure to act as the key. A word about performance here.
According to http://incise.org/hash-table-benchmarks.html, GCC has a very, very good implementation. However, mind that it must respect a very bad standard decision:
If a rehash happens, all iterators are invalidated, but references and pointers to individual elements remain valid. If no actual rehash happens, no changes.
http://www.cplusplus.com/reference/unordered_map/unordered_map/rehash/
This basically means the standard requires the implementation to be based on linked lists (separate chaining).
It prevents open addressing, which has better performance.
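That guarantee is easy to observe: pointers to elements stay valid across a rehash, which is what pushes implementations toward a node-based (chained) layout. A minimal sketch:

    #include <cassert>
    #include <iostream>
    #include <unordered_map>

    int main() {
        std::unordered_map<int, int> m;
        m[1] = 100;
        int* p = &m[1];          // pointer to the mapped value

        m.rehash(1024);          // force a rehash: iterators are invalidated...
        assert(p == &m[1]);      // ...but pointers/references to elements stay valid
        std::cout << *p << '\n'; // still prints 100
    }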
I think Google's sparse hash uses open addressing, though in those benchmarks only the dense version outperforms the competition.
However, the sparse version outperforms all the competition in memory usage (it also doesn't have any plateau, just a pure straight line with respect to the number of elements).