Is there already some std::vector based set/map implementation?


For small sets or maps, it's usually much faster to just use a sorted vector, instead of the tree-based set/map - especially for something like 5-10 elements. LLVM has some classes in that spirit, but no real adapter that would provide a std::map like interface backed up with a std::vector.
Any (free) implementation of this out there?
Edit: Thanks for all the alternative ideas, but I'm really interested in a vector based set/map. I do have specific cases where I tend to create huge amounts of sets/maps which contain usually less than 10 elements, and I do really want to have less memory pressure. Think about for example neighbor edges for a vertex in a triangle mesh, you easily wind up with 100k sets of 3-4 elements each.

I just stumbled upon your question, hope it's not too late.
I recommend a great (open source) library named Loki.
It has a vector based implementation of an associative container that is a drop-in replacement for std::map, called AssocVector.
It offers better performance for accessing elements (and worse performance for insertions/deletions).
The library was written by Andrei Alexandrescu, author of Modern C++ Design.
It also contains some other really nifty stuff.

If you can't find anything suitable, I would just wrap a std::vector to do sort() on insert, and implement find() using lower_bound(). It should be straightforward, and just as efficient as a custom solution.
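A minimal sketch of such a wrapper, for illustration only. The class name SortedVectorSet and its members are made up here, and instead of calling sort() after every insert it places each new element at the position lower_bound() returns, which keeps the vector sorted at all times:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical name: a minimal set-like wrapper over a sorted std::vector.
template <typename T>
class SortedVectorSet {
public:
    // Inserts value if absent; returns true when an element was added.
    bool insert(const T& value) {
        assert(std::is_sorted(data_.begin(), data_.end()));  // class invariant
        auto pos = std::lower_bound(data_.begin(), data_.end(), value);
        if (pos != data_.end() && !(value < *pos)) {
            return false;  // already present
        }
        data_.insert(pos, value);  // shifts the tail, O(n)
        return true;
    }

    // Binary search, O(log n).
    bool contains(const T& value) const {
        auto pos = std::lower_bound(data_.begin(), data_.end(), value);
        return pos != data_.end() && !(value < *pos);
    }

    std::size_t size() const { return data_.size(); }
    const std::vector<T>& items() const { return data_; }

private:
    std::vector<T> data_;  // invariant: sorted, no duplicates
};
```

For the 3-10 element case from the question, the O(n) shift on insert only moves a handful of elements, so this is probably cheap in practice. Profile before relying on that.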

Old post, I know, but for more recent visitors, Boost's flat_set and flat_map look like what you need. See https://theboostcpplibraries.com/boost.container for more information.

I don't know any such implementation, but there are some functions that help working with sorted vectors already in STL, such as lower_bound and upper_bound.

If the set or map truly is small, the performance gained by micro-optimizing the data structure will have little to no noticeable effects. You'll save maybe one or two memory (read: cache) lookups when searching a tiny tree vs tiny vector, which in the big picture is insignificant.
Having said that, you could give hash_map a try. Lookups by key run in constant time on average (the worst case is linear when many keys collide).

Maybe you're looking for unordered maps and unordered sets. Try taking a look at the TR1 unordered containers that rely on hashing, or the Boost.Unordered container library. Underneath the interface, I'm not sure if they really do use std::vector, but I'd wager it's worth taking a look at.

Related

Boost flat_map container

Working on some legacy code, I am running into memory issues due mainly (I believe) to the extensive use of STL maps (particularly “maps-of-maps”.)
I am looking at Boost flat_map as a possible solution. Does anyone have any firsthand experience with flat_maps, in particular with regards improvements in speed and/or memory usage? I realize of course this can be very dependent on the types of data stored and the manner in which they are stored but still curious of folk’s actual experience.
Can anyone point me to some solid examples?
As an example: there are several cases in this code of a map-of-a-map; that is, a map where the value is another map.
By replacing the “inner” map with a pair of vectors, I reduced the memory footprint 10:1 (3G to 300M). Of course this can slow down searches but for this particular case it doesn’t seem to matter much. And it involved about a day of refactoring and careful testing.
Boost’s flat_map sounds like it might be just what I need but I can’t seem to find out much about it other than the class description on the Boost web site. Looking for some firsthand feedback.
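For readers wondering what the "pair of vectors" replacement might look like, here is a rough sketch. The question does not show its code, so the struct name, the int/double key and value types, and the decision to keep the keys sorted are all assumptions made for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an inner std::map<int, double>:
// keys and values live in two parallel vectors, kept sorted by key.
struct ParallelVectorMap {
    std::vector<int> keys;       // sorted ascending
    std::vector<double> values;  // values[i] belongs to keys[i]

    // Inserts a new entry or overwrites the value of an existing key.
    void set(int key, double value) {
        assert(keys.size() == values.size());
        auto pos = std::lower_bound(keys.begin(), keys.end(), key);
        const auto offset = pos - keys.begin();
        if (pos != keys.end() && *pos == key) {
            values[static_cast<std::size_t>(offset)] = value;
        } else {
            keys.insert(pos, key);
            values.insert(values.begin() + offset, value);
        }
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    const double* find(int key) const {
        auto pos = std::lower_bound(keys.begin(), keys.end(), key);
        if (pos == keys.end() || *pos != key) return nullptr;
        return &values[static_cast<std::size_t>(pos - keys.begin())];
    }
};
```

The memory saving comes from dropping the per-node pointers and allocator overhead of a tree. How close any given codebase gets to the 10:1 figure above depends on the element types and has to be measured.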

Boost's flat_map offers the same ordered-map interface as a binary-tree-based map, except that the elements are stored as a (sorted) vector of key-value pairs and lookups use binary search over that vector.
You can basically figure out the answers regarding performance (relative to a std::map) by yourself based on that fact:
Iterating the map or a large part of it should be super-fast, relatively
Lookup should typically be relatively fast
Adding or removing values is theoretically much slower, but in practice - assuming your key and value types are small and the number of map elements not very high - probably comparable in speed (or even better on small maps - often no allocation is necessary on insert)
etc.
In your case - maps-of-maps - you're going to lose some of the benefit of "flattening things out", since you'll have an outer map with a pointer to an inner map (if not more levels of indirection); but the flat map would at least help you reduce that. Also, supposing you have two levels of maps, you could arrange it so that you store all of the inner maps contiguously (either by constructing the inner maps appropriately or by instantiating them with your own allocator, a trickier affair); in that case, you could replace pointers to maps with map indices, reducing the amount of space they take up and making life easier for the compiler.
You might also want to read Boost's documentation of flat_map; and you could also just use the force and read the source (and the source of the underlying flat_tree), like I have; I don't actually have flat_map experience myself.

I know this is an old question, but this might be of use to someone finding this question.
I found that flat_map was a big improvement in searching, lookup and iterating large maps. The fact the map is using contiguous data in memory also makes inserting faster than you might expect due to great data locality. If you're doing more inserts than lookups in your map then it might not be for you.
Having said that, repeatedly inserting a random value into a sorted vector is faster than the same on a linked list because of the data locality - despite what Big O might tell you. (tested in VS2017 and G++ 4.8).

Why is std::vector so much more popular than std::deque? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why would I prefer using vector to deque
I am curious why it is that std::vector is so much more popular than std::deque. Deque is almost as efficient in lookup, more efficient in insert (without vector::reserve), and allows for inserting/deleting in the front.
Herb Sutter once recommended that if you want to use vector, just prefer deque (I am paraphrasing). However in a recent talk on Writing Modern C++ he again strongly recommends thinking of std::vector as a default container. According to the GOTW I linked earlier, even the standard has similar wording.
Is there a reason for this disparity? Is it just that vector is simpler and more well known, or is there a technical reason? Or is it that vector is just a cooler name .. ?

I can't speak for anybody else, but I can for myself.
When I first read about std::deque, I thought it was cool enough that for a while, I treated it not only as the default container, but as nearly the only container I used.
Then somebody asked about why, and I expounded at length on its virtues and why it was the best container to use for practically everything, and how it was much more versatile than std::vector.
Fortunately, the guy questioning my decision on it was persuasive enough that I did some testing. Testing showed that in nearly every case, std::deque was slower than std::vector -- often by a substantial factor (e.g., around 2). In fact, of the code I'd written using std::deque, just replacing std::deque with std::vector gave a speedup in all but a handful of cases.
I have used std::deque in a few cases since then, but definitely don't treat it as the default any more. The simple fact is that in the usual implementation it's noticeably slower than std::vector for most purposes.
I should add, however, that I'm reasonably certain that with the right implementation, it could be nearly equivalent to std::vector in virtually all cases. Most use a representation that's undoubtedly great from an asymptotic viewpoint, but doesn't work out quite so wonderfully (for many purposes) in the real world.

std::vector is very well understood, simple and is compatible with C (both in terms of the memory layout, and in terms of using pointers as iterators).
For some operations it is also more efficient than std::deque. Accessing elements by index is one example.
For a given task, it makes sense to use the simplest container that does the job well. In many cases that simplest container is std::vector.

People use std::vector over std::deque for a simple reason. They interface with a lot of C libraries and they have functions with parameters which require a pointer to contiguous storage, which std::deque doesn't (can't) guarantee.
Another reason is when you build code according to std::deque and your requirements change so that you have to support contiguous access, you will have a bit of refactoring to do.
I am compelled to mention that there are libraries out there whose authors claim to have created more efficient vector implementations in order to overcome inefficiencies when capacity is increased.

The structure of std::deque is a bit more complex, which makes naive iteration quite a bit more expensive than with std::vector. Insertions into std::vector, with its reallocations, tend not to be a big problem, especially when using reserve() and only appending to the end. Also, the invalidation rules for std::vector are easier to understand, although it is actually an advantage of std::deque that objects stay put when inserting/removing only at either end (note that std::deque iterators get invalidated upon each insertion, independent of where the insertion happens). Another plus for std::vector is the guarantee that the values are contiguous in memory, which lets it make fewer memory allocations.
I guess I would recommend use of std::deque if algorithms were consistently optimized to use segmented sequences (I'm not aware that any standard C++ library does this optimization) and users were accessing sequences consistently using algorithms (as far as I can tell, only a tiny fraction of users considers the option to use algorithms). Otherwise I would suspect that std::deque is the better option with respect to performance only if you take advantage of its specific properties (e.g., that objects stay put and that you can insert/remove at either end). It is worth profiling the two alternatives, though.
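The "objects stay put" property can be checked with a small sketch like the following. The function name is made up for illustration; the behaviour it relies on is that insertion at either end of a std::deque leaves references and pointers to existing elements valid, even though iterators are invalidated:

```cpp
#include <cassert>
#include <deque>

// Returns true if the address of an element is unchanged after growing the
// deque at both ends. Pointers and references stay valid; iterators do not.
bool deque_element_stays_put() {
    std::deque<int> d = {10, 20, 30};
    const int* address_before = &d[1];  // points at the element 20
    for (int i = 0; i < 1000; ++i) {
        d.push_back(i);
        d.push_front(-i);
    }
    // 1000 elements were pushed in front, so 20 now sits at index 1001.
    assert(d[1001] == 20);
    return address_before == &d[1001] && *address_before == 20;
}
```

Doing the same with a std::vector and push_back would not be safe, since a reallocation moves every element.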

Apart from std::vector being the most commonly known container class, it also has several advantages over std::deque, namely:
A typical std::deque requires an additional indirection to access the elements, unlike std::vector.
Iterators of std::deque must be smart pointers, not plain pointers as in the case of std::vector.
Elements of a std::vector are guaranteed to be contiguous, hence it is compatible with C-style functions which take arrays as parameters.
std::deque provides no support for controlling the capacity and the moment of reallocation.
Especially, the last point is noteworthy.
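To make the last two points concrete, here is a small sketch. sum_array is a hypothetical stand-in for any C-style function that takes a pointer and a length, and fill_and_sum is likewise an invented name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C-style function that expects a contiguous array.
long sum_array(const int* data, std::size_t count) {
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) total += data[i];
    return total;
}

// Fills a vector with 1..n and hands its storage to the C-style function.
long fill_and_sum(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);                    // capacity chosen up front
    const int* storage = v.data();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i) + 1);
    }
    assert(v.data() == storage);     // no reallocation while size <= capacity
    return sum_array(v.data(), v.size());
}
```

With std::deque neither step is available: there is no data() to pass along and no reserve() to decide when allocation happens.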

Should std::list be deprecated?

According to Bjarne Stroustrup's slides from his Going Native 2012 keynote, insertion and deletion in a std::list are terribly inefficient on modern hardware:
Vector beats list massively for insertion and deletion
If this is indeed true, what use cases are left for std::list? Shouldn't it be deprecated then?

Vector and list solve different problems. List provides the guarantee that iterators never become invalidated as you insert and remove other elements. Vector doesn't make that guarantee.
It's not all about performance. So the answer is no. List should not be deprecated.
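A short sketch of that iterator guarantee (the function name is invented for illustration): an iterator into a std::list keeps referring to the same element while other elements are inserted and erased around it.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Returns the value the saved iterator refers to after many edits elsewhere.
int list_iterator_survives_edits() {
    std::list<int> values = {1, 2, 3};
    auto it = std::next(values.begin());  // refers to the element 2
    values.push_front(0);
    values.push_back(4);
    values.erase(values.begin());         // removes 0, not the element at `it`
    for (int i = 0; i < 100; ++i) {
        values.insert(it, 100 + i);       // inserts before the element 2
    }
    assert(*it == 2);                     // still the same element
    return *it;
}
```

With std::vector, any of those insertions could reallocate or shift elements and leave the saved iterator dangling.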
Edit Beyond this, C++ isn't designed to work solely on "modern hardware." It is intended to be useful across a much wider range of hardware than that. I'm a programmer in the financial industries and I use C++, but other domains such as embedded devices, programmable controllers, heart-lung machines and myriad others are just as important. The C++ language should not be designed solely with the needs of certain domains and the performance of certain classes of hardware in mind. Just because I might not use a list doesn't mean it should be deprecated from the language.

Whether a vector outperforms a list or not also depends on the type of the elements. For example, for int elements vector is indeed very fast as most of the data fits inside the CPU cache and SIMD instructions can be used for the data copying. So the O(n) complexity of vector doesn't have much impact.
But what about larger data types, where copying doesn't translate to a stream operation, and instead data must be fetched from all over the place? Also, what about hardware that doesn't have large CPU caches and possibly also lacks SIMD instructions? C++ is used on much more than just modern desktop and workstation machines. Deprecating std::list is out of the question.
What Stroustrup is saying in that presentation is that before you pick std::list for your data, you should make sure that it's the right choice for your particular situation. In other words, benchmark and profile. It definitely doesn't say you should always pick std::vector.

No, and especially not based on one particular graph. There are instances where list will perform better than vector. See: http://www.baptiste-wicht.com/2012/12/cpp-benchmark-vector-list-deque/
And that's ignoring the non-performance differences, as others have mentioned.
Bjarne's point in that talk wasn't that you shouldn't use list. It was that people make too many assumptions about list's performance that often turn out to be wrong. He was simply justifying the stance that vector should always be your default go-to container type unless you actually find a need for the performance or other semantic characteristics of lists.

std::list is a deque, it has push_front() and pop_front(). It still has a niche role as such, though it may not be the best choice for a deque.
std::list does not reallocate memory, while std::vector may. Sometimes you don't want an item to move in memory (e.g. a stackful coroutine).
Linked lists are related to tree data structures. Both contain links. If we deprecate std::list, then what about tree-based containers?

Of course not. std::list is a different data structure. Comparing different data structures is a good way to reveal their properties, advantages, and disadvantages, but each data structure has its advantages.

Fastest way to speed up map<string,int>.find() in C++, where the keys are in alphabetical order

I have a map with about 100,000 pairs. Is there any way that I can speed up searching when using find(), given that the keys are in alphabetical order? Also, how should I go about doing it? I know that you can specify a new comparator when you create the map, but will that speed up the find() function at all?
Thanks in advance.
[solved] Thanks a bunch, guys. I have decided to go with a vector and use lower_bound and upper_bound to "snip" some of the searching.
Also, I am new here. Is there any way to mark this question as answered, or pick a best answer?

A different comparator will only speed up find if it manages to do the comparison faster (which, for strings will usually be pretty difficult).
If you're basically inserting all the data in order, then doing the searching, it may be faster to use a std::vector with std::lower_bound or std::upper_bound.
If you don't really care about ordering, and just want to find the data as quickly as possible, you might find that std::unordered_map works better for you.
Edit: Just for the record: the way you "might find" or "may find" those things is normally by profiling. Depending on the situation, it might be enough faster that it's pretty obvious even in simple testing, so profiling isn't really necessary, but if there's (much) doubt, or you want to quantify the effect, a profiler is probably the right way to do it.
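A sketch of the sorted-vector approach, under the assumption that the data is loaded once in key order and then only searched. The Entry alias and the function names are invented for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;

// Orders entries by key only.
bool key_less(const Entry& a, const Entry& b) { return a.first < b.first; }

// Looks up key in a vector sorted by key; returns nullptr when absent.
const int* find_value(const std::vector<Entry>& sorted, const std::string& key) {
    assert(std::is_sorted(sorted.begin(), sorted.end(), key_less));
    auto pos = std::lower_bound(
        sorted.begin(), sorted.end(), key,
        [](const Entry& e, const std::string& k) { return e.first < k; });
    if (pos == sorted.end() || pos->first != key) return nullptr;
    return &pos->second;
}
```

This is still O(log n) like map::find, so any gain comes from the contiguous layout rather than from the algorithm. The debug-only is_sorted check is O(n) and is compiled out when NDEBUG is defined.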

std::map is already taking advantage of the fact the keys are in alphabetical order - it guarantees that itself. You aren't going to be able to improve it by changing the comparator (one assumes it's already a reasonably efficient string comparison).
Have you considered using unordered_map (aka hash_map in various implementations pre-C++11)? It should be able to search in O(1) on average, instead of O(log(n)) for std::map.
You could also look into something slightly more exotic, like a trie, but that's not part of the standard library so you'd either have to find one elsewhere or roll your own, so I'd suggest unordered_map is a good place to start.

If you're using std::find to find elements, you should switch to using map::find (you don't really say in your question.) map::find uses the fact that the map is ordered to search much faster.
If that's still not good enough, you might look into a hash container such as unordered_map rather than map.
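The difference between the two lookups, as a sketch. The Table alias, the function names, and the -1 "not found" value are choices made for this example only:

```cpp
#include <algorithm>
#include <map>
#include <string>

using Table = std::map<std::string, int>;

// Generic algorithm: walks the map element by element, O(n).
int lookup_linear(const Table& table, const std::string& key) {
    auto it = std::find_if(
        table.begin(), table.end(),
        [&key](const Table::value_type& kv) { return kv.first == key; });
    return it == table.end() ? -1 : it->second;
}

// Member function: uses the map's ordering, O(log n).
int lookup_member(const Table& table, const std::string& key) {
    auto it = table.find(key);
    return it == table.end() ? -1 : it->second;
}
```

Both return the same result; only the member version takes advantage of the ordering, which is what matters at 100,000 entries.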

I've put in a vote for unordered_map but I wanted to also make another point.
One of the things that can hurt performance on modern machines is poor use of the cache. A map is going to have nodes allocated all over the place and there won't be much locality of reference. Also since it has to store a bunch of pointers between nodes it will use up more memory.
At the recent Going Native 2012 conference Bjarne Stroustrup gave an interesting talk that touched on this topic. He compared vector and list performance at a task involving a lot of random insertions and deletions, where it might seem list ought to have dominated, but because of the memory size and layout issue vector was in fact the fastest by far. Take a look at his slides, starting at slide 43.
unordered_map gives you direct access to the element, so it probably means even less hopping around in memory than trying to stick your data in a vector (and thus better performance than vector). My comment is simply an admonishment to always keep your memory access pattern in mind for performance.

When should the STL algorithms be used instead of using your own?

I frequently use the STL containers but have never used the STL algorithms that are to be used with the STL containers.
One benefit of using the STL algorithms is that they provide a method for removing loops so that code logic complexity is reduced. There are other benefits that I won't list here.
I have never seen C++ code that uses the STL algorithms. From sample code within web page articles to open source projects, I haven't seen their use.
Are they used more frequently than it seems?

Short answer: Always.
Long answer: Always. That's what they are there for. They're optimized for use with STL containers, and they're faster, clearer, and more idiomatic than anything you can write yourself. The only situation you should consider rolling your own is if you can articulate a very specific, mission-critical need that the STL algorithms don't satisfy.
Edited to add: (Okay, so not really really always, but if you have to ask whether you should use STL, the answer is "yes".)

You've gotten a number of answers already, but I can't really agree with any of them. A few come fairly close to the mark, but fail to mention the crucial point (IMO, of course).
At least to me, the crucial point is quite simple: you should use the standard algorithms when they help clarify the code you're writing.
It's really that simple. In some cases, what you're doing would require an arcane invocation using std::bind1st and std::mem_fun_ref (or something on that order) that's extremely dense and opaque, where a for loop would be almost trivially simple and straightforward. In such a case, go ahead and use the for loop.
If there is no standard algorithm that does what you want, take some care and look again -- you'll often have missed something that really will do what you want (one place that's often missed: the algorithms in <numeric> are often useful for non-numeric uses). Having looked a couple of times, and confirmed that there's really not a standard algorithm to do what you want, instead of writing that for loop (or whatever) inline, consider writing a generic algorithm to do what you need done. If you're using it one place, there's a pretty good chance you can use it two or three more, at which point it can be a big win in clarity.
Writing generic algorithms isn't all that hard -- in fact, it's often almost no extra work compared to writing a loop inline, so even if you can only use it twice you're already saving a little bit of work, even if you ignore the improvement in the code's readability and clarity.
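As one example of a <numeric> algorithm put to non-numeric use, here is a sketch that joins strings with std::accumulate. The join function is a made-up helper, not something from the standard library:

```cpp
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

// Joins words with a separator using std::accumulate from <numeric>.
std::string join(const std::vector<std::string>& words, const std::string& sep) {
    if (words.empty()) return std::string();
    // Start from the first word and append "sep + word" for each remaining one.
    return std::accumulate(
        std::next(words.begin()), words.end(), words.front(),
        [&sep](std::string acc, const std::string& w) {
            acc += sep;
            acc += w;
            return acc;
        });
}
```

Whether this reads better than a four-line loop is exactly the judgment call described above. Wrapping it in a named function is what makes it reusable.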

STL algorithms should be used whenever they fit what you need to do. Which is almost all the time.

When should the STL algorithms be used instead of using your own?
When you value your time and sanity and have more fun things to do than reinventing the wheel again and again.
You need to use your own algorithms when project demands it, and there are no acceptable alternatives to writing stuff yourself, or if you identified STL algorithm as a bottleneck (using profiler, of course), or have some kind of restrictions STL doesn't conform to, or adapting STL for the task will take longer than writing algorithm from scratch (I had to use twisted version of binary search few times...). STL is not perfect and isn't fit for everything, but when you can, you should use it. When someone already did all the work for you, there is frequently no reason to do the same thing again.

I write performance critical applications. These are the kinds of things that need to process millions of pieces of information in as fast a time as possible. I wouldn't be able to do some of the things that I do now if it weren't for STL. Use them always.

There are many good algorithms besides stuff like std::for_each.
There are lots of non-trivial and very useful algorithms:
Sorting and searching sorted ranges: std::sort, std::upper_bound, std::lower_bound, std::binary_search
Min/Max and partitioning: std::max, std::min, std::partition, std::min_element, std::max_element
Search like std::find, std::find_first_of etc.
And many others.
Algorithms like std::transform are much more useful with C++0x lambda expressions or stuff like boost::lambda or boost::bind.
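For instance, a sketch of std::transform with a lambda (the squared function is just an illustrative wrapper):

```cpp
#include <algorithm>
#include <vector>

// Returns a new vector holding the square of each input element.
std::vector<int> squared(const std::vector<int>& input) {
    std::vector<int> output(input.size());
    std::transform(input.begin(), input.end(), output.begin(),
                   [](int x) { return x * x; });
    return output;
}
```

Without lambdas the same call needs a separately defined functor or a bind expression, which is where most of the verbosity used to come from.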

If I had to write something due this afternoon, and I knew how to do it using hand-made loops and would need to figure out how to do it in STL algorithms, I would write it using hand-made loops.
Having said that, I would work to make the STL algorithms a reliable part of my toolkit, for reasons articulated in the other answers.
--
One reason you might not see them in code is that the code is either legacy code or written by legacy programmers. We had about 20 years of C++ programming before the STL came out, and at that point we had a community of programmers who knew how to do things the old way and had not yet learned the STL way. This will likely remain for a generation.

Bear in mind that the STL algorithms cover a lot of bases, but most C++ developers will probably end up coding something that does something equivalent to std::find(), std::find_if() and std::max() almost every day of their working lives (if they're not using the STL versions already). By using the STL versions you separate the algorithm from both the logical flow of your code and from the data representation.
For other less commonly used STL algorithms such as std::merge() or std::lower_bound() these are hugely useful routines (the first for merging two sorted containers, the second for working out where to insert an item in a container to keep it ordered). If you were to try to implement them yourself then it would probably take a few attempts (the algorithms aren't complicated, but you'd probably get off-by-one errors or the like).
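A sketch of both routines in use. The wrapper names merge_sorted and insert_sorted are invented for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Merges two sorted vectors into one sorted vector.
std::vector<int> merge_sorted(const std::vector<int>& a, const std::vector<int>& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// Inserts value at the position that keeps v sorted.
void insert_sorted(std::vector<int>& v, int value) {
    v.insert(std::lower_bound(v.begin(), v.end(), value), value);
}
```

Each wrapper is a single library call; the half-open ranges and boundary cases that cause off-by-one bugs in hand-written versions are handled inside the algorithm.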
I myself use them every day of my professional career. Some legacy codebases that predate a stable STL may not use it as extensively, but if there's a newer project that is intentionally avoiding it I would be inclined to think it was by a part-time hacker who was still labouring under the mid-90's assumption that templates are slow and therefore to be avoided.

The only time I don't use STL algorithms is when the cross-platform implementation differences affect the outcome of my program. This has only happened in one or two rare cases (on the PlayStation 3). Although the interface of the STL is standardized across platforms, the implementation is not.
Also, in certain extremely high performance applications (think: video games, video game servers) we replaced some STL structures with our own to eke out a bit more efficiency.
However, the vast majority of the time using STL is the way to go. And in my other (non-video game) jobs, I used the STL exclusively.

The main problem with STL algorithms until now was that, even though the algorithm call itself is clearer, defining the functors that you'd need to pass to them would make your code longer and more complex, due to the way the language forced you to do it. C++0x is expected to change that, with its support for lambda expressions.
I've been using STL heavily for the past six years and although I tried to use STL algorithms anywhere I could, in most instances it would make my code more obscure, so I got back to a simple loop. Now with C++0x it is the opposite: the code seems to always look simpler with them.
The problem is that by now C++0x support is still limited to a few compilers, not least because the standard is not completely finished yet. So probably we will have to wait a few years to really see widespread use of STL algorithms in production code.

I would not use STL in two cases:
When STL is not designed for your task. STL is nearly the best for general purposes. However, for specific applications STL may not always be the best. For example, in one of my programs, I need a huge hash table while STL/tr1's hashmap equivalence takes too much memory.
When you are learning algorithms. I am one of the few who enjoy reinventing the wheel, and I learn a lot in the process. For that program, I reimplemented a hash table. It really took me a lot of time, but in the end all the efforts paid off. I have learned many things that greatly benefit my future career as a programmer.

When you think you can code it better than a really clever coder who spent weeks researching and testing and trying to cope with every conceivable set of inputs.
For most Earthlings the answer is never!

I want to answer the "when not to use STL" case, with a clear example.
(as a challenge to all of you, show me that you can solve this with any STL algorithm until C++17)
Convert a vector of integers std::vector<int> into a vector of std::pair<int,int>, i.e.:
Convert
std::vector<int> values = {1,2,3,4,5,6,7,8,9,10};
to
std::vector<std::pair<int,int>> values = { {1,2}, {3,4} , {5,6}, {7,8} ,{9,10} };
And guess what? It's impossible with any STL algorithm until C++17.
See the complete discussion on solving this problem here:
How can I convert std::vector<T> to a vector of pairs std::vector<std::pair<T,T>> using an STL algorithm?
So to answer your question: Use STL algorithm always only when it perfectly fits your problem. Don't hack an STL algorithm to make it fit your problem.
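For comparison, the plain-loop version of that conversion is short. This is only a sketch: the function name to_pairs is made up, and it assumes an even number of elements, as in the example above:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Groups consecutive elements into pairs: {1,2,3,4} -> {{1,2},{3,4}}.
std::vector<std::pair<int, int>> to_pairs(const std::vector<int>& values) {
    assert(values.size() % 2 == 0);  // assumption: even number of elements
    std::vector<std::pair<int, int>> pairs;
    pairs.reserve(values.size() / 2);
    for (std::size_t i = 0; i + 1 < values.size(); i += 2) {
        pairs.emplace_back(values[i], values[i + 1]);
    }
    return pairs;
}
```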

Are they used more frequently than it seems?
I've never seen them used, except in books. Maybe they're used in the implementation of the STL itself. Maybe they'll become more widely used because they get easier to use (see for example lambda functions and expressions), or even become obsolete (see for example the range-based for loop), in the next version of C++.
