Merging a lot of containers with STL Algorithms - c++

I have a lot of lists, vectors, sets ... (what ever you prefer) of pointers called RealAlgebraicNumberPtr to certain class. They are sorted.
I want to merge them and of course I want to do it fast and efficient.
What is the best choice? std::merge ? Or maybe std::set? I can provide both an < and == ordering.
Any ideas?

As mentioned, std::merge is ok.
Only for std::list, you can profit from the optimization that std::list::merge member function implements: it splices the list nodes from the source into the target. That way, the source list will become empty, but it will avoid resource (re)allocation
Re: std::set
you could in fact std::merge into a std::set to get unique values in one go. With generic merge, duplicate values are not filtered, but the result is sorted, so you could apply std::unique to the result. If you expect a lot of duplicates, you might be quicker using a std::set

std::merge is as efficient as it gets. Which underlying container you use depends on your requirements. std::vector has the smallest memory-overhead of all standard containers, so if your data are large, you should stick with that.
If you use std::vector, you should resize the target-vector before merging to avoid reallocations (you should be able to calculate the required size up-front), instead of using an std::back_inserter.

Related

Should I use std::set or std::unordered_set for a set of pointers?

I have a set of pointers. In the first step, I insert data pointers, and in the second step, I iterate over the whole set and do something with the elements. The order is not important, I just need to avoid duplicates, which works fine with pointer comparison.
My question is, whether it might be advantageous to use an unordered set for the same purpose. Is insertion faster for an unordered set?
As Ami Tavory commented, if you don't need order, then it's usually best to go for unordered containers. The reason being that if order somehow improved performance, unordered containers would still be free to use it, and hence get the same or better complexity anyhow.
A downside of unordered collections is that they usually require a hash function for the key type. If it's too hard or expensive to make one, then containers which don't use hashes might be better.
In C++'s standard library, the average insertion complexity for std::set is O(log(N)), whereas for std::unordered_set it's O(1). Aside from that, there are probably less cache misses on average when using std::unordered_set.
At the end of the day though, this is just theory. You should try something that sounds good enough and profile it to see if it really is.

Prefer unordered_set over vector

Is it safe to say that if I don't want duplicates in my container, and I don't care about element position as I only want to iterate through the container, then I should use an unordered_set instead of vector?
Is it safe to say that if I don't want duplicates in my container, and I don't care about element position as I only want to iterate through the container, then I should use an unordered_set instead of vector?
No, it is not. It depends on many factors. For example if you seldom add new elements but iterate over container quite often it would be preferable to use std::vector and maintain uniqueness manually. There also could be other factors affecting your decision. But normally yes you may prefer std::unordered_set as it simplifies your program.
Not entirely. unordered_sets are not required to be contiguous containers; in the case where you'd frequently want to read all numerous values contained in the set, you may prefer std::vector on time-critic application.
std::unordered_set:
Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its value. This allows fast access to individual elements, since once a hash is computed, it refers to the exact bucket the element is placed into.
But in the general case, I'd say Yes.
I generally prefer vector or map. (or in your case, std::set).
Hash tables can be faster than maps/sets (red-black trees), but red-black trees have guaranteed performance 100% of the time. And logarithmic performance is REALLY fast! A hash table kan kill performance when it starts rehashing.
std::vector is the workhorse of the STL and should be your default choice. Vector is very straightforward, and is very cache-friendly
This article by Matt Austern is related to this topic and it is worth reading:
Why you shouldn't use set (and what you should use instead) by Matt Austern
This thread is trying to identify conditions under which unordered_set is preferable over vectors. Similarly, in the above article, the author clearly identifies four conditions, which all need to be satisfied in order to prefer set over a custom but simpler data structure called sorted_vector (last section: What is set good for?). It will be interesting to clearly state a set of conditions for preferring unordered_set over vector.
also, the last paragraph of the article summarizes a useful rule to keep in mind:
Every component in the standard C++ library is there because it's useful for some purpose, but sometimes that purpose is narrowly defined and rare. As a general rule you should always use the simplest data structure that meets your needs. The more complicated a data structure, the more likely that it's not as widely useful as it might seem.
Of course yes. If you do not want duplicates, you have to use a key-aware container, and since unordered_* totally win over their tree-based counterparts, this is pretty much your only choice.

C+11 Associative Container that keeps insertion order?

is there, in the c++ "Standard Library", any "Associative" (i.e. "Key-Value") Container/Data Structure, that has the ability, to preserve order, by order of insertion?
I have seen several topics on this, however, it seems, most before C++11.
Some suggest using "boost::multi_index", but, if at all possible, I would "rather" use standard containers/structures.
I see that C++11 has several, apparently, "unordered" associative containers :link.
Are any of these, by some way, "configurable", such that they are only sorted by insertion order?
Thanks!
C
No.
You are mixing linear access with random. Not very good bed fellows.
Just use both a vector/list (i.e. order of insertion) along with a map using an index into the former.
No; such capability was apparently sacrificed in the name of performance.
The order of equivalent items is required to be preserved across operations including rehashes, but there's no way to specify the original order. You could, in theory, use std::rotate or the like to permute the objects into the desired order after each insertion. Obviously impractical, but it proves the lack of capability is a little arbitrary.
Your best bet is to keep the subsequences in inner containers. You may use an iterator adaptor to iterate over such a "deep" container as if it were a single sequence. Such a utility can probably be found in Boost.
No.
In unordered maps too, there are not stored according to the order of insertion.
You can use vector to keep the track of the key!

Associating and iterating in C++

I've got a situation where I want to use an associative container, and I chose to use a std::unordered_map, because it's perfectly feasible that this container could be used to hold millions or more of elements. But now I also need to iterate in order. I considered having the value types link to each other in a list, but now I'm going to have issues with memory management.
Should I change container, say to a std::map? Or just iterate once through my unordered_map, insert into a vector, and sort, then iterate? It's pretty unlikely that I will need to iterate in an ordered fashion repeatedly.
Well, you know the O() of the various operations of the two alternatives you've picked. You should pick based on that and do a cost/benefit analysis based on where you need the performance to happen and which container does best for THAT.
Of course, I couldn't possibly know enough to do that analysis for you.
You could use Boost.MultiIndex, specifying the unordered (hashed) index as well as an ordered one, on the same underlying object collection.
Possible issues with this - there is no natural mapping from an existing associative container model, and it might be overkill if you don't need the second index all the time.

Dynamically sorted STL containers

I'm fairly new to the STL, so I was wondering whether there are any dynamically sortable containers? At the moment my current thinking is to use a vector in conjunction with the various sort algorithms, but I'm not sure whether there's a more appropriate selection given the (presumably) linear complexity of inserting entries into a sorted vector.
To clarify "dynamically", I am looking for a container that I can modify the sorting order at runtime - e.g. sort it in an ascending order, then later re-sort in a descending order.
You'll want to look at std::map
std::map<keyType, valueType>
The map is sorted based on the < operator provided for keyType.
Or
std::set<valueType>
Also sorted on the < operator of the template argument, but does not allow duplicate elements.
There's
std::multiset<valueType>
which does the same thing as std::set but allows identical elements.
I highly reccomend "The C++ Standard Library" by Josuttis for more information. It is the most comprehensive overview of the std library, very readable, and chock full of obscure and not-so-obscure information.
Also, as mentioned by 17 of 26, Effective Stl by Meyers is worth a read.
If you know you're going to be sorting on a single value ascending and descending, then set is your friend. Use a reverse iterator when you want to "sort" in the opposite direction.
If your objects are complex and you're going to be sorting in many different ways based on the member fields within the objects, then you're probably better off with using a vector and sort. Try to do your inserts all at once, and then call sort once. If that isn't feasible, then deque may be a better option than the vector for large collections of objects.
I think that if you're interested in that level of optimization, you had better be profiling your code using actual data. (Which is probably the best advice anyone here can give: it may not matter that you call sort after each insert if you're only doing it once in a blue moon.)
It sounds like you want a multi-index container. This allows you to create a container and tell that container the various ways you may want to traverse the items in it. The container then keeps multiple lists of the items, and those lists are updated on each insert/delete.
If you really want to re-sort the container, you can call the std::sort function on any std::deque, std::vector, or even a simple C-style array. That function takes an optional third argument to determine how to sort the contents.
The stl provides no such container. You can define your own, backed by either a set/multiset or a vector, but you are going to have to re-sort every time the sorting function changes by either calling sort (for a vector) or by creating a new collection (for set/multiset).
If you just want to change from increasing sort order to decreasing sort order, you can use the reverse iterator on your container by calling rbegin() and rend() instead of begin() and end(). Both vector and set/multiset are reversible containers, so this would work for either.
std::set is basically a sorted container.
You should definitely use a set/map. Like hazzen says, you get O(log n) insert/find. You won't get this with a sorted vector; you can get O(log n) find using binary search, but insertion is O(n) because inserting (or deleting) an item may cause all existing items in the vector to be shifted.
It's not that simple. In my experience insert/delete is used less often than find. Advantage of sorted vector is that it takes less memory and is more cache-friendly. If happen to have version that is compatible with STL maps (like the one I linked before) it's easy to switch back and forth and use optimal container for every situation.
in theory an associative container (set, multiset, map, multimap) should be your best solution.
In practice it depends by the average number of the elements you are putting in.
for less than 100 elements a vector is probably the best solution due to:
- avoiding continuous allocation-deallocation
- cache friendly due to data locality
these advantages probably will outperform nevertheless continuous sorting.
Obviously it also depends on how many insertion-deletation you have to do. Are you going to do per-frame insertion/deletation?
More generally: are you talking about a performance-critical application?
remember to not prematurely optimize...
The answer is as always it depends.
set and multiset are appropriate for keeping items sorted but are generally optimised for a balanced set of add, remove and fetch. If you have manly lookup operations then a sorted vector may be more appropriate and then use lower_bound to lookup the element.
Also your second requirement of resorting in a different order at runtime will actually mean that set and multiset are not appropriate because the predicate cannot be modified a run time.
I would therefore recommend a sorted vector. But remember to pass the same predicate to lower_bound that you passed to the previous sort as the results will be undefined and most likely wrong if you pass the wrong predicate.
Set and multiset use an underlying binary tree; you can define the <= operator for your own use. These containers keep themselves sorted, so may not be the best choice if you are switching sort parameters. Vectors and lists are probably best if you are going to be resorting quite a bit; in general list has it's own sort (usually a mergesort) and you can use the stl binary search algorithm on vectors. If inserts will dominate, list outperforms vector.
STL maps and sets are both sorted containers.
I second Doug T's book recommendation - the Josuttis STL book is the best I've ever seen as both a learning and reference book.
Effective STL is also an excellent book for learning the inner details of STL and what you should and shouldn't do.
For "STL compatible" sorted vector see A. Alexandrescu's AssocVector from Loki.