I use a std::set to sort the values of an unordered vector that contains duplicates. Every time I find an element in my set, I also need to know the position (index) of that element. There are a lot of elements (hundreds of thousands) in my set, and using std::distance() gives me abysmal performance.
Is std::distance the only way to go?
You can sort the elements in place using the std::sort() algorithm. Then, when you locate an element in the sorted vector (use std::lower_bound() rather than std::binary_search(), since binary_search() only returns a bool, not an iterator), just subtract the result of a call to begin() from the iterator pointing to the element to get its index.
Another alternative is to use std::partial_sort_copy() if you don't want to overwrite your original vector. Just sort into another vector and you can do the same thing I described above.
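A minimal sketch of the first approach, with made-up values:

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> v{42, 7, 7, 19, 3};

        std::sort(v.begin(), v.end());            // 3 7 7 19 42

        // lower_bound returns an iterator to the first element not less than 19;
        // subtracting begin() yields the index in O(1) for random-access iterators.
        auto it = std::lower_bound(v.begin(), v.end(), 19);
        if (it != v.end() && *it == 19) {
            std::cout << "found at index " << (it - v.begin()) << '\n';  // 3
        }
    }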
Which function is more efficient for searching for an element in a vector: find() or binary_search()?
The simple answer is: std::find for unsorted data and std::binary_search for sorted data. But I think there's much more to this:
Both methods take as input a range [start, end) with n elements and a value x that is to be found. But note the important difference: std::binary_search() only returns a bool that tells you whether the range contained the element, while std::find() returns an iterator. So they have different, but overlapping, use cases.
std::find() is pretty straightforward: O(n) iterator increments and O(n) comparisons. It also doesn't matter whether the input data is sorted or not.
For std::binary_search() you need to consider multiple factors:
It only works on sorted data. You need to take the cost of sorting into account.
The number of comparisons is always O(log n).
If the iterators do not satisfy LegacyRandomAccessIterator, the number of iterator increments is O(n); when they do satisfy this requirement, it is logarithmic.
Conclusion (a bit opinionated):
when you operate on unsorted data, or need the location of the item you searched for, use std::find()
when your data is already sorted or needs to be sorted anyway and you simply want to check if an element is present or not, use std::binary_search()
If you are searching containers like std::set, std::map, or their unordered counterparts, also consider their built-in member functions such as std::set::find()
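To make the difference concrete, here is a small example (the data is made up): std::find yields the element's position on unsorted data, while std::binary_search only answers yes/no on sorted data.

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> unsorted{5, 1, 4, 2, 3};
        std::vector<int> sorted{1, 2, 3, 4, 5};

        // std::find: works on unsorted data, O(n), yields the element's position.
        auto it = std::find(unsorted.begin(), unsorted.end(), 4);
        if (it != unsorted.end())
            std::cout << "find: index " << (it - unsorted.begin()) << '\n';  // 2

        // std::binary_search: requires sorted data, O(log n) comparisons,
        // but only answers "is it there?".
        std::cout << std::boolalpha << "binary_search: "
                  << std::binary_search(sorted.begin(), sorted.end(), 4) << '\n';
    }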
When you are not sure whether the data is sorted, you have to use find(); if the data will be sorted, you should use binary_search().
For more information, you can refer to the documentation for find() and binary_search().
If your input is sorted, you can use binary_search, which takes O(log n) time. If your input is unsorted, you can use find, which takes O(n) time.
I find myself in a situation where I have to access a std::map by position. Since std::advance(iter, position) is slow af, I want to add a second data structure to speed up this operation.
My idea: Maintain a vector of iterators into the map, one for every key-value pair. Then access by position via vector[position]->second.
When erasing or inserting an element I obviously have to update the vector as well. But besides that, the iterator-preserving properties of std::map seem to be sufficient.
Question: Is this a good idea?
Alternative: Maintain a vector of the map's keys. Then access the vector by position and use the key to look up the value in the map: map[vector[position]]. Is this smarter?
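For reference, a minimal sketch of the first idea (the bookkeeping on insert/erase is left out, and the names are mine):

    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    int main() {
        std::map<int, std::string> m{{1, "one"}, {2, "two"}, {3, "three"}};

        // Secondary index: one map iterator per element, in key order.
        // Must be kept in sync whenever the map is modified.
        std::vector<std::map<int, std::string>::iterator> index;
        for (auto it = m.begin(); it != m.end(); ++it)
            index.push_back(it);

        // O(1) access by position instead of std::advance.
        std::cout << index[1]->second << '\n';  // "two"
    }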
If iteration through the map is your primary performance issue, then you should be using a flat_map (not part of the standard until C++23's std::flat_map, but Boost has a decent one). If you don't have one of those, just use a vector<pair> that you keep sorted using the same comparison function you would have used in your map.
You can use std::lower_bound as the equivalent function for being able to find a value by its key. You can also use the iterator returned from std::lower_bound as the position for doing a single-element insertion of a new element. Everything else works just like any other vector; simply keep it sorted and you'll be fine.
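A rough sketch of the sorted vector<pair> approach (key/value types and the comparator name are my own choices):

    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>

    int main() {
        // Kept sorted by key, just like a map would be.
        std::vector<std::pair<int, std::string>> flat{
            {1, "one"}, {3, "three"}, {5, "five"}};

        // Compare only the key when searching.
        auto key_less = [](const std::pair<int, std::string>& p, int key) {
            return p.first < key;
        };

        // Lookup by key: O(log n) comparisons.
        auto it = std::lower_bound(flat.begin(), flat.end(), 3, key_less);
        if (it != flat.end() && it->first == 3)
            std::cout << "found: " << it->second << '\n';

        // Single-element insertion at the position lower_bound reports.
        auto pos = std::lower_bound(flat.begin(), flat.end(), 4, key_less);
        flat.insert(pos, {4, "four"});

        // Access by position is trivial.
        std::cout << flat[2].second << '\n';  // "four"
    }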
std::map search, removal, and insertion operations have logarithmic complexity. The same search complexity can be achieved with a sorted std::vector and binary search.
Don't use a map, but a vector. Keep it sorted by key. Binary search by key is logarithmic, and access by position is the fastest possible.
The only drawback is that inserting and removing elements requires shifting (and possibly reallocating) the underlying storage. You may want to test its performance and decide whether it's worth it.
std::partition is nifty, but it is in-place; and std::partition_copy is also nice, but it takes two output iterators, i.e. you have to at least count in advance the number of elements satisfying the predicate if you want to use the same output array. Why isn't there an out-of-place std::partition, or a single-output-iterator std::partition_copy, in <algorithm>?
Presumably because the functionality can already be attained by either:
Copying the original container and using in-place partition.
Sizing the single destination container to the correct size and using the begin() and rbegin() iterators to fill it from the front and back with partition_copy.
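For illustration, the second option could look like this (the predicate and data are made up):

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> src{1, 2, 3, 4, 5, 6, 7, 8};
        std::vector<int> dst(src.size());  // sized up front so both "ends" fit

        auto is_even = [](int x) { return x % 2 == 0; };

        // Elements satisfying the predicate are written forward from begin(),
        // the rest backward from rbegin(), so one destination holds both
        // partitions without overlap.
        std::partition_copy(src.begin(), src.end(),
                            dst.begin(), dst.rbegin(), is_even);

        for (int x : dst) std::cout << x << ' ';  // 2 4 6 8 7 5 3 1
        std::cout << '\n';
    }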
Is there a way of inserting/deleting an element from a vector other than the following?
The formal method of using 'push_back'
Using 'find()' in this way... find(v.begin(), v.end(), int)
I have read somewhere that inserting in the middle can be achieved by inclusive insertion/deletion.
So, is it really possible?
You can use std::vector::insert; however, note that this operation is linear in the vector's size. If your code needs to perform insertions in the middle frequently, you may want to switch to a linked-list structure.
Is there a way of inserting/deleting an element from a vector other than the following
Yes, you can use std::vector::insert() to insert an element at a specified position.
Because vectors use an array as their underlying storage, inserting elements anywhere other than at the end forces the container to move all elements that were after the insertion position to their new positions. This is generally inefficient compared with the same operation on other kinds of sequence containers (such as std::list).
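A quick usage sketch (indices and values chosen arbitrarily):

    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> v{10, 20, 40};

        // Insert 30 before index 2; everything from index 2 onward shifts right.
        v.insert(v.begin() + 2, 30);   // 10 20 30 40

        // Erase the element at index 0; the remaining elements shift left.
        v.erase(v.begin());            // 20 30 40

        for (int x : v) std::cout << x << ' ';
        std::cout << '\n';
    }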
std::vector is a standard container, so you can apply standard STL algorithms to it.
vector::insert seems to be what you want.
Hey, right now I have a list of a struct that I made, and I sort this list every time I add a new object, using the std::list sort method.
I want to know which would be faster for this: a std::multimap or a std::list,
since I'm iterating the whole list every frame (I am making a game).
I would like to hear your opinions on what I should use in this case.
std::multimap will probably be faster, as it is O(log n) per insertion, whereas an insert and sort of the list is O(n log n).
Depending on your usage pattern, you might be better off with sorted vectors. If you insert a whole bunch of items at once and then do a bunch of reads -- i.e. reads and writes aren't interleaved -- then you'll have better performance with vector, std::sort, and std::binary_search.
You might consider using the lower_bound algorithm to find where to insert into your list. http://stdcxx.apache.org/doc/stdlibref/lower-bound.html
Edit: In light of Neil's comment, note that this will work with any sequence container (vector, deque, etc.)
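As a sketch, here is a sorted insert into a vector (the same idea works for deque or list, though for a list the search costs linear iterator steps):

    #include <algorithm>
    #include <iostream>
    #include <vector>

    // Insert value so that the container stays sorted.
    void sorted_insert(std::vector<int>& v, int value) {
        auto pos = std::lower_bound(v.begin(), v.end(), value);
        v.insert(pos, value);
    }

    int main() {
        std::vector<int> v{1, 3, 5, 7};
        sorted_insert(v, 4);
        for (int x : v) std::cout << x << ' ';  // 1 3 4 5 7
        std::cout << '\n';
    }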
If you do not need key/value pairs, std::set or std::multiset is probably better than std::multimap.
Reference for std::set:
http://www.cplusplus.com/reference/stl/set/
Reference for std::multiset:
http://www.cplusplus.com/reference/stl/multiset/
Edit: (seems like it was unclear before)
It is in general better to use a container like std::(multi)set or std::(multi)map than to use std::list and sort it again every time an element is inserted, because finding the insertion point in a std::list requires a linear traversal, and re-sorting the whole list after every insertion is far more expensive still.
Generally speaking, iterating over one container is likely to take about as much time as iterating over another, so if you keep adding to a container and then iterating over it, it's mainly a question of picking a container that avoids constantly reallocating memory and that inserts quickly in the way you want.
Both list and multimap will avoid having to reallocate themselves simply from adding an element (like you could get with a vector), so it's primarily a question of how long it takes to insert. Adding to the end of a list will be O(1) while adding to a multimap will be O(log n). However, the multimap will insert the elements in sorted order, while if you want to have the list be sorted, you're going to have to either sort the list in O(n log n) or insert the element in a sorted manner with something like lower_bound which would be O(n). In either case, it will be far worse (in the worst case at least) to use the list.
Generally, if you're maintaining a container in sorted order and continually adding to it rather than creating it and sorting it once, sets and maps are more efficient since they're designed to be kept sorted. Of course, as always, if you really care about performance, profile your specific application and see which works better. However, in this case, I'd say it's almost a guarantee that multimap will be faster (especially once you have more than a handful of elements).
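As a rough illustration of the set/multiset route suggested above (the struct and its ordering field are hypothetical):

    #include <iostream>
    #include <set>
    #include <string>

    // Hypothetical game object; the field used for ordering is arbitrary.
    struct Entity {
        int depth;  // draw order
        std::string name;
    };

    // Order entities by depth; a multiset allows duplicates.
    struct ByDepth {
        bool operator()(const Entity& a, const Entity& b) const {
            return a.depth < b.depth;
        }
    };

    int main() {
        std::multiset<Entity, ByDepth> entities;

        // O(log n) per insertion, and the container stays sorted automatically.
        entities.insert({2, "player"});
        entities.insert({1, "background"});
        entities.insert({2, "enemy"});

        // Iterating "every frame" visits elements in depth order.
        for (const Entity& e : entities)
            std::cout << e.depth << ' ' << e.name << '\n';
    }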