find() vs binary_search() in STL - c++

Which function is more efficient for searching for an element in a vector: find() or binary_search()?

The simple answer is: std::find for unsorted data and std::binary_search for sorted data. But I think there's much more to this:
Both methods take as input a range [start, end) with n elements and a value x to be found. Note the important difference, though: std::binary_search only returns a bool that tells you whether the range contained the element, whereas std::find() returns an iterator to it. So the two have different, but overlapping, use cases.
std::find() is pretty straightforward: O(n) iterator increments and O(n) comparisons, and it doesn't matter whether the input data is sorted or not.
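To make the difference in return types concrete, here is a minimal sketch. The helper names index_of and contains_sorted are hypothetical, chosen for illustration: the first uses std::find on unsorted data and turns the returned iterator into an index, while the second can only answer "present or not" and assumes its input is already sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Index of the first element equal to x, or -1 if absent.
// Works on unsorted data: std::find walks the range linearly.
std::ptrdiff_t index_of(const std::vector<int>& v, int x) {
    auto it = std::find(v.begin(), v.end(), x);
    return it == v.end() ? -1 : std::distance(v.begin(), it);
}

// Only answers "is x present?"; the range must already be sorted.
bool contains_sorted(const std::vector<int>& sorted, int x) {
    return std::binary_search(sorted.begin(), sorted.end(), x);
}
```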
For std::binary_search() you need to consider multiple factors:
It only works on sorted data. You need to take the cost of sorting into account.
The number of comparisons is always O(log n).
If the iterator does not satisfy LegacyRandomAccessIterator, the number of iterator increments is O(n); it is logarithmic only when the iterators do satisfy this requirement.
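A rough way to see the O(n) vs O(log n) comparison counts is to count them. This sketch assumes a sorted std::vector<int>; it uses std::find_if with a counting lambda as a stand-in for std::find (which compares with operator== and cannot be instrumented directly), and passes a counting "less than" comparator to std::binary_search. Exact counts depend on the standard library implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Number of element comparisons a linear scan makes before it finds x
// (or reaches the end). Up to v.size() comparisons.
std::size_t linear_comparisons(const std::vector<int>& v, int x) {
    std::size_t count = 0;
    (void)std::find_if(v.begin(), v.end(),
                       [&](int e) { ++count; return e == x; });
    return count;
}

// Number of comparisons std::binary_search makes on a sorted vector,
// measured through a counting "less than" comparator. Roughly log2(n) + 2.
std::size_t binary_comparisons(const std::vector<int>& sorted, int x) {
    std::size_t count = 0;
    (void)std::binary_search(sorted.begin(), sorted.end(), x,
                             [&](int a, int b) { ++count; return a < b; });
    return count;
}
```

With 1024 sorted consecutive values starting at 0, looking up 1000 takes 1001 comparisons with the linear scan and only about a dozen with std::binary_search.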
Conclusion (a bit opinionated):
when you operate on unsorted data, or need the location of the item you searched for, use std::find() (on sorted data, std::lower_bound() also gives you the location)
when your data is already sorted or needs to be sorted anyway and you simply want to check if an element is present or not, use std::binary_search()
If you want to search containers like std::set, std::map or their unordered counterparts, also consider their built-in member functions like std::set::find.

When you are not sure whether the data is sorted, you have to use find(); if the data will be sorted, you should use binary_search().
For more information, refer to the documentation for find() and binary_search().

If your input is sorted, you can use binary_search, as it takes O(log n) time. If your input is unsorted, use find, which takes O(n) time.

Related

Is find() function efficient for sets?

As far as I am concerned, binary search is the most efficient way to determine whether a certain element x exists in a sorted array. Thus, I was wondering whether it is a good idea to use the find() or count() functions to search for an element in a set, or whether it is more reasonable to use a sorted array rather than a set and apply binary search.
Yes it is efficient.
A set contains unique, sorted elements and is typically implemented as a balanced binary search tree. Therefore find() performs what amounts to a binary search down the tree and has O(log N) complexity in a set of N elements. Insertion is logarithmic too, since the set must stay sorted and unique.
set::find() is fairly efficient, O(log n).
If you don't need to access the elements in order, you should consider using an unordered_set. unordered_set::find() is O(1) on average.
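A minimal sketch of both options, assuming string keys (any comparable or hashable type works the same way). The helper names in_ordered and in_hashed are hypothetical; both just compare the result of the container's own find() against end().

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_set>

// Ordered set: find() is O(log N), and iteration visits keys in sorted order.
bool in_ordered(const std::set<std::string>& s, const std::string& key) {
    return s.find(key) != s.end();
}

// Hash set: find() is O(1) on average, but iteration order is unspecified.
bool in_hashed(const std::unordered_set<std::string>& s, const std::string& key) {
    return s.find(key) != s.end();
}
```

In C++20, both containers also offer contains(key), which expresses the same check directly.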

C++ - List with logarithmic read, insertion at given position

I'm looking for a data structure that behaves like a list, where I can insert an element at ANY given position and then read an element at ANY given position, with both insertion and reading in logarithmic time. Is there something like this in the standard library, or am I stuck writing it on my own (I know it can be implemented as a tree)?
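std::multiset behaves pretty much like the logarithmic std::list that you are looking for:
iteration is bidirectional
insertion and lookup by value are O(log N) (reading the element at a given index still means stepping through the elements one by one)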
Note, however (as pointed out by @SergeRogatch), that the "price" you pay for O(log N) lookup (instead of O(N) for a list) is that multiset orders elements as they are inserted. This behaves differently from std::list. It also means that your elements need to be comparable using std::less<>, or you need to provide your own comparator.
An alternative would be to use std::unordered_multiset (i.e. a hash table), which has O(1) element access on average, but then there is no deterministic order either. And again, your elements then need to be usable with std::hash<>, or you need to write your own hash function.
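The sketch below illustrates the caveats above: a std::multiset keeps duplicates but sorts them regardless of insertion order, and reading the k-th element walks the elements one at a time. The helpers insert_all and element_at are hypothetical, written only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Inserts values into a multiset and returns them in iteration order.
// The multiset keeps them sorted, so the caller's insertion order is lost.
std::vector<int> insert_all(std::multiset<int>& ms, const std::vector<int>& values) {
    for (int v : values) {
        ms.insert(v);  // O(log N) per insertion
    }
    return std::vector<int>(ms.begin(), ms.end());
}

// Reads the element at position k. Multiset iterators are bidirectional,
// so std::next takes k steps: this is O(k), not O(log N).
int element_at(const std::multiset<int>& ms, std::size_t k) {
    return *std::next(ms.begin(), static_cast<std::ptrdiff_t>(k));
}
```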

Optimal way to search a std::set

How should one search a std::set when speed is the critical criterion for his/her project?
set::find?
Complexity:
Logarithmic in size.
std::binary_search?
Complexity:
On average, logarithmic in the distance between first and last: Performs approximately log2(N)+2 element comparisons (where N is this distance).
On non-random-access iterators, the iterator advances produce themselves an additional linear complexity in N on average.
Or just a binary search implemented by him/her (like this one)? Or is the STL's one good enough?
Is there a way to answer this theoretically, or do we have to test it ourselves? If someone has, it would be nice if (s)he shared this information with us (if not, we are not lazy :) ).
The iterator type provided by std::set is a bidirectional_iterator, a category which does not require random access to elements, but only element-wise movements in both directions. All random_access_iterator's are bidirectional_iterators, but not vice versa.
Using std::binary_search on a std::set can therefore yield O(n) runtime as per the remarks you quoted, while std::set::find has guaranteed O(log n).
So to search a set, use set::find.
std::set does not provide random access iterators; its iterators are bidirectional. Even if it did, std::binary_search would access at least as many nodes as .find(), since .find() accesses only the ancestors of the target node.
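Both approaches give the same answer; only the cost differs. In this sketch (helper names are hypothetical), contains_member uses the set's own find(), while contains_generic hands the set's bidirectional iterators to std::binary_search, which is correct but has to step node by node to reach each midpoint.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// Preferred: follows one root-to-leaf path in the tree, O(log n).
bool contains_member(const std::set<int>& s, int x) {
    return s.find(x) != s.end();
}

// Correct but slow: std::set iterators are bidirectional, so every "jump to
// the middle" inside std::binary_search is done one step at a time, O(n) steps.
bool contains_generic(const std::set<int>& s, int x) {
    return std::binary_search(s.begin(), s.end(), x);
}
```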

How to efficiently insert a range of consecutive integers into a std::set?

In C++, I have a std::set that I would like to insert a range of consecutive integers. How can I do this efficiently, hopefully in O(n) time where n is the length of the range?
I'm thinking I'd use the input-iterator overload of std::set::insert, but am unclear on how to build the input iterator:
std::set<int> mySet;
// Insert [34 - 75):
mySet.insert(inputIteratorTo34, inputIteratorTo75);
How can I create the input iterator and will this be O(n) on the range size?
The efficient way of inserting already ordered elements into a set is to hint the library as to where the next element will be. For that you want to use the version of insert that takes an iterator:
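std::set<int>::iterator it = mySet.end();
for (int x : input) {
    // Since C++11 the hint means "insert just before this position", so keep
    // it pointing one past the element just inserted (needs <iterator>)
    it = std::next(mySet.insert(it, x));
}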
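For the concrete range in the question, a self-contained version of the hinted loop might look like this. insert_consecutive is a hypothetical name; the hint handling follows the C++11 rule that insertion with a hint is amortized constant when the element goes just before the hint.

```cpp
#include <cassert>
#include <iterator>
#include <set>

// Inserts the consecutive integers [first, last) into s.
// After each insertion the hint moves one past the element just inserted:
// that is where the next value belongs, so each insert is amortized O(1).
void insert_consecutive(std::set<int>& s, int first, int last) {
    auto hint = s.end();
    for (int x = first; x < last; ++x) {
        hint = std::next(s.insert(hint, x));
    }
}
```

Calling insert_consecutive(mySet, 34, 75) fills an empty set with the 41 values 34 through 74. If the set already holds values in that range, they are simply kept, as with any std::set insertion.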
On the other hand, you might want to consider other containers. Whenever possible, use std::vector. If the number of insertions is small compared to lookups, or if all inserts happen upfront, then you can build a vector, sort it and use lower_bound for lookups. In this case, since the input is already sorted, you can skip the sorting.
If insertions (or removals) happen all over the place, you might want to consider using std::unordered_set<int> which has an average O(1) insertion (per element) and lookup cost.
For the particular case of tracking integers that are all small (34 to 75 qualifies), you can also consider using a bitset or even a plain array of bool in which you set an element to true when its value is inserted. Either gives O(n) insertion for all n elements and O(1) per lookup, which is better than the set.
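A minimal sketch of the bitset variant. SmallIntSet is a hypothetical class, and the upper bound of 128 is an assumption that comfortably covers the 34 to 75 range; std::bitset::set and test throw std::out_of_range for positions at or beyond that bound.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Tracks membership of integers in [0, 128) with one bit per value.
// Marking a value and testing it are both O(1).
class SmallIntSet {
public:
    void insert(int x) { bits_.set(static_cast<std::size_t>(x)); }
    bool contains(int x) const { return bits_.test(static_cast<std::size_t>(x)); }
    // Inserts the half-open range [first, last): O(last - first).
    void insert_range(int first, int last) {
        for (int x = first; x < last; ++x) insert(x);
    }
private:
    std::bitset<128> bits_;
};
```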
A Boost way could be:
#include <boost/iterator/counting_iterator.hpp>
#include <set>
std::set<int> numbers(
    boost::counting_iterator<int>(0),
    boost::counting_iterator<int>(10));
A great LINK with other answers, especially @Mani's answer.
std::set is a type of balanced binary search tree, which means a single insertion costs O(log n). For range insertion, the documented complexity is:
C++98: If N elements are inserted, Nlog(size+N) in general, but linear in size+N if the elements are already sorted according to the same ordering criterion used by the container.
C++11: If N elements are inserted, Nlog(size+N). Implementations may optimize if the range is already sorted.
I think a C++98 implementation will track the current insertion node and check whether the next value to insert is larger than the current one, in which case there's no need to start from the root again.
In C++11 this is an optional optimization, so you may implement a skip-list structure and use this range-insert feature in your implementation, or you may optimize the program according to your scenario.
Taking the hint provided by aksham, I see the answer is:
#include <boost/iterator/counting_iterator.hpp>
std::set<int> mySet;
// Insert [34 - 75):
mySet.insert(boost::counting_iterator<int>(34),
             boost::counting_iterator<int>(75));
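If Boost is not available, one standard-library-only substitute (a different technique from counting_iterator, named plainly here) is to materialize the range in a temporary vector with std::iota and then use the same range overload of insert. insert_iota is a hypothetical helper; the trade-off is O(n) extra memory for the temporary vector.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <set>
#include <vector>

// Fills a temporary vector with first, first+1, ..., last-1 and range-inserts it.
// The values are already sorted, which is the case the quoted complexity favors.
void insert_iota(std::set<int>& s, int first, int last) {
    if (last <= first) return;  // empty range
    std::vector<int> values(static_cast<std::size_t>(last - first));
    std::iota(values.begin(), values.end(), first);
    s.insert(values.begin(), values.end());
}
```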
It's not clear why you specifically want to insert using iterators to specify a range.
However, I believe you can use a simple for-loop to insert with the desired O(n) complexity.
Quoting from cppreference's page on std::set, the complexity is:
If N elements are inserted, Nlog(size+N) in general, but linear in size+N if the elements are already sorted according to the same ordering criterion used by the container.
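So, using a for-loop with mySet.end() as the insertion hint (a plain mySet.insert(i) costs O(log n) per element; with a correct hint each insertion is amortized constant):
std::set<int> mySet;
for (int i = 34; i < 75; ++i)
    mySet.insert(mySet.end(), i);  // each i belongs just before end()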

Set find member vs. using find on list

Since the items in a Standard Library set container are sorted, will using the find member on the set, in general, perform faster than using the find algorithm on the same items in a sorted list?
Since the list is linear and the set is often implemented using a sorted tree, it seems as though the set-find should be faster.
With a linked list, even a sorted one, finding an element is O(n). A set can be searched in O(log n). Therefore yes, finding an element in a set is asymptotically faster.
A sorted array/vector can be searched in O(log n) by using binary search. Unfortunately, since a linked list doesn't support random access, the same method can't be used to search a sorted linked list in O(log n).
It's actually in the standard: std::set::find() has complexity O(log n), where n is the number of elements in the set. std::find() on the other hand is linear in the length of the search range.
If your generic search range happens to be sorted and has random access (e.g. a sorted vector), then you can use std::lower_bound() to find an element (or rather a position) efficiently.
Note that std::set comes with its own member lower_bound(), which works the same way. Having an insertion position may be useful even in a set, because insert() with a correct hint has amortized O(1) complexity.
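A minimal sketch of both uses of lower_bound described above. find_sorted and insert_if_absent are hypothetical helpers: the first returns a position in a sorted vector, and the second reuses the set's member lower_bound() result as the insertion hint, because a new x belongs just before the element lower_bound() returns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Position of x in a sorted vector, or -1 if absent. std::lower_bound returns
// the first element not less than x, so equality must still be checked.
std::ptrdiff_t find_sorted(const std::vector<int>& v, int x) {
    auto it = std::lower_bound(v.begin(), v.end(), x);
    return (it != v.end() && *it == x) ? it - v.begin() : -1;
}

// Inserts x only if absent, using the member lower_bound() result as the hint.
bool insert_if_absent(std::set<int>& s, int x) {
    auto it = s.lower_bound(x);                   // O(log n)
    if (it != s.end() && *it == x) return false;  // already present
    s.insert(it, x);                              // correct hint: amortized O(1)
    return true;
}
```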
You can generally expect a find operation to be faster on a Set than on a List, since lists have linear access (O(n)), while sets may have near-constant access for HashSets (O(1)) or logarithmic access for TreeSets (O(log n)).
set::find has a complexity of O(log(n)), while std::find has a complexity of O(n). This means that std::set::find() is asymptotically faster than std::find(std::list), but that doesn't mean it is faster for any particular data set, or for any particular search.
I found this article helpful on the topic. http://lafstern.org/matt/col1.pdf
You could reconsider your requirements for just a "list" vs. a "set". According to that article, if your program consists primarily of a bunch of insertions at the start and, after that, only lookups of what you have stored, then you are better off adding everything to a vector, calling std::sort(vector.begin(), vector.end()) once, and then using lower_bound. In my particular application, I load a list of names from a text file when the program starts up, and then during program execution I determine whether a user is in that list. If they are, I do something; otherwise, I do nothing. In other words, I had a single discrete insertion phase, then I sorted, and after that I used std::binary_search(vector.begin(), vector.end(), username) to determine whether the user is in the list.
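The "insert everything, sort once, then only query" pattern from that answer can be sketched as a small class. NameList is a hypothetical name, and reading the names from the text file is omitted: the sketch assumes they have already been collected into a std::vector<std::string>.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One O(n log n) sort up front, then O(log n) std::binary_search per query.
class NameList {
public:
    explicit NameList(std::vector<std::string> names) : names_(std::move(names)) {
        std::sort(names_.begin(), names_.end());
    }
    bool contains(const std::string& username) const {
        return std::binary_search(names_.begin(), names_.end(), username);
    }
private:
    std::vector<std::string> names_;  // kept sorted after construction
};
```

Compared with a std::set, the sorted vector stores the names contiguously with no per-node overhead, which is the main argument the cited article makes for this pattern when insertions all happen up front.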