As far as I am concerned, binary search is the most efficient way to determine whether a certain element x exists in a sorted array. So I was wondering whether it is a good idea to use the find() or count() functions of a set to search for an element, or whether it is more reasonable to use a sorted array instead of a set and apply binary search.
Yes, it is efficient.
A set contains unique, sorted elements, so find() effectively performs a binary search (in typical implementations, a descent down a balanced binary search tree) and has O(log N) complexity for a set of N elements. Insertion is logarithmic too, because the set has to stay sorted and unique.
set::find() is fairly efficient, O(log n).
If you don't need to access the elements in order, you should consider using an unordered_set. unordered_set::find() is O(1) on average.
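A minimal sketch of both lookups, assuming a set of int; the helper names contains_ordered and contains_unordered are hypothetical and only for illustration. Both use the container's own find() member rather than the generic std::find algorithm, which would be linear.

```cpp
#include <set>
#include <unordered_set>

// O(log n): std::set::find searches the ordered tree.
bool contains_ordered(const std::set<int>& s, int x) {
    return s.find(x) != s.end();  // s.count(x) != 0 is equivalent for a set
}

// O(1) on average (O(n) worst case): std::unordered_set::find hashes x.
bool contains_unordered(const std::unordered_set<int>& s, int x) {
    return s.find(x) != s.end();
}
```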
Which function is more efficient for searching for an element in a vector: find() or binary_search()?
The simple answer is: std::find for unsorted data and std::binary_search for sorted data. But I think there's much more to this:
Both functions take as input a range [start, end) with n elements and a value x to be found. Note the important difference, though: std::binary_search only returns a bool telling you whether the range contained the element, whereas std::find() returns an iterator. So the two have different, but overlapping, use cases.
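To make the difference in return types concrete, here is a small sketch on a std::vector<int>; index_of and sorted_contains are hypothetical helper names, not standard functions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// std::find: works on any range, sorted or not, and returns an iterator,
// so the position of the match is available. O(n) comparisons.
std::ptrdiff_t index_of(const std::vector<int>& v, int x) {
    auto it = std::find(v.begin(), v.end(), x);
    return it == v.end() ? -1 : it - v.begin();
}

// std::binary_search: requires a sorted range and only answers yes/no.
// O(log n) comparisons.
bool sorted_contains(const std::vector<int>& sorted, int x) {
    return std::binary_search(sorted.begin(), sorted.end(), x);
}
```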
std::find() is pretty straightforward: O(n) iterator increments and O(n) comparisons. It also doesn't matter whether the input data is sorted or not.
For std::binary_search() you need to consider multiple factors:
It only works on sorted data. You need to take the cost of sorting into account.
The number of comparisons is always O(log n).
If the iterators do not satisfy LegacyRandomAccessIterator, the number of iterator increments is O(n); when they do satisfy this requirement, it is logarithmic.
Conclusion (a bit opinionated):
when you operate on unsorted data, or need the location of the item you searched for, you must use std::find()
when your data is already sorted (or needs to be sorted anyway) and you simply want to check whether an element is present, use std::binary_search()
If you want to search containers like std::set, std::map or their unordered counterparts, also consider their built-in member functions, such as std::set::find
When you are not sure whether the data is sorted, you have to use find(); if the data is sorted, you should use binary_search().
For more information, refer to the documentation of find() and binary_search().
If your input is sorted, you can use binary_search, which takes O(log n) time. If your input is unsorted, you can use find, which takes O(n) time.
Is it possible to perform binary search on a doubly linked list in Θ(log n) time?
My answer is yes, because if the list is already somewhat ordered, it could be faster than O(n).
To do a binary search on a doubly linked list, you first have to iterate to the halfway point of the list so that you can do your first recursion on the two halves.
Iterating to the halfway point of a linked list is already an O(n) operation, since the time it takes grows linearly with the length of the list.
So you're already at O(n) time, even before you've done any actual searching. Hence, the answer is no.
As you asked the question, the answer is no. You cannot get O(log n) time on a linked list: traversal is linear, so in general it cannot be better than O(n). Worse, binary search would be slower than a linear scan in that case, since it must iterate repeatedly to "jump" around. It would be better to do a single linear scan to find the element.
However, the C++ standard specifies the following complexity for the std::lower_bound algorithm (which performs a binary search):
[lower.bound]
Complexity: At most log2(last - first) + O(1) comparisons and projections.
That is, it counts element comparisons, not time, if you measure time by the number of iterator advancements. It finds the proper position by calling std::advance() on an iterator many times, and each call to advance is paired with one call to the comparator. On random-access containers each advance is constant time, but on a list each call is linear in the distance moved, so the advancements add up to O(N) in total even though the comparisons stay logarithmic.
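One way to see what the standard is counting is to wrap the comparator in a counter. This sketch assumes a std::list<int> holding 0..n-1, and comparisons_in_list_search is a hypothetical helper. It only measures comparisons; the iterator advancements inside std::lower_bound are not visible here and remain linear for a list.

```cpp
#include <algorithm>
#include <list>
#include <numeric>

// Hypothetical helper: counts how many times std::lower_bound calls its
// comparator while searching a sorted std::list<int> of size n.
int comparisons_in_list_search(int n, int target) {
    std::list<int> values(n);
    std::iota(values.begin(), values.end(), 0);  // 0, 1, ..., n-1 (sorted)

    int comparisons = 0;
    auto pos = std::lower_bound(values.begin(), values.end(), target,
                                [&comparisons](int element, int value) {
                                    ++comparisons;
                                    return element < value;
                                });
    (void)pos;  // only the comparison count matters here
    return comparisons;
}
```

With common library implementations, n = 1024 should give at most log2(1024) + 1 = 11 comparisons for any target, even though a std::list only has bidirectional iterators.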
That's why it is always so important to be clear what big-oh notation is measuring. Often the comparisons are a proxy for time, but not always!
I'm looking for a data structure that behaves like a list, where we can insert an element at ANY given position and then read an element at ANY given position, with both insertion and reading in logarithmic time. Is there something like this in the standard library, or am I stuck writing it on my own (I know it can be implemented as a tree)?
std::multiset behaves pretty much like the logarithmic std::list that you are looking for:
iteration is bidirectional
insertion and lookup by value are O(log N)
Note, however (as pointed out by @SergeRogatch), that the "price" you pay for O(log N) lookup (instead of O(N) for a list) is that multiset orders the elements as they are inserted. This behaves differently from std::list. It also means that your elements need to be comparable using std::less<>, or you need to provide your own comparator.
An alternative would be std::unordered_multiset (i.e. a hash table), which has average-case O(1) element access, but then there is no deterministic order either. And again, your elements then need to be usable with std::hash<>, or you need to write your own hash function.
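A short sketch, assuming int elements (make_sorted_bag and kth_element are hypothetical helpers). It also shows the catch for the original question: multiset gives logarithmic lookup by value, but reaching the element at a given position with std::next still walks the iterators one step at a time, which is linear in that position.

```cpp
#include <cstddef>
#include <iterator>
#include <set>

// std::multiset keeps duplicates and orders elements as they are inserted;
// each insertion is O(log N).
std::multiset<int> make_sorted_bag() {
    return std::multiset<int>{5, 1, 3, 1};  // stored in order as 1, 1, 3, 5
}

// Reading the element at position k: std::next on a bidirectional
// iterator takes k steps, so this is O(k), not O(log N).
// Precondition (assumed, not checked): k < bag.size().
int kth_element(const std::multiset<int>& bag, std::ptrdiff_t k) {
    return *std::next(bag.begin(), k);
}
```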
I have an array and I need to insert items into it as fast as possible. Before adding an item, I need to check whether it already exists, so I do a full array scan. I can't use binary search, since I can't sort the array after every insert.
Is there a more efficient data structure for this job?
Edit: In that array I store strings. Next to each string I store a 4-byte hash. I first compare the hashes and, if they are equal, then the strings.
std::unordered_map (usually implemented as a hash table) gives you the best insert/search time (O(1) on average) but neither preserves nor provides any order.
std::map gives you O(log n) search and insert; it keeps the elements in a particular (sorted) order, not the order in which you inserted them, and is usually implemented as a balanced tree.
Custom balanced search trees are another option if you need sorted order and fast (O(log n)) insert/search.
A sorted std::vector (rather than a plain array, so you can add items) is another option if O(n) insertion time is acceptable but you need the smallest memory footprint and O(log n) search time. You'd need to insert each item at its sorted position, which is O(n) because the rest of the array has to be shifted.
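A minimal sketch of the sorted-vector approach for the strings in the question, assuming the vector is kept sorted from the start; insert_sorted_unique is a hypothetical helper. std::lower_bound finds the slot in O(log n), and vector::insert then shifts the tail, which is the O(n) part.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Inserts s into an already-sorted vector, keeping it sorted and free of
// duplicates. Returns false if s was already present.
bool insert_sorted_unique(std::vector<std::string>& v, const std::string& s) {
    auto pos = std::lower_bound(v.begin(), v.end(), s);  // first element >= s
    if (pos != v.end() && *pos == s) return false;        // already present
    v.insert(pos, s);  // shifts later elements one slot to the right
    return true;
}
```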
If you need to preserve the original order, you are stuck with O(n) for both insert and search if you use just an array (std::vector).
You can use a separate std::unordered_map/std::unordered_set in addition to the std::vector to add an "is already present" check, gaining speed at the price of roughly 2-3x the memory and the need to update two structures when adding items. This array+hash table combination gives you O(1) average-case insertion (including the presence check) and O(1) average-case search.
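A sketch of the vector plus hash set combination, assuming std::string items; OrderedUniqueStrings is a hypothetical type. Every string is stored twice (once in each container), which is where the extra memory goes.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper type: keeps strings in insertion order while
// answering "already present?" in O(1) on average.
struct OrderedUniqueStrings {
    std::vector<std::string> items;        // preserves the original order
    std::unordered_set<std::string> seen;  // membership check

    // Returns false for duplicates; O(1) on average (amortized for the vector).
    bool add(const std::string& s) {
        if (!seen.insert(s).second) return false;  // insert fails if present
        items.push_back(s);
        return true;
    }

    bool contains(const std::string& s) const {
        return seen.count(s) != 0;
    }
};
```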
Since the items in a Standard Library set container are sorted, will using the find member on the set, in general, perform faster than using the find algorithm on the same items in a sorted list?
Since the list is linear and the set is often implemented using a sorted tree, it seems as though the set-find should be faster.
With a linked list, even a sorted one, finding an element is O(n). A set can be searched in O(log n). Therefore yes, finding an element in a set is asymptotically faster.
A sorted array/vector can be searched in O(log n) by using binary search. Unfortunately, since a linked list doesn't support random access, the same method can't be used to search a sorted linked list in O(log n).
It's actually in the standard: std::set::find() has complexity O(log n), where n is the number of elements in the set. std::find() on the other hand is linear in the length of the search range.
If your generic search range happens to be sorted and has random access (e.g. a sorted vector), then you can use std::lower_bound() to find an element (or rather a position) efficiently.
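A minimal sketch, assuming a sorted std::vector<int>; sorted_index_of is a hypothetical helper. Unlike std::binary_search, std::lower_bound returns an iterator, so the position is available when the value is found.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the index of the first element equal to x in a sorted vector,
// or -1 if x is absent. O(log n) comparisons on a random-access range.
std::ptrdiff_t sorted_index_of(const std::vector<int>& sorted, int x) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), x);
    if (it == sorted.end() || *it != x) return -1;
    return it - sorted.begin();
}
```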
Note that std::set comes with its own member function lower_bound(), which works the same way. Having an insertion position may be useful even with a set, because insert() with a correct hint has amortized constant complexity.
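A sketch of using the member lower_bound() both as the membership test and as the insertion hint, assuming a std::set<int>; insert_with_hint is a hypothetical helper. The overall cost is still O(log n) because of the lower_bound call, but the hinted insert avoids a second tree search.

```cpp
#include <set>

// Returns true if x was newly inserted.
bool insert_with_hint(std::set<int>& s, int x) {
    auto hint = s.lower_bound(x);                     // first element >= x
    if (hint != s.end() && *hint == x) return false;  // already present
    // x belongs directly before hint, which is the case the standard
    // describes as amortized constant time for hinted insertion.
    s.insert(hint, x);
    return true;
}
```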
You can generally expect a find operation to be faster on a set than on a list, since lists have linear access (O(n)), while sets may have near-constant access for hash-based sets (O(1) on average, e.g. HashSet or std::unordered_set) or logarithmic access for tree-based sets (O(log n), e.g. TreeSet or std::set).
set::find has a complexity of O(log n), while std::find has a complexity of O(n). This means that std::set::find() is asymptotically faster than std::find() over a std::list, but that doesn't mean it is faster for any particular data set or any particular search.
I found this article helpful on the topic. http://lafstern.org/matt/col1.pdf
You could reconsider your requirements for just a "list" vs. a "set". According to that article, if your program consists primarily of a bunch of insertions at the start, followed only by lookups against what you have stored, then you are better off adding everything to a vector, calling std::sort(vector.begin(), vector.end()) once, and then using lower_bound. In my particular application, I load a list of names from a text file when the program starts up, and during program execution I determine whether a user is in that list. If they are, I do something; otherwise, I do nothing. In other words, I had a single discrete insertion phase, then I sorted, and after that I used std::binary_search(vector.begin(), vector.end(), username) to determine whether the user is in the list.
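A minimal sketch of that two-phase pattern, assuming the names have already been read into a std::vector<std::string> (the file loading is left out); NameList is a hypothetical class.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lookup table: one insertion phase, one sort, then only
// binary searches.
class NameList {
public:
    explicit NameList(std::vector<std::string> names) : names_(std::move(names)) {
        std::sort(names_.begin(), names_.end());  // sort once, O(n log n)
    }

    // O(log n) comparisons per query.
    bool contains(const std::string& username) const {
        return std::binary_search(names_.begin(), names_.end(), username);
    }

private:
    std::vector<std::string> names_;
};
```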