Searching a std::vector for approximations of a value - c++

I have a vector,
std::vector<float> v;
and a float value, x. The vector v contains x+epsilon, where epsilon is very small (yet greater than the machine epsilon), but it doesn't contain x. Is there a way, using the STL, to find the index of x+epsilon in the vector?
Something like:
int i = alternative_find(v.begin(), v.end(), x, gamma) - v.begin();
which would return the indices of all the values in v that lie in [x-gamma, x+gamma]? I could implement a binary search function myself (I'd like to avoid linear time complexity), but I'd really like to know whether it can be done in an easier way.

Find the std::lower_bound, then the std::upper_bound, and you'll have your range.
From an iterator, you can obtain an index using std::distance (though stick with the iterator if you can!).
This assumes your data is sorted, but since you talk about binary searches that seems like a sensible assumption.
If it's not then you're going to have to examine every element anyway, in which case any approach is basically as good as another.

If you're talking about binary search, then obviously the vector is pre-sorted, which means you want to find the first element above x-gamma; if you then actually want to use the values, it's fastest to just keep incrementing while they remain in range. Check out lower_bound: http://en.cppreference.com/w/cpp/algorithm/lower_bound
If you just want to find the first and last, an alternative is to use upper_bound to binary search to its position, but that's likely slower than incrementing if there are a lot of elements and only a few match.

In C++11:
std::find_if(v.begin(), v.end(),
             [x, gamma](float f) { return f >= x - gamma && f <= x + gamma; });
Historically, you'd have to write your own predicate, and it would probably be simpler to use a regular for loop.
(Although, regarding your last sentence, if the vector is or can be sorted, then you can do a binary search with lower_bound and upper_bound as described in other answers).

Related

Why is std::find( s.begin(), s.end(), val ) 1000x slower than s.find(val) for a set<int> s?

I have recently started relearning C++ as I have not been coding in C++ for more than a decade. I have rarely used the STL, even when I worked at SGI, and I want to master it. I have ordered a book and I am currently running different online tutorials.
One tutorial introduced std::find(begin(),end(),value) and I was shocked at how slow it was in the test code I wrote. After doing some trial and error I found that s.find(value) was clearly what I should be using.
Why is the first find in the code so dramatically slow?
set<int> s;
for (int i = 0; i < 100000; i++)
    s.insert(rand());

for (int i = 0; i < 10000; i++) {
    int r = rand();
    // first find is about 1000x slower than the next one
    auto iter1 = std::find(s.begin(), s.end(), r);
    auto iter2 = s.find(r);
}
EDIT: added timing experiment results
@juanchopanza asked about timing in the comments, so I timed std::find() on set, list, and vector, as well as set::find().
(I only measured find - variation between runs was below 10%)
Vector performs much better than List or Set, but the specialized find from set wins with big data sets.
Elements    Vector    List      Set      | set.find()
10          0.0017    0.0017    0.0020   | 0.0017
100         0.0028    0.0051    0.0120   | 0.0019
1000        0.0105    0.0808    0.1495   | 0.0035
10000       0.0767    0.7486    2.7009   | 0.0068
100000      0.2572    2.4700    6.9636   | 0.0080
1000000     0.2674    2.5922    7.0149   | 0.0082
10000000    0.2728    2.6485    7.0833   | 0.0082
std::find is a generic algorithm that, given a pair of iterators, can find a value. If all it has been given is a pair of iterators, the best it can do is search linearly, which is O(n).
set::find is a member function of std::set, so it knows the data structure it's searching over and can optimise the search. Sorted, balanced trees have excellent search behaviour: O(log n).
To expand on my comment.
Because set::find has more information about the elements in the search range. It knows the set is (probably) implemented as a sorted binary tree and can search it in logarithmic time.
std::find, on the other hand, only gets two bidirectional iterators, so the best it can do is basically a for loop.
Even if the set returned random-access iterators, std::find would still be linear: it is specified as a linear search and cannot assume the range is sorted. A binary search such as std::lower_bound, however, could then be logarithmic.
EDIT: Corrected my wrong claims.
The first reason is that std::find is specified in terms of linear search, while std::set::find is specified in terms of logarithmic-time search.
But if you replaced std::find with std::equal_range, which does do a binary search, you'd find it is just as slow as std::find.
So I'll answer a better question than you asked:
Why is std::equal_range ridiculously slow on set iterators?
Well, there really isn't a great reason.
std::set iterators are bidirectional iterators. This means that they permit going forward one step, or backwards one step.
std::equal_range on bidirectional iterators is extremely slow, because it has to walk step by step through the range.
The std::set::find member, on the other hand, uses the tree structure of std::set to find the element really fast. It can, basically, get midpoints of a range really fast.
C++ does not expose this tree structure when you access std::set via its iterators. If it had, there might have been an operation like std::somewhere_between( start, finish ) which would in O(1) time get an iterator between start and finish, returning finish if no such iterator exists.
Such an operation is actually really cheap on the tree structure implementation of std::set.
However this operation doesn't exist. So std::equal_range( begin(set), end(set) ) is ridiculously slow.
Possibly not exposing an operation like std::somewhere_between for sorted associative containers makes some set/map implementations more efficient; many used special sentinel nodes for certain leaf cases, and you might need access to those nodes to binary-search the tree efficiently.
But I seriously doubt it is worth the lack of that operation. With it, you could work on a subsection of a std::set or std::map efficiently; without it, you get nothing.

Given sorted vector find transition from negative to positive

Given a sorted std::vector<int>, I would like, using C++11-STD functions, to find the index where the elements transition from negative to positive.
I am aware that I can implement this using a binary search, but I am interested in whether there is any function in the standard library, similar to the unary find_if, that would facilitate this search (perhaps in combination with the right lambda expression).
You should find the lower_bound of 0:
auto iter = std::lower_bound(vec.begin(), vec.end(), 0);
the resulting iterator will point to the earliest position where you can insert 0 without disrupting the ordering of elements. Similarly, upper_bound will return the right-most such iterator.
The runtime of the algorithm is O(log N).

Searching in a map: Member vs. non-member lower_bound

I need to look up two elements in a std::map. Since my map is sparse, I might not have an entry for every key, so I use lower_bound for lookup. For sake of simplicity, let's assume that I can always find two elements like this and that they are always distinct.
The quickest solution would of course be:
auto it1 = my_map.lower_bound(k1);
auto it2 = my_map.lower_bound(k2);
However, I know that the element with key k2 is located between begin and the element with key k1. Therefore, I was thinking of using std::lower_bound for the second lookup to avoid searching the full range again:
auto it1 = my_map.lower_bound(k1);
auto it2 = std::lower_bound(begin(my_map), it1, k2);
Any opinions on the second solution? Complexity-wise it should be better, but it's a lot less pleasant to look at than the original code and I'm wondering whether it is worth bothering at all. Also, should I expect any drawbacks due to the fact that I'm using the non-member lower_bound for the second call?
The primary drawback is that the non-member std::lower_bound has to rely on the bidirectional iterators that map provides. So while it performs O(log n) comparisons, it still has to perform O(n) iterator increments.
On the other hand, the member lower_bound() is aware of the internal structure of the map, which is typically some sort of balanced binary tree. This means that it's capable of traversing in ways the standard algorithm cannot.

How to efficiently insert a range of consecutive integers into a std::set?

In C++, I have a std::set into which I would like to insert a range of consecutive integers. How can I do this efficiently, hopefully in O(n) time, where n is the length of the range?
I'm thinking I'd use the input-iterator version of std::set::insert, but am unclear on how to build the input iterators.
std::set<int> mySet;
// Insert [34, 75):
mySet.insert(inputIteratorTo34, inputIteratorTo75);
How can I create the input iterator and will this be O(n) on the range size?
The efficient way to insert already-ordered elements into a set is to give the library a hint about where the next element will go. For that you want the version of insert that takes an iterator:
std::set<int>::iterator it = mySet.end();
for (int x : input) {
    it = mySet.insert(it, x);
}
On the other hand, you might want to consider other containers. Whenever possible, use std::vector. If the number of insertions is small compared to lookups, or if all inserts happen up front, you can build a vector, sort it, and use lower_bound for lookups. In this case, since the input is already sorted, you can skip the sorting.
If insertions (or removals) happen all over the place, you might want to consider std::unordered_set<int>, which has average O(1) insertion and lookup cost per element.
For the particular case of tracking a set of small numbers (34 to 75 are small), you can also consider a bitset or even a plain array of bool, setting elements to true when inserted. Either gives O(n) insertion (for all elements) and O(1) lookup (per element), which is better than the set.
A Boost way could be:
std::set<int> numbers(
    boost::counting_iterator<int>(0),
    boost::counting_iterator<int>(10));
A great link for the other answers, especially #Mani's answer.
std::set is a kind of binary search tree, which means a single insertion costs O(log n). The standard says:
C++98: If N elements are inserted, N log(size+N) in general, but linear
in size+N if the elements are already sorted according to the same
ordering criterion used by the container.
C++11: If N elements are inserted, N log(size+N). Implementations may
optimize if the range is already sorted.
I think a C++98 implementation will track the current insertion node and check whether the next value to insert is larger than the current one, in which case there's no need to start from the root again.
In C++11 this is an optional optimization, so an implementation may, for example, use a skip-list structure and exploit this range-insert feature; otherwise you can optimize your program according to your own scenario.
Taking the hint provided by aksham, I see the answer is:
#include <boost/iterator/counting_iterator.hpp>

std::set<int> mySet;
// Insert [34, 75):
mySet.insert(boost::counting_iterator<int>(34),
             boost::counting_iterator<int>(75));
It's not clear why you specifically want to insert using iterators to specify a range.
However, I believe you can use a simple for-loop to insert with the desired O(n) complexity.
Quoting from cppreference's page on std::set, the complexity is:
If N elements are inserted, Nlog(size+N) in general, but linear in size+N if the elements are already sorted according to the same ordering criterion used by the container.
So, using a for-loop:
std::set<int> mySet;
for (int i = 34; i < 75; ++i)
    mySet.insert(i);

Random element in STL set/map in log n

Since the C++ STL set/map are implemented as red-black trees, it should be possible to do not only insert, delete, and find in O(log n) time, but also getMin, getMax, and getRandom. As I understand it, the former two have their equivalents in begin() and end() (is that correct?). How about the last one? How can I do that?
The only idea I had so far was to use advance with a random argument, which however takes linear time...
EDIT: 'random' should refer to a uniform distribution
begin() is equivalent to a getMin operation, but end() returns an iterator one past the maximum, so it'd be rbegin().
As for getRandom: assuming you mean getting any item randomly with uniform probability, that might be possible in O(log n) time in a balanced tree augmented with subtree sizes (an order-statistic tree), but I don't see how to do it efficiently in a plain red-black tree. How would you know how many elements lie left and right of a given node without counting them in n/2 = O(n) time? And since std::set and std::map don't give direct access to their underlying tree, how are you going to traverse it?
I see three possible solutions:
use an order-statistic tree (a balanced tree augmented with subtree sizes) instead;
maintain a vector with the elements in the map or set parallel to it;
use a Boost::MultiIndex container with a sorted and a random-access view.
Edit: Boost.Intrusive might also do the trick.
Yes, begin and rbegin (not end!) are the minimum and maximum key value, respectively.
If your key is simple, e.g. an integer, you could just create a random integer in the range [min, max) (using <random>) and take the map's lower_bound for that.
As you suspect, begin() and either std::prev(end()) or rbegin() will get you the min and max values (end() - 1 won't compile, since the iterators are only bidirectional). I can't see any way to uniformly get a random element in such a tree, though. However, you have a couple of options:
You can do it in linear time using advance.
You can keep a separate vector of map iterators that you keep up to date on all insertions/deletions.
You could revisit the container choice. For example, would a sorted vector, heap, or some other representation be better?
If you have an even distribution of values in the set or map, you could choose a random value between the min and max and use lower_bound to find the closest value to it.
If insertions and deletions are infrequent, you can use a vector instead and sort it as necessary. Populating a vector and sorting it takes approximately the same amount of time as populating a set or map; it might even be faster, you'd need to test it to be sure. Selecting a random element would be trivial at that point.
I think you can actually do this with the STL, but it's a bit more complicated.
You need to maintain a map whose keys run from 1..N (N is the number of elements).
Each time you need a random element, generate a random number in 1..N, then find the element in the map with that key. This is the element you pick.
Afterwards, maintain the consistency of the map by finding the element with the biggest key and updating its key to the random number you just used.
Since each step is an O(log n) operation, the total time is O(log n).
With the existing STL there's probably no way. But you can get a random key in O(1) with an additional std::map and std::vector, using reverse indexing.
Maintain a map m and a vector v:
when inserting a new key k, let i = v.size(); insert (k, i) into m and push k onto v, so that v[i] = k;
when deleting a key k, let i = m[k]; look up the last element k2 of v, set m[k2] = i and v[i] = k2, pop_back v, and remove k from m;
to get a random key, let r = rand() % v.size(); the random key is k = v[r].
So the basic idea is to keep a contiguous array of all existing keys.