Find closest number in integer array - c++

Given an array of sorted integers, I want to find the closest value to a given number. Array may contain duplicate values and negative numbers.
An example :
Input :arr[] = {-5, 2, 5, 6, 7, 8, 8, 9};
Target number = 4
Output : 5
Which is the fastest algorithm? Binary search? The STL find algorithms?
Thanks for your help.

There is an algorithm in the std library that does almost exactly what you are asking for: std::lower_bound
Returns an iterator pointing to the first element in the range [first,
last) that is not less than (i.e. greater or equal to) value, or last
if no such element is found.
You can use this to find the first element that is equal to or higher than your target. The answer is either that number or the number that precedes it.
Check the following example:
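#include <algorithm>
#include <cstdlib>
#include <stdexcept>
#include <vector>

int find_closest(const std::vector<int>& A, const int a)
{
    if (A.empty())
        throw std::invalid_argument("empty array");
    // first element >= a, or end() if every element is smaller
    const auto lb = std::lower_bound(A.begin(), A.end(), a);
    int ans = lb != A.end() ? *lb : A.back();
    if (lb != A.begin()) {
        // the element just before lb is the only other candidate
        const auto prec = lb - 1;
        if (std::abs(ans - a) > std::abs(*prec - a))
            ans = *prec;
    }
    return ans;
}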
The complexity of this approach is logarithmic in the size of the input collection as lower_bound performs a binary search.
This is much faster than a naive solution in which you would loop over the whole collection and check every element one by one.
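As a variation on the idea above, here is a minimal sketch that returns an iterator instead of throwing on empty input, and that widens the differences to long long so that extreme values cannot overflow. The name closest_in_sorted is hypothetical, and the tie-breaking rule (the larger neighbour wins, as in the answer's code) is an assumption you may want to change.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: iterator to the element of v closest to target,
// or v.end() when v is empty. Assumes v is sorted in ascending order.
std::vector<int>::const_iterator closest_in_sorted(const std::vector<int>& v, int target)
{
    auto lb = std::lower_bound(v.begin(), v.end(), target);
    if (lb == v.begin())
        return lb;                       // empty range, or target <= first element
    auto prev = std::prev(lb);
    if (lb == v.end())
        return prev;                     // target is greater than every element
    // Widen before subtracting so the distances cannot overflow int.
    const long long d_prev = static_cast<long long>(target) - *prev;
    const long long d_next = static_cast<long long>(*lb) - target;
    return d_prev < d_next ? prev : lb;  // on a tie the larger neighbour wins
}
```

With the array from the question and a target of 4, the returned iterator points at 5.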

Related

There is a given element say N. How to modify Binary Search to find greatest element in a sorted vector which smaller than N

For example:
Let us have a sorted vector with elements: [1, 3, 4, 6, 7, 10, 11, 13]
And we have an element N = 5
I want output as:
4
Since 4 is the greatest element smaller than N.
I want to modify Binary Search to get the answer
What would you want to happen if there is an element that equals N in the vector?
I would use std::lower_bound (or std::upper_bound depending on the answer to the above question). It runs in logarithmic time which means it's probably using binary search under the hood.
std::optional<int> find_first_less_than(int n, std::vector<int> data) {
    // things must be sorted before processing
    std::sort(data.begin(), data.end());
    auto it = std::lower_bound(data.begin(), data.end(), n);
    // if no element is less than n, we'll return nullopt
    if (it == data.begin()) return std::nullopt;
    return *std::prev(it);
}
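Since the question asks how to modify binary search itself, here is a hand-written sketch of the same search without std::lower_bound. The name greatest_less_than is hypothetical, and the function assumes the vector is already sorted in ascending order.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: greatest element strictly less than n,
// or std::nullopt if there is none. Assumes data is sorted ascending.
std::optional<int> greatest_less_than(const std::vector<int>& data, int n)
{
    std::size_t lo = 0;
    std::size_t hi = data.size();        // search window is [lo, hi)
    // Invariant: every element before lo is < n, every element from hi on is >= n.
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (data[mid] < n)
            lo = mid + 1;                // data[mid] is still a candidate side
        else
            hi = mid;                    // data[mid] is too large
    }
    if (lo == 0)
        return std::nullopt;             // no element is smaller than n
    return data[lo - 1];
}
```

Changing the test from `data[mid] < n` to `data[mid] <= n` would return the greatest element that is less than or equal to n instead, which is the distinction raised in the comment about elements equal to N.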

Using comparator in upper_bound STL

I'm trying to make a program that gives the last element that is less than or equal to our given value.
According to the definition of lower_bound, it gives the first element that is greater than or equal to the given key value passed. I created a comparator function
bool compare(int a, int b) {
    return a <= b;
}
This was passed in my lower bound function:
int largest_idx = lower_bound(ic, ic + n, m, compare)
On execution, it was giving me the last element which was less than or equal to my m (key value). Isn't this the opposite of how lower_bound works? Lower bound is supposed to give me the first value for my comparison, or does the comparator actually change that?
If you want to turn "first" into "last", you have two options. First, you can use std::upper_bound and then take the previous element (see below). Second, you can use reverse iterators:
const auto pos = std::lower_bound(
    std::make_reverse_iterator(ic + n),
    std::make_reverse_iterator(ic), m, compare);
where compare is
bool compare(int a, int b) {
    return b < a;
}
With this comparator, std::lower_bound() returns the iterator pointing to the first element that is not greater than (i.e. less than or equal to) m. On the reversed range this is equivalent to returning the iterator pointing to the last element satisfying this criterion in the original range.
Simple example:
int ic[] = {1, 3, 3, 5};
//          pos
// m = 1    ^
// m = 2    ^
// m = 3          ^
How do I modify that search criteria (change <= to something else)?
std::lower_bound finds the first element in the range (partitioned by the comparator into true, ..., true, false, ... false), for which the comparator returns false. If your criterion can be rephrased in this language, you can use std::lower_bound.
Suppose we have a range 1 3 3 5 and we replace < with <= (your version of compare). Then we have:
          1 3 3 5
m = 2     T F F F
m = 3     T T T F
m = 4     T T T F
For m = 3 and m = 4, std::lower_bound will return the iterator to 5, i.e. past the last 3. In other words, std::lower_bound with default < being replaced with <= is exactly what std::upper_bound with default < is. You can advance the resulting iterator by -1 to get the last element (but be careful about corner cases like m = 0 in this example).
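The "true, ..., true, false, ..., false" view above maps directly onto std::partition_point, which takes the predicate explicitly. The following is a minimal sketch under that reading; the name last_index_at_most and the convention of returning -1 when no element qualifies are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: index of the last element <= m in the sorted array
// ic[0..n), or -1 if every element is greater than m.
std::ptrdiff_t last_index_at_most(const int* ic, std::size_t n, int m)
{
    // First position where "x <= m" stops being true.
    const int* it = std::partition_point(ic, ic + n,
                                         [m](int x) { return x <= m; });
    // Stepping back by one gives the last element that still satisfies it.
    return (it - ic) - 1;
}
```

For the range 1 3 3 5 this gives index 0 for m = 1 and m = 2, index 2 for m = 3 and m = 4, and -1 for the corner case m = 0.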
How do I change whether I want the first or last element?
It always returns the first element for which the comparator returns false. You can either reverse the range or find the first element that follows the one you want to find.
The comparator must not check for equality; use a strict less-than.
Also the data shall already be sorted or must at least be partitioned according to the comparator.
cf. https://www.cplusplus.com/reference/algorithm/lower_bound/

Find out in linear time whether there is a pair in sorted vector that adds up to certain value

Given an std::vector of distinct elements sorted in ascending order, I want to develop an algorithm that determines whether there are two elements in the collection whose sum is a certain value, sum.
I've tried two different approaches with their respective trade-offs:
I can scan the whole vector and, for each element in the vector, apply binary search (std::lower_bound) on the vector for searching an element corresponding to the difference between sum and the current element. This is an O(n log n) time solution that requires no additional space.
I can traverse the whole vector and populate an std::unordered_set. Then, I scan the vector and, for each element, I look up in the std::unordered_set for the difference between sum and the current element. Since searching on a hash table runs in constant time on average, this solution runs in linear time. However, this solution requires additional linear space because of the std::unordered_set data structure.
Nevertheless, I'm looking for a solution that runs in linear time and requires no additional linear space. Any ideas? It seems that I'm forced to trade speed for space.
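For reference, the two approaches described above could be sketched roughly as follows. The function names are hypothetical. The first sketch uses std::binary_search rather than std::lower_bound because only a yes/no answer is needed, and the second fills the set while scanning (one pass instead of two) so that an element is never paired with itself. Both assume that `sum - x` does not overflow.

```cpp
#include <algorithm>
#include <iterator>
#include <unordered_set>
#include <vector>

// Approach 1 (hypothetical name): O(n log n) time, no extra space.
// For each element, binary-search the rest of the vector for its complement.
bool has_pair_sum_bsearch(const std::vector<int>& v, int sum)
{
    for (auto it = v.begin(); it != v.end(); ++it) {
        // Search only to the right of it, so an element is not paired with itself.
        if (std::binary_search(std::next(it), v.end(), sum - *it))
            return true;
    }
    return false;
}

// Approach 2 (hypothetical name): O(n) time on average, O(n) extra space.
bool has_pair_sum_hashed(const std::vector<int>& v, int sum)
{
    std::unordered_set<int> seen;
    for (int x : v) {
        if (seen.count(sum - x) != 0)
            return true;                 // complement was seen earlier
        seen.insert(x);
    }
    return false;
}
```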
As the std::vector is already sorted and you can calculate the sum of a pair on the fly, you can achieve a linear time solution in the size of the vector with O(1) space.
The following is an STL-like implementation that requires no additional space and runs in linear time:
template<typename BidirIt, typename T>
bool has_pair_sum(BidirIt first, BidirIt last, T sum) {
    if (first == last)
        return false; // empty range
    for (--last; first != last;) {
        if ((*first + *last) == sum)
            return true; // pair found
        if ((*first + *last) > sum)
            --last; // decrease pair sum
        else // (*first + *last) < sum (trichotomy)
            ++first; // increase pair sum
    }
    return false;
}
The idea is to traverse the vector from both ends – front and back – in opposite directions at the same time and calculate the sum of the pair of elements while doing so.
At the very beginning, the pair consists of the elements with the lowest and the highest values, respectively. If the resulting sum is lower than sum, then advance first, the iterator pointing at the left end. Otherwise, move last, the iterator pointing at the right end, backward. This way, the resulting sum progressively approaches sum. If both iterators end up pointing at the same element and no pair whose sum is equal to sum has been found, then there is no such pair.
auto main() -> int {
    std::vector<int> vec{1, 3, 4, 7, 11, 13, 17};
    std::cout << has_pair_sum(vec.begin(), vec.end(), 2) << ' ';
    std::cout << has_pair_sum(vec.begin(), vec.end(), 7) << ' ';
    std::cout << has_pair_sum(vec.begin(), vec.end(), 19) << ' ';
    std::cout << has_pair_sum(vec.begin(), vec.end(), 30) << '\n';
}
The output is:
0 1 0 1
Thanks to the generic nature of the function template has_pair_sum() and since it just requires bidirectional iterators, this solution works with std::list as well:
std::list<int> lst{1, 3, 4, 7, 11, 13, 17};
has_pair_sum(lst.begin(), lst.end(), 2);
I had the same idea as the one in the answer of 眠りネロク, but with a slightly more comprehensible implementation.
bool has_pair_sum(const std::vector<int>& v, int sum){
    if(v.empty())
        return false;
    auto p1 = v.begin();
    auto p2 = v.end(); // one past the last element; must not be dereferenced
    --p2; // now it points to the last element
    while(p1 != p2){
        if(*p1 + *p2 == sum)
            return true;
        else if(*p1 + *p2 < sum){
            ++p1;
        }else{
            --p2;
        }
    }
    return false;
}
Since the array is already sorted, we can use the two-pointer approach. Keep a left pointer at the start of the array and a right pointer at the end. In each iteration, check whether the sum of the values at the two pointers equals the given sum; if it does, return. Otherwise we have to shrink the range by either advancing the left pointer or moving the right pointer back, so we compare the temporary sum with the given sum. If the temporary sum is greater than the given sum, advancing the left pointer would keep it the same or increase it, never decrease it, so we move the right pointer back to decrease the temporary sum and get closer to the given sum. Similarly, if the temporary sum is less than the given sum, moving the right pointer back would keep it the same or decrease it, so we advance the left pointer to increase the temporary sum. We repeat this until we find an equal sum or the two pointers meet.
below is the code for demonstration, let me know if something is not clear
bool pairSumExists(const vector<int> &a, int sum){
    if(a.empty())
        return false;
    int len = a.size();
    int left_pointer = 0, right_pointer = len - 1;
    while(left_pointer < right_pointer){
        if(a[left_pointer] + a[right_pointer] == sum){
            return true;
        }
        if(a[left_pointer] + a[right_pointer] > sum){
            --right_pointer; // sum too large: move the right end back
        }
        else{
            ++left_pointer; // sum too small: advance the left end
        }
    }
    return false;
}

More efficient way of counting the number of values within an interval?

I want to determine how many numbers of an input-array (up to 50000) lie in each of my given intervals (many).
Currently, I'm trying to do it with this algorithm, but it is far too slow:
Example-array: {-3, 10, 5, 4, -999999, 999999, 6000}
Example-interval: [0, 11] (inclusive)
Sort array - O(n * log(n)). (-999999, -3, 4, 5, 10, 6000, 999999)
Find min_index: array[min_index] >= 0 - O(n). (for my example, min_index == 2).
Find max_index: array[max_index] <= 11 - O(n). (for my example, max_index == 4).
If both indexes exist, then Result == max_index - min_index + 1 (for my example, Result = (4 - 2 + 1) = 3).
You have a good idea, but it needs one amendment: find the beginning and end of each interval in O(lg n) time using binary search. If n is the length of the array and q is the number of queries [a, b], the linear scans cost O(n lg n + q*n) in total; with binary search the total is O((n + q) lg n), where the n lg n comes from sorting the array.
The advantage of this solution is its simplicity, because C++ has std::lower_bound and std::upper_bound, and you can use std::distance to get the count. It's just a few lines of code.
If q is equal to n, this algorithm has O(n lg n) complexity. Could it be better? Not with comparisons alone, because the problem is equivalent to sorting, and comparison-based sorting is well known to need on the order of n lg n comparisons.
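The "few lines of code" mentioned above might look like the following sketch. The name count_in_interval is hypothetical, the interval [a, b] is treated as inclusive as in the question, and the vector is assumed to be sorted once up front.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: number of elements x with a <= x <= b.
// Assumes sorted is in ascending order; O(lg n) per call.
std::size_t count_in_interval(const std::vector<int>& sorted, int a, int b)
{
    if (a > b)
        return 0;                                            // empty interval
    auto first = std::lower_bound(sorted.begin(), sorted.end(), a);  // first x >= a
    auto last = std::upper_bound(first, sorted.end(), b);            // first x > b
    return static_cast<std::size_t>(std::distance(first, last));
}
```

For the sorted example array and the interval [0, 11] this returns 3. Duplicates need no special handling, because the two bounds bracket the whole run of equal values.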
There's a simple O(n_input * m_intervals) algorithm:
For ease of implementation, we use half-open intervals. Convert yours as needed.
1. Convert your intervals to half-open intervals (always prefer half-open intervals).
2. Save all limits in an array.
3. For all elements in the input:
   For all elements in the limits-array:
      Increment the count if the input is smaller than the limit.
4. Go through your intervals and get the answers by subtracting the counts for the corresponding limits.
For a slight performance boost, sort the limits-array in step 2.
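The steps above can be sketched as follows. The name count_by_limits and the layout of the limits array (lower and upper limit of interval i stored at positions 2*i and 2*i + 1) are assumptions made for illustration; each interval is half-open, [first, second), with first <= second.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: for each half-open interval [first, second),
// the number of input elements inside it. O(n_input * m_intervals).
std::vector<std::size_t> count_by_limits(
    const std::vector<int>& input,
    const std::vector<std::pair<int, int>>& intervals)
{
    // Save all limits in one array.
    std::vector<int> limits;
    for (const auto& iv : intervals) {
        limits.push_back(iv.first);
        limits.push_back(iv.second);
    }
    // below[j] = number of input elements smaller than limits[j].
    std::vector<std::size_t> below(limits.size(), 0);
    for (int x : input)
        for (std::size_t j = 0; j < limits.size(); ++j)
            if (x < limits[j])
                ++below[j];
    // Elements in [first, second) = (elements < second) - (elements < first).
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < intervals.size(); ++i)
        result.push_back(below[2 * i + 1] - below[2 * i]);
    return result;
}
```

The inclusive interval [0, 11] from the question would be passed as {0, 12}.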
Create a std::map of your numbers to their index in the sorted array.
From your example: map[-999999] = 0, map[-3] = 1, ... map[999999] = 6.
To find an interval, find the lowest number higher than or equal to the min (using map.lower_bound()), and find the first number higher than the max (using map.upper_bound()).
You can now subtract the lower index from the upper index to find the number of elements in that range in O(log n).
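A minimal sketch of this map-based idea follows. The names build_index_map and count_with_map are hypothetical. Two details are assumptions not spelled out above: duplicates keep the index of their first occurrence, and a bound that falls past the last key is treated as index n (the element count).

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: maps each distinct value to its first index
// in the sorted copy of data.
std::map<int, std::size_t> build_index_map(std::vector<int> data)
{
    std::sort(data.begin(), data.end());
    std::map<int, std::size_t> index_of;
    for (std::size_t i = 0; i < data.size(); ++i)
        index_of.emplace(data[i], i);    // emplace keeps the first index of a duplicate
    return index_of;
}

// Hypothetical helper: number of elements in the inclusive range [lo, hi].
// n is the number of elements the map was built from.
std::size_t count_with_map(const std::map<int, std::size_t>& index_of,
                           std::size_t n, int lo, int hi)
{
    if (lo > hi)
        return 0;
    auto first = index_of.lower_bound(lo);   // lowest key >= lo
    auto last = index_of.upper_bound(hi);    // lowest key > hi
    const std::size_t first_idx = (first == index_of.end()) ? n : first->second;
    const std::size_t last_idx = (last == index_of.end()) ? n : last->second;
    return last_idx - first_idx;
}
```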
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// intervals are half-open: [first, second); pass an inclusive [a, b] as {a, b + 1}
typedef std::pair<int,int> interval;
typedef std::map<interval,size_t> answers;
typedef std::vector<interval> questions;
// O((m+n)lg m)
answers solve( std::vector<int> const& data, questions const& qs ){
    // m = qs.size()
    // n = data.size()
    answers retval;
    std::vector<std::pair<int, size_t>> edges;
    edges.reserve( 2*qs.size() );
    // O(m) -- all starts and ends of intervals are in edges
    for ( auto const& q:qs ) {
        edges.emplace_back( q.first, 0 );
        edges.emplace_back( q.second, 0 );
    }
    // O(m lg m) -- sort
    std::sort(begin(edges),end(edges));
    // O(m) -- remove duplicates
    edges.erase(std::unique(begin(edges),end(edges)),end(edges));
    // compare an edge with a plain value, ignoring the stored count
    auto value_less = [](int v, std::pair<int,size_t> const& e){ return v < e.first; };
    auto edge_less = [](std::pair<int,size_t> const& e, int v){ return e.first < v; };
    // O(n lg m) -- charge each element to the first edge greater than it
    for(int x:data ){
        auto it = std::upper_bound( begin(edges), end(edges), x, value_less );
        if( it != end(edges) ) it->second++; // elements >= every edge lie in no interval
    }
    // O(m) -- prefix sums
    size_t accum = 0;
    for(auto& e:edges) {
        accum += e.second;
        e.second = accum;
    }
    // now edge (x,y) states that there are y elements < x.
    // O(m lg m) -- find the edges corresponding to each interval
    for(auto const& q:qs){
        auto low = std::lower_bound(begin(edges), end(edges), q.first, edge_less);
        auto high = std::lower_bound(begin(edges), end(edges), q.second, edge_less);
        retval.emplace(q, high->second - low->second);
    }
    return retval;
}
The total is O((n+m) lg m), where n is the integer count and m is the number of intervals.

What's the fastest way to find the number of elements in a sorted range?

Given a sorted list
1, 3, 5, 6, 9....
Is there a fast algorithm rather than O(n) to count the number of elements in a given range [a, b], assuming that all numbers are integers?
Here is an O(log n) algorithm: search for the two endpoints using binary search; the number of elements in the range is then basically the difference of the indices.
To get the exact number one needs to distinguish the cases where the endpoints of the range are in the array or not.
Since the list is sorted, you can find the location of a value (or, if the value is not in the list, where it should be inserted) in O(log(n)) time. You simply need to do this for both ends and subtract to get the count of elements in the range. It makes no difference whether the elements are integers; the list simply needs to be sorted.
You do need to be careful if the elements are not unique; in that case after finding a hit you may need to do a linear scan to the end of the sequence of repeated elements.
lower_bound and upper_bound operate on sorted containers.
First find the lower value in the range, then search from there to the end for the upper value. Implementations of the functions probably use binary search:
#include <algorithm>
#include <iterator>
#include <vector>
int main() {
    using std::vector;
    using std::upper_bound;
    using std::lower_bound;
    using std::distance;
    // a random-access container keeps both searches at O(log n) steps;
    // with std::list, advancing the iterators alone would take linear time
    vector<int> numbers = {1, 3, 5, 6, 9};
    int a = 3;
    int b = 6;
    auto lower = lower_bound(numbers.begin(), numbers.end(), a);
    auto upper = upper_bound(lower, numbers.end(), b);
    int count = distance(lower, upper); // 3 elements lie in [3, 6]: 3, 5, 6
    return 0;
}