Find elements in a vector which lie within specified ranges - c++

I have a vector of integer elements in sorted order. An example is given below:
vector<int> A ={3,4,5,9,20,71,89,92,100,103,109,110,121,172,189,194,198};
Now given the following "start" and "end" ranges I want to find out which elements of vector A fall into the start and end ranges.
int startA=4; int endA=8;
int startB=20; int endB=99;
int startC=120; int endC=195;
For example,
elements lying in range startA and endA are: {4,5}
elements lying in range startB and endB are: {20,71,89,92}
elements lying in range startC and endC are: {121,172,189,194}
One way to do this is to iterate over all elements of "A" and check whether they lie between the specified ranges. Is there some other, more efficient way to find the elements in the vector that fall within a given range?

If the vector is sorted, as you have shown it to be, you can use binary search to locate the first element that is not less than the lower end of the range and the first element that is greater than the upper end of the range.
That makes locating the range O(log(N)).
You can use std::lower_bound and std::upper_bound, which require the range to be partitioned with respect to the searched value; a sorted vector satisfies that.
If the vector is not sorted, linear iteration is the best you can do.

If the vector is sorted all you need to do is to use dedicated functions to find your start range iterator and end range iterator - std::lower_bound and std::upper_bound. Eg.:
#include <vector>
#include <algorithm>
#include <iostream>
int main() {
std::vector<int> A ={3,4,5,9,20,71,89,92,100,103,109,110,121,172,189,194,198};
auto start = std::lower_bound(A.begin(), A.end(), 4);
auto end = std::upper_bound(A.begin(), A.end(), 8);
for (auto it = start; it != end; it++) {
std::cout << *it << " ";
}
std::cout << std::endl;
}
// or the std::copy version (works in VS2015u3; needs #include <iterator> for std::ostream_iterator)
int main() {
std::vector<int> A ={3,4,5,9,20,71,89,92,100,103,109,110,121,172,189,194,198};
std::copy(std::lower_bound(A.begin(), A.end(), 4),
std::upper_bound(A.begin(), A.end(), 8),
std::ostream_iterator<int>(std::cout, " "));
std::cout << std::endl;
}
This, however, works only if startX <= endX, so you may want to check that condition before running it with arbitrary numbers.
Finding the bound iterators with std::lower_bound and std::upper_bound costs O(log(N)). Note, however, that iterating over the resulting range is linear in the number of elements it contains, which is O(N) in the worst case because the range may cover the whole vector.
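To illustrate the startX <= endX check mentioned above, here is a minimal sketch. The helper name elements_in_range is made up for this example and is not part of the standard library; it assumes the vector is sorted in ascending order and treats both ends of the range as inclusive.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not from the answer above): returns a copy of the
// elements of the sorted vector `v` that lie in the closed range [lo, hi].
std::vector<int> elements_in_range(const std::vector<int>& v, int lo, int hi)
{
    if (lo > hi) {
        return {};  // reversed bounds: report an empty result instead of
                    // building a range whose begin lies after its end
    }
    auto first = std::lower_bound(v.begin(), v.end(), lo);  // first element >= lo
    auto last = std::upper_bound(first, v.end(), hi);       // first element > hi
    return std::vector<int>(first, last);
}
```

Called with the vector from the question, elements_in_range(A, 20, 99) should give the four elements 20, 71, 89 and 92. Copying the result is only for convenience; returning the two iterators avoids the copy.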

The best way I can think of is to apply a modified binary search twice to find two indices in the array arr, and then print all items between them. Finding the indices takes O(log n); printing takes time proportional to the number of items in the range.
A modified form of binary search looks like this (it is written for arrays, but also applies to a vector):
// Returns the index of the first element greater than key in the sorted
// range arr[start..end] (both inclusive), or end + 1 if there is none.
int binary_search(const int *arr, int start, int end, int key)
{
    if (start > end) return start;        // empty range: insertion point found
    int mid = start + (end - start) / 2;  // avoids overflow of start + end
    if (arr[mid] > key) return binary_search(arr, start, mid - 1, key);
    return binary_search(arr, mid + 1, end, key);
}

If the range of the integers in vector A is not wide, a bitmap is worth considering.
Let's assume all integers of A are non-negative and lie between 0 and 1023; the bitmap can then be built with:
#include <bitset>
// ...
// If fixed size is not an option
// consider vector<bool> or boost::dynamic_bitset
std::bitset<1024> bitmap;
for(auto i : A)
bitmap.set(i);
That takes N iterations to set the bits and about 1024/8 = 128 bytes to store them (the storage depends on the width of the value range, not on N). With the bitmap, one can match elements as follows:
std::vector<int> result;
for(auto i = startA; i <= endA; ++i) {
if (bitmap[i]) result.emplace_back(i);
}
Hence speed of the matching depends on size of range rather than N. This solution should be attractive when you have many limited ranges to match.
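For completeness, a rough sketch of the same idea with std::vector<bool>, for the case where the width of the value range is only known at run time. The function names build_bitmap and match_range are invented for this example. It assumes all values are non-negative and smaller than limit, and it reports each distinct value once, so duplicates in A are collapsed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a presence bitmap for values in [0, limit).
// Assumes every element of `values` lies in that interval.
std::vector<bool> build_bitmap(const std::vector<int>& values, std::size_t limit)
{
    std::vector<bool> bitmap(limit, false);
    for (int x : values) {
        bitmap[static_cast<std::size_t>(x)] = true;
    }
    return bitmap;
}

// Collects every value in the closed range [lo, hi] whose bit is set.
// Assumes lo >= 0; the loop stops at the end of the bitmap if hi is larger.
std::vector<int> match_range(const std::vector<bool>& bitmap, int lo, int hi)
{
    std::vector<int> result;
    for (int i = lo; i <= hi && static_cast<std::size_t>(i) < bitmap.size(); ++i) {
        if (bitmap[static_cast<std::size_t>(i)]) {
            result.push_back(i);
        }
    }
    return result;
}
```

The bitmap is built once and can then serve any number of range queries, each costing time proportional to the width of the queried range.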

Related

Find uncommon elements using hashing

I think this is a fairly common question but I didn't find any answer for this using hashing in C++.
I have two arrays, both of the same lengths, which contain some elements, for example:
A={5,3,5,4,2}
B={3,4,1,2,1}
Here, the uncommon elements are: {5,5,1,1}
I have tried this approach- iterating a while loop on both the arrays after sorting:
while(i<n && j<n) {
if(a[i]<b[j])
uncommon[k++]=a[i++];
else if (a[i] > b[j])
uncommon[k++]=b[j++];
else {
i++;
j++;
}
}
while(i<n && a[i]!=b[j-1])
uncommon[k++]=a[i++];
while(j < n && b[j]!=a[i-1])
uncommon[k++]=b[j++];
and I am getting the correct answer with this. However, I want a better approach in terms of time complexity since sorting both arrays every time might be computationally expensive.
I tried to do hashing but couldn't figure it out entirely.
To insert elements from arr1[]:
set<int> uncommon;
for (int i=0;i<n1;i++)
uncommon.insert(arr1[i]);
To compare arr2[] elements:
for (int i = 0; i < n2; i++)
if (uncommon.find(arr2[i]) != uncommon.end())
Now, what I am unable to do is to send only those elements to the uncommon array[] which are uncommon to both of them.
Thank you!
First of all, std::set does not have anything to do with hashing. Sets and maps are ordered containers. Implementations may differ, but most likely it is a balanced binary search tree. Whatever you do, you won't get faster than nlogn with them, the same complexity as sorting.
If you're fine with nlogn and sorting, I'd strongly advise just using the std::set_symmetric_difference algorithm https://en.cppreference.com/w/cpp/algorithm/set_symmetric_difference , which requires two sorted ranges.
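A minimal sketch of that suggestion follows. The wrapper name uncommon_sorted is made up for illustration. Note the multiset semantics of std::set_symmetric_difference: a value that occurs m times in one range and n times in the other is copied |m - n| times, which may or may not be what you want when both arrays contain the same value a different number of times.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical wrapper: takes both arrays by value so the caller's data
// stays in its original order.
std::vector<int> uncommon_sorted(std::vector<int> a, std::vector<int> b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    std::vector<int> out;
    // Both input ranges must be sorted; the output is sorted as well.
    std::set_symmetric_difference(a.begin(), a.end(), b.begin(), b.end(),
                                  std::back_inserter(out));
    return out;
}
```

For the arrays in the question this should give 1 1 5 5, i.e. the same elements as the expected {5,5,1,1} but in sorted order.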
But if you insist on an implementation relying on hashing, you should use std::unordered_set or std::unordered_map. This way you can be faster than nlogn: you can get your answer in O(n + m) time on average, where n = a.size() and m = b.size(). You should create two unordered_sets, hashed_a and hashed_b, and in two loops check which elements of hashed_a are not in hashed_b and which elements of hashed_b are not in hashed_a. Here is some pseudocode:
create hashed_a and hashed_b
create set_result // for the result
for (a_v : hashed_a)
if (a_v not in hashed_b)
set_result.insert(a_v)
for (b_v : hashed_b)
if (b_v not in hashed_a)
set_result.insert(b_v)
return set_result // it holds the symmetric difference, which you need
UPDATE: as noted in the comments, my answer doesn't account for duplicates. The easiest way to modify it for duplicates would be to use unordered_map<int, int>, with the elements as keys and the number of occurrences as values.
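One possible reading of that update, as a sketch rather than a definitive implementation: keep a single unordered_map<int, int> that counts up for the first array and down for the second, then emit each value as many times as its counts differ. The function name uncommon_hashed and the "balance" bookkeeping are choices made for this example. The order of the result is unspecified because it follows the hash table's iteration order.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical helper: multiset-style symmetric difference via hashing.
std::vector<int> uncommon_hashed(const std::vector<int>& a,
                                 const std::vector<int>& b)
{
    // value -> (occurrences in a) - (occurrences in b)
    std::unordered_map<int, int> balance;
    for (int x : a) ++balance[x];
    for (int x : b) --balance[x];

    std::vector<int> out;
    for (const auto& entry : balance) {
        // |balance| copies of the value have no partner in the other array.
        std::size_t copies = static_cast<std::size_t>(std::abs(entry.second));
        out.insert(out.end(), copies, entry.first);
    }
    return out;
}
```

Both passes are O(n + m) on average. For the arrays in the question the result contains 5 twice and 1 twice, in some order.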
First, you need to find a way to distinguish between the same values contained in the same array (for ex. 5 and 5 in the first array, and 1 and 1 in the second array). This is the key to reducing the overall complexity, otherwise you can't do better than O(nlogn). A good possible algorithm for this task is to create a wrapper object to hold your actual values, and put in your arrays pointers to those wrapper objects with actual data, so your pointer addresses will serve as a unique identifier for objects. This wrapping will cost you just O(n1+n2) operations, but also an additional O(n1+n2) space.
Now your problem is that you have in both arrays only elements unique to each of those arrays, and you want to find the uncommon elements. This means the (Union of both array elements) - (Intersection of both array elements). Therefore, all you need to do is to push all the elements of the first array into a hash-map (complexity O(n1)), and then start pushing all the elements of the second array into the same hash-map (complexity O(n2)), by detecting the collisions (equality of an element from first array with an element from the second array). This comparison step will require O(n2) comparisons in the worst case. So for the maximum performance optimization you could have checked the size of the arrays before starting pushing the elements into the hash-map, and swap the arrays so that the first push will take place with the longest array. Your overall algorithm complexity would be O(n1+n2) pushes (hashings) and O(n2) comparisons.
The implementation is the most boring stuff, so I leave it to you ;)
A solution without an explicit sort call (and without hashing, but you seem to care more about complexity than about hashing itself; note that std::multiset keeps its elements ordered internally) is to notice the following: an uncommon element e is an element that is in exactly one multiset.
This means that the multiset of all uncommon elements is the union of 2 multisets:
S1 = the elements in A that are not in B
S2 = the elements in B that are not in A
Using std::set_difference, you get:
#include <set>
#include <vector>
#include <iostream>
#include <algorithm>
#include <iterator> // std::back_inserter
int main() {
std::multiset<int> ms1{5,3,5,4,2};
std::multiset<int> ms2{3,4,1,2,1};
std::vector<int> v;
std::set_difference( ms1.begin(), ms1.end(), ms2.begin(), ms2.end(), std::back_inserter(v));
std::set_difference( ms2.begin(), ms2.end(), ms1.begin(), ms1.end(), std::back_inserter(v));
for(int e : v)
std::cout << e << ' ';
return 0;
}
Output:
5 5 1 1
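Each std::set_difference call performs at most 2*(N1+N2)-1 comparisons, so the two calls together are linear in N1+N2, where N1 and N2 are the sizes of the multisets. Building the multisets themselves costs O(N1 log N1 + N2 log N2).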
Links:
set_difference: https://en.cppreference.com/w/cpp/algorithm/set_difference
compiler explorer: https://godbolt.org/z/o3KGbf
The question can be solved in O(nlogn) time complexity.
ALGORITHM
Sort both arrays in O(nlogn), for example with merge sort. You can also use the standard sort function, for example sort(array1.begin(),array1.end()).
Now use the two-pointer method to skip all elements common to both arrays.
Program for the above method
int i = 0, j = 0;
while (i < array1.size() && j < array2.size()) {
// If not common, print smaller
if (array1[i] < array2[j]) {
cout << array1[i] << " ";
i++;
}
else if (array2[j] < array1[i]) {
cout << array2[j] << " ";
j++;
}
// Skip common element
else {
i++;
j++;
}
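}
// Whatever is left in either array has no counterpart, so it is uncommon too
while (i < array1.size()) cout << array1[i++] << " ";
while (j < array2.size()) cout << array2[j++] << " ";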
The two-pointer pass above is O(array1.size() + array2.size()), i.e. O(n) for two arrays of length n, so the overall cost is dominated by the O(nlogn) sort.
The above program prints the uncommon elements. If you want to store them, just create a vector and push them into it.

Sort std::vector<int> but ignore a certain number

I have an std::vector<int> of the size 10 and each entry is initially -1. This vector represents a leaderboard for my game (high scores), and -1 just means there is no score for that entry.
std::vector<int> myVector;
myVector.resize(10, -1);
When the game is started, I want to load the high score from a file. I load each line (up to 10 lines), convert the value that is found to an int with std::stoi, and if the number is >0 I replace the -1 currently in the vector at the current position with it.
All this works. Now to the problem:
Since the values in the file aren't necessarily sorted, I want to sort myVector after I load all entries. I do this with
std::sort(myVector.begin(), myVector.end());
This sorts it in ascending order (lower score is better in my game).
The problem is that, since the vector is initially filled with -1 and there aren't necessarily 10 entries saved in the high scores file, the vector might contain a few -1 in addition to the player's scores.
That means when sorting the vector with the above code, all the -1 will appear before the player's scores.
My question is: How do I sort the vector (in ascending order), but all entries with -1 will be put at the end (since they don't represent a real score)?
Combine partitioning and sorting:
std::sort(v.begin(),
std::partition(v.begin(), v.end(), [](int n){ return n != -1; }));
If you store the iterator returned from partition, you already have a complete description of the range of non-trivial values, so you don't need to look for −1s later.
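A short sketch of keeping that iterator around. The function name sort_scores is invented for this example; it returns the number of real scores so the caller can ignore the trailing placeholders.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: sorts the real scores in ascending order, moves every
// -1 placeholder behind them, and returns how many real scores there are.
std::size_t sort_scores(std::vector<int>& scores)
{
    // After partition, [begin, scores_end) holds the real scores and
    // [scores_end, end) holds only -1 entries.
    auto scores_end = std::partition(scores.begin(), scores.end(),
                                     [](int n) { return n != -1; });
    std::sort(scores.begin(), scores_end);
    return static_cast<std::size_t>(scores_end - scores.begin());
}
```

With the returned count n, the valid leaderboard entries are simply scores[0] through scores[n - 1].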
You can provide lambda as parameter for sort:
std::sort(myVector.begin(), myVector.end(),[]( int i1, int i2 ) {
if( i1 == -1 ) return false;
if( i2 == -1 ) return true;
return i1 < i2; }
);
here is the demo (copied from Kerrek)
but it is not clear how you will tell which entries are real scores after the sort.
From your description, it appears that the score can never be negative. In that case, I'd recommend making the scores a vector of unsigned int. You can define a constant
const unsigned int NO_SCORE = -1;
and fill your vector with NO_SCORE initially. Converting -1 to unsigned int yields the largest value that type can hold (the same as std::numeric_limits<unsigned int>::max()), so NO_SCORE compares greater than any real score.
Then you could simply sort using
sort(v.begin(),v.end());
All NO_SCORE entries will be at the end after the sort.
std::sort supports using your own comparison function with the signature bool cmp(const T& a, const T& b);. So write your own function similar to this:
bool sort_negatives(const int& a, const int& b)
{
if (a == -1) {
return false;
}
if (b == -1) {
return true;
}
return a < b;
}
And then call sort like std::sort(myVector.begin(), myVector.end(), sort_negatives);.
EDIT: Fixed the logic courtesy of Slava. If you are using a compiler with C++11 support, use the lambda or partition answers, but this should work on compilers pre C++11.
For the following, I assume that the -1 values are all placed at the end of the vector. If they are not, use KerrekSB's method, or make sure that you do not skip the indices in the vector for which no valid score is in the file (by using an extra index / iterator for writing to the vector).
std::sort uses a pair of iterators. Simply provide the sub-range which contains the values other than -1. You already know the end of this range after reading from the file. If you already use iterators to fill the vector, like in
auto it = myVector.begin();
while (...) {
*it = stoi(...);
++it;
}
then simply use it instead of myVector.end():
std::sort(myVector.begin(), it);
Otherwise (i.e., when using indices to fill up the vector, let's say i is the number of values), use
std::sort(myVector.begin(), myVector.begin() + i);
An alternative approach is to use reserve() instead of resize().
std::vector<int> myVector;
myVector.reserve(10);
for each line in file:
int number_in_line = ...;
myVector.push_back(number_in_line);
std::sort(myVector.begin(), myVector.end());
This way, the vector would have only the numbers that are actually in file, no extra (spurious) values (e.g. -1). If the vector need to be later passed to other module or function for further processing, they do not need to know about the special nature of '-1' values.
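A sketch of how the reserve()/push_back() idea might look in full. It assumes the lines of the high-score file have already been read into a vector of strings; the function name scores_from_lines and the limit of 10 entries as a default parameter are assumptions made for this example. Lines that std::stoi cannot parse are skipped.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical loader: converts up to max_entries lines into scores and
// returns them sorted in ascending order. No -1 placeholders are stored.
std::vector<int> scores_from_lines(const std::vector<std::string>& lines,
                                   std::size_t max_entries = 10)
{
    std::vector<int> scores;
    scores.reserve(max_entries);
    for (const std::string& line : lines) {
        if (scores.size() == max_entries) break;
        try {
            int value = std::stoi(line);
            if (value > 0) scores.push_back(value);  // same ">0" rule as the question
        } catch (const std::logic_error&) {
            // std::stoi throws std::invalid_argument or std::out_of_range,
            // both derived from std::logic_error; skip such lines.
        }
    }
    std::sort(scores.begin(), scores.end());
    return scores;
}
```

The size of the returned vector then tells you how many leaderboard entries exist, so nothing downstream needs to know about a sentinel value.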

How to find the maximum number of pairs having difference less than a particular value?

I am given two arrays (of the same length, possibly containing duplicates) of positive integers. I have to find the maximum number of pairs whose absolute difference is less than or equal to a particular given value, when each number can be used only once from both arrays.
For example:
arr1 = {1,2,3,4}
arr2 = {8,9,10,11}
diff = 5
Then, possible pairs are (3,8), (4,8). That is, only two such possible pairs are there.
Output should be 2.
Also, I can think of an algo for this in O(n^2). But, I need something better. I thought of hash maps (won't work because arrays contain duplicates), thought of sorting the arrays in descending and ascending order, wasn't really able to move forward from there.
The usual idea is to loop over sorted ranges. This way, you can bring the brute-force O(N^2) effort down to O(N log N).
Here is an algorithm for that in pseudo code (maybe I'll update later with real C++ code):
Sort both arrays
Loop over both simultaneously with two iterators:
If a pair is found insert it into your list. Increase both iterators.
Otherwise, increase the iterator pointing to the smaller element.
In total, this is dominated by the sort which on average takes O(N log N).
Here is the promised code:
auto find_pairs(std::vector<int>& arr1, std::vector<int>& arr2, int diff)
{
std::vector<std::pair<int,int> > ret;
std::sort(std::begin(arr1), std::end(arr1));
std::sort(std::begin(arr2), std::end(arr2));
auto it1= std::begin(arr1);
auto it2= std::begin(arr2);
while(it1!= std::end(arr1) && it2!= std::end(arr2))
{
if(std::abs(*it1-*it2) <= diff) // the question asks for "less than or equal to", not equality
{
ret.push_back(std::make_pair(*it1,*it2));
++it1;
++it2;
}
else if(*it1<*it2)
{
++it1;
}
else
{
++it2;
}
}
return ret;
}
It returns the matching elements of the two vectors as a vector of std::pairs. For your example, it prints
3 8
4 9

C++ Apply function to some elements in a container

I would like to apply a function to some elements of an std::vector. I use std::includes to check whether a "smaller" vector exists in a "bigger" one, and if it does, I would like to apply a function to those elements of the "bigger" vector that are equal to the elements of the "smaller" one. Any suggestions?
Edit:
The following was incorrectly posted as an answer by the OP
There is a problem with std::search! It finds the first occurrence of a sequence contained in a vector, while in my vector these elements are in several positions. Also, I have a vector of objects!
Not sure what part you're having trouble with, but here's a simple example showing the range of elements contained in the larger vector that are identical to the contents of the smaller one being multiplied by 2. I used std::search instead of std::includes to determine whether the larger vector contains the range of elements in the smaller one because unlike includes, which returns a boolean result, search will return an iterator to the beginning of the contained range in the larger vector.
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
void times_two(int& t)
{
t *= 2;
}
int main()
{
std::vector<int> v1{1,2,3,4,5,6,7,8,9};
std::vector<int> v2{4,5,6};
// find if the larger vector contains the smaller one
auto first = std::search(v1.begin(), v1.end(), v2.begin(), v2.end());
if(first != v1.end()) {
// get the iterator one past the last element of the sub-range
auto last = std::next(first, v2.size());
// apply function to each sub-range element
std::for_each(first, last, times_two);
}
for(auto const& v : v1) {
std::cout << v << ' ';
}
std::cout << '\n';
}
Output:
1 2 3 8 10 12 7 8 9
Edit:
Here's an example that uses boost::find_nth to perform the search.

What's the fastest way to find the number of elements in a sorted range?

Given a sorted list
1, 3, 5, 6, 9....
Is there a fast algorithm rather than O(n) to count the number of elements in a given range [a, b], assuming that all numbers are integers?
Here is an O(log n) algorithm: Search for the two endpoints using binary search, the number of elements in the range is then basically the difference of the indices.
To get the exact number one needs to distinguish the cases where the endpoints of the range are in the array or not.
Since the list is sorted, you can find the location of a value (or, if the value is not in the list, where it should be inserted) in O(log(n)) time. You simply need to do this for both ends and subtract to get the count of elements in the range. It makes no difference whether the elements are integers; the list simply needs to be sorted.
You do need to be careful if the elements are not unique; in that case after finding a hit you may need to do a linear scan to the end of the sequence of repeated elements.
lower_bound and upper_bound operate on sorted containers.
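First find the lower value in the range, then search from there to the end for the upper value. The functions perform a binary search in terms of comparisons, but only random-access iterators (std::vector, std::array, plain arrays) make the whole call O(log n); with std::list, as below, stepping the iterators and std::distance are still linear: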
#include <algorithm>
#include <list>
#include <iterator>
int main() {
using std::list;
using std::upper_bound;
using std::lower_bound;
using std::distance;
list<int> numbers = {1, 3, 5, 6, 9};
int a = 3;
int b = 6;
auto lower = lower_bound(numbers.begin(), numbers.end(),
a);
auto upper = upper_bound(lower, numbers.end(),
b);
int count = distance(lower, upper);
return 0;
}
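For comparison, a sketch of the same count on a std::vector, where both the searches and the iterator subtraction benefit from random access. The function name count_in_range is made up for this example; it treats [a, b] as a closed range and assumes the vector is sorted in ascending order.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of elements x with a <= x <= b in a sorted
// vector. Runs in O(log n) because vector iterators are random-access.
std::size_t count_in_range(const std::vector<int>& sorted, int a, int b)
{
    if (a > b) return 0;  // reversed bounds describe an empty range
    auto lower = std::lower_bound(sorted.begin(), sorted.end(), a);  // first x >= a
    auto upper = std::upper_bound(lower, sorted.end(), b);           // first x > b
    return static_cast<std::size_t>(upper - lower);
}
```

Repeated elements need no special handling here: lower_bound lands on the first copy of a and upper_bound lands just past the last copy of b.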