C++: Get K smallest elements+indices from vector with ties - c++

The task is to extract the k smallest elements and their indices from an array of doubles, possibly including more elements that are tied with the k-th smallest one. E.g.:
input: {3.3,1.1,6.5,4.2,1.1,3.3}
output (k=3): {1,1.1} {4,1.1} {0,3.3} {5,3.3}
[This seems like a pretty common task, but I couldn't find a similar thread on SO - which handles ties. Hopefully, I didn't miss any and didn't duplicate the question.]
I came up with the following solution, which works and seems fairly efficient complexity-wise: for 1 million random doubles and k=10 it takes ~40ms with MSVC 2013. I wonder if there's a better, cleaner, or more efficient (for large data and/or large k) way to perform this task (validating k and similar checks are out of scope here). Could I avoid allocating the queue with all elements? Make use of std::partial_sum or std::nth_element?
typedef std::pair<double, int> idx_pair;
typedef std::priority_queue<idx_pair, std::vector<idx_pair>, std::greater<idx_pair>> idx_queue;
std::vector<idx_pair> getKSmallest(std::vector<double> const& data, int k)
{
idx_queue q;
{
std::vector<idx_pair> idxPairs(data.size());
for (auto i = 0; i < data.size(); i++)
idxPairs[i] = idx_pair(data[i], i);
q = idx_queue(std::begin(idxPairs), std::end(idxPairs));
};
std::vector<idx_pair> result;
auto topPop = [&q, &result]()
{
result.push_back(q.top());
q.pop();
};
for (auto i = 0; i < k; i++)
topPop();
auto const largest = result.back().first;
while (q.empty() == false)
{
if (q.top().first == largest)
topPop();
else
break;
}
return result;
}
Working example is here.

Here's an alternative solution, suggested by @piotrekg2, using nth_element with average O(N) complexity:
bool equal(double value1, double value2)
{
return value1 == value2 || std::abs(value2 - value1) <= std::numeric_limits<double>::epsilon();
}
std::vector<idx_pair> getNSmallest(std::vector<double> const& data, int n)
{
std::vector<idx_pair> idxPairs(data.size());
for (auto i = 0; i < data.size(); i++)
idxPairs[i] = idx_pair(data[i], i);
// Place the n-th smallest pair at index n - 1; every pair before it compares <= it.
std::nth_element(std::begin(idxPairs), std::begin(idxPairs) + n - 1, std::end(idxPairs));
std::vector<idx_pair> result(std::begin(idxPairs), std::begin(idxPairs) + n);
auto const largest = idxPairs[n - 1].first;
for (auto it = std::begin(idxPairs) + n; it != std::end(idxPairs); ++it)
if (equal(it->first, largest))
result.push_back(*it);
return result;
}
Indeed, the code looks a bit cleaner. However, I've run some tests and empirically this solution is slightly slower than the original one with std::priority_queue.
Note: The answer below by Petar offers a similar solution using std::nth_element, which in my experiments, performs slightly better than this one and also better than the solution using std::priority_queue - perhaps because of eliminating the operation on pairs and working with primitive doubles instead.

As the asker points out, I suggest first copying the vector of doubles and using nth_element to find the k-th smallest element.
Then do a linear scan and collect the elements that are smaller than or equal to it. The time complexity is linear on average.
However, be careful when comparing doubles.
vector<idx_pair> getKSmallest(vector<double> const& data, int k){
vector<double> data_copy = data;
// Put the k-th smallest value at index k - 1 (nth_element is 0-based).
nth_element(data_copy.begin(), data_copy.begin() + k - 1, data_copy.end());
vector<idx_pair> result;
double kth_element = data_copy[k - 1];
for (int i = 0; i < data.size(); i++)
if (data[i] <= kth_element)
result.push_back({data[i], i}); // idx_pair is (value, index)
return result;
}
Update: it is also possible to find the k-th element by maintaining a max-heap of size at most k.
It needs only O(k) memory for the heap instead of the O(n) copy required by the nth_element method.
It takes O(n log k) time, but if k is small I think it should be comparable to the O(n) method.
I am not sure about this, but my reasoning is that the heap may stay in cache and no time is spent copying the data.
vector<idx_pair> getKSmallest(vector<double> const& data, int k)
{
priority_queue<double> pq; // max-heap holding the k smallest values seen so far
for (auto d : data){
if (pq.size() >= k && pq.top() > d){
pq.push(d);
pq.pop();
}
else if (pq.size() < k)
pq.push(d);
}
double kth_element = pq.top();
vector<idx_pair> result;
for (int i = 0; i < data.size(); i++)
if (data[i] <= kth_element)
result.push_back({data[i], i}); // idx_pair is (value, index)
return result;
}
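For reference, here is a minimal self-contained sketch of the nth_element approach, run against the sample input from the question. The function name `k_smallest_with_ties` is made up for this illustration, and it assumes 1 <= k <= data.size() (validation is out of scope, as in the question). Values are compared with plain `<=`, which is enough when ties are exact copies of the same double.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// (value, index) pairs, matching the idx_pair typedef from the question.
using idx_pair = std::pair<double, int>;

// Returns every element that is <= the k-th smallest value, in index order.
// Assumption: 1 <= k <= data.size().
std::vector<idx_pair> k_smallest_with_ties(std::vector<double> const& data, int k)
{
    std::vector<double> copy = data;
    // After this call copy[k - 1] holds the k-th smallest value.
    std::nth_element(copy.begin(), copy.begin() + (k - 1), copy.end());
    double const kth = copy[k - 1];

    std::vector<idx_pair> result;
    for (std::size_t i = 0; i < data.size(); ++i)
        if (data[i] <= kth)
            result.push_back({data[i], static_cast<int>(i)});
    return result;
}
```

For the input {3.3,1.1,6.5,4.2,1.1,3.3} and k=3 this yields the four elements at indices 0, 1, 4 and 5, in index order rather than sorted by value.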

Related

Maintain an unordered_map but at the same time need the lowest of its mapped values at every step

I have an unordered_map<int, int> which is updated at every step of a for loop. But at the end of the loop, I also need the lowest of the mapped values. Traversing it to find the minimum in O(n) is too slow. I know there exists MultiIndex container in boost but I can't use boost. What is the simplest way it can be done using only STL?
Question:
Given an array A of positive integers, call a (contiguous, not
necessarily distinct) subarray of A good if the number of different
integers in that subarray is exactly K.
(For example, [1,2,3,1,2] has 3 different integers: 1, 2, and 3.)
Return the number of good subarrays of A.
My code:
class Solution {
public:
int subarraysWithKDistinct(vector<int>& A, int K) {
int left, right;
unordered_map<int, int> M;
for (left = right = 0; right < A.size() && M.size() < K; ++right)
M[A[right]] = right;
if (right == A.size())
return 0;
int smallest, count;
smallest = numeric_limits<int>::max();
for (auto p : M)
smallest = min(smallest, p.second);
count = smallest - left + 1;
for (; right < A.size(); ++right)
{
M[A[right]] = right;
while (M.size() > K)
{
if (M[A[left]] == left)
M.erase(A[left]);
++left;
}
smallest = numeric_limits<int>::max();
for (auto p : M)
smallest = min(smallest, p.second);
count += smallest - left + 1;
}
return count;
}
};
Link to the question: https://leetcode.com/problems/subarrays-with-k-different-integers/
O(n) is not slow, in fact it is the theoretically fastest possible way to find the minimum, as it's obviously not possible to find the minimum of n items without actually considering each of them.
You could update the minimum during the loop, which is trivial if the loop only adds new items to the map but becomes much harder if the loop may change existing items (and may increase the value of the until-then minimum item!), but ultimately, this also adds O(n) amount of work, or more, so complexity-wise, it's not different from doing an extra loop at the end (obviously, the constant can be different - the extra loop may be slower than reusing the original loop, but the complexity is the same).
As you said, there are data structures that make it more efficient (O(log n) or even O(1)) to retrieve the minimum item, but at the cost of increased complexity to maintain this data structure during insertion. These data structures only make sense if you frequently need to check the minimum item while inserting or changing items - not if you need to know the minimum only at the end of the loop, as you described.
I made a simple class to make it work. It's far from perfect, but it's good enough for the question linked above.
class BiMap
{
public:
void insert(int key, int value)
{
auto itr = M.find(key);
if (itr == M.cend())
M.emplace(key, S.insert(value).first);
else
{
S.erase(itr->second);
M[key] = S.insert(value).first;
}
}
void erase(int key)
{
auto itr = M.find(key);
S.erase(itr->second);
M.erase(itr);
}
int operator[] (int key)
{
return *M.find(key)->second;
}
int size()
{
return M.size();
}
int minimum()
{
return *S.cbegin();
}
private:
unordered_map<int, set<int>::const_iterator> M;
set<int> S;
};
class Solution {
public:
int subarraysWithKDistinct(vector<int>& A, int K) {
int left, right;
BiMap M;
for (left = right = 0; right < A.size() && M.size() < K; ++right)
M.insert(A[right], right);
if (right == A.size())
return 0;
int count = M.minimum() - left + 1;
for (; right < A.size(); ++right)
{
M.insert(A[right], right);
while (M.size() > K)
{
if (M[A[left]] == left)
M.erase(A[left]);
++left;
}
count += M.minimum() - left + 1;
}
return count;
}
};

Return an array which contains number of elements in an array that is lesser or equal to elements in a given array

I came across this problem and wondering if there could be a better complexity to solve the problem.
For e.g.
array a = [1,4,2,4]
array b = [3,5]
DESIRED OUTPUT ==> [2, 4]
EDIT: Put another example
array a = [1,4,2,4]
array b = [3, 1000000]
DESIRED OUTPUT ==> [2,4]
So far, what I've found and tried runs in O(nlogn) + O(blogn) and O(n).
O(nlogn) + O(blogn) approach:
// With l == 0, returns how many elements of the sorted arr[0..r] are <= target.
int binarysearch(const vector<int>& arr, int l, int r, int target)
{
int mid;
while(l <= r)
{
mid = l + (r - l) / 2;
if(arr[mid] > target)
r = mid - 1;
else
l = mid + 1;
}
return r + 1; // r is the index of the last element <= target
}
vector<int> counts(vector<int> a, vector<int> b)
{
vector<int> result;
sort(a.begin(), a.end()); // O(nlogn)
for(auto i : b){
int count = binarysearch(a, 0, a.size()-1, i); // O(log n) for each element of b
result.push_back(count);
}
return result;
}
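The hand-written binary search above does the same job as std::upper_bound, which returns an iterator to the first element greater than the query. A minimal sketch of the sort-plus-binary-search approach using it (the name `counts_sorted` is made up for this example):

```cpp
#include <algorithm>
#include <vector>

// For each query in b, count the elements of a that are <= the query.
// a is taken by value so the caller's vector is not reordered.
std::vector<int> counts_sorted(std::vector<int> a, std::vector<int> const& b)
{
    std::sort(a.begin(), a.end());                              // O(n log n)
    std::vector<int> result;
    result.reserve(b.size());
    for (int query : b) {
        // upper_bound returns the first element > query, so its distance
        // from begin() is the number of elements <= query.
        auto it = std::upper_bound(a.begin(), a.end(), query);  // O(log n)
        result.push_back(static_cast<int>(it - a.begin()));
    }
    return result;
}
```

Because it never indexes by value, this version is unaffected by large or negative numbers in either array.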
O(n) approach:
vector<int> counts(vector<int> a, vector<int> b)
{
vector<int> result;
int maxi = *max_element(b.begin(), b.end()) + 1;
int mymap[maxi] = {0};
for(auto i : a) mymap[i]++;
for(int i = 1; i < maxi; i++){
mymap[i] = mymap[i] + mymap[i-1];
}
for(auto i : b){
result.push_back(mymap[i]);
}
return result;
}
[I am] wondering if there could be a better complexity to solve the problem.
Time complexity.
O(n) approach:
No, there exists no solution with less than linear time complexity.
That said, your linear solution is incorrect. If the input array contains the value 1000000 or greater, or a negative number, then you access outside the bounds of mymap and behaviour is undefined. Furthermore, i <= 1000000 also accesses mymap outside the bound upon the last iteration. Besides, int[1000000] is way too big to be a local variable. On some systems, even one such variable could cause the stack to overflow.
There is no better way doing this than O(n).
So this is also O(n) but with STL style adapted:
template <class Iter1, class Iter2>
std::vector<std::size_t> counts(const Iter1 beg_a, const Iter1 end_a, const Iter2 beg_b, const Iter2 end_b)
{
std::vector<std::size_t> result;
const auto& max = *std::max_element(beg_b, end_b);
std::vector<std::size_t> mymap(max + 1, 0);
for (auto iter = beg_a; iter != end_a; iter++)
{
if (*iter <= max)
{
mymap[*iter]++;
}
}
for (std::size_t i = 1; i < mymap.size(); i++)
{
mymap[i] = mymap[i] + mymap[i - 1];
}
for (auto iter = beg_b; iter != end_b; iter++)
{
result.push_back(mymap[*iter]);
}
return result;
}
Okay, it turns out there's a faster way to compute the map index. E.g. given a = {1,4,2,4,5,8,80} and b = {3,1000000}, the desired output would be [2,7].
Using my previous approach, I would need to compute mymap[4], mymap[5], ..., mymap[9999], ..., mymap[1000000]. That is why the program crashes with a runtime error.
The way to handle this is to store the counts in a std::map and iterate with for(auto& entry : mymap), which visits only the keys that actually occur, in ascending order. Then upper_bound finds the right entry for each query.
vector<int> counts(vector<int> nums, vector<int> maxes){
vector<int> result;
map<int,unsigned int> mymap;
for(auto i : nums) mymap[i]++;
// doesn't need to run 1000000 times
int temp = 0;
for(auto& entry: mymap){
entry.second = entry.second + temp;
temp = entry.second;
//cout << "first : " << entry.first << "second: " << entry.second << endl;
}
for(auto i : maxes){
auto itl = mymap.upper_bound(i); // first key greater than i
// If no key is <= i the count is 0; otherwise step back to the last key <= i.
result.push_back(itl == mymap.begin() ? 0 : std::prev(itl)->second);
}
return result;
}
First, let's use better notation: call A the number of elements in a and B the number of elements in b, and say the elements are M-bit values.
Your solution is, as others said, A log(A) to construct the map plus B log(A) to get the returned values.
Using a https://en.wikipedia.org/wiki/Y-fast_trie you could get (A+B) log M instead, which is faster if A >> M (but probably slower in practice for most use cases).

How to find the minimal missing integer in a list in an STL way

I want to find the minimal missing positive integer in a given list. That is, given a list of positive integers (larger than 0, possibly with duplicates), how do I find the smallest integer that is missing from it?
There is always at least one missing element from the sequence.
For example given
std::vector<int> S={9,2,1,10};
The answer should be 3, because the missing integers are 3,4,5,6,7,8,11,... and the minimum is 3.
I have come up with this:
int min_missing( std::vector<int> & S)
{
int max = *std::max_element(S.begin(), S.end()); // max_element returns an iterator
int min = *std::min_element(S.begin(), S.end());
int i = min;
for(; i!=max and std::find(S.begin(), S.end(), i) != S.end() ; ++i);
return i;
}
This is O(nmlogn) in time, but I cannot figure out if there is a more efficient C++ STL way to do this?
This is not an exercise but I am doing a set of problems for self-improvement , and I have found this to be a very interesting problem. I am interested to see how I can improve this.
You could use std::sort, and then use std::adjacent_find with a custom predicate.
int f(std::vector<int> v)
{
std::sort(v.begin(), v.end());
auto i = std::adjacent_find( v.begin(), v.end(), [](int x, int y)
{
return y != x+1;
} );
if (i != v.end())
{
return *i + 1;
}
}
It is left open what happens when no such element exists, e.g. when the vector is empty.
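One way to close those open cases is sketched below, keeping the sort-based idea but scanning for the expected value directly. It assumes that "missing positive" means counting starts at 1, so an empty vector gives 1; the name `min_missing_positive` is made up for this example. Duplicates are removed first, because a repeated value would otherwise look like a gap.

```cpp
#include <algorithm>
#include <vector>

// Smallest positive integer not present in v.
// Assumption: counting starts at 1, so an empty vector yields 1.
int min_missing_positive(std::vector<int> v)
{
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end()); // drop duplicates
    int expected = 1;
    for (int x : v) {
        if (x < expected) continue; // zero and negative values are ignored
        if (x > expected) break;    // gap found: expected is missing
        ++expected;                 // x == expected, look for the next one
    }
    return expected;
}
```

This stays O(n log n) because of the sort, and it returns max + 1 when the values form an unbroken run starting at 1.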
Find the first missing positive in O(n) time and constant space.
Basically, when you read a value a, swap it into the slot where it belongs, A[a-1]; for example, 2 is swapped into A[1].
class Solution {
public:
/**
* @param A: a vector of integers
* @return: an integer
*/
int firstMissingPositive(vector<int> A) {
// write your code here
int n = A.size();
for(int i=0;i<n;)
{
if(A[i]==i+1)
i++;
else
{
if(A[i]>=1&&A[i]<=n&& A[A[i]-1]!=A[i])
swap(A[i],A[A[i]-1]);
else
i++;
}
}
for(int i=0;i<n;i++)
if(A[i]!=i+1)
return i+1;
return n+1;
}
};
Assuming the data are sorted first:
auto missing_data = std::mismatch(S.cbegin(), S.cend()-1, S.cbegin() + 1,
[](int x, int y) { return (x+1) == y;});
EDIT
As your input data are not sorted, the simplest solution is to sort them first:
std::vector<int> data(S.size());
std::partial_sort_copy (S.cbegin(), S.cend(), data.begin(), data.end());
auto missing_data = std::mismatch (data.cbegin(), data.cend()-1, data.cbegin()+1,
[](int x, int y) { return (x+1) == y;});
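The snippets above stop at the pair of iterators returned by std::mismatch. A minimal sketch of how the missing value could be read from that pair follows; the function name `first_gap` is made up, the input is assumed to be non-empty, and duplicates are removed first (an addition to the answer's code) so that repeated values are not mistaken for a gap.

```cpp
#include <algorithm>
#include <vector>

// First integer after the minimum of S that does not occur in S.
// Assumption: S is non-empty.
int first_gap(std::vector<int> const& S)
{
    std::vector<int> data(S);
    std::sort(data.begin(), data.end());
    data.erase(std::unique(data.begin(), data.end()), data.end());

    // Compare each element with its successor; stop at the first pair
    // that is not consecutive.
    auto gap = std::mismatch(data.cbegin(), data.cend() - 1, data.cbegin() + 1,
                             [](int x, int y) { return x + 1 == y; });

    // gap.first points at the element just before the gap, or at the last
    // element when there is no gap, so the answer is that value plus one.
    return *gap.first + 1;
}
```

Note that this searches upward from the smallest element, as the question's own code does, rather than from 1.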
You can use the algorithms of the C++ standard library in your code.
#include <algorithm> // std::sort
Using std::sort from <algorithm>:
std::vector<int> v={9,2,5,1,3};
std::sort(v.begin(),v.end());
std::cout << v[0];
I hope I understood what you are looking for.
You can do this by building a set of the integers seen so far that are larger than the current candidate, and keeping the smallest integer not yet seen as a counter. Whenever an input number equals that counter, advance the counter, removing elements from the set as it passes them, until it reaches an integer that is missing.
Please see below for implementation.
template<typename I> typename I::value_type solver(I b, I e)
{
constexpr typename I::value_type maxseen=
std::numeric_limits<typename I::value_type>::max();
std::set<typename I::value_type> seen{maxseen};
typename I::value_type minnotseen(1);
for(I p=b; p!=e;++p)
{
if(*p == minnotseen)
{
while(++minnotseen == *seen.begin())
{
seen.erase(seen.begin());
}
} else if( *p > minnotseen)
{
seen.insert(*p);
}
}
return minnotseen;
}
In case your sequence is in a vector, you would call it with:
solver(sequence.begin(),sequence.end());
The algorithm is O(N) in time and O(1) in space since it uses only a counter, constant size additional space, and a few iterators to keep track of the least value.
Complexity ( order of growth rate ) The algorithm keeps a subset only of the input which is expected to be of constant order of growth with respect the growth rate of the input, thus O(1) in space. The growth rate of the iterations is O(N+NlogK) where K is the growth rate of the larger subsequence of seen larger numbers. The latter is the aforementioned subsequence of constant growth rate i.e. K=1 , which results in the algorithm having O(N) complexity. (see comments)

How can I create an array with Fibonacci numbers up to a certain integer n?

So for an assignment I've been asked to create a function that will generate an array of fibonacci numbers and the user will then provide an array of random numbers. My function must then check if the array the user has entered contains any fibonacci numbers then the function will output true, otherwise it will output false. I have already been able to create the array of Fib numbers and check it against the array that the user enters however it is limited since my Fib array has a max size of 100.
bool hasFibNum (int arr[], int size){
int fibarray[100];
fibarray[0] = 0;
fibarray[1] = 1;
bool result = false;
for (int i = 2; i < 100; i++)
{
fibarray[i] = fibarray[i-1] + fibarray[i-2];
}
for (int i = 0; i < size; i++)
{
for(int j = 0; j < 100; j++){
if (fibarray[j] == arr[i])
result = true;
}
}
return result;
}
So basically how can I make it so that I don't have to use int fibarray[100] and can instead generate fib numbers up to a certain point. That point being the maximum number in the user's array.
So for example if the user enters the array {4,2,1,8,21}, I need to generate a fibarray up to the number 21 {1,1,2,3,5,8,13,21}. If the user enters the array {1,4,10} I would need to generate a fibarray with {1,1,2,3,5,8,13}
Quite new to programming so any help would be appreciated! Sorry if my code is terrible.
It is possible that I still don't understand your question, but if I do, then I would achieve what you want like this:
bool hasFibNum (int arr[], int size){
if (size == 0) return false;
int maxValue = arr[0];
for (int i = 1; i < size; i++)
{
if (arr[i] > maxValue) maxValue = arr[i];
}
int first = 0;
int second = 1;
// Test first (not second) against maxValue so that maxValue itself is still checked.
while (first <= maxValue)
{
for (int i = 0; i < size; i++)
{
if (arr[i] == first) return true;
if (arr[i] == second) return true;
}
first = first + second;
second = second + first;
}
return false;
}
Here is a function that returns a dynamic array with all of the Fibonacci numbers up to and including max (assuming max > 0)
std::vector<size_t> make_fibs( size_t max ) {
std::vector<size_t> retval = {1,1};
while( retval.back() < max ) {
retval.push_back( retval.back()+*(retval.end()-2) );
}
return retval;
}
I prepopulate it with 2 elements rather than keeping track of the last 2 separately.
Note that under some definitions, 0 and -1 are Fibonacci numbers. If you are using that, start the array off with {-1, 0, 1} (which isn't their order, it is actually -1, 1, 0, 1, but by keeping them in ascending order we can binary_search below). If you do so, change the type to an int not a size_t.
Next, a sketch of an implementation for has_fibs:
template<class T, size_t N>
bool has_fibs( T(&array)[N] ) {
// bring `begin` and `end` into view, one of the good uses of `using`:
using std::begin; using std::end;
// array is guaranteed to be nonempty, so it will have a max and * is safe:
T m = *std::max_element( begin(array), end(array) );
if (m < 0) m = 0; // deal with the possibility the `array` is all negative
// use `auto` to not repeat a type, and `const` because we aren't going to alter it:
const auto fibs = make_fibs(m);
// d-d-d-ouble `std` algorithm:
return std::find_if( begin(array), end(array), [&fibs]( T v )->bool {
return std::binary_search( begin(fibs), end(fibs), v );
}) != end(array);
}
Here I create a template function that takes your (fixed-size) array by reference. This has the advantage that range-based for loops will work on it.
Next, I use a std algorithm max_element to find the max element.
Finally, I use two std algorithms, find_if and binary_search, plus a lambda to glue them together, to find any intersections between the two containers.
I'm liberally using C++11 features and lots of abstraction here. If you don't understand a function, I encourage you to rewrite the parts you don't understand rather than copying blindly.
This code has runtime O(n lg lg n), which is probably overkill. (Fibonacci numbers grow exponentially, so building them takes lg n time, searching them takes lg lg n time, and we search them n times.)
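For readers who want to try the idea without the template machinery, here is a condensed sketch that takes a std::vector<int> instead of a fixed-size array. The names `fibs_up_to` and `has_fib_num` are made up for this example, 0 is treated as a Fibonacci number (an assumption that may not match the assignment), and the table is built with long long so the additions cannot overflow for int inputs.

```cpp
#include <algorithm>
#include <vector>

// Fibonacci numbers 0, 1, 2, 3, 5, ... that do not exceed max.
// The repeated 1 is dropped so the result is strictly increasing.
std::vector<long long> fibs_up_to(long long max)
{
    std::vector<long long> fibs;
    long long a = 0, b = 1;
    while (a <= max) {
        fibs.push_back(a);
        long long const next = a + b;
        a = b;
        b = next;
    }
    fibs.erase(std::unique(fibs.begin(), fibs.end()), fibs.end());
    return fibs;
}

// True if any element of values is a Fibonacci number.
bool has_fib_num(std::vector<int> const& values)
{
    if (values.empty()) return false;
    int const max = *std::max_element(values.begin(), values.end());
    auto const fibs = fibs_up_to(max); // empty when every value is negative
    return std::any_of(values.begin(), values.end(), [&fibs](int v) {
        return std::binary_search(fibs.begin(), fibs.end(),
                                  static_cast<long long>(v));
    });
}
```

The table only grows as far as the largest input value, so no fixed size such as 100 is needed.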

Find which numbers appears most in a vector

I have some numbers stored in a std::vector<int>. I want to find which number appears most in the vector.
e.g. in the vector
1 3 4 3 4 2 1 3 2 3
the element that occurs the most is 3.
Is there any algorithm (STL or whatever) that does this ?
Sort it, then iterate through it and keep a counter that you increment when the current number is the same as the previous number and reset to 0 otherwise. Also keep track of what was the highest value of the counter thus far and what the current number was when that value was reached. This solution is O(n log n) (because of the sort).
Alternatively you can use a hashmap from int to int (or if you know the numbers are within a limited range, you could just use an array) and iterate over the vector, increasing the_hashmap[current_number] by 1 for each number. Afterwards iterate through the hashmap to find its largest value (and the key belonging to it). This requires a hashmap data structure though (unless you can use arrays, which will also be faster); at the time this was written the STL had none, but std::unordered_map has been part of the standard library since C++11.
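Both ideas described above can be sketched in a few lines. The function names are made up for this example, both functions assume a non-empty vector, and the second one uses std::unordered_map, which is available from C++11 onward. It also tracks the best count while counting instead of making a second pass over the map.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Sort, then scan runs of equal values: O(n log n).
// Assumption: v is non-empty. On a tie the smaller value wins.
int mode_by_sorting(std::vector<int> v)
{
    std::sort(v.begin(), v.end());
    int best = v[0];
    std::size_t best_count = 0, run = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        // Extend the current run or start a new one.
        run = (i > 0 && v[i] == v[i - 1]) ? run + 1 : 1;
        if (run > best_count) { best_count = run; best = v[i]; }
    }
    return best;
}

// Count occurrences in a hash map: expected O(n).
// Assumption: v is non-empty. On a tie the value that reaches
// the top count first wins.
int mode_by_hashing(std::vector<int> const& v)
{
    std::unordered_map<int, std::size_t> counts;
    int best = v[0];
    std::size_t best_count = 0;
    for (int x : v) {
        std::size_t const c = ++counts[x];
        if (c > best_count) { best_count = c; best = x; }
    }
    return best;
}
```

On the example vector 1 3 4 3 4 2 1 3 2 3 both functions return 3.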
If you want to avoid sorting your vector v, use a map:
int max = 0;
int most_common = -1;
map<int,int> m;
for (vector<int>::iterator vi = v.begin(); vi != v.end(); vi++) {
m[*vi]++;
if (m[*vi] > max) {
max = m[*vi];
most_common = *vi;
}
}
This requires more memory and has a very similar expected runtime. The memory required should be on the order of a full vector copy, less if there are many duplicate entries.
Try this
int FindMode(vector<int> value)
{
int index = 0;
int highest = 0;
for (unsigned int a = 0; a < value.size(); a++)
{
int count = 1;
int Position = value.at(a);
for (unsigned int b = a + 1; b < value.size(); b++)
{
if (value.at(b) == Position)
{
count++;
}
}
if (count >= index)
{
index = count;
highest = Position;
}
}
return highest;
}
This is how i did it:
int max=0,mostvalue=a[0];
for(i=0;i<a.size();i++)
{
co = (int)count(a.begin(), a.end(), a[i]);
if(co > max)
{ max = co;
mostvalue = a[i];
}
}
I just don't know how fast it is, i.e. O() ? If someone could calculate it and post it here that would be fine.
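Here is a generic solution for finding the most common element in an iterator range. It runs in O(n log n) because it counts with a std::map; for hashable types, swapping in std::unordered_map would give expected O(n). You use it simply by doing: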
int commonest = most_common(my_vector.begin(), my_vector.end());
The value type is extracted from the iterator using iterator_traits<>.
template<class InputIt, class T = typename std::iterator_traits<InputIt>::value_type>
T most_common(InputIt begin, InputIt end)
{
std::map<T, int> counts;
for (InputIt it = begin; it != end; ++it) {
if (counts.find(*it) != counts.end()) {
++counts[*it];
}
else {
counts[*it] = 1;
}
}
return std::max_element(counts.begin(), counts.end(),
[] (const std::pair<T, int>& pair1, const std::pair<T, int>& pair2) {
return pair1.second < pair2.second;})->first;
}