Deleting like elements in two vectors C++

I am trying to search two vectors (each of any size) for elements that are identical and then delete both elements.
My implementation is as follows:
for (int i = vec1.size() - 1; i >= 0; i--) {
for (int j = 0; j < vec2.size(); j++) {
if (vec1[i] == vec2[j]) {
vec1.erase(vec1.begin() + i);
vec2.erase(vec2.begin() + j);
}
}
}
However, while this works for most cases, I am running into some where it doesn't. Is it the way I am iterating through these vectors or am I just going about this all wrong?

You don't actually need to iterate backwards at all, in which case your code can be:
for (int i = 0; i < vec1.size(); i++) {
for (int j = 0; j < vec2.size(); j++) {
if (vec1[i] == vec2[j]) {
vec1.erase(vec1.begin() + i);
vec2.erase(vec2.begin() + j);
}
}
}
But wait up...what happens after we erase an element? Then all of the elements after it have their indexes decreased by 1, so we'll skip the next item! To fix that we can add this small modification:
vec1.erase(vec1.begin() + i--); // note the post-decrement of i
vec2.erase(vec2.begin() + j--); // note the post-decrement of j
This will work even when we change the size by erasing, because we're checking the size of the vec2 every loop! But what if we end up erasing the last item of vec1? We don't compare its size again until we've iterated all the way through vec2, which will be a problem in your vec1 = {2}, vec2 = {2, 2, 2} example. To fix that we can just break out of the inner loop and repeat the check on vec2.
Put it all together (and change your subscript operator into .at() calls so we'll have bounds checking) and you get:
for (int i = 0; i < vec1.size(); i++) {
for (int j = 0; j < vec2.size(); j++) {
if (vec1.at(i) == vec2.at(j)) {
vec1.erase(vec1.begin() + i--);
vec2.erase(vec2.begin() + j--);
break;
}
}
}
(See it in action here: ideone)

The problem is that you keep accessing vec1[i] as you loop over vec2 after deleting an element from vec1 and vec2. This causes undefined behavior if you do this after removing the last element in vec1 as vec1[i] is no longer valid. Add a break statement in your if to fix this.
for (int i = vec1.size() - 1; i >= 0; i--) {
for (int j = 0; j < vec2.size(); j++) {
if (vec1[i] == vec2[j]) {
vec1.erase(vec1.begin() + i);
vec2.erase(vec2.begin() + j);
break; // Look at next element in vec1
}
}
}
There's also a more efficient way of doing this: O(n*log(n) + m*log(m) + n + m) instead of O(n*m), for n = vec1.size() and m = vec2.size(). It involves sorting the vectors. I'll leave that to you to figure out.
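One way the sorting approach hinted at above might look is sketched here. It assumes that reordering both vectors is acceptable; the function name `remove_matched_pairs` is made up for illustration and does not come from the answer. After sorting, a single pass with two indices cancels equal elements one-for-one and keeps everything else.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not from the thread): cancels equal elements of a and b
// pairwise. Assumption: it is fine for both vectors to end up sorted.
void remove_matched_pairs(std::vector<int>& a, std::vector<int>& b) {
    std::sort(a.begin(), a.end());   // O(n log n)
    std::sort(b.begin(), b.end());   // O(m log m)

    std::vector<int> keep_a, keep_b;
    std::size_t i = 0, j = 0;
    // Single O(n + m) pass over both sorted vectors.
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            keep_a.push_back(a[i++]);   // a[i] has no partner in b
        } else if (b[j] < a[i]) {
            keep_b.push_back(b[j++]);   // b[j] has no partner in a
        } else {
            ++i;                        // equal: drop one element from each
            ++j;
        }
    }
    // Whatever remains on either side is unmatched.
    keep_a.insert(keep_a.end(), a.begin() + static_cast<std::ptrdiff_t>(i), a.end());
    keep_b.insert(keep_b.end(), b.begin() + static_cast<std::ptrdiff_t>(j), b.end());
    a = std::move(keep_a);
    b = std::move(keep_b);
}
```

With vec1 = {2} and vec2 = {2, 2, 2}, this leaves vec1 empty and vec2 = {2, 2}, the same pairing the nested loops with break produce. The original element order is lost, so this only fits if order does not matter.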

If you can sort, you could do something like this:
#include <algorithm>
#include <iostream>
#include <vector>
int main() {
std::vector<int> ex {3, 1, 2, 3, 3, 4, 5}, as {2, 3, 6, 1, 1, 1}, tmp;
std::sort(std::begin(ex), std::end(ex));
std::sort(std::begin(as), std::end(as));
as.erase(
std::remove_if(
std::begin(as),std::end(as),
[&](int const& s){
bool found = std::binary_search(std::begin(ex), std::end(ex), s);
if (found) {
tmp.push_back(s);
}
return found;}), std::end(as));
for (auto const& i : tmp) {
ex.erase(std::remove(std::begin(ex),std::end(ex), i), std::end(ex));
}
}

Try using std::set_difference to subtract one vector from the other in each direction, then merge the two differences with std::merge. Both functions require sorted input, so call std::sort first. Here is the code:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
void TraceVector( const std::vector<int>& v, const std::string& title )
{
if ( !title.empty() )
{
std::cout << title << std::endl;
}
std::copy(v.begin(), v.end(), std::ostream_iterator<int>(std::cout, ","));
std::cout << std::endl;
}
int main() {
std::vector<int> Vec1 {7, 1, 2, 5, 5, 5, 8, 9};
std::vector<int> Vec2 {3, 2, 5, 7, 10};
std::vector<int> Difference1; // Contains subtraction Vec1 - Vec2
std::vector<int> Difference2;// Contains subtraction Vec2 - Vec1
std::vector<int> Merged; // RESULT Merged vector after subtractions
//Need to be sorted
std::sort(Vec1.begin(), Vec1.end());
std::sort(Vec2.begin(), Vec2.end());
TraceVector(Vec1, "Vec1 sorted is: ");
TraceVector(Vec2, "Vec2 sorted is: ");
//Make subtractions
std::set_difference(Vec1.begin(), Vec1.end(), Vec2.begin(), Vec2.end(),
std::inserter(Difference1, Difference1.begin()));
std::set_difference(Vec2.begin(), Vec2.end(), Vec1.begin(), Vec1.end(),
std::inserter(Difference2, Difference2.begin()));
TraceVector(Difference1, "Difference is: ");
TraceVector(Difference2, "Difference is: ");
//Merge subtractions
std::merge(Difference1.begin(), Difference1.end(), Difference2.begin(), Difference2.end(), std::back_inserter(Merged));
TraceVector(Merged, "Merged is: ");
}
Output is:
Vec1 sorted is:
1,2,5,5,5,7,8,9,
Vec2 sorted is:
2,3,5,7,10,
Difference is:
1,5,5,8,9,
Difference is:
3,10,
Merged is:
1,3,5,5,8,9,10,

Related

Count the number of occurrences of one vector entirely in another vector

I am writing a simple function that returns an integer indicating the number of times the contents of one vector appear in another.
For Example:
vector<int> v1 {1, 4, 2, 4, 2, 1, 4, 2, 9, 1, 4, 2, 0, 1, 4, 2};
vector<int> v2 {1, 4, 2};
cout << countOccurrences(v1, v2);
Should return 4.
Here is my iterative solution
int countOccurrences(vector<int> &v1, vector<int> &v2) {
int i, j, count = 0;
for(i = 0; i <= v1.size() - v2.size(); ++i) {
for(j = 0; j < v2.size(); ++j) {
if(v1[i + j] != v2[j])
break;
}
if(j == v2.size())
++count;
}
return count;
}
I want to write the same function recursively but I am clueless. I am new to recursion and it seems intimidating to me.
Here's one way (in pseudo code):
int countOccurrences(vector<int> &v1, vector<int> &v2) {
if v1 is shorter than v2
return 0;
if v1 starts with v2
return 1 + countOccurrences( v1[1:], v2 )
else
return countOccurrences( v1[1:], v2 );
}
Recursion is a bit easier if you use iterators:
template <typename IT>
int count_occurences(IT begin,IT end,IT s_begin,IT s_end) {
auto it = std::search(begin,end,s_begin,s_end);
auto dist = std::distance(s_begin,s_end);
if (it == end) return 0;
return 1 + count_occurences(it+dist,end,s_begin,s_end);
}
std::search searches for one range, [s_begin,s_end), inside another range, [begin,end). I suppose you do not want to use it, so I leave it to you to replace it with your own handwritten way of finding one range inside the other. The recursion comes into play by adding 1 when the sequence is found and calling the function again only for the remainder of the vector. Note that resuming at it+dist counts non-overlapping matches only.
Yet another solution, this one using C++20:
#include <algorithm>
#include <cstdio>
#include <span>
#include <vector>
int countOccurrences(std::span<int> data, std::span<int> needle)
{
    // Base case: not enough data left to hold the needle.
    if (data.size() < needle.size())
        return 0;
    // Count a match at the front, then recurse on the view shifted by one.
    if (std::equal(needle.begin(), needle.end(), data.begin()))
        return 1 + countOccurrences(data.subspan(1), needle);
    else
        return countOccurrences(data.subspan(1), needle);
}
int main()
{
    std::vector<int> data{ 1, 4, 2, 4, 2, 1, 4, 2, 9, 1, 4, 2, 0, 1, 4, 2 };
    std::vector<int> needle{ 1, 4, 2 };
    // std::printf replaces the MSVC-only printf_s
    std::printf("%d\n", countOccurrences(data, needle));
}
This is much faster than creating a sub-vector on every recursive call, because a span is only a view: no allocation takes place.
This code is based on the pseudo-code provided by Scott Hunter.
bool start_with(vector<int> &v1, vector<int> &v2) {
for(auto i = 0; i < v2.size(); ++i)
if (v1[i] != v2[i])
return false;
return true;
}
int countOccurrences(vector<int> &v1, vector<int> &v2) {
static int i = 0;
if(v1.size() < v2.size()) {
return 0;
}
else if(start_with(v1, v2)) {
vector<int> temp(v1.begin() + i + 1, v1.end());
return 1 + countOccurrences(temp, v2);
}
vector<int> temp(v1.begin() + i + 1, v1.end());
return countOccurrences(temp, v2);
}
Feel free to suggest an alternative to the lazy hack i.e. static variable that does not change the function prototype.
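One possible alternative to the static variable, sketched under the assumption that adding a second function is acceptable as long as `countOccurrences` keeps its two-argument prototype: the recursion moves into a helper that carries the current offset as a parameter, so no temporary vectors are copied. The helper name `countFrom` is hypothetical, and treating an empty pattern as zero matches is an assumption, not something the question specifies.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts occurrences of v2 in v1 that start at index
// `from` or later. Precondition: v2 is non-empty and from <= v1.size().
int countFrom(const std::vector<int>& v1, const std::vector<int>& v2,
              std::size_t from) {
    // Base case: the remaining tail of v1 is shorter than v2.
    if (v1.size() - from < v2.size())
        return 0;
    // Does v1 contain v2 starting exactly at `from`?
    const bool match = std::equal(
        v2.begin(), v2.end(),
        v1.begin() + static_cast<std::ptrdiff_t>(from));
    // Recurse on the tail that starts one element later.
    return (match ? 1 : 0) + countFrom(v1, v2, from + 1);
}

// Same prototype as in the question; no static state, no copies.
int countOccurrences(std::vector<int>& v1, std::vector<int>& v2) {
    if (v2.empty())
        return 0;  // assumption: an empty pattern counts as zero matches
    return countFrom(v1, v2, 0);
}
```

Like the iterative version in the question, this advances by one element per step, so overlapping matches are counted. The recursion depth grows with v1.size(), which may matter for very long inputs.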

Mapping a vector to specific range

I have a standard vector that contains, for example, the following elements
[-6, -7, 1, 2]
I need to map these elements to the range from 1 to 4, i.e. I need the vector to look like this
[2, 1, 3, 4]
Note that: the smallest value in the first vector (-7) was mapped to the smallest value in the second vector (1). How can I achieve that with STL?
With range-v3:
std::vector<int> v{-6, -7, 1, 2};
auto res = ranges::view::ints(1, 1 + (int)v.size()) | ranges::to_vector;
ranges::sort(ranges::view::zip(v, res));
With just the standard library as it exists in C++17 (or really, C++11), you make a vector of indices and sort it - using itself as a projection:
vector<int> idxs(values.size());
iota(idxs.begin(), idxs.end(), 1);
sort(idxs.begin(), idxs.end(), [&](int i, int j){
return values[i-1] < values[j-1];
});
A different way of generating the indices would be to use generate_n:
vector<int> idxs;
generate_n(back_inserter(idxs),
values.size(),
[cnt=1]() mutable { return cnt++; });
// same sort()
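A caveat worth spelling out: sorting a vector of indices yields the sorting permutation (which element comes first, second, and so on), and for the example [-6, -7, 1, 2] that happens to equal the rank of each element. For an input such as [3, 1, 2] the two differ, so if the goal is "replace each element by its rank", one extra inversion step is needed. The sketch below shows that step; the function name `ranks_of` is hypothetical, and ties are assumed to get distinct ranks in order of appearance.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns, for each element, its 1-based rank in sorted
// order. Assumption: equal values receive distinct ranks, earlier one first.
std::vector<int> ranks_of(const std::vector<int>& values) {
    // order[pos] = index of the element that ends up at sorted position pos.
    std::vector<std::size_t> order(values.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t i, std::size_t j) {
                         return values[i] < values[j];
                     });

    // Invert the permutation: the element at index order[pos] has rank pos + 1.
    std::vector<int> rank(values.size());
    for (std::size_t pos = 0; pos < order.size(); ++pos)
        rank[order[pos]] = static_cast<int>(pos) + 1;
    return rank;
}
```

For [-6, -7, 1, 2] this gives [2, 1, 3, 4], as in the question; for [3, 1, 2] it gives [3, 1, 2], whereas the sorted index vector alone would be [2, 3, 1].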
Using a helper vector of pairs:
std::vector<int> a { -6, -7, 1, 2 };
std::vector<std::pair<int, int>> tmp;
for (int i = 0; i < (int) a.size(); ++i) {
tmp.push_back({ a[i], i });
}
std::sort(tmp.begin(), tmp.end());
std::vector<int> b;
for (auto & x : tmp) {
b.push_back(x.second + 1);
}
Using a helper priority_queue of pairs (to avoid explicit sorting):
std::vector<int> a { -6, -7, 1, 2 };
std::priority_queue<std::pair<int, int>> tmp;
for (std::size_t i = 0; i < a.size(); ++i) {
tmp.push({ -a[i], i});
}
std::vector<int> b;
do {
b.push_back(tmp.top().second + 1);
} while (tmp.pop(), !tmp.empty());

efficient union of n sorted arrays in C++ (set vs vector)?

I need to implement an efficient algorithm for finding the sorted union of several sorted arrays. Since my program does a lot of these operations, I simulated it in C++. My first approach (method1) was to create an empty vector, append every element of the other vectors to it, and then use std::sort and std::unique to obtain the sorted union of all the elements. However, I thought it might be more efficient to dump all the vector elements into a set (method2), because a set keeps them unique and sorted in one go. To my surprise, method1 was 5 times faster than method2! Am I doing something wrong here? Shouldn't method2 be faster because it does fewer computations? Thanks in advance.
//// method1 with vectors:
std::vector<long> arr1{5,12,32,33,34,50};
std::vector<long> arr2{1,2,3,4,5};
std::vector<long> arr3{1,8,9,11};
std::vector<long> arr;
int main(int argc, const char * argv[]) {
double sec;
clock_t t;
t=clock();
for(long j=0; j<1000000; j++){ // repeating for benchmark
arr.clear();
for(long i=0; i<arr1.size(); i++){
arr.push_back(arr1[i]);
}
for(long i=0; i<arr2.size(); i++){
arr.push_back(arr2[i]);
}
for(long i=0; i<arr3.size(); i++){
arr.push_back(arr3[i]);
}
std::sort(arr.begin(), arr.end());
auto last = std::unique(arr.begin(), arr.end());
arr.erase(last, arr.end());
}
t=clock() - t;
sec = (double)t/CLOCKS_PER_SEC;
std::cout<<"seconds = "<< sec <<" clicks = " << t << std::endl;
return 0;
}
//// method2 with sets:
std::vector<long> arr1{5,12,32,33,34,50};
std::vector<long> arr2{1,2,3,4,5};
std::vector<long> arr3{1,8,9,11};
std::set<long> arr;
int main(int argc, const char * argv[]) {
double sec;
clock_t t;
t=clock();
for(long j=0; j<1000000; j++){ //repeating for benchmark
arr.clear();
arr.insert(arr1.begin(), arr1.end());
arr.insert(arr2.begin(), arr2.end());
arr.insert(arr3.begin(), arr3.end());
}
t=clock() - t;
sec = (double)t/CLOCKS_PER_SEC;
std::cout<<"seconds = "<< sec <<" clicks = " << t << std::endl;
return 0;
}
Here's how it's done with 2 vectors. You can easily generalize the process to N vectors.
vector<int> v1{ 4, 8, 12, 16 };
vector<int> v2{ 2, 6, 10, 14 };
vector<int> merged;
merged.reserve(v1.size() + v2.size());
// An iterator on each vector
auto it1 = v1.begin();
auto it2 = v2.begin();
while (it1 != v1.end() && it2 != v2.end())
{
// Find the iterator that points to the smallest number.
// Grab the value.
// Advance the iterator, and repeat.
if (*it1 < *it2)
{
if (merged.empty() || merged.back() < *it1)
merged.push_back(*it1);
++it1;
}
else
{
if (merged.empty() || merged.back() < *it2)
merged.push_back(*it2);
++it2;
}
}
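// Drain whatever is left, still skipping values already in 'merged'
while(it1 != v1.end())
{
if (merged.empty() || merged.back() < *it1)
merged.push_back(*it1);
++it1;
}
while (it2 != v2.end())
{
if (merged.empty() || merged.back() < *it2)
merged.push_back(*it2);
++it2;
}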
// if you print out the values in 'merged', it gives the expected result
[2, 4, 6, 8, 10, 12, 14, 16]
...And you can generalize with the following. Note that a helper struct containing both the 'current' iterator and the end iterator would be cleaner, but the idea remains the same.
vector<int> v1{ 4, 8, 12, 16 };
vector<int> v2{ 2, 6, 10, 14 };
vector<int> v3{ 3, 7, 11, 15 };
vector<int> v4{ 0, 21};
vector<int> merged;
// reserve space accordingly...
using vectorIt = vector<int>::const_iterator;
vector<vectorIt> fwdIterators;
fwdIterators.push_back(v1.begin());
fwdIterators.push_back(v2.begin());
fwdIterators.push_back(v3.begin());
fwdIterators.push_back(v4.begin());
vector<vectorIt> endIterators;
endIterators.push_back(v1.end());
endIterators.push_back(v2.end());
endIterators.push_back(v3.end());
endIterators.push_back(v4.end());
while (!fwdIterators.empty())
{
// Find out which iterator carries the smallest value
size_t index = 0;
for (size_t i = 1; i < fwdIterators.size(); ++i)
{
if (*fwdIterators[i] < *fwdIterators[index])
index = i;
}
if (merged.empty() || merged.back() < *fwdIterators[index])
merged.push_back(*fwdIterators[index]);
++fwdIterators[index];
if (fwdIterators[index] == endIterators[index])
{
fwdIterators.erase(fwdIterators.begin() + index);
endIterators.erase(endIterators.begin() + index);
}
}
// again, merged contains the expected result
[0, 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 15, 16, 21]
...And as some pointed out, using a heap would be even faster
// Helper struct to make it more convenient
struct Entry
{
vector<int>::const_iterator fwdIt;
vector<int>::const_iterator endIt;
Entry(vector<int> const& v) : fwdIt(v.begin()), endIt(v.end()) {}
bool IsAlive() const { return fwdIt != endIt; }
bool operator< (Entry const& rhs) const { return *fwdIt > *rhs.fwdIt; }
};
int main()
{
vector<int> v1{ 4, 8, 12, 16 };
vector<int> v2{ 2, 6, 10, 14 };
vector<int> v3{ 3, 7, 11, 15 };
vector<int> v4{ 0, 21};
vector<int> merged;
merged.reserve(v1.size() + v2.size() + v3.size() + v4.size());
std::priority_queue<Entry> queue;
queue.push(Entry(v1));
queue.push(Entry(v2));
queue.push(Entry(v3));
queue.push(Entry(v4));
while (!queue.empty())
{
Entry tmp = queue.top();
queue.pop();
if (merged.empty() || merged.back() < *tmp.fwdIt)
merged.push_back(*tmp.fwdIt);
tmp.fwdIt++;
if (tmp.IsAlive())
queue.push(tmp);
}
}
It does seem like a lot of copying of the 'Entry' object though, maybe a pointer to an entry with a proper comparison function would have been better for the std::priority_queue.
The usual way to merge many queues is to put the queues in a min heap based on the value of their first elements. Then you repeatedly pull an item from the queue on the top of the heap and push that queue down to restore the heap property.
This merges a total of N items from K queues in O(N log K) time.
Since you are merging vector<int>, your queues could be either tuple<int, vector *> (current position and vector) or tuple<vector::const_iterator, vector::const_iterator> (current position and end)
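A minimal sketch of that idea, assuming the inputs arrive as a vector of sorted vectors and that duplicates should be dropped (since the question asks for a union). The function name `sorted_union` and the (value, input index, position) tuple layout are choices made here for illustration; they are one variant of the tuples suggested above, picked because small index tuples are cheap to copy in and out of the heap.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Hypothetical helper: K-way merge of sorted inputs into a sorted,
// duplicate-free result using a min-heap, O(N log K) for N items in total.
std::vector<long> sorted_union(const std::vector<std::vector<long>>& inputs) {
    // (current value, which input it came from, position within that input)
    using Item = std::tuple<long, std::size_t, std::size_t>;
    // std::greater turns std::priority_queue into a min-heap on the value.
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;

    std::size_t total = 0;
    for (std::size_t k = 0; k < inputs.size(); ++k) {
        total += inputs[k].size();
        if (!inputs[k].empty())
            heap.emplace(inputs[k][0], k, std::size_t{0});
    }

    std::vector<long> out;
    out.reserve(total);
    while (!heap.empty()) {
        const Item top = heap.top();
        heap.pop();
        const long value = std::get<0>(top);
        const std::size_t k = std::get<1>(top);
        const std::size_t next = std::get<2>(top) + 1;
        // Values leave the heap in non-decreasing order, so comparing with
        // the last output is enough to skip duplicates.
        if (out.empty() || out.back() < value)
            out.push_back(value);
        // Refill the heap from the input the value came from.
        if (next < inputs[k].size())
            heap.emplace(inputs[k][next], k, next);
    }
    return out;
}
```

For the three arrays in the question this returns 1, 2, 3, 4, 5, 8, 9, 11, 12, 32, 33, 34, 50. Whether it beats sort plus unique on inputs that small would have to be measured.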

Group and sort a vector by common/repetitive elements in c++

Suppose I have a vector as follows
std::vector<int> v = {3, 9, 7, 7, 2};
I would like to sort this vector of elements so that the vector will be stored as 77932. So first, we store the common elements (7), then we sort the remaining elements from the highest to the lowest.
If I have a vector as follows
std::vector<int> v = {3, 7, 7, 7, 2};
Here, it would lead to 77732.
Same for
std::vector<int> v = {7, 9, 2, 7, 9};
it should lead to 99772, because the 9s are higher than 7s.
One last example
std::vector<int> v = {7, 9, 7, 7, 9};
it should lead to 77799, because there are more 7s than 9s.
What could be the fastest algorithm to implement this?
Use std::multiset to do counting for you. Then sort using a simple custom comparer with tie breaking logic implemented with std::tie:
std::vector<int> data = {7, 9, 2, 7, 9};
std::multiset<int> count(data.begin(), data.end());
std::sort(
data.begin()
, data.end()
, [&](int a, int b) {
int ca = count.count(a);
int cb = count.count(b);
return std::tie(ca, a) > std::tie(cb, b);
}
);
std::copy(data.begin(), data.end(), std::ostream_iterator<int>(std::cout, " "));
Edit: the count(n) function of std::multiset is linear in the number of duplicates, which may degrade the performance of your sorting algorithm. You can address this by using std::unordered_map in its place:
std::vector<int> data = {7, 9, 2, 7, 9};
std::unordered_map<int,int> count;
for (auto v : data)
count[v]++;
std::sort(
data.begin()
, data.end()
, [&](int a, int b) {
return std::tie(count[a], a) > std::tie(count[b], b);
}
);
std::copy(data.begin(), data.end(), std::ostream_iterator<int>(std::cout, " "));
You will need an auxiliary frequency count structure. Then you can just define a comparator lambda and use whatever sort you like; std::sort is a sensible default.
std::unordered_map<int, size_t> frequency;
std::for_each(v.begin(), v.end()
, [&](int i) { ++frequency[i]; });
std::sort(v.begin(), v.end()
, [&](int lhs, int rhs)
{
return std::tie(frequency[lhs], lhs) > std::tie(frequency[rhs], rhs); // most frequent first, then largest value
});
I wouldn't be satisfied if a candidate proposed an auxiliary map for this task - clearly a sort does most of the work, and the auxiliary structure should be a vector (or, after I've actually tried to implement it, 2 vectors):
void custom_sort(vector<int> &v)
{
if (v.size() < 2)
return;
sort(v.begin(), v.end(), std::greater<int>());
vector<int> dupl;
vector<int> singl;
int d;
bool dv = false;
for (int i = 1; i < v.size(); ++i)
{
if (!dv)
{
if (v[i - 1] == v[i])
{
d = v[i];
dv = true;
dupl.push_back(d);
}
else
{
singl.push_back(v[i - 1]);
}
}
else
{
dupl.push_back(d);
if (v[i] != d)
dv = false;
}
}
if (!dv)
singl.push_back(v.back());
else
dupl.push_back(d);
auto mid = copy(dupl.begin(), dupl.end(), v.begin());
copy(singl.begin(), singl.end(), mid);
}
But yes, the branching is tricky - if you want to use it for more than an interview, please test it... :-)
EDIT this answers an early version of the question.
If the elements are small integers, i.e. have limited range, we can extend the counting sort algorithm (since the keys here are the elements, we don't need to establish the starting position separately).
void custom_sort(std::vector<int>&v, const int N)
// assume that all elements are in [0,N) and that N counters fit into cache
{
vector<int> count(N);
for(auto x:v)
count.at(x) ++; // replace by count[x]++ if you're sure that 0 <= x < N
int i=0;
// first pass: insert multiple elements
for(auto n=N-1; n>=0; --n)
if(count[n] > 1)
for(auto k=0; k!=count[n]; ++k)
v[i++] = n;
// second pass: insert single elements
for(auto n=N-1; n>=0; --n)
if(count[n] == 1)
v[i++] = n;
}
There is an O(N log(N)) algorithm that uses O(N) extra memory.
#include <cstdio>
#include <vector>
#include <algorithm>
#include <utility>
int main(){
typedef std::pair<int, int> pii;
typedef std::vector< int > vi ;
typedef std::vector< pii > vii;
vi v = {7, 9, 7, 7, 9};
//O( N log(N) )
std::sort(v.begin(), v.end());
vii vc;
vc.reserve(v.size());
// O (N) make (cnt, value) pair of vector
for(size_t i = 0; i != v.size(); ++i)
{
if (vc.empty() || v[i] != vc.back().second ){
vc.push_back( pii(0, v[i]) ) ;
}
vc.back().first ++ ;
}
// O (N Log(N) ) sort by (cnt, value)
std::sort( vc.begin(), vc.end() ) ;
// O(N) rebuild v from the (cnt, value) pairs, walking them in reverse order
v.clear();
for(int i = 0; i < (int)vc.size(); ++i){
int rev_i = vc.size() - i - 1;
int cnt = vc[rev_i].first;
for(int k = 0; k < cnt; ++k)
v.push_back( vc[rev_i].second ) ;
}
/////////////////////////
for(size_t i = 0; i != v.size(); ++i){
printf("%4d, ", v[i]);
}
printf("\n");
}

indices of the k largest elements in an unsorted length n array

I need to find the indices of the k largest elements of an unsorted, length n, array/vector in C++, with k < n. I have seen how to use nth_element() to find the k-th statistic, but I'm not sure it is the right choice for my problem: it seems I would need to make k calls to nth_element(), which I guess would have complexity O(kn). Is that as good as it can get, or is there a way to do this in just O(n)?
Implementing it without nth_element() seems like I will have to iterate over the whole array once, populating a list of indices of the largest elements at each step.
Is there anything in the standard C++ library that makes this a one-liner or any clever way to implement this myself in just a couple lines? In my particular case, k = 3, and n = 6, so efficiency isn't a huge concern, but it would be nice to find a clean and efficient way to do this for arbitrary k and n.
"Mark the top N elements of an unsorted array" is probably the closest posting I can find on SO, but the answers there are in Python and PHP.
This should be an improved version of @hazelnusse's answer, running in O(n log k) instead of O(n log n):
#include <queue>
#include <iostream>
#include <vector>
// maxindices.cc
// compile with:
// g++ -std=c++11 maxindices.cc -o maxindices
int main()
{
std::vector<double> test = {2, 8, 7, 5, 9, 3, 6, 1, 10, 4};
std::priority_queue< std::pair<double, int>, std::vector< std::pair<double, int> >, std::greater <std::pair<double, int> > > q;
int k = 5; // number of indices we need
for (int i = 0; i < test.size(); ++i) {
if(q.size()<k)
q.push(std::pair<double, int>(test[i], i));
else if(q.top().first < test[i]){
q.pop();
q.push(std::pair<double, int>(test[i], i));
}
}
k = q.size();
std::vector<int> res(k);
for (int i = 0; i < k; ++i) {
res[k - i - 1] = q.top().second;
q.pop();
}
for (int i = 0; i < k; ++i) {
std::cout<< res[i] <<std::endl;
}
}
8
4
1
2
6
Here is my implementation that does what I want and I think is reasonably efficient:
#include <iostream>
#include <queue>
#include <vector>
// maxindices.cc
// compile with:
// g++ -std=c++11 maxindices.cc -o maxindices
int main()
{
std::vector<double> test = {0.2, 1.0, 0.01, 3.0, 0.002, -1.0, -20};
std::priority_queue<std::pair<double, int>> q;
for (int i = 0; i < test.size(); ++i) {
q.push(std::pair<double, int>(test[i], i));
}
int k = 3; // number of indices we need
for (int i = 0; i < k; ++i) {
int ki = q.top().second;
std::cout << "index[" << i << "] = " << ki << std::endl;
q.pop();
}
}
which gives output:
index[0] = 3
index[1] = 1
index[2] = 0
The question already contains part of the answer: std::nth_element places the n-th order statistic at position nth, with the property that none of the elements preceding it are greater than it and none of the elements following it are less.
Therefore, just one call to std::nth_element is enough to get the k largest elements. The time complexity is O(n) on average, which is theoretically optimal since you have to visit each element at least once to find the smallest (or in this case the k smallest) element(s). If you need these k elements to be ordered, sorting them costs an additional O(k log(k)), for O(n + k log(k)) in total.
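Since the question asks for indices rather than values, one way to apply this is to run std::nth_element on a vector of indices with a comparator that looks up the values. The sketch below assumes that approach; the function name `top_k_indices` is hypothetical, and the final std::sort is only there to return the indices from largest value to smallest.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: indices of the k largest values, largest first.
// Average cost O(n + k log k): one nth_element call plus a sort of k indices.
std::vector<std::size_t> top_k_indices(const std::vector<double>& values,
                                       std::size_t k) {
    k = std::min(k, values.size());

    // Start with the identity permutation 0, 1, ..., n-1.
    std::vector<std::size_t> idx(values.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    // Order indices by the value they refer to, descending.
    auto by_value_desc = [&](std::size_t a, std::size_t b) {
        return values[a] > values[b];
    };

    // After this call, the first k indices refer to the k largest values
    // (in unspecified order).
    const auto kth = idx.begin() + static_cast<std::ptrdiff_t>(k);
    if (k < idx.size())
        std::nth_element(idx.begin(), kth, idx.end(), by_value_desc);

    std::sort(idx.begin(), kth, by_value_desc);  // optional: order the top k
    idx.resize(k);
    return idx;
}
```

For the values {0.2, 1.0, 0.01, 3.0, 0.002, -1.0, -20} and k = 3 this returns the indices 3, 1, 0. When values tie, which of the tied indices is kept is unspecified.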
You can use the basis of the quicksort algorithm to do what you need, except instead of reordering the partitions, you can get rid of the entries falling out of your desired range.
It's been referred to as "quick select" and here is a C++ implementation:
#include <iostream>
using namespace std;
int partition(int* input, int p, int r)
{
int pivot = input[r];
while ( p < r )
{
while ( input[p] < pivot )
p++;
while ( input[r] > pivot )
r--;
if ( input[p] == input[r] )
p++;
else if ( p < r ) {
int tmp = input[p];
input[p] = input[r];
input[r] = tmp;
}
}
return r;
}
int quick_select(int* input, int p, int r, int k)
{
if ( p == r ) return input[p];
int j = partition(input, p, r);
int length = j - p + 1;
if ( length == k ) return input[j];
else if ( k < length ) return quick_select(input, p, j - 1, k);
else return quick_select(input, j + 1, r, k - length);
}
int main()
{
int A1[] = { 100, 400, 300, 500, 200 };
cout << "1st order element " << quick_select(A1, 0, 4, 1) << endl;
int A2[] = { 100, 400, 300, 500, 200 };
cout << "2nd order element " << quick_select(A2, 0, 4, 2) << endl;
int A3[] = { 100, 400, 300, 500, 200 };
cout << "3rd order element " << quick_select(A3, 0, 4, 3) << endl;
int A4[] = { 100, 400, 300, 500, 200 };
cout << "4th order element " << quick_select(A4, 0, 4, 4) << endl;
int A5[] = { 100, 400, 300, 500, 200 };
cout << "5th order element " << quick_select(A5, 0, 4, 5) << endl;
}
OUTPUT:
1st order element 100
2nd order element 200
3rd order element 300
4th order element 400
5th order element 500
EDIT
That particular implementation has an O(n) average run time; due to the method of selection of pivot, it shares quicksort's worst-case run time. By optimizing the pivot choice, your worst case also becomes O(n).
The standard library won't get you a list of indices (it has been designed to avoid passing around redundant data). However, if you're interested in n largest elements, use some kind of partitioning (both std::partition and std::nth_element are O(n)):
#include <iostream>
#include <algorithm>
#include <vector>
struct Pred {
Pred(int nth) : nth(nth) {};
bool operator()(int k) { return k >= nth; }
int nth;
};
int main() {
int n = 4;
std::vector<int> v = {5, 12, 27, 9, 4, 7, 2, 1, 8, 13, 1};
// Moves the nth element to the nth from the end position.
std::nth_element(v.begin(), v.end() - n, v.end());
// Reorders the range, so that the first n elements would be >= nth.
std::partition(v.begin(), v.end(), Pred(*(v.end() - n)));
for (auto it = v.begin(); it != v.end(); ++it)
std::cout << *it << " ";
std::cout << "\n";
return 0;
}
You can do this in O(n) time with a single order statistic calculation:
Let r be the k-th order statistic
Initialize two empty lists bigger and equal.
For each index i:
If array[i] > r, add i to bigger
If array[i] = r, add i to equal
Discard elements from equal until the sum of the lengths of the two lists is k
Return the concatenation of the two lists.
Naturally, you only need one list if all items are distinct. And if needed, you could do tricks to combine the two lists into one, although that would make the code more complicated.
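The steps above can be sketched as follows. Using std::nth_element on a scratch copy to obtain the k-th order statistic is an assumption made here (the answer does not say how r is computed), and the function name `top_k_with_ties` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: indices of k largest elements, following the
// bigger/equal scheme. Indices of values > r come first, then enough
// indices of values == r to reach k.
std::vector<std::size_t> top_k_with_ties(const std::vector<int>& array,
                                         std::size_t k) {
    k = std::min(k, array.size());
    if (k == 0)
        return {};

    // r = the k-th largest value, found on a copy so `array` keeps its order.
    std::vector<int> scratch(array);
    const auto kth = scratch.begin() + static_cast<std::ptrdiff_t>(k - 1);
    std::nth_element(scratch.begin(), kth, scratch.end(), std::greater<int>());
    const int r = *kth;

    // One pass: collect indices of elements above r and equal to r.
    std::vector<std::size_t> bigger, equal;
    for (std::size_t i = 0; i < array.size(); ++i) {
        if (array[i] > r)
            bigger.push_back(i);
        else if (array[i] == r)
            equal.push_back(i);
    }

    // Fewer than k elements exceed r, so trim `equal` to fill up to k.
    equal.resize(k - bigger.size());
    bigger.insert(bigger.end(), equal.begin(), equal.end());
    return bigger;
}
```

For {5, 9, 9, 1, 9, 3} and k = 2, r is 9, `bigger` stays empty, and the result is the first two indices holding a 9, namely 1 and 2.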
Even though the following code might not fulfill the desired complexity constraints, it might be an interesting alternative to the aforementioned priority queue.
#include <queue>
#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>
std::vector<int> largestIndices(const std::vector<double>& values, int k) {
std::vector<int> ret;
std::vector<std::pair<double, int>> q;
int index = -1;
std::transform(values.begin(), values.end(), std::back_inserter(q), [&](double val) {return std::make_pair(val, ++index); });
auto functor = [](const std::pair<double, int>& a, const std::pair<double, int>& b) { return b.first > a.first; };
std::make_heap(q.begin(), q.end(), functor);
for (auto i = 0; i < k && i<values.size(); i++) {
std::pop_heap(q.begin(), q.end(), functor);
ret.push_back(q.back().second);
q.pop_back();
}
return ret;
}
int main()
{
std::vector<double> values = { 7,6,3,4,5,2,1,0 };
auto ret=largestIndices(values, 4);
std::copy(ret.begin(), ret.end(), std::ostream_iterator<int>(std::cout, "\n"));
}