Remove duplicate integers from a vector in C++

I'm trying to remove duplicate integers from a vector. My aim is to keep only one copy of each. I wrote some simple code, but it doesn't work properly. Can anyone help? Thanks in advance.
#include <iostream>
#include <vector>
using namespace std;
int main()
{
int a = 10, b = 10 , c = 8, d = 8, e = 10 , f = 6;
vector<int> vec;
vec.push_back(a);
vec.push_back(b);
vec.push_back(c);
vec.push_back(d);
vec.push_back(e);
vec.push_back(f);
for (int i=vec.size()-1; i>=0; i--)
{
for(int j=vec.size()-1; j>=0; j--)
{
if(vec[j] == vec[i-1])
vec.erase(vec.begin() + j);
}
}
for(int i=0; i<vec.size(); i++)
{
cout<< "vec: "<< vec[i]<<endl;
}
return 0;
}

Don't use a vector for this. Use a set:
#include <set>
...
set<int> vec;
This will ensure you will have no duplicates by not adding an element if it already exists.
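A minimal sketch of this suggestion, assuming the values already sit in a vector and that sorted output is acceptable (std::set keeps its elements in ascending order). The function name dedupe_with_set is made up for illustration:

```cpp
#include <set>
#include <vector>

// Returns the distinct values of `input` in ascending order.
std::vector<int> dedupe_with_set(const std::vector<int>& input) {
    // Building the set silently discards repeated values.
    std::set<int> unique_values(input.begin(), input.end());
    return std::vector<int>(unique_values.begin(), unique_values.end());
}
```

With the values from the question (10, 10, 8, 8, 10, 6) this returns 6, 8, 10. Note that the original order is lost.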

To remove duplicates it's easier if you sort the array first. The code below uses two different methods for removing the duplicates: one using the built-in C++ algorithms and the other using a loop.
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
using namespace std;
int main() {
int a = 10, b = 10 , c = 8, d = 8, e = 10 , f = 6;
vector<int> vec;
vec.push_back(a);
vec.push_back(b);
vec.push_back(c);
vec.push_back(d);
vec.push_back(e);
vec.push_back(f);
// Sort the vector
std::sort(vec.begin(), vec.end());
// Remove duplicates (v1)
std::vector<int> result;
std::unique_copy(vec.begin(), vec.end(), std::back_inserter(result));
// Print results
std::cout << "Result v1: ";
std::copy(result.begin(), result.end(), std::ostream_iterator<int>(cout, " "));
std::cout << std::endl;
// Remove duplicates (v2)
std::vector<int> result2;
for (int i = 0; i < vec.size(); i++) {
if (i > 0 && vec[i] == vec[i - 1])
continue;
result2.push_back(vec[i]);
}
// Print results (v2)
std::cout << "Result v2: ";
std::copy(result2.begin(), result2.end(), std::ostream_iterator<int>(cout, " "));
std::cout << std::endl;
return 0;
}

If you need to preserve the original order of the numbers, you can write a function that removes duplicates using a helper set<int>:
void removeDuplicates( vector<int>& v )
{
set<int> s;
vector<int> res;
for( int i = 0; i < v.size(); i++ ) {
int x = v[i];
if( s.find(x) == s.end() ) {
s.insert(x);
res.push_back(x);
}
}
swap(v, res);
}

The problem with your code is here:
for(int j=vec.size()-1; j>=0; j--)
{
if(vec[j] == vec[i-1])
vec.erase(vec.begin() + j);
}
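There will be a time when j == i-1, so the element is compared with itself and erased, which breaks your algorithm. There will also be a time when i-1 < 0, so vec[i-1] reads outside the vector. That is undefined behavior: operator[] does no bounds checking, so you may not even get an exception.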
What you can do is to change your for loop conditions:
for (int i = vec.size() - 1; i>0; i--){
for(int j = i - 1; j >= 0; j--){
//do stuff
}
}
This way, the two elements you are comparing will never be the same element, and your indices will always be at least 0.
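One way to fill in the "do stuff" part of these loops, as a sketch rather than the answerer's exact intent: when an earlier element equals vec[i], erase the later copy at index i and stop scanning. The function name remove_duplicates_in_place is hypothetical.

```cpp
#include <vector>

// Keeps the first occurrence of every value and preserves the original order.
void remove_duplicates_in_place(std::vector<int>& vec) {
    // The cast keeps the index signed, so an empty vector skips the loop.
    for (int i = static_cast<int>(vec.size()) - 1; i > 0; --i) {
        for (int j = i - 1; j >= 0; --j) {
            if (vec[j] == vec[i]) {
                // An earlier copy exists: drop the later one at index i.
                // Elements after i were already processed, so shifting them is safe.
                vec.erase(vec.begin() + i);
                break;
            }
        }
    }
}
```

For the vector from the question this leaves 10, 8, 6. It is still quadratic, so it only suits small inputs.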

Others have already pointed to std::set. This is certainly simple and easy, but it can be fairly slow (quite a bit slower than std::vector), largely because (like a linked list) it consists of individually allocated nodes, linked together via pointers to form a balanced tree [1].
You can (often) improve on that by using a std::unordered_set instead of a std::set. This uses a hash table [2] instead of a tree to store the data, so it normally uses contiguous storage, and gives O(1) expected access time instead of the O(log N) expected for a tree.
An alternative that's often faster is to collect the data in the vector, then sort the data and use std::unique to eliminate duplicates. This tends to be best when you have two distinct phases of operation: first you collect all the data, then you need duplicates removed. If you frequently alternate between adding/deleting data and needing a duplicate-free set, then something like std::set or std::unordered_set, which maintain the set without duplicates at all times, may be more useful.
All of these also affect the order of the items. A std::set always maintains the items sorted in a defined order. With std::unique you need to explicitly sort the data. With std::unordered_set you get the items in an arbitrary order that is neither their original order nor sorted.
If you need to maintain the original order, but without duplicates, you normally end up needing to store the data twice. For example, when you need to add a new item, you attempt to insert it into a std::unordered_set, then if and only if that succeeds, add it to the vector as well.
[1] Technically, implementation as a tree isn't strictly required, but it's about the only possibility of which I'm aware that can meet the requirements, and all the implementations of which I'm aware are based on trees.
[2] Again, other implementations might be theoretically possible, but all of which I'm aware use hashing. In this case, though, enough of the implementation is exposed that avoiding a hash table would probably be even more difficult.
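The order-preserving approach described above (store the data twice, and append to the vector only when the insert into the hash set succeeds) could be sketched like this. The class name UniqueCollector and its members are invented for illustration and are not part of any library:

```cpp
#include <unordered_set>
#include <vector>

// Collects integers in arrival order while rejecting values seen before.
class UniqueCollector {
public:
    // Returns true if `value` was new and has been appended.
    bool add(int value) {
        // insert() reports through .second whether the value was actually added.
        if (seen_.insert(value).second) {
            items_.push_back(value);
            return true;
        }
        return false;
    }

    const std::vector<int>& items() const { return items_; }

private:
    std::unordered_set<int> seen_;  // membership test, arbitrary order
    std::vector<int> items_;        // the same values, in original order
};
```

Feeding it 10, 10, 8, 8, 10, 6 leaves items() holding 10, 8, 6. The price is roughly double the memory, as the answer points out.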

The body of a range-for loop must not change the size of the sequence over which it is iterating.
You can avoid duplicates by checking before each push_back:
void push(std::vector<int> & arr, int n)
{
for(int i = 0; i != arr.size(); ++i)
{
if(arr[i] == n)
{
return;
}
}
arr.push_back(n);
}
... ...
push(vec, a);
push(vec, b);
push(vec, c);
...

Related

Efficient way of finding if a container contains duplicated values with STL? [duplicate]

I wrote this code in C++ as part of a uni task where I need to ensure that there are no duplicates within an array:
// Check for duplicate numbers in user inputted data
int i; // Need to declare i here so that it can be accessed by the 'inner' loop that starts on line 21
for(i = 0;i < 6; i++) { // Check each other number in the array
for(int j = i; j < 6; j++) { // Check the rest of the numbers
if(j != i) { // Makes sure don't check number against itself
if(userNumbers[i] == userNumbers[j]) {
b = true;
}
}
if(b == true) { // If there is a duplicate, change that particular number
cout << "Please re-enter number " << i + 1 << ". Duplicate numbers are not allowed:" << endl;
cin >> userNumbers[i];
}
} // Comparison loop
b = false; // Reset the boolean after each number entered has been checked
} // Main check loop
It works perfectly, but I'd like to know if there is a more elegant or efficient way to check.
You could sort the array in O(n log n), then simply compare each number with the next one. That is substantially faster than your existing O(n^2) algorithm, and the code is a lot cleaner. Your code also doesn't ensure that the re-entered numbers are not duplicates themselves. You need to prevent duplicates from existing in the first place.
std::sort(userNumbers.begin(), userNumbers.end());
for(int i = 0; i + 1 < userNumbers.size(); i++) { // i + 1 avoids unsigned underflow of size() - 1 on an empty vector
if (userNumbers[i] == userNumbers[i + 1]) {
userNumbers.erase(userNumbers.begin() + i);
i--;
}
}
I also second the recommendation to use a std::set: no duplicates there.
The following solution is based on sorting the numbers and then removing the duplicates:
#include <algorithm>
int main()
{
int userNumbers[6];
// ...
int* end = userNumbers + 6;
std::sort(userNumbers, end);
bool containsDuplicates = (std::unique(userNumbers, end) != end);
}
Indeed, the fastest and, as far as I can see, most elegant method is as advised above:
std::vector<int> tUserNumbers;
// ...
std::set<int> tSet(tUserNumbers.begin(), tUserNumbers.end());
std::vector<int>(tSet.begin(), tSet.end()).swap(tUserNumbers);
It is O(n log n). However, this does not work if the ordering of the numbers in the input array needs to be kept. In that case I did:
std::set<int> tTmp;
std::vector<int>::iterator tNewEnd =
std::remove_if(tUserNumbers.begin(), tUserNumbers.end(),
[&tTmp] (int pNumber) -> bool {
return (!tTmp.insert(pNumber).second);
});
tUserNumbers.erase(tNewEnd, tUserNumbers.end());
which is still O(n log n) and keeps the original ordering of elements in tUserNumbers.
This is an extension to the answer by @Puppy, which is currently the best answer.
PS: I tried to post this as a comment on the current best answer by @Puppy but couldn't, as I don't have 50 points yet. A bit of experimental data is also shared here for further help.
Both std::set and std::map are typically implemented as balanced binary search trees, so both lead to O(n log n) complexity in this case. Better performance can be achieved with a hash table: std::unordered_map offers a hash-table-based implementation for faster lookup. I experimented with all three implementations and found the results using std::unordered_map to be better than those with std::set and std::map. Results and code are shared below. The images are snapshots of the performance measured by LeetCode on the solutions.
bool hasDuplicate(vector<int>& nums) {
size_t count = nums.size();
if (!count)
return false;
std::unordered_map<int, int> tbl;
//std::set<int> tbl;
for (size_t i = 0; i < count; i++) {
if (tbl.find(nums[i]) != tbl.end())
return true;
tbl[nums[i]] = 1;
//tbl.insert(nums[i]);
}
return false;
}
[Image: unordered_map performance (run time was 52 ms here)]
[Image: set/map performance]
You can add all elements in a set and check when adding if it is already present or not. That would be more elegant and efficient.
I'm not sure why this hasn't been suggested, but here is a way to find duplicate digits in a base-10 number in O(n). The problem I see with the already suggested O(n) solution is that it requires the digits to be sorted first. This method is O(n) and does not require sorting. The cool thing is that, once the table is built, checking whether a specific digit has duplicates is O(1). I know this thread is probably dead, but maybe it will help somebody! :)
/*
============================
Foo
============================
*
Takes in a read only unsigned int. A table is created to store counters
for each digit. If any digit's counter is flipped higher than 1, function
returns. For example, with 48778584:
0 1 2 3 4 5 6 7 8 9
[0] [0] [0] [0] [2] [1] [0] [2] [2] [0]
When we iterate over this array, we find that 4 is duplicated and immediately
return false.
*/
bool Foo(int number)
{
int temp = number;
int digitTable[10]={0};
while(temp > 0)
{
digitTable[temp % 10]++; // Last digit's respective index.
temp /= 10; // Move to next digit
}
for (int i=0; i < 10; i++)
{
if (digitTable [i] > 1)
{
return false;
}
}
return true;
}
It's OK, especially for small array lengths. I'd use more efficient approaches (fewer than n^2/2 comparisons) if the array is much bigger; see DeadMG's answer.
Some small corrections for your code:
Instead of int j = i, write int j = i + 1, and you can omit your if(j != i) test.
You shouldn't need to declare the variable i outside the for statement.
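Applying both corrections to the pairwise check gives something like the following sketch. The function name has_duplicate_pairwise and its pointer-plus-count signature are assumptions chosen so the example works on a plain array such as userNumbers:

```cpp
#include <cstddef>

// Returns true if any two of the `count` numbers are equal (O(n^2) comparisons).
bool has_duplicate_pairwise(const int* numbers, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        // Starting at i + 1 means a number is never compared with itself,
        // so no if (j != i) test is needed.
        for (std::size_t j = i + 1; j < count; ++j) {
            if (numbers[i] == numbers[j]) {
                return true;
            }
        }
    }
    return false;
}
```

For six user-entered numbers this is perfectly adequate; the sorting or hashing approaches only pay off for larger inputs.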
I think @Michael Jaison G's solution is really brilliant. I modified his code a little to avoid sorting. (By using unordered_set, the algorithm may be a little faster.)
template <class Iterator>
bool isDuplicated(Iterator begin, Iterator end) {
using T = typename std::iterator_traits<Iterator>::value_type;
std::unordered_set<T> values(begin, end);
std::size_t size = std::distance(begin,end);
return size != values.size();
}
//std::unique(_copy) requires a sorted container.
std::sort(cont.begin(), cont.end());
//testing if cont has duplicates
std::unique(cont.begin(), cont.end()) != cont.end();
//getting a new container with no duplicates
std::unique_copy(cont.begin(), cont.end(), std::back_inserter(cont2));
#include<iostream>
#include<algorithm>
int main(){
int arr[] = {3, 2, 3, 4, 1, 5, 5, 5};
int len = sizeof(arr) / sizeof(*arr); // Finding length of array
std::sort(arr, arr+len);
int unique_elements = std::unique(arr, arr+len) - arr;
if(unique_elements == len) std::cout << "Duplicate number is not present here\n";
else std::cout << "Duplicate number present in this array\n";
return 0;
}
As mentioned by @underscore_d, an elegant and efficient solution would be:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
template <class Iterator>
bool has_duplicates(Iterator begin, Iterator end) {
using T = typename std::iterator_traits<Iterator>::value_type;
std::vector<T> values(begin, end);
std::sort(values.begin(), values.end());
return (std::adjacent_find(values.begin(), values.end()) != values.end());
}
int main() {
int user_ids[6];
// ...
std::cout << has_duplicates(user_ids, user_ids + 6) << std::endl;
}
A fast solution with O(N) time and space that returns as soon as it hits the first duplicate:
template <typename T>
bool containsDuplicate(vector<T>& items) {
return any_of(items.begin(), items.end(), [s = unordered_set<T>{}](const auto& item) mutable {
return !s.insert(item).second;
});
}
Not enough karma to post a comment. Hence a post.
vector <int> numArray = { 1,2,1,4,5 };
unordered_map<int, bool> hasDuplicate;
bool flag = false;
for (auto i : numArray)
{
if (hasDuplicate[i])
{
flag = true;
break;
}
else
hasDuplicate[i] = true;
}
cout << (flag ? "Duplicate" : "No duplicate");

How can I find repeated words in a vector of strings in C++?

I have a std::vector<string> where each element is a word. I want to print the vector without repeated words!
I searched a lot on the web and I found lots of material, but I can't and I don't want to use hash maps, iterators and "advanced" (to me) stuff. I can only use plain string comparison == as I am still a beginner.
So, let my_vec be a std::vector<std::string> initialized from standard input. My idea was to scan the whole vector and erase any repeated word as soon as I found it:
for(int i=0;i<my_vec.size();++i){
for (int j=i+1;j<my_vec.size();++j){
if(my_vec[i]==my_vec[j]){
my_vec.erase(my_vec.begin()+j); //remove the component from the vector
}
}
}
I tried to test for std::vector<std::string> my_vec{"hey","how","are","you","fine","and","you","fine"}
and indeed I found
hey how are you fine and
so it seems to be right, but for instance if I write the simple vector std::vector<std::string> my_vec{"hello","hello","hello","hello","hello"}
I obtain
hello hello
The problem is that the vector gets smaller at every call to erase, and so I lose information. How can I fix that?
Minimalist approach to your existing code. The auto-increment of j is what is ultimately breaking your algorithm. Don't do that. Instead, only increment it when you do NOT remove an element.
I.e.
for (int i = 0; i < my_vec.size(); ++i) {
for (int j = i + 1; j < my_vec.size(); ) { // NOTE: no ++j
if (my_vec[i] == my_vec[j]) {
my_vec.erase(my_vec.begin() + j);
}
else ++j; // NOTE: moved to else-clause
}
}
That is literally it.
You can store the indices of the elements to erase and then erase them at the end.
Or repeat the cycle until no erase is performed.
First code example:
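std::vector<int> index_to_erase; // no parentheses: with "()" this would declare a function
for(int i=0;i<my_vec.size();++i){
for (int j=i+1;j<my_vec.size();++j){
if(my_vec[i]==my_vec[j]){
index_to_erase.push_back(j);
}
}
}
//the same index is recorded more than once when a word appears three or
//more times, so sort the indices and drop the repeats (needs <algorithm>)
std::sort(index_to_erase.begin(), index_to_erase.end());
index_to_erase.erase(std::unique(index_to_erase.begin(), index_to_erase.end()), index_to_erase.end());
//erase from the largest index down to the smallest, so that the positions
//of the elements still to be erased do not change
for (int i = index_to_erase.size()-1; i >= 0; i--){
my_vec.erase(my_vec.begin()+index_to_erase[i]); //remove the component from the vector
}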
Second code Example:
bool Erase = true;
while(Erase){
Erase = false;
for(int i=0;i<my_vec.size();++i){
for (int j=i+1;j<my_vec.size();++j){
if(my_vec[i]==my_vec[j]){
my_vec.erase(my_vec.begin()+j); //remove the component from the vector
Erase = true;
}
}
}
}
Why don't you use std::unique?
It is as easy as:
std::vector<std::string> v{ "hello", "hello", "hello", "hello", "hello" };
std::sort(v.begin(), v.end());
v.erase(std::unique(v.begin(), v.end()), v.end());
N.B. Elements need to be sorted because std::unique works only for consecutive duplicates.
In case you don't want to change the content of the std::vector, but only have stable output, I recommend other answers.
Erasing elements from a container inside a loop is a little tricky, because after erasing element at index i the next element (in the next iteration) is not at index i+1 but at index i.
Read about the erase-remove idiom for the idiomatic way to erase elements. However, if you just want to print on the screen, there is a much simpler way to fix your code:
for(int i=0; i<my_vec.size(); ++i){
bool unique = true;
for (int j=0; j<i; ++j){
if(my_vec[i]==my_vec[j]) {
unique = false;
break;
}
}
// print only after the inner loop has compared against every earlier word
if (unique) std::cout << my_vec[i] << " ";
}
Instead of checking the elements after the current one, you should compare with the elements before it. Otherwise "bar x bar y bar" will result in "x y bar" when I suppose it should be "bar x y".
Last but not least, consider that using traditional loops with indices is the complicated way, while using iterators or a range-based loop is much simpler. Don't be afraid of new stuff; in the long run it will be easier to use.
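For reference, the erase-remove idiom mentioned in this answer looks roughly like the sketch below. It removes every occurrence of one given word rather than deduplicating; the helper name erase_word is made up for this example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Removes every element equal to `target` from `words`.
// `target` should be a separate string, not a reference to an element of `words`.
void erase_word(std::vector<std::string>& words, const std::string& target) {
    // std::remove shifts the kept elements to the front and returns the new
    // logical end; erase then trims the leftover tail in a single call.
    words.erase(std::remove(words.begin(), words.end(), target), words.end());
}
```

Because the vector is compacted in one pass and erased once, there is no index bookkeeping and no element is skipped.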
You can simply use the combination of sort and unique as follows.
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>
int main() {
std::vector<std::string> vec{"hey","how","are","you","fine","and","you","fine"};
std::sort(vec.begin(), vec.end());
vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
for (int i = 0; i < vec.size(); i ++) {
std::cout << vec[i] << " ";
}
std::cout << "\n";
return 0;
}

Erase elements in vector using for loop

How do I use a for loop to erase elements from a vector by its index ? I am getting a vector out of range error. I have a sample code below.
vector<int> to_erase = {0, 1, 2};
vector<int> data = {3, 3, 3, 3};
for(int i = 0; i < to_erase.size(); i++) {
data.erase(data.begin() + to_erase[i]);
}
I think it is because the size of my vector reduces through every iteration therefore it cannot access index 2.
You would normally employ the erase–remove idiom to delete multiple elements from a vector efficiently (erasing them one by one is generally less efficient, and, as you’ve seen, not always trivial). In its most general form, the idiom looks like this:
data.erase(remove_algorithm(begin(data), end(data)), end(data));
In your case, the remove_algorithm is based off indices in another vector, so we need to provide those as well:
data.erase(
remove_indices(begin(data), end(data), begin(to_erase), end(to_erase)),
end(data));
Unfortunately, such an algorithm isn’t contained in the standard library. However, it’s trivial to write yourself [1]:
template <typename It, typename It2>
auto remove_indices(It begin, It end, It2 idx_b, It2 idx_e) -> It {
using idx_t = typename std::iterator_traits<It2>::value_type;
std::sort(idx_b, idx_e, std::greater<idx_t>{});
for (; idx_b != idx_e; ++idx_b) {
auto pos = begin + *idx_b;
std::move(std::next(pos), end--, pos);
}
return end;
}
Here, we first sort the indices to be removed from largest to smallest. Next, we loop over those indices. We then (maximally efficiently) move all elements between the current position (to be deleted) and the end of the vector forward by one. Subsequently, the end is decreased by one (to account for the fact that an element got deleted).
[1] *Ahem* Once you’ve removed all the silly typos in your code.
I think it is because the size of my vector reduces through every iteration
Yes!
You could do it by keeping an extra variable, which counts the elements that are deleted, like this:
#include <iostream>
#include <vector>
using namespace std;
int main() {
vector<int> to_erase = {0, 1, 2};
vector<int> data = {3, 3, 3, 3};
int count_removed = 0;
for(unsigned int i = 0; i < to_erase.size(); i++)
data.erase(data.begin() + to_erase[i] - count_removed++);
for(unsigned int i = 0; i < data.size(); ++i)
cout << data[i] << "\n";
return 0;
}
Output:
3
I had the same problem when I first used std::erase(), good question, +1.
Deleting elements from a collection while you iterate over it is unsafe and probably expensive. I would suggest that each element that meets your criteria gets swapped with an element at the end (at the end because it is cheaper to erase from the end). You can keep track of how far back you have come from the end of the vector (based on the number of swaps) and break out of the loop early. Then, based on how many elements you swapped, you can do something like:
data.resize(data.size() - reverse_counter);
or
int pos = data.size() - reverse_counter;
data.erase(data.begin() + pos, data.end());
This is pseudocode, just to explain the idea.
As mentioned in the reference, an erase that is not at the end relocates all the elements after the erased one, which is expensive. Something worth keeping in mind:
http://www.cplusplus.com/reference/vector/vector/erase/
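The swap-to-the-end idea from this answer could be sketched as follows. This is one possible reading of it, with the removal criterion assumed to be "equal to a given value"; the function name swap_and_truncate is hypothetical, and the order of the surviving elements is not preserved.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Removes every element equal to `value` and returns how many were removed.
std::size_t swap_and_truncate(std::vector<int>& data, int value) {
    std::size_t kept_end = data.size();  // [kept_end, size) holds elements to drop
    std::size_t i = 0;
    while (i < kept_end) {
        if (data[i] == value) {
            // Park the match at the back. Do not advance i, because the
            // element swapped into position i has not been checked yet.
            --kept_end;
            std::swap(data[i], data[kept_end]);
        } else {
            ++i;
        }
    }
    const std::size_t removed = data.size() - kept_end;
    data.resize(kept_end);  // one cheap truncation at the end
    return removed;
}
```

Each match costs a single swap instead of shifting the whole tail, which is the saving the answer is after.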
I think this is a bad design, because you will change the for loop invariant and will need a lot of workarounds to make this happen. Anyway, if you really want to use a for loop, you may flag what you want to delete and run an STL remove_if, something like:
#include <iostream>
#include <vector>
#include <limits>
#include <algorithm>
using namespace std;
int main() {
vector<int> to_erase = {0, 1, 2};
vector<int> data = {3, 3, 3, 3};
cout << "Before:\n" ;
for(int i=0; i<data.size(); i++)
cout << i << "\t";
cout << endl;
for(int i=0; i<data.size(); i++)
cout << data[i] << "\t";
cout << endl;
for(int i = 0; i < to_erase.size(); i++) {
//data.erase(data.begin() + to_erase[i]);
// flag the element at the index to erase (assumes the sentinel never occurs in data)
data[to_erase[i]] = numeric_limits<int>::max();
}
data.erase(remove_if(data.begin(),
data.end(),
[](int i){return i==numeric_limits<int>::max();}), data.end());
cout << "Next:\n" ;
for(int i=0; i<data.size(); i++)
cout << i << "\t";
cout << endl;
for(int i=0; i<data.size(); i++)
cout << data[i] << "\t";
return 0;
}

Given a vector with integers from 0 to n, but not all included, how do I efficiently get the non-included integers?

Given a vector with integers from 0 to n, but not all included, how do I efficiently get the non-included integers?
For example if I have a vector with 1 2 3 5, I need to get the vector that contains 0 4.
But I need to do it very efficiently.
Since the vector is already sorted, this becomes trivial:
vector<int> v = {1,2,3,5};
vector<int> ret;
v.push_back(n+1); // this is to enforce a limit using less branches in the loop
for(int i = 0, j = 0; i <= n; ++i){
int present = v[j++];
while(i < present){
ret.push_back(i++);
}
}
return ret;
Additionally, if it wasn't sorted, you could either sort it and apply the above algorithm, or, if you know the range of n, and you can afford the extra memory, you could instead create an array of boolean (or a bitset) and mark the index corresponding to every element you encounter (e.g. bitset[v[j++]] = true;), subsequently iterating from 0 to n and inserting into your vector every element whose bitset position has not been marked.
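The marking approach for unsorted input might look like this sketch. It assumes n is non-negative and known in advance; values outside 0..n are ignored. The function name find_missing_marked is invented for illustration.

```cpp
#include <cstddef>
#include <vector>

// Returns the integers in [0, n] that do not occur in `v` (v need not be sorted).
std::vector<int> find_missing_marked(const std::vector<int>& v, int n) {
    // One flag per candidate value; assumes n >= 0.
    std::vector<bool> present(static_cast<std::size_t>(n) + 1, false);
    for (int x : v) {
        if (x >= 0 && x <= n) {
            present[x] = true;  // mark every value we encounter
        }
    }
    std::vector<int> missing;
    for (int i = 0; i <= n; ++i) {
        if (!present[i]) {
            missing.push_back(i);
        }
    }
    return missing;
}
```

This runs in O(n + v.size()) time and tolerates duplicates in the input, at the cost of n + 1 flags of extra memory.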
Basically the idea presented here is that we know the number of missing items beforehand if we can assume sorted input without duplicate values.
Then it is possible to pre-allocate enough space to hold the missing values beforehand (no later dynamic allocation required). Then we can also exploit the possible shortcut when all missing values were found.
If the input vector is not sorted or contains duplicate values, a wrapper function can be used that establishes this precondition.
#include <iostream>
#include <set>
#include <vector>
inline std::vector<int> find_missing(std::vector<int> const & input) {
// assuming non-empty, sorted input, no duplicates
// number of items missing
int n_missing = input.back() - input.size() + 1;
// pre-allocate enough memory for missing values
std::vector<int> result(n_missing);
// iterate input vector with shortcut if all missing values were found
auto input_it = input.begin();
auto result_it = result.begin();
for (int i = 0; result_it != result.end() && input_it != input.end(); ++i) {
if (i < *input_it) (*result_it++) = i;
else ++input_it;
}
return result;
}
// use this if the input vector is not sorted/unique
inline std::vector<int> find_missing_unordered(std::vector<int> const & input) {
std::set<int> values(input.begin(), input.end());
return find_missing(std::vector<int>(values.begin(), values.end()));
}
int main() {
std::vector<int> input = {1,2,3,5,5,5,7};
std::vector<int> result = find_missing_unordered(input);
for (int i : result)
std::cout << i << " ";
std::cout << "\n";
}
The output is:
$ g++ test.cc -std=c++11 && ./a.out
0 4 6

how would I sort a list and get the top K elements? (STL)

I have a vector of doubles. I want to sort it from highest to lowest, and get the indices of the top K elements. std::sort just sorts in place, and does not return the indices I believe. What would be a quick way to get the top K indices of largest elements?
You could use the nth_element STL algorithm: it partitions the container so that the K greatest elements are grouped together (this is the fastest way using the STL), and then you can sort just those. Or you could use the partial_sort algorithm if you want the first K elements to be sorted.
Using plain sort for this is wasteful, because it sorts the whole container when you only need the first K elements. sort is a great STL algorithm, but nth_element and partial_sort exist for exactly this purpose.
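Since the question asks for indices rather than values, one way to apply partial_sort is to sort a vector of indices with a comparator that looks up the values. This is a sketch under that assumption; the name top_k_indices is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the indices of the k largest values, largest first.
std::vector<std::size_t> top_k_indices(const std::vector<double>& values,
                                       std::size_t k) {
    std::vector<std::size_t> idx(values.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, 2, ...
    k = std::min(k, idx.size());
    // Only the first k positions end up sorted; the rest is left unordered.
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&values](std::size_t a, std::size_t b) {
                          return values[a] > values[b];  // descending by value
                      });
    idx.resize(k);
    return idx;
}
```

For the values 1.0, 4.0, 2.0, 3.0 and k = 2 this returns the indices 1 and 3. The original vector is left untouched.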
The first thing that comes to mind is somewhat hackish, but you could define a struct that stored both the double and its original index, then overload the < operator to sort based on the double:
struct s {
double d;
int index;
bool operator < (const s &other) const {
return d < other.d;
}
};
Then you could retrieve the original indices from the struct.
Fuller example:
vector<double> orig;
vector<s> v;
...
for (int i=0; i < orig.size(); ++i) {
s s_temp;
s_temp.d = orig[i];
s_temp.index = i;
v.push_back(s_temp);
}
sort(v.begin(), v.end());
//now just retrieve v[i].index
This will leave them sorted from smallest to largest, but you could overload the > operator instead and then pass in greater to the sort function if wanted.
OK, how about this?
bool isSmaller (std::pair<double, int> x, std::pair<double, int> y)
{
return x.first< y.first;
}
int main()
{
//...
//you have your vector<double> here, say name is d;
std::vector<std::pair<double, int> > newVec(d.size());
for(int i = 0; i < newVec.size(); ++i)
{
newVec[i].first = d[i];
newVec[i].second = i; //store the initial index
}
std::sort(newVec.begin(), newVec.end(), &isSmaller);
//now you can iterate through first k elements and the second components will be the initial indices
}
Not sure about pre-canned algorithms, but take a look at selection algorithms; if you need the top K elements of a set of N values and N is much larger than K, there are much more efficient methods.
If you can create an indexing class (like @user470379's answer: basically a class that encapsulates a pointer/index to the "real" data, which is read-only), then use a priority queue of maximum size K, and add each unsorted element to the priority queue, popping off the bottom-most element when the queue reaches size K+1. In cases like N = 10^6, K = 100, this is much simpler and more efficient than a full sort.
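A sketch of the bounded priority queue described above, using (value, index) pairs in place of a dedicated indexing class. That substitution, and the function name top_k_indices_heap, are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Returns the indices of the k largest values, largest first.
std::vector<std::size_t> top_k_indices_heap(const std::vector<double>& values,
                                            std::size_t k) {
    using Entry = std::pair<double, std::size_t>;  // (value, original index)
    // Min-heap: the smallest of the current candidates sits on top.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t i = 0; i < values.size(); ++i) {
        heap.push(Entry(values[i], i));
        if (heap.size() > k) {
            heap.pop();  // size reached k + 1: discard the smallest
        }
    }
    std::vector<std::size_t> result;
    while (!heap.empty()) {
        result.push_back(heap.top().second);  // comes out smallest first
        heap.pop();
    }
    std::reverse(result.begin(), result.end());
    return result;
}
```

The heap never holds more than k + 1 entries, so the cost is O(N log K) time and O(K) extra memory.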
So you actually need a structure that maps indices to corresponding doubles.
You could use the std::multimap class to perform this mapping. As Jason has noted, std::map does not allow duplicate keys.
std::vector<double> v; // assume it is populated already
std::multimap<double, int> m;
for (int i = 0; i < v.size(); ++i)
m.insert(std::make_pair(v[i], i));
...
After you've done this, you can iterate over the last K elements (for example with reverse iterators) to get the largest values, as the map keeps its elements sorted by key.
Use multimap for vector's (value, index) to handle dups. Use reverse iterators to walk results in descending order.
#include <iostream>
#include <map>
#include <vector>
using namespace std;
multimap<double, size_t> indices;
vector<double> values;
values.push_back(1.0);
values.push_back(2.0);
values.push_back(3.0);
values.push_back(4.0);
size_t i = 0;
for(vector<double>::const_iterator iter = values.begin();
iter != values.end(); ++iter, ++i)
{
indices.insert(make_pair(*iter, i));
}
i = 0;
size_t limit = 2;
for (multimap<double, size_t>::const_reverse_iterator iter = indices.rbegin();
iter != indices.rend() && i < limit; ++iter, ++i)
{
cout << "Value " << iter->first << " index " << iter->second << endl;
}
Output is
Value 4 index 3
Value 3 index 2
If you just want the vector indices after sort, use this:
#include <algorithm>
#include <vector>
using namespace std;
vector<double> values;
values.push_back(1.0);
values.push_back(2.0);
values.push_back(3.0);
values.push_back(4.0);
sort(values.rbegin(), values.rend());
The top K entries are indexed by 0 to K-1, and appear in descending order. This uses reverse iterators combined with the standard sort (using less<double>) to achieve descending order when iterated forward. Equivalently:
sort(values.rbegin(), values.rend(), less<double>());
Sample code for the excellent nth_element solution suggested by @Kiril here (K = 125000, N = 500000). I wanted to try this out, so here it is.
vector<double> values;
for (size_t i = 0; i < 500000; ++i)
{
values.push_back(rand());
}
nth_element(values.begin(), values.begin()+375000, values.end());
sort(values.begin()+375000, values.end());
vector<double> results(values.rbegin(), values.rbegin() + values.size() - 375000);