I am receiving some integers from an external API (as a std::vector<int>).
The API usually needs to be called multiple times, so I need to accumulate all the integers from the consecutive API calls into a local vector. At the end, every element of the vector must be unique (it does not need to be sorted).
My code is below (it uses getNextVector to "mock" data and simulate the API call).
The code works, however I want maximum performance for this operation. Is my approach the right one?
#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>
std::vector<int> getNextVector(int i) {
    if ( i == 0 ) {
        std::vector<int> v = { 1,2,3 };
        return v;
    } else if ( i == 1 ) {
        std::vector<int> v = { 3,4,5 };
        return v;
    } else if ( i == 2 ) {
        std::vector<int> v = { 5,6,7 };
        return v;
    } else if ( i == 3 ) {
        std::vector<int> v = { 7,8,9 };
        return v;
    }
    return {}; // avoid falling off the end of a value-returning function (undefined behavior)
}
int count() { return 4; } //we have four vectors
int main(int argc, char** argv) {
    std::vector<int> dest;
    dest.reserve(20); // we can find this, for simplicity hardcode...
    for( int i = 0; i < count(); i++ ) {
        std::vector<int> src = getNextVector(i);
        dest.insert(
            dest.end(),
            std::make_move_iterator(src.begin()),
            std::make_move_iterator(src.end())
        );
    }
    std::sort(dest.begin(), dest.end());
    dest.erase(std::unique(dest.begin(), dest.end()), dest.end());
    /*
    std::copy(
        dest.begin(),
        dest.end(),
        std::ostream_iterator<int>(std::cout, "\n")
    );
    */
    return 0;
}
I think you can store the elements of the vector in a set. If ordering is not needed you can use unordered_set. Simply do the following -
std::unordered_set<int> integers;
for (int i = 0; i < count(); i++) {
    std::vector<int> src = getNextVector(i);
    for (std::size_t j = 0; j < src.size(); j++) {
        integers.insert(src[j]);
    }
}
Or, as suggested by @StoryTeller, you can use an appropriate function instead of the loop. For example:
std::unordered_set<int> integers;
for (int i = 0; i < count(); i++) {
    std::vector<int> src = getNextVector(i);
    integers.insert(src.begin(), src.end());
}
My first thought was "it can be done fast and easily with unordered_set", but later I realised that it would not help much with ints (the hash of an int is still the int, so I don't see a performance increase here). So, finally, I decided to benchmark it, and my results are:
N = 4 Set implementation 304703 microseconds
N = 4 Unordered set implementation 404469 microseconds
N = 4 Vect implementation 91769 microseconds
N = 20 Set implementation 563320 microseconds
N = 20 Unordered set implementation 398049 microseconds
N = 20 Vect implementation 176558 microseconds
N = 40 Set implementation 569628 microseconds
N = 40 Unordered set implementation 420496 microseconds
N = 40 Vect implementation 207368 microseconds
N = 200 Set implementation 639829 microseconds
N = 200 Unordered set implementation 456763 microseconds
N = 200 Vect implementation 245343 microseconds
N = 2000 Set implementation 728753 microseconds
N = 2000 Unordered set implementation 499716 microseconds
N = 2000 Vect implementation 303813 microseconds
N = 20000 Set implementation 760176 microseconds
N = 20000 Unordered set implementation 480219 microseconds
N = 20000 Vect implementation 331941 microseconds
So, apparently, for the samples you gave us here, your implementation is the fastest one. This is the case when your API returns only a few possible vector combinations and the number of iterations is small. I decided to verify what happens when you have more distinct values, via rand(), for N > 4 (*). And it stays that way: the unordered set is the slowest one (hash calculation cost).
So, to answer your question: benchmark your case on your own - this is the best way to determine which one is the fastest.
(*) The bad randomness of rand() is not a bug, but a feature here.
EDIT:
My answer does not claim there are no faster algorithms - I've benchmarked the STL ones, which at first glance one might expect to behave differently than the results show. But for sure there is a way of doing the unique concatenation faster, maybe with some combination of a set of vectors or a different container, and I hope someone will provide one.
#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <set>
#include <unordered_set>
#include <chrono>
std::vector<int> getNextVector(int i) {
if (i == 0) {
std::vector<int> v = { 1,2,3 };
return v;
}
else if (i == 1) {
std::vector<int> v = { 3,4,5 };
return v;
}
else if (i == 2) {
std::vector<int> v = { 5,6,7 };
return v;
}
else if (i == 3) {
std::vector<int> v = { 7,8,9 };
return v;
}
return {rand() % 10000,rand() % 10000,rand() % 10000 };
}
void set_impl(std::set<int>& dest, int N)
{
// dest.reserve(20); // we can find this, for simplicity hardcode...
for (int i = 0; i < N; i++) {
std::vector<int> src = getNextVector(i);
dest.insert(
std::make_move_iterator(src.begin()),
std::make_move_iterator(src.end())
);
}
}
void uset_impl(std::unordered_set<int>& dest, int N)
{
// dest.reserve(20); // we can find this, for simplicity hardcode...
for (int i = 0; i < N; i++) {
std::vector<int> src = getNextVector(i);
dest.insert(
std::make_move_iterator(src.begin()),
std::make_move_iterator(src.end())
);
}
}
void vect_impl(std::vector<int>& dest, int N)
{
for (int i = 0; i < N; i++) {
std::vector<int> src = getNextVector(i);
dest.insert(
dest.end(),
std::make_move_iterator(src.begin()),
std::make_move_iterator(src.end())
);
}
std::sort(dest.begin(), dest.end());
dest.erase(unique(dest.begin(), dest.end()), dest.end());
}
int main(int argc, char** argv) {
for (int N : { 4, 20, 40, 200, 2000, 20000 })
{
const int K = 1000000 / N;
using clock = std::chrono::high_resolution_clock;
std::set<int> sdest;
auto start = clock::now();
for (int i = 0; i < K; i++)
{
sdest.clear();
set_impl(sdest, N);
}
auto set_us = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - start).count();
std::unordered_set<int> usdest;
start = clock::now();
for (int i = 0; i < K; i++)
{
usdest.clear();
uset_impl(usdest, N);
}
auto uset_us = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - start).count();
std::vector<int> dest;
dest.reserve(N); // we can find this, for simplicity hardcode...
start = clock::now();
for (int i = 0; i < K; i++)
{
dest.clear();
vect_impl(dest, N);
}
auto vect_us = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - start).count();
std::cout << "N = " << N << " Set implementation " << set_ms << " miliseconds\n";
std::cout << "N = " << N << " Unordered set implementation " << uset_ms << " miliseconds\n";
std::cout << "N = " << N << " Vect implementation " << vect_ms << " miliseconds\n";
}
return 0;
}
If you want to preserve the order of the elements received from the external API, and they are not sorted, then I recommend you create a second vector which you keep sorted. Then do a lower_bound on the sorted vector, and if the value is not already there, insert it into both the target and the sorted vectors (using the returned iterator as the insert position in the sorted vector). Using set or unordered_set for integers is likely to be very much slower (probably orders of magnitude slower). If you don't care about the order, then just use a single sorted vector.
vector<int> sorted;
....
vector<int> src = getNextVector(i);
for( int x : src ) {
    auto itr = std::lower_bound( sorted.begin(), sorted.end(), x );
    if( itr == sorted.end() || *itr != x ) { // check end() before dereferencing
        sorted.insert(itr, x);
        integers.push_back(x);
    }
}
If you know the values from each call to getNextVector are unique, then you could do something like the following (which might be faster):
vector<int> sorted;
....
vector<int> src = getNextVector(i);
vector<int> usrc;
for( int x : src ) {
    auto itr = std::lower_bound( sorted.begin(), sorted.end(), x );
    if( itr == sorted.end() || *itr != x ) { // check end() before dereferencing
        usrc.push_back(x);
        integers.push_back(x);
    }
}
sorted.insert(sorted.end(), usrc.begin(), usrc.end());
std::sort( sorted.begin(), sorted.end() );
[C++20 solution]
In order to merge multiple containers into a vector,
while removing duplicates, I'd use something simpler, like this:
namespace mp {
template <typename T>
concept Container = requires (T value) { // very naive container requirements
typename T::value_type;
std::begin(value), std::end(value);
std::cbegin(value), std::cend(value);
};
}
template <mp::Container ... Ts>
requires
requires { typename std::common_type_t<typename Ts::value_type...>; }
auto merge_uniques(const Ts & ... names) {
using value_type = typename std::common_type_t<typename Ts::value_type...>;
auto value = std::unordered_set<value_type>{};
(value.insert(std::cbegin(names), std::cend(names)), ...);
return std::vector<value_type> {
std::move_iterator{std::begin(value)},
std::move_iterator{std::end(value)}
};
}
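For illustration, a possible call site might look like this (my own sketch, not part of the original answer; it assumes <vector>, <list>, <iostream> and <unordered_set> are included):
int main() {
    const std::vector<int> a{ 1, 2, 3 };
    const std::list<int> b{ 3, 4, 5 };
    // merged is a std::vector<int> holding 1,2,3,4,5 in unspecified order,
    // since the values pass through an unordered_set
    const auto merged = merge_uniques(a, b);
    for (int x : merged) std::cout << x << ' ';
}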
Related
I'm new to C++ and I'm trying to write a void function that will delete duplicates from a vector while preserving the order of the vector. I'm having trouble deleting the numbers from my vector using just .at(), .push_back(), .size(), and .resize(). How would I go about doing this?
This is what I have so far:
void RemoveDuplicates(std::vector<int>& vector, int vectorSize)
{
int i;
int j;
std::vector<int> tempVec;
for (i = 0; i < vector.size(); i++)
{
for (j = 1; j < vector.size(); j++)
{
if (vector.at(i) == vector.at(j))
{
tempVec.push_back(vector.at(i)); //Unduplicated Vector
}
}
}
}
If I were to put "1 2 3 3" in this it returns the tempVec as "2 3 3 3 3." The expected result was just "1 2 3." How would I go about fixing this so that it deduplicates the vector using just those vector manipulators?
Here's a simple approach, not very efficient but easy to understand:
void RemoveDuplicates(std::vector<int>& vector)
{
std::vector<int> tempVec;
for (size_t i = 0; i < vector.size(); i++)
{
// look for vector[i] in remainder of vector
bool found = false;
for (size_t j = i + 1; j < vector.size(); j++)
{
if (vector.at(i) == vector.at(j))
{
found = true;
break;
}
}
// if not found it's not a duplicate
if (!found)
tempVec.push_back(vector.at(i));
}
vector = tempVec;
}
Building on your current idea, you can compare each value in vector with all the values in tempVec. If it's not found in tempVec, add it.
I'm using range based for-loops to simplify the looping:
#include <utility> // std::move
void RemoveDuplicates(std::vector<int>& vector) {
std::vector<int> tempVec;
for(int in : vector) { // `in` will be assigned one value at a time from vector
bool found = false; // set to `true` if the `in` value has already been seen
for(int out : tempVec) { // range based for-loop again
if(in == out) { // oops, duplicate
found = true; // set to true to avoid storing it
break; // and abort this inner loop
}
}
// only stored values not found:
if(not found) tempVec.push_back(in);
}
// move assign the result to `vector`:
vector = std::move(tempVec);
}
You do not need to pass the size of the vector. A vector knows its size(). Your code is not actually removing anything from vector.
Use the tools available to you.
You can use std::unique to remove duplicate adjacent elements. After sorting the vector this will remove all duplicates:
void RemoveDuplicates(std::vector<int>& vector)
{
std::sort(vector.begin(),vector.end());
auto it = std::unique(vector.begin(),vector.end());
vector = std::vector<int>(vector.begin(),it);
}
A std::set stores only unique elements, hence can be used too:
void RemoveDuplicates2(std::vector<int>& vector)
{
std::set<int> s{vector.begin(),vector.end()};
vector = std::vector<int>(s.begin(),s.end());
}
If you want to keep the initial ordering of the elements you can still use a std::set:
void RemoveDuplicates3(std::vector<int>& vector)
{
std::set<int> s;
std::vector<int> result;
for (const auto& e : vector) {
if (s.insert(e).second) { // not a duplicate
result.push_back(e);
}
}
vector = result;
}
And very similar, by searching the elements not in the set but in the result vector:
void RemoveDuplicates4(std::vector<int>& vector)
{
std::vector<int> result;
for (const auto& e : vector) {
if (std::find(result.begin(),result.end(),e) == result.end()){
result.push_back(e);
}
}
vector = result;
}
For starters the second parameter of the function is redundant and not used within the function. Remove it. The function should be declared like
void RemoveDuplicates( std::vector<int> &vector );
Also you forgot to change the original vector.
And it seems you mean inequality in the if condition
if (vector.at(i) != vector.at(j))
{
tempVec.push_back(vector.at(i)); //Unduplicated Vector
}
instead of the equality
if (vector.at(i) == vector.at(j))
{
tempVec.push_back(vector.at(i)); //Unduplicated Vector
}
Though in any case the logic within the inner for loop is incorrect.
Your inner for loop always starts from 1:
for (j = 1; j < vector.size(); j++)
So for this source vector { 1, 2, 3, 3 } the value 2 will be pushed once onto tempVec, and for each element with the value 3 two values equal to 3 will be pushed onto tempVec.
Using your approach the function can look the following way.
void RemoveDuplicates( std::vector<int> &vector )
{
std::vector<int> tempVec;
for ( std::vector<int>::size_type i = 0; i < vector.size(); i++ )
{
std::vector<int>::size_type j = 0;
while ( j < tempVec.size() && vector.at( i ) != tempVec.at( j ) )
{
++j;
}
if ( j == tempVec.size() )
{
tempVec.push_back( vector.at( i ) );
}
}
std::swap( vector, tempVec );
}
Here is a demonstration program.
#include <iostream>
#include <vector>
void RemoveDuplicates( std::vector<int> &vector )
{
std::vector<int> tempVec;
for ( std::vector<int>::size_type i = 0; i < vector.size(); i++ )
{
std::vector<int>::size_type j = 0;
while ( j < tempVec.size() && vector.at( i ) != tempVec.at( j ) )
{
++j;
}
if ( j == tempVec.size() )
{
tempVec.push_back( vector.at( i ) );
}
}
std::swap( vector, tempVec );
}
int main()
{
std::vector<int> v = { 1, 2, 3, 3 };
std::cout << v.size() << ": ";
for ( const auto &item : v )
{
std::cout << item << ' ';
}
std::cout << '\n';
RemoveDuplicates( v );
std::cout << v.size() << ": ";
for ( const auto &item : v )
{
std::cout << item << ' ';
}
std::cout << '\n';
}
The program output is
4: 1 2 3 3
3: 1 2 3
Resizing an array (including vectors) in-place and shifting items over by 1 each time a duplicate is found can be expensive.
If you can use a hash table based collection (i.e. unordered_set or unordered_map) to keep track of items already seen, you can have an O(N) based algorithm.
Aside from the std::unique solution already suggested, it's hard to beat this. std::unique is effectively the same thing.
void RemoveDuplicates(std::vector<int>& vec)
{
std::unordered_set<int> dupes;
std::vector<int> vecNew;
for (int x : vec)
{
if (dupes.insert(x).second)
{
vecNew.push_back(x);
}
}
vec = std::move(vecNew);
}
Given a vector of integers, iterate through the vector and check whether there are more than one of the same number. In that case, remove them so that only a single index of the vector contains that number. Here are a few examples:
vector<int> arr {1,1,1,1}
When arr is printed out the result should be 1.
vector<int> arr {1,2,1,2}
When arr is printed out the result should be 1,2.
vector<int> arr {1,3,2}
When arr is printed out the result should be 1,3,2.
I know there are many solutions regarding this, but I want to solve it using my method. The solutions I've looked at use a lot of built-in functions, which I don't want to get too comfortable with as a beginner. I want to practice my problem-solving skills.
This is my code:
#include <iostream>
#include <vector>
using namespace std;
int main()
{
vector<int> arr {1,1,1,1,2,1,1,1,1,1};
for (int i {}; i < arr.size(); ++i)
{
int counter {};
for (int j {}; j < arr.size(); ++j)
{
if (arr.at(i) == arr.at(j))
{
counter++;
if (counter > 1)
arr.erase(arr.begin()+j);
}
}
}
//Prints out the vector arr
for (auto value : arr)
{
cout << value << endl;
}
return 0;
}
The thing is that it works for the most part, except a few cases which have me confused.
For instance:
vector<int> arr {1,1,1,1,2,1,1,1,1,1}
When arr is printed out the result is 1,2,1 instead of 1,2.
However, in this case:
vector<int> arr {1,1,1,1,2,1,1,1}
When arr is printed out the result is 1,2.
It seems to work in the vast majority of cases, but when a number repeats itself a lot of times in the vector, it seems to not work, and I can't seem to find a reason for this.
I am now asking you to first tell me the cause of the problem, and then to give me guidance on how I should tackle this problem using my own solution.
The machine I'm using has a pre-C++11 compiler, so this is an answer in old-fashioned C++. The easy way around this is to erase backwards; that way you don't have to worry about the size changing underneath you. Also, since the vector's size changes during iteration, the outer loop is a while loop rather than a for loop.
#include <iostream>
#include <vector>
int main()
{
int dummy[] = {1,1,1,1,2,1,1,1,1,1};
std::vector<int> arr(dummy, dummy + sizeof(dummy)/sizeof(dummy[0]));
size_t ii = 0;
while (ii < arr.size())
{
// Save the value for a little efficiency
int value = arr[ii];
// Go through backwards only as far as ii.
for (size_t jj = arr.size() - 1; jj > ii; --jj)
{
if (value == arr[jj])
arr.erase(arr.begin() + jj);
}
++ii;
}
//Prints out the vector arr
for (size_t ii = 0; ii < arr.size(); ++ii)
{
std::cout << arr[ii] << std::endl;
}
return 0;
}
As mentioned in the comments, when you erase a found duplicate (at index j), you are potentially modifying the position of the element at index i.
So, after you have called arr.erase(arr.begin() + j), you need to adjust i accordingly, if it was referring to an element that occurs after the removed element.
Here's a "quick fix" for your function:
#include <iostream>
#include <vector>
int main()
{
std::vector<int> arr{ 1,1,1,1,2,1,1,1,1,1 };
for (size_t i{}; i < arr.size(); ++i) {
int counter{};
for (size_t j{}; j < arr.size(); ++j) {
if (arr.at(i) == arr.at(j)) {
counter++;
if (counter > 1) {
arr.erase(arr.begin() + j);
if (i >= j) --i; // Adjust "i" if it's after the erased element.
}
}
}
}
//Prints out the vector arr
for (auto value : arr) {
std::cout << value << std::endl;
}
return 0;
}
As also mentioned in the comments, there are other ways of making the function more efficient; however, you have stated that you want to "practice your own problem-solving skills" (which is highly commendable), so I shall stick to offering a fix for your immediate issue.
This inner for loop is incorrect
int counter {};
for (int j {}; j < arr.size(); ++j)
{
if (arr.at(i) == arr.at(j))
{
counter++;
if (counter > 1)
arr.erase(arr.begin()+j);
}
}
If an element was removed, the index j shall not be increased. Otherwise the next element after the deleted one will be bypassed, because all elements after the deleted element in the vector are moved one position to the left.
Using the variable counter is redundant. Just start the inner loop with j = i + 1.
Using your approach the program can look the following way
#include <iostream>
#include <vector>
int main()
{
std::vector<int> arr{ 1,1,1,1,2,1,1,1,1,1 };
for ( decltype( arr )::size_type i = 0; i < arr.size(); ++i)
{
for ( decltype( arr )::size_type j = i + 1; j < arr.size(); )
{
if (arr.at( i ) == arr.at( j ))
{
arr.erase( arr.begin() + j );
}
else
{
j++;
}
}
}
//Prints out the vector arr
for (auto value : arr)
{
std::cout << value << std::endl;
}
}
The program output is
1
2
This approach, where each duplicated element is deleted separately, is inefficient. It is better to use the so-called erase-remove idiom.
Here is a demonstration program.
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
int main()
{
std::vector<int> arr{ 1,1,1,1,2,1,1,1,1,1 };
for (auto first = std::begin( arr ); first != std::end( arr ); ++first)
{
arr.erase( std::remove( std::next( first ), std::end( arr ), *first ), std::end( arr ) );
}
//Prints out the vector arr
for (auto value : arr)
{
std::cout << value << std::endl;
}
}
Assume I have a vector with the following elements {1, 1, 2, 3, 3, 4}
I want to write a program with C++ code to remove the unique values and keep only the duplicated ones. So the end result will be something like this {1,3}.
So far this is what I've done, but it takes a lot of time.
Is there any way this can be made more efficient?
vector <int> g1 = {1,1,2,3,3,4};
vector <int> g2;
for(int i = 0; i < g1.size(); i++)
{
if(count(g1.begin(), g1.end(), g1[i]) > 1)
g2.push_back(g1[i]);
}
g2.erase(std::unique(g2.begin(), g2.end()), g2.end());
for(int i = 0; i < g2.size(); i++)
{
cout << g2[i];
}
My approach is to create an <algorithm>-style template, and use an unordered_map to do the counting. This means you only iterate over the input list once, and the time complexity is O(n). It does use O(n) extra memory though, and isn't particularly cache-friendly. Also this does assume that the type in the input is hashable.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <unordered_map>
template <typename InputIt, typename OutputIt>
OutputIt copy_duplicates(
InputIt first,
InputIt last,
OutputIt d_first)
{
std::unordered_map<typename std::iterator_traits<InputIt>::value_type,
std::size_t> seen;
for ( ; first != last; ++first) {
if ( 2 == ++seen[*first] ) {
// only output on the second time of seeing a value
*d_first = *first;
++d_first;
}
}
return d_first;
}
int main()
{
int i[] = {1, 2, 3, 1, 1, 3, 5}; // print 1, 3,
//                  ^     ^  (second occurrences)
copy_duplicates(std::begin(i), std::end(i),
std::ostream_iterator<int>(std::cout, ", "));
}
This can output to any kind of iterator. There are special iterators you can use that when written to will insert the value into a container.
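For example, to collect the duplicates into a vector instead of printing them, std::back_inserter can be used (a small usage sketch of the copy_duplicates template above; it assumes <vector> is also included):
std::vector<int> dupes;
int values[] = {1, 2, 3, 1, 1, 3, 5};
copy_duplicates(std::begin(values), std::end(values), std::back_inserter(dupes));
// dupes now holds {1, 3}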
Here's a way that's a little more cache-friendly than the unordered_map answer, but is O(n log n) instead of O(n), though it does not use any extra memory and does no allocations. Additionally, the overall multiplier is probably higher, in spite of its cache friendliness.
#include <vector>
#include <algorithm>
void only_distinct_duplicates(::std::vector<int> &v)
{
::std::sort(v.begin(), v.end());
auto output = v.begin();
auto run_start = v.begin();
auto const end = v.end();
for (auto test = v.begin(); test != end; ++test) {
if (*test == *run_start) {
if ((test - run_start) == 1) {
*output = *run_start;
++output;
}
} else {
run_start = test;
}
}
v.erase(output, end);
}
I've tested this, and it works. If you want a generic version that should work on any type that vector can store:
template <typename T>
void only_distinct_duplicates(::std::vector<T> &v)
{
::std::sort(v.begin(), v.end());
auto output = v.begin();
auto run_start = v.begin();
auto const end = v.end();
for (auto test = v.begin(); test != end; ++test) {
if (*test != *run_start) {
if ((test - run_start) > 1) {
::std::swap(*output, *run_start);
++output;
}
run_start = test;
}
}
if ((end - run_start) > 1) {
::std::swap(*output, *run_start);
++output;
}
v.erase(output, end);
}
Assuming the input vector is not sorted, the following will work and is generalized to support any vector with element type T. It will be more efficient than the other solutions proposed so far.
#include <algorithm>
#include <iostream>
#include <vector>
template<typename T>
void erase_unique_and_duplicates(std::vector<T>& v)
{
auto first{v.begin()};
std::sort(first, v.end());
while (first != v.end()) {
auto last{std::find_if(first, v.end(), [&](const T& x) { return x != *first; })};
if (last - first > 1) {
first = v.erase(first + 1, last);
}
else {
first = v.erase(first);
}
}
}
int main(int argc, char** argv)
{
std::vector<int> v{1, 2, 3, 4, 5, 2, 3, 4};
erase_unique_and_duplicates(v);
// The following will print '2 3 4'.
for (int i : v) {
std::cout << i << ' ';
}
std::cout << '\n';
return 0;
}
I have 2 improvements for you:
You can change your count to start at g1.begin() + i; everything before was handled by the previous iterations of the loop.
You can change the if to == 2 instead of > 1, so it will add numbers only once, independent of the number of occurrences. If a number appears 5 times in the vector, the first 3 occurrences will be ignored, the 4th will make it into the new vector, and the 5th will be ignored again. So you can remove the erase of the duplicates.
Example:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
int main() {
vector <int> g1 = {1,1,2,3,3,1,4};
vector <int> g2;
for(int i = 0; i < g1.size(); i++)
{
if(count(g1.begin() + i, g1.end(), g1[i]) == 2)
g2.push_back(g1[i]);
}
for(int i = 0; i < g2.size(); i++)
{
cout << g2[i] << " ";
}
cout << endl;
return 0;
}
I'll borrow a principle from Python, which is excellent for such operations:
You can use a dictionary where the dictionary key is the item in the vector and the dictionary value is the count (start with 1 and increase by one every time you encounter a value that is already in the dictionary).
Afterwards, create a new vector (or clear the original) with only the dictionary keys whose counts are larger than 1.
Look up std::map.
Hope this helps.
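A minimal C++ sketch of that idea (my own illustration, not code from the answer above):
#include <iostream>
#include <map>
#include <vector>

int main() {
    std::vector<int> g1 = { 1, 1, 2, 3, 3, 4 };
    std::map<int, int> counts; // value -> number of occurrences
    for (int x : g1)
        ++counts[x]; // operator[] default-initializes a new count to 0
    std::vector<int> g2; // values that occur more than once
    for (const auto& kv : counts)
        if (kv.second > 1)
            g2.push_back(kv.first);
    for (int x : g2)
        std::cout << x << ' '; // prints: 1 3
}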
In general, that task has a complexity of about O(n*n), which is why it appears slow. Does it have to be a vector? Is that a restriction? Must it be ordered? If not, it is better to store the values in a std::set, which eliminates duplicates as it is populated, or in a std::multiset if the presence of duplicates matters.
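For instance, a rough multiset sketch (my own illustration; note that multiset::count costs O(log n + count), so this is not asymptotically better than sorting):
#include <iostream>
#include <set>
#include <vector>

int main() {
    std::vector<int> g1 = { 1, 1, 2, 3, 3, 4 };
    std::multiset<int> ms(g1.begin(), g1.end()); // sorted, duplicates kept
    for (auto it = ms.begin(); it != ms.end(); ) {
        if (ms.count(*it) > 1)
            std::cout << *it << ' '; // prints: 1 3
        it = ms.upper_bound(*it); // jump past the whole run of equal values
    }
}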
I'm supposed to:
Print vector elements sorted without repetition.
Delete the printed elements from the vector.
Repeat the previous steps until the vector is empty.
But it seems that my code takes too much time, so I'm looking to optimise it. I've tried to do this task with std::vector and std::set.
Here is my approach:
#include <iostream>
#include <algorithm>
#include <vector>
#include <set>
using namespace std;
int main () {
int n;
cin >> n;
vector<int> v(n);
set<int> st;
for (int i = 0; i < n; i++) {
cin >> v[i];
}
while (!v.empty()) {
for (int i = 0; i < v.size(); i++)
st.insert(v[i]);
for (auto x : st) {
cout << x << ' ';
auto it = find(v.begin(), v.end(), x);
if (it != v.end())
v.erase(it);
}
st.clear();
cout << "\n";
}
return 0;
}
For example input is like:
7
1 2 3 3 2 4 3
The output should be like this:
1 2 3 4
2 3
3
You might use std::map instead of std::vector/std::set to keep track of numbers:
#include <iostream>
#include <map>
int main () {
std::map<int, int> m;
int size;
std::cin >> size;
for (int i = 0; i != size; i++) {
int number;
std::cin >> number;
++m[number];
}
while (!m.empty()) {
for (auto it = m.begin(); it != m.end(); /*Empty*/) {
const auto number = it->first;
auto& count = it->second;
std::cout << number << ' ';
if (--count == 0) {
it = m.erase(it);
} else {
++it;
}
}
std::cout << "\n";
}
}
Complexity is now O(n log(n)) instead of O(n²) (though with a lot of internal allocations).
Because std::unique overwrites the elements that are expected to be deleted, it won't be much use for this problem. My solution:
std::sort(v.begin(), v.end());
while (!v.empty())
{
int last = v.front();
std::cout << last << " ";
v.erase(v.begin());
for (auto it = v.begin(); it != v.end(); /* no-op */)
{
if (*it == last)
{
++it;
}
else
{
last = *it;
std::cout << last << " ";
it = v.erase(it);
}
}
std::cout << std::endl;
}
You could probably improve performance further by reversing the sorting of the vector, and then iterating through backwards (since it's cheaper to delete from closer to the back of the vector), but that would complicate the code further, so I'll say "left as an exercise for the reader".
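For the curious, here is one way that reversed variant might look (my own sketch of the idea hinted at above, not the answerer's tested code; whether it actually wins depends on the input):
std::sort(v.begin(), v.end(), std::greater<int>()); // descending; needs <functional>
while (!v.empty())
{
    std::vector<int> printed; // this round's values, collected in ascending order
    int last = v.back();      // the smallest value now sits at the back
    printed.push_back(last);
    v.pop_back();
    for (std::size_t i = v.size(); i-- > 0; ) // walk from the back towards the front
    {
        if (v[i] != last)
        {
            last = v[i];
            printed.push_back(last);
            v.erase(v.begin() + i); // the earliest erasures happen near the back
        }
    }
    for (int x : printed)
        std::cout << x << " ";
    std::cout << std::endl;
}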
You can use std::map
auto n = 0;
std::cin >> n;
std::map<int, int> mp;
while (--n >= 0) {
auto i = 0;
std::cin >> i;
mp[i] += 1;
}
while (!mp.empty()) {
for (auto& it: mp) {
std::cout << it.first << " ";
it.second--;
}
for (auto it = mp.begin(); it != mp.end(); ) {
if (it->second == 0) it = mp.erase(it); // erase returns the next valid iterator
else ++it;
}
std::cout << "\n";
}
And without any erase:
auto n = 0;
std::cin >> n;
std::map<int, int> mp;
while (--n >= 0) {
auto i = 0;
std::cin >> i;
mp[i] += 1;
}
auto isDone = false;
while (!isDone) {
isDone = true;
for (auto& it: mp) {
if (it.second > 0) std::cout << it.first << " ";
if (--it.second > 0) isDone = false;
}
std::cout << "\n";
}
Here is a solution using sort and vector. It uses a second vector to hold the unique items and print them.
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
int main()
{
std::vector<int> v{1,2,3,3,2,4,3};
std::sort(v.begin(), v.end());
std::vector<int>::iterator vit;
while(!v.empty()){
std::vector<int> printer;
std::vector<int>::iterator pit;
vit = v.begin();
while (vit != v.end()){
pit = find(printer.begin(), printer.end(), *vit);
if (pit == printer.end()){
printer.push_back(*vit);
vit = v.erase(vit);
} else {
++vit;
}
}
std::copy(printer.begin(), printer.end(), std::ostream_iterator<int>(std::cout, " "));
std::cout << '\n';
}
}
Output:
1 2 3 4
2 3
3
It's not clear (at least to me) exactly what you're talking about when you mention "efficiency". Some people use it to refer solely to computational complexity. Others think primarily in terms of programmer's time, while still others think of overall execution speed, regardless of whether that's obtained via changes in computational complexity, or (for one example) improved locality of reference leading to better cache utilization.
So, with that warning, I'm not sure whether this really improves what you care about or not, but it's how I think I'd do the job anyway:
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
// preconditions: input range is sorted
template <class RandomIt> // random access needed: the loop below compares iterators with <
RandomIt partition_unique(RandomIt begin, RandomIt end) {
auto pivot = end;
for (auto pos = begin; pos != pivot; ++pos) {
auto mid = std::next(pos);
for ( ; mid < pivot && *mid == *pos; ++mid, --pivot)
;
std::rotate(std::next(pos), mid, end);
}
return pivot;
}
template <class It>
void show(It b, It e, std::ostream &os) {
while (b != e) {
os << *b << ' ';
++b;
}
os << '\n';
}
int main() {
std::vector<int> input{ 1, 2, 3, 3, 2, 4, 3 };
std::sort(input.begin(), input.end());
auto begin = input.begin();
auto pos = begin;
while ((pos = partition_unique(begin, input.end())) != input.end()) {
show(begin, pos, std::cout);
begin = pos;
}
show(begin, input.end(), std::cout);
}
I'm not really sure it's possible to improve the computational complexity much over what this does (but it might be--I haven't thought about it enough to be sure one way or the other). Compared to some versions I see posted already, there's a decent chance this will improve overall speed (e.g., since it just moves things around inside the same vector, it's likely to get better locality than those that copy data from one vector to another).
The code is in Java, but the idea remains the same.
At first, I sort the array. Now, the idea is to create buckets.
Each line of sorted output is like a bucket. So, find the count of each element and distribute that element across the buckets, one copy per bucket. If it happens that there are fewer buckets than copies, create a new bucket and add the current element to it.
In the end, print all buckets.
Time complexity is O(n log(n)) for sorting and O(n) for the buckets, since you have to visit each and every element to print it. So it's O(n log(n)) + O(n) = O(n log(n)) asymptotically.
Code:
import java.util.*;
public class GFG {
public static void main(String[] args){
int[] arr1 = {1,2,3,3,2,4,3};
int[] arr2 = {45,98,65,32,65,74865};
int[] arr3 = {100,100,100,100,100};
int[] arr4 = {100,200,300,400,500};
printSeries(compute(arr1,arr1.length));
printSeries(compute(arr2,arr2.length));
printSeries(compute(arr3,arr3.length));
printSeries(compute(arr4,arr4.length));
}
private static void printSeries(List<List<Integer>> res){
int size = res.size();
for(int i=0;i<size;++i){
System.out.println(res.get(i).toString());
}
}
private static List<List<Integer>> compute(int[] arr,int N){
List<List<Integer>> buckets = new ArrayList<List<Integer>>();
Arrays.sort(arr);
int bucket_size = 0;
for(int i=0;i<N;++i){
int last_index = i;
if(bucket_size > 0){
buckets.get(0).add(arr[i]);
}else{
buckets.add(newBucket(arr[i]));
bucket_size++;
}
for(int j=i+1;j<N;++j){
if(arr[i] != arr[j]) break;
if(j-i < bucket_size){
buckets.get(j-i).add(arr[i]);
}else{
buckets.add(newBucket(arr[i]));
bucket_size++;
}
last_index = j;
}
i = last_index;
}
return buckets;
}
private static List<Integer> newBucket(int value){
List<Integer> new_bucket = new ArrayList<>();
new_bucket.add(value);
return new_bucket;
}
}
OUTPUT
[1, 2, 3, 4]
[2, 3]
[3]
[32, 45, 65, 98, 74865]
[65]
[100]
[100]
[100]
[100]
[100]
[100, 200, 300, 400, 500]
This is what I came up with: http://coliru.stacked-crooked.com/a/b3f06693a74193e5
The key idea:
sort the vector
print by iterating; just print a value if it differs from the last printed one
remove the unique elements. I have done this with what I called inverse_unique. The standard library comes with an algorithm called unique, which removes all duplicates; I inverted this so that it keeps only the duplicates.
So we have no memory allocation at all. I can't see how one could make the algorithm more efficient; we are just doing the bare minimum, and it's done exactly the way a human would think about it.
I tested it with several combinations. Hope it's bug-free ;-P
Code:
#include <iostream>
#include <algorithm>
#include <vector>
#include <iterator> // for std::next
template<class ForwardIt>
ForwardIt inverse_unique(ForwardIt first, ForwardIt last)
{
if (first == last)
return last;
auto one_ahead = std::next(first); // first+1 would require a random-access iterator
auto dst = first;
while(one_ahead != last)
{
if(*first == *one_ahead)
{
*dst = std::move(*first);
++dst;
}
++first;
++one_ahead;
}
return dst;
}
void print_unique(std::vector<int> const& v)
{
if(v.empty()) return;
// print first
std::cout << v[0] << ' ';
auto last_printed = v.cbegin();
// print others
for(auto it = std::next(std::cbegin(v)); it != std::cend(v); ++it)
{
if(*it != *last_printed)
{
std::cout << *it << ' ';
last_printed = it;
}
}
std::cout << "\n";
}
void remove_uniques(std::vector<int> & v)
{
auto new_end = inverse_unique(std::begin(v), std::end(v));
v.erase(new_end, v.end());
}
int main ()
{
std::vector<int> v = {1, 2, 3, 3, 2, 4, 3};
std::sort(std::begin(v), std::end(v));
while (!v.empty())
{
print_unique(v);
remove_uniques(v);
}
return 0;
}
Edit: updated the inverse_unique function; it should be easy to understand now. A half-baked version is at http://coliru.stacked-crooked.com/a/c45df1591d967075
Slightly modified counting sort.
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <map>
int main() {
std::vector<int> v{1,2,3,3,2,4,3};
std::map<int, int> map;
for (auto x : v)
++map[x];
while(map.size()) {
for(auto pair = map.begin(); pair != map.end(); ) {
std::cout << pair->first << ' ';
if (!--pair->second)
pair = map.erase(pair);
else
++pair;
}
std::cout << "\n";
}
return 0;
}
I need to find the indices of the k largest elements of an unsorted, length n, array/vector in C++, with k < n. I have seen how to use nth_element() to find the k-th statistic, but I'm not sure if using it is the right choice for my problem, as it seems like I would need to make k calls to nth_element(), which I guess would have complexity O(kn), which may be as good as it can get? Or is there a way to do this in just O(n)?
Implementing it without nth_element() seems like I will have to iterate over the whole array once, populating a list of indices of the largest elements at each step.
Is there anything in the standard C++ library that makes this a one-liner or any clever way to implement this myself in just a couple lines? In my particular case, k = 3, and n = 6, so efficiency isn't a huge concern, but it would be nice to find a clean and efficient way to do this for arbitrary k and n.
It looks like Mark the top N elements of an unsorted array is probably the closest posting I can find on SO, the postings there are in Python and PHP.
This should be an improved version of @hazelnusse's answer, which executes in O(n log k) instead of O(n log n):
#include <queue>
#include <iostream>
#include <vector>
// maxindices.cc
// compile with:
// g++ -std=c++11 maxindices.cc -o maxindices
int main()
{
std::vector<double> test = {2, 8, 7, 5, 9, 3, 6, 1, 10, 4};
std::priority_queue< std::pair<double, int>, std::vector< std::pair<double, int> >, std::greater <std::pair<double, int> > > q;
int k = 5; // number of indices we need
for (int i = 0; i < test.size(); ++i) {
if(q.size()<k)
q.push(std::pair<double, int>(test[i], i));
else if(q.top().first < test[i]){
q.pop();
q.push(std::pair<double, int>(test[i], i));
}
}
k = q.size();
std::vector<int> res(k);
for (int i = 0; i < k; ++i) {
res[k - i - 1] = q.top().second;
q.pop();
}
for (int i = 0; i < k; ++i) {
std::cout<< res[i] <<std::endl;
}
}
Output:
8
4
1
2
6
Here is my implementation that does what I want and I think is reasonably efficient:
#include <queue>
#include <vector>
#include <iostream> // the code below writes to std::cout
// maxindices.cc
// compile with:
// g++ -std=c++11 maxindices.cc -o maxindices
int main()
{
std::vector<double> test = {0.2, 1.0, 0.01, 3.0, 0.002, -1.0, -20};
std::priority_queue<std::pair<double, int>> q;
for (int i = 0; i < test.size(); ++i) {
q.push(std::pair<double, int>(test[i], i));
}
int k = 3; // number of indices we need
for (int i = 0; i < k; ++i) {
int ki = q.top().second;
std::cout << "index[" << i << "] = " << ki << std::endl;
q.pop();
}
}
which gives output:
index[0] = 3
index[1] = 1
index[2] = 0
The question already contains a partial answer: std::nth_element returns the "n-th statistic", with the property that none of the elements preceding the nth one are greater than it, and none of the elements following it are less.
Therefore, just one call to std::nth_element is enough to get the k largest elements. The time complexity will be O(n), which is theoretically the smallest possible, since you have to visit each element at least once to find the k largest element(s). If you need these k elements to be ordered, then you need to sort them, which is O(k log(k)). So, in total, O(n + k log(k)).
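As a sketch of how that single call can be applied to the index problem from the question (my own illustration):
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<double> data = { 0.2, 1.0, 0.01, 3.0, 0.002, -1.0, -20 };
    const std::size_t k = 3;
    std::vector<std::size_t> idx(data.size());
    for (std::size_t i = 0; i < idx.size(); ++i)
        idx[i] = i;
    auto by_value_desc = [&](std::size_t a, std::size_t b) { return data[a] > data[b]; };
    // Partition the indices so that the first k refer to the k largest values; O(n).
    std::nth_element(idx.begin(), idx.begin() + k, idx.end(), by_value_desc);
    // Optional: order those k indices by descending value; O(k log(k)).
    std::sort(idx.begin(), idx.begin() + k, by_value_desc);
    for (std::size_t i = 0; i < k; ++i)
        std::cout << "index[" << i << "] = " << idx[i] << '\n'; // prints 3, 1, 0
}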
You can use the basis of the quicksort algorithm to do what you need, except instead of reordering the partitions, you can get rid of the entries falling out of your desired range.
It's been referred to as "quick select" and here is a C++ implementation:
#include <iostream>
using namespace std; // for cout/endl below

int partition(int* input, int p, int r)
{
{
int pivot = input[r];
while ( p < r )
{
while ( input[p] < pivot )
p++;
while ( input[r] > pivot )
r--;
if ( input[p] == input[r] )
p++;
else if ( p < r ) {
int tmp = input[p];
input[p] = input[r];
input[r] = tmp;
}
}
return r;
}
int quick_select(int* input, int p, int r, int k)
{
if ( p == r ) return input[p];
int j = partition(input, p, r);
int length = j - p + 1;
if ( length == k ) return input[j];
else if ( k < length ) return quick_select(input, p, j - 1, k);
else return quick_select(input, j + 1, r, k - length);
}
int main()
{
int A1[] = { 100, 400, 300, 500, 200 };
cout << "1st order element " << quick_select(A1, 0, 4, 1) << endl;
int A2[] = { 100, 400, 300, 500, 200 };
cout << "2nd order element " << quick_select(A2, 0, 4, 2) << endl;
int A3[] = { 100, 400, 300, 500, 200 };
cout << "3rd order element " << quick_select(A3, 0, 4, 3) << endl;
int A4[] = { 100, 400, 300, 500, 200 };
cout << "4th order element " << quick_select(A4, 0, 4, 4) << endl;
int A5[] = { 100, 400, 300, 500, 200 };
cout << "5th order element " << quick_select(A5, 0, 4, 5) << endl;
}
OUTPUT:
1st order element 100
2nd order element 200
3rd order element 300
4th order element 400
5th order element 500
EDIT
That particular implementation has an O(n) average run time; due to the method of selection of pivot, it shares quicksort's worst-case run time. By optimizing the pivot choice, your worst case also becomes O(n).
The standard library won't get you a list of indices (it has been designed to avoid passing around redundant data). However, if you're interested in n largest elements, use some kind of partitioning (both std::partition and std::nth_element are O(n)):
#include <iostream>
#include <algorithm>
#include <vector>
struct Pred {
Pred(int nth) : nth(nth) {};
bool operator()(int k) { return k >= nth; }
int nth;
};
int main() {
int n = 4;
std::vector<int> v = {5, 12, 27, 9, 4, 7, 2, 1, 8, 13, 1};
// Moves the nth element to the nth from the end position.
std::nth_element(v.begin(), v.end() - n, v.end());
// Reorders the range, so that the first n elements would be >= nth.
std::partition(v.begin(), v.end(), Pred(*(v.end() - n)));
for (auto it = v.begin(); it != v.end(); ++it)
std::cout << *it << " ";
std::cout << "\n";
return 0;
}
You can do this in O(n) time with a single order statistic calculation:
Let r be the k-th order statistic
Initialize two empty lists bigger and equal.
For each index i:
If array[i] > r, add i to bigger
If array[i] = r, add i to equal
Discard elements from equal until the sum of the lengths of the two lists is k
Return the concatenation of the two lists.
Naturally, you only need one list if all items are distinct. And if needed, you could do tricks to combine the two lists into one, although that would make the code more complicated.
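A sketch of that recipe (my own illustration; it assumes 1 <= k <= array size and obtains r with std::nth_element on a scratch copy):
#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

std::vector<std::size_t> top_k_indices(const std::vector<int>& a, std::size_t k)
{
    // r = the k-th largest value (the k-th order statistic from the top)
    std::vector<int> scratch(a);
    std::nth_element(scratch.begin(), scratch.begin() + (k - 1), scratch.end(),
                     std::greater<int>());
    const int r = scratch[k - 1];
    std::vector<std::size_t> bigger, equal;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] > r) bigger.push_back(i);
        else if (a[i] == r) equal.push_back(i);
    }
    equal.resize(k - bigger.size()); // discard surplus indices of elements equal to r
    bigger.insert(bigger.end(), equal.begin(), equal.end());
    return bigger; // the concatenation of the two lists
}

int main() {
    for (std::size_t i : top_k_indices({ 5, 12, 27, 9, 27, 7 }, 3))
        std::cout << i << ' '; // prints: 2 4 1
}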
Even though the following code might not fulfill the desired complexity constraints, it might be an interesting alternative to the aforementioned priority queue.
#include <queue>
#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>
std::vector<int> largestIndices(const std::vector<double>& values, int k) {
std::vector<int> ret;
std::vector<std::pair<double, int>> q;
int index = -1;
std::transform(values.begin(), values.end(), std::back_inserter(q), [&](double val) {return std::make_pair(val, ++index); });
auto functor = [](const std::pair<double, int>& a, const std::pair<double, int>& b) { return b.first > a.first; };
std::make_heap(q.begin(), q.end(), functor);
for (int i = 0; i < k && i < static_cast<int>(values.size()); i++) {
std::pop_heap(q.begin(), q.end(), functor);
ret.push_back(q.back().second);
q.pop_back();
}
return ret;
}
int main()
{
std::vector<double> values = { 7,6,3,4,5,2,1,0 };
auto ret=largestIndices(values, 4);
std::copy(ret.begin(), ret.end(), std::ostream_iterator<int>(std::cout, "\n"));
}