How to draw a sample of n elements from std::set

I have the following function to pick a random element from a std::set:
int pick_random(const std::set<int>& vertex_set) {
std::uniform_int_distribution<std::set<int>::size_type> dist(0, vertex_set.size() - 1);
const std::set<int>::size_type rand_idx = dist(mt);
std::set<int>::const_iterator it = vertex_set.begin();
for (std::set<int>::size_type i = 0; i < rand_idx; i++) {
it++;
}
return *it;
}
However, I wonder how to properly draw a sample of n elements from a set. With a C++17 compiler I could use the std::sample function, but that is not an option here because I only have a C++11 compiler.

If you don't mind the copy, an easy way is to create a std::vector from your std::set, shuffle it using std::shuffle and then take the first n elements:
std::vector<int> pick_random_n(const std::set<int>& vertex_set, std::size_t n) {
std::vector<int> vec(std::begin(vertex_set), std::end(vertex_set));
std::shuffle(std::begin(vec), std::end(vec), mt);
vec.resize(std::min(n, vertex_set.size()));
return vec;
}
If you do not want the extra copy, you could look at the implementation of std::sample from, e.g., libc++ and implement your own for std::set:
std::vector<int> pick_random_n(const std::set<int>& vertex_set, std::size_t n) {
auto unsampled_sz = vertex_set.size();
auto first = std::begin(vertex_set);
std::vector<int> vec;
vec.reserve(std::min(n, unsampled_sz));
for (n = std::min(n, unsampled_sz); n != 0; ++first) {
auto r =
std::uniform_int_distribution<std::size_t>(0, --unsampled_sz)(mt);
if (r < n) {
vec.push_back(*first);
--n;
}
}
return vec;
}
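Both snippets above rely on a global generator named mt that is not shown. As a minimal sketch of the same selection-sampling idea with the generator passed in explicitly (the name sample_from_set and the Rng template parameter are my own, not from the answer):

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// Hypothetical helper: selection sampling over a std::set, with the
// generator passed in explicitly rather than taken from a global `mt`.
template <typename Rng>
std::vector<int> sample_from_set(const std::set<int>& vertex_set,
                                 std::size_t n, Rng& rng) {
    std::size_t remaining = vertex_set.size();  // elements not yet visited
    n = std::min(n, remaining);                 // cannot draw more than exist
    std::vector<int> out;
    out.reserve(n);
    for (std::set<int>::const_iterator it = vertex_set.begin(); n != 0; ++it) {
        // Take *it with probability n / remaining.
        std::uniform_int_distribution<std::size_t> dist(0, remaining - 1);
        --remaining;
        if (dist(rng) < n) {
            out.push_back(*it);
            --n;
        }
    }
    return out;
}
```

It could be called as sample_from_set(vertex_set, 3, gen) with a std::mt19937 gen seeded once at startup. The sample comes back in the set's iteration order, so shuffle it afterwards if you need a random order as well.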

Related

Deleting both an element and its duplicates in a Vector in C++

I've searched the Internet and learned how to delete an element (with std::erase) and how to find the duplicates of an element and delete them (vec.erase(std::unique(vec.begin(), vec.end()),vec.end());). But all these methods delete either an element or its duplicates.
I want to delete both.
For example, using this vector:
std::vector<int> vec = {2,3,1,5,2,2,5,1};
I want output to be:
{3}
My initial idea was:
void removeDuplicatesandElement(std::vector<int> &vec)
{
std::sort(vec.begin(), vec.end());
int passedNumber = 0; //To tell amount of number not deleted (since not duplicated)
for (int i = 0; i != vec.size(); i = passedNumber) //This is not best practice, but I tried
{
if (vec[i] == vec[i+1])
{
int ctr = 1;
for(int j = i+1; j != vec.size(); j++)
{
if (vec[j] == vec[i]) ctr++;
else break;
}
vec.erase(vec.begin()+i, vec.begin()+i+ctr);
}
else passedNumber++;
}
}
And it worked. But this code is redundant and runs at O(n^2), so I'm trying to find other ways to solve the problem (maybe an STL function that I've never heard of, or just improve the code).

Something like this, perhaps:
void removeDuplicatesandElement(std::vector<int> &vec) {
if (vec.size() <= 1) return;
std::sort(vec.begin(), vec.end());
int cur_val = vec.front() - 1;
auto pred = [&](const int& val) {
if (val == cur_val) return true;
cur_val = val;
// Look ahead to the next element to see if it's a duplicate.
return &val != &vec.back() && (&val)[1] == val;
};
vec.erase(std::remove_if(vec.begin(), vec.end(), pred), vec.end());
}
This relies heavily on the fact that std::vector is guaranteed to have contiguous storage. It won't work with any other container.

You can do it using STL maps as follows:
#include <iostream>
#include <vector>
#include <unordered_map>
using namespace std;
void retainUniqueElements(vector<int> &A){
unordered_map<int, int> Cnt;
for(auto element:A) Cnt[element]++;
A.clear(); //removes all the elements of A
for(auto i:Cnt){
if(i.second == 1){ // the element occurs exactly once
A.push_back(i.first); //then add it in our vector
}
}
}
int main() {
vector<int> vec = {2,3,1,5,2,2,5,1};
retainUniqueElements(vec);
for(auto i:vec){
cout << i << " ";
}
cout << "\n";
return 0;
}
Output:
3
Time Complexity of the above approach: O(n) on average (hash-map insertions and lookups)
Space Complexity of the above approach: O(n)

From what you have searched, we can look in the vector for duplicated values, then use the Erase–remove idiom to clean up the vector.
#include <vector>
#include <algorithm>
#include <iostream>
void removeDuplicatesandElement(std::vector<int> &vec)
{
std::sort(vec.begin(), vec.end());
if (vec.size() < 2)
return;
for (int i = 0; i < vec.size() - 1;)
{
// This is for the case we emptied our vector
if (vec.size() < 2)
return;
// This heavily relies on the fact that this vector is sorted
if (vec[i] == vec[i + 1])
vec.erase(std::remove(vec.begin(), vec.end(), (int)vec[i]), vec.end());
else
i += 1;
}
// Since all duplicates are removed, the remaining elements in the vector are unique, thus the size of the vector
// But we are not returning anything or any reference, so I'm just gonna leave this here
// return vec.size()
}
int main()
{
std::vector<int> vec = {2, 3, 1, 5, 2, 2, 5, 1};
removeDuplicatesandElement(vec);
for (auto i : vec)
{
std::cout << i << " ";
}
std::cout << "\n";
return 0;
}
Output: 3
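Time complexity: O(n log n) for the sort, plus one O(n) erase-remove pass per duplicated value, so O(n^2) in the worst case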
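Since the vector is sorted anyway, a single pass over runs of equal values is enough and avoids calling erase repeatedly. A minimal sketch of that idea (keep_singletons is a made-up name, and like the answers above it does not preserve the original order):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: keep only the values that occur exactly once.
void keep_singletons(std::vector<int>& vec) {
    std::sort(vec.begin(), vec.end());
    std::size_t out = 0;                       // next write position
    for (std::size_t i = 0; i < vec.size();) {
        std::size_t j = i + 1;                 // find the end of the run of equal values
        while (j < vec.size() && vec[j] == vec[i]) ++j;
        if (j - i == 1) vec[out++] = vec[i];   // run of length 1: value is unique
        i = j;
    }
    vec.resize(out);
}
```

For {2, 3, 1, 5, 2, 2, 5, 1} this leaves {3}. The sort dominates the cost, so the whole thing is O(n log n).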

Create a vector with repeated entries of each Element

I want to expand a given vector by repeating every entry three times. For example, if the vector is [5,7], the output vector should be [5 5 5 7 7 7].
#include<iostream>
#include<vector>
int main(void)
{
std::vector<int> x;
x.push_back(5);
x.push_back(7);
x.insert(x.end(), x.begin(), x.begin() + 1);
return 0;
}
This didn't work out. Any help would be appreciated.

A simple approach is to loop over the vector and create a new one:
std::vector<int> vec{5, 7};
// create a new vector
std::vector<int> new_vec;
new_vec.reserve(vec.size() * 3);
for (auto elem : vec) {
for (std::size_t i = 0; i < 3; ++i) {
new_vec.push_back(elem);
}
}

I don't know how simple you want the code to be, but this works, for example:
#include<iostream>
#include<vector>
int main(void)
{
std::vector<int> x;
x.push_back(5);
x.push_back(7);
for (std::vector<int>::iterator it = x.end(); it != x.begin(); )
{
it--;
it = x.insert(it, 2, *it);
}
// print the vector to check
for (size_t i = 0; i < x.size(); i++) std::cout << x[i] << " ";
std::cout << std::endl;
return 0;
}

Maybe something like this could help you achieve that:
#include <iostream>
#include <algorithm>
#include <vector>
#include <cstdint> // std::uint16_t
template<typename T>
std::vector<T> RepeateEntryNumberOfTimes(std::vector<T> input, std::uint16_t numberOfTimes)
{
std::vector<T> result;
std::for_each(input.begin(), input.end(), [&result, numberOfTimes](T item){
for(std::uint16_t numberOfReps = 0; numberOfReps != numberOfTimes; ++numberOfReps)
{
result.push_back(item);
}
});
return result;
}
See godbolt example: https://godbolt.org/z/ns9o3b

Your code has a problem because it inserts elements from a vector into that same vector.
Modifying a vector can invalidate existing iterators, so your code has undefined behavior.
Even ignoring this error, the logic of your code doesn't do what you expect.
template<typename In, typename Out>
Out replicate_elements(In b, In e, size_t n, Out o)
{
while(b != e) {
o = std::fill_n(o, n, *b++);
}
return o;
}
std::vector<int> foo(const std::vector<int>& x)
{
std::vector<int> r;
r.reserve(x.size() * 3);
replicate_elements(x.begin(), x.end(), 3, std::back_inserter(r));
return r;
}
https://www.godbolt.org/z/zvE5TG

c++ concatenate many std::vectors and remove duplicates

I am receiving some integers from an external API (std::vector).
The API usually needs to be called multiple times, so I need to accumulate all the integers from the consecutive API calls to a local vector. At the end every element of the array must be unique (does not need to be sorted).
My code is below (uses the getNextVector to "mock" data and simulate the API call).
The code works, however I want maximum performance for this operation. Is my approach the right one?
#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>
std::vector<int> getNextVector(int i) {
if ( i == 0 ) {
std::vector<int> v = { 1,2,3 };
return v;
} else if ( i == 1 ) {
std::vector<int> v = { 3,4,5 };
return v;
} else if ( i == 2 ) {
std::vector<int> v = { 5,6,7 };
return v;
} else if ( i == 3 ) {
std::vector<int> v = { 7,8,9 };
return v;
}
}
int count() { return 4; } //we have four vectors
int main(int argc, char** argv) {
std::vector<int> dest;
dest.reserve(20); // we can find this, for simplicity hardcode...
for( int i = 0; i < count(); i++ ) {
std::vector<int> src = getNextVector(i);
dest.insert(
dest.end(),
std::make_move_iterator(src.begin()),
std::make_move_iterator(src.end())
);
}
std::sort(dest.begin(), dest.end());
dest.erase(std::unique(dest.begin(), dest.end()), dest.end());
/*
std::copy(
dest.begin(),
dest.end(),
std::ostream_iterator<int>(std::cout, "\n")
);
*/
return 0;
}

I think you can store the elements of the vector in a set. If ordering is not needed you can use unordered_set. Simply do the following -
std::unordered_set<int> integers;
for (int i = 0; i < count(); i++) {
std::vector<int> src = getNextVector(i);
for (std::size_t j = 0; j < src.size(); j++) {
integers.insert(src[j]); // index with j, the inner loop variable
}
}
Or, as suggested by @StoryTeller, you can use an appropriate function instead of the inner loop. For example -
std::unordered_set<int> integers;
for (int i = 0; i < count(); i++) {
std::vector<int> src = getNextVector(i);
integers.insert(src.begin(), src.end());
}

My first thought was "It can be done fast and easily with unordered_set"; later I realised that it will not help much with ints (the hash of an int is still an int, so I don't see a performance increase here). So, lastly, I decided to benchmark it, and my results are:
Times in microseconds, as measured with std::chrono::microseconds in the benchmark code:
N      Set      Unordered set  Vect
4      304703   404469         91769
20     563320   398049         176558
40     569628   420496         207368
200    639829   456763         245343
2000   728753   499716         303813
20000  760176   480219         331941
So, apparently, for the samples you gave us, your implementation is the fastest one. That is the case when your API returns only a few possible vector combinations and the number of iterations is small. I decided to verify what happens when there are more distinct values, generated via rand(), for N > 4 (*), and the vector approach stays the fastest. The unordered set is the slowest only for N = 4 (hash calculation cost); for N >= 20, std::set is the slowest.
So, to answer your question: benchmark your case on your own. This is the best way to determine which approach is the fastest.
(*) The poor randomness of rand() is not a bug but a feature here.
EDIT:
My answer does not claim that there are no faster algorithms. I benchmarked the STL ones, which behave differently from what one might expect at first glance. There is surely a way of doing unique concatenation faster, maybe some combination of a set of vectors or a different container, and I hope someone will provide one.
#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <set>
#include <unordered_set>
#include <chrono>
std::vector<int> getNextVector(int i) {
if (i == 0) {
std::vector<int> v = { 1,2,3 };
return v;
}
else if (i == 1) {
std::vector<int> v = { 3,4,5 };
return v;
}
else if (i == 2) {
std::vector<int> v = { 5,6,7 };
return v;
}
else if (i == 3) {
std::vector<int> v = { 7,8,9 };
return v;
}
return {rand() % 10000,rand() % 10000,rand() % 10000 };
}
void set_impl(std::set<int>& dest, int N)
{
// dest.reserve(20); // we can find this, for simplicity hardcode...
for (int i = 0; i < N; i++) {
std::vector<int> src = getNextVector(i);
dest.insert(
std::make_move_iterator(src.begin()),
std::make_move_iterator(src.end())
);
}
}
void uset_impl(std::unordered_set<int>& dest, int N)
{
// dest.reserve(20); // we can find this, for simplicity hardcode...
for (int i = 0; i < N; i++) {
std::vector<int> src = getNextVector(i);
dest.insert(
std::make_move_iterator(src.begin()),
std::make_move_iterator(src.end())
);
}
}
void vect_impl(std::vector<int>& dest, int N)
{
for (int i = 0; i < N; i++) {
std::vector<int> src = getNextVector(i);
dest.insert(
dest.end(),
std::make_move_iterator(src.begin()),
std::make_move_iterator(src.end())
);
}
std::sort(dest.begin(), dest.end());
dest.erase(std::unique(dest.begin(), dest.end()), dest.end());
}
int main(int argc, char** argv) {
for (int N : { 4, 20, 40, 200, 2000, 20000 })
{
const int K = 1000000 / N;
using clock = std::chrono::high_resolution_clock;
std::set<int> sdest;
auto start = clock::now();
for (int i = 0; i < K; i++)
{
sdest.clear();
set_impl(sdest, N);
}
auto set_ms = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - start).count();
std::unordered_set<int> usdest;
start = clock::now();
for (int i = 0; i < K; i++)
{
usdest.clear();
uset_impl(usdest, N);
}
auto uset_ms = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - start).count();
std::vector<int> dest;
dest.reserve(N); // we can find this, for simplicity hardcode...
start = clock::now();
for (int i = 0; i < K; i++)
{
dest.clear();
vect_impl(dest, N);
}
auto vect_ms = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - start).count();
std::cout << "N = " << N << " Set implementation " << set_ms << " microseconds\n";
std::cout << "N = " << N << " Unordered set implementation " << uset_ms << " microseconds\n";
std::cout << "N = " << N << " Vect implementation " << vect_ms << " microseconds\n";
}
return 0;
}

If you want to preserve the order of the elements received from the external API and they are not sorted, then I recommend that you create a second vector which you keep sorted. Do a lower_bound on the sorted vector, and if the returned iterator is end() or does not point to the value, insert the value in both the target vector (integers in the code below) and the sorted vector (using the returned iterator as the insert position in the sorted vector). Using set or unordered set for integers is likely to be much slower (probably orders of magnitude slower). If you don't care about the order, then use a single sorted vector.
vector<int> sorted;
....
vector<int> src = getNextVector(i);
for( int i : src ) {
auto itr = std::lower_bound( sorted.begin(), sorted.end(), i );
if( itr == sorted.end() || *itr != i ) { // lower_bound may return end(), which must not be dereferenced
sorted.insert(itr, i);
integers.push_back(i);
}
}
If you know the values from each call to getNextVector are unique then you could do something like the following (which might be faster.)
vector<int> sorted;
....
vector<int> src = getNextVector(i);
vector<int> usrc;
for( int i : src ) {
auto itr = std::lower_bound( sorted.begin(), sorted.end(), i );
if( itr == sorted.end() || *itr != i ) { // lower_bound may return end(), which must not be dereferenced
usrc.push_back(i);
integers.push_back(i);
}
}
sorted.insert(sorted.end(), usrc.begin(), usrc.end());
std::sort( sorted.begin(), sorted.end() );
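For reference, here is the first variant written out as a complete function. This is only a sketch under my own naming (append_unique, target and sorted are not from the answer), and it checks for end() before dereferencing the iterator returned by lower_bound:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: append the values of `src` that have not been seen
// before to `target` (arrival order) and record them in `sorted`.
void append_unique(std::vector<int>& target, std::vector<int>& sorted,
                   const std::vector<int>& src) {
    for (std::size_t k = 0; k < src.size(); ++k) {
        const int value = src[k];
        std::vector<int>::iterator pos =
            std::lower_bound(sorted.begin(), sorted.end(), value);
        // lower_bound may return end(), so test that before dereferencing.
        if (pos == sorted.end() || *pos != value) {
            sorted.insert(pos, value);   // keeps `sorted` ordered
            target.push_back(value);     // first occurrence, in arrival order
        }
    }
}
```

Call it once per API result, e.g. append_unique(target, sorted, getNextVector(i)). Each insertion into the middle of sorted is linear in its size, so whether this beats sort plus unique depends on the data and is worth measuring.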

[C++20 solution]
To merge multiple containers into a vector while removing duplicates,
I'd use something simpler, like this:
namespace mp {
template <typename T>
concept Container = requires (T value) { // very naive container requirements
typename T::value_type;
std::begin(value), std::end(value);
std::cbegin(value), std::cend(value);
};
}
template <mp::Container ... Ts>
requires
requires { typename std::common_type_t<typename Ts::value_type...>; }
auto merge_uniques(const Ts & ... names) {
using value_type = typename std::common_type_t<typename Ts::value_type...>;
auto value = std::unordered_set<value_type>{};
(value.insert(std::cbegin(names), std::cend(names)), ...);
return std::vector<value_type> {
std::move_iterator{std::begin(value)},
std::move_iterator{std::end(value)}
};
}

How to sort and rank a vector in C++ (without using C++11)

I am trying to construct a function that takes a vector, ranks it, sorts it and outputs the sorted and ranked vector with the original positioning of the values. For example: Input: [10,332,42,0.9,0] Output: [3, 5, 4, 2, 1]
I used this stack overflow question (specifically Marius' answer) as a reference guide, however I am stuck with my code now and do not understand where the issue is.
I am using C++03.
One of the errors I get, on my if statement, is
error: invalid types ‘const float*[float]’ for array subscript
//Rank the values in a vector
std::vector<float> rankSort(const float *v_temp, size_t size)
{
vector <float> v_sort;
//create a new array with increasing values from 0 to n-1
for(unsigned i = 0; i < size; i++)
{
v_sort.push_back(i);
}
bool swapped = false;
do
{
for(unsigned i = 0; i < size; i++)
{
if(v_temp[v_sort[i]] > v_temp[v_sort[i+1]]) //error line
{
float temp = v_sort[i];
v_sort[i] = v_sort[i+1];
v_sort[i+1] = temp;
swapped = true;
}
}
}
while(swapped);
return v_sort;
}
std::vector<float> rankSort(const std::vector<float> &v_temp)
{
return rankSort(&v_temp[0], v_temp.size());
}

Your problem is a misconception about rankings. Array indices are of type size_t, not float, so you'll need to return a vector<size_t>, not a vector<float>.
That said, your sort is O(n^2). If you're willing to use more memory, we can get that time down to O(n log(n)):
vector<size_t> rankSort(const float* v_temp, const size_t size) {
vector<pair<float, size_t> > v_sort(size);
for (size_t i = 0U; i < size; ++i) {
v_sort[i] = make_pair(v_temp[i], i);
}
sort(v_sort.begin(), v_sort.end());
pair<double, size_t> rank;
vector<size_t> result(size);
for (size_t i = 0U; i < size; ++i) {
if (v_sort[i].first != rank.first) {
rank = make_pair(v_sort[i].first, i);
}
result[v_sort[i].second] = rank.second;
}
return result;
}
EDIT:
Yeah this actually gets a little simpler when taking a vector<float> instead of a float[]:
vector<size_t> rankSort(const vector<float>& v_temp) {
vector<pair<float, size_t> > v_sort(v_temp.size());
for (size_t i = 0U; i < v_sort.size(); ++i) {
v_sort[i] = make_pair(v_temp[i], i);
}
sort(v_sort.begin(), v_sort.end());
pair<double, size_t> rank;
vector<size_t> result(v_temp.size());
for (size_t i = 0U; i < v_sort.size(); ++i) {
if (v_sort[i].first != rank.first) {
rank = make_pair(v_sort[i].first, i);
}
result[v_sort[i].second] = rank.second;
}
return result;
}

//Rank the values in a vector
std::vector<size_t> rankSort(const std::vector<float> &v_temp)
{
vector <size_t> v_sort;
//create a new array with increasing values from 0 to size-1
for(size_t i = 0; i < v_temp.size(); i++)
v_sort.push_back(i);
bool swapped = false;
do
{
swapped = false; //it's important to reset swapped
for(size_t i = 0; i < v_temp.size()-1; i++) // size-2 should be the last, since it is compared to next element (size-1)
if(v_temp[v_sort[i]] > v_temp[v_sort[i+1]])
{
size_t temp = v_sort[i]; // we swap indexing array elements, not original array elements
v_sort[i] = v_sort[i+1];
v_sort[i+1] = temp;
swapped = true;
}
}
while(swapped);
return v_sort;
}

v_sort[i] is a float (it's just an element of the v_sort vector), while only integral types can be used as array subscripts.
You probably meant v_sort to be an array of indices, so you should declare it as std::vector<size_t>, std::vector<int>, or something like that.
UPD: Also, given that you change the values of the array passed in, passing it by const reference is not elegant.
To sum up, the following code compiles correctly on my machine:
std::vector<unsigned> rankSort(float *v_temp, size_t size)
{
vector <unsigned> v_sort;
//create a new array with increasing values from 0 to n-1
for(unsigned i = 0; i < size; i++)
{
v_sort.push_back(i);
}
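bool swapped = false;
do
{
swapped = false; // reset on every pass, otherwise the loop never terminates
for(unsigned i = 0; i + 1 < size; i++) // stop one short: element i is compared with element i+1
{
if(v_temp[v_sort[i]] > v_temp[v_sort[i+1]])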
{
unsigned temp = v_sort[i];
v_sort[i] = v_sort[i+1];
v_sort[i+1] = temp;
swapped = true;
}
}
}
while(swapped);
return v_sort;
}
std::vector<unsigned> rankSort(std::vector<float> &v_temp)
{
return rankSort(&v_temp[0], v_temp.size());
}

I suggest you adopt a more robust solution by taking advantage of what you have in the STL. To do so, we will first make an "index vector", i.e. a std::vector<std::size_t> v such that for any i, v[i] == i is true:
// I'm sure there's a more elegant solution to generate this vector
// But this will do
std::vector<std::size_t> make_index_vector(std::size_t n) {
std::vector<std::size_t> result(n, 0);
for (std::size_t i = 0; i < n; ++i) {
result[i] = i;
}
return result;
}
Now all we have to do is to sort this vector according to a specific comparison function that will use the input vector. Furthermore, to allow for the most generic approach we will give the user the opportunity to use any comparison functor:
template <typename T, typename A, typename Cmp>
struct idx_compare {
std::vector<T, A> const& v;
Cmp& cmp;
idx_compare(std::vector<T, A> const& vec, Cmp& comp) : v(vec), cmp(comp) {}
bool operator()(std::size_t i, std::size_t j) {
return cmp(v[i], v[j]);
}
};
template <typename T, typename A, typename Cmp>
std::vector<std::size_t> sorted_index_vector(std::vector<T, A> const& vec, Cmp comp) {
std::vector<std::size_t> index = make_index_vector(vec.size());
std::sort(index.begin(), index.end(),
idx_compare<T, A, Cmp>(vec, comp));
return index;
}
In the sorted index vector, index[0] is the index of the lowest value in the input vector, index[1] the second lowest and so on. Therefore, we need one additional step to get the rank vector from this one:
std::vector<std::size_t> get_rank_vector(std::vector<std::size_t> const& index) {
std::vector<std::size_t> rank(index.size());
for (std::size_t i = 0; i < index.size(); ++i) {
// We add 1 since you want your rank to start at 1 instead of 0
// Just remove it if you want 0-based ranks
rank[index[i]] = i + 1;
}
return rank;
}
Now we combine all the pieces together:
template <typename T, typename A, typename Cmp>
std::vector<std::size_t> make_rank_vector(
std::vector<T, A> const& vec, Cmp comp) {
return get_rank_vector(sorted_index_vector(vec, comp));
}
// I had to stop using default template parameters since early gcc version did not support it (4.3.6)
// So I simply made another overload to handle the basic usage.
template <typename T, typename A>
std::vector<std::size_t> make_rank_vector(
std::vector<T, A> const& vec) {
return make_rank_vector(vec, std::less<T>());
}
Result with [10, 332, 42, 0.9, 0]: [3, 5, 4, 2, 1].
A live demo on gcc 4.3.6 shows this behavior.
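If you do not need the generic comparator, the pieces above can be condensed for the std::vector<float> case from the question. A minimal sketch that stays within C++03 (IndexLess and rank_of are hypothetical names of mine):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Compares two indices by the values they refer to.
struct IndexLess {
    const std::vector<float>& values;
    explicit IndexLess(const std::vector<float>& v) : values(v) {}
    bool operator()(std::size_t a, std::size_t b) const {
        return values[a] < values[b];
    }
};

// Hypothetical helper: 1-based rank of each element (1 = smallest).
std::vector<std::size_t> rank_of(const std::vector<float>& values) {
    std::vector<std::size_t> index(values.size());
    for (std::size_t i = 0; i < index.size(); ++i) index[i] = i;
    // Sort the indices by the values they point at.
    std::sort(index.begin(), index.end(), IndexLess(values));
    std::vector<std::size_t> rank(values.size());
    // index[i] is the position of the i-th smallest value.
    for (std::size_t i = 0; i < index.size(); ++i) rank[index[i]] = i + 1;
    return rank;
}
```

For the input [10, 332, 42, 0.9, 0] this gives [3, 5, 4, 2, 1]. Equal values receive distinct, arbitrarily ordered ranks, because std::sort is not stable and no tie handling is done here.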

Here is my code, which uses the STL to compute the ranks concisely (note that the lambda and std::iota require C++11):
template <typename T>
vector<size_t> calRank(const vector<T> & var) {
vector<size_t> result(var.size(),0);
//sorted index
vector<size_t> indx(var.size());
iota(indx.begin(),indx.end(),0);
sort(indx.begin(),indx.end(),[&var](size_t i1, size_t i2){return var[i1]<var[i2];});
//return ranking
for(size_t iter=0;iter<var.size();++iter){
result[indx[iter]]=iter+1;
}
return result;
}

Removing duplicates in a vector of strings

I have a vector of strings:
std::vector<std::string> fName
which holds a list of file names <a,b,c,d,a,e,e,d,b>.
I want to get rid of all the files that have duplicates and want to retain only the files that do not have duplicates in the vector.
for(size_t l = 0; l < fName.size(); l++)
{
strFile = fName.at(l);
for(size_t k = 1; k < fName.size(); k++)
{
strFile2 = fName.at(k);
if(strFile.compare(strFile2) == 0)
{
fName.erase(fName.begin() + l);
fName.erase(fName.begin() + k);
}
}
}
This removes a few of the duplicates, but a few are still left; I need help debugging.
Also, my input looks like <a,b,c,d,e,e,d,c,a> and my expected output is <b>, as all the other files (a, c, d, e) have duplicates and are removed.

#include <algorithm>
template <typename T>
void remove_duplicates(std::vector<T>& vec)
{
std::sort(vec.begin(), vec.end());
vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
}
Note: this requires that T has operator< and operator== defined.
Why does it work?
std::sort sorts the elements using their less-than comparison operator.
std::unique removes consecutive duplicate elements, comparing them using their equality operator.
What if I want only the unique elements?
Then you are better off using std::map:
#include <algorithm>
#include <map>
template <typename T>
void unique_elements(std::vector<T>& vec)
{
std::map<T, int> m;
for (auto const& p : vec) ++m[p]; // count the occurrences of each value
vec.clear();
for (auto const& kv : m) // transform_if is not part of the standard library, so use a plain loop
if (kv.second == 1) vec.push_back(kv.first); // keep values that occur exactly once
}

If I understand your requirements correctly (and I'm not entirely sure that I do), you want to keep only the elements of your vector that do not repeat, correct?
Make a map of strings to ints, used for counting the occurrences of each string. Clear the vector, then copy back only the strings that occurred exactly once.
map<string,int> m;
for (auto & i : v)
m[i]++;
v.clear();
for (auto & i : m)
if(i.second == 1)
v.push_back(i.first);
Or, for the compiler-feature challenged:
map<string,int> m;
for (vector<string>::iterator i=v.begin(); i!=v.end(); ++i)
m[*i]++;
v.clear();
for (map<string,int>::iterator i=m.begin(); i!=m.end(); ++i)
if (i->second == 1)
v.push_back(i->first);
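Both versions above hand the strings back in sorted order, because they are copied out of the map. If the original order of the file names matters, one option is to count first and then erase in place. A rough C++11 sketch (keep_unrepeated is a made-up name, and std::unordered_map is my substitution for the std::map used above):

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: drop every string that occurs more than once,
// keeping the survivors in their original order.
void keep_unrepeated(std::vector<std::string>& v) {
    std::unordered_map<std::string, int> count;
    for (const std::string& s : v) ++count[s];   // first pass: count occurrences
    // Second pass: erase-remove everything that was seen more than once.
    v.erase(std::remove_if(v.begin(), v.end(),
                           [&count](const std::string& s) {
                               return count[s] > 1;
                           }),
            v.end());
}
```

With the input <a,b,c,d,e,e,d,c,a> from the question this leaves <b>.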

#include <algorithm>
#include <cstddef>
#include <vector>
template <typename T>
void remove_duplicates(std::vector<T>& vec)
{
std::vector<T> tvec;
for (std::size_t i = 0; i < vec.size(); i++) {
// keep vec[i] only if no equal element follows it (O(n^2) overall)
if (std::find(vec.begin() + i + 1, vec.end(), vec[i]) == vec.end()) {
tvec.push_back(vec[i]);
}
}
vec = tvec; // : )
}

You can eliminate duplicates in O(n log n) runtime and O(n) space:
std::set<std::string> const uniques(vec.begin(), vec.end());
vec.assign(uniques.begin(), uniques.end());
But the O(n log n) runtime is a bit misleading, because the O(n) space actually requires O(n) dynamic allocations, which are expensive in terms of speed. The elements must also be comparable (here with operator<(), which std::string supports as a lexicographical compare).
If you want to store only unique elements:
template<typename In>
In find_unique(In first, In last)
{
if( first == last ) return last;
In tail(first++);
int dupes = 0;
while( first != last ) {
if( *tail++ == *first++ ) ++dupes;
else if( dupes != 0 ) dupes = 0;
else return --tail;
}
return dupes == 0 ? tail : last;
}
The algorithm above takes a sorted range and returns the first unique element, in linear time and constant space. To get all uniques in a container you might use it like so:
auto pivot = vec.begin();
for( auto i(find_unique(vec.begin(), vec.end()));
i != vec.end();
i = find_unique(++i, vec.end())) {
std::iter_swap(pivot++, i);
}
vec.erase(pivot, vec.end());
