Removing duplicates in a vector of strings

Removing duplicates in a vector of strings - c++

I have a vector of strings:
std::vector<std::string> fName
which holds a list of file names <a,b,c,d,a,e,e,d,b>.
I want to get rid of all the files that have duplicates and want to retain only the files that do not have duplicates in the vector.
for(size_t l = 0; l < fName.size(); l++)
{
strFile = fName.at(l);
for(size_t k = 1; k < fName.size(); k++)
{
strFile2 = fName.at(k);
if(strFile.compare(strFile2) == 0)
{
fName.erase(fName.begin() + l);
fName.erase(fName.begin() + k);
}
}
}
This is removing a few of the duplicate but still has a few duplicates left, need help in debugging.
Also my input looks like <a,b,c,d,e,e,d,c,a> and my expected output is <b> as all other files b,c,d,e have duplicates they are removed.

#include <algorithm>
template <typename T>
void remove_duplicates(std::vector<T>& vec)
{
std::sort(vec.begin(), vec.end());
vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
}
Note: this require that T has operator< and operator== defined.
Why it work?
std::sort sort the elements using their less-than comparison operator
std::unique removes the duplicate consecutive elements, comparing them using their equal comparison operator
What if i want only the unique elements?
Then you better use std::map
#include <algorithm>
#include <map>
template <typename T>
void unique_elements(std::vector<T>& vec)
{
std::map<T, int> m;
for(auto p : vec) ++m[p];
vec.erase(transform_if(m.begin(), m.end(), vec.begin(),
[](std::pair<T,int> const& p) {return p.first;},
[](std::pair<T,int> const& p) {return p.second==1;}),
vec.end());
}
See: here.

If I understand your requirements correctly, and I'm not entirely sure that I do. You want to only keep the elements in your vector of which do not repeat, correct?
Make a map of strings to ints, used for counting occurrences of each string. Clear the vector, then copy back only the strings that only occurred once.
map<string,int> m;
for (auto & i : v)
m[i]++;
v.clear();
for (auto & i : m)
if(i.second == 1)
v.push_back(i.first);
Or, for the compiler-feature challenged:
map<string,int> m;
for (vector<string>::iterator i=v.begin(); i!=v.end(); ++i)
m[*i]++;
v.clear();
for (map<string,int>::iterator i=m.begin(); i!=m.end(); ++i)
if (i->second == 1)
v.push_back(i->first);

#include <algorithms>
template <typename T>
remove_duplicates(std::vector<T>& vec)
{
std::vector<T> tvec;
uint32_t size = vec.size();
for (uint32_t i; i < size; i++) {
if (std::find(vec.begin() + i + 1, vec.end(), vec[i]) == vector.end()) {
tvec.push_back(t);
} else {
vec.push_back(t);
}
vec = tvec; // : )
}
}

You can eliminate duplicates in O(log n) runtime and O(n) space:
std::set<std::string> const uniques(vec.begin(), vec.end());
vec.assign(uniques.begin(), uniques.end());
But the O(log n) runtime is a bit misleading, because the O(n) space actually does O(n) dynamic allocations, which are expensive in terms of speed. The elements must also be comparable (here with operator<(), which std::string supports as a lexicographical compare).
If you want to store only unique elements:
template<typename In>
In find_unique(In first, In last)
{
if( first == last ) return last;
In tail(first++);
int dupes = 0;
while( first != last ) {
if( *tail++ == *first++ ) ++dupes;
else if( dupes != 0 ) dupes = 0;
else return --tail;
}
return dupes == 0 ? tail : last;
}
The algorithm above takes a sorted range and returns the first unique element, in linear time and constant space. To get all uniques in a container you might use it like so:
auto pivot = vec.begin();
for( auto i(find_unique(vec.begin(), vec.end()));
i != vec.end();
i = find_unique(++i, vec.end())) {
std::iter_swap(pivot++, i);
}
vec.erase(pivot, vec.end());

Related

Sorting an array whose elements are attached to another array [duplicate]

How can I sort two vectors in the same way, with criteria that uses only one of the vectors?
For example, suppose I have two vectors of the same size:
vector<MyObject> vectorA;
vector<int> vectorB;
I then sort vectorA using some comparison function. That sorting reordered vectorA. How can I have the same reordering applied to vectorB?
One option is to create a struct:
struct ExampleStruct {
MyObject mo;
int i;
};
and then sort a vector that contains the contents of vectorA and vectorB zipped up into a single vector:
// vectorC[i] is vectorA[i] and vectorB[i] combined
vector<ExampleStruct> vectorC;
This doesn't seem like an ideal solution. Are there other options, especially in C++11?

Finding a sort permutation
Given a std::vector<T> and a comparison for T's, we want to be able to find the permutation you would use if you were to sort the vector using this comparison.
template <typename T, typename Compare>
std::vector<std::size_t> sort_permutation(
const std::vector<T>& vec,
Compare& compare)
{
std::vector<std::size_t> p(vec.size());
std::iota(p.begin(), p.end(), 0);
std::sort(p.begin(), p.end(),
[&](std::size_t i, std::size_t j){ return compare(vec[i], vec[j]); });
return p;
}
Applying a sort permutation
Given a std::vector<T> and a permutation, we want to be able to build a new std::vector<T> that is reordered according to the permutation.
template <typename T>
std::vector<T> apply_permutation(
const std::vector<T>& vec,
const std::vector<std::size_t>& p)
{
std::vector<T> sorted_vec(vec.size());
std::transform(p.begin(), p.end(), sorted_vec.begin(),
[&](std::size_t i){ return vec[i]; });
return sorted_vec;
}
You could of course modify apply_permutation to mutate the vector you give it rather than returning a new sorted copy. This approach is still linear time complexity and uses one bit per item in your vector. Theoretically, it's still linear space complexity; but, in practice, when sizeof(T) is large the reduction in memory usage can be dramatic. (See details)
template <typename T>
void apply_permutation_in_place(
std::vector<T>& vec,
const std::vector<std::size_t>& p)
{
std::vector<bool> done(vec.size());
for (std::size_t i = 0; i < vec.size(); ++i)
{
if (done[i])
{
continue;
}
done[i] = true;
std::size_t prev_j = i;
std::size_t j = p[i];
while (i != j)
{
std::swap(vec[prev_j], vec[j]);
done[j] = true;
prev_j = j;
j = p[j];
}
}
}
Example
vector<MyObject> vectorA;
vector<int> vectorB;
auto p = sort_permutation(vectorA,
[](T const& a, T const& b){ /*some comparison*/ });
vectorA = apply_permutation(vectorA, p);
vectorB = apply_permutation(vectorB, p);
Resources
std::vector
std::iota
std::sort
std::swap
std::transform

With range-v3, it is simple, sort a zip view:
std::vector<MyObject> vectorA = /*..*/;
std::vector<int> vectorB = /*..*/;
ranges::v3::sort(ranges::view::zip(vectorA, vectorB));
or explicitly use projection:
ranges::v3::sort(ranges::view::zip(vectorA, vectorB),
std::less<>{},
[](const auto& t) -> decltype(auto) { return std::get<0>(t); });
Demo

I would like to contribute with a extension I came up with.
The goal is to be able to sort multiple vectors at the same time using a simple syntax.
sortVectorsAscending(criteriaVec, vec1, vec2, ...)
The algorithm is the same as the one Timothy proposed but using variadic templates, so we can sort multiple vectors of arbitrary types at the same time.
Here's the code snippet:
template <typename T, typename Compare>
void getSortPermutation(
std::vector<unsigned>& out,
const std::vector<T>& v,
Compare compare = std::less<T>())
{
out.resize(v.size());
std::iota(out.begin(), out.end(), 0);
std::sort(out.begin(), out.end(),
[&](unsigned i, unsigned j){ return compare(v[i], v[j]); });
}
template <typename T>
void applyPermutation(
const std::vector<unsigned>& order,
std::vector<T>& t)
{
assert(order.size() == t.size());
std::vector<T> st(t.size());
for(unsigned i=0; i<t.size(); i++)
{
st[i] = t[order[i]];
}
t = st;
}
template <typename T, typename... S>
void applyPermutation(
const std::vector<unsigned>& order,
std::vector<T>& t,
std::vector<S>&... s)
{
applyPermutation(order, t);
applyPermutation(order, s...);
}
template<typename T, typename Compare, typename... SS>
void sortVectors(
const std::vector<T>& t,
Compare comp,
std::vector<SS>&... ss)
{
std::vector<unsigned> order;
getSortPermutation(order, t, comp);
applyPermutation(order, ss...);
}
// make less verbose for the usual ascending order
template<typename T, typename... SS>
void sortVectorsAscending(
const std::vector<T>& t,
std::vector<SS>&... ss)
{
sortVectors(t, std::less<T>(), ss...);
}
Test it in Ideone.
I explain this a little bit better in this blog post.

In-place sorting using permutation
I would use a permutation like Timothy, although if your data is too large and you don't want to allocate more memory for the sorted vector you should do it in-place. Here is a example of a O(n) (linear complexity) in-place sorting using permutation:
The trick is to get the permutation and the reverse permutation to know where to put the data overwritten by the last sorting step.
template <class K, class T>
void sortByKey(K * keys, T * data, size_t size){
std::vector<size_t> p(size,0);
std::vector<size_t> rp(size);
std::vector<bool> sorted(size, false);
size_t i = 0;
// Sort
std::iota(p.begin(), p.end(), 0);
std::sort(p.begin(), p.end(),
[&](size_t i, size_t j){ return keys[i] < keys[j]; });
// ----------- Apply permutation in-place ---------- //
// Get reverse permutation item>position
for (i = 0; i < size; ++i){
rp[p[i]] = i;
}
i = 0;
K savedKey;
T savedData;
while ( i < size){
size_t pos = i;
// Save This element;
if ( ! sorted[pos] ){
savedKey = keys[p[pos]];
savedData = data[p[pos]];
}
while ( ! sorted[pos] ){
// Hold item to be replaced
K heldKey = keys[pos];
T heldData = data[pos];
// Save where it should go
size_t heldPos = rp[pos];
// Replace
keys[pos] = savedKey;
data[pos] = savedData;
// Get last item to be the pivot
savedKey = heldKey;
savedData = heldData;
// Mark this item as sorted
sorted[pos] = true;
// Go to the held item proper location
pos = heldPos;
}
++i;
}
}

I have recently wrote a proper zip iterator which works with the stl algorithms.
It allows you to produce code like this:
std::vector<int> a{3,1,4,2};
std::vector<std::string> b{"Alice","Bob","Charles","David"};
auto zip = Zip(a,b);
std::sort(zip.begin(), zip.end());
for (const auto & z: zip) std::cout << z << std::endl;
It is contained in a single header and the only requirement is C++17.
Check it out on GitHub.
There is also a post on codereview which contains all the source code.

Make a vector of pairs out of your individual vectors.
initialize vector of pairs
Adding to a vector of pair
Make a custom sort comparator:
Sorting a vector of custom objects
http://rosettacode.org/wiki/Sort_using_a_custom_comparator#C.2B.2B
Sort your vector of pairs.
Separate your vector of pairs into individual vectors.
Put all of these into a function.
Code:
std::vector<MyObject> vectorA;
std::vector<int> vectorB;
struct less_than_int
{
inline bool operator() (const std::pair<MyObject,int>& a, const std::pair<MyObject,int>& b)
{
return (a.second < b.second);
}
};
sortVecPair(vectorA, vectorB, less_than_int());
// make sure vectorA and vectorB are of the same size, before calling function
template <typename T, typename R, typename Compare>
sortVecPair(std::vector<T>& vecA, std::vector<R>& vecB, Compare cmp)
{
std::vector<pair<T,R>> vecC;
vecC.reserve(vecA.size());
for(int i=0; i<vecA.size(); i++)
{
vecC.push_back(std::make_pair(vecA[i],vecB[i]);
}
std::sort(vecC.begin(), vecC.end(), cmp);
vecA.clear();
vecB.clear();
vecA.reserve(vecC.size());
vecB.reserve(vecC.size());
for(int i=0; i<vecC.size(); i++)
{
vecA.push_back(vecC[i].first);
vecB.push_back(vecC[i].second);
}
}

I'm assuming that vectorA and vectorB have equal lengths. You could create another vector, let's call it pos, where:
pos[i] = the position of vectorA[i] after sorting phase
and then, you can sort vectorB using pos, i.e create vectorBsorted where:
vectorBsorted[pos[i]] = vectorB[i]
and then vectorBsorted is sorted by the same permutation of indexes as vectorA is.

I am not sure if this works but i would use something like this. For example to sort two vectors i would use descending bubble sort method and vector pairs.
For descending bubble sort, i would create a function that requires a vector pair.
void bubbleSort(vector< pair<MyObject,int> >& a)
{
bool swapp = true;
while (swapp) {
int key;
MyObject temp_obj;
swapp = false;
for (size_t i = 0; i < a.size() - 1; i++) {
if (a[i].first < a[i + 1].first) {
temp_obj = a[i].first;
key = a[i].second;
a[i].first = a[i + 1].first;
a[i + 1].first = temp_obj;
a[i].second = a[i + 1].second;
a[i + 1].second = key;
swapp = true;
}
}
}
}
After that i would put your 2 vector values into one vector pair. If you are able to add values at the same time use this one and than call the bubble sort function.
vector< pair<MyObject,int> > my_vector;
my_vector.push_back( pair<MyObject,int> (object_value,int_value));
bubbleSort(my_vector);
If you want to use values after adding to your 2 vectors, you can use this one and than call the bubble sort function.
vector< pair<MyObject,int> > temp_vector;
for (size_t i = 0; i < vectorA.size(); i++) {
temp_vector.push_back(pair<MyObject,int> (vectorA[i],vectorB[i]));
}
bubbleSort(temp_vector);
I hope this helps.
Regards,
Caner

Based on Timothy Shields answer.
With a small tweak to apply_permutaion you can apply the permutation to multiple vectors of different types at once with use of a fold expression.
template <typename T, typename... Ts>
void apply_permutation(const std::vector<size_t>& perm, std::vector<T>& v, std::vector<Ts>&... vs) {
std::vector<bool> done(v.size());
for(size_t i = 0; i < v.size(); ++i) {
if(done[i]) continue;
done[i] = true;
size_t prev = i;
size_t curr = perm[i];
while(i != curr) {
std::swap(v[prev], v[curr]);
(std::swap(vs[prev], vs[curr]), ...);
done[curr] = true;
prev = curr;
curr = perm[curr];
}
}
}

Deleting both an element and its duplicates in a Vector in C++

I've searched the Internet and known how to delete an element (with std::erase) and finding duplicates of an element to then delete it (vec.erase(std::unique(vec.begin(), vec.end()),vec.end());). But all methods only delete either an element or its duplicates.
I want to delete both.
For example, using this vector:
std::vector<int> vec = {2,3,1,5,2,2,5,1};
I want output to be:
{3}
My initial idea was:
void removeDuplicatesandElement(std::vector<int> &vec)
{
std::sort(vec.begin(), vec.end());
int passedNumber = 0; //To tell amount of number not deleted (since not duplicated)
for (int i = 0; i != vec.size(); i = passedNumber) //This is not best practice, but I tried
{
if (vec[i] == vec[i+1])
{
int ctr = 1;
for(int j = i+1; j != vec.size(); j++)
{
if (vec[j] == vec[i]) ctr++;
else break;
}
vec.erase(vec.begin()+i, vec.begin()+i+ctr);
}
else passedNumber++;
}
}
And it worked. But this code is redundant and runs at O(n^2), so I'm trying to find other ways to solve the problem (maybe an STL function that I've never heard of, or just improve the code).

Something like this, perhaps:
void removeDuplicatesandElement(std::vector<int> &vec) {
if (vec.size() <= 1) return;
std::sort(vec.begin(), vec.end());
int cur_val = vec.front() - 1;
auto pred = [&](const int& val) {
if (val == cur_val) return true;
cur_val = val;
// Look ahead to the next element to see if it's a duplicate.
return &val != &vec.back() && (&val)[1] == val;
};
vec.erase(std::remove_if(vec.begin(), vec.end(), pred), vec.end());
}
Demo
This relies heavily on the fact that std::vector is guaranteed to have contiguous storage. It won't work with any other container.

You can do it using STL maps as follows:
#include <iostream>
#include <vector>
#include <unordered_map>
using namespace std;
void retainUniqueElements(vector<int> &A){
unordered_map<int, int> Cnt;
for(auto element:A) Cnt[element]++;
A.clear(); //removes all the elements of A
for(auto i:Cnt){
if(i.second == 1){ // that if the element occurs once
A.push_back(i.first); //then add it in our vector
}
}
}
int main() {
vector<int> vec = {2,3,1,5,2,2,5,1};
retainUniqueElements(vec);
for(auto i:vec){
cout << i << " ";
}
cout << "\n";
return 0;
}
Output:
3
Time Complexity of the above approach: O(n)
Space Complexity of the above approach: O(n)

From what you have searched, we can look in the vector for duplicated values, then use the Erase–remove idiom to clean up the vector.
#include <vector>
#include <algorithm>
#include <iostream>
void removeDuplicatesandElement(std::vector<int> &vec)
{
std::sort(vec.begin(), vec.end());
if (vec.size() < 2)
return;
for (int i = 0; i < vec.size() - 1;)
{
// This is for the case we emptied our vector
if (vec.size() < 2)
return;
// This heavily relies on the fact that this vector is sorted
if (vec[i] == vec[i + 1])
vec.erase(std::remove(vec.begin(), vec.end(), (int)vec[i]), vec.end());
else
i += 1;
}
// Since all duplicates are removed, the remaining elements in the vector are unique, thus the size of the vector
// But we are not returning anything or any reference, so I'm just gonna leave this here
// return vec.size()
}
int main()
{
std::vector<int> vec = {2, 3, 1, 5, 2, 2, 5, 1};
removeDuplicatesandElement(vec);
for (auto i : vec)
{
std::cout << i << " ";
}
std::cout << "\n";
return 0;
}
Output: 3
Time complexity: O(n)

How to remove duplicated items in a sorted vector

I currently have a vector<int> c which contains {1,2,2,4,5,5,6}
and I want to remove the duplicated numbers so that c will have
{1,4,6}. A lot of solutions on the internet seem to just remove one of the duplicates, but I'm trying to remove all occurrences of the duplicated number.
Assume c is always sorted.
I currently have
#include <iostream>
#include <vector>
int main() {
vector<int> c{1,2,2,4,5,5,6};
for (int i = 0; i < c.size()-1; i++) {
for (int j=1; j<c.size();j++){
if(c[i] == c[j]){
// delete i and j?
}
}
}
}
I tried to use two for-loops so that I can compare the current element and the next element. This is where my doubt kicked in. I'm not sure if I'm approaching the problem correctly.
Could I get help on how to approach my problem?

This code is based on the insight that an element is unique in a sorted list if and only if it is different from both elements immediately adjacent to it (except for the starting and ending elements, which are adjacent to one element each). This is true because all identical elements must be adjacent in a sorted array.
void keep_unique(vector <int> &v){
if(v.size() < 2){return;}
vector <int> unique;
// check the first element individually
if(v[0] != v[1]){
unique.push_back(v[0]);
}
for(unsigned int i = 1; i < v.size()-1; ++i){
if(v[i] != v[i-1] && v[i] != v[i+1]){
unique.push_back(v[i]);
}
}
// check the last item individually
if(v[v.size()-1] != v[v.size()-2]){
unique.push_back(v[v.size()-1]);
}
v = unique;
}

Almost any time you find yourself deleting elements from the middle of a vector, it's probably best to sit back and think about whether this is the best way to do the job--chances are pretty good that it isn't.
There are a couple of obvious alternatives to that. One is to copy the items you're going to keep into a temporary vector, then when you're done, swap the temporary vector and the original vector. This works particularly well in a case like you've shown in the question, where you're keeping only a fairly small minority of the input data.
The other is to rearrange the data in your existing vector so all the data you don't want is at the end, and all the data you do want is at the beginning, then resize your vector to eliminate those you don't want.
When I doubt, I tend to go the first route. In theory it's probably a bit less efficient (poorer locality of reference) but I've rarely seen a significant slow-down in real use.
That being the case, my initial take would probably be something on this general order:
#include <vector>
#include <iostream>
#include <iterator>
std::vector<int> remove_all_dupes(std::vector<int> const &input) {
if (input.size() < 2) // zero or one element is automatically unique
return input;
std::vector<int> ret;
// first item is unique if it's different from its successor
if (input[0] != input[1])
ret.push_back(input[0]);
// in the middle, items are unique if they're different from both predecessor and successor
for (std::size_t pos = 1; pos < input.size() - 2; pos++)
if (input[pos] != input[pos-1] && input[pos] != input[pos+1])
ret.push_back(input[pos]);
// last item is unique if it's different from predecessor
if (input[input.size()-1] != input[input.size()-2])
ret.push_back(input[input.size() - 1]);
return ret;
}
int main() {
std::vector<int> c { 1, 2, 2, 4, 5, 5, 6 };
std::vector<int> uniques = remove_all_dupes(c);
std::copy(uniques.begin(), uniques.end(), std::ostream_iterator<int>(std::cout, "\n"));
}
Probably a little longer of code than we'd really prefer, but still simple, straightforward, and efficient.
If you are going to do the job in place, the usual way to do it efficiently (and this applies to filtering in general, not just this particular filter) is to start with a copying phase and follow that by a deletion phase. In the copying phase, you use two pointers: a source and a destination. You start them both at the first element, then advance through the input with the source. If it meets your criteria, you copy it to the destination position, and advance both. If it doesn't meet your criteria, advance only the source.
Then when you're done with that, you resize your vector down to the number of elements you're keeping.
void remove_all_dupes2(std::vector<int> & input) {
if (input.size() < 2) { // 0 or 1 element is automatically unique
return;
}
std::size_t dest = 0;
if (input[0] != input[1])
++dest;
for (std::size_t source = 1; source < input.size() - 2; source++) {
if (input[source] != input[source-1] && input[source] != input[source+1]) {
input[dest++] = input[source];
}
}
if (input[input.size()-1] != input[input.size()-2]) {
input[dest++] = input[input.size() - 1];
}
input.resize(dest);
}
At least in my view, the big thing to keep in mind here is the general pattern. You'll almost certainly run into a lot more situations where you want to filter some inputs to those that fit some criteria, and this basic pattern of tracking source and destination, and copying only those from the source to the destination that fit your criteria works well in a lot of situations, not just this one.

Generally one has to be very careful when deleting from containers while iterating over them. C++ STL can do this easily and faster (on average) than using nested loops.
#include <vector>
#include <algorithm>
#include <unordered_set>
int main() {
std::vector<int> c{1,2,2,4,5,5,6};
std::unordered_multiset<int> unique( c.begin(), c.end() );
c.erase(std::remove_if(c.begin(),c.end(),[&](const auto& e){return unique.count(e)>1;}),c.end());
for(auto e: c){
std::cout<<e<<' ';
}
}
//Output: 1 4 6
Alternatively, you could use std::map<int,std::size_t> and count the occurences this way.

Similarly to std::unique/std::copy_if, you might do:
void keep_unique(std::vector<int>& v){
auto it = v.begin();
auto w = v.begin();
while (it != v.end())
{
auto next = std::find_if(it, v.end(), [&](int e){ return e != *it; });
if (std::distance(it, next) == 1) {
if (w != it) {
*w = std::move(*it);
}
++w;
}
it = next;
}
v.erase(w, v.end());
}
Demo

Use std::remove_if to move items occurring multiple times to the rear, then erase them.
#include <iostream>
#include <vector>
#include <algorithm>
int main()
{
std::vector<int> V {1,2,2,4,5,5,6};
auto it = std::remove_if(V.begin(), V.end(), [&](const auto& val)
{
return std::count(V.begin(), V.end(), val) > 1;
});
V.erase(it, V.end());
for (const auto& val : V)
std::cout << val << std::endl;
return 0;
}
Output:
1
4
6
For demo: https://godbolt.org/z/j6fxe1

Iterating in reverse ensures an O(N) operation and does not cause element shifting when erasing because we are only ever erasing the last element in the vector. Also, no other data structures need to be allocated.
For every element encountered, we check if the adjacent element is equal, and if so, remove all instances of that element.
Requires the vector to be sorted, or at least grouped by duplicates.
#include <iostream>
#include <vector>
int main()
{
std::vector<int> c {1, 2, 2, 4, 5, 5, 6};
for (int i = c.size() - 1; i > 0;)
{
const int n = c[i];
if (c[i - 1] == n)
{
while (c[i] == n)
{
c.erase(c.begin() + i--);
}
}
else
{
i--;
}
}
//output result
for (auto it : c)
{
std::cout<<it;
}
std::cout << std::endl;
}
Output: 146
Update
An actual O(N) implementation using a sentinel value:
#include <iostream>
#include <vector>
#include <limits>
#include <algorithm>
int main()
{
std::vector<int> c { 1, 2, 2, 4, 5, 5, 6 };
const int sentinel = std::numeric_limits<int>::lowest(); //assumed that no valid member uses this value.
for (int i = 0; i < c.size() - 1;)
{
const int n = c[i];
if (c[i + 1] == n)
{
while (c[i] == n)
{
c[i++] = sentinel;
}
}
else
{
i++;
}
}
c.erase(std::remove(c.begin(),c.end(),sentinel), c.end());
for (auto it : c) std::cout<< it << ' ';
}

This can be achieved with the proper use of iterators to avoid runtime errors.
Have a look at the following code:
#include <iostream>
#include <vector>
int main() {
std::vector<int> c{1,2,2,4,5,5,6};
for (auto it = c.begin(); it != c.end(); ){
auto it2 = it;
advance(it2, 1);
bool isDuplicate = false;
for(; it2 != c.end(); ++it2){
if(*it == *it2){
c.erase(it2);
isDuplicate = true;
}
}
if(isDuplicate){
auto it3 = it;
advance(it3, 1);
c.erase(it);
it = it3;
}else{
it++;
}
}
for (auto it = c.begin(); it != c.end(); it++){
std::cout<<*it<<" ";
}
}
Output:
1 4 6

How to draw a sample of n elements from std::set

I have the following function to pick a random element from a std::set:
int pick_random(const std::set<int>& vertex_set) {
std::uniform_int_distribution<std::set<int>::size_type> dist(0, vertex_set.size() - 1);
const std::set<int>::size_type rand_idx = dist(mt);
std::set<int>::const_iterator it = vertex_set.begin();
for (std::set<int>::size_type i = 0; i < rand_idx; i++) {
it++;
}
return *it;
}
However, I wonder how to properly draw a sample of n elements from a set. With C++17 compiler I can use std::sample function, but this is not the case here because I have C++11 compiler.

If you don't mind the copy, an easy way is to create a std::vector from your std::set, shuffle it using std::shuffle and then takes the first n elements:
std::vector<int> pick_random_n(const std::set<int>& vertex_set, std::size_t n) {
std::vector<int> vec(std::begin(vertex_set), std::end(vertex_set));
std::shuffle(std::begin(vec), std::end(vec), mt);
vec.resize(std::min(n, vertex_set.size()));
return vec;
}
If you do not want the extra-copy, you could look at the implementation of std::sample, from, e.g., libc++ and implement your own for std::set:
std::vector<int> pick_random_n(const std::set<int>& vertex_set, std::size_t n) {
auto unsampled_sz = vertex_set.size();
auto first = std::begin(vertex_set);
std::vector<int> vec;
vec.reserve(std::min(n, unsampled_sz));
for (n = std::min(n, unsampled_sz); n != 0; ++first) {
auto r =
std::uniform_int_distribution<std::size_t>(0, --unsampled_sz)(mt);
if (r < n) {
vec.push_back(*first);
--n;
}
}
return vec;
}

Keep the duplicated values only - Vectors C++

Assume I have a vector with the following elements {1, 1, 2, 3, 3, 4}
I want to write a program with c++ code to remove the unique values and keep only the duplicated once. So the end result will be something like this {1,3}.
So far this is what I've done, but it takes a lot of time,
Is there any way this can be more efficient,
vector <int> g1 = {1,1,2,3,3,4}
vector <int> g2;
for(int i = 0; i < g1.size(); i++)
{
if(count(g1.begin(), g1.end(), g1[i]) > 1)
g2.push_back(g1[i]);
}
v.erase(std::unique(g2.begin(), g2.end()), g2.end());
for(int i = 0; i < g2.size(); i++)
{
cout << g2[i];
}

My approach is to create an <algorithm>-style template, and use an unordered_map to do the counting. This means you only iterate over the input list once, and the time complexity is O(n). It does use O(n) extra memory though, and isn't particularly cache-friendly. Also this does assume that the type in the input is hashable.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <unordered_map>
template <typename InputIt, typename OutputIt>
OutputIt copy_duplicates(
InputIt first,
InputIt last,
OutputIt d_first)
{
std::unordered_map<typename std::iterator_traits<InputIt>::value_type,
std::size_t> seen;
for ( ; first != last; ++first) {
if ( 2 == ++seen[*first] ) {
// only output on the second time of seeing a value
*d_first = *first;
++d_first;
}
}
return d_first;
}
int main()
{
int i[] = {1, 2, 3, 1, 1, 3, 5}; // print 1, 3,
// ^ ^
copy_duplicates(std::begin(i), std::end(i),
std::ostream_iterator<int>(std::cout, ", "));
}
This can output to any kind of iterator. There are special iterators you can use that when written to will insert the value into a container.

Here's a way that's a little more cache friendly than unordered_map answer, but is O(n log n) instead of O(n), though it does not use any extra memory and does no allocations. Additionally, the overall multiplier is probably higher, in spite of it's cache friendliness.
#include <vector>
#include <algorithm>
void only_distinct_duplicates(::std::vector<int> &v)
{
::std::sort(v.begin(), v.end());
auto output = v.begin();
auto test = v.begin();
auto run_start = v.begin();
auto const end = v.end();
for (auto test = v.begin(); test != end; ++test) {
if (*test == *run_start) {
if ((test - run_start) == 1) {
*output = *run_start;
++output;
}
} else {
run_start = test;
}
}
v.erase(output, end);
}
I've tested this, and it works. If you want a generic version that should work on any type that vector can store:
template <typename T>
void only_distinct_duplicates(::std::vector<T> &v)
{
::std::sort(v.begin(), v.end());
auto output = v.begin();
auto test = v.begin();
auto run_start = v.begin();
auto const end = v.end();
for (auto test = v.begin(); test != end; ++test) {
if (*test != *run_start) {
if ((test - run_start) > 1) {
::std::swap(*output, *run_start);
++output;
}
run_start = test;
}
}
if ((end - run_start) > 1) {
::std::swap(*output, *run_start);
++output;
}
v.erase(output, end);
}

Assuming the input vector is not sorted, the following will work and is generalized to support any vector with element type T. It will be more efficient than the other solutions proposed so far.
#include <algorithm>
#include <iostream>
#include <vector>
template<typename T>
void erase_unique_and_duplicates(std::vector<T>& v)
{
auto first{v.begin()};
std::sort(first, v.end());
while (first != v.end()) {
auto last{std::find_if(first, v.end(), [&](int i) { return i != *first; })};
if (last - first > 1) {
first = v.erase(first + 1, last);
}
else {
first = v.erase(first);
}
}
}
int main(int argc, char** argv)
{
std::vector<int> v{1, 2, 3, 4, 5, 2, 3, 4};
erase_unique_and_duplicates(v);
// The following will print '2 3 4'.
for (int i : v) {
std::cout << i << ' ';
}
std::cout << '\n';
return 0;
}

I have 2 improvements for you:
You can change your count to start at g1.begin() + i, everything before was handled by the previous iterations of the loop.
You can change the if to == 2 instead of > 1, so it will add numbers only once, independent of the occurences. If a number is 5 times in the vector, the first 3 will be ignored, the 4th will make it into the new vector and the 5th will be ignored again. So you can remove the erase of the duplicates
Example:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
int main() {
vector <int> g1 = {1,1,2,3,3,1,4};
vector <int> g2;
for(int i = 0; i < g1.size(); i++)
{
if(count(g1.begin() + i, g1.end(), g1[i]) == 2)
g2.push_back(g1[i]);
}
for(int i = 0; i < g2.size(); i++)
{
cout << g2[i] << " ";
}
cout << endl;
return 0;
}

I'll borrow a principal from Python which is excellent for such operations -
You can use a dictionary where the dictionary-key is the item in the vector and the dictionary-value is the count (start with 1 and increase by one every time you encounter a value that is already in the dictionary).
afterward, create a new vector (or clear the original) with only the dictionary keys that are larger than 1.
Look up in google - std::map
Hope this helps.

In general, that task got complexity about O(n*n), that's why it appears slow. Does it have to be a vector? Is that a restriction? Must it be ordered? If not, it better to actually store values as std::map, which eliminates doubles when populated, or as a std::multimap if presence of doubles matters.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Removing duplicates in a vector of strings - c++

Related

Sorting an array whose elements are attached to another array [duplicate]

Deleting both an element and its duplicates in a Vector in C++

How to remove duplicated items in a sorted vector

How to draw a sample of n elements from std::set

Keep the duplicated values only - Vectors C++

Categories

Resources