Remove duplicates without using any STL containers


I was asked the following question in a 30-minute interview:
Given an array of integers, remove the duplicates without using any STL containers. For example:
For the input array [1,2,3,4,5,3,3,5,4] the output should be:
[1,2,3,4,5];
Note that the first 3, 4 and 5 have been included, but the subsequent ones have been removed since we have already included them once in the output array. How do we do this without using an extra STL container?
In the interview, I assumed that we only have positive integers and suggested using a bit array to mark off every element present in the input (assume every element in the input array as an index of the bit array and update it to 1). Finally, we could iterate over this bit vector, populating (or displaying) the unique elements. However, he was not satisfied with this approach. Any other methods that I could have used?
Thanks.

Just use std::sort() and std::unique():
int arr[] = { 1,2,3,4,5,3,3,5,4 };
std::sort( std::begin(arr), std::end(arr) );
auto end = std::unique( std::begin(arr), std::end(arr) );

We can first sort the array, then copy each element that differs from its predecessor into a second array that is two elements larger than the first, like this.
Initialize the second array with a value that the first array will never contain (any number outside the allowed input range); suppose 0 for simplicity:
int arr1[] = { 1,2,3,4,5,3,3,5,4 };
int arr2[] = { 0,0,0,0,0,0,0,0,0,0,0 };
std::sort( std::begin(arr1), std::end(arr1) );
int position=1;
arr2[0] = arr1[0];
for(int* i=std::begin(arr1)+1;i!=std::end(arr1);i++){
if((*i)!=(*(i-1))){
arr2[position] = (*i);
position++;
}
}
int size = 0;
for(int* i=std::begin(arr2);i!=std::end(arr2);i++){
if((*i)!=(*(i+1))){
size++;
}
else{
break;
}
}
int ans[size];
for(int i=0;i<size;i++){
ans[i]=arr2[i];
}

Easy algorithm in O(n^2):
void remove_duplicates(Vec& v) {
// range end
auto it_end = end(v);
for (auto it = begin(v); it != it_end; ++it) {
// remove elements matching *it
it_end = remove(it+1, it_end, *it);
}
// erase now-unused elements
v.erase(it_end, end(v));
}
See also erase-remove idiom
Edit: This is assuming you get a std::vector in, but it would work with C-style arrays too, you would just have to implement the erasure yourself.
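The Edit above says the same O(n^2) idea carries over to C-style arrays if you handle the erasure yourself. A minimal sketch of that, assuming the caller is happy to get back a new logical length instead of a resized array (the function name remove_duplicates_in_place is made up for this illustration):

```cpp
#include <cstddef>

// Keeps the first occurrence of every value, in the original order, and
// returns how many elements were kept. The tail of the array past the
// returned length is left with stale values.
// Hypothetical helper: O(n^2) comparisons, no extra storage, no containers.
std::size_t remove_duplicates_in_place(int* a, std::size_t n) {
    std::size_t kept = 0;  // a[0 .. kept) holds the unique values found so far
    for (std::size_t i = 0; i < n; ++i) {
        bool seen = false;
        for (std::size_t j = 0; j < kept; ++j) {
            if (a[j] == a[i]) {
                seen = true;
                break;
            }
        }
        if (!seen) {
            a[kept++] = a[i];  // first time we meet this value: keep it
        }
    }
    return kept;
}
```

Unlike the std::sort() plus std::unique() answer, this does not reorder the input, so it also works when the first occurrences are not already in ascending order.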

Related

check if all item array equal in array [duplicate]

If I have a vector of values and want to check that they are all the same, what is the best way to do this in C++ efficiently? If I were programming in some other language like R, one way my mind jumps to is to return only the unique elements of the container; if the number of unique elements is more than 1, I know the elements cannot all be the same. In C++ this can be done like this:
//build an int vector
std::sort(myvector.begin(), myvector.end());
std::vector<int>::iterator it;
//Use unique algorithm to get the unique values.
it = std::unique(myvector.begin(), myvector.end());
myvector.resize(std::distance(myvector.begin(),it));
if (myvector.size() > 1) {
std::cout << "All elements are not the same!" << std::endl;
}
However, reading on the internet and SO, I see other answers such as using a set or the find_if algorithm. So what is the most efficient way of doing this and why? I imagine mine is not the best way since it involves sorting every element and then resizing the vector, but maybe I'm wrong.

You need not use std::sort. It can be done in a simpler way:
if ( std::adjacent_find( myvector.begin(), myvector.end(), std::not_equal_to<>() ) == myvector.end() )
{
std::cout << "All elements are equal to each other" << std::endl;
}

you can use std::equal
version 1:
//assuming v has at least 1 element
if ( std::equal(v.begin() + 1, v.end(), v.begin()) )
{
//all equal
}
This will compare each element with the previous one.
version 2:
//assuming v has at least 1 element
int e = v[0]; //preferably "const auto& e" instead
bool all_equal = true;
for(std::size_t i = 1,s = v.size();i<s && all_equal;i++)
all_equal = e == v[i];
Edit:
Regarding performance: after testing with 100 million elements, I found that in Visual Studio 2015 version 1 is about twice as fast as version 2. This is because the latest compiler for VS2015 uses SSE instructions in the C++ standard library implementation when you use ints, floats, etc.
If you use _mm_testc_si128 you will get performance similar to std::equal.

using std::all_of and C++11 lambda
if (all_of(values.begin(), values.end(), [&] (int i) {return i == values[0];})){
//all are the same
}

Given no constraints on the vector, you have to iterate through the vector at least once, no matter the approach. So just pick the first element and check that all others are equal to it.
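That idea can be sketched as a small helper. The name all_equal and the decision to treat an empty vector as "all equal" are assumptions made for this illustration, not something the answer above specifies:

```cpp
#include <algorithm>
#include <vector>

// Returns true when every element compares equal to the first one.
// Hypothetical helper: an empty vector is reported as "all equal".
template <class T>
bool all_equal(const std::vector<T>& v) {
    if (v.empty()) {
        return true;
    }
    const T& first = v.front();
    // std::all_of stops at the first element that fails the predicate,
    // so a mismatch near the front is detected without scanning the rest.
    return std::all_of(v.begin() + 1, v.end(),
                       [&first](const T& x) { return x == first; });
}
```

It performs at most size() - 1 comparisons, makes no copies, and leaves the vector untouched.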

While the asymptotic complexity of std::unique is linear, the actual cost of the operation is probably much larger than you need, and it is an inplace algorithm (it will modify the data as it goes).
The fastest approach is to assume that if the vector contains a single element, it is unique by definition. If the vector contains more elements, then you just need to check whether all of them are exactly equal to the first. For that you only need to find the first element that differs from the first, starting the search from the second. If there is such an element, the elements are not unique.
if (v.size() < 2) return true;
auto different = std::find_if(v.begin()+1, v.end(),
[&v](auto const &x) { return x != v[0]; });
return different == v.end();
That uses C++14 syntax (a generic lambda); with a C++11 toolchain you can spell out the element type in the lambda parameter. In C++03 you could use a combination of std::not1, std::bind1st/std::bind2nd and std::equal_to in place of the lambda.
The cost of this approach is distance(start,different element) comparisons and no copies. Expected and worst case linear cost in the number of comparisons (and no copies!)

Sorting is an O(NlogN) task.
This is easily solvable in O(N), so your current method is poor.
A simple O(N) would be as Luchian Grigore suggests, iterate over the vector, just once, comparing every element to the first element.

if(std::all_of(myvector.begin()+1, myvector.end(), std::bind(std::equal_to<int>(),
std::placeholders::_1, myvector.front()))) {
// all members are equal
}

You can use FunctionalPlus(https://github.com/Dobiasd/FunctionalPlus):
std::vector<std::string> things = {"same old", "same old"};
if (fplus::all_the_same(things))
std::cout << "All things being equal." << std::endl;

Maybe something like this. It traverses vector just once and does not mess with the vector content.
std::vector<int> values { 5, 5, 5, 4 };
bool equal = std::count_if(values.begin(), values.end(), [ &values ] (auto size) { return size == values[0]; }) == values.size();
If the values in the vector are something different than basic type you have to implement equality operator.
After taking into account underscore_d remarks, I'm changing possible solution
std::vector<int> values { 5, 5, 5, 4 };
bool equal = std::all_of(values.begin(),values.end(),[ &values ] (auto item) { return item == values[0]; });

In your specific case, iterating over the vector elements and looking for an element that differs from the first one would be enough. You may even be lucky enough to stop before evaluating all the elements in your vector. (A while loop could be used, but I stuck with a for loop for readability reasons.)
bool uniqueElt = true;
int firstItem = *myvector.begin();
for (std::vector<int>::const_iterator it = myvector.begin()+1; it != myvector.end() ; ++it) {
if(*it != firstItem) {
uniqueElt = false;
break;
}
}
In case you want to know how many different values your vector contains, you could build a set and check its size to see how many different values are inside:
// the range constructor copies the elements and drops duplicates
std::set<int> mySet(myvector.begin(), myvector.end());

You can simply use std::count to count all the elements that match the starting element:
std::vector<int> numbers = { 5, 5, 5, 5, 5, 5, 5 };
if (std::count(std::begin(numbers), std::end(numbers), numbers.front()) == numbers.size())
{
std::cout << "Elements are all the same" << std::endl;
}

LLVM provides some independently usable headers+libraries:
#include <llvm/ADT/STLExtras.h>
if (llvm::is_splat(myvector))
std::cout << "All elements are the same!" << std::endl;
https://godbolt.org/z/fQX-jc

for the sake of completeness, because it still isn't the most efficient, you can use std::unique in a more efficient way to decide whether all members are the same, but beware that after using std::unique this way the container is useless:
#include <algorithm>
#include <iterator>
if (std::distance(cntnr.begin(), std::unique(cntnr.begin(), cntnr.end())) == 1)
{
// all members were the same, but
}

Another approach using C++ 14:
bool allEqual = accumulate(v.begin(), v.end(), true, [first = v[0]](bool acc, int b) {
return acc && (b == first);
});
which is also order N.

Here is a readable C++17 solution which might remind students of the other constructors of std::vector:
if (v==std::vector(v.size(),v[0])) {
// you guys are all the same
}
...before C++17, the std::vector rvalue would need its type provided explicitly:
if (v==std::vector<typename decltype(v)::value_type>(v.size(),v[0])) {
// you guys are all the same
}

The function std::all_of is defined in the <algorithm> header of the standard library. It operates on a whole range of elements, which saves writing a loop to check each element one by one. It checks a given property on every element and returns true when each element in the range satisfies it; otherwise it returns false.
// C++ code to demonstrate working of all_of()
#include <vector>
#include <algorithm>
#include <iostream>
int main()
{
std::vector<int> v(10, 2);
// illustrate all_of
if (std::all_of(v.cbegin(), v.cend(), [](int i){ return i % 2 == 0; }))
{
std::cout << "All numbers are even\n";
}
}

Could you please see what is wrong with my itteration because i guess the problem is in that if from the nest [duplicate]

I have a vector that holds items that are either active or inactive. I want the size of this vector to stay small for performance issues, so I want items that have been marked inactive to be erased from the vector. I tried doing this while iterating but I am getting the error "vector iterators incompatible".
vector<Orb>::iterator i = orbsList.begin();
while(i != orbsList.end()) {
bool isActive = (*i).active;
if(!isActive) {
orbsList.erase(i++);
}
else {
// do something with *i
++i;
}
}

The most readable way I've done this in the past is to use std::vector::erase combined with std::remove_if. In the example below, I use this combination to remove any number less than 10 from a vector.
(For non-c++0x, you can just replace the lambda below with your own predicate:)
// a list of ints
int myInts[] = {1, 7, 8, 4, 5, 10, 15, 22, 50, 29};
std::vector<int> v(myInts, myInts + sizeof(myInts) / sizeof(int));
// get rid of anything < 10
v.erase(std::remove_if(v.begin(), v.end(),
[](int i) { return i < 10; }), v.end());

I agree with wilx's answer. Here is an implementation:
// curFiles is: vector < string > curFiles;
vector< string >::iterator it = curFiles.begin();
while(it != curFiles.end()) {
if(aConditionIsMet) {
it = curFiles.erase(it);
}
else ++it;
}

You can do that but you will have to reshuffle your while() a bit, I think. The erase() function returns an iterator to the element next after the erased one: iterator erase(iterator position);. Quoting from the standard from 23.1.1/7:
The iterator returned from a.erase(q)
points to the element immediately
following q prior to the element being
erased. If no such element exists,
a.end() is returned.
Though maybe you should be using the Erase-remove idiom instead.

erase returns an iterator to the element after the erased one (same as Vassilis):
vector <cMyClass>::iterator mit;
for(mit = myVec.begin(); mit != myVec.end(); )
{ if(condition)
mit = myVec.erase(mit);
else
mit++;
}

If someone needs to work with indexes:
vector<int> vector;
for(int i=0;i<10;++i)vector.push_back(i);
int size = vector.size();
for (int i = 0; i < size; ++i)
{
assert(i > -1 && i < (int)vector.size());
if(vector[i] % 3 == 0)
{
printf("Removing %d, %d\n",vector[i],i);
vector.erase(vector.begin() + i);
}
if (size != (int)vector.size())
{
--i;
size = vector.size();
printf("Go back %d\n",size);
}
}

As they said, vector's iterators get invalidated on vector::erase() no matter which form of iterator increment you use. Use an integer index instead.

You might want to consider using a std::list instead of a std::vector for your data structure. It is safer (less bug prone) to use when combining erasure with iteration.

Removing items from the middle of a vector will invalidate all iterators to that vector, so you cannot do this (update: without resorting to Wilx's suggestion).
Also, if you're worried about performance, erasing items from the middle of a vector is a bad idea anyway. Perhaps you want to use an std::list?

C++ insert element at the beginning of a vector using insert()

Below I have a function which is supposed to extract the odd numbers from a vector into another vector. In the old vector I want to insert the even numbers at the beginning, so that I can resize it later. I know it is not really an efficient way, but I have a problem from a book that asks me to test the speed of the variant with resize instead of erasing, which I think would have the same speed anyway. My problem is that in the else branch, when I want to insert the element which is even, it does not work this way. What am I doing wrong in the insert call? Am I passing the iterator in the wrong way?
std::vector<int> extract(std::vector<int>& even)
{
std::vector<int> odd;
std::vector<int>::size_type size = even.size();
std::vector<int>::const_iterator a = even.begin();
std::vector<int>::const_iterator b = even.end();
while (a != b)
{
if ((*a) % 2 != 0)
{
odd.push_back((*a));
}
else
{
even.insert(even.begin(),(*a));
}
a++;
}
//even.resize(size);
return odd;
}
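
A note on the loop above: std::vector::insert invalidates iterators at or after the insertion point (and all of them if the vector reallocates), so a and b can no longer be trusted after the first insert at even.begin(). The sketch below is one possible way to get the "collect the evens at the front, then resize" behaviour. It swaps insert() for overwriting by index, which is an assumption about what the exercise allows, and the name extract_odds is made up for this illustration:

```cpp
#include <vector>

// Moves the odd numbers of v into a new vector and keeps the even numbers
// in v, in their original order.
// Hypothetical helper: uses indices instead of iterators, so nothing is
// invalidated while the loop runs.
std::vector<int> extract_odds(std::vector<int>& v) {
    std::vector<int> odd;
    std::vector<int>::size_type kept = 0;  // v[0 .. kept) holds the evens so far
    for (std::vector<int>::size_type i = 0; i != v.size(); ++i) {
        if (v[i] % 2 != 0) {
            odd.push_back(v[i]);
        } else {
            v[kept++] = v[i];  // kept <= i, so no unread element is overwritten
        }
    }
    v.resize(kept);  // drop the leftover tail in one step
    return odd;
}
```

The vector never grows during the loop, so there is no reallocation and a single resize at the end replaces the repeated erase calls.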

Removing zeros from an array and resizing the array to the new amount of elements?

I am having another problem with manipulating data in a C++ array. I now want to decimate the array by removing all the zeros from it.
So for example, say before I had array[5] = {1,2,0,0,4}. It would become array[3] = {1,2,4}.
I know that I will need to use a for loop to iterate through the array storing the main data and that I will most likely need to initialize a new array to store the decimated data but I am not quite sure how to go about it.

You cannot resize a plain array, since its size is fixed at compile time. Thus, it is probably better to use a vector from the standard library. That way you would not need to create a new array. Actually, unless there is a strong reason, it is typically better to use std::vector or std::array (in C++11) than plain C-style arrays.
By using vector, you can do something like:
std::vector<int> v{1,2,0,0,4};
v.erase(
std::remove(v.begin(), v.end(), 0),
v.end());
After erasing the zero elements, the vector still has capacity 5, though (of course v.size() would return 3, as expected). If you can use C++11 then you can go a little bit further:
v.shrink_to_fit();
The call to shrink_to_fit reduces the vector's capacity to accommodate it to the actual number of elements in it (3 in the example). That could lead to memory savings (especially if there are many elements in the vector).

If you have to resize arrays, why not simply use std::vector? The example below does it.
#include <vector>
#include <algorithm>
bool isZero (int i)
{
return i == 0;
}
int main()
{
std::vector<int> myarray;
myarray.push_back( 0 );
myarray.push_back( 1 );
myarray.push_back( 0 );
myarray.push_back( 3 );
myarray.push_back( 9 );
std::vector<int>::iterator newIter = std::remove_if( myarray.begin() , myarray.end() , isZero);
myarray.resize( newIter - myarray.begin() );
return 0;
}

If you don't know the content of the array, you cannot know how many
values will be non-zero, so your memory must be dynamically
allocated. Use std::vector.
std::vector<int> v;
std::copy_if(begin(array), end(array), std::back_inserter(v),
[](int x) { return x != 0; });
If you would start with a vector to begin with, you could manipulate the data in-place with erase-remove.
v.erase(std::remove(begin(v), end(v), 0), end(v));
If you really want to do it the hard way:
// count
auto non_zero_count = std::count_if(begin(array), end(array),
[](int x) { return x != 0;});
// allocate
int* new_array{new int[non_zero_count]};
std::copy_if(begin(array), end(array), new_array,
[](int x) { return x != 0; });
There is really no solution to arrive at fixed size array here, unless you know all your inputs.
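If keeping the original buffer and tracking a shorter logical length is acceptable, the plain array can also be compacted in place, because std::remove works on raw pointers just as it does on vector iterators. A minimal sketch, where the name compact_nonzero is made up for this illustration:

```cpp
#include <algorithm>
#include <cstddef>

// Shifts the non-zero values to the front of the array, preserving their
// order, and returns how many there are. The array itself keeps its
// original size; only the first `count` slots are meaningful afterwards.
// Hypothetical helper for illustration.
std::size_t compact_nonzero(int* a, std::size_t n) {
    int* new_end = std::remove(a, a + n, 0);
    return static_cast<std::size_t>(new_end - a);
}
```

The caller then treats the returned count as the new length, which avoids both the second allocation and the need to delete[] anything.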

Suppose you have an array and you want to remove the 0 value in the array and resize it.
int toResize[] = {4,3,2,0,8,7,9,0,5,4,7,0}; //12 elements
vector<int>resized;
vector<int>::iterator it;
for(int i=0;i<12;i++){
int check = toResize[i];
if(check!=0){
resized.push_back(check);
}
}
for ( it=resized.begin() ; it < resized.end(); it++ )
cout << " " << *it;
Feel free to mark the question answered if you are satisfied.

Erasing multiple objects from a std::vector?

Here is my issue: let's say I have a std::vector with ints in it.
Let's say it has 50,90,40,90,80,60,80.
I know I need to remove the second, fifth and third elements. I don't necessarily always know the order of elements to remove, nor how many. The issue is by erasing an element, this changes the index of the other elements. Therefore, how could I erase these and compensate for the index change. (sorting then linearly erasing with an offset is not an option)
Thanks

I am offering several methods:
1. A fast method that does not retain the original order of the elements:
Assign the current last element of the vector to the element to erase, then erase the last element. This will avoid big moves and all indexes except the last will remain constant. If you start erasing from the back, all precomputed indexes will be correct.
void quickDelete( int idx )
{
vec[idx] = vec.back();
vec.pop_back();
}
I see this essentially is a hand-coded version of the erase-remove idiom pointed out by Klaim ...
2. A slower method that retains the original order of the elements:
Step 1: Mark all vector elements to be deleted, i.e. with a special value. This has O(|indexes to delete|).
Step 2: Erase all marked elements using v.erase( remove (v.begin(), v.end(), special_value), v.end() );. This has O(|vector v|).
The total run time is thus O(|vector v|), assuming the index list is shorter than the vector.
3. Another slower method that retains the original order of the elements:
Use a predicate and remove if as described in https://stackoverflow.com/a/3487742/280314 . To make this efficient and respecting the requirement of
not "sorting then linearly erasing with an offset", my idea is to implement the predicate using a hash table and adjust the indexes stored in the hash table as the deletion proceeds on returning true, as Klaim suggested.

Using a predicate and the algorithm remove_if you can achieve what you want : see http://www.cplusplus.com/reference/algorithm/remove_if/
Don't forget to erase the item (see remove-erase idiom).
Your predicate will simply hold the idx of each value to remove and decrease all indexes it keeps each time it returns true.
That said if you can afford just removing each object using the remove-erase idiom, just make your life simple by doing it.

Erase the items backwards. In other words erase the highest index first, then next highest etc. You won't invalidate any previous iterators or indexes so you can just use the obvious approach of multiple erase calls.
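A minimal sketch of the backwards approach, assuming the positions arrive as zero-based indices that are all smaller than the vector's size. Note that it sorts a copy of the index list to find the highest index first, and the name erase_at is made up for this illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Erases the elements of v at the given zero-based positions.
// `indices` is taken by value so it can be reordered; duplicates are
// dropped so that no position is erased twice.
// Hypothetical helper: assumes every index is < v.size().
template <class T>
void erase_at(std::vector<T>& v, std::vector<std::size_t> indices) {
    // Highest index first: erasing it never shifts the lower positions.
    std::sort(indices.begin(), indices.end(), std::greater<std::size_t>());
    indices.erase(std::unique(indices.begin(), indices.end()), indices.end());
    for (std::size_t idx : indices) {
        v.erase(v.begin() + static_cast<std::ptrdiff_t>(idx));
    }
}
```

With the vector from the question, erase_at(v, {1, 4, 2}) removes the second, fifth and third elements. Each erase still shifts the tail of the vector, so this is simple rather than fast when many elements are removed.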

I would move the elements which you don't want to erase to a temporary vector and then replace the original vector with this.

While this answer by Peter G. in variant one (the swap-and-pop technique) is the fastest when you do not need to preserve the order, here is the unmentioned alternative which maintains the order.
With C++17 and C++20 the removal of multiple elements from a vector is possible with standard algorithms. The run time is O(N * Log(N)) due to std::stable_partition. There are no external helper arrays, no excessive copying, everything is done inplace. Code is a "one-liner":
template <class T>
inline void erase_selected(std::vector<T>& v, const std::vector<int>& selection)
{
v.resize(std::distance(
v.begin(),
std::stable_partition(v.begin(), v.end(),
[&selection, &v](const T& item) {
return !std::binary_search(
selection.begin(),
selection.end(),
static_cast<int>(static_cast<const T*>(&item) - &v[0]));
})));
}
The code above assumes that selection vector is sorted (if it is not the case, std::sort over it does the job, obviously).
To break this down, let us declare a number of temporaries:
// We need an explicit item index of an element
// to see if it should be in the output or not
int itemIndex = 0;
// The checker lambda returns `true` if the element is NOT in `selection`,
// i.e. if it should be kept
auto filter = [&itemIndex, &selection](const T& item) {
return !std::binary_search(
selection.begin(),
selection.end(),
itemIndex++);
};
This checker lambda is then fed to std::stable_partition algorithm which is guaranteed to call this lambda only once for each element in the original (unpermuted !) array v.
auto end_of_selected = std::stable_partition(
v.begin(),
v.end(),
filter);
The end_of_selected iterator points right after the last element which should remain in the output array, so we now can resize v down. To calculate the number of elements we use the std::distance to get size_t from two iterators.
v.resize(std::distance(v.begin(), end_of_selected));
This is different from the code at the top (it uses itemIndex to keep track of the array element). To get rid of the itemIndex, we capture the reference to source array v and use pointer arithmetic to calculate itemIndex internally.
Over the years (on this and other similar sites) multiple solutions have been proposed, but usually they employ multiple "raw loops" with conditions and some erase/insert/push_back calls. The idea behind stable_partition is explained beautifully in this talk by Sean Parent.
This link provides a similar solution (and it does not assume that selection is sorted - std::find_if instead of std::binary_search is used), but it also employs a helper (incremented) variable which disables the possibility to parallelize processing on larger arrays.
Starting from C++17, there is a new first argument to std::stable_partition (the ExecutionPolicy) which allows auto-parallelization of the algorithm, further reducing the run-time for big arrays. To make yourself believe this parallelization actually works, there is another talk by Hartmut Kaiser explaining the internals.

Would this work:
void DeleteAll(vector<int>& data, const vector<int>& deleteIndices)
{
vector<bool> markedElements(data.size(), false);
vector<int> tempBuffer;
tempBuffer.reserve(data.size()-deleteIndices.size());
for (vector<int>::const_iterator itDel = deleteIndices.begin(); itDel != deleteIndices.end(); itDel++)
markedElements[*itDel] = true;
for (size_t i=0; i<data.size(); i++)
{
if (!markedElements[i])
tempBuffer.push_back(data[i]);
}
data = tempBuffer;
}
It's an O(n) operation, no matter how many elements you delete. You could gain some efficiency by reordering the vector inline (but I think this way it's more readable).

This is non-trivial because as you delete elements from the vector, the indexes change.
[0] hi
[1] you
[2] foo
>> delete [1]
[0] hi
[1] foo
If you keep a counter of times you delete an element and if you have a list of indexes you want to delete in sorted order then:
int counter = 0;
for (int k : IndexesToDelete) {
events.erase(events.begin()+ k + counter);
counter -= 1;
}

You can use this method, if the order of the remaining elements doesn't matter
#include <iostream>
#include <vector>
using namespace std;
int main()
{
vector< int> vec;
vec.push_back(1);
vec.push_back(-6);
vec.push_back(3);
vec.push_back(4);
vec.push_back(7);
vec.push_back(9);
vec.push_back(14);
vec.push_back(25);
cout << "The elements before" << endl;
for(int i = 0; i < vec.size(); i++) cout << vec[i] <<endl;
vector< bool> toDeleted;
int YesOrNo = 0;
for(int i = 0; i<vec.size(); i++)
{
cout<<"You need to delete this element? "<<vec[i]<<", if yes enter 1 else enter 0"<<endl;
cin>>YesOrNo;
if(YesOrNo)
toDeleted.push_back(true);
else
toDeleted.push_back(false);
}
//Deleting, beginning from the last element to the first one
for(int i = toDeleted.size()-1; i>=0; i--)
{
if(toDeleted[i])
{
vec[i] = vec.back();
vec.pop_back();
}
}
cout << "The elements after" << endl;
for(int i = 0; i < vec.size(); i++) cout << vec[i] <<endl;
return 0;
}

Here's an elegant solution in case you want to preserve the indices, the idea is to replace the values you want to delete with a special value that is guaranteed not be used anywhere, and then at the very end, you perform the erase itself:
std::vector<int> vec = {1, 2, 3, 4, 5, 6, 7, 8, 9};
// marking 3 elements to be deleted
vec[2] = std::numeric_limits<int>::lowest();
vec[5] = std::numeric_limits<int>::lowest();
vec[3] = std::numeric_limits<int>::lowest();
// erase
vec.erase(std::remove(vec.begin(), vec.end(), std::numeric_limits<int>::lowest()), vec.end());
// print values => 1 2 5 7 8 9
for (const auto& value : vec) std::cout << ' ' << value;
std::cout << std::endl;
It's very quick if you delete a lot of elements because the deletion itself is happening only once. Items can also be deleted in any order that way.
If you use a struct instead of an int, you can still mark an element by setting a member of that struct, e.g. dead=true, and then use remove_if instead of remove:
struct MyObj
{
int x;
bool dead = false;
};
std::vector<MyObj> objs = {{1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}};
objs[2].dead = true;
objs[5].dead = true;
objs[3].dead = true;
objs.erase(std::remove_if(objs.begin(), objs.end(), [](const MyObj& obj) { return obj.dead; }), objs.end());
// print values => 1 2 5 7 8 9
for (const auto& obj : objs) std::cout << ' ' << obj.x;
std::cout << std::endl;
This one is a bit slower, around 80% the speed of the remove.
