Fastest way to remove duplicates from a vector<> - c++

As the title says, I have a few methods in mind to do it, but I don't know which is fastest.
So let's say that we have a vector<int> vals with some values.
1
After my vals are added
sort(vals.begin(), vals.end());
auto last = unique(vals.begin(), vals.end());
vals.erase(last, vals.end());
2
Convert to set after my vals are added:
set<int> s( vals.begin(), vals.end() );
vals.assign( s.begin(), s.end() );
3
When I add my vals, I check whether each one is already in my vector:
if( find(vals.begin(), vals.end(), myVal) == vals.end() )
    // not found, so add myVal
4
Use a set from start
Ok, I've got these 4 methods, my questions are:
1 From 1, 2 and 3 which is the fastest?
2 Is 4 faster than the first 3?
3 For 2, after converting the vector to a set, is it more convenient to use the set to do what I need to do, or should I do the vals.assign( .. ) and continue with my vector?

Question 1: Both 1 and 2 are O(n log n), 3 is O(n^2). Between 1 and 2, it depends on the data.
Question 2: 4 is also O(n log n) and can be better than 1 and 2 if you have lots of duplicates, because it only stores one copy of each. Imagine a million values that are all equal.
Question 3: Well, that really depends on what you need to do.
The only thing that can be said without knowing more is that your alternative number 3 is asymptotically worse than the others.
If you're using C++11 and don't need ordering, you can use std::unordered_set, which is a hash table and can be significantly faster than std::set.
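For example, a minimal sketch of that unordered variant (my own illustration; the final order of vals is unspecified):
#include <unordered_set>
#include <vector>

std::unordered_set<int> s(vals.begin(), vals.end()); // duplicates dropped on insert
vals.assign(s.begin(), s.end());                     // unique values, arbitrary order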

Option 1 is going to beat all the others. The complexity is just O(N log N) and the contiguous memory of vector keeps the constant factors low.
std::set typically suffers a lot from its non-contiguous allocations. It's not just that accessing those nodes is slow; merely creating them takes significant time as well.

These methods all have their shortcomings, although (1) is worth looking at.
But take a look at this 5th option: you can access the vector's data buffer using the data() function, and since the vector will only ever get smaller, no reallocation will take place. Then apply the algorithm you learned at school:
std::size_t new_size = unduplicate(vals.data(), vals.size());
std::size_t unduplicate(int* arr, std::size_t length) /* Reference: Gang of Four, I think */
{
    if (length == 0) return 0;
    int* const start = arr;
    int *it, *end = arr + length - 1;
    for (it = arr + 1; arr < end; arr++, it = arr + 1){
        while (it <= end){
            if (*it == *arr){
                *it = *end--;   // overwrite the duplicate with the current last element
            } else {
                ++it;
            }
        }
    }
    return static_cast<std::size_t>(end - start + 1); // new number of (distinct) elements
}
Then vals.resize(new_size) at the end if that is what's required. This is never worse than O(N^2), on a par with insertion sort; it is not asymptotically better than the sort-then-erase approaches, but it works fully in place with no extra memory, and the inner loop shrinks quickly when duplicates are plentiful. Note that it does not preserve element order.
Your 4th option might be an idea if you can adopt it. Profile the performance. Otherwise use my algorithm from the 1960s.

I ran into a similar problem recently, and experimented with 1, 2, and 4, as well as with an unordered_set version of 4. It turned out that the best performance was the latter one: 4 with unordered_set in place of set.
BTW, that empirical finding is not too surprising if one considers that both set and sort were a bit of overkill: they guaranteed the relative order of unequal elements. For example, inputs 4,3,5,2,4,3 would lead to the sorted output of unique values 2,3,4,5. This is unnecessary if you can live with unique values in arbitrary order, i.e. 3,4,2,5. When you use unordered_set it doesn't guarantee the order, only uniqueness, and therefore it doesn't have to perform the additional work of ensuring the order of different elements.

Related

CodeSignal: Execution time limit exceeded with c++

I am trying to solve the programming problem firstDuplicate on codesignal. The problem is "Given an array a that contains only numbers in the range 1 to a.length, find the first duplicate number for which the second occurrence has minimal index".
Example: For a = [2, 1, 3, 5, 3, 2] the output should be firstDuplicate(a) = 3
There are 2 duplicates: numbers 2 and 3. The second occurrence of 3 has a smaller index than the second occurrence of 2 does, so the answer is 3.
With this code I pass 21/23 tests, but then it tells me that the program exceeded the execution time limit on test 22. How would I go about making it faster so that it passes the remaining two tests?
#include <algorithm>
int firstDuplicate(vector<int> a) {
    vector<int> seen;
    for (size_t i = 0; i < a.size(); ++i){
        if (std::find(seen.begin(), seen.end(), a[i]) != seen.end()){
            return a[i];
        } else {
            seen.push_back(a[i]);
        }
    }
    if (seen == a){
        return -1;
    }
}
Anytime you get asked a question about "find the duplicate", "find the missing element", or "find the thing that should be there", your first instinct should be use a hash table. In C++, there are the unordered_map and unordered_set classes that are for such types of coding exercises. The unordered_set is effectively a map of keys to bools.
Also, pass your vector by reference, not by value. Passing by value incurs the overhead of copying the entire vector.
Also, that comparison seems costly and unnecessary at the end.
This is probably closer to what you want:
#include <unordered_set>
int firstDuplicate(const vector<int>& a) {
    std::unordered_set<int> seen;
    for (int i : a) {
        auto result_pair = seen.insert(i);
        bool duplicate = (result_pair.second == false);
        if (duplicate) {
            return (i);
        }
    }
    return -1;
}
std::find has linear time complexity in terms of the distance between the first and last element of the container (or until the number is found), thus having a worst-case complexity of O(N), so your algorithm is O(N^2).
Instead of storing your numbers in a vector and searching for them every time, you should use an associative container such as std::map to store the numbers encountered and return a number if, while iterating, it is already present in the map.
std::map<int, int> hash;
for(const auto &i: a) {
    if(hash[i])
        return i;
    else
        hash[i] = 1;
}
Edit: std::unordered_map is even more efficient if the order of keys doesn't matter, since insertion time complexity is constant in average case as compared to logarithmic insertion complexity for std::map.
It's probably an unnecessary optimization, but I think I'd try to take slightly better advantage of the specification. A hash table is intended primarily for cases where you have a fairly sparse conversion from possible keys to actual keys--that is, only a small percentage of possible keys are ever used. For example, if your keys are strings of length up to 20 characters, the theoretical maximum number of keys is 256^20. With that many possible keys, it's clear no practical program is going to store any more than a minuscule percentage, so a hash table makes sense.
In this case, however, we're told that the input is: "an array a that contains only numbers in the range 1 to a.length". So, even if half the numbers are duplicates, we're using 50% of the possible keys.
Under the circumstances, instead of a hash table, even though it's often maligned, I'd use an std::vector<bool>, and expect to get considerably better performance in the vast majority of cases.
int firstDuplicate(std::vector<int> const &input) {
    std::vector<bool> seen(input.size()+1);
    for (auto i : input) {
        if (seen[i])
            return i;
        seen[i] = true;
    }
    return -1;
}
The advantage here is fairly simple: at least in a typical case, std::vector<bool> uses a specialization to store bools in only one bit apiece. This way we're storing only one bit for each number of input, which increases storage density, so we can expect excellent use of the cache. In particular, as long as the number of bytes in the cache is at least a little more than 1/8th the number of elements in the input array, we can expect all of seen to be in the cache most of the time.
Now make no mistake: if you look around, you'll find quite a few articles pointing out that vector<bool> has problems--and for some cases, that's entirely true. There are places and times that vector<bool> should be avoided. But none of its limitations applies to the way we're using it here--and it really does give an advantage in storage density that can be quite useful, especially for cases like this one.
We could also write some custom code to implement a bitmap that would give still faster code than vector<bool>. But using vector<bool> is easy, and writing our own replacement that's more efficient is quite a bit of extra work...
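For the curious, a rough sketch of what such a hand-rolled bitmap might look like (my own illustration, not tuned or benchmarked; vector<bool> already does essentially this for us):
#include <cstddef>
#include <cstdint>
#include <vector>

struct Bitmap {
    std::vector<std::uint64_t> words;
    explicit Bitmap(std::size_t bits) : words((bits + 63) / 64, 0) {}
    // Returns whether bit i was already set, and sets it.
    bool test_and_set(std::size_t i) {
        std::uint64_t mask = std::uint64_t{1} << (i % 64);
        bool was_set = (words[i / 64] & mask) != 0;
        words[i / 64] |= mask;
        return was_set;
    }
};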

Is std::sort the best choice to do in-place sort for a huge array with limited integer value?

I want to sort an array with a huge number (millions or even billions) of elements, while the values are integers within a small range (1 to 100 or 1 to 1000). In such a case, are std::sort and the parallelized version __gnu_parallel::sort the best choice for me?
Actually, I want to sort a vector of my own class with an integer member representing the processor index.
As there are other members inside the class, even if two data items have the same integer member that is used for comparing, they might not be regarded as the same data.
Counting sort would be the right choice if you know that your range is so limited. If the range is [0, m), the most efficient way to do it is to have a vector in which the index represents the element and the value the count. For example:
vector<int> to_sort;
vector<int> counts;
for (int i : to_sort) {
    if ((int)counts.size() <= i) {
        counts.resize(i + 1, 0);
    }
    counts[i]++;
}
Note that the count at i is lazily initialized but you can resize once if you know m.
If you are sorting objects by some field and they are all distinct, you can modify the above as:
vector<T> to_sort;
vector<vector<const T*>> count_sorted;
for (const T& t : to_sort) {
    const int i = t.sort_field();
    if ((int)count_sorted.size() <= i) {
        count_sorted.resize(i + 1, {});
    }
    count_sorted[i].push_back(&t);
}
Now the main difference is that your space requirements grow substantially because you need to store the vectors of pointers: the space complexity goes from O(m) to O(m + n). Time complexity is the same. Note that the algorithm is stable. The code above assumes that to_sort is in scope during the life cycle of count_sorted. If your Ts implement move semantics you can store the objects themselves and move them in. If you need count_sorted to outlive to_sort you will need to do so or make copies.
If you have a range of type [-l, m), the substance does not change much, but your index now represents the value i + l and you need to know l beforehand.
Finally, it should be trivial to simulate an iteration through the sorted array by iterating through the counts array taking into account the value of the count. If you want stl like iterators you might need a custom data structure that encapsulates that behavior.
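For example, one way to materialize the sorted values from the first (plain integer) version above (a small sketch of my own, using the to_sort and counts names from that snippet):
#include <vector>

std::vector<int> sorted;
sorted.reserve(to_sort.size());
for (int value = 0; value < (int)counts.size(); ++value)
    for (int k = 0; k < counts[value]; ++k)
        sorted.push_back(value);   // emit each value as many times as it was counted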
Note: in the previous version of this answer I mentioned multiset as a way to use a data structure to count sort. This would be efficient in some java implementations (I believe the Guava implementation would be efficient) but not in C++ where the keys in the RB tree are just repeated many times.
You say "in-place", I therefore assume that you don't want to use O(n) extra memory.
First, count the number of objects with each value (as in Giovanni's and ronaldo's answers). You still need to get the objects into the right locations in-place. I think the following works, but I haven't implemented or tested it:
Create a cumulative sum from your counts, so that you know what index each object needs to go to. For example, if the counts are 1: 3, 2: 5, 3: 7, then the cumulative sums are 1: 0, 2: 3, 3: 8, 4: 15, meaning that the first object with value 1 in the final array will be at index 0, the first object with value 2 will be at index 3, and so on.
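A small sketch of the cumulative-sum step described above (my own illustration, assuming integer keys in [1, 1000] and a counts array already filled in):
int counts[1001] = {};   // counts[v] = number of elements whose key is v
// ... fill counts from the data ...
int start[1002];         // start[v] = index where the block for value v begins
start[1] = 0;
for (int v = 2; v <= 1001; ++v)
    start[v] = start[v - 1] + counts[v - 1];
// With counts 1:3, 2:5, 3:7 this gives start 1:0, 2:3, 3:8, 4:15, as in the example above.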
The basic idea now is to go through the vector, starting from the beginning. Get the element's processor index, and look up the corresponding cumulative sum. This is where you want it to be. If it's already in that location, move on to the next element of the vector and increment the cumulative sum (so that the next object with that value goes in the next position along). If it's not already in the right location, swap it with the correct location, increment the cumulative sum, and then continue the process for the element you swapped into this position in the vector.
There's a potential problem when you reach the start of a block of elements that have already been moved into place. You can solve that by remembering the original cumulative sums, "noticing" when you reach one, and jump ahead to the current cumulative sum for that value, so that you don't revisit any elements that you've already swapped into place. There might be a cleverer way to deal with this, but I don't know it.
Finally, compare the performance (and correctness!) of your code against std::sort. This has better time complexity than std::sort, but that doesn't mean it's necessarily faster for your actual data.
You definitely want to use counting sort. But not the one you're thinking of. Its main selling point is that its time complexity is O(N+X) where X is the maximum value you allow the sorting of.
Regular old counting sort (as seen in some other answers) can only sort integers, or has to be implemented with a multiset or some other data structure (becoming O(N log N)). But a more general version of counting sort can be used to sort (in place) anything that can provide an integer key, which is perfectly suited to your use case.
The algorithm is somewhat different though, and it's also known as American Flag Sort. Just like regular counting sort, it starts off by calculating the counts.
After that, it builds a prefix sums array of the counts. This is so that we can know how many elements should be placed behind a particular item, thus allowing us to index into the right place in constant time.
Since we know the correct final position of the items, we can just swap them into place. Doing just that would work if there weren't any repetitions, but since it's almost certain that there will be repetitions, we have to be more careful.
First: when we put something into its place, we have to increment the value in the prefix sum so that the next element with the same value doesn't displace the previous element from its place.
Second: either
- keep track of how many elements of each value we have already put into place, so that we don't keep moving elements of values that have already reached their place; this requires a second copy of the counts array (prior to calculating the prefix sum), as well as a "move count" array, or
- keep a copy of the prefix sums shifted over by one, so that we stop moving elements once the stored position of the latest element reaches the first position of the next value.
Even though the first approach is somewhat more intuitive, I chose the second method (because it's faster and uses less memory).
#include <algorithm>  // std::iter_swap
#include <iterator>   // std::distance

template<class It, class KeyOf>
void countsort(It begin, It end, KeyOf key_of) {
    constexpr int max_value = 1000;
    int final_destination[max_value] = {}; // zero initialized
    int destination[max_value] = {};       // zero initialized
    // Record counts
    for (It it = begin; it != end; ++it)
        final_destination[key_of(*it)]++;
    // Build prefix sum of counts
    for (int i = 1; i < max_value; ++i) {
        final_destination[i] += final_destination[i-1];
        destination[i] = final_destination[i-1];
    }
    for (auto it = begin; it != end; ++it) {
        auto key = key_of(*it);
        // while item is not in the correct position
        while (std::distance(begin, it) != destination[key] &&
               // and not all items of this value have reached their final position
               final_destination[key] != destination[key]) {
            // swap into the right place
            std::iter_swap(it, begin + destination[key]);
            // tidy up for next iteration
            ++destination[key];
            key = key_of(*it);
        }
    }
}
Usage:
vector<Person> records = populateRecords();
countsort(records.begin(), records.end(), [](Person const& p) {
    return p.id() - 1; // map [1, 1000] -> [0, 1000)
});
This can be further generalized to become MSD Radix Sort,
here's a talk by Malte Skarupke about it: https://www.youtube.com/watch?v=zqs87a_7zxw
Here's a neat visualization of the algorithm: https://www.youtube.com/watch?v=k1XkZ5ANO64
The answer given by Giovanni Botta is perfect, and Counting Sort is definitely the way to go. However, I personally prefer not to go resizing the vector progressively, but I'd rather do it this way (assuming your range is [0-1000]):
vector<int> to_sort;
vector<int> counts(1001);
int maxvalue = 0;
for (int i : to_sort) {
    if (i > maxvalue) maxvalue = i;
    counts[i]++;
}
counts.resize(maxvalue + 1);
It is essentially the same, but no need to be constantly managing the size of the counts vector. Depending on your memory constraints, you could use one solution or the other.

Getting unique elements from a container [c++]

I was looking to get only the unique elements from a container. Let's say srcContainer is the container from which I want unique elements. I looked at three options:
Using std::unique
std::sort(srcContainer.begin(), srcContainer.end());
srcContainer.erase(std::unique(srcContainer.begin(), srcContainer.end()), srcContainer.end());
Using BOOST::unique
boost::erase(srcContainer, boost::unique<boost::return_found_end>(boost::sort(srcContainer)));
My own method
std::set<T> uniqueElems(srcContainer.begin(), srcContainer.end());
srcContainer.clear();
srcContainer.insert(srcContainer.end(), uniqueElems.begin(), uniqueElems.end());
The issue with 1. and 2. is that they change the order in which members occurred in the original srcContainer. With 3. there is no change in order, and in addition it gives much better performance compared to 1. and 2. (Is that because there is no explicit sorting in 3.?) The elapsed wall clock times for the 3 methods above and the number of elements in srcContainer are given below:
size of srcContainer (contains integers) = 1e+6
- std::unique = 1.04779 secs
- BOOST::unique = 1.04774 secs
- Own method = 0.481638 secs
size of srcContainer (contains integers) = 1e+8
- std::unique = 151.554 secs
- BOOST::unique = 151.474 secs
- Own method = 57.5693 secs
My questions are:
Is there a better way to find unique elements using std::unique or BOOST::unique or any other code while maintaining the original order in the container?
Are there any issues with using method 3. above?
For performance profiling srcContainer was created as follows:
std::vector<int> srcContainer;
int halfWay = numElems/2;
for (size_t k = 0; k < numElems; ++k) {
    if (k < halfWay)
        srcContainer.push_back(k);
    else
        srcContainer.push_back(k - halfWay);
}
Edits:
Agreed with comments that method 3. also changes the order of elements. Is there a better way to get unique elements without changing order?
Thanks
EDIT based on info about source data:
The reason you're seeing the set insertion complete faster than sorting the vector is that your input data is two already sorted ranges. For quicksort (typically used by std::sort) this is a degenerate case and one of the worst possible inputs you could give it. For an input size of 1e8 changing the sort from std::sort to std::stable_sort cut the runtime from ~25s to <9s.
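In other words, in option 1 from the question the only change would be something like (a sketch):
#include <algorithm>

std::stable_sort(srcContainer.begin(), srcContainer.end()); // instead of std::sort
srcContainer.erase(std::unique(srcContainer.begin(), srcContainer.end()), srcContainer.end());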
If you want to keep the original item order, you could try something like the following which keeps a hash of all the items. I have no idea what the performance of this would be, but for example you could utilize an approach with hashing and remove_if as sketched below:
#include <algorithm>
#include <unordered_set>

struct Remover
{
    explicit Remover(std::unordered_set<int>& found_items) : found_items_(found_items) { }
    // Returns true (i.e. remove this item) if it was already seen; otherwise records it.
    bool operator()(int item) const { return !found_items_.insert(item).second; }
    std::unordered_set<int>& found_items_;
};

std::unordered_set<int> dup_finder;
Remover remover(dup_finder);
src.erase(std::remove_if(src.begin(), src.end(), remover), src.end());
Original components of my answer:
If the elements in the source container are already mostly sorted, you may see better performance with stable_sort rather than sort prior to calling unique. I was unable to guess, without more information about your data set, what might cause option 3 to perform better than 1 and 2.
Option 3 should remove duplicates, but keep in mind that in spite of what you assert, it will still reorder the items in exactly the same way that the first two options do.

C++ Array Intersection

Does anyone know if it's possible to turn this from O(m * n) to O(m + n)?
vector<int> theFirst;
vector<int> theSecond;
vector<int> theMatch;
theFirst.push_back( -2147483648 );
theFirst.push_back(2);
theFirst.push_back(44);
theFirst.push_back(1);
theFirst.push_back(22);
theFirst.push_back(1);
theSecond.push_back(1);
theSecond.push_back( -2147483648 );
theSecond.push_back(3);
theSecond.push_back(44);
theSecond.push_back(32);
theSecond.push_back(1);
for( int i = 0; i < theFirst.size(); i++ )
{
    for( int x = 0; x < theSecond.size(); x++ )
    {
        if( theFirst[i] == theSecond[x] )
        {
            theMatch.push_back( theFirst[i] );
        }
    }
}
Put the contents of the first vector into a hash set, such as std::unordered_set. That is O(m). Scan the second vector, checking if the values are in the unordered_set and keeping a tally of those that are. That is n lookups of a hash structure, so O(n). So, O(m+n). If you have l elements in the overlap, you may count O(l) for adding them to the third vector. std::unordered_set is in the C++0x draft and available in the latest gcc versions, and there is also an implementation in boost.
Edited to use unordered_set
Using C++11 syntax:
unordered_set<int> firstMap(theFirst.begin(), theFirst.end());
for (const int& i : theSecond) {
    if (firstMap.find(i) != firstMap.end()) {
        cout << "Duplicate: " << i << endl;
        theMatch.push_back(i);
    }
}
Now, the question still remains, what do you want to do with duplicates in the originals? Explicitly, how many times should 1 be in theMatch, 1, 2 or 4 times?
This outputs:
Duplicate: 1
Duplicate: -2147483648
Duplicate: 44
Duplicate: 1
Using this: http://www.cplusplus.com/reference/algorithm/set_intersection/
You should be able to achieve O(m log m + n log n), I believe. (set_intersection requires that the input ranges be already sorted.)
This might perform a bit differently than your solution for duplicate elements, however.
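A minimal sketch of that approach, reusing the vectors from the question (set_intersection emits each common value min(count in first, count in second) times, so duplicates behave differently than in the nested-loop version):
#include <algorithm>
#include <iterator>

std::sort(theFirst.begin(), theFirst.end());
std::sort(theSecond.begin(), theSecond.end());
theMatch.clear();
std::set_intersection(theFirst.begin(), theFirst.end(),
                      theSecond.begin(), theSecond.end(),
                      std::back_inserter(theMatch));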
Please correct me if I am wrong;
you are suggesting the following solution for the intersection problem:
sort the two vectors, then iterate over both sorted vectors in step so that we reach the common elements,
so the overall complexity will be
(n*log(n) + m*log(m)) + (n + m)
assuming k*log(k) as the complexity of sorting.
Am I right?
Of course the complexity will depend on the complexity of sorting.
I would sort the longer array, O(n log n), then binary-search it for the elements of the shorter array, O(m log n). The total is then O(n log n + m log n).
Assuming you want to produce theMatch from two data sets, and you don't care about the data sets themselves, put one in an unordered_map (available currently from Boost and listed in the final committee draft for C++11), mapping each key to an integer count that is incremented whenever the key is added, so that it keeps track of the number of times the key occurs. Then, when you get a hit on the other data set, you push_back the hit the number of times it occurred in the first data set.
You can get to O(n log n + m log m) by sorting the vectors first, or O(n log n + m log n) by creating a std::map of one of them (O(n + m) on average with an unordered_map).
Caveat: these are not order-preserving operations, and theMatch will come out in different orders with different techniques. It looks to me like the order is likely considered arbitrary. If the order given in the code above is necessary, I don't think there's a better algorithm.
Edit:
Take data set A and data set B, of type Type. Create an unordered_map<Type, int>.
Go through data set A, and check each member to see if it's in the map. If not, add the element with the int 1 to the map. If it is, increment the int. Each of these operations is O(1) on the average, so this step is O(len A).
Go through data set B, and check each member to see if it's in the map. If not, go on to the next. If so, push_back the member onto the destination queue. The int is the number of times that value is in data set A, so do the push_back the number of times the member's in A to duplicate the behavior given. Each of these operations is on the average O(1), so this step is O(len B).
This is average behavior. If you always hit the worst case, you're back with O(m*n). I don't think there's a way to guarantee O(m + n).
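A sketch of those two passes, using the vector names from the question (data set A is theFirst, data set B is theSecond):
#include <unordered_map>

std::unordered_map<int, int> countsInFirst;
for (int v : theFirst)
    ++countsInFirst[v];                                  // step through A: O(len A) on average
for (int v : theSecond) {                                // step through B: O(len B) on average
    auto it = countsInFirst.find(v);
    if (it != countsInFirst.end())
        theMatch.insert(theMatch.end(), it->second, v);  // push v once per occurrence in A
}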
If the order of the elements in the resulting array/set doesn't matter then the answer is yes.
For arbitrary element types with some order defined, the best algorithm is O(max(m,n) * log(min(m,n))). For numbers of limited size the best algorithm is O(m+n).
Construct a set from the elements of the smaller array: for arbitrary elements just sorting is OK, and for numbers of limited size it would be something similar to the intermediate table in a counting sort.
Iterate through the larger array and check whether each element is in the set constructed earlier: for arbitrary elements a binary search is OK (which is O(log(min(n,m)))), and for numbers the single check is O(1).
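For the sorted-smaller-array variant, a sketch using the vectors from the question and assuming theSecond is the smaller one:
#include <algorithm>
#include <vector>

std::vector<int> smaller = theSecond;
std::sort(smaller.begin(), smaller.end());                    // O(min log min)
for (int v : theFirst)                                        // O(max log min) in total
    if (std::binary_search(smaller.begin(), smaller.end(), v))
        theMatch.push_back(v);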

How does one remove duplicate elements in place in an array in O(n) in C or C++?

Is there any method to remove the duplicate elements in an array in place in C/C++ in O(n)?
Suppose elements are a[5]={1,2,2,3,4}
then resulting array should contain {1,2,3,4}
The solution can be achieved using two for loops but that would be O(n^2) I believe.
If, and only if, the source array is sorted, this can be done in linear time:
std::unique(a, a + 5); //Returns a pointer to the new logical end of a.
Otherwise you'll have to sort first, which is (99.999% of the time) n lg n.
Best case is O(n log n). Perform a heap sort on the original array: O(n log n) in time, O(1)/in-place in space. Then run through the array sequentially with 2 indices (source & dest) to collapse out repetitions. This has the side effect of not preserving the original order, but since "remove duplicates" doesn't specify which duplicates to remove (first? second? last?), I'm hoping that you don't care that the order is lost.
If you do want to preserve the original order, there's no way to do things in-place. But it's trivial if you make an array of pointers to elements in the original array, do all your work on the pointers, and use them to collapse the original array at the end.
Anyone claiming it can be done in O(n) time and in-place is simply wrong, modulo some arguments about what O(n) and in-place mean. One obvious pseudo-solution, if your elements are 32-bit integers, is to use a 4-gigabit bit-array (512 megabytes in size) initialized to all zeros, flipping a bit on when you see that number and skipping over it if the bit was already on. Of course then you're taking advantage of the fact that n is bounded by a constant, so technically everything is O(1) but with a horrible constant factor. However, I do mention this approach since, if n is bounded by a small constant - for instance if you have 16-bit integers - it's a very practical solution.
Yes. Because access (insertion or lookup) on a hashtable is O(1), you can remove duplicates in O(N).
Pseudocode:
hashtable h = {}
numdups = 0
for (i = 0; i < input.length; i++) {
    if (!h.contains(input[i])) {
        input[i - numdups] = input[i]
        h.add(input[i])
    } else {
        numdups = numdups + 1
    }
}
This is O(N).
Some commenters have pointed out that whether a hashtable is O(1) depends on a number of things. But in the real world, with a good hash, you can expect constant-time performance. And it is possible to engineer a hash that is O(1) to satisfy the theoreticians.
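A minimal C++ rendering of that pseudocode, assuming int elements and using std::unordered_set as the hashtable (the function name is my own):
#include <cstddef>
#include <unordered_set>
#include <vector>

// Returns the new logical length; the first 'new length' elements of 'input'
// hold the distinct values in their original order of first appearance.
std::size_t dedup_in_place(std::vector<int>& input) {
    std::unordered_set<int> h;
    std::size_t numdups = 0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (h.insert(input[i]).second) {       // first time we see this value
            input[i - numdups] = input[i];     // shift it left over the removed slots
        } else {
            ++numdups;                         // a duplicate: widen the gap
        }
    }
    return input.size() - numdups;
}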
I'm going to suggest a variation on Borealid's answer, but I'll point out up front that it's cheating. Basically, it only works assuming some severe constraints on the values in the array - e.g. that all keys are 32-bit integers.
Instead of a hash table, the idea is to use a bitvector. This is an O(1) memory requirement which should in theory keep Rahul happy (but won't). With 32-bit integers, the bitvector will require 512MB (i.e. 2^32 bits) - assuming 8-bit bytes, as some pedant may point out.
As Borealid should point out, this is a hashtable - just using a trivial hash function. This does guarantee that there won't be any collisions. The only way there could be a collision is by having the same value in the input array twice - but since the whole point is to ignore the second and later occurrences, this doesn't matter.
Pseudocode for completeness...
src = dest = input.begin ();
while (src != input.end ())
{
if (!bitvector [*src])
{
bitvector [*src] = true;
*dest = *src; dest++;
}
src++;
}
// at this point, dest gives the new end of the array
Just to be really silly (but theoretically correct), I'll also point out that the space requirement is still O(1) even if the array holds 64-bit integers. The constant term is a bit big, I agree, and you may have issues with 64-bit CPUs that can't actually use the full 64 bits of an address, but...
Take your example. If the array elements are bounded integers, you can create a lookup bit-array.
If you find an integer such as 3, turn the 3rd bit on.
If you find an integer such as 5, turn the 5th bit on.
If the array contains elements rather than integer, or the element is not bounded, using a hashtable would be a good choice, since hashtable lookup cost is a constant.
The canonical implementation of the unique() algorithm looks like something similar to the following:
template<typename Fwd>
Fwd unique(Fwd first, Fwd last)
{
    if( first == last ) return first;
    Fwd result = first;
    while( ++first != last ) {
        if( !(*result == *first) )
            *(++result) = *first;
    }
    return ++result;
}
This algorithm takes a range of sorted elements. If the range is not sorted, sort it before invoking the algorithm. The algorithm will run in-place, and return an iterator pointing to one-past-the-last-element of the unique'd sequence.
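For example, applied to the array from the question (a sketch):
int a[5] = {1, 2, 2, 3, 4};      // already sorted, as this algorithm requires
int* new_end = unique(a, a + 5); // the canonical implementation above
// new_end - a == 4, and a now begins with {1, 2, 3, 4}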
If you can't sort the elements then you've cornered yourself, and you have no choice but to use an algorithm with runtime performance worse than O(n) for this task.
This algorithm runs in O(n) time. That's big-O of n, worst case in every case, not amortized time. It uses O(1) space.
The example you have given is a sorted array. It is possible only in that case (given your constant space constraint)