C++ Array Intersection

Does anyone know if it's possible to turn this from O(m * n) to O(m + n)?
vector<int> theFirst;
vector<int> theSecond;
vector<int> theMatch;
theFirst.push_back( -2147483648 );
theFirst.push_back(2);
theFirst.push_back(44);
theFirst.push_back(1);
theFirst.push_back(22);
theFirst.push_back(1);
theSecond.push_back(1);
theSecond.push_back( -2147483648 );
theSecond.push_back(3);
theSecond.push_back(44);
theSecond.push_back(32);
theSecond.push_back(1);
for( int i = 0; i < theFirst.size(); i++ )
{
for( int x = 0; x < theSecond.size(); x++ )
{
if( theFirst[i] == theSecond[x] )
{
theMatch.push_back( theFirst[i] );
}
}
}

Put the contents of the first vector into a hash set, such as std::unordered_set. That is O(m). Scan the second vector, checking if the values are in the unordered_set and keeping a tally of those that are. That is n lookups of a hash structure, so O(n). So, O(m+n). If you have l elements in the overlap, you may count O(l) for adding them to the third vector. std::unordered_set is in the C++0x draft and available in the latest gcc versions, and there is also an implementation in boost.
Edited to use unordered_set
Using C++2011 syntax:
unordered_set<int> firstMap(theFirst.begin(), theFirst.end());
for (const int& i : theSecond) {
if (firstMap.find(i)!=firstMap.end()) {
cout << "Duplicate: " << i << endl;
theMatch.push_back(i);
}
}
Now, the question still remains, what do you want to do with duplicates in the originals? Explicitly, how many times should 1 be in theMatch, 1, 2 or 4 times?
This outputs:
Duplicate: 1
Duplicate: -2147483648
Duplicate: 44
Duplicate: 1

Using this: http://www.cplusplus.com/reference/algorithm/set_intersection/
You should be able to achieve O(mlogm + nlogn) I believe. (set_intersection requires that the input ranges be already sorted).
This might perform a bit differently than your solution for duplicate elements, however.
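As a rough sketch of the sort-then-intersect idea (the function name sorted_intersection is made up for illustration, and the inputs are copied so the originals stay untouched):

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: sorts copies of both inputs, then intersects them.
// Cost is O(m log m + n log n) for the sorts plus O(m + n) for the merge.
std::vector<int> sorted_intersection(std::vector<int> a, std::vector<int> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<int> out;
    // std::set_intersection requires both ranges to be sorted.
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```

With std::set_intersection, a value that occurs x times in one range and y times in the other ends up in the output min(x, y) times, so for the vectors in the question 1 would appear twice rather than four times.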

Please correct me if I am wrong.
You are suggesting the following solution for the intersection problem:
sort the two vectors, then iterate through both sorted vectors in step, stopping at each common element,
so the overall complexity will be
(n*log(n) + m*log(m)) + (n + m)
assuming k*log(k) as the complexity of sorting.
Am I right?
Of course, the overall complexity depends on the complexity of the sort.

I would sort the longer array O(n*log (n)), search for elements from the shorter array O(m*log (n)). Total is then O(n*log(n) + m*log (n) )
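A minimal sketch of that approach, assuming int elements as in the question (the name intersect_by_binary_search is hypothetical):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper: sorts the longer input, then binary-searches it
// once for every element of the shorter input.
std::vector<int> intersect_by_binary_search(std::vector<int> a, std::vector<int> b) {
    if (a.size() < b.size()) std::swap(a, b);  // make a the longer one
    std::sort(a.begin(), a.end());             // O(n log n)
    std::vector<int> out;
    for (int v : b) {                          // m lookups, O(log n) each
        if (std::binary_search(a.begin(), a.end(), v)) out.push_back(v);
    }
    return out;
}
```

Each element of the shorter array is reported at most once per occurrence in that array, so duplicates are again handled differently from the nested loops in the question.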

Assuming you want to produce theMatch from two data sets, and you don't care about the data sets themselves, put one in an unordered_map (available currently from Boost and listed in the final committee draft for C++11), mapping the key to an integer that increases whenever added to, and therefore keeps track of the number of times the key occurs. Then, when you get a hit on the other data set, you push_back the hit the number of times it occurred in the first data set.
You can get to O(n log n + m log m) by sorting the vectors first, or O(n log n + m log n) by creating a std::map of one of them (each of the m lookups in a std::map costs O(log n)).
Caveat: these are not order-preserving operations, and theMatch will come out in different orders with different techniques. It looks to me like the order is likely considered arbitrary. If the order given in the code above is necessary, I don't think there's a better algorithm.
Edit:
Take data set A and data set B, of type Type. Create an unordered_map<Type, int>.
Go through data set A, and check each member to see if it's in the map. If not, add the element with the int 1 to the map. If it is, increment the int. Each of these operations is O(1) on the average, so this step is O(len A).
Go through data set B, and check each member to see if it's in the map. If not, go on to the next. If so, push_back the member onto the destination queue. The int is the number of times that value is in data set A, so do the push_back the number of times the member's in A to duplicate the behavior given. Each of these operations is on the average O(1), so this step is O(len B).
This is average behavior. If you always hit the worst case, you're back with O(m*n). I don't think there's a way to guarantee O(m + n).
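The two passes described above could look roughly like this for int data (intersect_with_counts is an illustrative name, not something from the question):

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper: reproduces the multiplicities of the nested loops
// in the question, in expected O(len A + len B + size of the output).
std::vector<int> intersect_with_counts(const std::vector<int>& a,
                                       const std::vector<int>& b) {
    std::unordered_map<int, int> counts;
    for (int v : a) ++counts[v];           // how often each value occurs in A
    std::vector<int> out;
    for (int v : b) {
        auto it = counts.find(v);
        if (it != counts.end()) {
            // push v once per occurrence in A
            out.insert(out.end(), static_cast<std::size_t>(it->second), v);
        }
    }
    return out;
}
```

For the vectors in the question this yields 1 four times, like the original loops, but in the order of the second vector rather than the first.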

If the order of the elements in the resulting array/set doesn't matter then the answer is yes.
For the arbitrary types of elements with some order defined the best algorithm is O( max(m,n)*log(min(m,n)) ). For the numbers of limited size the best algorithm is O(m+n).
Construct the set of elements of smaller array - for arbitrary elements just sorting is OK and for the numbers of limited size it must be something similar to intermediate table in numeric sort.
Iterate through larger array and check if the element is within a set constructed earlier - for the arbitrary element binary search is OK (which is O(log(min(n,m))) and for numbers the single check is O(1).

Related

Fastest way to remove duplicates from a vector<>

As the title says, I have in my mind some methods to do it but I don't know which is fastest.
So let's say that we have a: vector<int> vals with some values
1
After my vals are added
sort(vals.begin(), vals.end());
auto last = unique(vals.begin(), vals.end());
vals.erase(last, vals.end());
2
Convert to set after my vals are added:
set<int> s( vals.begin(), vals.end() );
vals.assign( s.begin(), s.end() );
3
When i add my vals, i check if it's already in my vector:
if( find(vals.begin(), vals.end(), myVal)==vals.end() )
// add my val
4
Use a set from start
Ok, I've got these 4 methods, my questions are:
1 From 1, 2 and 3 which is the fastest?
2 Is 4 faster than the first 3?
3 For method 2, after converting the vector to a set, is it more convenient to keep using the set for what I need to do, or should I do the vals.assign( .. ) and continue with my vector?

Question 1: Both 1 and 2 are O(n log n), 3 is O(n^2). Between 1 and 2, it depends on the data.
Question 2: 4 is also O(n log n) and can be better than 1 and 2 if you have lots of duplicates, because it only stores one copy of each. Imagine a million values that are all equal.
Question 3: Well, that really depends on what you need to do.
The only thing that can be said without knowing more is that your alternative number 3 is asymptotically worse than the others.
If you're using C++11 and don't need ordering, you can use std::unordered_set, which is a hash table and can be significantly faster than std::set.
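For what it's worth, here is a sketch of how std::unordered_set can be combined with the existing vector; the function name is made up, and keeping the first occurrence of each value is my own choice rather than something the question asks for:

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical helper: removes duplicates in expected O(n) time, keeping
// the first occurrence of every value in its original relative order.
void remove_duplicates_keep_order(std::vector<int>& vals) {
    std::unordered_set<int> seen;
    std::size_t write = 0;
    for (std::size_t read = 0; read < vals.size(); ++read) {
        // insert().second is true only the first time a value is seen
        if (seen.insert(vals[read]).second) {
            vals[write++] = vals[read];
        }
    }
    vals.resize(write);
}
```

This trades O(n) extra memory for the hash set against not having to sort at all.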

Option 1 is going to beat all the others. The complexity is just O(N log N) and the contiguous memory of vector keeps the constant factors low.
std::set typically suffers a lot from non-contiguous allocations. It's not just slow to access those, just creating them takes significant time as well.

These methods all have their shortcomings although (1) is worth looking at.
But, take a look at this 5th option: Bear in mind that you can access the vector's data buffer using the data() function. Then, bearing in mind that no reallocation will take place since the vector will only ever get smaller, apply the algorithm that you learn at school:
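std::size_t unduplicate(int* arr, std::size_t length) /*Reference: Gang of Four, I think*/
{
if (length == 0) return 0;
int* const begin = arr;
int *it, *end = arr + length - 1;
for (it = arr + 1; arr < end; arr++, it = arr + 1){
while (it <= end){
if (*it == *arr){
*it = *end--; // overwrite the duplicate with the last element and shrink the range
} else {
++it;
}
}
}
return static_cast<std::size_t>(end - begin) + 1; // number of unique elements left at the front
}
vals.resize(unduplicate(vals.data(), vals.size()));
The function returns the number of unique elements left at the front of the buffer, so the vector can be resized to that length if that is what's required. This is never worse than O(N^2) and needs no extra memory, but note that it does not preserve the order of the elements, and that for large inputs the sort-then-unique approach (1) is asymptotically faster at O(N log N).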
Your 4th option might be an idea if you can adopt it. Profile the performance. Otherwise use my algorithm from the 1960s.

I had a similar problem recently, and experimented with 1, 2, and 4, as well as with an unordered_set version of 4. It turned out that the best performance came from the latter: 4 with unordered_set in place of set.
BTW, that empirical finding is not too surprising if one considers that both set and sort were a bit of overkill: they guaranteed relative order of unequal elements. For example inputs 4,3,5,2,4,3 would lead to sorted output of unique values 2,3,4,5. This is unnecessary if you can live with unique values in arbitrary order, i.e. 3,4,2,5. When you use unordered_set it doesn't guarantee the order, only uniqueness, and therefore it doesn't have to perform the additional work of ensuring the order of different elements.

Is std::sort the best choice to do in-place sort for a huge array with limited integer value?

I want to sort an array with a huge number of elements (millions or even billions), where the values are integers within a small range (1 to 100 or 1 to 1000). In such a case, is std::sort, or the parallelized version __gnu_parallel::sort, the best choice for me?
Actually, I want to sort a vector of my own class with an integer member representing the processor index.
As there are other members inside the class, two objects that have the same integer member used for comparison might still not be regarded as the same data.

Counting sort would be the right choice if you know that your range is so limited. If the range is [0,m) the most efficient way to do so is to have a vector in which the index represents the element and the value the count. For example:
vector<int> to_sort;
vector<int> counts;
for (int i : to_sort) {
if (counts.size() <= static_cast<size_t>(i)) { // grow so that index i is valid
counts.resize(i+1, 0);
}
counts[i]++;
}
Note that the count at i is lazily initialized but you can resize once if you know m.
If you are sorting objects by some field and they are all distinct, you can modify the above as:
vector<T> to_sort;
vector<vector<const T*>> count_sorted;
for (const T& t : to_sort) {
const int i = t.sort_field();
if (count_sorted.size() <= static_cast<size_t>(i)) { // grow so that index i is valid
count_sorted.resize(i+1, {});
}
count_sorted[i].push_back(&t);
}
Now the main difference is that your space requirements grow substantially because you need to store the vectors of pointers. The space complexity went from O(m) to O(n). Time complexity is the same. Note that the algorithm is stable. The code above assumes that to_sort is in scope during the life cycle of count_sorted. If your Ts implement move semantics you can store the object themselves and move them in. If you need count_sorted to outlive to_sort you will need to do so or make copies.
If you have a range of type [-l, m), the substance does not change much, but the index for a value v is now v + l, and you need to know l beforehand.
Finally, it should be trivial to simulate an iteration through the sorted array by iterating through the counts array taking into account the value of the count. If you want stl like iterators you might need a custom data structure that encapsulates that behavior.
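To make that last step concrete, here is a small sketch that counts and then expands the counts back into a sorted vector. It assumes plain ints that all lie in [0, m); the function name counting_sort and the parameter m are mine:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counting sort for values known to lie in [0, m).
// Runs in O(n + m) time with O(m) extra space for the counts.
std::vector<int> counting_sort(const std::vector<int>& to_sort, std::size_t m) {
    std::vector<std::size_t> counts(m, 0);   // sized once because m is known
    for (int v : to_sort) ++counts[static_cast<std::size_t>(v)];

    std::vector<int> sorted;
    sorted.reserve(to_sort.size());
    for (std::size_t value = 0; value < m; ++value) {
        // emit each value as many times as it was counted
        sorted.insert(sorted.end(), counts[value], static_cast<int>(value));
    }
    return sorted;
}
```

This only works when equal keys really are interchangeable; for objects with other members you need the pointer-based variant described above.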
Note: in the previous version of this answer I mentioned multiset as a way to use a data structure to count sort. This would be efficient in some java implementations (I believe the Guava implementation would be efficient) but not in C++ where the keys in the RB tree are just repeated many times.

You say "in-place", I therefore assume that you don't want to use O(n) extra memory.
First, count the number of objects with each value (as in Giovanni's and ronaldo's answers). You still need to get the objects into the right locations in-place. I think the following works, but I haven't implemented or tested it:
Create a cumulative sum from your counts, so that you know what index each object needs to go to. For example, if the counts are 1: 3, 2: 5, 3: 7, then the cumulative sums are 1: 0, 2: 3, 3: 8, 4: 15, meaning that the first object with value 1 in the final array will be at index 0, the first object with value 2 will be at index 3, and so on.
The basic idea now is to go through the vector, starting from the beginning. Get the element's processor index, and look up the corresponding cumulative sum. This is where you want it to be. If it's already in that location, move on to the next element of the vector and increment the cumulative sum (so that the next object with that value goes in the next position along). If it's not already in the right location, swap it with the correct location, increment the cumulative sum, and then continue the process for the element you swapped into this position in the vector.
There's a potential problem when you reach the start of a block of elements that have already been moved into place. You can solve that by remembering the original cumulative sums, "noticing" when you reach one, and jump ahead to the current cumulative sum for that value, so that you don't revisit any elements that you've already swapped into place. There might be a cleverer way to deal with this, but I don't know it.
Finally, compare the performance (and correctness!) of your code against std::sort. This has better time complexity than std::sort, but that doesn't mean it's necessarily faster for your actual data.

You definitely want to use counting sort. But not the one you're thinking of. Its main selling point is that its time complexity is O(N+X) where X is the maximum value you allow the sorting of.
Regular old counting sort (as seen on some other answers) can only sort integers, or has to be implemented with a multiset or some other data structure (becoming O(Nlog(N))). But a more general version of counting sort can be used to sort (in place) anything that can provide an integer key, which is perfectly suited to your use case.
The algorithm is somewhat different though, and it's also known as American Flag Sort. Just like regular counting sort, it starts off by calculating the counts.
After that, it builds a prefix sums array of the counts. This is so that we can know how many elements should be placed behind a particular item, thus allowing us to index into the right place in constant time.
Since we know the correct final position of the items, we can just swap them into place. Doing just that would work if there weren't any repetitions but, since it's almost certain that there will be repetitions, we have to be more careful.
First: when we put something into its place we have to increment the value in the prefix sum so that the next element with the same value doesn't remove the previous element from its place.
Second: either
Option 1: keep track of how many elements of each value we have already put into place so that we don't keep moving elements of values that have already reached their place. This requires a second copy of the counts array (prior to calculating the prefix sum), as well as a "move count" array.
Option 2: keep a copy of the prefix sums shifted over by one so that we stop moving elements once the stored position of the latest element reaches the first position of the next value.
Even though the first option is somewhat more intuitive, I chose the second (because it's faster and uses less memory).
template<class It, class KeyOf>
void countsort (It begin, It end, KeyOf key_of) {
constexpr int max_value = 1000;
int final_destination[max_value] = {}; // zero initialized
int destination[max_value] = {}; // zero initialized
// Record counts
for (It it = begin; it != end; ++it)
final_destination[key_of(*it)]++;
// Build prefix sum of counts
for (int i = 1; i < max_value; ++i) {
final_destination[i] += final_destination[i-1];
destination[i] = final_destination[i-1];
}
for (auto it = begin; it != end; ++it) {
auto key = key_of(*it);
// while item is not in the correct position
while ( std::distance(begin, it) != destination[key] &&
// and not all items of this value have reached their final position
final_destination[key] != destination[key] ) {
// swap into the right place
std::iter_swap(it, begin + destination[key]);
// tidy up for next iteration
++destination[key];
key = key_of(*it);
}
}
}
Usage:
vector<Person> records = populateRecords();
countsort(records.begin(), records.end(), [](Person const &p){
return p.id()-1; // map [1, 1000] -> [0, 1000)
});
This can be further generalized to become MSD Radix Sort,
here's a talk by Malte Skarupke about it: https://www.youtube.com/watch?v=zqs87a_7zxw
Here's a neat visualization of the algorithm: https://www.youtube.com/watch?v=k1XkZ5ANO64

The answer given by Giovanni Botta is perfect, and Counting Sort is definitely the way to go. However, I personally prefer not to go resizing the vector progressively, but I'd rather do it this way (assuming your range is [0-1000]):
vector<int> to_sort;
vector<int> counts(1001);
int maxvalue=0;
for (int i : to_sort) {
if(i > maxvalue) maxvalue = i;
counts[i]++;
}
counts.resize(maxvalue+1);
It is essentially the same, but no need to be constantly managing the size of the counts vector. Depending on your memory constraints, you could use one solution or the other.

A fast algorithm for sorting and shuffling equal valued entries (preferably by STL's)

I'm currently developing stochastic optimization algorithms and have encountered the following issue (which I imagine appears also in other places): It could be called totally unstable partial sort:
Given a container of size n and a comparator, such that entries may be equally valued.
Return the best k entries, but if values are equal, it should be (nearly) equally probable to receive any of them.
(output order is irrelevant to me, i.e. equal values completely among the best k need not be shuffled. To even have all equal values shuffled is however a related, interesting question and would suffice!)
A very (!) inefficient way would be to use random_shuffle and then partial_sort, but one actually only needs to shuffle the block of equally valued entries "at the selection border" (or all blocks of equally valued entries; both are much faster). Maybe that observation is where to start...
I would very much prefer it if someone could provide a solution with STL algorithms (or at least mostly so), because they're usually very fast, well encapsulated and OMP-parallelized.
Thanks in advance for any ideas!

You want to partial_sort first. Then, while elements are not equal, return them. If you meet a sequence of equal elements which is larger than the remaining k, shuffle and return first k. Else return all and continue.

Not fully understanding your issue, but if it were me solving this issue (if I am reading it correctly) ...
Since it appears you will have to traverse the given object anyway, you might as well build a copy of it for your results, sort it upon insert, and randomize your "equal" items as you insert.
In other words, copy the items from the given container into an STL list but overload the comparison operator to create a B-Tree, and if two items are equal on insert randomly choose to insert it before or after the current item.
This way it's optimally traversed (since it's a tree) and you get the random order of the items that are equal each time the list is built.
It's double the memory, but I was reading this as you didn't want to alter the original list. If you don't care about losing the original, delete each item from the original as you insert into your new list. The worst traversal will be the first time you call your function since the passed in list might be unsorted. But since you are replacing the list with your sorted copy, future runs should be much faster and you can pick a better pivot point for your tree by assigning the root node as the element at length() / 2.
Hope this is helpful, sounds like a neat project. :)

If you really mean that output order is irrelevant, then you want std::nth_element, rather than std::partial_sort, since it is generally somewhat faster. Note that std::nth_element puts the nth element in the right position, so you can do the following, which is 100% standard algorithm invocations (warning: not tested very well; fencepost error possibilities abound):
template<typename RandomIterator, typename Compare>
void best_n(RandomIterator first,
RandomIterator nth,
RandomIterator limit,
Compare cmp) {
using ref = typename std::iterator_traits<RandomIterator>::reference;
std::nth_element(first, nth, limit, cmp);
auto p = std::partition(first, nth, [&](ref a){return cmp(a, *nth);});
auto q = std::partition(nth + 1, limit, [&](ref a){return !cmp(*nth, a);});
std::random_shuffle(p, q); // See note
}
The function takes three iterators, like nth_element, where nth is an iterator to the nth element, which means that it is begin() + (n - 1).
Edit: Note that this is different from most STL algorithms, in that it is effectively an inclusive range. In particular, it is UB if nth == limit, since it is required that *nth be valid. Furthermore, there is no way to request the best 0 elements, just as there is no way to ask for the 0th element with std::nth_element. You might prefer it with a different interface; do feel free to do so.
Or you might call it like this, after requiring that 0 < k <= n:
best_n(container.begin(), container.begin()+(k-1), container.end(), cmp);
It first uses nth_element to put the "best" k elements in positions 0..k-1, guaranteeing that the kth element (or one of them, anyway) is at position k-1. It then repartitions the elements preceding position k-1 so that the equal elements are at the end, and the elements following position k-1 so that the equal elements are at the beginning. Finally, it shuffles the equal elements.
nth_element is O(n); the two partition operations sum up to O(n); and random_shuffle is O(r) where r is the number of equal elements shuffled. I think that all sums up to O(n) so it's optimally scalable, but it may or may not be the fastest solution.
Note: You should use std::shuffle instead of std::random_shuffle, passing a uniform random number generator through to best_n. But I was too lazy to write all the boilerplate to do that and test it. Sorry.
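For reference, the std::shuffle variant might look something like the sketch below. It is the same function with an extra generator parameter; the name best_n_shuffled and the URBG template parameter are my additions, and it carries the same inclusive-range caveats (nth must be dereferenceable):

```cpp
#include <algorithm>
#include <iterator>
#include <random>
#include <utility>

// Sketch only: same structure as best_n, but the caller supplies the
// random number generator that std::shuffle needs.
template<typename RandomIterator, typename Compare, typename URBG>
void best_n_shuffled(RandomIterator first, RandomIterator nth,
                     RandomIterator limit, Compare cmp, URBG&& gen) {
    using ref = typename std::iterator_traits<RandomIterator>::reference;
    std::nth_element(first, nth, limit, cmp);
    // elements equivalent to *nth end up in [p, nth) ...
    auto p = std::partition(first, nth, [&](ref a){ return cmp(a, *nth); });
    // ... and in [nth + 1, q)
    auto q = std::partition(nth + 1, limit, [&](ref a){ return !cmp(*nth, a); });
    std::shuffle(p, q, std::forward<URBG>(gen));
}
```

A call could look like best_n_shuffled(v.begin(), v.begin() + (k - 1), v.end(), cmp, gen) with gen being, for example, a std::mt19937 seeded from std::random_device.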

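If you don't mind sorting the whole list, there is a simple answer. Randomize the result in your comparator for equivalent elements. Be aware, though, that std::sort requires the comparator to be a strict weak ordering; a comparator that returns random results for equal elements breaks that requirement, so the behavior is formally undefined and may differ between implementations.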
std::sort(validLocations.begin(), validLocations.end(),
[&](const Point& i_point1, const Point& i_point2)
{
if (i_point1.mX == i_point2.mX)
{
return Rand(1.0f) < 0.5;
}
else
{
return i_point1.mX < i_point2.mX;
}
});
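If a deterministic comparator is preferred, one alternative (not what the answer above does) is to shuffle first and then use a stable sort, so that ties keep their shuffled order. A sketch, with a stand-in Point struct whose mY member is invented for the example:

```cpp
#include <algorithm>
#include <random>
#include <vector>

struct Point { float mX; float mY; };  // stand-in type; only mX comes from the answer

// Sketch: the shuffle randomizes the relative order of all elements and
// std::stable_sort preserves that order among elements with equal mX.
template<typename URBG>
void sort_with_random_ties(std::vector<Point>& pts, URBG& gen) {
    std::shuffle(pts.begin(), pts.end(), gen);
    std::stable_sort(pts.begin(), pts.end(),
                     [](const Point& a, const Point& b){ return a.mX < b.mX; });
}
```

This costs one extra O(n) pass for the shuffle, and std::stable_sort may allocate a temporary buffer.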

How to get a sorted subvector out of a sorted vector, fast

I have a data structure like this:
struct X {
float value;
int id;
};
and a vector of those, of size N (think 100000), sorted by value, which stays constant during the execution of the program:
std::vector<X> values;
Now, I want to write a function
void subvector(std::vector<X> const& values,
std::vector<int> const& ids,
std::vector<X>& out /*,
helper data here */);
that fills the out parameter with a sorted subset of values, given by the passed ids (size M < N (about 0.8 times N)), fast (memory is not an issue, and this will be done repeatedly, so building lookuptables (the helper data from the function parameters) or something else that is done only once is entirely ok).
My solution so far:
Build lookuptable lut containing id -> offset in values (preparation, so constant runtime)
create std::vector<X> tmp, size N, filled with invalid ids (linear in N)
for each id, copy values[lut[id]] to tmp[lut[id]] (linear in M)
loop over tmp, copying items to out (linear in N)
This is linear in N (as it's bigger than M), but the temporary variable and repeated copying bug me. Is there a way to do it quicker than this? Note that M will be close to N, so things that are O(M log N) are unfavourable.
Edit: http://ideone.com/xR8Vp is a sample implementation of mentioned algorithm, to make the desired output clear and prove that it's doable in linear time - the question is about the possibility of avoiding the temporary variable or speeding it up in some other way, something that is not linear is not faster :).
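Since the linked sample may not stay available, here is a rough sketch of the steps listed above. It differs from the description in one respect: the temporary holds one flag byte per element instead of a full copy of X. It also assumes the ids are exactly 0 .. N-1, which the question does not state; build_lut and the extra lut parameter are illustrative names:

```cpp
#include <cstddef>
#include <vector>

struct X { float value; int id; };

// Preparation, done once: lut[id] = position of that id in values.
// Assumes the ids are a permutation of 0 .. values.size() - 1.
std::vector<std::size_t> build_lut(std::vector<X> const& values) {
    std::vector<std::size_t> lut(values.size());
    for (std::size_t i = 0; i < values.size(); ++i)
        lut[static_cast<std::size_t>(values[i].id)] = i;
    return lut;
}

void subvector(std::vector<X> const& values,
               std::vector<int> const& ids,
               std::vector<X>& out,
               std::vector<std::size_t> const& lut) {
    std::vector<char> wanted(values.size(), 0);        // linear in N
    for (int id : ids)                                 // linear in M
        wanted[lut[static_cast<std::size_t>(id)]] = 1;
    out.clear();
    out.reserve(ids.size());
    for (std::size_t i = 0; i < values.size(); ++i)    // linear in N
        if (wanted[i]) out.push_back(values[i]);       // keeps the sorted order
}
```

Each selected X is copied once, directly into out.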

An alternative approach you could try is to use a hash table instead of a vector to look up ids in:
void subvector(std::vector<X> const& values,
std::unordered_set<int> const& ids,
std::vector<X>& out) {
out.clear();
out.reserve(ids.size());
for(std::vector<X>::const_iterator i = values.begin(); i != values.end(); ++i) {
if(ids.find(i->id) != ids.end()) {
out.push_back(*i);
}
}
}
This runs in linear time since unordered_set::find is constant expected time (assuming that we have no problems hashing ints). However I suspect it might not be as fast in practice as the approach you described initially using vectors.

Since your vector is sorted, and you want a subset of it sorted the same way, I assume we can just slice out the chunk you want without rearranging it.
Why not just use find_if() twice. Once to find the start of the range you want and once to find the end of the range. This will give you the start and end iterators of the sub vector. Construct a new vector using those iterators. One of the vector constructor overloads takes two iterators.
That or the partition algorithm should work.

If I understood your problem correctly, you actually try to create a linear time sorting algorithm (subject to the input size of numbers M).
That is NOT possible.
Your current approach is to have a sorted list of possible values.
This takes linear time to the number of possible values N (theoretically, given that the map search takes O(1) time).
The best you could do is to sort the values (the ones you found from the map) with a fast sorting method (O(M log M), for example quicksort, mergesort etc.) for small values of M, and maybe do that linear search for bigger values of M.
For example, if N is 100000 and M is 100 it is much faster to just use a sorting algorithm.
I hope you can understand what I say. If you still have questions I will try to answer them :)
edit: (comment)
I will further explain what I mean.
Say you know that your numbers will range from 1 to 100.
You have them sorted somewhere (actually they are "naturally" sorted) and you want to get a subset of them in sorted form.
If it would be possible to do it faster than O(N) or O(MlogM), sorting algorithms would just use this method to sort.
For example, given the set of numbers {5,10,3,8,9,1,7}, and knowing that they are a subset of the sorted set of numbers {1,2,3,4,5,6,7,8,9,10}, you still can't sort them faster than O(N) (N = 10) or O(MlogM) (M = 7).

How does one remove duplicate elements in place in an array in O(n) in C or C++?

Is there any method to remove the duplicate elements in an array in place in C/C++ in O(n)?
Suppose elements are a[5]={1,2,2,3,4}
then resulting array should contain {1,2,3,4}
The solution can be achieved using two for loops but that would be O(n^2) I believe.

If, and only if, the source array is sorted, this can be done in linear time:
std::unique(a, a + 5); //Returns a pointer to the new logical end of a.
Otherwise you'll have to sort first, which is (99.999% of the time) n lg n.

Best case is O(n log n). Perform a heap sort on the original array: O(n log n) in time, O(1)/in-place in space. Then run through the array sequentially with 2 indices (source & dest) to collapse out repetitions. This has the side effect of not preserving the original order, but since "remove duplicates" doesn't specify which duplicates to remove (first? second? last?), I'm hoping that you don't care that the order is lost.
If you do want to preserve the original order, there's no way to do things in-place. But it's trivial if you make an array of pointers to elements in the original array, do all your work on the pointers, and use them to collapse the original array at the end.
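One possible reading of the pointer idea, using indices instead of raw pointers (the function name and the choice to keep the first occurrence of each value are my assumptions). It needs O(n) extra memory, so it is not in-place:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Sketch: order-preserving duplicate removal in O(n log n).
// Returns the new logical length of the array.
std::size_t unique_keep_order(int* a, std::size_t n) {
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    // Sort the indices by value; ties are broken by position so that the
    // first occurrence of each value comes first within its group.
    std::sort(idx.begin(), idx.end(), [a](std::size_t i, std::size_t j) {
        return a[i] != a[j] ? a[i] < a[j] : i < j;
    });
    std::vector<char> keep(n, 0);
    for (std::size_t k = 0; k < n; ++k)
        if (k == 0 || a[idx[k]] != a[idx[k - 1]]) keep[idx[k]] = 1;
    // Collapse the original array, keeping the marked elements in order.
    std::size_t write = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (keep[i]) a[write++] = a[i];
    return write;
}
```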
Anyone claiming it can be done in O(n) time and in-place is simply wrong, modulo some arguments about what O(n) and in-place mean. One obvious pseudo-solution, if your elements are 32-bit integers, is to use a 4-gigabit bit-array (512 megabytes in size) initialized to all zeros, flipping a bit on when you see that number and skipping over it if the bit was already on. Of course then you're taking advantage of the fact that n is bounded by a constant, so technically everything is O(1) but with a horrible constant factor. However, I do mention this approach since, if n is bounded by a small constant - for instance if you have 16-bit integers - it's a very practical solution.

Yes. Because access (insertion or lookup) on a hashtable is O(1), you can remove duplicates in O(N).
Pseudocode:
hashtable h = {}
numdups = 0
for (i = 0; i < input.length; i++) {
if (!h.contains(input[i])) {
input[i-numdups] = input[i]
h.add(input[i])
} else {
numdups = numdups + 1
}
}
This is O(N).
Some commenters have pointed out that whether a hashtable is O(1) depends on a number of things. But in the real world, with a good hash, you can expect constant-time performance. And it is possible to engineer a hash that is O(1) to satisfy the theoreticians.

I'm going to suggest a variation on Borealid's answer, but I'll point out up front that it's cheating. Basically, it only works assuming some severe constraints on the values in the array - e.g. that all keys are 32-bit integers.
Instead of a hash table, the idea is to use a bitvector. This is an O(1) memory requirement which should in theory keep Rahul happy (but won't). With the 32-bit integers, the bitvector will require 512MB (ie 2**32 bits) - assuming 8-bit bytes, as some pedant may point out.
As Borealid should point out, this is a hashtable - just using a trivial hash function. This does guarantee that there won't be any collisions. The only way there could be a collision is by having the same value in the input array twice - but since the whole point is to ignore the second and later occurrences, this doesn't matter.
Pseudocode for completeness...
src = dest = input.begin ();
while (src != input.end ())
{
if (!bitvector [*src])
{
bitvector [*src] = true;
*dest = *src; dest++;
}
src++;
}
// at this point, dest gives the new end of the array
Just to be really silly (but theoretically correct), I'll also point out that the space requirement is still O(1) even if the array holds 64-bit integers. The constant term is a bit big, I agree, and you may have issues with 64-bit CPUs that can't actually use the full 64 bits of an address, but...

Take your example. If the array elements are bounded integers, you can create a lookup bit array.
If you find an integer such as 3, turn the 3rd bit on.
If you find an integer such as 5, turn the 5th bit on.
If the array contains elements other than integers, or the elements are not bounded, a hashtable would be a good choice, since the expected cost of a hashtable lookup is constant.
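As an illustration of the bit array for a small bounded range, here is a sketch for 16-bit unsigned values (the element type and the function name are my choice; the question itself uses plain ints):

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Sketch: removes duplicates from an array of 16-bit values in O(n) time,
// keeping first occurrences in order. Returns the new logical length.
std::size_t unique_u16(std::uint16_t* a, std::size_t n) {
    std::bitset<65536> seen;               // one bit per possible value, 8 KiB
    std::size_t write = 0;
    for (std::size_t read = 0; read < n; ++read) {
        if (!seen[a[read]]) {              // first time we meet this value
            seen[a[read]] = true;
            a[write++] = a[read];
        }
    }
    return write;
}
```

The extra space is constant because it depends only on the value range, not on n.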

The canonical implementation of the unique() algorithm looks like something similar to the following:
template<typename Fwd>
Fwd unique(Fwd first, Fwd last)
{
if( first == last ) return first;
Fwd result = first;
while( ++first != last ) {
if( !(*result == *first) )
*(++result) = *first;
}
return ++result;
}
This algorithm takes a range of sorted elements. If the range is not sorted, sort it before invoking the algorithm. The algorithm will run in-place, and return an iterator pointing to one-past-the-last-element of the unique'd sequence.
If you can't sort the elements then you've cornered yourself and you have no other choice but to use for the task an algorithm with runtime performance worse than O(n).
This algorithm runs in O(n) runtime. That's big-oh of n, worst case in all cases, not amortized time. It uses O(1) space.

The example you have given is a sorted array. Removing the duplicates in O(n) is possible only in that case (given your constant space constraint).
