Efficiently filling in the first gap in an ordered list of numbers - C++

I'm writing something that needs to start with a list of numbers, already in order but possibly with gaps, find the first gap, fill in a number in that gap, and return the number it filled in. The numbers are integers in the range [0, inf). I have this, and it works perfectly:
list<int> TestList = {0, 1, 5, 6, 7};
int NewElement;
if(TestList.size() == 0)
{
    NewElement = 0;
    TestList.push_back(NewElement);
}
else
{
    bool Selected = false;
    int Previous = -1; // start below 0 so a gap at the front is detected
    for(auto Current = TestList.begin(); Current != TestList.end(); Current++)
    {
        if(*Current > Previous + 1)
        {
            NewElement = Previous + 1;
            TestList.insert(Current, NewElement);
            Selected = true;
            break;
        }
        Previous = *Current;
    }
    if(!Selected)
    {
        NewElement = Previous + 1;
        TestList.insert(TestList.end(), NewElement);
    }
}
But I'm worried about efficiency, since I'm using an equivalent piece of code to allocate uniform block binding locations in OpenGL behind a class wrapper I wrote (but that isn't exactly relevant to the question :) ). Any suggestions to improve the efficiency? I'm not even sure if std::list is the best choice for it.

Some suggestions:
Try other containers and compare. A linked list may have good theoretical properties, but the real-life benefits of contiguous storage, such as in a sorted vector, can be dramatic.
Since your range is already ordered, you can perform binary search to find the gap: Start in the middle, and if the value there is equal to half the size, there's no gap, so you can restrict searching to the other half. Rinse and repeat. (This assumes that there are no repeated numbers in the range, which I suppose is a reasonable restriction given that you have a notion of "gap".)
This is more of a theoretical, separate suggestion. A binary search on a pure linked list cannot be implemented very efficiently, so a different sort of data structure would be needed to take advantage of this approach.
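A sketch of the binary-search idea on a sorted std::vector (function and variable names are mine, not from the question): every position before the first gap satisfies v[i] == i, so we search for the smallest index where that fails.

```cpp
#include <cstddef>
#include <vector>

// Sketch: find and fill the first gap in a sorted, duplicate-free vector of
// non-negative integers, in O(log n) comparisons plus the insertion cost.
int fill_first_gap(std::vector<int>& v)
{
    std::size_t lo = 0, hi = v.size();   // search in [lo, hi)
    while (lo < hi)
    {
        std::size_t mid = lo + (hi - lo) / 2;
        if (v[mid] == static_cast<int>(mid))
            lo = mid + 1;                // no gap anywhere in [lo, mid]
        else
            hi = mid;                    // the gap is at or before mid
    }
    int gap = static_cast<int>(lo);      // lo == v.size() means "append"
    v.insert(v.begin() + lo, gap);
    return gap;
}
```

On the question's example {0, 1, 5, 6, 7} this finds and inserts 2; on an empty vector it inserts 0.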

Some suggestions:
A cursor gap structure (a.k.a. “gap buffer”) based on std::vector is probably the fastest (regardless of system) for this. For the implementation, set the capacity at the outset (known from the largest number) so as to avoid costly dynamic allocations.
Binary search can be employed on a cursor gap structure, with then fast block move to move the insertion point, but do measure if you go this route: often methods that are bad for a very large number of items, such as linear search, turn out to be best for a small number of items.
Retain the position between calls, so you don't have to start from the beginning each time, reducing the total algorithmic complexity for filling in all, from O(n²) to O(n).
Update: based on the OP’s additional commentary information that this is a case of allocating and deallocating numbers, where the order (apparently) doesn't matter, the fastest is probably to use a free list, as suggested in a comment by j_random_hacker. Simply put, at the outset push all available numbers onto a stack, e.g. a std::stack, the free-list. To allocate a number simply pop it off the stack (choosing the number at top of the stack), to deallocate a number simply push it.
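A minimal sketch of that free-list idea (the class and method names here are illustrative, not from the question):

```cpp
#include <stack>

// Free-list allocator sketch: all available binding locations are pushed up
// front; allocation pops, deallocation pushes. Both operations are O(1).
class BindingAllocator
{
public:
    explicit BindingAllocator(int count)      // manages locations [0, count)
    {
        for (int i = count - 1; i >= 0; --i)  // push in reverse so 0 is on top
            free_.push(i);
    }
    int allocate()      { int n = free_.top(); free_.pop(); return n; }
    void release(int n) { free_.push(n); }
private:
    std::stack<int> free_;
};
```

Note that, unlike the gap-filling code, this makes no promise about which number you get back, only that it is currently unused.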

Related

Which container is most efficient for multiple insertions / deletions in C++?

I was set a homework challenge as part of an application process (I was rejected, by the way; I wouldn't be writing this otherwise) in which I was to implement the following functions:
// Store a collection of integers
class IntegerCollection {
public:
    // Insert one entry with value x
    void Insert(int x);
    // Erase one entry with value x, if one exists
    void Erase(int x);
    // Erase all entries, x, from <= x < to
    void Erase(int from, int to);
    // Return the count of all entries, x, from <= x < to
    size_t Count(int from, int to) const;
The functions were then put through a bunch of tests, most of which were trivial. The final test was the real challenge as it performed 500,000 single insertions, 500,000 calls to count and 500,000 single deletions.
The member variables of IntegerCollection were not specified and so I had to choose how to store the integers. Naturally, an STL container seemed like a good idea and keeping it sorted seemed an easy way to keep things efficient.
Here is my code for the four functions using a vector:
// Previous bit of code shown goes here
private:
    std::vector<int> integerCollection;
};

void IntegerCollection::Insert(int x) {
    /* using lower_bound to find the right place for x to be inserted
       keeps the vector sorted and makes life much easier */
    auto it = std::lower_bound(integerCollection.begin(), integerCollection.end(), x);
    integerCollection.insert(it, x);
}

void IntegerCollection::Erase(int x) {
    // find the location of the first element containing x and delete it if it exists
    auto it = std::find(integerCollection.begin(), integerCollection.end(), x);
    if (it != integerCollection.end()) {
        integerCollection.erase(it);
    }
}

void IntegerCollection::Erase(int from, int to) {
    if (integerCollection.empty()) return;
    // lower_bound points to the first element of integerCollection >= from/to
    auto fromBound = std::lower_bound(integerCollection.begin(), integerCollection.end(), from);
    auto toBound = std::lower_bound(integerCollection.begin(), integerCollection.end(), to);
    /* std::vector::erase deletes entries between the two iterators
       fromBound (included) and toBound (not included) */
    integerCollection.erase(fromBound, toBound);
}

size_t IntegerCollection::Count(int from, int to) const {
    if (integerCollection.empty()) return 0;
    int count = 0;
    // lower_bound points to the first element of integerCollection >= from/to
    auto fromBound = std::lower_bound(integerCollection.begin(), integerCollection.end(), from);
    auto toBound = std::lower_bound(integerCollection.begin(), integerCollection.end(), to);
    // increment the iterator until fromBound == toBound (we don't count elements of value = to)
    while (fromBound != toBound) {
        ++count; ++fromBound;
    }
    return count;
}
The company got back to me saying that they wouldn't be moving forward because my choice of container meant the runtime complexity was too high. I also tried using list and deque and compared the runtime. As I expected, I found that list was dreadful and that vector took the edge over deque. So as far as I was concerned I had made the best of a bad situation, but apparently not!
I would like to know what the correct container to use in this situation is? deque only makes sense if I can guarantee insertion or deletion to the ends of the container and list hogs memory. Is there something else that I'm completely overlooking?
We cannot know what would make the company happy. If they reject std::vector without concise reasoning, I wouldn't want to work for them anyway. Moreover, we don't really know the precise requirements. Were you asked to provide one reasonably well-performing implementation? Did they expect you to squeeze out the last percent of the provided benchmark by profiling a bunch of different implementations?
The latter is probably too much for a homework challenge as part of an application process. If it is the former, you can:
roll your own. It is unlikely that the interface you were given can be implemented more efficiently than one of the std containers does... unless your requirements are so specific that you can write something that performs well under that specific benchmark.
std::vector for data locality. See eg here for Bjarne himself advocating std::vector rather than linked lists.
std::set for ease of implementation. It seems like you want the container sorted and the interface you have to implement fits that of std::set quite well.
Let's compare only insertion and erasure, assuming the container needs to stay sorted:

operation    std::set    std::vector
insert       O(log N)    O(N)
erase        O(log N)    O(N)

Note that the O(log N) for the binary search to find the position to insert/erase in the vector can be neglected compared to the O(N).
Now you have to consider that the asymptotic complexity listed above completely neglects the non-linearity of memory access. In reality data can be far away in memory (std::set) leading to many cache misses or it can be local as with std::vector. The log(N) only wins for huge N. To get an idea of the difference 500000/log(500000) is roughly 26410 while 1000/log(1000) is only ~100.
I would expect std::vector to outperform std::set for fairly small container sizes, but at some point the log(N) wins over cache. The exact location of this turning point depends on many factors and can only reliably be determined by profiling and measuring.
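For comparison, here is a hedged sketch of the interface on std::multiset (multiset rather than set, since the problem statement allows repeated values). Insert and single-value Erase are O(log N); Count still walks the range, so it is not asymptotically better than the vector version for counting.

```cpp
#include <cstddef>
#include <iterator>
#include <set>

// Sketch of the required interface backed by std::multiset.
class IntegerCollection {
public:
    void Insert(int x) { data_.insert(x); }      // O(log N)
    void Erase(int x)                            // erase one entry only
    {
        auto it = data_.find(x);
        if (it != data_.end()) data_.erase(it);
    }
    void Erase(int from, int to)                 // erase from <= x < to
    {
        data_.erase(data_.lower_bound(from), data_.lower_bound(to));
    }
    std::size_t Count(int from, int to) const    // count from <= x < to
    {
        // std::distance is O(K) for node-based iterators, K = matches
        return static_cast<std::size_t>(
            std::distance(data_.lower_bound(from), data_.lower_bound(to)));
    }
private:
    std::multiset<int> data_;
};
```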
Nobody knows which container is MOST efficient for multiple insertions / deletions. That is like asking what is the most fuel-efficient car engine design possible: people are always innovating on car engines and making more efficient ones all the time. However, I would recommend a splay tree. The time required for an insertion or deletion in a splay tree is not constant; some insertions take a long time and some take only a very short time. However, the average time per insertion/deletion is always guaranteed to be O(log n), where n is the number of items stored in the splay tree. Logarithmic time is extremely efficient and should be good enough for your purposes.
The first thing that comes to mind is to hash the integer value so single look ups can be done in constant time.
The integer value can be hashed to compute an index in to an array of bools or bits, used to tell if the integer value is in the container or not.
Counting and deleting large ranges could be sped up from there, by using multiple hash tables for specific integer ranges.
If you had 0x10000 hash tables, that each stored ints from 0 to 0xFFFF and were using 32 bit integers you could then mask and shift the upper half of the int value and use that as an index to find the correct hash table to insert / delete values from.
IntHashTable containers[0x10000];

uint32_t hashIndex = (uint32_t)value / 0x10000;
uint32_t valueInTable = (uint32_t)value - (hashIndex * 0x10000);
containers[hashIndex].insert(valueInTable);
Count, for example, could be implemented like so, if each hash table kept count of the number of elements it contained:
int indexStart = startRange / 0x10000;
int indexEnd = endRange / 0x10000;
int countTotal = 0;
for (int i = indexStart; i <= indexEnd; ++i) {
    countTotal += containers[i].count();
}
Not sure if sorting really is a requirement for removing the range. It might be based on position. Anyway, here is a link with some hints on which STL container to use.
In which scenario do I use a particular STL container?
Just FYI.
Vector may be a good choice, but it does a lot of reallocation, as you know. I prefer deque instead, as it doesn't require one big contiguous chunk of memory for all items. For requirements like yours, list probably fits better.
Basic solution for this problem might be std::map<int, int>
where the key is the integer you are storing and the value is the number of occurrences.
Problem with this is that you can not quickly remove/count ranges. In other words complexity is linear.
For quick counts you would need to implement your own complete binary tree, in which you can compute the number of nodes between two nodes (the upper- and lower-bound nodes): because the tree is complete you know its size, and you know how many left and right turns you took on the way down to each bound. Note that this works only for a complete binary tree; in a general binary tree you cannot make this calculation quickly.
For quick range remove I do not know how to make it faster than linear.
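One concrete structure that delivers the O(log N) range counting described above, without hand-rolling a complete binary tree, is a Fenwick (binary indexed) tree over the value range. This is a sketch under the assumption that stored values are bounded by a known maximum:

```cpp
#include <vector>

// Fenwick tree of occurrence counts: add() inserts (+1) or erases (-1) one
// occurrence of a value; count(from, to) answers from <= x < to in O(log N).
class Fenwick {
public:
    explicit Fenwick(int max_value) : tree_(max_value + 2, 0) {}
    void add(int value, int delta)            // value in [0, max_value]
    {
        for (int i = value + 1; i < (int)tree_.size(); i += i & -i)
            tree_[i] += delta;
    }
    int count_less(int value) const           // occurrences of keys < value
    {
        int sum = 0;
        for (int i = value; i > 0; i -= i & -i)
            sum += tree_[i];
        return sum;
    }
    int count(int from, int to) const         // from <= x < to
    {
        return count_less(to) - count_less(from);
    }
private:
    std::vector<int> tree_;                   // 1-based internal indexing
};
```

Quick range removal still costs one add(-1) per element removed, so this helps counting, not bulk deletion.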

Is std::sort the best choice to do in-place sort for a huge array with limited integer value?

I want to sort an array with huge(millions or even billions) elements, while the values are integers within a small range(1 to 100 or 1 to 1000), in such a case, is std::sort and the parallelized version __gnu_parallel::sort the best choice for me?
Actually I want to sort a vector of my own class with an integer member representing the processor index.
As there are other members inside the class, two items with the same integer member used for comparison might still not be regarded as the same data.
Counting sort would be the right choice if you know that your range is so limited. If the range is [0, m), the most efficient way is to have a vector in which the index represents the element and the value its count. For example:
vector<int> to_sort;
vector<int> counts;
for (int i : to_sort) {
    if ((int)counts.size() <= i) {
        counts.resize(i + 1, 0);
    }
    counts[i]++;
}
Note that the count at i is lazily initialized but you can resize once if you know m.
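To complete the sketch, here is the counting step with the counts sized up front (as the note above suggests, assuming values in [0, max_value]) together with the expansion pass that recovers the sorted sequence:

```cpp
#include <vector>

// Counting sort for bounded non-negative integers: O(n + m) time, O(m) extra
// space, where m = max_value + 1.
std::vector<int> counting_sort(const std::vector<int>& to_sort, int max_value)
{
    std::vector<int> counts(max_value + 1, 0);
    for (int i : to_sort)
        counts[i]++;                        // assumes 0 <= i <= max_value

    std::vector<int> sorted;
    sorted.reserve(to_sort.size());
    for (int value = 0; value <= max_value; ++value)
        sorted.insert(sorted.end(), counts[value], value); // write each value count times
    return sorted;
}
```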
If you are sorting objects by some field and they are all distinct, you can modify the above as:
vector<T> to_sort;
vector<vector<const T*>> count_sorted;
for (const T& t : to_sort) {
    const int i = t.sort_field();
    if ((int)count_sorted.size() <= i) {
        count_sorted.resize(i + 1, {});
    }
    count_sorted[i].push_back(&t);
}
Now the main difference is that your space requirements grow substantially because you need to store the vectors of pointers. The space complexity went from O(m) to O(n). Time complexity is the same. Note that the algorithm is stable. The code above assumes that to_sort is in scope during the life cycle of count_sorted. If your Ts implement move semantics you can store the object themselves and move them in. If you need count_sorted to outlive to_sort you will need to do so or make copies.
If you have a range of type [-l, m), the substance does not change much, but your index now represents the value i + l and you need to know l beforehand.
Finally, it should be trivial to simulate an iteration through the sorted array by iterating through the counts array taking into account the value of the count. If you want stl like iterators you might need a custom data structure that encapsulates that behavior.
Note: in a previous version of this answer I mentioned multiset as a way to use a data structure for counting sort. This would be efficient in some Java implementations (I believe the Guava implementation would be) but not in C++, where the keys in the red-black tree are simply repeated many times.
You say "in-place", I therefore assume that you don't want to use O(n) extra memory.
First, count the number of objects with each value (as in Giovanni's and ronaldo's answers). You still need to get the objects into the right locations in-place. I think the following works, but I haven't implemented or tested it:
Create a cumulative sum from your counts, so that you know what index each object needs to go to. For example, if the counts are 1: 3, 2: 5, 3: 7, then the cumulative sums are 1: 0, 2: 3, 3: 8, 4: 15, meaning that the first object with value 1 in the final array will be at index 0, the first object with value 2 will be at index 3, and so on.
The basic idea now is to go through the vector, starting from the beginning. Get the element's processor index, and look up the corresponding cumulative sum. This is where you want it to be. If it's already in that location, move on to the next element of the vector and increment the cumulative sum (so that the next object with that value goes in the next position along). If it's not already in the right location, swap it with the correct location, increment the cumulative sum, and then continue the process for the element you swapped into this position in the vector.
There's a potential problem when you reach the start of a block of elements that have already been moved into place. You can solve that by remembering the original cumulative sums, "noticing" when you reach one, and jump ahead to the current cumulative sum for that value, so that you don't revisit any elements that you've already swapped into place. There might be a cleverer way to deal with this, but I don't know it.
Finally, compare the performance (and correctness!) of your code against std::sort. This has better time complexity than std::sort, but that doesn't mean it's necessarily faster for your actual data.
You definitely want to use counting sort. But not the one you're thinking of. Its main selling point is that its time complexity is O(N+X) where X is the maximum value you allow the sorting of.
Regular old counting sort (as seen in some other answers) can only sort integers, or has to be implemented with a multiset or some other data structure (becoming O(N log N)). But a more general version of counting sort can be used to sort (in place) anything that can provide an integer key, which is perfectly suited to your use case.
The algorithm is somewhat different though, and it's also known as American Flag Sort. Just like regular counting sort, it starts off by calculating the counts.
After that, it builds a prefix sums array of the counts. This is so that we can know how many elements should be placed behind a particular item, thus allowing us to index into the right place in constant time.
Since we know the correct final position of the items, we can just swap them into place. Doing just that would work if there weren't any repetitions but, since it's almost certain that there will be repetitions, we have to be more careful.
First: when we put something into its place, we have to increment the value in the prefix sum so that the next element with the same value doesn't displace the previous element from its place.
Second, we must either:
keep track of how many elements of each value we have already put into place, so that we don't keep moving elements of values that have already reached their place; this requires a second copy of the counts array (prior to calculating the prefix sum), as well as a "move count" array, or
keep a copy of the prefix sums shifted over by one, so that we stop moving elements once the stored position of the latest element reaches the first position of the next value.
Even though the first approach is somewhat more intuitive, I chose the second method (because it's faster and uses less memory).
template<class It, class KeyOf>
void countsort(It begin, It end, KeyOf key_of) {
    constexpr int max_value = 1000;
    int final_destination[max_value] = {}; // zero initialized
    int destination[max_value] = {};       // zero initialized
    // Record counts
    for (It it = begin; it != end; ++it)
        final_destination[key_of(*it)]++;
    // Build prefix sum of counts
    for (int i = 1; i < max_value; ++i) {
        final_destination[i] += final_destination[i-1];
        destination[i] = final_destination[i-1];
    }
    for (auto it = begin; it != end; ++it) {
        auto key = key_of(*it);
        // while the item is not in the correct position
        while ( std::distance(begin, it) != destination[key] &&
                // and not all items of this value have reached their final position
                final_destination[key] != destination[key] ) {
            // swap into the right place
            std::iter_swap(it, begin + destination[key]);
            // tidy up for next iteration
            ++destination[key];
            key = key_of(*it);
        }
    }
}
Usage:
vector<Person> records = populateRecords();
countsort(records.begin(), records.end(), [](Person const& p){
    return p.id() - 1; // map [1, 1000] -> [0, 1000)
});
This can be further generalized to become MSD Radix Sort,
here's a talk by Malte Skarupke about it: https://www.youtube.com/watch?v=zqs87a_7zxw
Here's a neat visualization of the algorithm: https://www.youtube.com/watch?v=k1XkZ5ANO64
The answer given by Giovanni Botta is perfect, and Counting Sort is definitely the way to go. However, I personally prefer not to go resizing the vector progressively, but I'd rather do it this way (assuming your range is [0-1000]):
vector<int> to_sort;
vector<int> counts(1001);
int maxvalue = 0;
for (int i : to_sort) {
    if (i > maxvalue) maxvalue = i;
    counts[i]++;
}
counts.resize(maxvalue + 1);
It is essentially the same, but no need to be constantly managing the size of the counts vector. Depending on your memory constraints, you could use one solution or the other.

Sorted data structure for in-order iteration, ordered push, and removal (N elements only from top)

What is considered an optimal data structure for pushing something in order (so inserts at any position, able to find correct position), in-order iteration, and popping N elements off the top (so the N smallest elements, N determined by comparisons with threshold value)? The push and pop need to be particularly fast (run every iteration of a loop), while the in-order full iteration of the data happens at a variable rate but likely an order of magnitude less often. The data can't be purged by the full iteration, it needs to be unchanged. Everything that is pushed will eventually be popped, but since a pop can remove multiple elements there can be more pushes than pops. The scale of data in the structure at any one time could go up to hundreds or low thousands of elements.
I'm currently using a std::deque and binary search to insert elements in ascending order. Profiling shows it taking up the majority of the time, so something has got to change. std::priority_queue doesn't allow iteration, and hacks I've seen to do it won't iterate in order. Even on a limited test (no full iteration!), the std::set class performed worse than my std::deque approach.
None of the classes I'm messing with seem to be built with this use case in mind. I'm not averse to making my own class, if there's a data structure not to be found in STL or boost for some reason.
edit:
There's two major functions right now, push and prune. push uses 65% of the time, prune uses 32%. Most of the time used in push is due to insertion into the deque (64% out of 65%). Only 1% comes from the binary search to find the position.
template<typename T, size_t Axes>
void Splitter<T, Axes>::SortedData::push(const Data& data) // 65% of processing
{
    size_t index = find(data.values[(axis * 2) + 1]);
    this->data.insert(this->data.begin() + index, data); // 64% of all processing happens here
}

template<typename T, size_t Axes>
void Splitter<T, Axes>::SortedData::prune(T value) // 32% of processing
{
    auto top = data.begin(), end = data.end(), it = top;
    for (; it != end; ++it)
    {
        Data& data = *it;
        if (data.values[(axis * 2) + 1] > value) break;
    }
    data.erase(top, it);
}

template<typename T, size_t Axes>
size_t Splitter<T, Axes>::SortedData::find(T value)
{
    size_t start = 0;
    size_t end = this->data.size();
    if (!end) return 0;
    size_t diff;
    while ((diff = (end - start) >> 1))
    {
        size_t mid = diff + start;
        if (this->data[mid].values[(axis * 2) + 1] <= value)
        {
            start = mid;
        }
        else
        {
            end = mid;
        }
    }
    return this->data[start].values[(axis * 2) + 1] <= value ? end : start;
}
With your requirements, a hybrid data structure tailored to your needs will probably perform best. As others have said, contiguous memory is very important, but I would not recommend keeping the array sorted at all times. I propose you use 3 buffers (1 std::array and 2 std::vectors):
1 (constant-size) Buffer for the "insertion heap". Needs to fit into the cache.
2 (variable-sized) Buffers (A+B) to maintain and update sorted arrays.
When you push an element, you add it to the insertion heap via std::push_heap. Since the insertion heap is constant size, it can overflow. When that happens, you std::sort it backwards and std::merge it with the already sorted-sequence buffer (A) into the third (B), resizing them as needed. That will be the new sorted buffer and the old one can be discarded, i.e. you swap A and B for the next bulk operation. When you need the sorted sequence for iteration, you do the same. When you remove elements, you compare the top element in the heap with the last element in the sorted sequence and remove that (which is why you sort it backwards, so that you can pop_back instead of pop_front).
For reference, this idea is loosely based on sequence heaps.
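A rough sketch of that scheme (the struct name and the heap capacity are arbitrary stand-ins; a real implementation would size the insertion heap to fit the cache and use a fixed-capacity buffer for it):

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

// Buffered sorted sequence: new elements go into a small min-heap; when it
// fills, its contents are sorted and bulk-merged into the main buffer, which
// is kept in DESCENDING order so the minimum can be removed with pop_back().
struct BufferedSorted {
    static constexpr std::size_t kHeapCap = 16; // stand-in for "fits in cache"
    std::vector<int> heap;    // min-heap via std::greater
    std::vector<int> sorted;  // descending order

    void push(int x) {
        heap.push_back(x);
        std::push_heap(heap.begin(), heap.end(), std::greater<int>());
        if (heap.size() == kHeapCap) flush();
    }
    void flush() { // one O(n) merge pass for a whole heap's worth of pushes
        std::sort(heap.begin(), heap.end(), std::greater<int>());
        std::vector<int> merged;
        merged.reserve(heap.size() + sorted.size());
        std::merge(heap.begin(), heap.end(), sorted.begin(), sorted.end(),
                   std::back_inserter(merged), std::greater<int>());
        sorted.swap(merged);
        heap.clear();
    }
    int pop_min() { // smallest element overall; caller ensures non-empty
        if (!heap.empty() && (sorted.empty() || heap.front() < sorted.back())) {
            std::pop_heap(heap.begin(), heap.end(), std::greater<int>());
            int v = heap.back();
            heap.pop_back();
            return v;
        }
        int v = sorted.back(); // descending order: minimum is at the back
        sorted.pop_back();
        return v;
    }
};
```

For full in-order iteration you would call flush() first and then walk `sorted` in reverse.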
Have you tried messing around with std::vector? As weird as it may sound, it could actually be pretty fast because it uses contiguous memory. If I remember correctly, Bjarne Stroustrup was talking about this at Going Native 2012 (http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style but I'm not 100% sure that it's in this video).
You save time with the binary search, but the insertion in random positions of the deque is slow. I would suggest an std::map instead.
From your edit, it sounds like the delay is in copying - is it a complex object? Can you heap-allocate and store pointers in the structure so each entry is created once only? You'll need to provide a custom comparator that takes pointers, as the object's operator<() wouldn't be called. (The custom comparator can simply call operator<().)
EDIT:
Your own figures show it's the insertion that takes the time, not the 'sorting'. While some of that insertion time is creating a copy of your object, some (possibly most) is creation of the internal structure that will hold your object - and I don't think that will change between list/map/set/queue etc. If you can predict the likely eventual/maximum size of your data set, can write or find your own sorting algorithm, and the time is being lost in allocating objects, then vector might be the way to go.

Moving minheap.top to maxheap.top where maxheap.top <= minheap.top

I have a maxheap and a minheap where the maximum element of the maxheap is less than or equal to the minimum element of the minheap.
I now want to move the minimum element of the minheap to become the maximum element of the maxheap.
One way to do this would be to pop the top element of the minheap and push it onto the maxheap.
Is there a more efficient way to do this?
Here is what I ended up doing. I actually had to insert an element into the minheap and then perform the operation described above, so I did the following:
// place value to insert at end of minheap
mintoph[mintoph_size] = R;
// use std::pop_heap, minimum element now at end
pop_heap(mintoph.begin(), mintoph.begin() + mintoph_size + 1, greater<int>());
// (*) make room in maxheap at top
for (int pos = maxboth_size++; pos > 0;)
{
    int parent = (pos - 1) / 2;
    maxboth[pos] = maxboth[parent];
    pos = parent;
}
// move element from back of minheap to maxheap head
maxboth[0] = mintoph[mintoph_size];
There is a waste of already paid-for comparisons at step (*) above, as parents are demoted to children, but I think this is unavoidable.
What you really need is an efficient way to insert into a priority queue when you know that the element being inserted is smaller/bigger than the min/max, depending on whether this is a min-heap or a max-heap. For the traditional "heap" data structure, this takes O(log n) time.
But if you are willing to use a different representation for your priority queues than the traditional "heap" data structure, then such an insert can trivially be made to run in O(1) time. Many different kinds of priority queues can do this, such as leftist heaps, skew heaps, or pairing heaps.
Edit: Of course, you'll still need to pay the cost of removing from the original priority queue, which will likely be O(log n) anyway, although there are approaches that may help there as well, such as "lazy deletes".
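For reference, the baseline O(log n) version of the move with the standard adaptors looks like this (a sketch; the two-heap setup is assumed from the question, and both steps cost one sift each):

```cpp
#include <functional>
#include <queue>
#include <vector>

// Pop the minheap's top and push it onto the maxheap. Valid whenever
// maxheap.top() <= minheap.top(), as in the question; O(log n) per heap.
void move_min_to_max(
    std::priority_queue<int, std::vector<int>, std::greater<int>>& minheap,
    std::priority_queue<int>& maxheap)
{
    maxheap.push(minheap.top());
    minheap.pop();
}
```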
You'd probably be best served by using a min-max-heap or treap. min-max-heap seems tailor-made for what you're doing, but treaps are so dang well-rounded that they might work well too - especially if you need a little more than just looking up minimums and maximums and adding values.
http://en.wikipedia.org/wiki/Min-max_heap
http://en.wikipedia.org/wiki/Treap
Use a min-max heap; it's a double-ended priority queue, and you can do this operation in O(log n). You cannot do it in less than O(log n),
but you can use structures with amortized O(1) insertion, such as a Fibonacci heap.

How does one remove duplicate elements in place in an array in O(n) in C or C++?

Is there any method to remove the duplicate elements in an array in place in C/C++ in O(n)?
Suppose elements are a[5]={1,2,2,3,4}
then resulting array should contain {1,2,3,4}
The solution can be achieved using two for loops but that would be O(n^2) I believe.
If, and only if, the source array is sorted, this can be done in linear time:
std::unique(a, a + 5); //Returns a pointer to the new logical end of a.
Otherwise you'll have to sort first, which is (99.999% of the time) n lg n.
Best case is O(n log n). Perform a heap sort on the original array: O(n log n) in time, O(1)/in-place in space. Then run through the array sequentially with 2 indices (source & dest) to collapse out repetitions. This has the side effect of not preserving the original order, but since "remove duplicates" doesn't specify which duplicates to remove (first? second? last?), I'm hoping that you don't care that the order is lost.
If you do want to preserve the original order, there's no way to do things in-place. But it's trivial if you make an array of pointers to elements in the original array, do all your work on the pointers, and use them to collapse the original array at the end.
Anyone claiming it can be done in O(n) time and in-place is simply wrong, modulo some arguments about what O(n) and in-place mean. One obvious pseudo-solution, if your elements are 32-bit integers, is to use a 4-gigabit bit-array (512 megabytes in size) initialized to all zeros, flipping a bit on when you see that number and skipping over it if the bit was already on. Of course then you're taking advantage of the fact that n is bounded by a constant, so technically everything is O(1) but with a horrible constant factor. However, I do mention this approach since, if n is bounded by a small constant - for instance if you have 16-bit integers - it's a very practical solution.
Yes. Because access (insertion or lookup) on a hashtable is O(1), you can remove duplicates in O(N).
Pseudocode:
hashtable h = {}
numdups = 0
for (i = 0; i < input.length; i++) {
    if (!h.contains(input[i])) {
        input[i-numdups] = input[i]
        h.add(input[i])
    } else {
        numdups = numdups + 1
    }
}
This is O(N).
Some commenters have pointed out that whether a hashtable is O(1) depends on a number of things. But in the real world, with a good hash, you can expect constant-time performance. And it is possible to engineer a hash that is O(1) to satisfy the theoreticians.
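The pseudocode above, written out in C++ with std::unordered_set (expected O(N) with a reasonable hash; the order of first occurrences is preserved):

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Compact the vector in place, keeping the first occurrence of each value.
// Returns the new logical length.
std::size_t dedup_in_place(std::vector<int>& a)
{
    std::unordered_set<int> seen;
    std::size_t dest = 0;
    for (std::size_t src = 0; src < a.size(); ++src)
        if (seen.insert(a[src]).second)   // true only for the first occurrence
            a[dest++] = a[src];
    a.resize(dest);                       // drop the tail of duplicates
    return dest;
}
```

Note this is in-place for the array itself, but the hash set uses O(N) auxiliary memory, which is the crux of the debate in the surrounding answers.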
I'm going to suggest a variation on Borealid's answer, but I'll point out up front that it's cheating. Basically, it only works assuming some severe constraints on the values in the array - e.g. that all keys are 32-bit integers.
Instead of a hash table, the idea is to use a bitvector. This is an O(1) memory requirement which should in theory keep Rahul happy (but won't). With 32-bit integers, the bitvector will require 512MB (i.e. 2**32 bits) - assuming 8-bit bytes, as some pedant may point out.
As Borealid would point out, this is still a hashtable - just using a trivial hash function. This does guarantee that there won't be any collisions. The only way there could be a collision is by having the same value in the input array twice - but since the whole point is to ignore the second and later occurrences, this doesn't matter.
Pseudocode for completeness...
src = dest = input.begin();
while (src != input.end())
{
    if (!bitvector[*src])
    {
        bitvector[*src] = true;
        *dest = *src; ++dest;
    }
    ++src;
}
// at this point, dest gives the new end of the array
Just to be really silly (but theoretically correct), I'll also point out that the space requirement is still O(1) even if the array holds 64-bit integers. The constant term is a bit big, I agree, and you may have issues with 64-bit CPUs that can't actually use the full 64 bits of an address, but...
Take your example. If the array elements are bounded integers, you can create a lookup bitarray.
If you find an integer such as 3, turn the 3rd bit on.
If you find an integer such as 5, turn the 5th bit on.
If the array contains elements other than integers, or the elements are not bounded, using a hashtable would be a good choice, since hashtable lookup cost is constant.
The canonical implementation of the unique() algorithm looks like something similar to the following:
template<typename Fwd>
Fwd unique(Fwd first, Fwd last)
{
    if( first == last ) return first;
    Fwd result = first;
    while( ++first != last ) {
        if( !(*result == *first) )
            *(++result) = *first;
    }
    return ++result;
}
This algorithm takes a range of sorted elements. If the range is not sorted, sort it before invoking the algorithm. The algorithm will run in-place, and return an iterator pointing to one-past-the-last-element of the unique'd sequence.
If you can't sort the elements then you've cornered yourself and you have no other choice but to use for the task an algorithm with runtime performance worse than O(n).
This algorithm runs in O(n) runtime. That's big-oh of n, worst case in all cases, not amortized time. It uses O(1) space.
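For completeness, the usual std::vector idiom that combines the sort with unique, using the iterator unique returns to erase the leftover tail:

```cpp
#include <algorithm>
#include <vector>

// Sort, compact duplicates to the front with std::unique, then erase the
// unspecified leftover tail. Overall O(n log n), dominated by the sort.
void sort_unique(std::vector<int>& v)
{
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
}
```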
The example you have given is a sorted array. It is possible only in that case (given your constant space constraint).