Will std::lower_bound be logarithmic for list<>? - c++

Suppose I have a list<int> that I keep in sorted order. Can I insert new values into it with logarithmic complexity using code like this?
#include <iostream>
#include <random>
#include <list>
#include <algorithm>
using namespace std;
ostream& operator<<(ostream& out, const list<int>& data) {
for(auto it=data.begin(); it!=data.end(); ++it) {
if(it!=data.begin()) {
out << ", ";
}
out << (*it);
}
return out;
}
int main() {
const int max = 100;
mt19937 gen;
uniform_int_distribution<int> dist(0, max);
list<int> data;
for(int i=0; i<max; ++i) {
int val = dist(gen);
auto it = lower_bound(data.begin(), data.end(), val);
data.insert(it, val);
}
cout << data << endl;
}
I would say not, because it is impossible to move a list iterator to an arbitrary position in O(1), but the documentation says something strange:
The number of comparisons performed is logarithmic in the distance
between first and last (At most log2(last - first) + O(1)
comparisons). However, for non-LegacyRandomAccessIterators, the number
of iterator increments is linear. Notably, std::set and std::multiset
iterators are not random access, and so their member functions
std::set::lower_bound (resp. std::multiset::lower_bound) should be
preferred.
i.e. it doesn't recommend using this function for std::set, which is already a search tree internally. Which containers is this function intended for, then? And how should I insert values while keeping the container sorted?

Will std::lower_bound be logarithmic for list<>?
No. Quote from documentation:
for non-LegacyRandomAccessIterators, the number of iterator increments is linear.
Which containers is this function intended for, then?
std::lower_bound is intended for any container that is - or can be - ordered, and doesn't have faster lower bound algorithm that relies on its internal structure - which excludes std::set and std::multiset as mentioned in the documentation.
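For the follow-up question (how to insert while keeping the data sorted), two standard options exist, sketched here under the assumption that duplicates should be kept, as in the question's code. std::multiset::insert is O(log n) on its own. With a std::vector, std::upper_bound needs only O(log n) comparisons and its random-access iterators jump in O(1), but vector::insert still shifts the later elements, so each insertion costs O(n) moves (often fast in practice because the memory is contiguous). The helper name sorted_insert is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: insert val into an already-sorted vector, keeping it sorted.
// The binary search is O(log n) comparisons and O(1) per iterator jump;
// the insert itself shifts the following elements, which is O(n).
void sorted_insert(std::vector<int>& data, int val) {
    auto it = std::upper_bound(data.begin(), data.end(), val);  // after equal values
    data.insert(it, val);
}
```

If the O(n) shift is a problem, std::multiset<int> keeps the same ordering with a plain s.insert(val), at the cost of node-based storage.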

Related

How to sort a vector of strings in a specific predetermined order?

The problem: I need to sort a vector of strings in an exact, specific order. Let's say we have a constant vector or an array with the exact order:
vector<string> correctOrder = {"Item3", "Item1", "Item5", "Item4", "Item2"};
Next, we have a dynamic incoming vector which will contain the same items, but they may be mixed up and fewer in number.
vector<string> incommingVector = {"Item1", "Item5", "Item3"};
So I need to sort the incoming vector into the same order as the first vector, correctOrder, and the result must be:
vector<string> sortedVector = {"Item3", "Item1", "Item5"};
I think the correct order may be represented in a different way, but I can't figure it out.
Can someone help me please?
If the default comparison (lexicographic) is not enough, the simplest thing you can do is to provide std::sort with a lambda that tells it which string comes first.
You can have an unordered_map<string,int> with the strings in your correctOrder vector as keys and their corresponding positions in that vector as values.
The cmp function then simply compares the values stored for the keys you pass from your incommingVector.
unordered_map<string, int> my_map;
for (size_t i = 0; i < correctOrder.size(); i++)
    my_map[correctOrder[i]] = i;   // position of each string in correctOrder
auto cmp = [&my_map](const string& s, const string& s1) {
    return my_map[s] < my_map[s1];
};
sort(incommingVector.begin(), incommingVector.end(), cmp);
You can create your own functor that sorts your vector in the order of a template vector, as shown in the code below:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
struct MyComparator
{
    // Reference order, taken from the question.
    const std::vector<std::string> correctOrder{"Item3", "Item1", "Item5", "Item4", "Item2"};
    // const, so the comparator can also be invoked through a const object
    bool operator() (const std::string& first, const std::string& second) const
    {
        // Position of each string in correctOrder (linear search).
        auto firstitr = std::find(correctOrder.begin(), correctOrder.end(), first);
        auto seconditr = std::find(correctOrder.begin(), correctOrder.end(), second);
        return firstitr < seconditr;
    }
};
void printVector(const std::vector<std::string>& input)
{
for(const auto&elem:input)
{
std::cout<<elem<<" , ";
}
std::cout<<std::endl;
}
int main()
{
std::vector<string> incomingVector = {"Item3", "Item5", "Item1"};
std::cout<<"vector before sort... "<<std::endl;
printVector(incomingVector);
std::sort(incomingVector.begin(),incomingVector.end(),MyComparator());
std::cout<<"vector after sort...."<<std::endl;
printVector(incomingVector);
return 0;
}
You can take advantage of std::unordered_map<std::string, int>, i.e., a hash table for mapping a string into an integer in constant time. You can use it for finding out the position that a given string occupies in your vector correctOrder in O(1), so that you can compare two strings that are in the vector incomming in constant time.
Consider the following function sort_incomming_vector():
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>
using Vector = std::vector<std::string>;
void sort_incomming_vector(const Vector& correctOrder /*N*/, Vector& incomming /*M*/)
{
std::unordered_map<std::string, int> order;
// populate the order hash table in O(N) time
for (size_t i = 0; i < correctOrder.size(); ++i)
order[correctOrder[i]] = i;
// sort "incomming" in O(M*log M) time
std::sort(incomming.begin(), incomming.end(),
[&order](const auto& a, const auto& b) { // sorting criterion
return order[a] < order[b];
}
);
}
The hash table order maps the strings into integers, and this resulting integer is used by the lambda (i.e., the sorting criterion) passed to the sorting algorithm, std::sort, to compare a pair of strings in the vector incomming, so that the sorting algorithm can permute them accordingly.
If correctOrder contains N elements, and incomming contains M elements, then the hash table can be initialised in O(N) time, and incomming can be sorted in O(M*log M) time. Therefore, the whole algorithm will run in O(N + M*log M) time.
If N is much larger than M, this solution is optimal, since the dominant term will be N, i.e., O(N + M*log M) ~ O(N).
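A self-contained sketch of the same hash-table idea, applied to the question's data. Two assumptions that are not in the answer above: strings missing from correctOrder are placed after all known ones (their rank is std::numeric_limits<std::size_t>::max()), and lookups use find() instead of operator[], so the map is never modified during the sort. The function name sort_by_reference_order is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: sort `items` so they follow the order given in `reference`.
void sort_by_reference_order(const std::vector<std::string>& reference,
                             std::vector<std::string>& items) {
    // Map each reference string to its position: O(N).
    std::unordered_map<std::string, std::size_t> rank;
    for (std::size_t i = 0; i < reference.size(); ++i)
        rank.emplace(reference[i], i);  // keeps the first position if a string repeats

    // Unknown strings get the largest possible rank, so they sort last.
    auto rank_of = [&rank](const std::string& s) {
        auto it = rank.find(s);
        return it == rank.end() ? std::numeric_limits<std::size_t>::max() : it->second;
    };

    // O(M log M) comparisons, each an average O(1) hash lookup.
    std::sort(items.begin(), items.end(),
              [&rank_of](const std::string& a, const std::string& b) {
                  return rank_of(a) < rank_of(b);
              });
}
```

With the question's vectors, {"Item1", "Item5", "Item3"} becomes {"Item3", "Item1", "Item5"}.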
You need to create a comparison function that returns the correct ordering and pass that to std::sort. To do that, you can write a reusable function that returns a lambda that compares the result of trying to std::find the two elements being compared. std::find returns iterators, and you can compare those with the < operator.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
std::vector<std::string> correctOrder = {"Item3", "Item1", "Item5", "Item4", "Item2"};
// Could be just std::string correctOrder[], or std::array<...> etc.
// Returns a sorter that orders elements based on the order given by the iterator pair
// (so it supports not just std::vector<string> but other random-access containers too;
// the iterators returned by std::find are compared with <).
template <typename ReferenceIter>
auto ordered_sorter(ReferenceIter ref_begin, ReferenceIter ref_end) {
    // Note: you can build an std::unordered_map<ReferenceIter::value_type, std::size_t> to
    // be more efficient and compare map.find(left)->second with
    // map.find(right)->second (after you make sure the find does not return a
    // one-past-the-end iterator).
    // Capture the iterators by value: they are parameters of this function, so
    // capturing them by reference would leave the returned lambda dangling.
    return [=](const auto& left, const auto& right) {
        return std::find(ref_begin, ref_end, left) < std::find(ref_begin, ref_end, right);
    };
}
int main() {
    using namespace std;
    vector<string> v{"Item3", "Item5", "Item1"};
    // Pass the ordered_sorter to std::sort
    std::sort(v.begin(), v.end(), ordered_sorter(std::begin(correctOrder), std::end(correctOrder)));
    for (const auto& s : v)
        std::cout << s << ", "; // "Item3, Item1, Item5, "
}
Note that this answer is less efficient with a large number of elements than the solutions using an std::unordered_map<std::string, int> for lookup, but it is simpler, and a linear search is probably faster for a small number of elements. Do your benchmarking if performance matters.
Edit: If you don't want the default comparison to be used, then you need to pass as a third parameter your custom compare method, as shown in the example that exists in the linked reference.
Use std::sort and you are done:
#include <iostream> // std::cout
#include <algorithm> // std::sort
#include <vector> // std::vector
#include <string> // std::string
using namespace std;
int main () {
vector<string> incommingVector = {"Item3", "Item5", "Item1"};
// using default comparison (operator <):
std::sort (incommingVector.begin(), incommingVector.end());
// print out content:
std::cout << "incommingVector contains:";
for (std::vector<string>::iterator it=incommingVector.begin(); it!=incommingVector.end(); ++it)
std::cout << ' ' << *it;
std::cout << '\n';
return 0;
}
Output:
incommingVector contains: Item1 Item3 Item5

Tracking node traversals in calls to std::map::find?

I'm performing a large number of lookups, inserts and deletes on a std::map. I'm considering adding some code to optimize for speed, but I'd like to collect some statistics about the current workload. Specifically, I'd like to keep track of how many nodes 'find' has to traverse on each call so I can keep a running tally.
I'm thinking that if most changes in my map occur at the front, I might be better off searching the first N entries before using the tree that 'find' uses.
Find will have to compare elements using the map's compare function so you can provide a custom compare function that counts the number of times it is called in order to see how much work it is doing on each call (essentially how many nodes are traversed).
I don't see how searching the first N entries before calling find() could help in this case though. Iterating through the entries in a map just traverses the tree in sorted order so it can't be more efficient than just calling find() unless somehow your comparison function is much more expensive than a check for equality.
Example code:
#include <algorithm>
#include <iostream>
#include <map>
#include <numeric>
#include <vector>
using namespace std;
int main() {
vector<int> v(100);
iota(begin(v), end(v), 0);
vector<pair<int, int>> vp(v.size());
transform(begin(v), end(v), begin(vp), [](int i) { return make_pair(i, i); });
int compareCount = 0;
auto countingCompare = [&](int x, int y) { ++compareCount; return x < y; };
map<int, int, decltype(countingCompare)> m(begin(vp), end(vp), countingCompare);
cout << "Compares during construction: " << compareCount << "\n";
compareCount = 0;
auto pos = m.find(50);
cout << "Compares during find(): " << compareCount << "\n";
}
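To keep a running tally over many lookups rather than a single find(), the counter can live outside the comparator, because std::map stores its own copy of the comparator. A minimal sketch, assuming a comparator struct that holds a pointer to a shared counter; CountingLess and average_find_compares are hypothetical names. The comparison count only approximates the number of nodes visited: typical implementations compare once per node on the way down plus a final check, but that is an implementation detail.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical comparator: counts every comparison into an external counter.
struct CountingLess {
    std::size_t* count;
    bool operator()(int a, int b) const {
        ++*count;
        return a < b;
    }
};

// Builds a map with keys [0, n), calls find() once for each key,
// and returns the average number of comparisons per find().
double average_find_compares(int n) {
    std::size_t count = 0;
    std::map<int, int, CountingLess> m(CountingLess{&count});
    for (int i = 0; i < n; ++i)
        m.emplace(i, i);

    count = 0;  // ignore the comparisons made while building the map
    for (int i = 0; i < n; ++i)
        m.find(i);
    return static_cast<double>(count) / n;
}
```

Sampling the counter before and after each find() in the real workload gives the per-call distribution instead of just the average.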
If it's feasible for your key/value structures it is worth considering unordered_map (in C++11 or TR1) as an alternative. std::map, being a balanced tree, is not likely to perform well under this usage profile, and hybrid approaches where you search the first N seem like a lot of work to me with no guaranteed payoff.

How to randomly shuffle values in a map?

I have a std::map with both key and value as integers. Now I want to randomly shuffle the map, so keys point to a different value at random. I tried random_shuffle but it doesn't compile. Note that I am not trying to shuffle the keys, which makes no sense for a map. I'm trying to randomise the values.
I could push the values into a vector, shuffle that and then copy back. Is there a better way?
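The copy-out / shuffle / copy-back approach from the question is short with std::shuffle, the replacement for std::random_shuffle (which was deprecated in C++14 and removed in C++17, one likely reason it "doesn't compile"). A minimal sketch; shuffle_values is a hypothetical name, and the caller supplies the random engine.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <random>
#include <vector>

// Hypothetical helper: randomly permute the values of a map, leaving keys in place.
template <class Map, class URBG>
void shuffle_values(Map& m, URBG&& gen) {
    std::vector<typename Map::mapped_type> values;
    values.reserve(m.size());
    for (const auto& kv : m)
        values.push_back(kv.second);                  // copy the values out, in key order

    std::shuffle(values.begin(), values.end(), gen);  // uniform random permutation

    auto it = values.begin();
    for (auto& kv : m)
        kv.second = *it++;                            // write them back in key order
}
```

Usage: std::mt19937 gen(std::random_device{}()); shuffle_values(m, gen);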
You can push all the keys into a vector, shuffle the vector, and use it to reassign the values in the map: the i-th element (in key order) receives the old value stored under the key v[i]. Because v is a random permutation of the keys, the values end up randomly permuted.
Here is an example:
#include <iostream>
#include <string>
#include <vector>
#include <map>
#include <algorithm>
#include <random>
using namespace std;
int main ()
{
    // std::random_shuffle was removed in C++17; std::shuffle takes a random engine instead.
    mt19937 gen(random_device{}());
    map<int,string> m;
    vector<int> v;
    for(int i=0; i<10; i++)
        m.insert(pair<int,string>(i,("v"+to_string(i))));
    for(auto i: m)
    {
        cout << i.first << ":" << i.second << endl;
        v.push_back(i.first);
    }
    shuffle(v.begin(), v.end(), gen);
    // Snapshot the old values so that no value is overwritten before it has been read.
    map<int,string> old = m;
    vector<int>::iterator it=v.begin();
    cout << endl;
    for(auto& i:m)
    {
        i.second = old[*it];
        it++;
    }
    for(auto i: m)
    {
        cout << i.first << ":" << i.second << endl;
    }
    return 0;
}
The complexity of your proposal is O(N), (both the copies and the shuffle have linear complexity) which seems optimal (looking at less elements would introduce non-randomness into your shuffle).
If you want to repeatedly shuffle your data, you could maintain a map of type <Key, size_t> (i.e. the proverbial level of indirection) that indexes into a std::vector<Value> and then just shuffle that vector repeatedly. That saves you all the copying in exchange for O(N) space overhead. If the Value type itself is expensive, you have an extra vector<size_t> of indices into the real data on which you do the shuffling.
For convenience's sake, you could encapsulate the map and vector inside one class that exposes a shuffle() member function. Such a wrapper would also need to expose the basic lookup / insertion / erase functionality of the underlying map.
EDIT: As pointed out by @tmyklebu in the comments, maintaining (raw or smart) pointers to secondary data can be subject to iterator invalidation (e.g. inserting new elements at the end can make the vector reallocate once its capacity is exceeded). Using indices instead of pointers solves the "insertion at the end" problem. But when writing the wrapper class you need to make sure that insertions of new key-value pairs never cause "insertions in the middle" of your secondary data, because that would also invalidate the indices. A more robust library solution would be to use Boost.MultiIndex, which is specifically designed to allow multiple types of view over a data structure.
Using only the map, I would do this:
make a flag array with one entry per map element, randomly generate two integers i, j such that 0 <= i, j < size of the map, swap the values at those positions, and mark these cells as swapped. Repeat until every cell has been handled.
EDIT: the array is allocated with the size of the map and is a local array.
I doubt it...
But... why not write a quick class that holds 2 vectors: a sorted std::vector of keys and a randomly shuffled std::vector of values? Look up the key using std::lower_bound and use std::distance and std::advance to get the value. Easy!
Without thinking too deeply, this should have similar complexity to std::map and possibly better locality of reference.
Some untested and unfinished code to get you started.
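#include <algorithm>
#include <iterator>
#include <random>
#include <stdexcept>
#include <vector>
template <class Key, class T>
class random_map
{
public:
    T& at(Key const& key);
    void shuffle();
private:
    std::vector<Key> d_keys;   // Hold the keys of the *map*; MUST be sorted.
    std::vector<T>   d_values; // d_values[i] is the value currently paired with d_keys[i].
};
template <class Key, class T>
T& random_map<Key, T>::at(Key const& key)
{
    auto lb = std::lower_bound(d_keys.begin(), d_keys.end(), key);
    if (lb == d_keys.end() || key < *lb) {
        throw std::out_of_range("random_map::at: key not found");
    }
    auto delta = std::distance(d_keys.begin(), lb);
    auto it = d_values.begin();
    std::advance(it, delta);   // std::advance modifies its argument and returns void
    return *it;
}
template <class Key, class T>
void random_map<Key, T>::shuffle()
{
    // Shuffle only the values: the keys must stay sorted for std::lower_bound.
    static std::mt19937 gen{std::random_device{}()};
    std::shuffle(d_values.begin(), d_values.end(), gen);
}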
If you want to shuffle the map in place, you can implement your own version of random_shuffle for your map. The solution still requires placing the keys into a vector, which is done below using transform:
typedef std::map<int, std::string> map_type;
map_type m;
m[10] = "hello";
m[20] = "world";
m[30] = "!";
std::vector<map_type::key_type> v(m.size());
std::transform(m.begin(), m.end(), v.begin(),
[](const map_type::value_type &x){
return x.first;
});
srand48(time(0));
auto n = m.size();
for (auto i = n-1; i > 0; --i) {
map_type::size_type r = drand48() * (i+1);
std::swap(m[v[i]], m[v[r]]);
}
I used drand48()/srand48() for a uniform pseudo random number generator, but you can use whatever is best for you.
Alternatively, you can shuffle v, and then rebuild the map, such as:
std::shuffle(v.begin(), v.end(), std::mt19937{std::random_device{}()}); // std::random_shuffle was removed in C++17; needs <random>
map_type m2 = m;
int i = 0;
for (auto &x : m) {
x.second = m2[v[i++]];
}
But, I wanted to illustrate that implementing shuffle on the map in place isn't overly burdensome.
Here is my solution using std::reference_wrapper of C++11.
First, let's make a version of std::random_shuffle that shuffles references. It is a small modification of version 1 from here: using the get method to get to the referenced values.
template< class RandomIt >
void shuffleRefs( RandomIt first, RandomIt last ) {
typename std::iterator_traits<RandomIt>::difference_type i, n;
n = last - first;
for (i = n-1; i > 0; --i) {
using std::swap;
swap(first[i].get(), first[std::rand() % (i+1)].get());
}
}
Now it's easy:
template <class MapType>
void shuffleMap(MapType &map) {
std::vector<std::reference_wrapper<typename MapType::mapped_type>> v;
for (auto &el : map) v.push_back(std::ref(el.second));
shuffleRefs(v.begin(), v.end());
}
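The same reference_wrapper trick also works with a <random> engine instead of std::rand() % (i+1), which avoids modulo bias. A sketch assuming C++11; shuffle_refs and shuffle_map are hypothetical names for variants of the functions above that take the generator as an extra parameter. Calling plain std::shuffle on the vector would only permute the wrappers themselves, not the map's values, which is why the swap goes through get().

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <map>
#include <random>
#include <utility>
#include <vector>

// Fisher-Yates shuffle that swaps the referred-to objects, not the wrappers.
template <class RandomIt, class URBG>
void shuffle_refs(RandomIt first, RandomIt last, URBG& gen) {
    using diff_t = typename std::iterator_traits<RandomIt>::difference_type;
    diff_t n = last - first;
    for (diff_t i = n - 1; i > 0; --i) {
        std::uniform_int_distribution<diff_t> pick(0, i);  // j uniform in [0, i]
        using std::swap;
        swap(first[i].get(), first[pick(gen)].get());
    }
}

// Shuffle the mapped values of a std::map-like container in place.
template <class MapType, class URBG>
void shuffle_map(MapType& m, URBG& gen) {
    std::vector<std::reference_wrapper<typename MapType::mapped_type>> v;
    v.reserve(m.size());
    for (auto& el : m) v.push_back(std::ref(el.second));
    shuffle_refs(v.begin(), v.end(), gen);
}
```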

How to efficiently select a random element from a std::set

How can I efficiently select a random element from a std::set?
A std::set::iterator is not a random access iterator. So I can't directly index a randomly chosen element like I could for a std::deque or std::vector
I could take the iterator returned from std::set::begin() and increment it a random number of times in the range [0,std::set::size()), but that seems to be doing a lot of unnecessary work. For an "index" close to the set's size, I would end up traversing the entire first half of the internal tree structure, even though it's already known the element won't be found there.
Is there a better approach?
In the name of efficiency, I am willing to define "random" as less random than whatever approach I might have used to choose a random index in a vector. Call it "reasonably random".
Edit...
Many insightful answers below.
The short version is that even though you can find a specific element in log(n) time, you can't find an arbitrary element in that time through the std::set interface.
Use boost::container::flat_set instead:
boost::container::flat_set<int> set;
// ...
auto it = set.begin() + rand() % set.size();
Insertions and deletions become O(N) though, I don't know if that's a problem. You still have O(log N) lookups, and the fact that the container is contiguous gives an overall improvement that often outweighs the loss of O(log N) insertions and deletions.
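Without Boost, the same idea can be sketched with a sorted std::vector used as a set. This is a minimal sketch with hypothetical names (flat_insert, random_element), and a <random> distribution replaces rand() % size: std::lower_bound gives the O(log N) lookup, insertion keeps the vector sorted at O(N) cost, and sampling is a single index operation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Insert val into a sorted vector if it is not already present (set semantics).
bool flat_insert(std::vector<int>& v, int val) {
    auto it = std::lower_bound(v.begin(), v.end(), val);  // O(log N)
    if (it != v.end() && *it == val) return false;         // already present
    v.insert(it, val);                                     // O(N) shift
    return true;
}

// Pick a uniformly random element in O(1); v must not be empty.
template <class URBG>
int random_element(const std::vector<int>& v, URBG& gen) {
    std::uniform_int_distribution<std::size_t> pick(0, v.size() - 1);
    return v[pick(gen)];
}
```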
What about a predicate for find (or lower_bound) which causes a random tree traversal? You'd have to tell it the size of the set so it could estimate the height of the tree and sometimes terminate before leaf nodes.
Edit: I realized the problem with this is that std::lower_bound takes a predicate but does not have any tree-like behavior (internally it uses std::advance which is discussed in the comments of another answer). std::set<>::lower_bound uses the predicate of the set, which cannot be random and still have set-like behavior.
Aha, you can't use a different predicate, but you can use a mutable predicate. Since std::set passes the predicate object around by value you must use a predicate & as the predicate so you can reach in and modify it (setting it to "randomize" mode).
Here's a quasi-working example. Unfortunately I can't wrap my brain around the right random predicate so my randomness is not excellent, but I'm sure someone can figure that out:
#include <iostream>
#include <set>
#include <stdlib.h>
#include <time.h>
using namespace std;
template <typename T>
struct RandomPredicate {
RandomPredicate() : size(0), randomize(false) { }
bool operator () (const T& a, const T& b) {
if (!randomize)
return a < b;
int r = rand();
if (size == 0)
return false;
else if (r % size == 0) {
size = 0;
return false;
} else {
size /= 2;
return r & 1;
}
}
size_t size;
bool randomize;
};
int main()
{
srand(time(0));
RandomPredicate<int> pred;
set<int, RandomPredicate<int> & > s(pred);
for (int i = 0; i < 100; ++i)
s.insert(i);
pred.randomize = true;
for (int i = 0; i < 100; ++i) {
pred.size = s.size();
set<int, RandomPredicate<int> & >::iterator it = s.lower_bound(0);
cout << *it << endl;
}
}
My half-baked randomness test is ./demo | sort -u | wc -l to see how many unique integers I get out. With a larger sample set try ./demo | sort | uniq -c | sort -n to look for unwanted patterns.
If you could access the underlying red-black tree (assuming that one exists) then you could access a random node in O(log n) choosing L/R as the successive bits of a ceil(log2(n))-bit random integer. However, you can't, as the underlying data structure is not exposed by the standard.
Xeo's solution of placing iterators in a vector is O(n) time and space to set up, but amortized constant overall. This compares favourably to std::next, which is O(n) time.
You can use the std::advance method:
set <int> myset;
//insert some elements into myset
int rnd = rand() % myset.size();
set <int> :: const_iterator it(myset.begin());
advance(it, rnd);
//now 'it' points to your random element
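A self-contained version of the std::advance approach, using std::next and a <random> distribution instead of rand() % size (which is slightly biased). It is still O(n) per call, because set iterators are bidirectional and std::next steps through the tree one element at a time. The function name random_set_element is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <set>

// Returns an iterator to a uniformly chosen element; s must not be empty.
template <class T, class URBG>
typename std::set<T>::const_iterator random_set_element(const std::set<T>& s, URBG& gen) {
    std::uniform_int_distribution<std::size_t> pick(0, s.size() - 1);
    // O(n) increments: std::next cannot jump through a bidirectional iterator.
    return std::next(s.begin(), static_cast<std::ptrdiff_t>(pick(gen)));
}
```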
Another way to do this, probably less random:
int mini = *myset.begin(), maxi = *myset.rbegin();
int rnd = rand() % (maxi - mini + 1) + mini;
int rndresult = *myset.lower_bound(rnd);
If either the set doesn't update frequently or you don't need to run this algorithm frequently, keep a mirrored copy of the data in a vector (or just copy the set to a vector on need) and randomly select from that.
Another approach, as seen in a comment, is to keep a vector of iterators into the set (they're only invalidated on element deletion for sets) and randomly select an iterator.
Finally if you don't need a tree-based set, you could use vector or deque as your underlying container and sort/unique-ify when needed.
You can do this by maintaining a normal array of values: when you insert into the set, you append the element to the end of the array (O(1)), and when you want a random element you can pick it from the array in O(1) as well.
The issue comes when you want to remove elements from the array. The most naive method takes O(n), which might be efficient enough for your needs. However, this can be improved to O(log n) using the following method:
Keep, for each index i in the array, prfx[i], the number of non-deleted elements in the range 0...i of the array. Keep a segment tree that stores the maximum prfx[i] within each range.
Updating the segment tree can be done in O(log n) per deletion (a deletion decrements prfx for every index from the deleted position onward, a range update that a segment tree with lazy propagation handles in O(log n)). Now, when you want a random element, draw k uniformly from 1 to the number of remaining elements and query the segment tree to find the "real" index of the element (the earliest range in which the maximum prfx equals k). This makes picking a random element O(log n).
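A simpler structure for the same job (a swapped-in technique, not the answer's segment tree) is a Fenwick tree over 0/1 "alive" flags: a deletion is a single point update, and the k-th alive element is found by descending the tree, both in O(log n). A minimal sketch with hypothetical names; to keep it short it assumes the array is built once up front instead of being appended to.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch: a fixed array of values with O(log n) deletion
// and O(log n) uniform sampling among the values not yet deleted.
class SampleArray {
    std::vector<int> values_;   // values in their original order
    std::vector<bool> alive_;   // alive_[i] is false once values_[i] is deleted
    std::vector<int> tree_;     // 1-based Fenwick tree over the alive flags
    std::size_t count_ = 0;     // number of alive values

    void add(std::size_t pos, int delta) {  // pos is 1-based
        for (; pos < tree_.size(); pos += pos & (~pos + 1))  // add lowest set bit
            tree_[pos] += delta;
    }

    // 0-based index of the k-th alive value (k is 0-based, k < count_).
    std::size_t find_kth(std::size_t k) const {
        std::size_t pos = 0, step = 1;
        while (step * 2 < tree_.size()) step *= 2;  // largest power of two <= n
        for (; step > 0; step /= 2) {
            std::size_t next = pos + step;
            if (next < tree_.size() && static_cast<std::size_t>(tree_[next]) <= k) {
                pos = next;  // skip a whole block of alive values
                k -= static_cast<std::size_t>(tree_[next]);
            }
        }
        return pos;  // 1-based position pos + 1, i.e. 0-based index pos
    }

public:
    explicit SampleArray(const std::vector<int>& values)
        : values_(values), alive_(values.size(), true),
          tree_(values.size() + 1, 0), count_(values.size()) {
        for (std::size_t i = 0; i < values_.size(); ++i) add(i + 1, 1);
    }

    std::size_t size() const { return count_; }

    // Deletes values_[i]; returns false if i is out of range or already deleted.
    bool erase_at(std::size_t i) {
        if (i >= values_.size() || !alive_[i]) return false;
        alive_[i] = false;
        add(i + 1, -1);
        --count_;
        return true;
    }

    // Uniformly random alive value; requires size() > 0.
    template <class URBG>
    int sample(URBG& gen) const {
        std::uniform_int_distribution<std::size_t> pick(0, count_ - 1);
        return values_[find_kth(pick(gen))];
    }
};
```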
Average O(1)/O(log N) (hashable/unhashable) insert/delete/sample with off-the-shelf containers
The idea is simple: use rejection sampling while upper-bounding the rejection rate, which is achievable with an amortized O(1) compaction operation.
However, unlike solutions based on augmented trees, this approach cannot be extended to support weighted sampling.
template <typename T>
class UniformSamplingSet {
size_t max_id = 0;
std::unordered_set<size_t> unused_ids;
std::unordered_map<size_t, T> id2value;
std::map<T, size_t> value2id;
void compact() {
size_t id = 0;
std::map<T, size_t> new_value2id;
std::unordered_map<size_t, T> new_id2value;
for (auto [_, value] : id2value) {
new_value2id.emplace(value, id);
new_id2value.emplace(id, value);
++id;
}
max_id = id;
unused_ids.clear();
std::swap(id2value, new_id2value);
std::swap(value2id, new_value2id);
}
public:
size_t size() {
return id2value.size();
}
void insert(const T& value) {
size_t id;
if (!unused_ids.empty()) {
id = *unused_ids.begin();
unused_ids.erase(unused_ids.begin());
} else {
id = max_id++;
}
if (!value2id.emplace(value, id).second) {
unused_ids.insert(id);
} else {
id2value.emplace(id, value);
}
}
void erase(const T& value) {
auto it = value2id.find(value);
if (it == value2id.end()) return;
unused_ids.insert(it->second);
id2value.erase(it->second);
value2id.erase(it);
if (unused_ids.size() * 2 > max_id) {
compact();
}
}
// uniform(n): uniform random in [0, n)
template <typename F>
T sample(F&& uniform) {
size_t i;
do { i = uniform(max_id); } while (unused_ids.find(i) != unused_ids.end());
return id2value.at(i);
}
};

Removing duplicates from an array using std::map

I'm posting my code directly. I wrote it on collabedit in under 5 minutes (including figuring out the algorithm), so even at the risk of being made fun of for its efficiency, I wanted to ask my fellow experienced Stack Overflow algorithm enthusiasts about the problem:
The task is to remove duplicate elements from an array. My approach: use a std::map as my hash table, and for each element of the input array, if its value has not been recorded yet, add it to the new array; if it has, skip it. At the end, return the array of unique values. Here is my code; the only thing I'm asking, as an interview question, is: can my solution be more efficient?
#include <iostream>
#include <vector>
#include <map>
using namespace std;
vector<int>uniqueArr(int arr[],int size){
std::map<int,int>storedValues;
vector<int>uniqueArr;
for(int i=0;i<size;i++){
if(storedValues[arr[i]]==0){
uniqueArr.push_back(arr[i]);
storedValues[arr[i]]=1;
}
}
return uniqueArr;
}
int main()
{
const int size=10;
int arr[size]={1,2,2,4,2,5,6,5,7,1};
vector<int>uniArr=uniqueArr(arr,size);
cout<<"Result: ";
for(int i=0;i<uniArr.size();i++) cout<<uniArr[i]<<" ";
cout<<endl;
return 0;
}
First of all, there is no need for a map, a set is conceptually more correct, since you don't want to store any values, but only the keys.
Performance-wise, it might be a better idea to use a std::unordered_set instead of a std::set, as the former is hashed and can give you O(1) insert and lookup on average, whereas the latter is a binary search tree, giving you only O(log n) access.
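#include <unordered_set>
#include <vector>
using namespace std;
vector<int> uniqueArr(int arr[], int size)
{
    std::unordered_set<int> storedValues;
    vector<int> uniqueArr;
    for(int i=0; i<size; ++i){
        // insert(...).second is true only the first time a value is seen
        if(storedValues.insert(arr[i]).second)
            uniqueArr.push_back(arr[i]);
    }
    return uniqueArr;
}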
But if you are allowed to use the C++ standard library more extensively, you may also consider the other answers using std::sort and std::unique, although they are O(n log n) (instead of the above ~O(n) solution) and destroy the order of the elements.
If you want to use a more flexible and std-driven approach but with ~O(n) complexity and without destroying the order of the elements, you can transform the above routine into the following std-like algorithm, even if being a bit too far-fetched for a simple interview question:
template<typename ForwardIterator>
ForwardIterator unordered_unique(ForwardIterator first, ForwardIterator last)
{
typedef typename std::iterator_traits<ForwardIterator>::value_type value_type;
std::unordered_set<value_type> unique;
return std::remove_if(first, last,
[&unique](const value_type &arg) mutable -> bool
{ return !unique.insert(arg).second; });
}
Which you can then apply like std::unique in the usual erase-remove way:
std::vector<int> values(...);
values.erase(unordered_unique(values.begin(), values.end()), values.end());
This removes the duplicate values without copying the vector and without needing to sort it beforehand.
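A self-contained version of unordered_unique with the headers it needs, applied to the question's input. This sketch relies on std::remove_if applying the predicate once per element, front to back, which is how the sequential algorithm behaves in practice; with a parallel execution policy that assumption would not hold.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <unordered_set>
#include <vector>

// Moves the first occurrence of each value to the front, preserving their order,
// and returns the new logical end (like std::unique, but without sorting first).
template <typename ForwardIterator>
ForwardIterator unordered_unique(ForwardIterator first, ForwardIterator last) {
    using value_type = typename std::iterator_traits<ForwardIterator>::value_type;
    std::unordered_set<value_type> seen;
    // insert(...).second is false for a value already seen, so remove_if drops it.
    return std::remove_if(first, last, [&seen](const value_type& arg) {
        return !seen.insert(arg).second;
    });
}
```

Applied to {1,2,2,4,2,5,6,5,7,1} with the erase-remove idiom, this leaves {1,2,4,5,6,7}.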
Since you are asking in terms of an interview question, I will say that you don't get the job.
#include <algorithm>
#include <iostream>
#include <iterator>
const int size=10;
int arr[size]={1,2,2,4,2,5,6,5,7,1};
std::sort( arr, arr + size );
int* new_end = std::unique( arr, arr + size );
std::copy(
    arr, new_end,
    std::ostream_iterator< int >( std::cout, " " )
);
No temporary maps, no temporary vectors, no dynamic memory allocations, and a lot less code, so it's easier both to write and to maintain.
#include <algorithm>
#include <vector>
int main()
{
std::vector<int> vec({1,2,3,2,4,4,5,7,6,6});
std::sort(vec.begin(), vec.end());
vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
// vec = {1,2,3,4,5,6,7}
return 0;
}
//works with C++11
// O(n log n)
In-place removal's nice for speed - something like this (returning the new size):
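#include <cstddef>
#include <unordered_set>
template <typename T, std::size_t N>
std::size_t keep_unique(T (&array)[N])
{
    std::unordered_set<T> found;
    std::size_t j = 0;  // declared outside the loop so it can be returned
    for (std::size_t i = 0; i < N; ++i)
    {
        if (found.insert(array[i]).second)   // first occurrence of array[i]
        {
            if (j != i) // (optional) avoid copy to self, as may be slower or unsupported by T
                array[j] = array[i];
            ++j;
        }
    }
    return j;
}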
(For larger objects, or those that can't be safely copied, it may be necessary and/or faster and more space-efficient to store T*s in the unordered_set; you must then also provide a dereferencing comparison operator and hash function.)
To visualise how this works, consider processing the following input:
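1 3 6 3 5 6 0 2 1
The duplicates (the second 3 at index 3, the second 6 at index 5, and the final 1 at index 8) are skipped, and the kept elements after them shift left: 5 moves from index 4 to 3, 0 from index 6 to 4, and 2 from index 7 to 5.
These moves are the minimal in-place compaction necessary to produce the answer: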
1 3 6 5 0 2
That's precisely what the algorithm above does, looking at all the elements at [i], and keeping track of where they need to be copied to (and how many non-duplicates there are) in [j].
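The same read-index / write-index compaction carries over to a std::vector, where the leftover tail can be erased instead of returning a new size. A minimal sketch; keep_unique_vector is a hypothetical name, and it moves elements rather than copying them.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical vector version: keeps the first occurrence of each value, in order,
// compacting in place with a read index i and a write index j.
template <typename T>
void keep_unique_vector(std::vector<T>& v) {
    std::unordered_set<T> found;
    std::size_t j = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (found.insert(v[i]).second) {          // first time we see v[i]
            if (j != i) v[j] = std::move(v[i]);   // the set already holds its own copy
            ++j;
        }
    }
    v.erase(v.begin() + static_cast<std::ptrdiff_t>(j), v.end());  // drop the tail
}
```

On the input 1 3 6 3 5 6 0 2 1 this leaves 1 3 6 5 0 2, matching the walk-through.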