Accessing elements of a list of lists in C++ - c++

I have a list of lists like this:
std::list<std::list<double> > list;
I filled it with some lists with doubles in them (actually quite a lot, which is why I am not using a vector. All this copying takes up a lot of time.)
Say I want to access the element that could be accesed like list[3][3] if the list were not a list but a vector or two dimensional array. How would I do that?
I know that accessing elements in a list is accomplished by using an iterator. I couldn't figure out how to get out the double though.

double item = *std::next(std::begin(*std::next(std::begin(list), 3)), 3);
Using a vector would usually have much better performance, though; accessing element n of a list is O(n).
If you're concerned about performance of splicing the interior of the container, you could use deque, which has operator[], amortized constant insertion and deletion from either end, and linear time insertion and deletion from the interior.
For C++03 compilers, you can implement begin and next yourself:
template<typename Container>
typename Container::iterator begin(Container &container)
{
return container.begin();
}
template<typename Container>
typename Container::const_iterator begin(const Container &container)
{
return container.begin();
}
template<typename T, int n>
T *begin(T (&array)[n])
{
return &array[0];
}
template<typename Iterator>
Iterator next(Iterator it, typename std::iterator_traits<Iterator>::difference_type n = 1)
{
std::advance(it, n);
return it;
}

To actually answer your question, you should probably look at std::advance.

To strictly answer your question, Joachim Pileborg's answer is the way to go:
std::list<std::list<double> >::iterator it = list.begin();
std::advance(it, 3);
std::list<double>::iterator it2 = (*it).begin();
std::advance(it2, 3);
double d = *it2;
Now, from your question and further comments it is not clear whether you always add elements to the end of the lists or they can be added anywhere. If you always add to the end, vector<double> will work better. A vector<T> does not need to be copied every time its size increases; only whenever its capacity increases, which is a very different thing.
In addition to this, using reserve(), as others said before, will help a lot with the reallocations. You don't need to reserve for the combined size of all vectors, but only for each individual vector. So:
std::vector<std::vector<double> > v;
v.reserve(512); // If you are inserting 400 vectors, with a little extra just in case
And you would also reserve for each vector<double> inside v. That's all.
Take into account that your list of lists will take much more space. For each double in the internal list, it will have to allocate at least two additional pointers, and also two additional pointers for each list inside the global least. This means that the total memory taken by your container will be roughly three times that of the vector. And all this allocation and management also takes extra runtime.

Related

Extract subvector in constant time

I have a std::vector<int> and I want to throw away the x first and y last elements. Just copying the elements is not an option, since this is O(n).
Is there something like vector.begin()+=x to let the vector just start later and end earlier?
I also tried
items = std::vector<int> (&items[x+1],&items[0]+items.size()-y);
where items is my vector, but this gave me bad_alloc
C++ standard algorithms work on ranges, not on actual containers, so you don't need to extract anything: you just need to adjust the iterator range you're working with.
void foo(const std::vector<T>& vec, const size_t start, const size_t end)
{
assert(vec.size() >= end-start);
auto it1 = vec.begin() + start;
auto it2 = vec.begin() + end;
std::whatever(it1, it2);
}
I don't see why it needs to be any more complicated than that.
(trivial live demo)
If you only need a range of values, you can represent that as a pair of iterators from first to last element of the range. These can be acquired in constant time.
Edit: According to the description in the comments, this seems like the most sensible solution. If your functions expect a vector reference, then you'll need to refactor a bit.
Other solutions:
If you don't need the original vector, and therefore can modify it, and the order of elements is not relevant, you can swap the first x elements with the n-x-y...n-y elements and then remove the last x+y elements. This can be done in O(x+y) time.
If appropriate, you could choose to use std::list for which what you're asking can be done in constant time if you have iterators to the first and last node of the sublist. This also requires that you can modify the original list but the order of elements won't change.
If those are not options, then you need to copy and are stuck with O(n).
The other answers are correct: usually iterators will do.
Nevertheless, you can also write a vector view. Here is a sketch:
template<typename T>
struct vector_view
{
vector_view(std::vector<T> const& v, size_t ind_begin, size_t ind_end)
: _v(v)
, _size(/* size of range */)
, _ind_begin(ind_begin) {}
auto size() const { return _size; }
auto const& operator[](size_t i) const
{
//possibly check for input outside range
return _v[ i + _ind_begin ];
}
//conversion of view to std::vector
operator std::vector<T>() const
{
std::vector<T> ret(_size);
//fill it
return ret;
}
private:
std::vector<T> const& _v;
size_t _size;
size_t _ind_begin;
}
Expose further methods as required (some iterator stuff might be appropriate when you want to use that with the standard library algorithms).
Further, take care on the validity of the const reference std::vector<T> const& v; -- if that could be an issue, one should better work with shared-pointers.
One can also think of more general approaches here, for example, use strides or similar things.

Fast way to implement pop_front to a std::vector

I'm using some classes and several utility methods that use std:: vector.
Now I need to use each frame a pop_front - push_back method on one of those classes (but they are all linked, and work together so I can't change only one).
Most of the operations are iterate over all element and push_back operations, so what I should do for the best work is: fork the repository of those classes and utilities, template everything, and use deque or list.
But this means a lot of code rewriting and a lot of testing that will make me miss the deadline.
So I need advice to write an efficient pop_front to a static-size vector (the size will not change).
I've found here a way:
template<typename T>
void pop_front(std::vector<T>& vec)
{
vec.front() = vec.back();
vec.pop_back();
vec.front() = vec.back(); // but should this work?
}
And another idea should be:
template<typename T>
void pop_front(std::vector<T>& vec, already_allocated_vector vec1)
{
vec1.clear();
copy(vec.begin(), vec.end()-1, vec1.begin());
copy(vec1.begin(), vec1.end(), vec.begin());
}
What is the faster of these two solutions? Any other solutions?
I would expect:
template<typename T>
void pop_front(std::vector<T>& vec)
{
assert(!vec.empty());
vec.front() = std::move(vec.back());
vec.pop_back();
}
to be the most efficient way of doing this, but it does not maintain the order of the elements in the vector.
If you need to maintain the order of the remaining elements in vec, you can do:
template<typename T>
void pop_front(std::vector<T>& vec)
{
assert(!vec.empty());
vec.erase(vec.begin());
}
This will have linear time in the number of elements in vec, but it is the best you can do without changing your data structure.
Neither of these functions will maintain the vector at a constant size, because a pop_front operation will by definition remove an element from a container.
Since pop_front() only erases the first element, the direct implementation is this:
template <typename V>
void pop_front(V & v)
{
assert(!v.empty());
v.erase(v.begin());
}
Don't worry about speed for now. If you want to go back and optimize code, ask for dedicated project time.
I also have a way... Not so good tho..
This looks like #0xPwn's solution, but he didn't reverse the vector the second time.
You will probably understand this code, so i will not explain it.
#include <algorithm>
#include <vector>
template <typename T>
void pop_front(std::vector<T>& vec){
std::reverse(vec.begin(),vec.end()); // first becomes last, reverses the vector
vec.pop_back(); // pop last
std::reverse(vec.begin(),vec.end()); // reverses it again, so the elements are in the same order as before
}
if you just try to erase the first element then in the function use:
if (my_vector.size()){ //check if there any elements in the vector array
my_vector.erase(my_vector.begin()); //erase the firs element
}
if you want to emulate pop front for the entire vector array ( you want to keep pop out every element from start to end) , i suggest on:
reverse(my_vector.begin(),my_vector.end()); // reverse the order of the vector array
my_vector.pop_back(); // now just simple pop_back will give you the results

Obtaining `std::priority_queue` elements in reverse order?

I've written some K-nearest-neighbor query methods which build a list of points that are nearest to a given query point. To maintain that list of neighbors, I use the std::priority_queue such that the top element is the farthest neighbor to the query point. This way I know if I should push the new element that is currently being examined (if at a lesser distance than the current farthest neighbor) and can pop() the farthest element when my priority-queue has more than K elements.
So far, all is well. However, when I output the elements, I would like to order them from the closest to the farthest. Currently, I simply pop all the elements from the priority-queue and put them on the output-container (through an iterator), which results in a sequence of points ordered from farthest to closest, so then, I call std::reverse on the output iterator range.
As a simple example, here is a linear-search that uses the priority-queue (obviously, the actual nearest-neighbor query methods I use are far more complicated):
template <typename DistanceValue,
typename ForwardIterator,
typename OutputIterator,
typename GetDistanceFunction,
typename CompareFunction>
inline
OutputIterator min_dist_linear_search(ForwardIterator first,
ForwardIterator last,
OutputIterator output_first,
GetDistanceFunction distance,
CompareFunction compare,
std::size_t max_neighbors = 1,
DistanceValue radius = std::numeric_limits<DistanceValue>::infinity()) {
if(first == last)
return output_first;
typedef std::priority_queue< std::pair<DistanceValue, ForwardIterator>,
std::vector< std::pair<DistanceValue, ForwardIterator> >,
detail::compare_pair_first<DistanceValue, ForwardIterator, CompareFunction> > PriorityQueue;
PriorityQueue output_queue = PriorityQueue(detail::compare_pair_first<DistanceValue, ForwardIterator, CompareFunction>(compare));
for(; first != last; ++first) {
DistanceValue d = distance(*first);
if(!compare(d, radius))
continue;
output_queue.push(std::pair<DistanceValue, ForwardIterator>(d, first));
while(output_queue.size() > max_neighbors)
output_queue.pop();
if(output_queue.size() == max_neighbors)
radius = output_queue.top().first;
};
OutputIterator it = output_first;
while( !output_queue.empty() ) {
*it = *(output_queue.top().second);
output_queue.pop(); ++it;
};
std::reverse(output_first, it);
return it;
};
The above is all dandy except for one thing: it requires the output-iterator type to be bidirectional and essentially be pointing to a pre-allocated container. Now, this practice of storing the output in a range prescribed by some output iterator is great and pretty standard too (e.g. std::copy and other STL algorithms are good examples of that). However, in this case I would like to be able to only require a forward output-iterator type, which would make it possible to use back-inserter iterators like those provided for STL containers and iostreams.
So, this boils down to reversing the priority-queue before dumping its content in the output iterator. So, these are the better options I've been able to come up with:
Create a std::vector, dump the priority-queue content in it, and dump the elements into the output-iterator using a reverse-iterator on the vector.
Replace the std::priority_queue with a sorted container (e.g. std::multimap), and then dump the content into the output-iterator using the appropriate traversal order.
Are there any other reasonable option?
I used to employ a std::multimap in a previous implementation of this algorithm and others, as of my second option above. However, when I switched to std::priority_queue, the performance gain was significant. So, I'd rather not use the second option, as it really seems that using a priority-queue for maintaining the list of neighbors is much better than relying on a sorted array. Btw, I also tried a std::vector that I maintain sorted with std::inplace_merge, which was better than multimap, but didn't match up to the priority-queue.
As for the first option, which is my best option at this point, it just seems wasteful to me to have to do this double transfer of data (queue -> vector -> output). I'm just inclined to think that there must be a simpler way to do this... something that I'm missing..
The first option really isn't that bad in this application (considering the complexity of the algorithm that precedes it), but if there is a trick to avoid this double memory transfer, I'd like to know about it.
Problem solved!
I'm such an idiot... I knew I was missing something obvious. In this case, the std::sort_heap() function. The reference page even has an example that does exactly what I need (and since the std::priority_queue is just implemented in terms of a random-access container and the heap-functions (pop_heap, push_heap, make_heap) it makes no real difference to use these functions directly in-place of the std::priority_queue class). I don't know how I could have missed that.
Anyways, I hope this helps anyone who had the same problem.
One dirty idea, which would nevertheless be guaranteed to work, would be the following:
std::priority_queue<int, std::vector<int>, std::less<int> > queue;
queue.push(3);
queue.push(5);
queue.push(9);
queue.push(2);
// Prints in reverse order.
int* front = const_cast<int*>(&queue.top());
int* back = const_cast<int*>(front + queue.size());
std::sort(front, back);
while (front < back) {
printf("%i ", *front);
++front;
}
It may be noted that the in-place sorting will likely break the queue.
why don't you just specify the opposite comparison function in the declaration:
#include <iostream>
#include <queue>
#include <vector>
#include <functional>
int main() {
std::priority_queue<int, std::vector<int>, std::greater<int> > pq;
pq.push(1);
pq.push(10);
pq.push(15);
std::cout << pq.top() << std::endl;
}

What's wrong with my vector<T>::erase here?

I have two vector<T> in my program, called active and non_active respectively. This refers to the objects it contains, as to whether they are in use or not.
I have some code that loops the active vector and checks for any objects that might have gone non active. I add these to a temp_list inside the loop.
Then after the loop, I take my temp_list and do non_active.insert of all elements in the temp_list.
After that, I do call erase on my active vector and pass it the temp_list to erase.
For some reason, however, the erase crashes.
This is the code:
non_active.insert(non_active.begin(), temp_list.begin(), temp_list.end());
active.erase(temp_list.begin(), temp_list.end());
I get this assertion:
Expression:("_Pvector == NULL || (((_Myvec*)_Pvector)->_Myfirst <= _Ptr && _Ptr <= ((_Myvect*)_Pvector)->_Mylast)",0)
I've looked online and seen that there is a erase-remove idiom, however not sure how I'd apply that to a removing a range of elements from a vector<T>
I'm not using C++11.
erase expects a range of iterators passed to it that lie within the current vector. You cannot pass iterators obtained from a different vector to erase.
Here is a possible, but inefficient, C++11 solution supported by lambdas:
active.erase(std::remove_if(active.begin(), active.end(), [](const T& x)
{
return std::find(temp_list.begin(), temp_list.end(), x) != temp_list.end();
}), active.end());
And here is the equivalent C++03 solution without the lambda:
template<typename Container>
class element_of
{
Container& container;
element_of(Container& container) : container(container) {}
public:
template<typename T>
bool operator()(const T& x) const
{
return std::find(container.begin(), container.end(), x)
!= container.end();
}
};
// ...
active.erase(std::remove_if(active.begin(), active.end(),
element_of<std::vector<T> >(temp_list)),
active.end());
If you replace temp_list with a std::set and the std::find_if with a find member function call on the set, the performance should be acceptable.
The erase method is intended to accept iterators to the same container object. You're trying to pass in iterators to temp_list to use to erase elements from active which is not allowed for good reasons, as a Sequence's range erase method is intended to specify a range in that Sequence to remove. It's important that the iterators are in that sequence because otherwise we're specifying a range of values to erase rather than a range within the same container which is a much more costly operation.
The type of logic you're trying to perform suggests to me that a set or list might be better suited for the purpose. That is, you're trying to erase various elements from the middle of a container that match a certain condition and transfer them to another container, and you could eliminate the need for temp_list this way.
With list, for example, it could be as easy as this:
for (ActiveList::iterator it = active.begin(); it != active.end();)
{
if (it->no_longer_active())
{
inactive.push_back(*it);
it = active.erase(it);
}
else
++it;
}
However, sometimes vector can outperform these solutions, and maybe you have need for vector for other reasons (like ensuring contiguous memory). In that case, std::remove_if is your best bet.
Example:
bool not_active(const YourObjectType& obj);
active_list.erase(
remove_if(active_list.begin(), active_list.end(), not_active),
active_list.end());
More info on this can be found under the topic, 'erase-remove idiom' and you may need predicate function objects depending on what external states are required to determine if an object is no longer active.
You can actually make the erase/remove idiom usable for your case. You just need to move the value over to the other container before std::remove_if possibly shuffles it around: in the predicate.
template<class OutIt, class Pred>
struct copy_if_predicate{
copy_if_predicate(OutIt dest, Pred p)
: dest(dest), pred(p) {}
template<class T>
bool operator()(T const& v){
if(pred(v)){
*dest++ = v;
return true;
}
return false;
}
OutIt dest;
Pred pred;
};
template<class OutIt, class Pred>
copy_if_predicate<OutIt,Pred> copy_if_pred(OutIt dest, Pred pred){
return copy_if_predicate<OutIt,Pred>(dest,pred);
}
Live example on Ideone. (I directly used bools to make the code shorter, not bothering with output and the likes.)
The function std::vector::erase requires the iterators to be iterators into this vector, but you are passing iterators from temp_list. You cannot erase elements from a container that are in a completely different container.
active.erase(temp_list.begin(), temp_list.end());
You try to erase elements from one list, but you use iterators for second list. First list iterators aren't the same, like in second list.
I would like to suggest that this is an example of where std::list should be used. You can splice members from one list to another. Look at std::list::splice()for this.
Do you need random access? If not then you don't need a std::vector.
Note that with list, when you splice, your iterators, and references to the objects in the list remain valid.
If you don't mind making the implementation "intrusive", your objects can contain their own iterator value, so they know where they are. Then when they change state, they can automate their own "moving" from one list to the other, and you don't need to transverse the whole list for them. (If you want this sweep to happen later, you can get them to "register" themselves for later moving).
I will write an algorithm here now to run through one collection and if a condition exists, it will effect a std::remove_if but at the same time will copy the element into your "inserter".
//fwd iterator must be writable
template< typename FwdIterator, typename InputIterator, typename Pred >
FwdIterator copy_and_remove_if( FwdIterator inp, FwdIterator end, InputIterator outp, Pred pred )
{
for( FwdIterator test = inp; test != end; ++test )
{
if( pred(*test) ) // insert
{
*outp = *test;
++outp;
}
else // keep
{
if( test != inp )
{
*inp = *test;
}
++inp;
}
}
return inp;
}
This is a bit like std::remove_if but will copy the ones being removed into an alternative collection. You would invoke it like this (for a vector) where isInactive is a valid predicate that indicates it should be moved.
active.erase( copy_and_remove_if( active.begin(), active.end(), std::back_inserter(inactive), isInactive ), active.end() );
The iterators you pass to erase() should point into the vector itself; the assertion is telling you that they don't. This version of erase() is for erasing a range out of the vector.
You need to iterate over temp_list yourself and call active.erase() on the result of dereferencing the iterator at each step.

Determining if an unordered vector<T> has all unique elements

Profiling my cpu-bound code has suggested I that spend a long time checking to see if a container contains completely unique elements. Assuming that I have some large container of unsorted elements (with < and = defined), I have two ideas on how this might be done:
The first using a set:
template <class T>
bool is_unique(vector<T> X) {
set<T> Y(X.begin(), X.end());
return X.size() == Y.size();
}
The second looping over the elements:
template <class T>
bool is_unique2(vector<T> X) {
typename vector<T>::iterator i,j;
for(i=X.begin();i!=X.end();++i) {
for(j=i+1;j!=X.end();++j) {
if(*i == *j) return 0;
}
}
return 1;
}
I've tested them the best I can, and from what I can gather from reading the documentation about STL, the answer is (as usual), it depends. I think that in the first case, if all the elements are unique it is very quick, but if there is a large degeneracy the operation seems to take O(N^2) time. For the nested iterator approach the opposite seems to be true, it is lighting fast if X[0]==X[1] but takes (understandably) O(N^2) time if all the elements are unique.
Is there a better way to do this, perhaps a STL algorithm built for this very purpose? If not, are there any suggestions eek out a bit more efficiency?
Your first example should be O(N log N) as set takes log N time for each insertion. I don't think a faster O is possible.
The second example is obviously O(N^2). The coefficient and memory usage are low, so it might be faster (or even the fastest) in some cases.
It depends what T is, but for generic performance, I'd recommend sorting a vector of pointers to the objects.
template< class T >
bool dereference_less( T const *l, T const *r )
{ return *l < *r; }
template <class T>
bool is_unique(vector<T> const &x) {
vector< T const * > vp;
vp.reserve( x.size() );
for ( size_t i = 0; i < x.size(); ++ i ) vp.push_back( &x[i] );
sort( vp.begin(), vp.end(), ptr_fun( &dereference_less<T> ) ); // O(N log N)
return adjacent_find( vp.begin(), vp.end(),
not2( ptr_fun( &dereference_less<T> ) ) ) // "opposite functor"
== vp.end(); // if no adjacent pair (vp_n,vp_n+1) has *vp_n < *vp_n+1
}
or in STL style,
template <class I>
bool is_unique(I first, I last) {
typedef typename iterator_traits<I>::value_type T;
…
And if you can reorder the original vector, of course,
template <class T>
bool is_unique(vector<T> &x) {
sort( x.begin(), x.end() ); // O(N log N)
return adjacent_find( x.begin(), x.end() ) == x.end();
}
You must sort the vector if you want to quickly determine if it has only unique elements. Otherwise the best you can do is O(n^2) runtime or O(n log n) runtime with O(n) space. I think it's best to write a function that assumes the input is sorted.
template<class Fwd>
bool is_unique(In first, In last)
{
return adjacent_find(first, last) == last;
}
then have the client sort the vector, or a make a sorted copy of the vector. This will open a door for dynamic programming. That is, if the client sorted the vector in the past then they have the option to keep and refer to that sorted vector so they can repeat this operation for O(n) runtime.
The standard library has std::unique, but that would require you to make a copy of the entire container (note that in both of your examples you make a copy of the entire vector as well, since you unnecessarily pass the vector by value).
template <typename T>
bool is_unique(std::vector<T> vec)
{
std::sort(vec.begin(), vec.end());
return std::unique(vec.begin(), vec.end()) == vec.end();
}
Whether this would be faster than using a std::set would, as you know, depend :-).
Is it infeasible to just use a container that provides this "guarantee" from the get-go? Would it be useful to flag a duplicate at the time of insertion rather than at some point in the future? When I've wanted to do something like this, that's the direction I've gone; just using the set as the "primary" container, and maybe building a parallel vector if I needed to maintain the original order, but of course that makes some assumptions about memory and CPU availability...
For one thing you could combine the advantages of both: stop building the set, if you have already discovered a duplicate:
template <class T>
bool is_unique(const std::vector<T>& vec)
{
std::set<T> test;
for (typename std::vector<T>::const_iterator it = vec.begin(); it != vec.end(); ++it) {
if (!test.insert(*it).second) {
return false;
}
}
return true;
}
BTW, Potatoswatter makes a good point that in the generic case you might want to avoid copying T, in which case you might use a std::set<const T*, dereference_less> instead.
You could of course potentially do much better if it wasn't generic. E.g if you had a vector of integers of known range, you could just mark in an array (or even bitset) if an element exists.
You can use std::unique, but it requires the range to be sorted first:
template <class T>
bool is_unique(vector<T> X) {
std::sort(X.begin(), X.end());
return std::unique(X.begin(), X.end()) == X.end();
}
std::unique modifies the sequence and returns an iterator to the end of the unique set, so if that's still the end of the vector then it must be unique.
This runs in nlog(n); the same as your set example. I don't think you can theoretically guarantee to do it faster, although using a C++0x std::unordered_set instead of std::set would do it in expected linear time - but that requires that your elements be hashable as well as having operator == defined, which might not be so easy.
Also, if you're not modifying the vector in your examples, you'd improve performance by passing it by const reference, so you don't make an unnecessary copy of it.
If I may add my own 2 cents.
First of all, as #Potatoswatter remarked, unless your elements are cheap to copy (built-in/small PODs) you'll want to use pointers to the original elements rather than copying them.
Second, there are 2 strategies available.
Simply ensure there is no duplicate inserted in the first place. This means, of course, controlling the insertion, which is generally achieved by creating a dedicated class (with the vector as attribute).
Whenever the property is needed, check for duplicates
I must admit I would lean toward the first. Encapsulation, clear separation of responsibilities and all that.
Anyway, there are a number of ways depending on the requirements. The first question is:
do we have to let the elements in the vector in a particular order or can we "mess" with them ?
If we can mess with them, I would suggest keeping the vector sorted: Loki::AssocVector should get you started.
If not, then we need to keep an index on the structure to ensure this property... wait a minute: Boost.MultiIndex to the rescue ?
Thirdly: as you remarked yourself a simple linear search doubled yield a O(N2) complexity in average which is no good.
If < is already defined, then sorting is obvious, with its O(N log N) complexity.
It might also be worth it to make T Hashable, because a std::tr1::hash_set could yield a better time (I know, you need a RandomAccessIterator, but if T is Hashable then it's easy to have T* Hashable to ;) )
But in the end the real issue here is that our advises are necessary generic because we lack data.
What is T, do you intend the algorithm to be generic ?
What is the number of elements ? 10, 100, 10.000, 1.000.000 ? Because asymptotic complexity is kind of moot when dealing with a few hundreds....
And of course: can you ensure unicity at insertion time ? Can you modify the vector itself ?
Well, your first one should only take N log(N), so it's clearly the better worse case scenario for this application.
However, you should be able to get a better best case if you check as you add things to the set:
template <class T>
bool is_unique3(vector<T> X) {
set<T> Y;
typename vector<T>::const_iterator i;
for(i=X.begin(); i!=X.end(); ++i) {
if (Y.find(*i) != Y.end()) {
return false;
}
Y.insert(*i);
}
return true;
}
This should have O(1) best case, O(N log(N)) worst case, and average case depends on the distribution of the inputs.
If the type T You store in Your vector is large and copying it is costly, consider creating a vector of pointers or iterators to Your vector elements. Sort it based on the element pointed to and then check for uniqueness.
You can also use the std::set for that. The template looks like this
template <class Key,class Traits=less<Key>,class Allocator=allocator<Key> > class set
I think You can provide appropriate Traits parameter and insert raw pointers for speed or implement a simple wrapper class for pointers with < operator.
Don't use the constructor for inserting into the set. Use insert method. The method (one of overloads) has a signature
pair <iterator, bool> insert(const value_type& _Val);
By checking the result (second member) You can often detect the duplicate much quicker, than if You inserted all elements.
In the (very) special case of sorting discrete values with a known, not too big, maximum value N.
You should be able to start a bucket sort and simply check that the number of values in each bucket is below 2.
bool is_unique(const vector<int>& X, int N)
{
vector<int> buckets(N,0);
typename vector<int>::const_iterator i;
for(i = X.begin(); i != X.end(); ++i)
if(++buckets[*i] > 1)
return false;
return true;
}
The complexity of this would be O(n).
Using the current C++ standard containers, you have a good solution in your first example. But if you can use a hash container, you might be able to do better, as the hash set will be nO(1) instead of nO(log n) for a standard set. Of course everything will depend on the size of n and your particular library implementation.