Implementation of a contiguous (flat) unordered container

I am trying to implement or conceptually design a container that has contiguous memory but where the element order is unimportant (and that is exploited for insertion/removal of objects).
This is similar to std::vector, but without the constraint that removing an element preserves the relative order of the other elements; in this case the last element can simply be moved into the place of the removed one.
I more or less know how to implement it (based on std::vector and some special back referenced iterator) but I am looking for a reference implementation to avoid reinventing the wheel.
I am familiar with Boost.Container, but I didn't find such a container there.
boost::container::flat_set is close, but it maintains the order, which is unnecessary. In some sense, I am looking for some sort of "boost::container::unordered_flat_set" or "unordered_vector".
This is the behavior that I expect:
unordered_flat_set<T> ufs(100); // allocates 100 elements
ufs.reserve(120);
unordered_flat_set<T>::iterator it = ...; // find something
ufs.erase(it); // overwrite last element to that position, destroy last element
ufs.insert(T{}); //add element at "end", only if necessary reallocate, keep buffer memory in multiples of 2 (or 1.6). Element order is not fundamental, can be altered completely by a call to "erase".
ufs.size(); // report size
Both erase and insert are O(1) (unless reallocation is necessary).
Is this a concept that does not already exist in standard or non-standard containers?
(Perhaps it is the concept of being unordered that doesn't play well with the current containers.
After all, the only "unordered" container currently is std::unordered_set, and it is fairly new.)
This is a (very minimal) reference implementation, mainly to give a concrete realization of the concept I am looking for. In fact, I want to see whether the concept already exists so that I can apply it to an existing code base.
I am not trying to reinvent the wheel.
#include<cassert>
#include<iostream>
#include<vector>
template<class T>
class unordered_vector{
std::vector<T> impl_;
public:
unordered_vector(){}
void reserve(int i){impl_.reserve(i);}
struct iterator{
std::vector<T>* back_ptr;
int i;
T& operator*(){return back_ptr->operator[](i);}
iterator operator++(){++i; return *this;}
iterator operator--(){--i; return *this;}
bool operator==(iterator const& other) const{return back_ptr == other.back_ptr and i == other.i;}
bool operator!=(iterator const& other) const{return not(*this == other);}
};
int size(){return impl_.size();}
iterator erase(iterator it){
if(it.i != size()-1) *it = impl_.back(); // should I use placement new here to not rely on a customized assignment (or a non-assignable object type)?
impl_.pop_back(); // destroy the last element
return it; // I return this for compatibility, although there is no use for this
}
iterator insert(T t){
impl_.push_back(t); return {&impl_, size()-1};
}
iterator begin(){return {&impl_, 0};} // does an unordered container have a begin ?? ok, for compatibility, like std::unordered_set
iterator end(){return {&impl_, (int)impl_.size()};} // same question,
T& operator[](int i){return impl_[i];} // same question, if it is unordered v[i] has not a "salient" meaning.
};
int main(){
unordered_vector<double> uv;
uv.reserve(10);
uv.insert(1.1);
uv.insert(2.3);
uv.insert(5.4);
uv.insert(3.1);
std::cout << uv.size() << std::endl;
auto it = uv.begin();
assert( uv.begin() != uv.end());
assert( it != uv.end() );
for(auto it = uv.begin(); it != uv.end(); ++it){
std::cout << *it << std::endl;
}
}

Please see the sfl library that I recently uploaded to GitHub:
https://github.com/slavenf/sfl-library
It is a C++11 header-only library that offers flat ordered and unordered containers that store elements contiguously in memory. All containers meet the requirements of Container, AllocatorAwareContainer and ContiguousContainer.
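For readers who only need the O(1) unordered erase described in the question, the idea can be sketched directly on top of std::vector. This is a minimal illustration and not the sfl library's API; the names unordered_erase and unordered_erase_demo are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of any library): erase v[pos] in O(1)
// by moving the last element into its slot. Element order is not preserved.
template <class T>
void unordered_erase(std::vector<T>& v, std::size_t pos) {
    assert(pos < v.size());
    if (pos + 1 != v.size())
        v[pos] = std::move(v.back());  // fill the hole with the last element
    v.pop_back();                      // destroy the (moved-from) last element
}

// Small usage example: erase the first element of {1, 2, 3, 4}.
inline std::vector<int> unordered_erase_demo() {
    std::vector<int> v;
    v.push_back(1); v.push_back(2); v.push_back(3); v.push_back(4);
    unordered_erase(v, 0);  // 4 moves into slot 0, leaving {4, 2, 3}
    return v;
}
```

Note that any index or iterator referring to the former last element refers to a different position after the call, which is the price of the constant-time erase.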

Related

Can raw pointers be used instead of iterators with STL algorithms for containers with linear storage?

I have a custom vector container that internally stores items in a linear array. Last night, I was trying to implement custom iterators for my class to be able to use them with STL algorithms. I have had some success, which you can see here:
Live example with custom iterators
While doing so, I discovered I can merely pass raw pointers to STL algorithms and they just seem to work fine. Here's the example without any iterators:
#include <cstddef>
#include <iostream>
#include <iterator>
#include <algorithm>
template<typename T>
class my_array{
T* data_;
std::size_t size_;
public:
my_array()
: data_(NULL), size_(0)
{}
my_array(std::size_t size)
: data_(new T[size]), size_(size)
{}
my_array(const my_array<T>& other){
size_ = other.size_;
data_ = new T[size_];
for (std::size_t i = 0; i<size_; i++)
data_[i] = other.data_[i];
}
my_array(const T* first, const T* last){
size_ = last - first;
data_ = new T[size_];
for (std::size_t i = 0; i<size_; i++)
data_[i] = first[i];
}
~my_array(){
delete [] data_;
}
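my_array<T>& operator=(const my_array<T>& other){
if (this != &other){
T* fresh = new T[other.size_]; // allocate first so *this is untouched if this throws
for (std::size_t i = 0; i<other.size_; i++)
fresh[i] = other.data_[i];
delete [] data_; // release the old buffer instead of leaking it
data_ = fresh;
size_ = other.size_;
}
return *this;
}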
const T& operator[](std::size_t idx) const {return data_[idx];}
T& operator[](std::size_t idx) {return data_[idx];}
std::size_t size() const {return size_;}
T* begin(){return data_;}
T* end(){return data_+size_;}
};
template<typename T>
void print(T t) {
std::cout << t << std::endl;
}
int main(){
typedef float scalar_t;
scalar_t list [] = {1, 3, 5, 2, 4, 3, 5, 10, 10};
my_array<scalar_t> a(list, list+sizeof(list)/sizeof(scalar_t));
// works!
for (scalar_t* it = a.begin(), *end = a.end();
it != end; ++it)
std::cout << ' ' << *it;
std::cout << std::endl;
// works!
std::for_each(a.begin(), a.end(), print<scalar_t>);
std::cout << std::endl;
// works!
my_array<int> b(a.size());
std::copy(a.begin(), a.end(), b.begin());
// works!
scalar_t* end = std::remove(a.begin(), a.end(), 5);
std::for_each(a.begin(), end, print<scalar_t>);
std::cout << std::endl;
// works!
std::random_shuffle(a.begin(), end);
std::for_each(a.begin(), end, print<scalar_t>);
std::cout << std::endl;
// works!
std::cout << "Counts of 3 in array = " << std::count(a.begin(), end, 3) << std::endl << std::endl;
// works!
std::sort(a.begin(), end);
std::for_each(a.begin(), end, print<scalar_t>);
std::cout << std::endl;
// works!
if (!std::binary_search(a.begin(), end, 5)) // search only the sorted range that survived std::remove
std::cout << "Removed!" << std::endl;
return 0;
}
Live example without iterators
My questions here are the following:
Does this always work for containers that have linear storage? I know that this would not work for linked-lists for example.
If they do work in this situation, why should I ever go through the hassle of implementing iterators anyway? I know how iterators generalize my code and whatnot, but if this simple array is all I ever need then I don't see the point.
What are the negative issues of what I'm doing if this approach would always work? For one thing, I can see I'm breaking data encapsulation.

One of the features of iterators being based on operator-overloading, is that pointers are already random-access iterators. This was a big design win in the early days of STL, as it made it easier to use algorithms with existing code (as well as making the interface more familiar to programmers). It's perfectly reasonable to wrap an array, add typedef T* iterator; typedef const T* const_iterator, return &array[0] from your begin() and &array[size] from your end(), and then use your container with any iterator-based algorithm. As you already realise, this will work for any container where elements are contiguous in memory (such as an array).
You might implement 'real' iterators if:
You have a different-shaped container (such as a tree or list);
You want to be able to resize the array without invalidating the iterators;
You want to add debugging checks to your iterator use, for example to check if the iterator is used after being invalidated or after the container has been deleted, or to bounds-check;
You want to introduce type-safety, and make sure people can't accidentally assign an arbitrary T* to a my_array::iterator.
I'd say this last advantage alone is well worth writing a trivial wrapper class for. If you don't take advantage of C++'s type system by making different kinds of thing have different types, you might as well switch to Javascript :-)
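To make the type-safety point concrete, here is a minimal sketch of such a trivial wrapper. The class name array_iterator and the demo function are hypothetical; the wrapper simply forwards every operation to the underlying pointer while being a distinct type with an explicit constructor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical thin wrapper around T*: a distinct type, so an arbitrary T*
// cannot be passed where an array_iterator<T> is expected.
template <class T>
class array_iterator {
    T* p_;
public:
    typedef std::random_access_iterator_tag iterator_category;
    typedef T value_type;
    typedef std::ptrdiff_t difference_type;
    typedef T* pointer;
    typedef T& reference;

    array_iterator() : p_(0) {}
    explicit array_iterator(T* p) : p_(p) {}  // explicit: no silent T* conversion

    reference operator*() const { return *p_; }
    pointer operator->() const { return p_; }
    reference operator[](difference_type n) const { return p_[n]; }

    array_iterator& operator++() { ++p_; return *this; }
    array_iterator operator++(int) { array_iterator t(*this); ++p_; return t; }
    array_iterator& operator--() { --p_; return *this; }
    array_iterator operator--(int) { array_iterator t(*this); --p_; return t; }
    array_iterator& operator+=(difference_type n) { p_ += n; return *this; }
    array_iterator& operator-=(difference_type n) { p_ -= n; return *this; }

    friend array_iterator operator+(array_iterator a, difference_type n) { return a += n; }
    friend array_iterator operator+(difference_type n, array_iterator a) { return a += n; }
    friend array_iterator operator-(array_iterator a, difference_type n) { return a -= n; }
    friend difference_type operator-(array_iterator a, array_iterator b) { return a.p_ - b.p_; }

    friend bool operator==(array_iterator a, array_iterator b) { return a.p_ == b.p_; }
    friend bool operator!=(array_iterator a, array_iterator b) { return a.p_ != b.p_; }
    friend bool operator<(array_iterator a, array_iterator b) { return a.p_ < b.p_; }
    friend bool operator>(array_iterator a, array_iterator b) { return a.p_ > b.p_; }
    friend bool operator<=(array_iterator a, array_iterator b) { return a.p_ <= b.p_; }
    friend bool operator>=(array_iterator a, array_iterator b) { return a.p_ >= b.p_; }
};

// Usage: sort a plain array through the wrapper.
inline bool array_iterator_demo() {
    int data[] = {3, 1, 2};
    std::sort(array_iterator<int>(data), array_iterator<int>(data + 3));
    return data[0] == 1 && data[1] == 2 && data[2] == 3;
}
```

The same skeleton is where bounds or invalidation checks would go, for example by storing the array's begin and end alongside p_ and asserting in operator*.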

Yes. See Effective STL, Item 16, which demonstrates with linear storage containers you can simply take the address of an item and work with that pointer as if it pointed to a simple array.
I think you answered your own question – you probably shouldn't, if you know the simple array is all you'll ever need.
Probably the biggest issue is just that – breaking data encapsulation. Consider whether or not an abstraction such as an explicit iterator type would buy you anything versus the cost.

Does this always work for containers that have linear storage?
Yes, the iterator concepts were designed so that pointers could act as iterators over arrays.
If they do work in this situation, why should I ever go through the hassle of implementing iterators anyway?
There's no good reason to define your own iterator type in this situation, unless you want to do something like bounds-checking which can't be done with a simple pointer.
One slight benefit would be that you could include nested typedefs for the iterator's traits, as some of the standard iterator types do; but using pointers these are available from std::iterator_traits<T*> anyway.
What are the negative issues of what I'm doing if this approach would always work? For one thing, I can see I'm breaking data encapsulation.
To make the interface more consistent with STL-style containers, you should define iterator and const_iterator types (typedef aliases for the pointers), and provide const overloads of begin and end; and perhaps cbegin and cend for C++11 compatibility.
There are various other requirements that you might want to conform to; see section 23.2 of the C++ standard for the gory details. But in general, it's more important to make iterators conform to their requirements, since STL-style algorithms work with iterators rather than containers, and by using pointers you already conform to those requirements.
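As a rough sketch of that advice, the following wrapper exposes pointer typedefs, const overloads and cbegin/cend. The class name small_array and the helper functions are invented for this illustration; the last helper checks that std::iterator_traits reports a plain pointer as a random-access iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <type_traits>

// Hypothetical fixed-size wrapper; the name small_array is made up here.
template <class T, std::size_t N>
class small_array {
    T data_[N];
public:
    typedef T value_type;
    typedef T* iterator;              // plain pointers serve as iterators
    typedef const T* const_iterator;

    small_array() : data_() {}        // value-initialize the elements

    iterator begin() { return data_; }
    iterator end() { return data_ + N; }
    const_iterator begin() const { return data_; }
    const_iterator end() const { return data_ + N; }
    const_iterator cbegin() const { return data_; }
    const_iterator cend() const { return data_ + N; }
    std::size_t size() const { return N; }
};

// Fill through the mutable iterators, then read through the const overloads.
inline int small_array_sum() {
    small_array<int, 4> a;
    int v = 1;
    for (small_array<int, 4>::iterator it = a.begin(); it != a.end(); ++it)
        *it = v++;
    const small_array<int, 4>& ca = a;
    return std::accumulate(ca.begin(), ca.end(), 0);  // 1 + 2 + 3 + 4
}

// std::iterator_traits has a partial specialization for pointers, so the
// category is known even though T* has no nested typedefs of its own.
inline bool pointer_is_random_access() {
    typedef std::iterator_traits<small_array<int, 4>::iterator>::iterator_category cat;
    return std::is_same<cat, std::random_access_iterator_tag>::value;
}
```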

It happens that pointers provide the interface required of random access iterators (dereference, increment, addition, difference, etc) and can be treated just like iterators.
It should always work for containers with contiguous storage.
You might wish to create your own iterators for the same reason you use methods instead of all public data in your classes: to encapsulate what's happening behind an interface you can modify if you need to. As long as you typedef your T* to an iterator type this is probably not a significant issue. Additionally, some algorithms dispatch on the iterator category; plain pointers cannot carry nested typedefs, but std::iterator_traits<T*> already reports them as random-access iterators.

Custom range-based iterator for a templated vector of vectors

I have an assignment that's been driving me insane. I've been researching basic concepts to increase my knowledge and try to apply it to this problem but I'm a bit stuck.
We have a main.cpp file that tests a VectorOfVectors class that has this syntax:
for( int num : intVov )
{
printf( "%d ", num );
}
We're creating our own VectorOfVectors class with templated vectors as items.
We have to make the main function properly work by creating our own custom iterator that iterates through all the values, as shown by the main function. I've been researching range-based iterators but I'm a little confused as to how to construct my own, and because it is a vector of vectors the syntax does not match well with some online examples.
I would like guidance as to how I can go about creating this iterator. I know I need a begin() and end() function, as well as to overload the operator++ function to get it to work. Would my iterator use int values as the pointer(s) that I increment in operator++? Would I need two pointers? What would begin() and end() return: iterators, or integers, or T values, or vectors? How should I construct the iterator and what data do I need for it? Would the iterator constructor take two pointers as values, or one, or how would that work? Would the iterator need its own copy of a VectorOfVectors to iterate (and be set in a constructor)?
How would I go about increasing the pointers? Any help, general knowledge or even tips would be greatly appreciated!
Here's what I've been fiddling around with, just as a reference.
#include <vector>
using std::vector;
template< typename T > class VectorOfVectors
{
public:
class iterator
{
public:
//Constructor
iterator(const VectorOfVectors<T> * vov, int pos_vov, int pos_v)
{
_pos_vov = pos_vov;
_pos_v = pos_v;
_vov = vov;
}
bool operator!= (const iterator & other) const
{
return pos != other._pos;
}
int operator* () const;
const iterator operator++ ()
{
_pos_v++;
if (_pos_v == _pos_vov->end())
{
_pos_vov++;
if (_pos_vov == _vov.end())
{
--_pos_vov;
_pos_v = _pos_vov->end();
--_pos_v;
return (*this);
}
else
{
_pos_v = _pos_vov->begin();
return (*this);
}
}
else
{
return (*this);
}
}
private:
int _pos_v;
int _pos_vov;
const VectorOfVectors<T> * _vov;
};
void AddEmptyVector()
{
vectorOfVectors.push_back(new vector<T>());
}
int GetVectorCount() const
{
return vectorOfVectors.size();
}
vector<T> GetVectorAtIndex(int index)
{
return vectorOfVectors.at(index);
}
void AddCopyOfVector(vector<T> & toBeAdded)
{
vectorOfVectors.push_back(toBeAdded);
}
iterator begin() const
{
return iter(this, 0, 0);
}
iterator end() const
{
return iterator(this, 4, 3);
}
private:
vector< vector<T> > vectorOfVectors = new vector< vector<T> >();
};

From the for loop that you posted, it seems that the instructor wants the iterator to iterate over individual elements of the vector-of-vectors (rather than, for instance, over the sub-vectors).
begin() and end() must always return iterators, not the elements pointed to by iterators (so e.g. not raw ints in this case). In certain special cases it might be possible to make these raw pointers (for example I think certain STL implementations of std::vector::iterator do this), but generally they will need to be small structs containing enough information to navigate the parent data structure.
See this page for a good description of what the for (var : collection) syntax actually translates to. This tells you that you need to design an iterator type Iter that has the following properties:
*Iter returns a T, or something convertible to T (like T&). You currently have this wrong -- your int operator* () const; (which doesn't appear to be defined?) is returning an int, for some reason.
++Iter moves the iterator to the next element.
Iter1 != Iter2 returns false when the two iterators are pointing at the same element.
How you actually create the Iter type is completely up to you. Using 2 integer indices and checking for "wraparound" on ++ as you're currently doing seems sound to me.
Other notes:
Your begin() currently calls return iter(this, 0, 0);, which won't even compile.
Please don't dynamically allocate std::vector objects, as you do in the declaration of vectorOfVectors and in AddEmptyVector(). For a start, neither will compile: for them to compile you would need to declare vectorOfVectors as a pointer to a vector of pointers to vectors, and you don't want to do that, because std::vector manages its own dynamic memory allocation internally. Avoiding ever having to call new and delete is the main reason you use a std::vector in the first place.
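Putting these points together, a flattening iterator built on two indices might look like the sketch below. It is one possible shape under stated assumptions, not the assignment's reference solution: the member names data_, outer_, inner_, the helper skip_empty and the function sum_all are invented here, and only AddCopyOfVector is taken from the question's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class VectorOfVectors {
public:
    class iterator {
    public:
        iterator(const std::vector<std::vector<T> >* vov,
                 std::size_t outer, std::size_t inner)
            : vov_(vov), outer_(outer), inner_(inner) { skip_empty(); }

        // Dereferencing yields the element itself, not a sub-vector.
        const T& operator*() const { return (*vov_)[outer_][inner_]; }

        iterator& operator++() {
            ++inner_;
            skip_empty();  // wrap around to the next non-empty sub-vector
            return *this;
        }

        bool operator!=(const iterator& other) const {
            return outer_ != other.outer_ || inner_ != other.inner_;
        }

    private:
        // Move forward until (outer_, inner_) names a real element or the end.
        void skip_empty() {
            while (outer_ < vov_->size() && inner_ >= (*vov_)[outer_].size()) {
                ++outer_;
                inner_ = 0;
            }
        }
        const std::vector<std::vector<T> >* vov_;
        std::size_t outer_;
        std::size_t inner_;
    };

    void AddCopyOfVector(const std::vector<T>& v) { data_.push_back(v); }

    iterator begin() const { return iterator(&data_, 0, 0); }
    // The end position is "one past the last sub-vector, index 0".
    iterator end() const { return iterator(&data_, data_.size(), 0); }

private:
    std::vector<std::vector<T> > data_;  // a plain member: no new/delete needed
};

// The loop shape from the assignment's main.cpp, used here to sum the values.
inline int sum_all(const VectorOfVectors<int>& vov) {
    int total = 0;
    for (int num : vov) total += num;
    return total;
}
```

Because skip_empty runs in the constructor as well as in operator++, empty sub-vectors at the front, middle or back are stepped over, and begin() equals end() for a container with no elements at all.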

Searching for suitable data structure in c++

Suggest a suitable data structure (in C++), such that the below mentioned purpose is solved:
insert an element to the end.
read and delete an element from the end.
read and delete an element from beginning.
find out if a particular element exists.
Right now I am using vectors, but finding whether a particular element exists has a high time complexity in vectors, as my elements are not sorted.
Is there a better data structure than vectors to accomplish this? If yes, which one? Please give an example.

One possibility is to use std::set (a balanced tree) or std::unordered_set (basically a hash table) and maintain the order between the elements yourself. This will give you O(log(n)) or average O(1) lookup complexity, respectively, and constant insertion/deletion at the beginning/end of the ordered storage. In Java this is called LinkedHashSet. Unfortunately the STL doesn't provide this kind of data structure out of the box, but it should be easy to implement on top of a set/unordered_set or map/unordered_map.
Here's a piece of code that illustrates the idea:
#include <deque>
#include <set>
template <typename T>
class linked_set {
private:
// Comparator of values with dereferencing.
struct value_deref_less {
bool operator()(const T *lhs, const T *rhs) const {
return *lhs < *rhs;
}
};
typedef std::set<const T*, value_deref_less> Set;
Set set_; // Used for quick lookup
std::deque<T> store_; // Used for ordered storage. deque is used instead of
// vector because pushing at either end of a deque doesn't
// invalidate pointers/references to existing elements.
public:
void push_back(const T& value) {
store_.push_back(value);
set_.insert(&store_.back());
// TODO: handle the case of duplicate elements.
}
// TODO: better provide your own iterator.
typedef typename Set::iterator iterator;
iterator find(const T& value) { return set_.find(&value); }
// ...
};

You won't be able to have both fast insertions at the two sides AND fast searches with the same container, at least if you restrict the possibilities to the STL. More exotic non-standard containers may help.
But the approach I generally choose in these cases is to use two containers. For storing the elements, the obvious option is std::deque. For searches, make a std::map<K,V> in which V refers to the element in the deque. Note that inserting at either end of a deque invalidates its iterators but leaves pointers and references to existing elements valid, so V should be a pointer rather than an iterator. This works as long as you always remember to synchronize the map and the deque (i.e. when you do an insert or delete on the deque, do that also on the map).
Another simpler/safer option, instead of storing iterators or pointers into the deque - if after a search in the map you just need the element found (you don't need to visit nearby elements, etc.) - is to have in both the deque and the map smart pointers to the actual objects (more specifically, shared_ptr). Again, you have to be careful to keep both in sync; it won't be as catastrophic if they lose sync, but the consistency of your program will probably be compromised.
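#include <cassert>
#include <deque>
#include <iostream>
#include <map>
#include <memory>
#include <string>
struct MyItem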
{
std::string name;
int something;
int another;
MyItem(const std::string &name_, int something_, int another_)
:name(name_), something(something_), another(another_) {}
};
class MyContainer
{
public:
typedef std::shared_ptr<MyItem> MyItemPtr;
void push_front(MyItemPtr item)
{
deque.push_front(item);
assert(map.find(item->name) == map.end());
map[item->name] = item;
}
void push_back(MyItemPtr item)
{
deque.push_back(item);
assert(map.find(item->name) == map.end());
map[item->name] = item;
}
MyItemPtr pop_front()
{
MyItemPtr item = deque.front();
deque.pop_front();
map.erase(item->name);
return item;
}
MyItemPtr pop_back()
{
MyItemPtr item = deque.back();
deque.pop_back();
map.erase(item->name);
return item;
}
MyItemPtr find(const std::string &name)
{
std::map<std::string, MyItemPtr>::iterator iter = map.find(name);
if (iter == map.end())
return MyItemPtr();
else
return iter->second;
}
private:
std::deque<MyItemPtr> deque;
std::map<std::string, MyItemPtr> map;
};
To use it:
MyContainer container;
MyContainer::MyItemPtr a(new MyItem("blah", 1, 2));
container.push_back(a);
MyContainer::MyItemPtr b(new MyItem("foo", 5, 6));
container.push_front(b);
MyContainer::MyItemPtr f = container.find("blah");
if (f)
std::cout << f->name << ", " << f->something << ", " << f->another;

You can keep the vector, but also use a std::set for fast queries.
The set is not enough for deleting an element from the beginning/end, as you don't really know which is the first/last element you've inserted. You could keep references to those elements, but then in order to support deletion, you would need the next ones and so on, which degrades back to using one more container.

You should start with a std::map to see if logarithmic complexity is suitable.
A B+Tree would be a bit more complex and would require your own implementation or research to find an open source implementation. But it is a reasonable choice given the requirements and the pain point you cited (searching), if the std::map still proves inadequate.
You would map an element's value to its iterator in a std::list, for example. All operations would be O(lg n) with std::map.
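The map-to-list-iterator idea can be sketched as follows. This is an illustration under assumptions rather than a finished container: the class name indexed_list is hypothetical, elements are assumed to be unique and comparable with operator<, and pop_front/pop_back assume the container is non-empty.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>

// Hypothetical class: a std::list keeps insertion order, a std::map from
// value to list iterator gives O(log n) lookup and erase-by-value.
template <class T>
class indexed_list {
    typedef std::list<T> List;
    typedef std::map<T, typename List::iterator> Index;
    List items_;   // insertion order
    Index index_;  // value -> position in items_
public:
    bool push_back(const T& v) {
        if (index_.count(v)) return false;  // reject duplicates
        items_.push_back(v);
        typename List::iterator last = items_.end();
        index_[v] = --last;                 // remember where v lives
        return true;
    }
    bool contains(const T& v) const { return index_.count(v) != 0; }
    T pop_front() {                         // precondition: not empty
        T v = items_.front();
        index_.erase(v);
        items_.pop_front();
        return v;
    }
    T pop_back() {                          // precondition: not empty
        T v = items_.back();
        index_.erase(v);
        items_.pop_back();
        return v;
    }
    // Erase by value, anywhere in the sequence.
    bool erase(const T& v) {
        typename Index::iterator it = index_.find(v);
        if (it == index_.end()) return false;
        items_.erase(it->second);  // list iterators to other elements stay valid
        index_.erase(it);
        return true;
    }
    std::size_t size() const { return items_.size(); }
};
```

std::list is used for the ordered storage because erasing or inserting a node never invalidates iterators to the other nodes, which is what makes it safe to keep them in the map.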

Use std::deque. This is a double-ended queue and it is also used as a container for standard interfaces such as std::stack.
It usually uses a quasi-linked list implementation and has amortized O(1) time complexity for insertions and deletions at edges.

If there is a lot of insert/delete a linked list would be more appropriate.
Beware that a linked list (single or double) will have quite an overhead (usually one or two pointers per element, but implementations vary).
The standard template library offers you std::list.

std::inserter with set - insert to begin() or end()? [duplicate]

I have some code that looks like this:
std::set<int> s1, s2, out;
// ... s1 and s2 are populated ...
std::set_intersection(s1.begin(), s1.end(),
s2.begin(), s2.end(),
std::inserter(out, out.end()));
I've read inserts can be done in amortized constant time if the value being inserted to the set immediately follows the iterator given as a "hint". This would obviously be beneficial when running the set intersection, especially since everything being written to out is already in sorted order.
How do I guarantee this optimal performance? When creating the std::inserter, out is empty so out.begin() == out.end(), so I can't see that it makes any difference whether I specify out.begin() or out.end() as the hint. However, if this is interpreted as inserting every element at begin(), it doesn't seem that I would get the optimum algorithmic performance. Can this be done better?

I've chosen Alexander Gessler's answer as the 'correct' answer, because it led me to this solution, which I thought I would post anyway. I've written a last_inserter(), which guarantees that the insert position is always an iterator to the last element (or begin() if empty), because set wants an iterator to the element preceding the actual insert position for best performance (so not end() - that would be one after the actual insert position).
The usage as per the original example is like this:
std::set<int> s1, s2, out;
// ... s1 and s2 are populated ...
std::set_intersection(s1.begin(), s1.end(),
s2.begin(), s2.end(),
last_inserter(out)); // note no iterator provided
This guarantees that the insert hint is always an iterator to the last element, hopefully providing best-case performance when using an output iterator to a set with a sorted range, as above.
Below is my implementation. I think it's platform specific to Visual C++ 2010's STL implementation, because it's based heavily on the existing insert_iterator, and I can only get it working by deriving from std::_Outit. If anyone knows how to make this portable, let me know:
// VC10 STL wants this to be a checked output iterator. I haven't written one, but
// this needs to be defined to silence warnings about this.
#define _SCL_SECURE_NO_WARNINGS
template<class Container>
class last_inserter_iterator : public std::_Outit {
public:
typedef last_inserter_iterator<Container> _Myt;
typedef Container container_type;
typedef typename Container::const_reference const_reference;
typedef typename Container::value_type _Valty;
last_inserter_iterator(Container& cont)
: container(cont)
{
}
_Myt& operator=(const _Valty& _Val)
{
container.insert(get_insert_hint(), _Val);
return (*this);
}
_Myt& operator=(_Valty&& _Val)
{
container.insert(get_insert_hint(), std::forward<_Valty>(_Val));
return (*this);
}
_Myt& operator*()
{
return (*this);
}
_Myt& operator++()
{
return (*this);
}
_Myt& operator++(int)
{
return (*this);
}
protected:
Container& container;
typename Container::iterator get_insert_hint() const
{
// Container is empty: no last element to insert ahead of; just insert at begin.
if (container.empty())
return container.begin();
else
{
// Otherwise return iterator to last element in the container. std::set wants the
// element *preceding* the insert position as a hint, so this should be an iterator
// to the last actual element, not end().
return (--container.end());
}
}
};
template<typename Container>
inline last_inserter_iterator<Container> last_inserter(Container& cont)
{
return last_inserter_iterator<Container>(cont);
}
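On the portability question, one possible approach is to drop the std::_Outit base class and supply the five nested typedefs that std::iterator_traits looks for. The sketch below does that and keeps the hint policy of the implementation above (an iterator to the last element). The names last_insert_iterator, portable_last_inserter and intersect_demo are invented for this example. Note also that C++11 specifies hinted insertion as amortized constant when the new element goes immediately before the hint, so on a C++11 library passing end() may be the better hint; that is not measured here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>

// Portable variant of the idea above: no vendor-specific base class,
// only the nested typedefs an output iterator is expected to provide.
template <class Container>
class last_insert_iterator {
public:
    typedef std::output_iterator_tag iterator_category;
    typedef void value_type;
    typedef std::ptrdiff_t difference_type;
    typedef void pointer;
    typedef void reference;
    typedef Container container_type;

    explicit last_insert_iterator(Container& c) : container_(&c) {}

    // Assigning a value inserts it, using the last element as the hint.
    last_insert_iterator& operator=(const typename Container::value_type& v) {
        container_->insert(hint(), v);
        return *this;
    }
    // Dereference and increment are no-ops, as in std::insert_iterator.
    last_insert_iterator& operator*() { return *this; }
    last_insert_iterator& operator++() { return *this; }
    last_insert_iterator operator++(int) { return *this; }

private:
    typename Container::iterator hint() const {
        if (container_->empty()) return container_->begin();
        typename Container::iterator it = container_->end();
        return --it;  // iterator to the last element, as in the answer above
    }
    Container* container_;  // pointer (not reference) keeps the iterator assignable
};

template <class Container>
last_insert_iterator<Container> portable_last_inserter(Container& c) {
    return last_insert_iterator<Container>(c);
}

// Usage with the set_intersection call from the question.
inline std::set<int> intersect_demo() {
    int a[] = {1, 2, 3, 4, 5};
    int b[] = {3, 4, 5, 6};
    std::set<int> s1(a, a + 5), s2(b, b + 4), out;
    std::set_intersection(s1.begin(), s1.end(), s2.begin(), s2.end(),
                          portable_last_inserter(out));
    return out;
}
```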

You could use a custom functor instead of std::inserter and re-call out.end() every time a new element is inserted.
Alternatively, if your values are sorted in descending order, out.begin() will be fine.

According to http://gcc.gnu.org/onlinedocs/gcc-4.8.0/libstdc++/api/a01553_source.html
insert_iterator&
operator=(typename _Container::value_type&& __value)
{
iter = container->insert(iter, std::move(__value));
++iter;
return *this;
}
Here iter is initially the iterator you passed to std::inserter. After each insertion, iter points one past the value you just inserted, so if you're inserting in order, it should be optimally efficient.
