Moving elements out of an associative container - c++

Just for fun, I have implemented the simplest sorting algorithm imaginable:
template<typename Iterator>
void treesort(Iterator begin, Iterator end)
{
typedef typename std::iterator_traits<Iterator>::value_type element_type;
// copy data into the tree
std::multiset<element_type> tree(begin, end);
// copy data out of the tree
std::copy(tree.begin(), tree.end(), begin);
}
It's only about 20 times slower than std::sort for my test data :)
Next, I wanted to improve the performance with move semantics:
template<typename Iterator>
void treesort(Iterator begin, Iterator end)
{
typedef typename std::iterator_traits<Iterator>::value_type element_type;
// move data into the tree
std::multiset<element_type> tree(std::make_move_iterator(begin),
std::make_move_iterator(end));
// move data out of the tree
std::move(tree.begin(), tree.end(), begin);
}
But this did not affect the performance in a significant way, even though I am sorting std::strings.
Then I remembered that associative containers are constant from the outside, that is, std::move and std::copy will do the same thing here :( Is there any other way to move the data out of the tree?

std::set and std::multiset only provide const access to their elements. This means you cannot move something out of the set. If you could move items out (or modify them at all), you could break the set by changing the sort order of the items. So C++11 forbids it.
So your attempt to use the std::move algorithm will just invoke the copy assignment operator.
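For readers on a newer standard than the one discussed here: C++17 added node extraction to the associative containers, which does allow an element to be moved out of a set. Below is a minimal sketch, assuming a C++17 compiler; the name treesort_extract is made up for illustration and is not part of the original question.

```cpp
#include <cassert>
#include <iterator>
#include <set>
#include <string>   // only needed by the usage example
#include <utility>
#include <vector>   // only needed by the usage example

// Sketch only: requires C++17 for std::multiset::extract.
template <typename Iterator>
void treesort_extract(Iterator begin, Iterator end)
{
    using element_type = typename std::iterator_traits<Iterator>::value_type;

    // move data into the tree
    std::multiset<element_type> tree(std::make_move_iterator(begin),
                                     std::make_move_iterator(end));

    // Unlink the smallest node each time; the node handle gives
    // non-const access to the element, so it can be moved from.
    while (!tree.empty()) {
        auto node = tree.extract(tree.begin());
        *begin++ = std::move(node.value());
    }
}
```

Calling treesort_extract(v.begin(), v.end()) on a std::vector<std::string> leaves v sorted. Whether this is actually faster than the copying version would have to be measured.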

I believe you could make a custom allocator for the multiset to use (3rd template argument) which actually moves the elements in its destroy method back to the user's container. Then erase each element in the set and during its destruction it should move your string back to the original container. I think the custom allocator would need to have 2 phase construction (pass it the begin iterator passed to your treesort function to hold as a member, but not during construction because it has to be default constructible).
Obviously this would be bizarre and is a silly workaround for not having a pop method in set/multiset. But it should be possible.

I like Dave's idea of a freaky allocator that remembers the source of each move-constructed object and automatically moves back on destruction; I'd never thought of doing that!
But here's an answer closer to your original attempt:
template<typename Iterator>
void treesort_mv(Iterator begin, Iterator end)
{
typedef typename std::iterator_traits<Iterator>::value_type element_type;
// move the elements to tmp storage
std::vector<element_type> tmp(std::make_move_iterator(begin),
std::make_move_iterator(end));
// fill the tree with sorted references
typedef std::reference_wrapper<element_type> element_ref;
std::multiset<element_ref, std::less<element_type>> tree(tmp.begin(), tmp.end());
// move data out of the vector, in sorted order
std::move(tree.begin(), tree.end(), begin);
}
This sorts a multiset of references, so they don't need to be moved out of the tree.
However, move assignment is not necessarily safe for self-assignment, so I move the elements into a temporary vector first; when they are assigned back into the original range, no element is ever assigned to itself.
This is marginally faster than your original version in my tests. It probably loses efficiency because it has to allocate the vector as well as all the tree nodes. That and the fact that my compiler uses COW strings so moving isn't much faster than copying anyway.

Related

Can one capture the reallocations of std::vector?

I know that push_back() on an std::vector can cause reallocation and therefore invalidate iterators into the vector. Is there a way of installing a hook on reallocations (which presumably happen very seldom) so that I can adjust iterators appropriately?
Ideally something like this:
class hook; // forward
std::vectorwithhook<T,hook> v;
auto pointer = v.end();
template<> class hook<T> {
void operator()(T *old, T *new) { pointer += new-old; }
}
and then I can push_back() on v and play with pointer with no fear.
IMHO the easiest way to do this would be to have your vectorwithhook::push_back return the new end() and use it like:
pointer = v.push_back(new_item);
NOTE: you would have to do this for all members that change content of the vector (e.g. emplace_back, pop_back, insert etc...)
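The wrapper idea can be sketched roughly as follows. This is only an illustration: the class name vector_with_end is hypothetical, and only push_back and pop_back are wrapped, whereas a real version would need every mutating member.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: each mutating member returns the end()
// iterator that is valid after the modification.
template <typename T>
class vector_with_end {
public:
    typedef typename std::vector<T>::iterator iterator;

    iterator push_back(const T& value) {
        data_.push_back(value);   // may reallocate
        return data_.end();       // always refers to the current storage
    }

    iterator pop_back() {
        data_.pop_back();
        return data_.end();
    }

    iterator begin() { return data_.begin(); }
    iterator end() { return data_.end(); }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<T> data_;
};
```

The caller then writes pointer = v.push_back(new_item); after every insertion, so the stored iterator is refreshed even when a reallocation happened.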
Alternatively, it should also be possible by creating your own allocator type, which will take a reference to the iterator and the container in its constructor and update the iterator every time allocator::allocate(...) or allocator::deallocate(...) is called. NOTE that this goes against the principles of the STL, which was designed to keep iterators, containers and allocators separate from one another...
P.S. none of this sounds like a good idea tbh, I would think about reworking the code to avoid keeping the end() iterator instead of doing any of this.

How to get a guaranteed invalid iterator (for vector)?

I have a container class which is essentially (showing relevant parts only)
class Foo {
typedef std::map<int, std::vector<int>> data_t;
data_t data;
struct iterator {
data_t::iterator major_it;
data_t::mapped_type::iterator minor_it;
// ...
};
iterator begin();
iterator end() {
return { data.end(), /* XXX */ };
}
};
As you see, I want to have an iterator implemented for this container. The iterator shall go through each node in the map and iterate through each element in the vector the node refers to. I have no problem implementing the iterator, but I had some trouble implementing end() for the container.
The iterator consists of two levels of iterators, and for the past-the-end iterator, major_it will have to be data.end(), but I won't have anything to initialize minor_it with.
In the same vein, begin() will also be broken when the map is empty.
Any idea?
std::vector::iterator is always invalid when value-initialized:
std::vector<…>::iterator invalid_iter_value = {};
Incidentally, it's unusable when default-initialized (i.e. uninitialized), which might be good enough for you. If major_it is already at the end, then simply don't access minor_it.
std::vector<…>::iterator unusable_iter_value;
However, note that it's also illegal to copy the default-initialized object, so value-initialization is a good idea unless you're customizing the copy constructor and copy assignment operator.
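Applied to the Foo class from the question, the advice could look roughly like the sketch below. It is an assumption-laden illustration rather than the asker's real class: the extra member major_end and the helper skip_exhausted are hypothetical names, data is made public for brevity, and minor_it is value-initialized and never inspected once major_it has reached the end.

```cpp
#include <cassert>
#include <map>
#include <vector>

class Foo {
public:
    typedef std::map<int, std::vector<int>> data_t;

    struct iterator {
        data_t::iterator major_it;
        data_t::iterator major_end;              // hypothetical extra member
        data_t::mapped_type::iterator minor_it;  // meaningful only while major_it != major_end

        int& operator*() const { return *minor_it; }

        iterator& operator++() {
            ++minor_it;
            skip_exhausted();
            return *this;
        }

        bool operator==(const iterator& other) const {
            if (major_it != other.major_it) return false;
            // At the end, minor_it is value-initialized and is not compared.
            return major_it == major_end || minor_it == other.minor_it;
        }
        bool operator!=(const iterator& other) const { return !(*this == other); }

        // Step over exhausted or empty vectors, or become the end iterator.
        void skip_exhausted() {
            while (major_it != major_end && minor_it == major_it->second.end()) {
                ++major_it;
                minor_it = (major_it != major_end)
                    ? major_it->second.begin()
                    : data_t::mapped_type::iterator();
            }
        }
    };

    iterator begin() {
        iterator it{data.begin(), data.end(), {}};
        if (it.major_it != it.major_end) {
            it.minor_it = it.major_it->second.begin();
            it.skip_exhausted();
        }
        return it;
    }

    iterator end() { return iterator{data.end(), data.end(), {}}; }

    data_t data;  // public only to keep the sketch short
};
```

With this layout begin() on an empty map simply equals end(), which also covers the empty-map case raised in the question.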

Trying to wrap std containers to store rvalue references (like unique_ptr, but on the stack)

I'm trying to do strange things again.
Okay, here's the general idea. I want a std::list (and vector and so on) that actually owns the objects it contains. I want to move the values into it, and access them by reference.
Example using a list of unique_ptr:
using namespace std;
list<unique_ptr<T>> items; // T is whatever type
items.push_back(make_unique<T>(...));
items.push_back(make_unique<T>(...));
for ( unique_ptr<T>& item : items ) // by reference: a unique_ptr cannot be copied
item->dosomething();
With me so far? Good. Now, let's do it with stack semantics and rvalue references. We can't just use a list<T&&> for obvious reasons, so we'd have to make a new class:
using namespace std;
owninglist<T> items;
items.push_back(T());
items.push_back(T());
for ( T& item : items )
item.dosomething();
Of course, I might want an owningstack or owningvector as well, so ideally we want it to be templated:
owning<std::list<T>> items;
The owning<U<T>> class should inherit whatever push_back() and pop_front() etc functions the underlying collection has. Presumably to achieve that, I'd need to code a generic base class, and derive explicit specialisations for the collections that have unusual functions:
template<typename T> class owning<std::queue<T>> : public owningbase<T> {
public:
void push_front() { ... }
};
I'm getting stuck on the iterators. The begin() and end() functions should return an iterator that works the same as the underlying collection's iterator would, except with an operator*() that returns the item by lvalue reference instead of by value.
We'd need some way to transfer ownership of an item out of the list again. Perhaps the iterator could have an operator~ that returns the item as an rvalue, deletes the item from the list, and invalidates the iterator?
Of course, all this is assuming the underlying std::list (or whatever) can be convinced to take an rvalue. If push_back() copies the value in as an lvalue, then none of this is going to work. Would I be better off coding a container from scratch? If I did, is there some way to put the majority of the code for list, queue, stack and vector into a single base class, to save rewriting pretty much the same class four times over?
Perhaps I could introduce an intermediate class, some kind of wrapper? So owned<list<T>> could inherit from list<refwrapper<T>> or something? I know boost has a reference_wrapper, but I'm not sure it fits this scenario.
If you want to avoid copying elements around you can use std::move.
So if you have a std::list you can populate it with values by moving them in:
SomeBigObject sbo;
std::list<SomeBigObject> list;
list.push_back(SomeBigObject()); // SomeBigObject() is an rvalue, so it is moved
list.push_back(std::move(sbo)); // sbo is an lvalue, so you have to move it explicitly
// For construction you can also use std::list::emplace
list.emplace(list.end()); // construct the value directly at the end of the list
For accessing them you can simply use the range-based for loop:
for(auto& i :list)
...
If you want to move them out of the container you can also use std::move.
The object is moved out of the container but the remains will still be in the container,
so you have to erase them:
for(auto it = list.begin(); it != list.end();)
{
// the value of *it is moved into obj;
// an empty value of "SomeBigObject" will remain so erase it from the list
SomeBigObject obj = std::move(*it);
it = list.erase(it);
// do something with "obj"
...
}
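To show that a plain std::list already owns move-only objects, here is a small sketch. Resource and take are hypothetical names introduced only for this example; Resource stands in for whatever T is.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <utility>

// Hypothetical move-only type standing in for T.
struct Resource {
    explicit Resource(int v) : payload(new int(v)) {}
    std::unique_ptr<int> payload;  // makes Resource movable but not copyable
};

// Moves the element at `it` out of the list, then erases the emptied node.
template <typename T>
T take(std::list<T>& items, typename std::list<T>::iterator it)
{
    T value = std::move(*it);
    items.erase(it);
    return value;
}
```

Elements go in with items.push_back(Resource(1)), items.push_back(std::move(r)) or items.emplace_back(3), are accessed with for (Resource& r : items), and ownership is handed back with take(items, items.begin()).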

How to get std::set pointer to the raw data?

I want to pass the whole set as an argument to a function, like the way we do for arrays (i.e &array[0]). I am not able to figure out how to get the pointer to the raw data for a set.
It is not possible to do it in the same way as an array because std::set is not required to have its data arranged in a contiguous block of memory. It is a binary tree so it most likely consists of linked nodes. But you can pass it by reference, or use the begin() and end() iterators.
template <typename T>
void foo(const std::set<T>& s);
template <typename Iterator>
void bar(Iterator first, Iterator last);
std::set<int> mySet = ....;
foo(mySet);
bar(mySet.begin(), mySet.end());
You can't get a pointer to the raw data in the same sense as you'd do for an array, because a set doesn't reside in contiguous memory.
I want to pass the whole set as an argument to a function
Pass it by reference. There's no memory overhead (if that's what you were worrying about):
void foo(std::set<int>& x);
You will have to iterate through the std::set to extract all the elements of the std::set.
Unlike std::vector and arrays, there is no requirement imposed by the standard that std::set elements be located in contiguous memory.
Pass a reference or pointer to the std::set into the function and extract the data inside the function by iterating over it.
It depends what you mean by:
"I want to pass the whole set as an argument to a function"
std::set<int> data;
// fill data;
You can pass the set by reference:
plop(data); // void plop(std::set<int>& data); // passing by reference would be the C++ way
Alternatively you can pass iterators.
This abstracts away the type of container you are using and thus allows the writers of plop() to concentrate on the algorithm. In this case the iterators behave in the same way as pointers (in C++ code).
plop(data.begin(), data.end()); // template<typename I> void plop(I begin, I end);
Alternatively do you mean you want to pass the data in a set to a C like function.
In this case you need to pass a pointer (as that is the only thing C can understand). Unfortunately you can not pass a pointer into a set directly as that has no real meaning. But you can copy the data into a vector and from there into a C program:
std::vector<int> datavec(data.begin(), data.end());
plop(&datavec[0], datavec.size()); // void plop(int* data, std::size_t size);
This works because vector stores the data in contiguous memory.
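A compilable version of that last approach might look like the sketch below. The function sum_ints is a hypothetical stand-in for the C-like function and sum_set is an illustrative helper; the sketch uses data() (C++11) instead of &datavec[0] so that an empty set is handled safely.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical C-style function: sums `size` ints starting at `data`.
long sum_ints(const int* data, std::size_t size)
{
    long total = 0;
    for (std::size_t i = 0; i < size; ++i) {
        total += data[i];
    }
    return total;
}

// Copies the set into contiguous storage and hands that to the C-style API.
long sum_set(const std::set<int>& s)
{
    std::vector<int> buffer(s.begin(), s.end());
    return sum_ints(buffer.data(), buffer.size());
}
```

The copy costs one allocation and a linear pass over the set, which is the price of getting a contiguous buffer.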

Hybrid vector/list container?

I'm in need of a container that has the properties of both a vector and a list. I need fast random access to elements within the container, but I also need to be able to remove elements in the middle of the container without moving the other elements. I also need to be able to iterate over all elements in the container, and see at a glance (without iteration) how many elements are in the container.
After some thought, I've figured out how I could create such a container, using a vector as the base container, and wrapping the actual stored data within a struct that also contained fields to record whether the element was valid, and pointers to the next/previous valid element in the vector. Combined with some overloading and such, it sounds like it should be fairly transparent and fulfill my requirements.
But before I actually work on creating yet another container, I'm curious if anyone knows of an existing library that implements this very thing? I'd rather use something that works than spend time debugging a custom implementation. I've looked through the Boost library (which I'm already using), but haven't found this in there.
If the order does not matter, I would just use a hash table mapping integers to pointers. std::tr1::unordered_map<int, T *> (or std::unordered_map<int, unique_ptr<T>> if C++0x is OK).
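Strictly speaking, rehashing a std::unordered_map invalidates iterators but not pointers or references to its elements, so storing pointers is a conservative choice rather than a requirement. Either way, the hash table supports very fast insertion / lookup / deletion. Iteration is fast too, but the elements will come out in an indeterminate order.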
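A rough sketch of the hash-table suggestion, assuming C++11 is available. The class name SparseStore and its members are hypothetical; integer ids play the role of the indices.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>   // only needed by the usage example
#include <unordered_map>
#include <utility>

// Hypothetical container: integer ids map to heap-allocated elements.
template <typename T>
class SparseStore {
public:
    // Stores the value and returns the id under which it can be found.
    int insert(T value) {
        int id = next_id_++;
        items_.emplace(id, std::unique_ptr<T>(new T(std::move(value))));
        return id;
    }

    // Returns a pointer to the element, or nullptr if the id is unknown.
    T* find(int id) {
        auto it = items_.find(id);
        return it == items_.end() ? nullptr : it->second.get();
    }

    // Removes the element; other elements are not touched.
    bool erase(int id) { return items_.erase(id) != 0; }

    // Number of live elements, available without iterating.
    std::size_t size() const { return items_.size(); }

private:
    std::unordered_map<int, std::unique_ptr<T>> items_;
    int next_id_ = 0;
};
```

Erasing one id leaves pointers to the remaining elements valid, and size() answers the "how many elements" question directly.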
Alternatively, I think you can implement your own idea as a very simple combination of a std::vector and a std::list. Just maintain both a list<T> my_list and a vector<list<T>::iterator> my_vector. To add an object, push it onto the back of my_list and then push its iterator onto my_vector. (Set an iterator to my_list.end() and decrement it to get the iterator for the last element.) To look up, index into the vector and just dereference the iterator. To delete, erase from the list (which you can do by iterator) and set the location in the vector to my_list.end().
std::list guarantees the elements within will not move when you delete them.
[update]
I am feeling motivated. First pass at an implementation:
#include <vector>
#include <list>
template <typename T>
class NairouList {
public:
typedef std::list<T> list_t;
typedef typename list_t::iterator iterator;
typedef std::vector<iterator> vector_t;
NairouList() : my_size(0)
{ }
void push_back(const T &elt) {
my_list.push_back(elt);
iterator i = my_list.end();
--i;
my_vector.push_back(i);
++my_size;
}
T &operator[](typename vector_t::size_type n) {
if (my_vector[n] == my_list.end())
throw "Dave's not here, man";
return *(my_vector[n]);
}
void remove(typename vector_t::size_type n) {
my_list.erase(my_vector[n]);
my_vector[n] = my_list.end();
--my_size;
}
size_t size() const {
return my_size;
}
iterator begin() {
return my_list.begin();
}
iterator end() {
return my_list.end();
}
private:
list_t my_list;
vector_t my_vector;
size_t my_size;
};
It is missing some Quality of Implementation touches... Like, you probably want more error checking (what if I delete the same element twice?) and maybe some const versions of operator[], begin(), end(). But it's a start.
That said, for "a few thousand" elements a map will likely serve at least as well. A good rule of thumb is "Never optimize anything until your profiler tells you to".
Looks like you might want a std::deque. Removing an element is not as efficient as with a std::list, but deques are typically built from non-contiguous memory "blocks" that are managed via an additional pointer array/vector internal to the container (each "block" would be an array of N elements). Erasing in the middle of a deque still shifts elements, but only those on the shorter side of the erased position, so it never moves more than about half of the elements.
Edit: On second thought, and after reviewing some of the comments, while I think a std::deque could work, I think a std::map or std::unordered_map will actually be better for you since it will allow the array-syntax indexing you want, yet give you fast removal of elements as well.