Accumulating many vectors into a single container w/o copying

Is it possible in C++11 to accumulate many std::vectors, each returned by a given function (whose API I cannot change), into a std container without copying any element?
std::vector<int> make_vect();
container acc; // what is container?
do {
acc.append(std::move(make_vect())); // how to implement this?
} while(acc.size() < n);
Note 1: elements must not be copied even if they have no move constructor or move assignment operator, such as int in the example. So you can move a chunk of elements (by copying a pointer), but not individual elements.
Note 2: the container must allow iteration through all accumulated elements using a single iterator. So std::vector<std::vector<>> or similar is not allowed.
Clearly, it is straightforward to write some container allowing this or to use std::list<std::vector<>> and provide your own iterator, but does the std library provide the desired functionality without such user-written additions?
It seems that the requested functionality is nothing particularly outlandish and I'm surprised how hard (if not impossible) it is even with C++11.

TL;DR I don't think a stock Standard Container can do what you ask. Here's why.
Remember that move semantics for containers are efficient because they are implemented as scope-bound handles to dynamically allocated memory. Moving a container is implemented as copying the handle, while not touching the contained elements.
Your first (explicit) constraint is not to copy any element of any of the containers. That necessitates copying the handles into your putative acc_container. In other words, you want an acc_container<std::vector<T>>. Any of the Standard Containers will allow you to do that efficiently, no matter how big the individual T elements are.
Your second (implicit, inferred from the comments) constraint is that you want a uniform interface over all the elements of all individual vectors. In other words, you would like to use this as acc_container<T>. This requires an extra level of indirection in the iterators of your acc_container, where the iterator detects that it has reached the end of the current vector<T> and jumps to the beginning of the next vector<T>.
Such a container does not exist in the Standard Library.
The easiest work-around is to use a std::vector<std::vector<T>> (to avoid copying T elements), and write your own iterator adaptors (e.g. using boost::indirect_iterator, to provide iteration over T elements).
Unfortunately, even if you provide non-member begin() / end() functions that initialize these indirect iterators from the member .begin() / .end(), range-for will not use ADL to look up these functions because it will prefer the existing member functions .begin() / .end(). Furthermore, you will not be able to e.g. insert() a T element directly into your compound container, unless you also provide a non-member insert() (and similarly for other functionality).
So if you want a bona fide compound container, with range-for support and a native interface of member functions, you need to write one yourself (with std::vector<std::vector<T>> as back-end, and the entire std::vector<T> interface written on top of it). Maybe you can suggest it on the Boost mailing list as a nice project.
UPDATE: here's a link to an old paper by Matt Austern on Segmented Iterators and Hierarchical Algorithms that shows some performance benefits from this approach. The downside is that you also need to make standard algorithms aware of these iterators.
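To make the "extra level of indirection" concrete, here is a minimal sketch of such a hand-written container. Everything in it (the name chunked_vector, its append() member, the choice of std::deque as outer storage) is an assumption made for illustration, not something the Standard Library or Boost provides. It only offers forward, read-only iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical container: stores whole chunks and never copies single elements.
template <typename T>
class chunked_vector {
    using storage = std::deque<std::vector<T>>;  // push_back never relocates earlier chunks
    storage chunks_;
    std::size_t size_ = 0;

public:
    class const_iterator {
        const storage* owner_;
        std::size_t outer_;  // which chunk
        std::size_t inner_;  // position inside that chunk

    public:
        using iterator_category = std::forward_iterator_tag;
        using value_type = T;
        using difference_type = std::ptrdiff_t;
        using pointer = const T*;
        using reference = const T&;

        const_iterator() : owner_(nullptr), outer_(0), inner_(0) {}
        const_iterator(const storage* owner, std::size_t outer)
            : owner_(owner), outer_(outer), inner_(0) {}

        reference operator*() const {
            assert(owner_ && outer_ < owner_->size());
            return (*owner_)[outer_][inner_];
        }
        pointer operator->() const { return &**this; }

        // The extra indirection: at the end of one chunk, jump to the next.
        const_iterator& operator++() {
            if (++inner_ == (*owner_)[outer_].size()) {
                ++outer_;
                inner_ = 0;
            }
            return *this;
        }
        const_iterator operator++(int) {
            const_iterator old = *this;
            ++*this;
            return old;
        }

        friend bool operator==(const const_iterator& a, const const_iterator& b) {
            return a.outer_ == b.outer_ && a.inner_ == b.inner_;
        }
        friend bool operator!=(const const_iterator& a, const const_iterator& b) {
            return !(a == b);
        }
    };

    // Steals the chunk's buffer. Empty chunks are skipped so the iterator
    // never has to step over them.
    void append(std::vector<T>&& chunk) {
        if (chunk.empty()) return;
        size_ += chunk.size();
        chunks_.push_back(std::move(chunk));
    }

    std::size_t size() const { return size_; }
    const_iterator begin() const { return const_iterator(&chunks_, 0); }
    const_iterator end() const { return const_iterator(&chunks_, chunks_.size()); }
};
```

With something like this, the loop from the question would call acc.append(make_vect()), and a range-for over acc would visit every element in order. Mutable iterators, insert() and the rest of the std::vector<T> interface would still have to be written on top.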

Related

No find member in std::vector

Is there any particular reason why std::vector does not have a member function find? Instead you have to call std::find (and #include <algorithm>).
The reason why I ask is that I think it would be a good thing to be able to change container class in some piece of implementation without having to change the code wherever the container is accessed. Say I replace an std::vector where the implementation uses std::find with an std::map. Then I also have to replace the call of std::find with a call to the member find, unless I want to keep the linear complexity of std::find. Why not just have a member find for all container classes, which finds an element with whatever algorithm is best suited for each container?

Conceptually, std::find only requires two InputIterators to work and not an std::vector. As such, one implementation works for all containers including STL containers, and standard arrays, and anything that can supply an InputIterator, including for example an istream_iterator() - nice!
So, instead of providing one find() method for every container (and take into account that for some it might not be possible, like standard arrays), one single, generic find() function is provided for all. This likely makes your code more resilient to change than adding a find() method for each container since it provides a consistent interface to search in any collection: an input stream from console, network etc., or just a basic array. This is an important aspect of the STL generic design philosophy: you can search for elements in any collection/range defined by two InputIterators.
The downside, as you note, is that in some cases, better performance may be achieved using the container's own method, which can make special assumptions to improve performance (similarly for list::remove, unordered_map::erase/find() etc.). For this reason, containers can provide (and this is a well known design feature of STL) a method specifically for performance reasons: for example a std::unordered_map does not require one to iterate through the entire map to find an element.
In summary, since the generic std::find is already efficient for a vector, there is no need for a member function, which would only encourage less portable code.
For all things STL related see The C++ Standard Library - A Tutorial and Reference, 2nd Edition
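As a small illustration of the point about one generic std::find serving every kind of range, here is a sketch. The helper contains() and the function find_demo() are names made up for this example; only std::find and the iterator types come from the Standard Library.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <vector>

// Hypothetical helper: true if value occurs in [first, last).
template <typename InputIt, typename T>
bool contains(InputIt first, InputIt last, const T& value) {
    return std::find(first, last, value) != last;
}

// The same call shape works for three unrelated sources of ints.
inline void find_demo() {
    std::vector<int> vec{4, 8, 15};
    int arr[] = {16, 23, 42};
    std::istringstream in("7 8 9");

    assert(contains(vec.begin(), vec.end(), 8));            // container
    assert(contains(std::begin(arr), std::end(arr), 42));   // built-in array
    assert(contains(std::istream_iterator<int>(in),         // input stream
                    std::istream_iterator<int>(), 9));
    assert(!contains(vec.begin(), vec.end(), 42));
}
```

For a std::map or std::unordered_map the member find() remains the better choice, because std::find would walk the elements one by one.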

std::find is the generic solution. Some classes need a specialized version of this function, which is why they provide it as a member. (As I remember, Meyers said that if a class has its own member function, you should use it instead of the globally defined one.)

C++ std::vector<>::iterator is not a pointer, why?

Just a little introduction, with simple words.
In C++, iterators are "things" on which you can write at least the dereference operator *it and the increment operator ++it; for more advanced bidirectional iterators, also the decrement --it; and last but not least, for random access iterators, the index operator it[n] and possibly addition and subtraction.
Such "things" in C++ are objects of types with the according operator overloads, or plain and simple pointers.
std::vector<> is a container class that wraps a contiguous array, so a pointer as iterator makes sense. On the net, and in some literature, you can find vector.begin() used as a pointer.
The rationale for using a pointer is less overhead, higher performance, especially if an optimizing compiler detects iteration and does its thing (vector instructions and stuff). Using iterators might be harder for the compiler to optimize.
Knowing this, my question is why modern STL implementations, let's say MSVC++ 2013 or libstdc++ in Mingw 4.7, use a special class for vector iterators?

You're completely correct that vector::iterator could be implemented by a simple pointer (see here) -- in fact the concept of an iterator is based on that of a pointer to an array element. For other containers, such as map, list, or deque, however, a pointer won't work at all. So why is this not done? Here are three reasons why a class implementation is preferable over a raw pointer.
Additional functionality: implementing an iterator as a separate type allows additional functionality (beyond what is required by the standard), for example (added in edit following Quentin's comment) the possibility to add assertions when dereferencing an iterator in debug mode.
Overload resolution: if the iterator were a pointer T*, it could be passed as a valid argument to a function taking T*, while this would not be possible with an iterator type. Thus making std::vector<>::iterator a pointer in fact changes the behaviour of existing code. Consider, for example,
template<typename It>
void foo(It begin, It end);
void foo(const double*a, const double*b, size_t n=0);
std::vector<double> vec;
foo(vec.begin(), vec.end()); // which foo is called?
Argument-dependent lookup (ADL; pointed out by juanchopanza): if you make an unqualified call, ADL ensures that functions in namespace std will be searched only if the arguments are types defined in namespace std. So,
std::vector<double> vec;
sort(vec.begin(), vec.end()); // calls std::sort
sort(vec.data(), vec.data()+vec.size()); // fails to compile
std::sort would not be found if vector<>::iterator were a mere pointer.
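Whether the unqualified sort call above compiles depends on how a given library defines its vector iterator, so here is a sketch of the same ADL rule using only user-defined names. The namespace lib and everything in it are hypothetical and exist purely for this illustration.

```cpp
#include <cassert>

namespace lib {
    struct iterator_like { int* p; };  // class type declared in namespace lib

    inline int first_value(iterator_like it) { return *it.p; }
    inline int first_value_ptr(int* p) { return *p; }
}

inline int adl_demo() {
    int value = 42;
    lib::iterator_like it{&value};

    // Unqualified call: ADL searches namespace lib because the argument's
    // type is declared there.
    int a = first_value(it);

    // A raw int* has no associated namespace, so this call must be
    // qualified; an unqualified first_value_ptr(&value) would not compile.
    int b = lib::first_value_ptr(&value);

    assert(a == b);
    return a + b;
}
```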

The implementation of the iterator is implementation defined, so long as it fulfills the requirements of the standard. It could be a pointer for vector; that would work. There are several reasons for not using a pointer:
consistency with other containers.
debug and error checking support
overload resolution, class based iterators allow for overloads to work differentiating them from plain pointers
If all the iterators were pointers, then ++it on a map would not increment it to the next element, since the memory is not required to be contiguous. Beyond the contiguous memory of std::vector, most standard containers require "smarter" pointers - hence iterators.
The physical requirements of the iterator dovetail very well with the logical requirement that movement between elements is a well-defined "idiom" of iterating over them, not just moving to the next memory location.
This was one of the original design requirements and goals of the STL; the orthogonal relationship between the containers, the algorithms and connecting the two through the iterators.
Now that they are classes, you can add a whole host of error checking and sanity checks to debug code (and then remove it for more optimised release code).
Given the positive aspects class-based iterators bring, why not just use pointers for std::vector iterators anyway? Consistency. Early implementations of std::vector did indeed use plain pointers, and a pointer still works for vector. But once you have to use classes for the other iterators, given the positives they bring, applying the same approach to vector becomes a good idea.

The rationale for using a pointer is less overhead, higher
performance, especially if an optimizing compiler detects iteration
and does its thing (vector instructions and stuff). Using iterators
might be harder for the compiler to optimize.
It might be, but it isn't. If your implementation is not utter shite, a struct wrapping a pointer will achieve the same speed.
With that in mind, it's simple to see that simple benefits like better diagnostic messages (naming the iterator instead of T*), better overload resolution, ADL, and debug checking make the struct a clear winner over the pointer. The raw pointer has no advantages.
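For a feel of what "a struct wrapping a pointer" with debug checking might look like, here is a rough sketch. The class checked_iterator and the helper sum_checked are invented for this example and are not how any particular standard library does it. This version always stores the end pointer; a real release build would presumably keep only the raw pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical debug-flavoured iterator: a thin wrapper around T*.
template <typename T>
class checked_iterator {
    T* pos_;
    T* end_;  // only needed for the checks below

public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = T;
    using difference_type = std::ptrdiff_t;
    using pointer = T*;
    using reference = T&;

    checked_iterator() : pos_(nullptr), end_(nullptr) {}
    checked_iterator(T* pos, T* end) : pos_(pos), end_(end) {}

    reference operator*() const {
        assert(pos_ != end_ && "dereferencing a past-the-end iterator");
        return *pos_;
    }
    checked_iterator& operator++() {
        assert(pos_ != end_ && "incrementing past the end");
        ++pos_;
        return *this;
    }
    checked_iterator operator++(int) {
        checked_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const checked_iterator& a, const checked_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const checked_iterator& a, const checked_iterator& b) {
        return !(a == b);
    }
};

// Sums [first, last) through the wrapper instead of through raw pointers.
inline int sum_checked(int* first, int* last) {
    int total = 0;
    for (checked_iterator<int> it(first, last), end(last, last); it != end; ++it)
        total += *it;
    return total;
}
```

With assertions compiled out (NDEBUG), each member function reduces to a single pointer operation, which is why an optimizer can be expected to treat it like a plain pointer.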

The rationale for using a pointer is less overhead, higher
performance, especially if an optimizing compiler detects iteration
and does its thing (vector instructions and stuff). Using iterators
might be harder for the compiler to optimize.
This is the misunderstanding at the heart of the question. A well-formed class implementation has no overhead and identical performance, because the compiler can optimize away the abstraction and treat the iterator class as just a pointer in the case of std::vector.
That said,
MSVC++ 2013 or libstdc++ in Mingw 4.7, use a special class for vector
iterators
because their authors take the view that adding a layer of abstraction, a class iterator that defines the concept of iteration over a std::vector, is more beneficial than using an ordinary pointer for this purpose.
Abstractions have a different set of costs vs benefits, typically added design complexity (not necessarily related to performance or overhead) in exchange for flexibility, future proofing, and hiding implementation details. The above implementations decided this added complexity is an appropriate cost to pay for the benefits of having an abstraction.

Because STL was designed with the idea that you can write something that iterates over an iterator, no matter whether that iterator's just equivalent to a pointer to an element of memory-contiguous arrays (like std::array or std::vector) or something like a linked list, a set of keys, something that gets generated on the fly on access etc.
Also, don't be fooled: in the vector case, dereferencing might (without debug options) just break down to an inlinable pointer dereference, so there wouldn't even be overhead after compilation!

I think the reason is plain and simple: originally std::vector was not required to be implemented over contiguous blocks of memory.
So the interface could not just present a pointer.
source: https://stackoverflow.com/a/849190/225186
This was fixed later and std::vector was required to be in contiguous memory, but it was probably too late to make std::vector<T>::iterator a pointer.
Maybe some code already depended on iterator to be a class/struct.
Interestingly, I found implementations of std::vector<T>::iterator where it = {} is valid and generates a "null" iterator (just like a null pointer).
std::vector<double>::iterator it = {};
assert( &*it == nullptr );
Also, std::array<T>::iterator and std::initializer_list<T>::iterator are pointers T* in the implementations I saw.
A plain pointer as std::vector<T>::iterator would be perfectly fine in my opinion, in theory.
In practice, being a built-in has observable effects for metaprogramming (e.g. std::vector<T>::iterator::difference_type wouldn't be valid; yes, one should have used iterator_traits).
Not being a raw pointer has the (very) marginal advantage of disallowing nullability (it == nullptr) or default constructibility, if you are into that (an argument that doesn't matter from a generic programming point of view).
At the same time the dedicated class iterators had a steep cost in other metaprogramming aspects, because if ::iterator were a pointer one wouldn't need to have ad hoc methods to detect contiguous memory (see contiguous_iterator_tag in https://en.cppreference.com/w/cpp/iterator/iterator_tags) and generic code over vectors could be directly forwarded to legacy C-functions.
For this reason alone I would argue that iterator-not-being-a-pointer was a costly mistake. It just made it hard to interact with C-code (as you need another layer of functions and type detection to safely forward stuff to C).
Having said this, I think we could still make things better by allowing automatic conversions from iterators to pointers and perhaps explicit (?) conversions from pointer to vector::iterators.

I got around this pesky obstacle by dereferencing and immediately referencing the iterator again. It looks ridiculous, but it satisfies MSVC...
class Thing {
    . . .
};
void handleThing(Thing* thing) {
    // do stuff
}
vector<Thing> vec;
// put some elements into vec now
for (auto it = vec.begin(); it != vec.end(); ++it)
    // handleThing(it); // this doesn't work, would have been elegant ..
    handleThing(&*it); // this DOES work
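A related sketch for the case where a whole range, rather than one element, has to go to a pointer-based function: vec.data() (available since C++11) yields the pointer directly and is safe to call on an empty vector, whereas &*vec.begin() would dereference an end iterator there. The function c_style_sum below is a made-up stand-in for some legacy C interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical legacy function with a C-style pointer-plus-length interface.
inline double c_style_sum(const double* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    double total = 0.0;
    for (std::size_t i = 0; i < count; ++i)
        total += values[i];
    return total;
}

// Forwards a whole vector without touching any iterator.
inline double sum(const std::vector<double>& vec) {
    return c_style_sum(vec.data(), vec.size());
}
```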

per-container optimizations for functions that operate on iterators

So I know the point of iterators is to abstract the underlying container so you don't have to worry about what it is.
But say you wanted to write an optimized version of merge sort and wanted to do an in place sort if the underlying container was a linked list, since you can run merge sort in-place on a linked list without the need for extra container allocations.
Is there any way to get this information to know whether you're operating on a linked structure and/or access the pointers for Standard Library and other containers?
I'm assuming there is a way since that is what std::sort does? How?

I have also always wished for a way to do this, but the answer is kind-of, but not really. This isn't possible in general, because of the template deduction rules.
Each iterator may be a member of a class, but since multiple classes may have the same iterators, if a function receives these iterators, it is literally and theoretically impossible to deduce which container it came from. That hinders things somewhat.
If you don't need the container type and can work with the iterator type alone, then yes, std::sort could optimize based on the underlying container. But no, std::sort doesn't have a special algorithm for node based containers in any C++ standard library that I know of.
The containers themselves sometimes have specialized versions, see std::list<T>::sort, but the generic std::sort can't make use of that since it works on any arbitrary range, and std::list<T>::sort only works on the entire container.
I really wish they would. Also a specialization of std::lower_bound and similar when called on the tree-based containers would be awesome.
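What you can do portably is branch on the iterator category rather than on the container. Below is a minimal sketch of that tag-dispatch technique; the names traversal, classify, pick_strategy and dispatch_demo are made up for this example. Note that it cannot tell a std::list from a std::set, since both report bidirectional iterators, which is exactly the limitation described above.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <set>
#include <vector>

enum class traversal { random_access, bidirectional, forward };

namespace detail {
    // One overload per iterator category; overload resolution picks the
    // closest match for the tag object passed as the third argument.
    template <typename It>
    traversal classify(It, It, std::random_access_iterator_tag) {
        return traversal::random_access;
    }
    template <typename It>
    traversal classify(It, It, std::bidirectional_iterator_tag) {
        return traversal::bidirectional;
    }
    template <typename It>
    traversal classify(It, It, std::forward_iterator_tag) {
        return traversal::forward;
    }
}

// Entry point: looks up the category via iterator_traits and dispatches.
template <typename It>
traversal pick_strategy(It first, It last) {
    using category = typename std::iterator_traits<It>::iterator_category;
    return detail::classify(first, last, category());
}

inline void dispatch_demo() {
    std::vector<int> v;
    std::list<int> l;
    std::set<int> s;
    assert(pick_strategy(v.begin(), v.end()) == traversal::random_access);
    assert(pick_strategy(l.begin(), l.end()) == traversal::bidirectional);
    assert(pick_strategy(s.begin(), s.end()) == traversal::bidirectional);
}
```

A sort wrapper could use the same dispatch to choose, say, std::sort for random access ranges and a merge-based routine otherwise, but it still would not get access to a list's node pointers.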

A library implementor can do a certain set of things that are not portable and depend on the details of the implementation. In the standard the iterator types are referred to as nested types of the container, and nested types are not deducible. The implementation can, however, decide to implement them as types at namespace scope, which are deducible, and provide a nested typedef. The implementation could then, within the limits of the requirements placed in the standard, optimize the operations.
I don't know of any implementation that implements std::sort as merge sort for lists, but there are other tricks that are used in implementations, for example std::copy when the iterators are into vectors holding POD types is implemented in some libraries by calling std::memmove rather than copying one element at a time.
As I mentioned before, while this can be done by the implementor, writing your own code attempting to do the same would be non-portable and brittle, as it would depend on details of one implementation that might change in the next version of the library or even by changing compiler flags.

Can I create an empty range (iterator pair) without an underlying container object?

I have a class akin to the following:
struct Config
{
    using BindingContainer = std::map<ID, std::vector<Binding>>;
    using BindingIterator = BindingContainer::mapped_type::const_iterator;
    boost::iterator_range<BindingIterator> bindings(ID id) const;
private:
    BindingContainer m_bindings;
};
Since the ID passed to bindings() might not exist, I need to be able to represent a 'no bindings' value in the return type domain.
I don't need to differentiate an unknown ID from an ID mapped to an empty vector, so I was hoping to be able to achieve this with the interface as above and return an empty range with default-constructed iterators. Unfortunately, although a ForwardIterator is DefaultConstructible [C++11 24.2.5/1] the result of comparing a singular iterator is undefined [24.2.1/5], so without a container it seems this is not possible.
I could change the interface to e.g. wrap the iterator_range in a boost::optional, or return a vector value instead; the former is a little more clunky for the caller though, and the latter has undesirable copy overheads.
Another option is to keep a statically-allocated empty vector and return its iterators. The overhead wouldn't be problematic in this instance, but I'd like to avoid it if I can.
Adapting the map iterator to yield comparable default-constructed iterators is a possibility, though seems over-complex...
Are there any other options here that would support returning an empty range when there is no underlying container?
(Incidentally I'm sure a while back I read a working paper or article about producing empty ranges for standard container type when there is no container object, but can't find anything now.)
(Note I am limited to C++11 features, though I'd be interested if there is any different approach requiring later features.)

Nope, there aren't. Your options are as you suggest. Personally, I would probably go with the idea of hijacking an iterator pair from a static empty vector; I can't imagine what notional "overhead" would be involved here, beyond a couple of extra bytes in your process image.
See also: "Is this a singular iterator and, if so, can I compare it to another one?" and "Comparing default-constructed iterators with operator==".
And this hasn't changed in either C++14 or C++17 (so far).
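Here is a minimal sketch of the static-empty-vector idea. To keep it free of Boost, it returns a std::pair of iterators instead of boost::iterator_range. The definitions of ID and Binding, the add() member and the function config_demo are assumptions added only so that the example is complete.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <utility>
#include <vector>

using ID = int;              // assumed for illustration
struct Binding { int key; }; // assumed for illustration

class Config {
public:
    using BindingContainer = std::map<ID, std::vector<Binding>>;
    using BindingIterator = BindingContainer::mapped_type::const_iterator;
    using BindingRange = std::pair<BindingIterator, BindingIterator>;

    // Hypothetical helper so the example can populate the map.
    void add(ID id, Binding b) { m_bindings[id].push_back(b); }

    BindingRange bindings(ID id) const {
        // One shared empty vector gives every "no bindings" range a real
        // container to point into, so its iterators are safely comparable.
        static const std::vector<Binding> no_bindings;
        auto found = m_bindings.find(id);
        const std::vector<Binding>& v =
            (found != m_bindings.end()) ? found->second : no_bindings;
        return BindingRange(v.begin(), v.end());
    }

private:
    BindingContainer m_bindings;
};

inline void config_demo() {
    Config cfg;
    cfg.add(1, Binding{7});

    Config::BindingRange known = cfg.bindings(1);
    assert(std::distance(known.first, known.second) == 1);

    Config::BindingRange unknown = cfg.bindings(99);
    assert(unknown.first == unknown.second);  // empty, but well defined
}
```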

You may use a default constructed boost::iterator_range
from (https://www.boost.org/doc/libs/1_55_0/libs/range/doc/html/range/reference/utilities/iterator_range.html):
However, if one creates a default constructed iterator_range, then one
can still call all its member functions. This design decision avoids
the iterator_range imposing limitations upon ranges of iterators that
are not singular.
Example here:
https://wandbox.org/permlink/zslaPwmk3lBI4Q9N

Is it a good idea to create an STL iterator which is noncopyable?

Most of the time, STL iterators are CopyConstructible, because several STL algorithms, such as std::sort, require this to improve performance.
However, I've been working on a pet project to wrap the FindXFile API (previously asked about), but the problem is it's impossible to implement a copyable iterator around this API. A find handle cannot be duplicated by any means -- DuplicateHandle specifically forbids passing these types of handles to it. And if you just maintain a reference count to the find handle, then a single increment by any copy results in an increment of all copies -- clearly that is not what a copy constructed iterator is supposed to do.
Since I can't satisfy the traditional copy constructible requirement for iterators here, is it even worth trying to create an "STL style" iterator? On one hand, creating some other enumeration method is not going to fall into normal STL conventions, but on the other, following STL conventions is going to confuse users of this iterator if they try to CopyConstruct it later.
Which is the lesser of two evils?

An input iterator which is not a forward iterator is copyable, but you can only "use" one of the copies: incrementing any of them invalidates the others (dereferencing one of them does not invalidate the others). This allows it to be passed to algorithms, but the algorithm must complete with a single pass. You can tell which algorithms are OK by checking their requirements - for example copy requires only an InputIterator, whereas adjacent_find requires a ForwardIterator (first one I found).
It sounds to me as though this describes your situation. Just copy the handle (or something which refcounts the handle), without duplicating it.
The user has to understand that it's only an InputIterator, but in practice this isn't a big deal. istream_iterator is the same, and for the same reason.
With the benefit of C++11 hindsight, it would almost have made sense to require InputIterators to be movable but not to require them to be copyable, since duplicates have limited use anyway. But that's "limited use", not "no use", and anyway it's too late now to remove functionality from InputIterator, considering how much code relies on the existing definition.
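A rough sketch of the "copy something which refcounts the handle" approach follows. FakeFindHandle is a made-up stand-in for the real find handle (it just walks a list of names), and find_iterator and list_all are likewise hypothetical; the real FindXFile calls are deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a non-duplicable OS find handle.
class FakeFindHandle {
    std::vector<std::string> names_;
    std::size_t next_ = 0;

public:
    explicit FakeFindHandle(std::vector<std::string> names) : names_(std::move(names)) {}
    FakeFindHandle(const FakeFindHandle&) = delete;
    FakeFindHandle& operator=(const FakeFindHandle&) = delete;

    // Mimics a FindNext-style call: yields the next name or reports exhaustion.
    bool next(std::string& out) {
        if (next_ == names_.size()) return false;
        out = names_[next_++];
        return true;
    }
};

// Single-pass input iterator: copies share the handle, never duplicate it.
class find_iterator {
    std::shared_ptr<FakeFindHandle> handle_;  // null means end-of-sequence
    std::string current_;                     // cached so copies stay readable

    void advance() {
        if (handle_ && !handle_->next(current_)) handle_.reset();
    }

public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::string;
    using difference_type = std::ptrdiff_t;
    using pointer = const std::string*;
    using reference = const std::string&;

    find_iterator() {}  // end-of-sequence iterator
    explicit find_iterator(std::shared_ptr<FakeFindHandle> h) : handle_(std::move(h)) {
        advance();
    }

    reference operator*() const {
        assert(handle_ && "dereferencing the end iterator");
        return current_;
    }
    pointer operator->() const { return &current_; }
    find_iterator& operator++() { advance(); return *this; }
    find_iterator operator++(int) { find_iterator old = *this; advance(); return old; }

    friend bool operator==(const find_iterator& a, const find_iterator& b) {
        return a.handle_ == b.handle_;
    }
    friend bool operator!=(const find_iterator& a, const find_iterator& b) {
        return !(a == b);
    }
};

// Usage: a single-pass algorithm such as std::copy works unchanged.
inline std::vector<std::string> list_all() {
    auto handle = std::make_shared<FakeFindHandle>(
        std::vector<std::string>{"a.txt", "b.txt"});
    std::vector<std::string> out;
    std::copy(find_iterator(handle), find_iterator(), std::back_inserter(out));
    return out;
}
```

As with istream_iterator, incrementing one copy advances the shared source, so only one copy should be advanced at a time.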
