Writing a modern function interface to "produce a populated container"


When I cut my teeth on C++03, I learned several approaches to writing a "give me the collection of things" function, but each has drawbacks.
template< typename Container >
void make_collection( std::insert_iterator<Container> );
This must be implemented in a header file
The interface doesn't communicate that an empty container is expected.
or:
void make_collection( std::vector<Thing> & );
This is not container agnostic
The interface doesn't communicate that an empty container is expected.
or:
std::vector<Thing> make_collection();
This is not container agnostic
There are several avenues for unnecessary copying. (Wrong container type, wrong contained type, no RVO, no move semantics)
Using modern C++ standards, is there a more idiomatic function interface to "produce a populated container"?

The first approach is type erasure based.
template<class T>
using sink = std::function<void(T&&)>;
A sink is a callable that consumes instances of T. Data flows in, nothing flows out (visible to the caller).
template<class Container>
auto make_inserting_sink( Container& c ) {
using std::end; using std::inserter;
return [c = std::ref(c)](auto&& e) {
*inserter(c.get(), end(c.get()))++ = decltype(e)(e);
};
}
make_inserting_sink takes a container, and generates a sink that consumes stuff to be inserted. In a perfect world, it would be make_emplacing_sink and the lambda returned would take auto&&..., but we write code for the standard libraries we have, not the standard libraries we wish to have.
Both of the above are generic library code.
In the header for your collection generation, you'd have two functions. A template glue function, and a non-template function that does the actual work:
namespace impl {
void populate_collection( sink<int> );
}
template<class Container>
Container make_collection() {
Container c;
impl::populate_collection( make_inserting_sink(c) );
return c;
}
You implement impl::populate_collection outside the header file, which simply hands over an element at a time to the sink<int>. The connection between the container requested, and the produced data, is type erased by sink.
The above assumes your collection is a collection of int. Simply change the type passed to sink and a different type is used. The collection produced need not be a collection of int, just anything that can take int as input to its insert iterator.
This is less than perfectly efficient, as the type erasure creates nearly unavoidable runtime overhead. If you replaced void populate_collection( sink<int> ) with template<class F> void populate_collection(F&&) and implemented it in the header file the type erasure overhead goes away.
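As a rough sketch of the non-type-erased variant described above (assuming C++14): the producer below takes any callable as its sink, so the call can be inlined. The values it produces are made up for illustration, and `c.insert(c.end(), ...)` is used in place of the insert iterator so the same glue works for both sequence and associative containers.

```cpp
#include <initializer_list>
#include <set>
#include <utility>
#include <vector>

// Hypothetical producer: hands each value to whatever callable it receives.
// As a template it has to live in the header, but no std::function is involved.
template <class Sink>
void populate_collection(Sink&& sink) {
    for (int i : {7, 3, 2, 3}) {
        sink(i * 10);  // stand-in for real work
    }
}

// Glue: build the requested container and feed it through a lambda sink.
template <class Container>
Container make_collection() {
    Container c;
    populate_collection([&c](auto&& e) {
        c.insert(c.end(), std::forward<decltype(e)>(e));
    });
    return c;
}
```

Calling `make_collection<std::vector<int>>()` keeps the duplicates and the production order, while `make_collection<std::set<int>>()` ends up sorted and de-duplicated.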
std::function is new to C++11, but can be implemented in C++03 or before. The auto lambda with assignment capture is a C++14 construct, but can be implemented as a non-anonymous helper function object in C++03.
We could also optimize make_collection for something like std::vector<int> with a bit of tag dispatching (so make_collection<std::vector<int>> would avoid type erasure overhead).
Now there is a completely different approach. Instead of writing a collection generator, write generator iterators.
The first is an input iterator that calls some functions to generate items and advance; the last is a sentinel iterator that compares equal to the first when the collection is exhausted.
The range can have an operator Container with SFINAE test for "is it really a container", or a .to_container<Container> that constructs the container with a pair of iterators, or the end user can do it manually.
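A hand-written generating iterator might look roughly like the sketch below. The names `squares_iterator` and `squares_range` and the sequence they produce are invented for illustration. The sentinel is simply a default-constructed iterator of the same type, so that a container's iterator-pair constructor accepts the pair.

```cpp
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical generator: yields 1, 4, 9, ... for the first `count` integers.
class squares_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = int;

    squares_iterator() = default;  // sentinel: nothing remaining
    explicit squares_iterator(int count) : remaining_(count < 0 ? 0 : count) {}

    int operator*() const { return next_ * next_; }
    squares_iterator& operator++() { ++next_; --remaining_; return *this; }
    squares_iterator operator++(int) { auto old = *this; ++*this; return old; }

    // Equal when the same number of items remain, so an exhausted
    // iterator compares equal to the sentinel.
    friend bool operator==(const squares_iterator& a, const squares_iterator& b) {
        return a.remaining_ == b.remaining_;
    }
    friend bool operator!=(const squares_iterator& a, const squares_iterator& b) {
        return !(a == b);
    }

private:
    int next_ = 1;
    int remaining_ = 0;
};

class squares_range {
public:
    explicit squares_range(int count) : count_(count) {}
    squares_iterator begin() const { return squares_iterator(count_); }
    squares_iterator end() const { return squares_iterator(); }

    // Builds any container that has an iterator-pair constructor.
    template <class Container>
    Container to_container() const { return Container(begin(), end()); }

private:
    int count_;
};
```

With this, `squares_range(4).to_container<std::vector<int>>()` builds the vector directly from the iterator pair, and the range also works in a range-based for loop.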
These things are annoying to write, but Microsoft is proposing Resumable functions for C++ -- await and yield that make this kind of thing really easy to write. The generator<int> returned probably still uses type erasure, but odds are there will be ways of avoiding it.
To understand what this approach would look like, examine how python generators work (or C# generators).
// exposed in header, implemented in cpp
generator<int> get_collection() resumable {
yield 7; // well, actually do work in here
yield 3; // not just return a set of stuff
yield 2; // by return I mean yield
}
// I have not looked deeply into it, but maybe the above
// can be done *without* type erasure somehow. Maybe not,
// as yield is magic akin to lambda.
// This takes an iterable `G&& g` and uses it to fill
// a container. In an optimal library-class version
// I'd have a SFINAE `try_reserve(c, size_at_least(g))`
// call in there, where `size_at_least` means "if there is
// a cheap way to get the size of g, do it, otherwise return
// 0" and `try_reserve` means "here is a guess as to how big
// you should be, if useful please use it".
template<class Container, class G>
Container fill_container( G&& g ) {
Container c;
using std::end;
for(auto&& x:std::forward<G>(g) ) {
*std::inserter( c, end(c) ) = decltype(x)(x);
}
return c;
}
auto v = fill_container<std::vector<int>>(get_collection());
auto s = fill_container<std::set<int>>(get_collection());
Note how fill_container sort of looks like make_inserting_sink turned upside down.
As noted above, the pattern of a generating iterator or range can be written manually without resumable functions, and without type erasure -- I've done it before. It is reasonably annoying to get right (write them as input iterators, even if you think you should get fancy), but doable.
Boost also has some helpers for writing generating iterators and ranges that do not type erase.

If we take our inspiration from the standard, pretty much anything of the form make_<thing> is going to return <thing> by value (unless profiling indicates otherwise I don't believe returning by value should preclude a logical approach). That suggests option three. You can make it a template-template if you wish to provide a bit of container flexibility (you just have to have an understanding as to whether the allowed container is associative or not).
However depending on your needs, have you considered taking inspiration from std::generate_n and instead of making a container, provide a fill_container functionality instead? Then it would look very similar to std::generate_n, something like
template <class OutputIterator, class Generator>
void fill_container (OutputIterator first, Generator gen);
Then you can either replace elements in an existing container, or use an insert_iterator to populate from scratch, etc. The only thing you have to do is provide the appropriate generator. The name even indicates that it expects the container to be empty if you're using insertion-style iterators.
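One way this could look in practice is sketched below. The signature above does not say how the generator signals that it is finished, so this sketch assumes (C++17) that it returns a `std::optional` and that an empty optional means "done". Returning the output iterator is borrowed from `std::generate_n`, and the `countdown` generator is purely hypothetical.

```cpp
#include <iterator>
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Assumption: gen() returns std::optional<T>; an empty optional ends the fill.
template <class OutputIterator, class Generator>
OutputIterator fill_container(OutputIterator out, Generator gen) {
    while (auto item = gen()) {
        *out = std::move(*item);
        ++out;
    }
    return out;
}

// Hypothetical generator: counts down from `current` to 1, then stops.
struct countdown {
    int current;
    std::optional<int> operator()() {
        if (current <= 0) return std::nullopt;
        return current--;
    }
};
```

Passing `std::back_inserter(v)` appends to a vector, while `std::inserter(s, s.end())` fills a set. Passing `v.begin()` instead would overwrite existing elements, provided the container is already large enough.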

You can do this in C++11 without copying the container. The move constructor is used instead of the copy constructor.
std::vector<Thing> make_collection()

I don't think there is one idiomatic interface to produce a populated container, but it sounds like in this case you simply need a function to construct and return a container. In that case you should prefer your last case:
std::vector<Thing> make_collection();
This approach will not produce any "unnecessary copying", as long as you are using a modern C++11-compatible compiler. The container is constructed in the function, then moved via move semantics to avoid making a copy.
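To check this claim, here is a small sketch (assuming C++17 for the `static inline` counter) with a made-up `Thing` that counts how many times it is copied. Whether the compiler elides the return or moves the vector, no element copy is needed.

```cpp
#include <vector>

// Hypothetical element type that records every copy made of it.
struct Thing {
    static inline int copies = 0;
    int value = 0;

    explicit Thing(int v) : value(v) {}
    Thing(const Thing& other) : value(other.value) { ++copies; }
    Thing(Thing&& other) noexcept : value(other.value) {}
};

std::vector<Thing> make_collection() {
    std::vector<Thing> result;
    result.reserve(3);
    for (int i = 0; i < 3; ++i) {
        result.emplace_back(i);  // constructed in place, no copy
    }
    return result;  // elided or moved: the elements themselves are not touched
}
```

After `std::vector<Thing> things = make_collection();`, the counter `Thing::copies` is still zero.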

Related

Is it a good idea to extend std::vector?

Having worked a little with JavaScript, I realized that development is much faster there than in C++, which slows down writing for reasons that often do not apply. It is not comfortable to always pass .begin() and .end(), which happens throughout my application.
I am thinking about extending std::vector (more by encapsulation than inheritance) which can mostly follow the conventions of javascript methods such as
.filter([](int i){return i>=0;})
.indexOf(txt2)
.join(delim)
.reverse()
instead of
auto it = std::copy_if (foo.begin(), foo.end(), std::back_inserter(bar), [](int i){return i>=0;} );
ptrdiff_t pos = find(Names.begin(), Names.end(), old_name_) - Names.begin();
copy(elems.begin(), elems.end(), ostream_iterator<string>(s, delim));
std::reverse(a.begin(), a.end());
But, I was wondering if it is a good idea, why already there is no C++ library for such common daily functionality? Is there anything wrong with such idea?

There's nothing inherently wrong with this idea, unless you try to delete a vector polymorphically.
For example:
auto myvec = new MyVector<int>;
std::vector<int>* myvecbase = myvec;
delete myvecbase; // bad! UB
delete myvec; // ok, not UB
This is unusual but could still be a source of error.
However, I would still not recommend it.
To gain the added functionality, you need an instance of your own vector type, which means you have to copy or move any existing vector into your type. You cannot use your functions through a reference to a plain std::vector.
For example consider this code:
// Code not in your control:
std::vector<int>& get_vec();
// error! std::vector doesn't have reverse!
auto reversed = get_vec().reverse();
// Works if you copy the vector to your class
auto copy_vec = MyVector<int>{get_vec()};
auto reversed_copy = copy_vec.reverse();
Also, it will work only with vector, whereas I can see these functions being useful with other container types too.
My advice would be to make your proposed functions free functions rather than members of a class derived from vector. They will then work with any instance or reference, and can be overloaded for other container types. This also keeps your code more standard (no custom set of containers) and much easier to maintain.
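A minimal sketch of what such free functions might look like is below. The names `filter`, `index_of` and `join` are chosen here to mirror the JavaScript methods from the question and are not from any existing library.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Returns a new container holding the elements that satisfy pred.
template <class Container, class Predicate>
Container filter(const Container& c, Predicate pred) {
    Container result;
    std::copy_if(c.begin(), c.end(), std::inserter(result, result.end()), pred);
    return result;
}

// Returns the position of the first match, or -1 if there is none
// (mirroring JavaScript's indexOf).
template <class Container, class T>
std::ptrdiff_t index_of(const Container& c, const T& value) {
    auto it = std::find(c.begin(), c.end(), value);
    return it == c.end() ? -1 : std::distance(c.begin(), it);
}

// Concatenates the elements, separated by delim (no trailing delimiter).
template <class Container>
std::string join(const Container& c, const std::string& delim) {
    std::ostringstream out;
    bool first = true;
    for (const auto& e : c) {
        if (!first) out << delim;
        out << e;
        first = false;
    }
    return out.str();
}
```

Because they only rely on `begin()`/`end()`, these work on a `std::vector<int>&` you do not own, and equally on a `std::list` or `std::deque`.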
If you feel the need to implement many of these functional-style utilities for container types, I suggest looking for a library that implements them for you, namely range-v3, which is on its way to standardisation.
On the other side of the argument, there are valid use cases for inheriting from standard library classes. For example, if you deal with generic code and want to store a function object that might be empty, you can inherit from std::tuple (privately) to leverage the empty base class optimization.
It has also happened to me that I needed to store a specific number of elements of the same type, where the number could vary at compile time. I extended std::array (privately) to ease the implementation.
However, note something about those two cases: I used inheritance to ease the implementation of generic code, and I inherited privately, which doesn't expose the inheritance to other classes.

A wrapper can be used to create a more fluent API.
template<typename container >
class wrapper{
public:
wrapper(container const& c) : c_( c ){}
wrapper& reverse() {
std::reverse(c_.begin(), c_.end());
return *this;
}
template<typename it>
wrapper& copy( it& dest ) {
std::copy(c_.begin(), c_.end(), dest );
return *this;
}
/// ...
private:
container c_;
};
The wrapper can then be used to "beautify" the code
std::vector<int> ints{ 1, 2, 3, 4 };
auto w = wrapper(ints);
auto out = std::ostream_iterator<int>(std::cout,", ");
w.reverse().copy( out );

Why are there so many specializations of std::swap?

While looking at the documentation for std::swap, I see a lot of specializations.
It looks like every STL container, as well as many other std facilities have a specialized swap.
I thought with the aid of templates, we wouldn't need all of these specializations?
For example,
If I write my own pair it works correctly with the templated version:
template<class T1,class T2>
struct my_pair{
T1 t1;
T2 t2;
};
int main() {
my_pair<int,char> x{1,'a'};
my_pair<int,char> y{2,'b'};
std::swap(x,y);
}
So what is gained from specializing std::pair?
template< class T1, class T2 >
void swap( pair<T1,T2>& lhs, pair<T1,T2>& rhs );
I'm also left wondering if I should be writing my own specializations for custom classes,
or simply relying on the template version.

So what is gained from specializing std::pair?
Performance. The generic swap is usually good enough (since C++11), but rarely optimal (for std::pair, and for most other data structures).
I'm also left wondering if I should be writing my own specializations for custom classes, or simply relying on the template version.
I suggest relying on the template by default, but if profiling shows it to be a bottleneck, know that there is probably room for improvement. Premature optimization and all that...

std::swap is implemented along the lines of the code below:
template<typename T> void swap(T& t1, T& t2) {
T temp = std::move(t1);
t1 = std::move(t2);
t2 = std::move(temp);
}
(See "How does the standard library implement std::swap?" for more information.)
So what is gained from specializing std::pair?
std::pair provides its own swap along the following lines (simplified from the member function in libc++):
void swap(pair& p) noexcept(is_nothrow_swappable<first_type>{} &&
is_nothrow_swappable<second_type>{})
{
using std::swap;
swap(first, p.first);
swap(second, p.second);
}
As you can see, swap is directly invoked on the elements of the pair using ADL: this allows customized and potentially faster implementations of swap to be used on first and second (those implementations can exploit the knowledge of the internal structure of the elements for more performance).
(See "How does using std::swap enable ADL?" for more information.)
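The effect can be observed with a small sketch (assuming C++17). `demo::Buffer` is a made-up type whose custom `swap` counts its calls, and `my_pair` swaps member by member in the same style as the snippet above.

```cpp
#include <utility>

namespace demo {
// Hypothetical type with its own swap, found through ADL.
struct Buffer {
    int id;
    static inline int custom_swaps = 0;
};

void swap(Buffer& a, Buffer& b) noexcept {
    std::swap(a.id, b.id);
    ++Buffer::custom_swaps;
}
}  // namespace demo

template <class T1, class T2>
struct my_pair {
    T1 first;
    T2 second;

    void swap(my_pair& other) {
        using std::swap;             // fallback for types without their own swap
        swap(first, other.first);    // ADL picks demo::swap for demo::Buffer
        swap(second, other.second);  // plain std::swap for int
    }
};
```

Calling `a.swap(b)` on two `my_pair<demo::Buffer, int>` objects goes through `demo::swap` once, whereas the generic `std::swap(a, b)` moves the whole pair and never sees the custom function.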

Presumably this is for performance reasons in the case that the pair's contained types are cheap to swap but expensive to copy, like vector. Since it can call swap on first and second instead of doing a copy with temporary objects it may provide a significant improvement to program performance.

The reason is performance, especially pre-C++11.
Consider something like a "Vector" type. The Vector has three fields: size, capacity and a pointer to the actual data. Its copy constructor and copy assignment copy the actual data. The C++11 version also has a move constructor and move assignment that steal the pointer, setting the pointer in the source object to null.
A dedicated Vector swap implementation can simply swap the fields.
A generic swap implementation based on the copy constructor, copy assignment and destructor will result in data copying and dynamic memory allocation/deallocation.
A generic swap implementation based on the move constructor, move assignment and destructor will avoid any data copying or memory allocation but it will leave some redundant nulling and null-checks which the optimiser may or may not be able to optimise away.
So why have a specialised swap implementation for "Pair"? For a pair of int and char there is no need. They are plain old data types so a generic swap is just fine.
But what if I have a pair of, say, Vector and String? I want to use the specialist swap operations for those types, and so I need a swap operation on the pair type that handles it by swapping its component elements.

The most efficient way to swap two pairs is not the same as the most efficient way to swap two vectors. The two types have a different implementation, different member variables and different member functions.
There is just no generic way to "swap" two objects in this manner.
I mean, sure, for a copyable type you could do this:
T tmp = a;
a = b;
b = tmp;
But that's horrendous.
For a moveable type you can add some std::move and prevent copies, but then you still need "swap" semantics at the next layer down in order to actually have useful move semantics. At some point, you need to specialise.

There is a rule (I think it comes from either Herb Sutter's Exceptional C++ or Scott Meyers' Effective C++ series) that if your type can provide a swap implementation that does not throw, or is faster than the generic std::swap function, it should do so as a member function void swap(T &other).
Theoretically, the generic std::swap() function could use template magic to detect the presence of a member-swap and call that instead of doing
T tmp = std::move(lhs);
lhs = std::move(rhs);
rhs = std::move(tmp);
but no-one seems to have thought about that one, yet, so people tend to add overloads of free swap in order to call the (potentially faster) member-swap.
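The pattern described here, a non-throwing member swap plus a free overload that forwards to it, can be sketched as follows. `IntBuffer` is a made-up class used only for illustration.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical owning buffer: swapping just exchanges two fields.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t n) : size_(n), data_(new int[n]()) {}
    ~IntBuffer() { delete[] data_; }
    IntBuffer(const IntBuffer&) = delete;
    IntBuffer& operator=(const IntBuffer&) = delete;

    // Member swap: cannot throw and allocates nothing.
    void swap(IntBuffer& other) noexcept {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
    }

    std::size_t size() const { return size_; }
    int* data() { return data_; }

private:
    std::size_t size_;
    int* data_;
};

// Free overload in the class's namespace, so that an unqualified
// swap(a, b) finds it through ADL and ends up in the member swap.
inline void swap(IntBuffer& a, IntBuffer& b) noexcept { a.swap(b); }
```

Generic code that writes `using std::swap; swap(a, b);` then picks the overload above, even though `IntBuffer` is neither copyable nor movable.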

How to initialize std stack with std vector?

I need to put an std::vector into an std::stack.
Here is my method so far(I am building a card game) :
void CardStack::initializeCardStack(std::vector<Card> & p_cardVector) {
m_cardStack = std::stack<Card>();
//code that should initialize m_cardStack with p_cardVector
}
Note: I cannot change my method signature because it is imposed by a teacher...
Do I have to iterate over the whole vector? What is the most efficient way to do this?
I have tried Jens's answer but it didn't work.

Before C++23, std::stack doesn't have a constructor which accepts iterators, so you could construct a temporary deque and initialize the stack with this:
void ClassName::initializeStack(std::vector<AnotherClass> const& v) {
m_stackAttribute = std::stack<AnotherClass>( std::stack<AnotherClass>::container_type(v.begin(), v.end()) );
}
However, this copies each element into the container. For maximum efficiency, you should also use move semantics to eliminate copies:
void ClassName::initializeStack(std::vector<AnotherClass>&& v) {
std::stack<AnotherClass>::container_type tmp( std::make_move_iterator(v.begin()), std::make_move_iterator( v.end() ));
m_stackAttribute = std::stack<AnotherClass>( std::move(tmp) );
}
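For a signature that cannot be changed to take an rvalue, a copying helper along these lines may be enough. The helper names are hypothetical. The second function is an alternative that assumes you are free to pick the stack's underlying container: a vector-backed stack can adopt the vector's storage without touching the elements.

```cpp
#include <deque>
#include <stack>
#include <utility>
#include <vector>

// Copies the elements into the stack's default container (std::deque<T>).
// The last element of the vector ends up on top.
template <class T>
std::stack<T> make_stack(const std::vector<T>& v) {
    using container = typename std::stack<T>::container_type;
    return std::stack<T>(container(v.begin(), v.end()));
}

// Alternative: a stack whose underlying container is std::vector<T>,
// so the vector can simply be moved in.
template <class T>
std::stack<T, std::vector<T>> make_vector_stack(std::vector<T> v) {
    return std::stack<T, std::vector<T>>(std::move(v));
}
```

Inside a member function taking `std::vector<Card>&`, the first helper would be used as `m_cardStack = make_stack(p_cardVector);`.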

The most efficient way is to not use a std::stack at all and just use a std::vector or, even better for this purpose, a std::deque.
I've seen and written a lot of C++ code (a lot), but I've yet to find any meaningful use for std::stack. It would be different if the underlying container could be changed or its type determined at runtime, but this is not the case.
To copy the elements from an std::vector into a std::deque you can just use
std::deque<T> stack(vec.begin(), vec.end());
This will allow the implementation to use the most efficient way to copy the elements.
To explicitly answer your question: yes, apart from constructing the stack from an existing container, the only way to put elements in a stack is to push them in with a loop. This is not efficient, but the stack interface doesn't define anything else. However, whoever wrote code accepting a std::stack parameter should be fired (unless they promise that it will never happen again) and their code reverted to something more sensible: you would get the same (absence of) "flexibility" but a better interface.
The design problem of stack is that it is parametrized on the underlying container type, whereas (to have any meaning) it should have been parametrized on the contained element type and receive a container for that type in the constructor (thus hiding the container type). In its present form it is basically useless.

Is it a bad practice to return a std container from a interface class?

I have run into the following question.
I need to design an interface class, which looks like this:
struct IIDs
{
....
virtual const std::set<int>& getAllIDs() = 0; //!< I want the collection of int to be sorted.
};
void foo()
{
const std::set<int>& ids = pIIDs->getAllIDs();
for(std::set<int>::const_iterator it = ids.begin();....;..) {
// do something
}
}
I think that returning a standard container is a bit inappropriate, because it forces the implementation to use a std::set to store the IDs. But if I write it as follows:
struct IIDs
{
....
virtual int count() const = 0;
virtual int at(int index) = 0; //!< the items should be sorted
};
void foo()
{
for (int i = 0; i < pIIDs->count(); ++i) {
int val = pIIDs->at(i);
...
}
}
I found that none of the standard containers meets all of these requirements:
the complexity of index lookup must be at most O(log n).
the complexity of insertion must be at most O(log n).
the items must be sorted.
So I have to use the first example. Is that acceptable?

STL containers and template code in general should never be used across a DLL boundary.
The thing you have to keep in mind when returning complex types like STL containers is that if your call ever crosses the boundary between two different DLLs (or a DLL and an application) running different memory managers your application will most likely crash spectacularly.
The templates that make up the STL code will be executed within the implementation DLL, creating all the memory used by the container there. Later when it leaves scope in your calling code, your own memory manager will attempt to deallocate memory it doesn't own, resulting in a crash.
If you know your code won't cross DLL boundaries, and will only ever be called in the context of a single memory manager, then you're fine as far as memory management is concerned.
However, even in cases where you're only returning references, such as your example above, where the lifetime of the container would be entirely managed by the interface implementation code, unless you know that the exact same version of the STL and the exact same compiler and linker settings were used for compiling the implementation as the caller, you're asking for trouble.

The problem I see is that you are returning the collection by const reference. That means you must have a member of that collection type and return a reference to it; if you instead return a reference to a variable local to the function, you get invalid memory accesses.
If it is a member variable, it is better to provide access to begin and end iterators. If it is a local variable, you could return it by value (C++11 should optimize this so that nothing is copied). If the call crosses a DLL boundary, try by all means to avoid C++ types and use only C types.

In terms of design, and for good generic code, prefer the STL way: return iterators, leaving the container type an implementation detail of IIDs, and hide your types with typedefs:
struct IIDs
{
typedef std::set<int> Container;
typedef Container::iterator IDIterator;
// We only expose iterators to the data
IDIterator begin(); //!< I want the collection of int to be sorted.
IDIterator end();
// ...
};

There are various approaches:
if you want to minimise the coupling of client code on the IIDs implementation and ensure iteration is completed while the IIDs object still exists, then use a visitor pattern: the calling code just has to supply some function to be called for each of the member elements in turn and is not responsible for the iteration itself
Visitor example:
struct IIDs
{
template <typename T>
void visit(T& t)
{
for (int i : ids_) t(i);
}
...
private:
std::set<int> ids_;
};
if you want to give the caller more freedom to mix other code in with the container traversal, and have multiple concurrent independent traversals, then provide iterators, but be aware that the client code could keep an iterator hanging around longer than the IIDs object itself - you may or may not want to handle that scenario gracefully
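Since a member function template cannot be virtual, the visitor approach needs some adjustment when IIDs is meant to be an abstract interface. One possible sketch, assuming C++11 and accepting the cost of `std::function`, is below. `SetIDs` is a hypothetical implementation, and the `visit` signature is an assumption of this sketch, not something from the question.

```cpp
#include <functional>
#include <set>
#include <utility>
#include <vector>

struct IIDs {
    virtual ~IIDs() = default;
    // Calls `visitor` once per ID, in ascending order.
    virtual void visit(const std::function<void(int)>& visitor) const = 0;
};

// Hypothetical implementation; the std::set never appears in the interface.
class SetIDs : public IIDs {
public:
    explicit SetIDs(std::set<int> ids) : ids_(std::move(ids)) {}

    void visit(const std::function<void(int)>& visitor) const override {
        for (int id : ids_) visitor(id);
    }

private:
    std::set<int> ids_;
};
```

Client code passes a lambda, for example `iface.visit([&](int id) { seen.push_back(id); });`, and never learns which container stores the IDs.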

Best way to return list of objects in C++?

It's been a while since I programmed in C++, and after coming from python, I feel soooo in a straight jacket, ok I'm not gonna rant.
I have a couple of functions that act as "pipes", accepting a list as input, returning another list as output (based on the input),
this is in concept, but in practice, I'm using std::vector to represent the list, is that acceptable?
further more, I'm not using any pointers, so I'm using std::vector<SomeType> the_list(some_size); as the variable, and returning it directly, i.e. return the_list;
P.S. So far it's all ok, the project size is small and this doesn't seem to affect performance, but I still want to get some input/advice on this, because I feel like I'm writing python in C++.

The only thing I can see is that you're forcing a copy of the list you return. It would be more efficient to do something like:
void DoSomething(const std::vector<SomeType>& in, std::vector<SomeType>& out)
{
...
// no need to return anything, just modify out
}
Because you pass in the list you want to return, you avoid the extra copy.
Edit: This is an old reply. If you can use a modern C++ compiler with move semantics, you don't need to worry about this. Of course, this answer still applies if the object you are returning DOES NOT have move semantics.

If you really need a new list, I would simply return it. Return value optimization will eliminate needless copies in most cases, and your code stays very clear.
That being said, taking lists and returning other lists is indeed python programming in C++.
A paradigm more suitable for C++ would be to create functions that take a range of iterators and alter the underlying collection.
e.g.
void DoSomething(iterator const & from, iterator const & to);
(with iterator possibly being a template, depending on your needs)
Chaining operations is then a matter of calling consecutive methods on begin(), end().
If you don't want to alter the input, you'd make a copy yourself first.
std::vector<SomeType> theOutput(inputVector);
This all comes from the C++ "don't pay for what you don't need" philosophy, you'd only create copies where you actually want to keep the originals.

I'd use the generic approach:
template <typename InIt, typename OutIt>
void DoMagic(InIt first, InIt last, OutIt out)
{
for(; first != last; ++first) {
if(IsCorrectIngredient(*first)) {
*out = DoMoreMagic(*first);
++out;
}
}
}
Now you can call it
std::vector<MagicIngredients> ingredients;
std::vector<MagicResults> results;
DoMagic(ingredients.begin(), ingredients.end(), std::back_inserter(results));
You can easily change the containers used without changing the algorithm, and it is efficient: there's no overhead from returning containers.

If you want to be really hardcore, you could use boost::tuple.
tuple<int, int, double> add_multiply_divide(int a, int b) {
return make_tuple(a+b, a*b, double(a)/double(b));
}
But since it seems all your objects are of a single, non-polymorphic type, then the std::vector is all well and fine.
If your types were polymorphic (inherited classes of a base class) then you'd need a vector of pointers, and you'd need to remember to delete all the allocated objects before throwing away your vector.
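With C++14 or later, one way to avoid the manual delete is to store `std::unique_ptr` in the vector. The sketch below uses a made-up `Shape` hierarchy: the objects are destroyed automatically when the vector goes away, and the vector can still be returned by value because it is moved, not copied.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical polymorphic hierarchy, for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

struct Square : Shape {
    std::string name() const override { return "square"; }
};

// Each unique_ptr owns its element; no delete loop is needed anywhere.
std::vector<std::unique_ptr<Shape>> make_shapes() {
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.push_back(std::make_unique<Circle>());
    shapes.push_back(std::make_unique<Square>());
    return shapes;
}
```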

Using a std::vector is preferable in many situations. It is guaranteed to use contiguous memory and is therefore friendly to the L1 cache.
You should be aware of what happens when your return type is std::vector. If the compiler can neither elide the copy nor move the vector, every element is copied, so if SomeType's copy constructor is expensive the return statement may be a lengthy and time-consuming operation.
If you are searching and inserting a lot in your list, you could look at std::set to get logarithmic time complexity instead of linear. (Appending to a std::vector takes amortized constant time; inserting elsewhere is linear.)
You are saying that you have many "pipe functions"... sounds like an excellent scenario for std::transform.
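As a small illustration of one such pipe stage written with std::transform, here is a sketch. The function `to_labels` and the transformation it applies are invented for this example.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical pipe stage: turns each number into a label such as "#7".
std::vector<std::string> to_labels(const std::vector<int>& in) {
    std::vector<std::string> out;
    out.reserve(in.size());  // one allocation up front
    std::transform(in.begin(), in.end(), std::back_inserter(out),
                   [](int n) { return "#" + std::to_string(n); });
    return out;
}
```

Stages written this way can be chained by feeding the result of one into the next, and the element type is free to change from stage to stage.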

Another problem with returning a list of objects (as opposed to working on one or two lists in place, as BigSandwich pointed out) is that if your objects have complex copy constructors, those will be called for each element in the container.
If you have 1000 objects each referencing a hunk of memory, and they copy that memory on Object a, b; a=b; that's 1000 memcopys for you, just for returning them contained in a container. If you still want to return a container directly, think about pointers in this case.

It works very simply.
list<int> foo(void)
{
list<int> l;
// do something
return l;
}
Now receiving data:
list<int> lst=foo();
This is fully optimal because the compiler knows how to optimize the construction of lst and does not make copies.
Another method, which does not rely on that optimization:
list<int> lst;
// do anything you want with list
foo().swap(lst); // the temporary must be on the left: swap takes a non-const lvalue reference
What happens: foo is already optimized, so there is no problem returning the value. When you call swap, lst takes over the new contents without copying them. The old contents of lst end up in the temporary and are destroyed with it.
This is the efficient way to do the job.
