C++ 2a - polymorphic range - c++

I am writing a C++ library and have had this amazing idea of using as much C++2a/C++20 as possible. Thus, I am using the standard library concepts and creating my own. However, the idea of a function returning a std::vector<X> seemed non-C++20 enough to me, so I declared in my concept a return type matching std::ranges::view<X>. I've then implemented some classes that fulfill this concept.
However, the problem appeared when I wanted to devise a polymorphic wrapper class. So, let's say the concept is C and I have three implementing classes C1, C2 and C3 (but allow for more). Now I want to create a class C_virtual and a template C_virtual_impl<C c> deriving from it, which will allow me to refer to all classes fulfilling C polymorphically. However, for that to work I need a polymorphic std::ranges::view wrapper, similar in spirit to C_virtual.
I have not seen any such class in the headers and in C++ reference. Moreover, when I started implementing it myself, I quickly found myself unable to due to some requirements on iterators, in particular default constructibility, swappability and similar.
Is there a nonobvious solution in the standard library or an idiom? If not, how do I deal with the problem? Possibly a change of design will work. I certainly do not want to return a std::vector<X> or to return a V<X> where V would be a type parameter of C. How do I do this?

Range views, and many other template techniques, are not meant to be used with inheritance-based polymorphism. This is much like how vector<BaseClass> is not especially useful.
If you need runtime polymorphism, then the tool you want is not inheritance (directly); it's type erasure. That is, you have some view wrapper which uses type erasure to forward the various view operations to the erased type. This would also need to be paired with type-erased iterators that wrap the iterators of the given view.
Of course, this means that the characteristics of the view are fixed by the type-erased wrapper. If the wrapper models the input_range concept, it can never offer more than input_range. Even if you put a contiguous_range type in the wrapper, the wrapper limits the interface to that of an input_range.
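To make the wrapper idea concrete, here is a minimal sketch of a type-erased, single-pass range. The names any_input_range, cursor, cursor_impl and sum are hypothetical and come from no library. The sketch is written against plain begin()/end() so it compiles as C++17; it is meant to behave like an input range, but it has not been checked against the std::ranges::input_range concept.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical type-erased, single-pass range of T (not a standard class).
template <class T>
class any_input_range {
    // Erased interface: one virtual call per basic iteration step.
    struct cursor {
        virtual ~cursor() = default;
        virtual bool done() = 0;
        virtual T get() = 0;
        virtual void next() = 0;
    };

    // Owns the concrete range R together with an iterator into it.
    template <class R>
    struct cursor_impl final : cursor {
        R range;
        decltype(std::begin(std::declval<R&>())) it;
        explicit cursor_impl(R r) : range(std::move(r)), it(std::begin(range)) {}
        bool done() override { return it == std::end(range); }
        T get() override { return *it; }
        void next() override { ++it; }
    };

    std::unique_ptr<cursor> cur_;

public:
    struct sentinel {};

    // Input iterator only: whatever R offers beyond that is erased.
    class iterator {
        cursor* c_ = nullptr;
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = T;
        using difference_type = std::ptrdiff_t;
        using pointer = void;
        using reference = T;

        iterator() = default;
        explicit iterator(cursor* c) : c_(c) {}
        T operator*() const { return c_->get(); }
        iterator& operator++() { c_->next(); return *this; }
        void operator++(int) { c_->next(); }
        bool operator==(sentinel) const { return c_->done(); }
        bool operator!=(sentinel) const { return !c_->done(); }
    };

    template <class R>
    explicit any_input_range(R r)
        : cur_(std::make_unique<cursor_impl<R>>(std::move(r))) {}

    iterator begin() {
        assert(cur_ != nullptr);  // a moved-from wrapper must not be iterated
        return iterator(cur_.get());
    }
    sentinel end() const { return {}; }
};

// Non-template code: accepts any wrapped range whose elements convert to int.
int sum(any_input_range<int> r) {
    int total = 0;
    for (int x : r) total += x;
    return total;
}
```

With this sketch, sum(any_input_range<int>(std::vector<int>{1, 2, 3})) and sum(any_input_range<int>(std::list<int>{4, 5})) both go through the same non-template function. The price is a heap allocation per wrapper and a virtual call per step, and the vector's random access is no longer visible through the wrapper.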
As such, it's best to just avoid this case and rely on static polymorphism via templates whenever possible.

Related

Hierarchy of container types and its place in memory [duplicate]

I've been wondering, is there any reason for the design decision in C++ to not have a pure abstract class for any of the std library containers?
I appreciate that hash_map came later from the stdext namespace but shares a very similar interface. In the event that I later decide I would like to implement my own map for a particular piece of software, I would have preferred to have some kind of interface to work with.
Example
std::base_map *foo = new std::map<std::string, std::string>;
delete foo;
foo = new stdext::hash_map<std::string, std::string>;
Obviously the above example is not possible, as far as I am aware; the same goes for list and the other std lib containers.
I appreciate that this is not C# or Java, but there are obviously no constraints in C++ to stop this design, so why was it designed so that there is no coupling between similar containers?
Because virtual functions add overhead.
Because the containers don't all have the same interface. There are common functions, but also important differences regarding iterator invalidation and memory allocation (and so exception behaviour) which you need to understand. If you were using an abstract base, you wouldn't know the specifics of how the concrete container would behave.
If you want to write code that is agnostic about the type of container it is passed then in C++ you write a template instead of relying on abstract interfaces, i.e. use static polymorphism not dynamic polymorphism. That avoids the overhead of dynamic dispatch and also allows specialization based on concrete type, because the concrete type is known at compile time.
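As a small illustration of the template approach, the sketch below works with both std::map and std::unordered_map without any common base class. The function name lookup_or is made up for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Works with any map-like type that provides find() and end() and maps
// std::string to std::string; the concrete type is known at compile time.
template <class Map>
std::string lookup_or(const Map& m, const std::string& key,
                      const std::string& fallback) {
    auto it = m.find(key);
    return it == m.end() ? fallback : it->second;
}
```

Calling lookup_or with a std::map<std::string, std::string> and with a std::unordered_map<std::string, std::string> instantiates two separate functions, each of which the compiler can inline and optimise for that container.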
Finally, it wouldn't have any advantage IMHO. It's better the way it is. As you say, this isn't C# or Java, thankfully.
(P.S. the stdext namespace is not part of C++, it appears to be Microsoft's namespace for non-standard types, a better example would use std::tr1::unordered_map or std::unordered_map instead of stdext::hash_map)

Easy way to implement small buffer optimization for arbitrary type erasure (like in std::function.)

I tend to use type erasure technique quite a bit.
It typically looks like this:
class YetAnotherTypeErasure
{
public:
    // interface redirected to pImpl
private:
    // Adapting function
    template ...
    friend YetAnotherTypeErasure make_YetAnotherTypeErasure (...);
    class Interface {...};
    template <typename Adaptee>
    class Concrete final : public Interface {
        // redirecting Interface to Adaptee
    };
    std::unique_ptr<Interface> pImpl_; // always on the heap
};
std::function does something similar, but it has a small buffer optimization, so if Concrete<Adaptee> is smaller than some threshold and has nothrow move operations, it will be stored in the buffer. Is there some generic library solution to do this fairly easily? Or to enforce small-buffer-only storage at compile time? Maybe something has been proposed for standardisation?
I know of no small buffer optimization required by the standard or by any proposal, though it is often allowed or encouraged.
Note that some (conditionally) non-throwing requirements on such types effectively require the optimization in practice because alternatives (like non-throwing allocation from emergency buffers) seem insane here.
On the other hand, you can just make your own solution from scratch, based on the standard library (e.g. std::aligned_storage). This may still be verbose from the users' point of view, but it is not too hard.
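A from-scratch version could look roughly like the sketch below. It assumes a fixed int(int) call signature, and the class name SmallFn and the 32-byte capacity are made up for the example. It uses an alignas byte array instead of std::aligned_storage (which is deprecated in C++23), and it has no heap fallback at all: callables that do not fit are rejected at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical type-erased int(int) callable stored only in an inline buffer.
class SmallFn {
    static constexpr std::size_t Size = 32;  // assumed buffer capacity

    struct Interface {
        virtual ~Interface() = default;
        virtual int call(int x) = 0;
        // Move-constructs this object into dst and returns the new object.
        virtual Interface* move_to(void* dst) noexcept = 0;
    };

    template <class F>
    struct Concrete final : Interface {
        F f;
        explicit Concrete(F g) : f(std::move(g)) {}
        int call(int x) override { return f(x); }
        Interface* move_to(void* dst) noexcept override {
            return ::new (dst) Concrete(std::move(f));
        }
    };

    alignas(std::max_align_t) unsigned char buf_[Size];
    Interface* p_ = nullptr;  // points into buf_ when non-empty

public:
    template <class F>
    SmallFn(F f) {
        using C = Concrete<F>;
        static_assert(sizeof(C) <= Size, "callable too large for the inline buffer");
        static_assert(alignof(C) <= alignof(std::max_align_t), "over-aligned callable");
        static_assert(std::is_nothrow_move_constructible<F>::value,
                      "callable must be nothrow move constructible");
        p_ = ::new (static_cast<void*>(buf_)) C(std::move(f));
    }

    // The moved-from object keeps a (moved-from) callable and still destroys it.
    SmallFn(SmallFn&& other) noexcept {
        if (other.p_) p_ = other.p_->move_to(buf_);
    }
    SmallFn(const SmallFn&) = delete;
    SmallFn& operator=(const SmallFn&) = delete;
    ~SmallFn() { if (p_) p_->~Interface(); }

    int operator()(int x) { return p_->call(x); }
};
```

A lambda capturing a single int fits comfortably, so SmallFn f([k](int x) { return x + k; }) is stored without any allocation. A real implementation would also want assignment, const-correct invocation and, if desired, a heap fallback for large callables.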
Actually I implemented (not proposed then) any with such optimization and some related utilities several years ago. Lately, libstdc++'s implementation of std::experimental::any used the technique almost exactly as this (however, __ prefixed internal names are certainly not good for ordinary library users).
My implementation now uses some common helpers to deal with the storage. These helpers do make it easier to implement the type erasure storage strategy (at least for something sufficiently similar to any). But I am still interested in a more general high-level solution to simplify the interface redirecting.
There is also a function implementation based directly on the any implementation above. Both support move-only types and a sane allocator interface, while the std ones do not. The function implementation has better performance than std::function in libstdc++ in some cases, thanks to the (partially no-op) default initialization of the underlying any object.
I found a reasonably nice solution for everyday code: use std::function. With tiny library support to help with const correctness, the code gets down to 20 lines:
https://gcc.godbolt.org/z/GtewFI
I think polymorphic_value, proposed for standardisation in P0201 (it did not make it into C++20), comes closest to what we can do in modern C++: wg21.link/p0201
Basically it's like std::any but all of your types have to inherit the same interface.
It is Semiregular, they decided to drop equality.
This has some overhead: one vptr in the class itself and a separate dispatch mechanism in the polymorphic value. It also has a pointer-like interface rather than a value-like one.
However, considering how easy it is to use compared to writing your own type-erased adapter, I'd say it would be more than good enough for most use cases.

Why is it bad to impose type constraints on templates in C++?

In this question the OP asked about limiting what classes a template will accept. A summary of the sentiment that followed is that the equivalent facility in Java is bad; and don't do this.
I don't understand why this is bad. Duck typing is certainly a powerful tool; but in my mind it lends itself to confusing runtime issues when a class looks close (same function names) but has slightly different behavior. And you can't necessarily rely on compile time checking because of examples like this:
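#include <iostream>
using std::cout; using std::endl;
struct One { int a; int b; };
struct Two { int a; };
template <class T>
class Worker {
    T data;
public:
    Worker() : data() {}  // value-initialize so print() reads a defined value
    void print() { cout << data.a << endl; }
    template <class X>
    void usually_important () { int a = data.a; int b = data.b; }
};
int main() {
    Worker<Two> w;
    w.print();  // compiles: usually_important is never instantiated
}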
Type Two will allow Worker to compile only if usually_important is not called. This could lead to some instantiations of Worker compiling and others not even in the same program.
In a case like this, though, the responsibility is put on the designer of ENGINE to ensure that it is a valid type (after which they should inherit ENGINE_BASE). If they don't, there will be a compiler error. To me this seems much safer while not imposing any restrictions or adding much additional work.
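#include <boost/static_assert.hpp>
#include <boost/type_traits/is_base_of.hpp>
using boost::is_base_of;
class ENGINE_BASE {}; // Empty class, all engines should extend this
template <class ENGINE>
class NeedsAnEngine {
    // The assertion needs a constant expression, hence ::value
    BOOST_STATIC_ASSERT((is_base_of<ENGINE_BASE, ENGINE>::value));
    // Do stuff with ENGINE...
};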
This is too long, but it might be informative.
Generics in Java are a type erasure mechanism, and automatic code generation of type casts and type checks.
Templates in C++ are code generation and pattern matching mechanisms.
You can use C++ templates to do what Java generics do with a bit of effort. std::function< A(B) > behaves in a covariant/contravariant fashion with regards to A and B types and conversion to other std::function< X(Y) >.
But the primary design of the two is not the same.
A Java List<X> will be a List<Object> with some thin wrapping on it so users don't have to do type casts on extraction. If you pass it as a List<? extends Bar>, it again is getting a List<Object> in essence, it just has some extra type information that changes how the casts work and which methods can be invoked. This means you can extract elements from the List into a Bar and know it works (and check it). Only one method is generated for all List<? extends Bar>.
A C++ std::vector<X> is not in essence a std::vector<Object> or std::vector<void*> or anything else. Each instance of a C++ template is an unrelated type (except template pattern matching). In fact, std::vector<bool> uses a completely different implementation than any other std::vector (this is now considered a mistake because the implementation differences "leak" in annoying ways in this case). Each method and function is generated independently for the particular type you pass it.
In Java, it is assumed that all objects will fit into some hierarchy. In C++, that is sometimes useful, but it has been discovered it is often ill fitting to a problem.
A C++ container need not inherit from a common interface. A std::list<int> and std::vector<int> are unrelated types, but you can act on them uniformly -- they both are sequential containers.
The question "is the argument a sequential container" is a good question. This allows anyone to implement a sequential container, and such sequential containers can be as high-performance as hand-crafted C code with utterly different implementations.
If you created a common root std::container<T> which all containers inherited from, it would either be full of virtual table cruft or it would be useless other than as a tag type. As a tag type, it would intrusively inject itself into all non-std containers, requiring that they inherit from std::container<T> to be a real container.
The traits approach instead means that there are specifications as to what a container (sequential, associative, etc) is. You can test these specifications at compile time, and/or allow types to note that they qualify for certain axioms via traits of some kind.
The C++03/11 standard library does this with iterators. std::iterator_traits<T> is a traits class that exposes iterator information about an arbitrary type T. Someone completely unconnected to the standard library can write their own iterator, and use std::iterator<...> to auto-work with std::iterator_traits, add their own type aliases manually, or specialize std::iterator_traits to pass on the information required.
C++11 goes a step further. for( auto&& x : y ) can work with things that were written long before the range-based iteration was designed, without touching the class itself. You simply write free begin and end functions in the namespace that the class belongs to, returning valid forward iterators (note: even invalid forward iterators that are close enough work), and suddenly for ( auto&& x : y ) starts working.
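A minimal sketch of that mechanism follows. The namespace legacy, the type IntBuffer and the function sum are made up for illustration and stand in for an old class that cannot be modified.

```cpp
#include <cassert>
#include <cstddef>

namespace legacy {
// Stand-in for an old class we cannot touch: no begin()/end() members.
struct IntBuffer {
    int data[4];
    std::size_t count;
};

// Free functions in the class's own namespace; range-based for finds them
// through argument-dependent lookup. Raw pointers serve as the iterators.
inline int* begin(IntBuffer& b) { return b.data; }
inline int* end(IntBuffer& b) { return b.data + b.count; }
inline const int* begin(const IntBuffer& b) { return b.data; }
inline const int* end(const IntBuffer& b) { return b.data + b.count; }
}  // namespace legacy

// Iterates the unmodified class with range-based for.
inline int sum(const legacy::IntBuffer& b) {
    int total = 0;
    for (auto&& x : b) total += x;
    return total;
}
```

Nothing inside IntBuffer changed; the adaptation lives entirely in the two pairs of free functions.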
std::function< A(B) > is an example of using these techniques together with type erasure. It has a constructor that accepts anything that can be copied, destroyed, invoked with (B) and whose return type can be converted to A. The types it can take can be completely unrelated -- only that which is required is tested for.
Because of std::function's design, we can have lambda invocables that are unrelated types that can be type-erased into a common std::function if needed, but when not type-erased their invocation action is known from their type. So a template function that takes a lambda knows at the point of invocation what will happen, which makes inlining an easy local operation.
This technique is not new -- it was in C++ since std::sort, a high level algorithm that is faster than C's qsort due to the ease of inlining invokable objects passed as comparators.
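The std::function behaviour described above can be sketched as follows. The names twice, AddN and apply_all are invented for the example; the three callables share no base class and are only related by being invocable with an int and returning something convertible to int.

```cpp
#include <cassert>
#include <functional>
#include <vector>

inline int twice(int x) { return 2 * x; }  // plain function

struct AddN {                              // hand-written function object
    int n;
    int operator()(int x) const { return x + n; }
};

// Applies each type-erased callable in turn, feeding the result forward.
inline int apply_all(const std::vector<std::function<int(int)>>& fs, int x) {
    for (const auto& f : fs) x = f(x);
    return x;
}
```

A vector such as {twice, AddN{3}, [](int x) { return x * x; }} can then be passed to apply_all. Each call goes through std::function's indirection, whereas a template taking the lambda directly would see its exact type and could inline it.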
In short, if you need a common runtime type, type erase. If you need certain properties, test for those properties, don't force a common base. If you need certain axioms to hold (untestable properties), either document or require callers to claim those properties via tags or traits classes (see how the standard library handles iterator categories -- again, not inheritance). When in doubt, use free functions with ADL enabled to access properties of your arguments, and have your default free functions use SFINAE to look for a method and invoke if it exists, and fail otherwise.
Such a mechanism removes the central responsibility of a common base class, allows existing classes to be adapted without modification to pass your requirements (if reasonable), places type erasure only where it is needed, avoids virtual overhead, and ideally generates clear errors when properties are found to not hold.
If your ENGINE has certain properties it needs to pass, write a traits class that tests for those.
If there are properties that cannot be tested for, create tags that describe such properties. Use specialization of a traits class, or canonical typedefs, to let the class describe which axioms hold for the type. (See iterator tags).
If you have a type like ENGINE_BASE, don't demand it, but instead use it as a helper for said tags and traits and axiom typedefs, like std::iterator<...> (you never have to inherit from it, it simply acts as a helper).
Avoid over specifying requirements. If usually_important is never invoked on your Worker<X>, probably your X doesn't need a b in that context. But do test for properties in a way clearer than "method does not compile".
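As a rough sketch of the traits advice above, the following tests for a property and reports a readable error instead of demanding a base class. The trait is_engine, the required member start(), and the types Diesel and Rock are all hypothetical.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// C++11-compatible stand-in for std::void_t.
template <class...> struct make_void { using type = void; };
template <class... Ts> using void_t = typename make_void<Ts...>::type;

// Hypothetical trait: true when T has a start() member callable with no arguments.
template <class T, class = void>
struct is_engine : std::false_type {};
template <class T>
struct is_engine<T, void_t<decltype(std::declval<T&>().start())>>
    : std::true_type {};

template <class Engine>
class NeedsAnEngine {
    // Clearer than a "method does not compile" error deep inside run().
    static_assert(is_engine<Engine>::value,
                  "Engine must provide a start() member function");
    Engine engine_;
public:
    bool run() { return engine_.start(); }
};

struct Diesel { bool start() { return true; } };  // no base class required
struct Rock {};                                   // does not qualify
```

NeedsAnEngine<Diesel> compiles and runs, while NeedsAnEngine<Rock> stops at the static_assert with the given message. Existing types can qualify without being edited, which an ENGINE_BASE requirement would not allow.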
And sometimes, just punt. Following such practices might make things harder for you, so take the easier way. Most code is written and discarded. Know when your code will persist, and write it better, more extensibly and more maintainably. Know that you need to practice those techniques on disposable code so you can write it correctly when you have to.
Let me turn the question around on you: Why is it bad that the code compiles for Two if usually_important isn't called? The type you gave it meets all the needs for that particular instantiation and the compiler will immediately tell you if a particular instantiation no longer meets the interface needed for the needed functionality in the template.
That said if you insist that you need an Engine object, don't do it with templates at all, instead treat it as a sort of strategy pattern with a non-template (using this approach enforces at compile time that the user-defined type adheres to a specific interface, not just that it looks like a duck):
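#include <iostream>
using std::cout; using std::endl;
// EngineBase and ConcreteEngine were not shown in the answer; these are assumed.
struct EngineBase {
    virtual ~EngineBase() {}
    virtual int a() const = 0;
    virtual int b() const = 0;
};
struct ConcreteEngine : EngineBase {
    int a() const { return 1; }
    int b() const { return 2; }
};
class Worker
{
public:
    explicit Worker(EngineBase* data) : data_(data) {}
    void print() { cout << data_->a() << endl; }
    template <class X>
    void usually_important () { int a = data_->a(); int b = data_->b(); }
private:
    EngineBase* data_;  // non-owning
};
int main()
{
    ConcreteEngine engine;  // stack object instead of a leaked new
    Worker w(&engine);
    w.print();
}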
> I don't understand why this is bad. Duck typing is certainly a powerful tool; but in my mind it lends itself to confusing runtime issues when a class looks close (same function names) but has slightly different behavior.
The probability that you can define a non-trivial interface and then by accident have another interface that has different semantics but can be substituted is minimal. This never, ever happens.
> Type Two will allow Worker to compile only if usually_important is not called.
That is a good thing. We depend on it all the time. It makes class templates more flexible.
Matching a compile-time interface is strictly superior to a run-time one. This is because run-time interfaces can't differ in key ways that compile-time ones can (e.g. different types in the interface), and require a bunch of run-time abstraction like dynamic allocation that may be unnecessary.
> In a case like this, though, the responsibility is put on the designer of ENGINE to ensure that it is a valid type (after which they should inherit ENGINE_BASE). If they don't, there will be a compiler error. To me this seems much safer while not imposing any restrictions or adding much additional work.
It is not safer. It is utterly pointless. It is stupendously unlikely that the user will accidentally instantiate the class with the wrong type but it will compile successfully due to circumstantial interface match.
What it really boils down to is this: you should only require what you really need. Absolutely definitely must have in order to function. Everything else, don't require it. This is a core tenet of making software maintainable. You cannot possibly imagine what shenanigans I might conceive of long after you have written this class to use it in ways that you never thought it could be used for.

practice and discovery of Boost Type Erasure

I am reading about Boost Type Erasure and trying to figure out its potential uses. I would like to practice with it a bit while I read through tons of documentation on the topic (it looks like a big one). The most quoted area of application is networking / exchanging data between client and server.
Can you suggest some other example or exercise where I can play a bit with this library?
Type Erasure is useful in an extraordinary number of situations, to the point where it may actually be thought of as a fundamentally missing language feature that bridges generic and object oriented programming styles.
When we define a class in C++, what we are really defining is both a very specific type and a very specific interface, and these two things do not necessarily need to be related. A type deals with the data, whereas the interface deals with transformations on that data. Generic code, such as in the STL, doesn't care about type, it cares about interface: you can sort any container or container-like sequence using std::sort, as long as it provides a comparison and a (random-access) iterator interface.
Unfortunately, generic code in C++ requires compile time polymorphism: templates. This doesn't help with things which cannot be known until runtime, or things which require a uniform interface.
A simple example is this: how do you store a number of different types in a single container? The simplest mechanism would be to store all of the types in a void*, perhaps with some type information to distinguish them. Another way is to recognize all of these types have the same interface: retrieval. If we could make a single interface for retrieval, then specialize it for each type, then it would be as if part of the type had been erased.
any_iterator is another very useful reason to do this: if you need to iterate over a number of different containers with the same interface, you will need to erase the type of the container out of the type of the iterator. boost::any_range is a subtle enhancement of this, extending it from iterators to ranges, but the basic idea is the same.
In short, any time you need to go from multiple types with a similar interface to a single type with a single interface, you will need some form of type erasure. It is the runtime counterpart of compile-time templates.
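As a small exercise along these lines, the "different types in one container" example can be written by hand before reaching for the library. The sketch below uses only the standard library, not Boost.TypeErasure; the names Printable and join are invented, and the single erased operation is streaming with operator<<.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper: holds a value of any type that supports operator<<.
class Printable {
    struct Interface {
        virtual ~Interface() = default;
        virtual void print(std::ostream& os) const = 0;
    };
    template <class T>
    struct Model final : Interface {
        T value;
        explicit Model(T v) : value(std::move(v)) {}
        void print(std::ostream& os) const override { os << value; }
    };
    // Shared and immutable, so copies of Printable are cheap and safe.
    std::shared_ptr<const Interface> impl_;

public:
    template <class T>
    Printable(T v) : impl_(std::make_shared<Model<T>>(std::move(v))) {}
    void print(std::ostream& os) const { impl_->print(os); }
};

// One container, one interface, several unrelated stored types.
inline std::string join(const std::vector<Printable>& items) {
    std::ostringstream os;
    for (const auto& item : items) {
        item.print(os);
        os << ' ';
    }
    return os.str();
}
```

A std::vector<Printable> can then hold an int, a std::string and a double side by side. Extending the wrapper with a second operation means touching Interface, Model and Printable each time, which is the repetition that Boost.TypeErasure aims to remove.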

Tagged container - is mimicking the container's interface a good practice?

Assume I have a container type to which I would like to attach additional information. My approach would be to define a class holding the container and the info. Is it good practice to define methods for the new class which mimic the container's methods? E.g., instead of writing myContainerObject.internalVector[i] I would like to write myContainerObject[i]. One would have to redefine every method one wishes to use (size(), push_back() etc.). What are the drawbacks of such an approach? What alternatives exist (e.g., is inheriting from a container the better solution?).
You are using composition with forwarding functions, and it's the right thing to do with concrete classes like STL containers. One drawback is that you have to redefine every overload of every function you want to allow, only to forward arguments, as well as parrot-typedef nested types (like iterator).
An alternative is to inherit but never publicly, only with private inheritance (because STL containers' destructor is public and non-virtual), then use using-declarations inside the custom class to bring names of base-class functions and types into scope (needing only one using Base::name; for all overloads of a function).
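The two options can be compared side by side in a short sketch. The class names TaggedByComposition and TaggedByInheritance are made up, and the element type is fixed to int to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Composition: every exposed operation is forwarded by hand.
class TaggedByComposition {
    std::vector<int> items_;
    std::string tag_;
public:
    explicit TaggedByComposition(std::string tag) : tag_(std::move(tag)) {}
    const std::string& tag() const { return tag_; }
    void push_back(int v) { items_.push_back(v); }
    std::size_t size() const { return items_.size(); }
    int& operator[](std::size_t i) { return items_[i]; }
    const int& operator[](std::size_t i) const { return items_[i]; }
};

// Private inheritance: one using-declaration re-exports all overloads of a name.
class TaggedByInheritance : private std::vector<int> {
    using Base = std::vector<int>;
    std::string tag_;
public:
    explicit TaggedByInheritance(std::string tag) : tag_(std::move(tag)) {}
    const std::string& tag() const { return tag_; }
    using Base::push_back;
    using Base::size;
    using Base::operator[];
    using Base::begin;
    using Base::end;
};
```

Both classes support obj[i], obj.size() and obj.push_back(v). Because the inheritance is private, a TaggedByInheritance cannot be converted to a std::vector<int>* by outside code, so the non-virtual destructor is not a hazard.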
Possible ways of handling a type which is a container, that I can think of:
1. Provide a decorator (this is what you are asking about) whose component will be the internal container. This can work when you have strict interfaces which you must comply with. It's neither good nor bad practice. Read about the decorator design pattern.
2. Use an iterator concept in your algorithms instead of a container. This is a generic approach, and it is how STL algorithms are implemented.
3. Use a container concept, similar to (2). Detect if the type is a container (SFINAE trick) and manipulate it.
4. Reimplement the container interface. You need a really strong reason to do it since it requires a significant amount of work/know-how. Some info: http://stdcxx.apache.org/doc/stdlibug/16-3.html
Generally, unless your use case is very, very trivial or you have some specific requirements, you should not expose a class's internal state (myContainerObject.internalVector[i]) to the outside world.
Best practice is to keep the internals private and implement the [] operator and other functions (size() and such).