How standards-compliant is my polymorph? - c++

Often in code I write there are types that are layout-compatible but distinct, and I'd still like to pass them around as if they were the same type. This comes with quite some syntactic overhead, including the necessary casts.
I (very) recently thought up a small helper mix-in class, which I dubbed polymorph:
struct polymorph
{
template<typename T>
const T& as() const { return reinterpret_cast<const T&>(*this); }
template<typename T>
T& as() { return reinterpret_cast<T&>(*this); }
};
A small example "demonstrating" it for the classical "my complex number type is better than yours" situation here*.
My question is: how robust is this class, and how could I make it more resilient against misuse and/or undefined behaviour? I haven't used it a lot and am kind of hesitant because there are a lot of things that could go terribly wrong.
This class is primarily intended for correspondences such as the one between _Complex/std::complex<double>/double[2]. I'm still thinking of a nice way to extend it to perform useful conversions, and exactly how useful that could be.
*Note I'm not saying this is totally undefined-behaviour free. Hence this question.

You are in violation of [basic.lval]/8, commonly known as the "strict aliasing rule". It says that you may not access an object through a pointer or reference of a type different from the type of the object being referred to, with a number of exceptions (pointers to base classes, unsigned char*, const/volatile differences, etc). Layout compatibility is not one of those exceptions.
So yes, this is UB.
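One way to stay within the rules is to copy the object representation instead of aliasing it. The following is only a minimal sketch, assuming both types are trivially copyable and have the same size; the names copy_as, MyComplex and TheirComplex are hypothetical and not taken from the question.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical stand-ins for two unrelated but layout-compatible types.
struct MyComplex    { double re;   double im;   };
struct TheirComplex { double real; double imag; };

// Copies the bytes of `from` into a new To object instead of aliasing `from`.
template <typename To, typename From>
To copy_as(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value,
                  "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value,
                  "From must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Usage: the conversion is by value, with no reinterpret_cast involved.
inline void copy_as_demo()
{
    MyComplex mine{1.0, 2.0};
    TheirComplex theirs = copy_as<TheirComplex>(mine);
    assert(theirs.real == 1.0 && theirs.imag == 2.0);
}
```

Unlike the mix-in, this produces a copy rather than a view, so writes to the result do not affect the original. In C++20, std::bit_cast performs the same kind of conversion.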

Related

Is it safe to reinterpret_cast from std::function<void()> * to std::function<std::monostate()> *?

Example:
std::function<std::monostate()> convert(std::function<void()> func){
return *reinterpret_cast<std::function<std::monostate()> * >(&func);
}
Are std::function<void()> and std::function<std::monostate()> considered "similar" enough for reinterpret_cast to be safe?
Edit: someone asked me to clarify what I am asking. I am not asking if the general case of foo<X> and foo<Y> are similar but whether foo<void> and foo<std::monostate> are.
No, this is unsafe and leads to undefined behavior. In particular, there's no guarantee that the two layouts will be compatible. Of course, you might get away with it with some compiler and runtime combinations, but then it might break if some future release of your compiler decides to implement certain forms of control flow integrity.
The safe way to do what you want, albeit at a small cost in performance, is just to return a new lambda, as in:
std::function<std::monostate()> convert(std::function<void()> func){
return [func=std::move(func)]() -> std::monostate { func(); return {}; };
}
Are std::function<void()> and std::function<std::monostate()> considered "similar" enough for reinterpret_cast to be safe?
No. Given a template foo and distinct types X and Y, the instantiations foo<X> and foo<Y> are not similar, regardless of any perceived relationship between X and Y (as long as they are not the same type, which is why they were qualified as "distinct"). Different template instantiations are unrelated unless documented otherwise. There is no such documentation for std::function.
The rules for "similar" make allowances for digging into pointer types, but there is nothing special for templates. (Nor could there be, since a template specialization could look radically different than its base template.) Different types as template arguments yield dissimilar templated classes. No need to dig deeper into those arguments.
I am not asking if the general case of foo<X> and foo<Y> are similar but whether foo<void> and foo<std::monostate> are.
There is nothing special about void and std::monostate that would make them two names for the same type. (In fact, they cannot be the same type, as the former has zero values, while the latter has exactly one value.) So, asking about foo<void> and foo<std::monostate> is the same as asking about the general case, just with a greater possibility of seeing connections that do not exist.
Also, the question is not about foo<void> and foo<std::monostate> but about foo<void()> and foo<std::monostate()>. The types used as template arguments are function types, not object types. Function types are very particular in that two function types are the same only when all of their parameter and return types are exact matches; none of the conversions allowed when invoking a function are considered. (Not that there is a conversion from void to std::monostate.) The function types are different, so again the templates instantiated from those types are not similar.
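The point that these are simply different types can be checked at compile time. This is a small sketch (C++17, since std::monostate lives in <variant>); it only shows that the types are distinct and unrelated, which is all that matters here, since distinct class types are not "similar".

```cpp
#include <functional>
#include <type_traits>
#include <variant>  // std::monostate (C++17)

// The two function types differ because their return types differ.
static_assert(!std::is_same<void(), std::monostate()>::value,
              "distinct function types");

// Hence the two std::function specializations are distinct class types.
static_assert(!std::is_same<std::function<void()>,
                            std::function<std::monostate()>>::value,
              "distinct class types");

// Neither specialization is a base class of the other.
static_assert(!std::is_base_of<std::function<void()>,
                               std::function<std::monostate()>>::value,
              "no inheritance relationship");
```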
Perhaps a more focused version of this question would have asked about function pointers instead of std::function objects.
(from a comment:) I was looking at the assembly code of std::monostate() functions and void() functions and they generate the same assembly verbatim.
Generated assembly means nothing as far as the language is concerned. At best, you have evidence that with your compiler, it seems likely that you could get away with invoking a function pointer after casting it from void (*)() to std::monostate (*)(). Not "safe" so much as "works for now". And that assumes that you use the function pointer directly instead of burying it inside a std::function (a complex adapter of types).
C++ is a strongly typed language. Different types are different even if they are treated the same at the level of assembly code. This might be more readily apparent if we switch to more familiar types. On many common systems, char is signed, making it equivalent to signed char at the assembly code level. However, this does not affect the similarity of functions. The following code is illegal, even if changing char to signed char has no effect on the assembly code generated for foo().
char foo() { return 'c'; }
int main()
{
signed char (*fun)() = foo; // <-- Error: invalid conversion
// ^^^^^^ -- because the return type is signed char, not char
}
One can downgrade this error to a warning with a reinterpret_cast. After all, it is legal to cast a function pointer to any function pointer type. However, it is not safe to invoke the function through the cast pointer (unless cast back to the original type), hence the warning. Invoking it might work very reliably on your system, but that is due to your system, not the language. When you ask about "safe", you are asking for guidance from the language specs, not merely what will probably work on your system.
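If a function pointer with the other signature is really needed, one option is a small adapter that has exactly the target signature and calls the original function through its real type. This is only a sketch; as_monostate, bump and call_count are hypothetical names made up for the illustration.

```cpp
#include <cassert>
#include <variant>  // std::monostate (C++17)

// Hypothetical counter so the demo can observe that the call happened.
inline int& call_count() { static int n = 0; return n; }
inline void bump() { ++call_count(); }

// Adapter with the exact target signature; F is called through its own type.
template <void (*F)()>
std::monostate as_monostate()
{
    F();
    return {};
}

inline void thunk_demo()
{
    std::monostate (*fun)() = &as_monostate<&bump>;  // no cast needed
    int before = call_count();
    fun();
    assert(call_count() == before + 1);
}
```

The function to wrap has to be known at compile time for this to work; for a pointer only known at run time, the lambda-in-std::function approach shown earlier is the straightforward route.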

Modifying scoped enum by reference

I am increasingly finding scoped enums unwieldy to use. I am trying to write a set of function overloads including a template for scoped enums that sets/initializes a value by reference--something like this:
void set_value(int& val);
void set_value(double& val);
template <typename ENUM> void set_value(ENUM& val);
However, I don't quite see how to write the templated version of set_value without introducing multiple temporary values:
template <typename ENUM>
void set_value(ENUM& val)
{
std::underlying_type_t<ENUM> raw_val;
set_value(raw_val); // Calls the appropriate "primitive" overload
val = static_cast<ENUM>(raw_val);
}
I believe the static_cast introduces a second temporary value in addition to raw_val. I suppose it's possible that one or both of these could be optimized away by the compiler, and in any case it shouldn't really make much difference in terms of performance since the set_value call will also generate temporary values (assuming it's not inlined), but this still seems inelegant. What I would like to do would be something like this:
template <typename ENUM>
void set_value(ENUM& val)
{
set_value(static_cast<std::underlying_type_t<ENUM>&>(val));
}
... but this isn't valid (nor is the corresponding code using pointers directly instead of references) because scoped enums aren't related to their underlying primitives via inheritance.
I could use reinterpret_cast, which, from some preliminary testing, appears to work (and I can't think of any reason why it wouldn't work), but that seems to be frowned upon in C++.
Is there a "standard" way to do this?
I could use reinterpret_cast, which, from some preliminary testing, appears to work (and I can't think of any reason why it wouldn't work), but that seems to be frowned upon in C++.
Indeed, that reinterpret_cast is undefined behavior by violation of the strict aliasing rule.
Eliminating a single mov instruction (or otherwise, more or less, copying a register's worth of data) is premature micro-optimization. The compiler is likely to be able to take care of it.
If performance is really important, then follow the optimization process: profile, disassemble, understand the compiler's interpretation, and work together with it within the defined rules.
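For reference, here is a sketch of the well-defined version with the local temporary, written so it compiles on its own. The primitive overloads and the values they write are placeholders, and the Color enum is hypothetical. The enable_if constraint is an addition not present in the question: it keeps the template from being selected for non-enum types such as unsigned char.

```cpp
#include <cassert>
#include <type_traits>

// Placeholder "primitive" overloads; the values written are made up.
inline void set_value(int& val)    { val = 2; }
inline void set_value(double& val) { val = 1.5; }

// Enum overload, enabled only for enumeration types.
template <typename ENUM,
          typename std::enable_if<std::is_enum<ENUM>::value, int>::type = 0>
void set_value(ENUM& val)
{
    typename std::underlying_type<ENUM>::type raw_val{};
    set_value(raw_val);                // calls the matching primitive overload
    val = static_cast<ENUM>(raw_val);  // plain value conversion, no aliasing
}

enum class Color : int { Red = 1, Green = 2 };  // hypothetical enum

inline void set_value_demo()
{
    Color c = Color::Red;
    set_value(c);
    assert(c == Color::Green);
}
```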
At a glance, you (and the compiler) might have an easier time with functions like T get_value() instead of void set_value(T). The flow of data and initialization make more sense, although type deduction is lost. You can regain the deduction through tag types, if that's really important.
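A rough sketch of that get_value idea with tag types might look like this. The tag template, the placeholder return values and the Color enum are all assumptions made for the example.

```cpp
#include <cassert>
#include <type_traits>

template <typename T> struct tag {};  // hypothetical tag type carrying only T

// Placeholder primitive sources; the returned values are made up.
inline int    get_value(tag<int>)    { return 2; }
inline double get_value(tag<double>) { return 1.5; }

// Enums: fetch the underlying value, then convert it.
template <typename ENUM,
          typename std::enable_if<std::is_enum<ENUM>::value, int>::type = 0>
ENUM get_value(tag<ENUM>)
{
    using raw_type = typename std::underlying_type<ENUM>::type;
    return static_cast<ENUM>(get_value(tag<raw_type>{}));
}

// Convenience front end: the caller names the type explicitly.
template <typename T>
T get_value() { return get_value(tag<T>{}); }

enum class Color : int { Red = 1, Green = 2 };  // hypothetical enum

inline void get_value_demo()
{
    Color c = get_value<Color>();  // initialised directly, no out-parameter
    assert(c == Color::Green);
}
```

The tag overloads do the dispatch that reference parameters did before, and the result can initialise a variable directly.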

Why is RTTI necessary?

Why is RTTI (Runtime Type Information) necessary?
RTTI, Run-Time Type Information, introduces a [mild] form of reflection for C++.
It lets you discover, for example, the dynamic type of an object referred to through a base class, which makes it possible to handle a heterogeneous collection of objects that all derive from the same base type in ways that are specific to the individual derived classes. (Say you have an array of "Vehicle" objects and need to deal differently with the "Truck" objects found amid the array.)
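A minimal sketch of the Vehicle/Truck case, assuming a polymorphic Vehicle base (dynamic_cast requires at least one virtual function); the Car class and count_trucks are invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Vehicle { virtual ~Vehicle() = default; };
struct Car   : Vehicle {};
struct Truck : Vehicle {};

// Counts the Truck objects in a mixed collection using dynamic_cast.
inline std::size_t count_trucks(
    const std::vector<std::unique_ptr<Vehicle>>& fleet)
{
    std::size_t n = 0;
    for (const auto& v : fleet)
        if (dynamic_cast<const Truck*>(v.get()) != nullptr)
            ++n;  // dynamic_cast yields nullptr for non-Truck objects
    return n;
}

inline void rtti_demo()
{
    std::vector<std::unique_ptr<Vehicle>> fleet;
    fleet.push_back(std::unique_ptr<Vehicle>(new Car));
    fleet.push_back(std::unique_ptr<Vehicle>(new Truck));
    fleet.push_back(std::unique_ptr<Vehicle>(new Truck));
    assert(count_trucks(fleet) == 2);
}
```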
The question of whether RTTI is necessary is, however, an open one. Story has it that Bjarne Stroustrup purposefully excluded this feature from the original C++ specification, for fear that it would be misused.
There are indeed opportunities to overuse/misuse reflection features, and this may have been even more of a factor when C++ was initially introduced because there wasn't such an OOP culture in the mainstream programmer community.
This said, with a more OOP savvy community, with the effective demonstration of all the good things reflection can do (eg. with languages such as Java or C#) and with the fancy design patterns in use nowadays, I strongly believe that RTTI and reflection features at large are very important even if sometimes misused.
I can think of exactly one case when it would be appropriate to use RTTI, and it doesn't even work.
It is fairly common for C-compatible APIs which perform callbacks to provide a user-defined void* to communicate a state structure back to the caller. When calling such an API from C++, it is quite common to pass the this pointer through said void* argument. From the callback, one might want to invoke virtual functions on the passed pointer.
In some cases when the callback parameters are insecure (such as LPARAM of a Windows message), it is obviously desirable to validate the pointer before using it for a virtual call, by checking the hidden vfptr. dynamic_cast is the natural way to do this, but results in undefined behavior exactly when the object is invalid (IIRC, it is undefined behavior if the pointer is to anything except an object with a virtual table). So RTTI is utterly useless for preventing a shatter attack in this way.
Feel free to present any other valid use cases for RTTI, cause I'm totally unconvinced.
EDIT: boost::any got mentioned. As far as boost::any is concerned, you can disable RTTI and use the following typeid implementation:
typedef const void* typeinfo_nonrtti;
template <typename T> typeinfo_nonrtti typeid_nonrtti();
template <typename T> class typeinfo_nonrtti_helper
{
friend typeinfo_nonrtti typeid_nonrtti<T>();
static char unique;
};
template <typename T> char typeinfo_nonrtti_helper<T>::unique;
template <typename T>
typeinfo_nonrtti typeid_nonrtti() { return &typeinfo_nonrtti_helper<T>::unique; }
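To show how such an identifier might be used, here is a small sketch that repeats the code above and adds a type-tagged pointer on top of it. The names tagged_ptr, make_tagged and tagged_cast are hypothetical; this is not how boost::any is actually implemented.

```cpp
#include <cassert>

typedef const void* typeinfo_nonrtti;
template <typename T> typeinfo_nonrtti typeid_nonrtti();
template <typename T> class typeinfo_nonrtti_helper
{
    friend typeinfo_nonrtti typeid_nonrtti<T>();
    static char unique;  // one distinct object, hence one address, per T
};
template <typename T> char typeinfo_nonrtti_helper<T>::unique;
template <typename T>
typeinfo_nonrtti typeid_nonrtti() { return &typeinfo_nonrtti_helper<T>::unique; }

// Hypothetical pointer that remembers the static type it was created from.
struct tagged_ptr
{
    void* object;
    typeinfo_nonrtti type;
};

template <typename T>
tagged_ptr make_tagged(T* p) { return { p, typeid_nonrtti<T>() }; }

// Returns the stored pointer only if T matches the recorded type exactly.
template <typename T>
T* tagged_cast(const tagged_ptr& p)
{
    return p.type == typeid_nonrtti<T>() ? static_cast<T*>(p.object) : nullptr;
}

inline void tagged_demo()
{
    int x = 5;
    tagged_ptr p = make_tagged(&x);
    assert(tagged_cast<int>(p) == &x);
    assert(tagged_cast<double>(p) == nullptr);
}
```

Note that the match is on the exact static type only; unlike dynamic_cast, nothing here knows about base and derived classes.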

Hypothetical, formerly-C++0x concepts questions

(Preamble: I am a late follower to the C++0x game and the recent controversy regarding the removal of concepts from the C++0x standard has motivated me to learn more about them. While I understand that all of my questions are completely hypothetical -- insofar as concepts won't be valid C++ code for some time to come, if at all -- I am still interested in learning more about concepts, especially given how it would help me understand more fully the merits behind the recent decision and the controversy that has followed)
After having read some introductory material on concepts as C++0x (until recently) proposed them, I am having trouble wrapping my mind around some syntactical issues. Without further ado, here are my questions:
1) Would a type that supports a particular derived concept (either implicitly, via the auto keyword, or explicitly via concept_maps) also need to support the base concept independently? In other words, does the act of deriving a concept from another (e.g. concept B<typename T> : A<T>) implicitly include an 'invisible' requires statement (within B, requires A<T>;)? The confusion arises from the Wikipedia page on concepts which states:
Like in class inheritance, types that
meet the requirements of the derived
concept also meet the requirements of
the base concept.
That seems to say that a type only needs to satisfy the derived concept's requirements and not necessarily the base concept's requirements, which makes no sense to me. I understand that Wikipedia is far from a definitive source; is the above description just a poor choice of words?
2) Can a concept which lists typenames be 'auto'? If so, how would the compiler map these typenames automatically? If not, are there any other occasions where it would be invalid to use 'auto' on a concept?
To clarify, consider the following hypothetical code:
template<typename Type>
class Dummy {};
class Dummy2 { public: typedef int Type; };
auto concept SomeType<typename T>
{
typename Type;
}
template<typename T> requires SomeType<T>
void function(T t)
{}
int main()
{
function(Dummy<int>()); //would this match SomeType?
function(Dummy2()); //how about this?
return 0;
}
Would either of those classes match SomeType? Or is a concept_map necessary for concepts involving typenames?
3) Finally, I'm having a hard time understanding what axioms would be allowed to define. For example, could I have a concept define an axiom which is logically inconsistent, such as
concept SomeConcept<typename T>
{
T operator*(T&, int);
axiom Inconsistency(T a)
{
a * 1 == a * 2;
}
}
What would that do? Is that even valid?
I appreciate that this is a very long set of questions and so I thank you in advance.
I've used the most recent C++0x draft, N2914 (which still has concepts wording in it) as a reference for the following answer.
1) Concepts are like interfaces in that respect. If your type supports a concept, it should also support all "base" concepts. The Wikipedia statement you quote makes sense from the point of view of a type's client: if he knows that T satisfies concept Derived<T>, then he also knows that it satisfies concept Base<T>. From the type author's perspective, this naturally means that both have to be implemented. See 14.10.3/2.
2) Yes, a concept with typename members can be auto. Such members can be automatically deduced if they are used in definitions of function members in the same concept. For example, value_type for iterator can be deduced as a return type of its operator*. However, if a type member is not used anywhere, it will not be deduced, and thus will not be implicitly defined. In your example, there's no way to deduce SomeType<T>::Type for either Dummy or Dummy2, as Type isn't used by other members of the concept, so neither class will map to the concept (and, in fact, no class could possibly auto-map to it). See 14.10.1.2/11 and 14.10.2.2/4.
3) Axioms were a weak point of the spec, and they were being constantly updated to make some (more) sense. Just before concepts were pulled from the draft, there was a paper that changed quite a bit - read it and see if it makes more sense to you, or you still have questions regarding it.
For your specific example (accounting for syntactic difference), it would mean that the compiler would be permitted to consider the expression (a*1) to be the same as (a*2), for the purpose of the "as-if" rule of the language (i.e. the compiler is permitted to do any optimizations it wants, so long as the result behaves as if there were none). However, the compiler is not in any way required to validate the correctness of axioms (hence why they're called axioms!) - it just takes them for what they are.

Why is it not possible to pass a const set<Derived*> as const set<Base*> to a function?

Before this is marked as duplicate, I'm aware of this question, but in my case we are talking about const containers.
I have 2 classes:
class Base { };
class Derived : public Base { };
And a function:
void register_objects(const std::set<Base*> &objects) {}
I would like to invoke this function as:
std::set<Derived*> objs;
register_objects(objs);
The compiler does not accept this. Why not? The set is not modifiable so there is no risk of non-Derived objects being inserted into it. How can I do this in the best way?
Edit:
I now understand that the compiler treats set<Base*> and set<Derived*> as totally unrelated types, and therefore no matching function is found. My question now, however, is: why does the compiler work like this? Would there be any objection to treating const set<Derived*> as derived from const set<Base*>?
The reason the compiler doesn't accept this is that the standard tells it not to.
The reason the standard tells it not to is that the committee did not want to introduce a rule that const MyTemplate<Derived*> is a related type to const MyTemplate<Base*> even though the non-const types are not related. And they certainly didn't want a special rule for std::set, since in general the language does not make special cases for library classes.
The reason the standards committee didn't want to make those types related, is that MyTemplate might not have the semantics of a container. Consider:
template <typename T>
struct MyTemplate {
T *ptr;
};
template<>
struct MyTemplate<Derived*> {
int a;
void foo();
};
template<>
struct MyTemplate<Base*> {
std::set<double> b;
void bar();
};
Then what does it even mean to pass a const MyTemplate<Derived*> as a const MyTemplate<Base*>? The two classes have no member functions in common, and aren't layout-compatible. You'd need a conversion operator between the two, or the compiler would have no idea what to do whether they're const or not. But the way templates are defined in the standard, the compiler has no idea what to do even without the template specializations.
std::set itself could provide a conversion operator, but that would just have to make a copy(*), which you can do yourself easily enough. If there were such a thing as a std::immutable_set, then I think it would be possible to implement that such that a std::immutable_set<Base*> could be constructed from a std::immutable_set<Derived*> just by pointing to the same pImpl. Even so, strange things would happen if you had non-virtual operators overloaded in the derived class - the base container would call the base version, so the conversion might de-order the set if it had a non-default comparator that did anything with the objects themselves instead of their addresses. So the conversion would come with heavy caveats. But anyway, there isn't an immutable_set, and const is not the same thing as immutable.
Also, suppose that Derived is related to Base by virtual or multiple inheritance. Then you can't just reinterpret the address of a Derived as the address of a Base: in most implementations the implicit conversion changes the address. It follows that you can't just batch-convert a structure containing Derived* as a structure containing Base* without copying the structure. But the C++ standard actually allows this to happen for any non-POD class, not just with multiple inheritance. And Derived is non-POD, since it has a base class. So in order to support this change to std::set, the fundamentals of inheritance and struct layout would have to be altered. It's a basic limitation of the C++ language that standard containers cannot be re-interpreted in the way you want, and I'm not aware of any tricks that could make them so without reducing efficiency or portability or both. It's frustrating, but this stuff is difficult.
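The address change is easy to observe. This is only a sketch with hypothetical classes Left, Right and Both; the standard does not promise that the address differs, but on the common ABIs the second base sits at a non-zero offset.

```cpp
#include <cassert>

struct Left  { int l = 0; };
struct Right { int r = 0; };
struct Both : Left, Right {};  // hypothetical multiple-inheritance case

// True when converting Both* to Right* yields a different address.
inline bool conversion_adjusts_address()
{
    Both object;
    Both*  as_both  = &object;
    Right* as_right = as_both;  // implicit derived-to-base conversion
    return static_cast<void*>(as_right) != static_cast<void*>(as_both);
}

inline void address_demo()
{
    // Expected on typical implementations, not guaranteed by the standard.
    assert(conversion_adjusts_address());

    // Guaranteed: converting back recovers the original pointer.
    Both object;
    Right* as_right = &object;
    assert(static_cast<Both*>(as_right) == &object);
}
```

A set of Both* reinterpreted as a set of Right* would therefore hold pointers that are off by that offset, which is why each element has to be converted individually.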
Since your code is passing a set by value anyway, you could just make that copy:
std::set<Derived*> objs;
register_objects(std::set<Base*>(objs.begin(), objs.end()));
[Edit: you've changed your code sample not to pass by value. My code still works, and afaik is the best you can do other than refactoring the calling code to use a std::set<Base*> in the first place.]
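If the copy is needed in several places, it could be wrapped in a small helper. This is just a sketch; to_base_set is a hypothetical name, and it makes the same element-by-element copy as the one-liner above.

```cpp
#include <cassert>
#include <set>

class Base { };
class Derived : public Base { };

// Builds a std::set<BaseT*> by converting and copying every element.
template <typename BaseT, typename DerivedT>
std::set<BaseT*> to_base_set(const std::set<DerivedT*>& derived)
{
    // Each DerivedT* is implicitly converted to BaseT* as it is inserted.
    return std::set<BaseT*>(derived.begin(), derived.end());
}

inline void to_base_set_demo()
{
    Derived a, b;
    std::set<Derived*> objs{&a, &b};
    std::set<Base*> bases = to_base_set<Base>(objs);
    assert(bases.size() == 2);
    assert(bases.count(&a) == 1);  // &a converts to Base* for the lookup
}
```

With this, the call site would read register_objects(to_base_set<Base>(objs)).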
Writing a wrapper for std::set<Base*> that ensures all elements are Derived*, the way Java generics work, is easier than arranging for the conversion you want to be efficient. So you could do something like:
template<typename T, typename U>
struct MySetWrapper {
// Requirement: std::less is consistent. The default probably is,
// but for all we know there are specializations which aren't.
// User beware.
std::set<T> content;
void insert(U value) { content.insert(value); }
// might need a lot more methods, and for the above to return the right
// type, depending how else objs is used.
};
MySetWrapper<Base*,Derived*> objs;
// insert lots of values
register_objects(objs.content);
(*) Actually, I guess it could copy-on-write, which in the case of a const parameter used in the typical way would mean it never needs to do the copy. But copy-on-write is a bit discredited within STL implementations, and even if it wasn't I doubt the committee would want to mandate such a heavyweight implementation detail.
If your register_objects function receives an argument, it can put/expect any Base subclass in there. That's what its signature says.
It's a violation of the Liskov substitution principle.
This particular problem is also referred to as Covariance. In this case, where your function argument is a constant container, it could be made to work. In case the argument container is mutable, it can't work.
Take a look here first: Is array of derived same as array of base. In your case, a set of derived is a totally different container from a set of base, and since no implicit conversion operator is available to convert between them, the compiler gives an error.
std::set<Base*> and std::set<Derived*> are basically two different types. Though the Base and Derived classes are linked via inheritance, at the compiler's template instantiation level they are two different instantiations (of set).
Firstly, it seems a bit odd that you aren't passing by reference ...
Secondly, as mentioned in the other post, you would be better off creating the passed-in set as a std::set< Base* > and then newing a Derived class in for each set member.
Your problem surely arises from the fact that the 2 types are completely different. std::set< Derived* > is in no way inherited from std::set< Base* > as far as the compiler is concerned. They are simply 2 different types of set ...
Well, as stated in the question you mention, set<Base*> and set<Derived*> are different types. Your register_objects() function takes a set<Base*> object. So the compiler does not know about any register_objects() that takes set<Derived*>. The constness of the parameter does not change anything. Solutions stated in the quoted question seem the best things you can do. Depends on what you need to do ...
As you are aware, the two classes are quite similar once you remove the non-const operations. However, in C++ inheritance is a property of types, whereas const is a mere qualifier on top of types. That means that you can't properly state that const X derives from const Y, even when X derives from Y.
Furthermore, if X does not inherit from Y, that applies to all cv-qualified variants of X and Y as well. This extends to std::set instantiations. Since std::set<Foo> does not inherit from std::set<Bar>, std::set<Foo> const does not inherit from std::set<Bar> const either.
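These relationships can be checked at compile time with type traits. A small sketch, using the Base and Derived classes from the question:

```cpp
#include <set>
#include <type_traits>

class Base { };
class Derived : public Base { };

// The element types are related: Derived* converts to Base*.
static_assert(std::is_convertible<Derived*, Base*>::value,
              "pointer conversion exists");

// The set instantiations are not related, by inheritance or by conversion.
static_assert(!std::is_base_of<std::set<Base*>, std::set<Derived*>>::value,
              "no inheritance between the sets");
static_assert(!std::is_convertible<std::set<Derived*>,
                                   std::set<Base*>>::value,
              "no implicit conversion between the sets");

// Adding const (and a reference) changes nothing.
static_assert(!std::is_convertible<const std::set<Derived*>&,
                                   const std::set<Base*>&>::value,
              "const does not make the sets related");
```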
You are quite right that this is logically allowable, but it would require further language features. They are available in C# 4.0, if you're interested in seeing another language's way of doing it. See here: http://community.bartdesmet.net/blogs/bart/archive/2009/04/13/c-4-0-feature-focus-part-4-generic-co-and-contra-variance-for-delegate-and-interface-types.aspx
Didn't see it linked yet, so here's a bullet point in the C++ FAQ Lite related to this:
http://www.parashift.com/c++-faq-lite/proper-inheritance.html#faq-21.3
I think their Bag-of-Apples != Bag-of-Fruit analogy suits the question.