Can I ensure RVO for reinterpret_cast'ed values? - c++

Suppose I've written:
Foo get_a_foo() {
return reinterpret_cast<Foo>(get_a_bar());
}
and suppose that sizeof(Foo) == sizeof(Bar).
Does return value optimization necessarily take place here, or are compilers allowed to do whatever they like when I "break the rules" by using a reinterpret_cast? If I don't get RVO, or am not guaranteed it - can I change this code to ensure that it occur?
My question is about C++11 and, separately, C++17 (since there was some change in it w.r.t. RVO, if I'm not mistaken).

That reinterpret_cast is not legal for all possible Foo and Bar types. It only works for cases where:
1. Bar is a pointer and Foo is either a pointer or an integer big enough to hold pointers.
2. Bar is an integer/enum big enough to hold a pointer, and Foo is a pointer.
3. Bar is an object type and Foo is a reference type.
There are a couple of other cases I didn't cover, but they're either irrelevant (nullptr_t casting) or fall under issues similar to #1 or #2.
See, elision doesn't actually matter when dealing in fundamental types. You can't tell the difference between eliding a copy/move of fundamental types and not eliding it. So is there a conversion there? Is the compiler just using the return value register? That's up to the compiler, via the "as if" rule.
And elision doesn't apply when returning reference types, so #3 is out.
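For cases #1 and #2 the cast is between fundamental types, so there is no constructor for any elision rule to act on. A minimal sketch of a pointer-to-integer round trip, assuming the implementation provides the optional std::uintptr_t (mainstream ones do); the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Case #1: pointer -> integer large enough to hold it.
std::uintptr_t to_integer(int* p)
{
    return reinterpret_cast<std::uintptr_t>(p);
}

// Case #2: integer -> pointer. Converting back yields the original pointer value.
int* to_pointer(std::uintptr_t n)
{
    return reinterpret_cast<int*>(n);
}

// Whether these casts cost any instructions is decided by the as-if rule;
// no constructor is called, so copy elision is not involved at all.
void round_trip_demo()
{
    int value = 7;
    int* back = to_pointer(to_integer(&value));
    assert(back == &value);
    assert(*back == 7);
}
```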
But if Foo and Bar are user-defined object types (or object types other than pointers, integers, or member pointers), the cast is ill-formed. reinterpret_cast is not some kind of trivial memcpy conversion function.
So let's replace this with some code that could, you know, actually work:
Foo get_a_foo()
{
return std::bit_cast<Foo>(get_a_bar());
}
Here, C++20's std::bit_cast effectively converts one trivially copyable type to another trivially copyable type of the same size.
That conversion still would not be elided. Or at least, not in the way "elision" is typically used.
Because the two types are trivially copyable, and bit_cast will only call trivial constructors, the compiler could certainly erase the constructors, and even use the return value object of get_a_foo as the return value object of get_a_bar. And thus, it could be considered "elision".
But "elision" typically refers to the part of the standard that allows the implementation to disregard even non-trivial constructor/destructors. The compiler can only perform the above because all of the constructors and destructors are trivial. If they were non-trivial, they could not be disregarded (then again, if they were non-trivial, std::bit_cast wouldn't work).
My point is that the optimization of the conversion above is not due to "elision" or RVO rules; it's due entirely to the "as if" rule. Even in C++17, whether the bit_cast call is effectively made a no-op is entirely up to the compiler. Yes, after the Foo prvalue has been created, the "elision" of its copy into the function's return value object is required by C++17.
But the conversion itself is not a matter of elision.

Related

Does static_cast<T>(funcReturningT()) inhibit RVO?

C++17 guarantees copy elision for:
T funcReturningT() {
return T(...);
}
T t=funcReturningT();
Now if I wrap the return into a static_cast to the same type, like so:
T t=static_cast<T>(funcReturningT());
does the standard still guarantee copy elision or not?
RVO is dead. Long live RVO.
In modern C++, there are two cases corresponding to what used to be called RVO:
There's NRVO, in which the compiler is allowed to elide a local variable if it can see that that variable will just be returned eventually. If the compiler declines to elide the local variable, then it is obligated to treat it as an rvalue when it is returned, assuming certain conditions are met (thus converting a copy into a move). This is not what your question is asking about.
There's also guaranteed copy elision --- but to even use that name for it is to think about C++17 using a pre-C++17 mindset. What really changed in C++17 is that prvalues are no longer "objects without identity" but rather "recipes for creating objects". This is the reason why copy-elision-like behaviour is guaranteed.
So let's look at your statement:
T t=static_cast<T>(funcReturningT());
The expression funcReturningT() is a prvalue. In C++14, this would mean that immediately upon evaluating it, the implementation would have to instantiate a temporary T object (lacking identity), but non-guaranteed copy elision would allow the compiler (at its discretion) to elide such object. In C++17, it is ready to create a T object but doesn't do so immediately.
Then the static_cast is evaluated, and the result of it is also a prvalue of the same type. Because of this, no move constructor is required and no temporary object needs to be created. The result of the cast is just the original "recipe".
And finally, when t is initialized from the result of the static_cast, the move constructor is once again not required. The "recipe" that static_cast used is simply executed with t as the object that it creates.
The beauty of it is that there is nothing to elide, which is why "guaranteed copy elision" is a misnomer.
(Sometimes, temporary objects do need to be created: in particular, if any constructor or conversion function needs to be called at any point, then the prvalue has to be "materialized" in order for that call to actually have an object to work with. In your example, this is not necessary.)
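One way to check the "recipe" view is a type that can be neither copied nor moved: if the cast or the initialization needed a constructor, the code below would not compile. A sketch assuming C++17; Pinned and the helper functions are hypothetical:

```cpp
// A hypothetical type with no usable copy or move constructor.
struct Pinned
{
    constexpr explicit Pinned(int v) : value(v) {}
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
    int value;
};

// C++17: returning a prvalue selects no copy/move constructor.
constexpr Pinned make_pinned() { return Pinned(42); }

constexpr int read_through_cast()
{
    // static_cast<Pinned> of a Pinned prvalue is still a prvalue (the same
    // "recipe"), and t is initialized from it directly.
    Pinned t = static_cast<Pinned>(make_pinned());
    return t.value;
}

static_assert(read_through_cast() == 42, "no copy or move was needed");
```

Under C++14 the same code is ill-formed, because each step conceptually required the deleted move constructor even where the copy could be elided.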

Can std::move() or its explicit equivalent on a local variable allow elision?

For example:
Big create()
{
Big x;
return std::move(x);
// return static_cast<Big&&>(x); // explicit equivalent of std::move(x); why not elide here?
}
Assuming that applying std::move() to a returned local variable inhibits copy elision (NRVO) because compilers can't make any assumptions about the inner workings of functions in general, what about cases where those assumptions are not necessary, for example when:
1) std::move(x) is inlined (probably always)
2) std::move(x) is written out as static_cast<Big&&>(x), which is what std::move does internally (static_cast<typename std::remove_reference<T>::type&&>(t), with T deduced as Big&)
According to the current Standard, an implementation is allowed to apply NRVO...
— in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function parameter or a variable introduced by the exception-declaration of a handler (18.3)) with the same type (ignoring cv-qualification) as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function call’s return object
Obviously, neither 1) nor 2) qualify. Apart from the fact that using std::move() to return a local variable is redundant, why is this restriction necessary?
You should be clear on exactly what "allow elision" means. First of all, the compiler can do anything it wants, under the "as-if" rule. That is, the compiler can spit out any assembly it wants, as long as that assembly behaves correctly. That means that the compiler can elide any constructor it wants, but it does have to prove that the program will behave the same whether or not the constructor is called.
So why the special rules for elision? Well, these are cases where the compiler can elide constructor calls (and therefore destructor calls too) without proving that the behavior is the same. This is very useful, because there are lots of types whose constructors are very non-trivial (say, string), and compilers in practice are generally not capable of proving, in a reasonable time frame, that eliding them is safe. In the past there was even a lack of clarity on whether optimizing out a heap allocation was legal to begin with, since it is basically mutation of a global variable.
So, we want to have elision for performance reasons. However, it is basically designating a special case in the standard, in terms of behavior. The bigger the special case, the more complexity we are introducing to the standard. So the goal should be to make the permitted situation for elision to be broad enough to cover the useful cases we care about, but no broader.
You are approaching this as: why not make the special case as big as practical? In reality, it is the opposite. To extend the allowable situations for elision, it needs to be shown to be very worthwhile.
After re-reading the question, I understand it differently: I read it as "Why does std::move() inhibit (N)RVO?"
The quote from the standard provided in the question highlights the wrong part. The relevant conditions are:
in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function parameter or a variable introduced by the exception-declaration of a handler (18.3)) with the same type (ignoring cv-qualification) as the function return type
What inhibits NRVO here is not that std::move() is called, but the fact that the return value of std::move is not X, but X&&. The return expression is then a function call rather than the name of a local X, so it doesn't match the rule!
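The difference can be observed by counting moves. The following is a sketch assuming C++17 (so initializing the caller's object from the returned prvalue is not itself a move). Big is instrumented and moves_for is a hypothetical helper:

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented stand-in for the question's Big.
struct Big
{
    static inline int copies = 0;
    static inline int moves = 0;

    Big() = default;
    Big(const Big&) { ++copies; }
    Big(Big&&) noexcept { ++moves; }
};

// return x;  NRVO is permitted but not required. If it is not applied,
// x is still treated as an rvalue, so there is at most one move.
Big create_plain()
{
    Big x;
    return x;
}

// return std::move(x);  The operand is a function call, not the name of a
// local object, so NRVO is not permitted: there is exactly one move.
Big create_moved()
{
    Big x;
    return std::move(x);
}

// Resets the counters, initializes a Big from f(), and reports the moves.
int moves_for(Big (*f)())
{
    Big::copies = 0;
    Big::moves = 0;
    Big b = f();
    (void)b;
    assert(Big::copies == 0);
    return Big::moves;
}
```

With common compilers, moves_for(create_plain) typically reports 0 even without optimization, while moves_for(create_moved) must report 1.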

Managing trivial types

I have found the intricacies of trivial types in C++ non-trivial to understand and hope someone can enlighten me on the following.
Given type T, storage for T allocated using ::operator new(std::size_t) or ::operator new[](std::size_t) or std::aligned_storage, and a void * p pointing to a location in that storage suitably aligned for T so that it may be constructed at p:
If std::is_trivially_default_constructible<T>::value holds, is the code invoking undefined behavior when code skips initialization of T at p (i.e. by using T * tPtr = new (p) T();) before otherwise accessing *p as T? Can one just use T * tPtr = static_cast<T *>(p); instead without fear of undefined behavior in this case?
If std::is_trivially_destructible<T>::value holds, does skipping destruction of T at *p (i.e. by not calling tPtr->~T();) cause undefined behavior?
For any type U for which std::is_trivially_assignable<T, U>::value holds, is std::memcpy(&t, &u, sizeof(U)); equivalent to t = std::forward<U>(u); (for any t of type T and u of type U) or will it cause undefined behavior?
No, you can't. There is no object of type T in that storage, and accessing the storage as if there was is undefined. See also T.C.'s answer here.
Just to clarify on the wording in [basic.life]/1, which says that objects with vacuous initialization are alive from the storage allocation onward: that wording obviously refers to an object's initialization. There is no object whose initialization is vacuous when allocating raw storage with operator new or malloc, hence we cannot consider "it" alive, because "it" does not exist. In fact, only objects created by a definition with vacuous initialization can be accessed after storage has been allocated but before the vacuous initialization occurs (i.e. their definition is encountered).
Omitting destructor calls never per se leads to undefined behavior. However, it's pointless to attempt any optimizations in this area in e.g. templates, since a trivial destructor is just optimized away.
Right now, the requirement is being trivially copyable, and the types have to match. However, this may be too strict. Dos Reis's N3751 at least proposes distinct types to work as well, and I could imagine this rule being extended to trivial copy assignment across one type in the future.
However, what you've specifically shown does not make a lot of sense (not least because you're asking for assignment to a scalar xvalue, which is ill-formed), since trivial assignment can hold between types whose assignment is not actually "trivial", that is, has the same semantics as memcpy. E.g. is_trivially_assignable<int&, double> does not imply that one can be "assigned" to the other by copying the object representation.
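A small sketch of that last point: the trait reports a trivial assignment, yet the assignment performs a value conversion that no byte copy would reproduce. The function name is hypothetical:

```cpp
#include <type_traits>

// The trait only says the assignment calls no non-trivial function...
static_assert(std::is_trivially_assignable<int&, double>::value,
              "int = double counts as a trivial assignment");

// ...but the built-in assignment still converts the value: 2.5 becomes 2.
// Copying the object representation of a double into an int (usually a
// different size as well) would not produce this result.
constexpr int assign_double_to_int(double d)
{
    int i = 0;
    i = d;  // floating-integral conversion, truncates toward zero
    return i;
}
```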
Technically, reinterpreting storage is not enough to introduce a new object. As the note on the "Trivial default constructor" page states:
A trivial default constructor is a constructor that performs no action. All data types compatible with the C language (POD types) are trivially default-constructible. Unlike in C, however, objects with trivial default constructors cannot be created by simply reinterpreting suitably aligned storage, such as memory allocated with std::malloc: placement-new is required to formally introduce a new object and avoid potential undefined behavior.
But the note says it's a formal limitation, so probably it is safe in many cases. Not guaranteed though.
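A minimal sketch of the formally correct pattern for the first two questions, assuming a hypothetical trivial type Point: placement new starts the lifetime in raw storage, and the destructor call is skipped because it is trivial.

```cpp
#include <cassert>
#include <new>
#include <type_traits>

// Hypothetical trivial type used for illustration.
struct Point
{
    int x;
    int y;
};
static_assert(std::is_trivially_default_constructible<Point>::value, "");
static_assert(std::is_trivially_destructible<Point>::value, "");

int use_raw_storage()
{
    // Storage from ::operator new(std::size_t) is suitably aligned for Point.
    void* p = ::operator new(sizeof(Point));

    // Placement new formally starts the lifetime of a Point at p.
    // `new (p) Point` default-initializes (members indeterminate);
    // `new (p) Point()` would value-initialize (members zeroed).
    Point* pt = new (p) Point;
    assert(static_cast<void*>(pt) == p);  // same address, now with an object in it

    pt->x = 3;
    pt->y = 4;
    int sum = pt->x + pt->y;

    // Point is trivially destructible, so not calling pt->~Point() is harmless.
    ::operator delete(p);
    return sum;
}
```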
No. is_assignable does not even guarantee the assignment will be legal under certain conditions:
This trait does not check anything outside the immediate context of the assignment expression: if the use of T or U would trigger template specializations, generation of implicitly-defined special member functions etc, and those have errors, the actual assignment may not compile even if std::is_assignable::value compiles and evaluates to true.
What you describe looks more like is_trivially_copyable, which says:
Objects of trivially-copyable types are the only C++ objects that may be safely copied with std::memcpy or serialized to/from binary files with std::ofstream::write()/std::ifstream::read().
I don't really know. I would trust KerrekSB's comments.
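For the memcpy question, a hedged sketch of the case the standard does bless: source and destination of the same trivially copyable type. Pixel, copy_bytes, and copy_demo are hypothetical names:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical trivially copyable type.
struct Pixel
{
    unsigned char r, g, b, a;
};

// Copies src into dst by copying the object representation. Using a single T
// for both arguments (matching types) and requiring it to be trivially
// copyable is what makes this equivalent to dst = src.
template <typename T>
void copy_bytes(T& dst, const T& src)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    std::memcpy(&dst, &src, sizeof(T));
}

void copy_demo()
{
    Pixel a{1, 2, 3, 4};
    Pixel b{};
    copy_bytes(b, a);
    assert(b.r == 1 && b.g == 2 && b.b == 3 && b.a == 4);
}
```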

Are the following inlined functions guaranteed to have the same implementation?

Are the following functions guaranteed to have the same implementation (i.e. object code)?
Does this change if Foo below is a primitive type instead (e.g. int)?
Does this change with the size of Foo?
Returning by value:
inline Foo getMyFooValue() { return myFoo; }
Foo foo = getMyFooValue();
Returning by reference:
inline const Foo &getMyFooReference() { return myFoo; }
Foo foo = getMyFooReference();
Modifying in place:
inline void getMyFooInPlace(Foo &theirFoo) { theirFoo = myFoo; }
Foo foo;
getMyFooInPlace(foo);
Are the following functions guaranteed to have the same implementation (i.e. object code)?
No, the language only specifies behaviour, not code generation, so it's up to the compiler whether two pieces of code with equivalent behaviour produce the same object code.
Does this change if Foo below is a primitive type instead (e.g. int)?
If it is (or, more generally, if it's trivially copyable), then all three have the same behaviour, so can be expected to produce similar code.
If it's a non-trivial class type, then it depends on what the class's special functions do. Each calls these functions in slightly different ways:
The first might copy-initialise a temporary object (calling the copy constructor), copy-initialise foo with that, then destroy the temporary (calling the destructor); but more likely it will elide the temporary, becoming equivalent to the second.
The second will copy-initialise foo (calling the copy constructor)
The third will default initialise foo (calling the default constructor), then copy-assign to it (calling the assignment operator).
So whether or not they are equivalent depends on whether default-initialisation followed by copy-assignment has behaviour equivalent to copy-initialisation, and (perhaps) whether creating and destroying a temporary has side effects. If they are equivalent, then you'll probably get similar code.
Does this change with the size of Foo?
No, the size is irrelevant. What matters is whether it's trivial (so that both copy initialisation and copy assignment simply copy bytes) or non-trivial (so that they call user-defined functions, which might or might not be equivalent to each other).
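To see which special member functions each form calls for a non-trivial Foo, the members can be instrumented. This is a sketch assuming C++17 (where the value-returning form involves no temporary); Counts, counts, and the by_value/by_reference/in_place helpers are hypothetical:

```cpp
#include <cassert>

// Hypothetical instrumentation: counts which special members run.
struct Counts
{
    int default_ctor = 0;
    int copy_ctor = 0;
    int copy_assign = 0;
};
Counts counts;

struct Foo
{
    Foo() { ++counts.default_ctor; }
    Foo(const Foo&) { ++counts.copy_ctor; }
    Foo& operator=(const Foo&) { ++counts.copy_assign; return *this; }
};

Foo myFoo;

inline Foo getMyFooValue() { return myFoo; }
inline const Foo& getMyFooReference() { return myFoo; }
inline void getMyFooInPlace(Foo& theirFoo) { theirFoo = myFoo; }

// Each helper resets the counters, runs one form, and reports what was called.
Counts by_value()     { counts = Counts{}; Foo foo = getMyFooValue();     (void)foo; return counts; }
Counts by_reference() { counts = Counts{}; Foo foo = getMyFooReference(); (void)foo; return counts; }
Counts in_place()     { counts = Counts{}; Foo foo; getMyFooInPlace(foo); return counts; }

// Expected in C++17: the first two copy-construct once; the third
// default-constructs and then copy-assigns.
void expect_cxx17_counts()
{
    Counts v = by_value();
    assert(v.copy_ctor == 1 && v.default_ctor == 0 && v.copy_assign == 0);
    Counts r = by_reference();
    assert(r.copy_ctor == 1 && r.default_ctor == 0 && r.copy_assign == 0);
    Counts p = in_place();
    assert(p.default_ctor == 1 && p.copy_assign == 1 && p.copy_ctor == 0);
}
```

If Foo's default constructor plus assignment behave like its copy constructor, the three forms are behaviourally equivalent, and an optimizer may emit similar code for them.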
The standard draft N3337 contains the following rule in 1.9.5: "A conforming [C++] implementation [...] shall produce the same observable behaviour as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input." And in 1.9.9 it defines the observable behaviour essentially as I/O and accesses to volatile objects. This means that as long as the I/O and volatile accesses of your program stay the same, the implementation can do what it wants. If you have no I/O or volatiles, the program doesn't need to do anything (which makes benchmarks hard to get right with high optimizations).
Note that the standard specifically is totally silent about what code a compiler should emit. Hell, it could probably interpret the sources.
This answers your question: No.

What's the motivation behind having copy and direct initialization behave differently?

Somewhat related to Why is copy constructor called instead of conversion constructor?
There are two syntaxes for initialization, direct- and copy-initialization:
A a(b);
A a = b;
I want to know the motivation for them having different defined behavior. For copy initialization, an extra copy is involved, and I can't think of any purpose for that copy. Since it's a copy from a temp, it can and probably will be optimized out, so the user can't rely on it happening - ergo the extra copy itself isn't reason enough for the different behavior. So... why?
Only a speculation, but I am afraid it will be hard to be more certain without Bjarne Stroustrup confirming how it really was:
It was designed this way because it was assumed such behaviour will be expected by the programmer, that he will expect the copy to be done when = sign is used, and not done with the direct initializer syntax.
I think the possible copy elision was only added in later versions of the standard, but I am not sure - this is something somebody may be able to tell certainly by checking the standard history.
Since it's a copy from a temp, it can and probably will be optimized out
The keyword here is probably. The standard allows, but does not require, a compiler to optimize the copy away. If some compilers allowed this code (optimized), but others rejected it (non-optimized), this would be very inconsistent.
So the standard prescribes a consistent way of handling this - everyone must check that the copy constructor is accessible, whether they then use it or not.
The idea is that all compilers should either accept the code or reject it. Otherwise it will be non-portable.
Another example, consider
A a;
B b;
A a1 = a;
A a2 = b;
It would be equally inconsistent to allow a2 but forbid a1 when A's copy constructor is private.
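This uniform accessibility check applied up to C++14. C++17 removed the conceptual copy altogether. A sketch using a hypothetical NoCopy type (deleted copy and move constructors stand in for "private"): copy-initialization from an int compiles only as C++17 or later.

```cpp
// Hypothetical type whose copy and move constructors are unavailable.
struct NoCopy
{
    constexpr NoCopy(int v) : value(v) {}  // converting constructor
    NoCopy(const NoCopy&) = delete;
    NoCopy(NoCopy&&) = delete;
    int value;
};

// Direct-initialization calls NoCopy(int) and nothing else.
constexpr int direct_init() { NoCopy a(1); return a.value; }

// Before C++17, `NoCopy a = 1;` was rejected by every conforming compiler:
// the copy from the temporary needed an accessible copy/move constructor
// even when it was elided. Since C++17 no temporary exists, so this compiles.
constexpr int copy_init() { NoCopy a = 1; return a.value; }
```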
We can also see from the Standard text that the two methods of initializing a class object were intended to be different (8.5/16):
If the initialization is direct-initialization, or if it is copy-initialization where the cv-unqualified version of the source type is the same class as, or a derived class of, the class of the destination, constructors are considered. The applicable constructors are enumerated (13.3.1.3), and the best one is chosen through overload resolution (13.3). The constructor so selected is called to initialize the object, with the initializer expression or expression-list as its argument(s). If no constructor applies, or the overload resolution is ambiguous, the initialization is ill-formed.
Otherwise (i.e., for the remaining copy-initialization cases), user-defined conversion sequences that can convert from the source type to the destination type or (when a conversion function is used) to a derived class thereof are enumerated as described in 13.3.1.4, and the best one is chosen through overload resolution (13.3). If the conversion cannot be done or is ambiguous, the initialization is ill-formed. The function selected is called with the initializer expression as its argument; if the function is a constructor, the call initializes a temporary of the cv-unqualified version of the destination type. The temporary is a prvalue. The result of the call (which is the temporary for the constructor case) is then used to direct-initialize, according to the rules above, the object that is the destination of the copy-initialization. In certain cases, an implementation is permitted to eliminate the copying inherent in this direct-initialization by constructing the intermediate result directly into the object being initialized; see 12.2, 12.8.
A difference is that the direct-initialization uses the constructors of the constructed class directly. With copy-initialization, other conversion functions are considered and these may produce a temporary that has to be copied.
Take the following example:
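#include <cstdio>

struct X
{
    X(int) {}
    X(const X&) { std::puts("copy"); }  // observable side effect
};
int foo(X x) { /* Do stuff */ return 1; }

int main()
{
    X x(1);
    foo(x);  // x is an lvalue, so it is copied into the parameter
}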
In the compilers I tested, the argument to foo was always copied even with full optimization turned on. From this, we can gather that copies will not/must not be eliminated in all situations.
Now let's think about it from a language design perspective: imagine all the scenarios you would have to consider if you wanted to make rules for when a copy is needed and when it isn't. This would be very difficult, and even if you could come up with such rules, they would be very complex and almost impossible for people to comprehend. At the same time, forcing copies everywhere would be very inefficient. That is why the rules are the way they are: they are comprehensible to people while still not forcing copies to be made when they can be avoided.
I have to admit now, this answer is very similar to Suma's answer. The idea is that you can expect the behavior with the current rules, and anything else would be too hard for people to follow.
Initialization of built-in types like:
int i = 2;
is very natural syntax, in part due to historical reasons (remember your high school math). It is more natural than:
int i(2);
even if some mathematicians may argue this point. After all, there is nothing unnatural in calling a function (a constructor in this case) and passing it an argument.
For built-in types these two types of initialization are identical. There is no extra copy in the former case.
That is the reason for having both types of initialization and originally there was no specific intention to make them behave differently.
However, there are user-defined types and one of the stated goals of the language is to allow them to behave as built-in types as closely as possible.
Thus, copy construction (taking input from some conversion function, for example) is the natural implementation of the first syntax.
The fact that you may have extra copies and that they may be elided is an optimization for user-defined types. Both copy elision and explicit constructors came into the language much later. It is not surprising that the standard allows optimizations after a certain period of use. Also, the distinction now lets copy-initialization exclude explicit constructors from the overload resolution candidates.
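The explicit-constructor difference can be checked with standard type traits. std::is_constructible models direct-initialization, and std::is_convertible models the implicit conversion used by copy-initialization. Meters is a hypothetical type:

```cpp
#include <type_traits>

// Hypothetical type with an explicit converting constructor.
struct Meters
{
    explicit Meters(double v) : value(v) {}
    double value;
};

// Direct-initialization considers explicit constructors: Meters m(2.0); is OK.
static_assert(std::is_constructible<Meters, double>::value, "direct-init works");

// Copy-initialization does not: Meters m = 2.0; is ill-formed.
static_assert(!std::is_convertible<double, Meters>::value, "copy-init is rejected");
```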