Are the following inlined functions guaranteed to have the same implementation? - c++

Are the following functions guaranteed to have the same implementation (i.e. object code)?
Does this change if Foo below is a primitive type instead (e.g. int)?
Does this change with the size of Foo?
Returning by value:
inline Foo getMyFooValue() { return myFoo; }
Foo foo = getMyFooValue();
Returning by reference:
inline const Foo &getMyFooReference() { return myFoo; }
Foo foo = getMyFooReference();
Modifying in place:
inline void getMyFooInPlace(Foo &theirFoo) { theirFoo = myFoo; }
Foo foo;
getMyFooInPlace(foo);

Are the following functions guaranteed to have the same implementation (i.e. object code)?
No, the language only specifies behaviour, not code generation, so it's up to the compiler whether two pieces of code with equivalent behaviour produce the same object code.
Does this change if Foo below is a primitive type instead (e.g. int)?
If it is (or, more generally, if it's trivially copyable), then all three have the same behaviour, so can be expected to produce similar code.
If it's a non-trivial class type, then it depends on what the class's special functions do. Each calls these functions in slightly different ways:
The first might copy-initialise a temporary object (calling the copy constructor), copy-initialise foo with that, then destroy the temporary (calling the destructor); but more likely it will elide the temporary, becoming equivalent to the second.
The second will copy-initialise foo (calling the copy constructor).
The third will default initialise foo (calling the default constructor), then copy-assign to it (calling the assignment operator).
So whether or not they are equivalent depends on whether default-initialisation followed by copy-assignment has equivalent behaviour to copy-initialisation, and (perhaps) whether creating and destroying a temporary has side effects. If they are equivalent, then you'll probably get similar code.
Does this change with the size of Foo?
No, the size is irrelevant. What matters is whether it's trivial (so that both copy-initialisation and copy-assignment simply copy bytes) or non-trivial (so that they call user-defined functions, which might or might not be equivalent to each other).

The standard draft N3337 contains the following rule in 1.9.5: "A conforming [C++] implementation [...] shall produce the same observable behaviour as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input." And in 1.9.9 it defines observable behaviour essentially as I/O and the values of volatile objects. This means that as long as the I/O and volatile accesses of your program stay the same, the implementation can do whatever it wants. If you have no I/O and no volatiles, the program doesn't need to do anything at all (which makes benchmarks hard to get right at high optimization levels).
Note that the standard specifically is totally silent about what code a compiler should emit. Hell, it could probably interpret the sources.
This answers your question: No.

Related

Why can't the C++ compiler elide the move when moving a POD into an optional with RVO?

Consider the following code (godbolt):
#include <optional>
#include <array>
struct LargeType {
std::array<int, 256> largeContents;
};
LargeType doSomething();
std::optional<LargeType> wrapIntoOptional(){
return std::optional<LargeType> {doSomething()};
}
As you see, there is a function returning a large POD and then a function wrapping it into a std::optional. As visible in godbolt, the compiler creates a memcpy here, so it cannot fully elide moving the object. Why is this?
If I understand it correctly, the C++ language would allow eliding the move due to the as-if rule, as there are no visible side effects to it. So it seems that the compiler really cannot avoid it. But why?
My (probably incorrect) understanding how the compiler could optimize the memcpy out is to hand a reference to the storage inside the optional to doSomething() (as I guess such large objects get passed by hidden reference anyway). The optional itself would already lie on the stack of the caller of wrapIntoOptional due to RVO. As the definition of the constructor of std::optional is in the header, it is available to the compiler, so it should be able to inline it, so it can hand that storage location to doSomething in the first place. So what's wrong about my intuition here?
To clarify: I don't argue that the C++ language requires the compiler to inline this. I just thought it would be a reasonable optimization and given that wrapping things into optionals is a common operation, it would be an optimization that is implemented in modern compilers.
It is impossible to elide any copy/move through a constructor call into storage managed by the constructed object.
The constructor takes the object as a reference. In order to bind the reference to something there must be an object, so the prvalue from doSomething() must be materialized into a temporary object to bind to the reference and the constructor then must copy/move from that temporary into its own storage.
It is impossible to elide through function parameters. That would require knowing the implementation of the function and the way C++ is specified it is possible to compile each function only knowing the declarations of other functions (aside from constant expression evaluation). This would break that or require a new type of annotation in the declaration.
None of this prevents the compiler from optimizing in a way that doesn't affect the observable behavior though. If your compiler is not figuring out that the extra copy can be avoided and has no observable side effects when seeing all relevant function/constructor definitions, then that's something you could complain to your compiler vendor about. The concept of copy elision is about allowing the compiler to optimize away a copy/move even though it would have had observable side effects.
You can add noexcept to allow the copy to be elided:
https://godbolt.org/z/rrGEfrdzc

With guaranteed copy elision, why does the class need to be fully defined?

A followup to this post. Consider the following:
class C;
C foo();
That is a pair of valid declarations. C doesn't need to be fully defined when merely declaring a function. But if we were to add the following function:
class C;
C foo();
inline C bar() { return foo(); }
Then suddenly C needs to be a fully defined type. But with guaranteed copy elision, none of its members are required. There's no copying or even a move, the value is initialized elsewhere, and destroyed only in the context of the caller (to bar).
So why? What in the standard prohibits it?
Guaranteed copy elision has exceptions, for compatibility reasons and/or efficiency. Trivially copyable types may be copied even where copy elision would otherwise be guaranteed. You're right that if this doesn't apply, then the compiler would be able to generate correct code without knowing any details of C, not even its size. But the compiler does need to know if this applies, and for that, it still needs the type to be complete.
According to https://timsong-cpp.github.io/cppwp/class.temporary :
15.2 Temporary objects [class.temporary]
1 Temporary objects are created
[...]
(1.2) -- when needed by the implementation to pass or return an object of trivially-copyable type (see below), and
[...]
3 When an object of class type X is passed to or returned from a function, if each copy constructor, move constructor, and destructor of X is either trivial or deleted, and X has at least one non-deleted copy or move constructor, implementations are permitted to create a temporary object to hold the function parameter or result object. The temporary object is constructed from the function argument or return value, respectively, and the function's parameter or return object is initialized as if by using the non-deleted trivial constructor to copy the temporary (even if that constructor is inaccessible or would not be selected by overload resolution to perform a copy or move of the object). [ Note: This latitude is granted to allow objects of class type to be passed to or returned from functions in registers. -- end note ]
This has nothing to do with copy elision. The function foo is supposed to return a C by value. As long as you just pass a reference or pointer to foo around, that's OK. But once you try to call foo - as is the case in bar - the size of its arguments and return value must be at hand; the only valid way to know that is to have a full definition of the required type.
Had the signature used a reference or a pointer, all the required info would have been present and you could do without the full type definition. This approach has a name: pimpl (Pointer to IMPLementation), and it is widely used as a means of hiding details in closed-source library distributions.
The rule lies in [basic.lval]/9:
Unless otherwise indicated ([dcl.type.simple]), a prvalue shall always have complete type or the void type; ...
Despite the number of answers and amount of commentary posted to this thread (which has answered all of my personal questions), I have decided to post an answer 'for the rest of us'. I didn't initially understand what the OP was getting at but now I do, so I thought I'd share. If you know all this and it bores you, dear reader, then please just move on.
@xskxzr and @hvd effectively answered the question, but @hvd's post especially is in standardese and assumes that readers know how return-by-value (and by extension RVO) works, which I imagine not everybody does. I thought I did, but I was missing an important detail (which, when you think it through, is actually pretty obvious, but still, I missed it).
So this post mainly focuses on that, so that we can all see why (a) the OP wondered why there was an issue compiling bar() in the first place, and then (b) subsequently realised the reason.
So, let's look at that code again. Given this (which is legal, even with an incompletely defined type):
class C;
C foo();
Why can't the compiler compile this (I have removed the inline because it is irrelevant):
C bar() { return foo(); }
The error message from gcc being:
error: return type 'class C' is incomplete
Well, first up, the accepted answer quotes the relevant paragraph from the standard that explicitly forbids it, so no mystery there. But the OP (and indeed commenter Walter, who picked up on this straightaway) wanted to know why.
Well, at first that seemed to me to be obvious: space needs to be allocated for the function result by the caller, and it doesn't know how big the object is, so the compiler is in a quandary. But I was missing a trick, and that lies in the way return-by-value works.
Now for those that don't know, returning class objects by value works by the caller allocating space for the returned object on the stack and passing a pointer to it as a hidden parameter to the function being called, which then constructs the object, manipulates it, whatever.
However, this daisy-chains, so if we have the following code (where we fully define C before calling bar()):
class C
{
public:
int x;
};
C c = bar ();
c.x = 4;
then space for c is allocated before bar() is called and the address of c is then passed as a hidden parameter to bar(), and then passed directly on to foo(), which finally constructs the object in the desired location. So, because bar() didn't actually do anything with this pointer (apart from pass it around), all it cares about is the pointer itself, and not what it points to.
Or does it? Well, actually, yes, it does.
When returning a class object by value, small objects are usually returned in a register (or a pair of registers) as an optimisation. The compiler can get away with doing this in the majority of cases where the object is small enough (more on that in a moment).
But now, bar() needs to know whether this is what foo() is going to do, and to do that it needs, for various reasons, to see the full declaration of the class.
So, in summary, that's why the compiler needs a fully-defined type in order to call foo(), else it won't know what foo() will be expecting and so it doesn't know what code to generate. Not on most platforms anyway, end of story.
Notes:
I looked at gcc and there seem to be two (entirely logical) rules for determining whether a class object is returned in a register or pair of registers:
a) The object is 16 bytes or smaller (in a 64 bit build).
b) std::is_trivially_copyable<T>::value evaluates to true (maybe someone can find something in the standard about that).
In case any readers don't know, RVO relies on constructing the object in its final resting place (i.e. in the location allocated by the caller). This is because there are objects (such as some implementations of std::basic_string, I believe) that are sensitive to being moved around in memory so you can't just construct them somewhere convenient to you and then memcpy them somewhere else.
If constructing the returned object in that final location is not possible (because of the way you coded the function returning the object), then RVO doesn't happen (how can it?), see live demo below (make_no_RVO()).
As a specific example of point 1b: if a small object contains data members that (might) point to the object itself or to any of its data members, then returning it by value will get you into trouble if you don't declare it properly. Just adding a user-defined copy constructor will do, since the type is then no longer trivially copyable. But I guess that's true in general: don't hide important information from the compiler.
Live demo here. All comments on this post welcome, I will answer them to the best of my ability.

Can I ensure RVO for reintrepret-cast'ed values?

Suppose I've written:
Foo get_a_foo() {
return reinterpret_cast<Foo>(get_a_bar());
}
and suppose that sizeof(Foo) == sizeof(Bar).
Does return value optimization necessarily take place here, or are compilers allowed to do whatever they like when I "break the rules" by using a reinterpret_cast? If I don't get RVO, or am not guaranteed it - can I change this code to ensure that it occur?
My question is about C++11 and, separately, C++17 (since there was some change in it w.r.t. RVO, if I'm not mistaken).
Suppose I've written:
Foo get_a_foo() {
return reinterpret_cast<Foo>(get_a_bar());
}
and suppose that sizeof(Foo) == sizeof(Bar).
That reinterpret_cast is not legal for all possible Foo and Bar types. It only works for cases where:
Bar is a pointer and Foo is either a pointer or an integer/enum big enough to hold pointers.
Bar is an integer/enum big enough to hold a pointer, and Foo is a pointer.
Bar is an object type and Foo is a reference type.
There are a couple of other cases I didn't cover, but they're either irrelevant (nullptr_t casting) or fall under similar issues for #1 or #2.
See, elision doesn't actually matter when dealing in fundamental types. You can't tell the difference between eliding a copy/move of fundamental types and not eliding it. So is there a conversion there? Is the compiler just using the return value register? That's up to the compiler, via the "as if" rule.
And elision doesn't apply when returning reference types, so #3 is out.
But if Foo and Bar are user-defined object types (or object types other than pointers, integers, or member pointers), the cast is ill-formed. reinterpret_cast is not some kind of trivial memcpy conversion function.
So let's replace this with some code that could, you know, actually work:
Foo get_a_foo()
{
return std::bit_cast<Foo>(get_a_bar());
}
Where C++20's std::bit_cast effectively converts one trivially copyable type to another trivially copyable type.
That conversion still would not be elided. Or at least, not in the way "elision" is typically used.
Because the two types are trivially copyable, and bit_cast will only call trivial constructors, the compiler could certainly erase the constructors, and even use the return value object of get_a_foo as the return value object of get_a_bar. And thus, it could be considered "elision".
But "elision" typically refers to the part of the standard that allows the implementation to disregard even non-trivial constructor/destructors. The compiler can only perform the above because all of the constructors and destructors are trivial. If they were non-trivial, they could not be disregarded (then again, if they were non-trivial, std::bit_cast wouldn't work).
My point is that the optimization of the conversion above is not due to "elision" or RVO rules; it's due entirely to the "as if" rule. Even in C++17, whether the bit_cast call is effectively made a noop is entirely up to the compiler. Yes, after having created the Foo prvalue, the "elision" of its copy into the function's return value object is required by C++17.
But the conversion itself is not a matter of elision.

What should the default constructor do in a RAII class with move semantics?

Move semantics are great for RAII classes. They allow one to program as if one had value semantics without the cost of heavy copies. A great example of this is returning std::vector from a function. Programming with value semantics however means that one would expect types to behave like primitive data types. Those two aspects sometimes seem to be at odds.
On the one hand, in RAII one would expect the default constructor to return a fully initialized object or throw an exception if the resource acquisition failed. This guarantees that any constructed object will be in a valid and consistent state (i.e. safe to use).
On the other hand, with move semantics there exists a point when objects are in a valid but unspecified state. Similarly, primitive data types can be in an uninitialized state. Therefore, with value semantics, I would expect the default constructor to create an object in this valid but unspecified state, so that the following code would have the expected behavior:
// Primitive Data Type, Value Semantics
int i;
i = 5;
// RAII Class, Move Semantics
Resource r;
r = Resource{/*...*/};
In both cases, I would expect the "heavy" initialization to occur only once. I am wondering, what is the best practice regarding this? Obviously, there is a slight practical issue with the second approach: If the default constructor creates objects in the unspecified state, how would one write a constructor that does acquire a resource, but takes no additional parameters? (Tag dispatching comes to mind...)
Edit: Some of the answers have questioned the rationale of trying to make your classes work like primitive data types. Some of my motivation comes from Alexander Stepanov's Efficient Programming with Components, where he talks about regular types. In particular, let me quote:
Whatever is a natural idiomatic expression in c [for built-in types], should be a natural idiomatic expression for regular types.
He goes on to provide almost the same example as above. Is his point not valid in this context? Am I understanding it wrong?
Edit: As there hasn't been much discussion, I am about to accept the highest voted answer. Initializing objects in a "moved-from like" state in the default constructor is probably not a good idea, since everyone who agreed with the existing answers would not expect that behavior.
Programming with value semantics however means that one would expect
types to behave like primitive data types.
Keyword "like". Not "identically to".
Therefore, with value semantics, I would expect the default
constructor to create an object in this valid but unspecified state
I really don't see why you should expect that. It doesn't seem like a very desirable feature to me.
what is the best practice regarding this?
Forget this idea that a non POD class should share this feature in common with primitive data types. It's wrong headed. If there is no sensible way to initialize a class without parameters, then that class should not have a default constructor.
If you want to declare an object, but hold off on initializing it (perhaps in a deeper scope), then use std::unique_ptr.
If you accept that objects should generally be valid by construction, and all possible operations on an object should move it only between valid states, then it seems to me that by having a default constructor, you are only saying one of two things:
This value is a container, or another object with a reasonable “empty” state, which I intend to mutate—e.g., std::vector.
This value does not have any member variables, and is used primarily for its type—e.g., std::less.
It doesn’t follow that a moved-from object need necessarily have the same state as a default-constructed one. For example, an std::string containing the empty string "" might have a different state than a moved-from string instance. When you default-construct an object, you expect to work with it; when you move from an object, the vast majority of the time you simply destroy it.
How would one write a constructor that does acquire a resource, but takes no additional parameters?
If your default constructor is expensive and takes no parameters, I would question why. Should it really be doing something so expensive? Where are its default parameters coming from—some global configuration? Maybe passing them explicitly would be easier to maintain. Take the example of std::ifstream: with a parameter, its constructor opens a file; without, you use the open() member function.
What you can do is lazy initialization: have a flag (or a null pointer) in your object that indicates whether the object is fully initialized, and a member function that performs the initialization on first use. All your default constructor needs to do is set the initialization flag to false. If all members that need an initialized state call ensure_initialization() before starting their work, you have the desired semantics and no double heavy initialization.
Example:
class Foo {
public:
Foo() : isInitialized(false) { };
void ensureInitialization() {
if(isInitialized) return;
//the usual default constructor code
isInitialized = true;
};
void bar() {
ensureInitialization();
//the rest of the bar() implementation
};
private:
bool isInitialized;
//some heavy variables
};
Edit:
To reduce the overhead produced by the function call, you can do something like this:
//In the .h file:
class Foo {
public:
Foo() : isInitialized(false) { };
void bar();
private:
void initialize();
bool isInitialized;
//some heavy variables
};
//In the .cpp file:
#define ENSURE_INITIALIZATION() do { \
if(!isInitialized) initialize(); \
} while(0)
void Foo::bar() {
ENSURE_INITIALIZATION();
//the rest of the bar() implementation
}
void Foo::initialize() {
//the usual default constructor code
isInitialized = true;
}
This makes sure that the decision whether to initialize is inlined without inlining the initialization itself. The latter would just bloat the executable and reduce instruction cache efficiency, while the former can't be done automatically, so you need to employ the preprocessor for it. The overhead of this approach should be less than a function call on average.

C++: Calling a constructor to a temporary object

Suppose I have the following:
int main() {
SomeClass();
return 0;
}
Without optimization, the SomeClass() constructor will be called, and then its destructor will be called, and the object will be no more.
However, according to an IRC channel that constructor/destructor call may be optimized away if the compiler thinks there's no side effect to the SomeClass constructors/destructors.
I suppose the obvious way to go about this is not to rely on a constructor/destructor pair (e.g. use a free function or a static method instead), but is there a way to ensure that the constructor/destructor calls happen?
However, according to an IRC channel that constructor/destructor call may be optimized away if the compiler thinks there's no side effect to the SomeClass constructors/destructors.
The bolded part is wrong. That should be: knows there is no observable behaviour
E.g. from § 1.9 of the latest standard (there are more relevant quotes):
A conforming implementation executing a well-formed program shall produce the same observable behavior
as one of the possible executions of the corresponding instance of the abstract machine with the same program
and the same input. However, if any such execution contains an undefined operation, this International
Standard places no requirement on the implementation executing that program with that input (not even
with regard to operations preceding the first undefined operation).
As a matter of fact, this whole mechanism underpins the single most ubiquitous C++ language idiom: Resource Acquisition Is Initialization.
Backgrounder
Having the compiler optimize away constructors in trivial cases is extremely helpful. It is what allows iterators to compile down to exactly the same code, with the same performance, as using raw pointers/indexes.
It is also what allows a function object to compile down to the exact same code as inlining the function body.
It is what makes C++11 lambdas perfectly optimal for simple use cases:
factorial = std::accumulate(begin, end, 1, [] (int a, int b) { return a*b; });
The lambda compiles down to a functor object similar to
struct lambda_1
{
int operator()(int a, int b) const
{ return a*b; }
};
The compiler sees that the constructor/destructor can be elided and the function body gets inlined. The end result is optimal.1
More (un)observable behaviour
The standard contains a very entertaining example to the contrary, to spark your imagination.
§ 20.7.2.2.3
[ Note: The use count updates caused by the temporary object construction and destruction are not
observable side effects, so the implementation may meet the effects (and the implied guarantees) via
different means, without creating a temporary. In particular, in the example:
shared_ptr<int> p(new int);
shared_ptr<void> q(p);
p = p;
q = p;
both assignments may be no-ops. —end note ]
IOW: Don't underestimate the power of optimizing compilers. This in no way means that language guarantees are to be thrown out of the window!
1 Though there could be faster algorithms to get a factorial, depending on the problem domain :)
I'm sure that if 'SomeClass::SomeClass()' is not implemented 'inline' (i.e. its definition is not visible at the call site), the compiler has no way of knowing that the constructor/destructor has no side effects, and it will always emit the calls.
If the compiler is optimizing away a visible effect of the constructor/destructor call, it is buggy. If it has no visible effect, then you shouldn't notice it anyway.
However let's assume that somehow your constructor or destructor does have a visible effect (so construction and subsequent destruction of that object isn't effectively a no-op) in such a way that the compiler could legitimately think it wouldn't (not that I can think of such a situation, but then, it might be just a lack of imagination on my side). Then any of the following strategies should work:
Make sure that the compiler cannot see the definition of the constructor and/or destructor. If the compiler doesn't know what the constructor/destructor does, it cannot assume it does not have an effect. Note, however, that this also disables inlining. If your compiler does not do cross-module optimization, just putting the constructor/destructor into a different file should suffice.
Make sure that your constructor/destructor actually does have observable behaviour, e.g. through use of volatile variables (every read or write of a volatile variable is considered observable behaviour in C++).
However let me stress again that it's very unlikely that you have to do anything, unless your compiler is horribly buggy (in which case I'd strongly advise you to change the compiler :-)).