Can a compiler do automatic lvalue-to-rvalue conversion if it can prove that the lvalue won't be used again? Here's an example to clarify what I mean:
void Foo(vector<int> values) { ... }
void Bar() {
    vector<int> my_values {1, 2, 3};
    Foo(my_values);  // may the compiler pretend I used std::move here?
}
If a std::move is added to the commented line, then the vector can be moved into Foo's parameter, rather than copied. However, as written, I didn't use std::move.
It's pretty easy to statically prove that my_values won't be used after the commented line. So is the compiler allowed to move the vector, or is it required to copy it?
The compiler is required to behave as-if the copy from the vector into Foo's parameter occurred.
If the compiler can prove that there is a valid abstract machine behavior with no observable side effects (within the abstract machine, not on a real computer!) that involves moving the std::vector into Foo, it can do this.
In your case above, this is true (moving has no abstract-machine-visible side effects); the compiler may not be able to prove it, however.
The possibly observable behavior when copying a std::vector<T> is:
Invoking copy constructors on the elements. Doing so with int cannot be observed.
Invoking the default std::allocator<> at different times. This invokes ::new and ::delete (maybe1). In any case, ::new and ::delete have not been replaced in the above program, so you cannot observe this under the standard.
Calling the destructor of T more times on different objects. Not observable with int.
The vector being non-empty after the call to Foo. Nobody examines it, so its being empty is as-if it was not.
References, pointers or iterators to the elements of the exterior vector being different from those inside. No references, pointers or iterators are taken to the elements of the vector outside Foo.
While you may say "but what if the system is out of memory, and the vector is large, isn't that observable?":
The abstract machine does not have an "out of memory" condition, it simply has allocation sometimes failing (throwing std::bad_alloc) for non-constrained reasons. It not failing is a valid behavior of the abstract machine, and not failing by not allocating (actual) memory (on the actual computer) is also valid, so long as the non-existence of the memory has no observable side effects.
A slightly more toy case:
#include <cstddef>
int main() {
    int* x = new int[std::size_t(-1)];
    delete[] x;
}
While this program clearly allocates far too much memory, the compiler is free to not allocate anything.
We can go further. Even:
#include <cstddef>
#include <iostream>
int main() {
    int* x = new int[std::size_t(-1)];
    x[std::size_t(-2)] = 2;
    std::cout << x[std::size_t(-2)] << '\n';
    delete[] x;
}
can be turned into std::cout << 2 << '\n';. That large buffer must exist abstractly, but as long as your "real" program behaves as-if the abstract machine would, it doesn't actually have to allocate it.
Unfortunately, doing so at any reasonable scale is difficult. There are lots and lots of ways information can leak from a C++ program. So relying on such optimizations (even if they happen) is not going to end well.
1 There was some stuff about coalescing calls to new that might confuse the issue; I am uncertain whether it would be legal to skip calls even with a replaced ::new.
An important fact is that there are situations in which the compiler is not required to behave as-if there was a copy, even if std::move was not called.
When you return a local variable from a function in a statement of the form return X;, where X is an identifier naming a local variable of automatic storage duration (on the stack), the operation is implicitly a move, and the compiler (if it can) may elide the return value and the local variable into one object (and even omit the move).
The same is true when you construct an object from a temporary: the operation is implicitly a move (as it binds to an rvalue), and the move can be elided away completely.
In both these cases, the compiler is required to treat the operation as a move (not a copy), and it may elide the move.
std::vector<int> foo() {
    std::vector<int> x = {1,2,3,4};
    return x;
}
Here x has no std::move, yet it is moved into the return value, and that operation can be elided (x and the return value can be turned into one object).
This:
std::vector<int> foo() {
    std::vector<int> x = {1,2,3,4};
    return std::move(x);
}
blocks elision, as does this:
std::vector<int> foo(std::vector<int> x) {
    return x;
}
and we can even block the move:
std::vector<int> foo() {
    std::vector<int> x = {1,2,3,4};
    return (std::vector<int> const&)x;
}
or even:
std::vector<int> foo() {
    std::vector<int> x = {1,2,3,4};
    return 0,x;
}
as the rules for implicit move are intentionally fragile. (0,x is a use of the much-maligned comma operator.)
Now, relying on implicit move not occurring in cases like this last comma-based one is not advised: the standard committee has already changed one implicit-copy case into an implicit-move since implicit move was added to the language, because they deemed the change harmless. (Where a function returns a type A with an A(B&&) constructor, and the return statement is return b; with b of type B: as published, C++11 did a copy there; now it does a move.) Further expansion of implicit move cannot be ruled out: explicitly casting to a const& is probably the most reliable way to prevent it, now and in the future.
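A minimal sketch of that changed case (the type names are illustrative, not from the standard):
struct B {};
struct A {
    A(const B&) {}  // "copy" path
    A(B&&) {}       // "move" path
};
A make() {
    B b;
    return b;  // C++11 as published: selects A(const B&), a copy.
               // After CWG issue 1579 (applied to later standards), b is
               // first treated as an rvalue here, so A(B&&) is selected.
}
int main() {
    A a = make();  // which constructor ran depends on the language mode
    (void)a;
}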
In this case, the compiler could move out of my_values. This is because that causes no difference in observable behaviour.
Quoting the C++ standard's definition of observable behaviour:
The least requirements on a conforming implementation are:
Access to volatile objects are evaluated strictly according to the rules of the abstract machine.
At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.
Interpreting this slightly: "files" here includes the standard output stream, and for calls of functions that are not defined by the C++ Standard (e.g. operating system calls, or calls to third party libraries), it must be assumed that those functions might write to a file, so a corollary of this is that non-standard function calls must also be considered observable behaviour.
However, your code (as you have shown it) has no volatile variables and no calls to non-standard functions. So the two versions (move or not-move) must have identical observable behaviour, and therefore the compiler could do either (or even optimize the function out entirely, etc.)
In practice, of course, it's generally not so easy for a compiler to prove that no non-standard function calls occur, so many optimization opportunities like this are missed. For example, in this case the compiler may not yet know whether or not the default ::operator new has been replaced with a function that generates output.
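As a sketch of that last point (the printing replacement for ::operator new is an invented example): once allocation produces output, the copy into Foo's parameter becomes observable and can no longer be quietly turned into a move.
#include <cstdio>
#include <cstdlib>
#include <new>
#include <vector>
// Replaced global allocator: every allocation is now observable output.
void* operator new(std::size_t n) {
    std::printf("allocating %zu bytes\n", n);
    if (void* p = std::malloc(n)) return p;
    throw std::bad_alloc{};
}
void operator delete(void* p) noexcept { std::free(p); }
void Foo(std::vector<int>) {}
int main() {
    std::vector<int> my_values{1, 2, 3};
    Foo(my_values);  // the copy's allocation prints, so it must happen
}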
Related
Does the following program have undefined behavior in C++17 and later?
struct A {
    void f(int) { /* Assume there is no access to *this here */ }
};
int main() {
    auto a = new A;
    a->f((a->~A(), 0));
}
C++17 guarantees that a->f is evaluated to the member function of the A object before the call's argument is evaluated. Therefore the indirection from -> is well-defined. But before the function call is entered, the argument is evaluated and ends the lifetime of the A object (see however the edits below). Does the call still have undefined behavior? Is it possible to call a member function of an object outside its lifetime in this way?
The value category of a->f is prvalue by [expr.ref]/6.3.2 and [basic.life]/7 does only disallow non-static member function calls on glvalues referring to the after-lifetime object. Does this imply the call is valid? (Edit: As discussed in the comments I am likely misunderstanding [basic.life]/7 and it probably does apply here.)
Does the answer change if I replace the destructor call a->~A() with delete a or new(a) A (with #include <new>)?
Some elaborating edits and clarifications on my question:
If I were to separate the member function call and the destructor/delete/placement-new into two statements, I think the answers are clear:
a->~A(); a->f(0): UB, because of a non-static member call on a outside its lifetime. (see edit below, though)
delete a; a->f(0): same as above
new(a) A; a->f(0): well-defined, call on the new object
However, in all these cases a->f is sequenced after the first respective statement, while this order is reversed in my initial example. My question is whether this reversal allows the answers to change.
For standards before C++17, I initially thought that all three cases cause undefined behavior, simply because the evaluation of a->f depends on the value of a but is unsequenced relative to the evaluation of the argument, which causes a side effect on a. However, this is undefined behavior only if there is an actual side effect on a scalar value, e.g. writing to a scalar object, and no scalar object is written to here, because A is trivial. Therefore I would also be interested in exactly what constraint is violated for standards before C++17. In particular, the case with placement-new seems unclear to me now.
I just realized that the wording about the lifetime of objects changed between C++17 and the current draft. In n4659 (C++17 draft) [basic.life]/1 says:
The lifetime of an object o of type T ends when:
if T is a class type with a non-trivial destructor (15.4), the destructor call starts,
[...]
while the current draft says:
The lifetime of an object o of type T ends when:
[...]
if T is a class type, the destructor call starts, or
[...]
Therefore, I suppose my example does have well-defined behavior in C++17, but not in the current (C++20) draft, because the destructor call is trivial and the lifetime of the A object isn't ended. I would appreciate clarification on that as well. My original question still stands even for C++17 for the case of replacing the destructor call with a delete or placement-new expression.
If f accesses *this in its body, then there may be undefined behavior for the cases of destructor call and delete expression, however in this question I want to focus on whether the call in itself is valid or not.
Note, however, that the variation of my question with placement-new would potentially not have an issue with member access in f, depending on whether the call itself is undefined behavior or not. In that case there might be a follow-up question, especially for placement-new, because it is unclear to me whether this inside the function would then always automatically refer to the new object, or whether it might need to be std::laundered (depending on what members A has).
While A does have a trivial destructor, the more interesting case is probably where it has some side effect about which the compiler may want to make assumptions for optimization purposes. (I don't know whether any compiler uses something like this.) Therefore, I welcome answers for the case where A has a non-trivial destructor as well, especially if the answer differs between the two cases.
Also, from a practical perspective, a trivial destructor call probably does not affect the generated code and (unlikely?) optimizations based on undefined behavior assumptions aside, all code examples will most likely generate code that runs as expected on most compilers. I am more interested in the theoretical, rather than this practical perspective.
This question intends to get a better understanding of the details of the language. I do not encourage anyone to write code like that.
It’s true that trivial destructors do nothing at all, not even end the lifetime of the object, prior to (the plans for) C++20. So the question is, er, trivial unless we suppose a non-trivial destructor or something stronger like delete.
In that case, C++17’s ordering doesn’t help: the call (not the class member access) uses a pointer to the object (to initialize this), in violation of the rules for out-of-lifetime pointers.
Side note: if just one order were undefined, so would be the “unspecified order” prior to C++17: if any of the possibilities for unspecified behavior are undefined behavior, the behavior is undefined. (How would you tell the well-defined option was chosen? The undefined one could emulate it and then release the nasal demons.)
The postfix expression a->f is sequenced before the evaluation of any arguments (which are indeterminately sequenced relative to one another). (See [expr.call])
The evaluation of the arguments is sequenced before the body of the function (even inline functions, see [intro.execution])
The implication, then, is that calling the function itself is not undefined behavior. However, accessing any member variables or calling other member functions within would be UB per [basic.life].
So the conclusion is that this specific instance is safe per the wording, but a dangerous technique in general.
You seem to assume that a->f(0) has these steps (in that order for most recent C++ standard, in some logical order for previous versions):
evaluating *a
evaluating a->f (a so-called bound member function)
evaluating 0
calling the bound member function a->f on the argument list (0)
But a->f doesn't have either a value or a type. It's essentially a non-thing, a meaningless syntax element needed only because the grammar decomposes member access and function call, even for a member function call, which by definition combines member access and function call.
So asking when a->f is "evaluated" is a meaningless question: there is no distinct evaluation step for the value-less, type-less expression a->f.
So any reasoning based on the order of evaluation of such a non-entity is also null and void.
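You can see this directly: the grammar gives you nothing to "evaluate" a->f into (a minimal sketch):
struct A { void f(int) {} };
int main() {
    A a;
    // auto g = a.f;   // ill-formed: a.f may only appear as the left-hand
    //                 // side of a call; it has no value to be evaluated to
    auto pmf = &A::f;  // the nearest well-formed notion: a pointer to
    (a.*pmf)(0);       // member function, applied to an object explicitly
}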
EDIT:
Actually, it is worse than what I wrote: the expression a->f has a phony "type":
E1.E2 is “function of parameter-type-list cv returning T”.
"function of parameter-type-list cv" isn't even something that would be a valid declarator outside a class: one cannot have f() const as a declarator as in a global declaration:
int ::f() const; // meaningless
And inside a class, f() const doesn't mean "function of parameter-type-list=() with cv=const"; it means "member function of parameter-type-list=() with cv=const". There is no proper declarator for a plain "function of parameter-type-list cv". It can only exist inside a class; there is no type "function of parameter-type-list cv returning T" that can be declared or that real, computable expressions can have.
In addition to what others said:
a->~A(); delete a;
The program as posted never frees the memory, which is a leak; that by itself is technically not undefined behavior.
However, following a->~A() with delete a; as above to prevent the leak would be undefined behavior, because delete would call a->~A() a second time [Section 12.4/14].
a->~A()
Otherwise, in reality, this is as others suggested: the compiler generates machine code along the lines of A* a = malloc(sizeof(A)); a->A(); a->~A(); a->f(0); (pseudocode).
Since there are no member variables or virtual functions, all three member functions are empty ({ return; }) and do nothing. The pointer a even still points to valid memory.
It will run, but a debugger may complain of a memory leak.
However, using any non-static member variables inside f() would have been undefined behavior, because you would be accessing them after they are (implicitly) destroyed by the compiler-generated ~A(). That would likely result in a runtime error if the member was something like std::string or std::vector.
delete a
If you replaced a->~A() with an expression that invoked delete a; instead, then I believe this would have been undefined behavior because the pointer a is no longer valid at that point.
Despite that, the code should still run without errors because the function f() is empty. If it accessed any member variables, it might crash or produce random results because the memory for a has been deallocated.
new(a) A
auto a = new A; new(a) A; constructs a second object in the same storage. Strictly speaking, this in itself is not undefined behavior: reusing the storage ends the first object's lifetime, and since A is trivial no destructor side effects are lost. If A held resources, though, they would leak, because the first object's destructor is never called.
In that case calling f() by itself is valid, because a refers to the newly constructed object.
It will run fine if A does not contain any objects whose constructors allocate memory and such. Otherwise it could lead to memory leaks, etc., but f() would access the "second" copy of them just fine.
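A sketch of that storage-reuse reading, assuming a trivial A as in the question:
#include <new>
struct A { void f(int) {} };
int main() {
    auto a = new A;  // first object
    new (a) A;       // reuses the storage; the first A's lifetime ends
    a->f(0);         // OK: a now refers to the second, trivial A
    delete a;        // destroys and deallocates the second object
}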
I'm not a language lawyer, but I took your code snippet and modified it slightly. I wouldn't use this in production code, but this seems to produce valid, defined results...
#include <cstdlib>
#include <exception>
#include <iostream>
struct A {
    int x{5};
    void f(int) {}
    int g() { std::cout << x << '\n'; return x; }
};
int main() {
    try {
        auto a = new A;
        a->f((a->~A(), a->g()));
    } catch (const std::exception& e) {
        std::cerr << e.what();
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
I'm running Visual Studio 2017 CE with the compiler language flag set to /std:c++latest, and my IDE's version is 15.9.16. I get the following console output and program exit status:
console output
5
IDE exit status output
The program '[4128] Test.exe' has exited with code 0 (0x0).
So this does seem to be defined in the case of Visual Studio; I'm not sure how other compilers will treat this. The destructor is being invoked, but the variable a still resides in dynamically allocated heap memory.
Let's try another slight modification:
#include <cstdlib>
#include <exception>
#include <iostream>
struct A {
    int x{5};
    void f(int) {}
    int g(int y) { x += y; std::cout << x << '\n'; return x; }
};
int main() {
    try {
        auto a = new A;
        a->f((a->~A(), a->g(3)));
    } catch (const std::exception& e) {
        std::cerr << e.what();
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
console output
8
IDE exit status output
The program '[4128] Test.exe' has exited with code 0 (0x0).
This time, let's not change the class any more, but let's make a call on a's member afterwards...
int main() {
    try {
        auto a = new A;
        a->f((a->~A(), a->g(3)));
        a->g(2);
    } catch (const std::exception& e) {
        std::cerr << e.what();
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
console output
8
10
IDE exit status output
The program '[4128] Test.exe' has exited with code 0 (0x0).
Here it appears that a->x maintains its value after a->~A() is called, since new was called on A and delete has not yet been called.
Even more: if I remove the new and use a pointer to a stack object instead of dynamically allocated heap memory:
int main() {
    try {
        A b;
        A* a = &b;
        a->f((a->~A(), a->g(3)));
        a->g(2);
    } catch (const std::exception& e) {
        std::cerr << e.what();
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
I'm still getting:
console output
8
10
IDE exit status output
The program '[4128] Test.exe' has exited with code 0 (0x0).
When I change my compiler's language flag setting from /std:c++latest to /std:c++17, I'm getting the same exact results.
From what I'm seeing in Visual Studio, this appears to be well defined, without producing any UB, within the contexts of what I've shown. However, from a language-standard perspective, I wouldn't rely on this type of code either. The above also doesn't consider a class with internal pointers, to both automatic storage and dynamically allocated memory, where the constructor calls new on those internal objects and the destructor calls delete on them.
There are also a bunch of factors beyond the compiler's language setting, such as optimizations, calling conventions, and various other compiler flags. It is hard to say, and I don't have an available copy of the full latest draft standard to investigate this any deeper. Maybe this can help you, others who are able to answer your question more thoroughly, and other readers to visualize this type of behavior in action.
Assuming I have a struct in C/C++ with fixed-size array members, for example:
#define SIZE 10000
struct foo {
    int vector_i[SIZE];
    float vector_f[SIZE];
};
and I would like to create a function that will return an instance of foo, like:
foo func(int value_i, float value_f) {
    int i;
    foo f;
    for (i = 0; i < SIZE; i++) f.vector_i[i] = value_i;
    for (i = 0; i < SIZE; i++) f.vector_f[i] = value_f;
    return f;
}
If I call the function using:
foo ff = func(1,1.1);
will the compiler perform some kind of optimization (i.e. TCO)?
Will the executable fill the ff variable directly, or will it first fill f inside func and then copy all the values from f to ff?
How can I check if the optimization is performed?
My answer applies to C++.
Will the compiler perform some kind of optimization (ie TCO)?
By TCO do you mean "tail call optimization"? The function doesn't make a function call at the end (a tail call, if you will), so that optimization doesn't apply.
The compiler can elide the copy from the return value to the temporary due to named return value optimization. The copy-initialization from the temporary can also be elided.
How can I check if the optimization is performed?
By reading the generated assembly code.
If you can't read assembly, another approach is to add copy and move constructors that have side effects and observe whether those side effects occur. However, modifying the program can have an effect on whether the compiler decides to optimize (but side effects are not required to prevent copy elision).
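For example (Probe is an invented name; note the caveat above that the added side effects can themselves change what the optimizer does):
#include <cstdio>
struct Probe {
    Probe()             { std::puts("default ctor"); }
    Probe(const Probe&) { std::puts("copy ctor"); }
    Probe(Probe&&)      { std::puts("move ctor"); }
    ~Probe()            { std::puts("dtor"); }
};
Probe make() {
    Probe p;
    return p;  // NRVO candidate
}
int main() {
    // With NRVO: prints "default ctor" then "dtor".
    // Without NRVO: an extra "move ctor"/"dtor" pair appears in between.
    Probe q = make();
    (void)q;  // silence unused-variable warnings
}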
If you don't want to rely on the optimization, you should explicitly pass an existing object to the function by reference (by pointer in C) and modify it in place.
Standard reference for copy elision [class.copy] §31 (current standard draft)
When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the constructor selected for the copy/move operation and/or the destructor for the object have side effects. [...]
The section describes the criteria, which are met in this case. The quote was generated from the standard document draft at 2016-04-07. Numbering may vary across different versions of the standard document, and the rules have changed slightly, but the quoted part has been unchanged since C++03, where the section is [class.copy] §15.
This is pretty well documented in Agner Fog's Calling Conventions document, § 7.1 Passing and returning objects, Table 7. Methods for returning structure, class and union objects.
A struct, class or union object can be returned from a function in registers only if it is sufficiently small and not too complex. If the object is too complex or doesn't fit into the appropriate registers then the caller must supply storage space for the object and pass a pointer to this space as a parameter to the function. The pointer can be passed in a register or on the stack. The same pointer is returned by the function. The detailed rules are given in table 7.
In other words, large return objects get constructed directly in the caller supplied buffer (on the caller's stack).
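Conceptually, the lowering of the question's func looks roughly like this (a sketch of what the compiler emits under the hidden-pointer convention, not real source):
#define SIZE 10000
struct foo { int vector_i[SIZE]; float vector_f[SIZE]; };
// foo func(int value_i, float value_f) is compiled roughly as:
void func_lowered(foo* ret, int value_i, float value_f) {
    for (int i = 0; i < SIZE; i++) ret->vector_i[i] = value_i;
    for (int i = 0; i < SIZE; i++) ret->vector_f[i] = value_f;
}
int main() {
    foo ff;
    // "foo ff = func(1, 1.1f);" becomes:
    func_lowered(&ff, 1, 1.1f);  // f and ff are the same object: no copy
}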
An extra copy is still required if the identity of the object to return is not known at compile time, e.g.:
foo func(bool a) {
    foo x, y;
    // fill x and y
    return a ? x : y;  // copying is required here
}
Is there any difference between calling a function directly in a return statement and calling the function, storing the result, and then returning that value, like this:
my function prototypes:
int aFunc(int...);
int bFunc(int...);
my first bFunc return line:
int bFunc(int...)
{
    ...
    return (aFunc(x,...));
}
my second bFunc return line:
int bFunc(int...)
{
    ...
    int retVal = aFunc(x,...);
    return retVal;
}
To answer your specific question: there should not be an observable difference between
return expression;
and
x = expression;
return x;
provided of course that x is of the correct type.
However, in C++ there can be a difference between
return complicated_expression;
and
x = some_subexpression;
y = some_other_subexpression;
return complicated_expression_rewritten_in_terms_of_x_and_y;
The reason being: C++ guarantees that destructors of temporary values created during the evaluation of a subexpression are run at the end of the statement. This refactoring moves the side effect of any temporary-value destructor associated with some_subexpression from after the computation of some_other_subexpression (at the end of the return statement) to before it (at the end of the assignment to x).
I have seen real-world production code where this refactoring introduced a bug into the program: the computation of some_other_subexpression depended for its correctness on the side effect of the destructor of a temporary value generated during the evaluation of some_subexpression running afterwards. The rewritten code was easier to read, but unfortunately also wrong.
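A contrived sketch of that hazard (all names are invented; the side effect lives in a temporary's destructor):
#include <cstdio>
bool armed = false;
struct Temp {
    Temp()  { armed = true; }
    ~Temp() { armed = false; }  // side effect runs when the temporary dies
};
int observe() { return armed ? 1 : 0; }
int original() {
    // The temporary lives to the end of the full-expression, so
    // observe() still sees armed == true: returns 1.
    return (Temp{}, observe());
}
int refactored() {
    int x = (Temp{}, 0);  // the temporary is destroyed HERE
    int y = observe();    // now sees armed == false
    return x + y;         // returns 0: the refactoring changed behavior
}
int main() {
    std::printf("%d %d\n", original(), refactored());  // prints "1 0"
}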
There may be a difference if the return type is something more complex like std::vector, depending on the optimizations implemented in the compiler.
Returning an unnamed vector requires (anonymous) return value optimization to avoid a copy, a common optimization. Whereas returning a named value requires named return value optimization, something not all compilers did in the past:
The Visual C++ 8.0 compiler ... adds a new feature: Named Return Value Optimization (NRVO). NRVO eliminates the copy constructor and destructor of a stack-based return value. This optimizes out the redundant copy constructor and destructor calls and thus improves overall performance.
A good compiler should make both identical (at least when optimizations are enabled).
Theoretically, there are two copy operations in bFunc:
Into a local variable on the stack.
From the local variable, into the "bottom" of the stack (bottom in the perspective of bFunc).
If retVal is an object (class or structure) returned by-value and not by-reference (as in the case above), then the additional copy operation might yield an overhead proportional to the size of retVal.
In addition to that, the fact that the copy constructor would be called twice (when dealing with an object) might prevent the compiler from applying the optimization in the first place.
Usually I'd not expect any real difference since the compiler may optimize the temporary variable away.
This is a bit of a theoretical question. Although I have some basic understanding of std::move, I'm still not certain whether it provides some additional functionality to the language that theoretically couldn't be achieved with super-smart compilers. I know that code like:
{
    std::string s1 = "STL";
    std::string s2(std::move(s1));
    std::cout << s1 << std::endl;
}
is a new semantic behavior not just performance sugar. :D But tbh I guess nobody will use var x after doing std::move(x).
Also, for move-only data (std::unique_ptr<>, std::thread), couldn't the compiler automatically do the move construction and the clearing of the old variable if the type is declared movable?
Again, this would mean that more code would be generated behind the programmer's back (for example, right now you can count copy-ctor and move-ctor calls; with automagic std::moving you couldn't do that).
No.
But tbh I guess nobody will use var x after doing std::move(x)
Absolutely not guaranteed. In fact, a decent part of the reason why std::move(x) is not automatically usable by the compiler is because, well, it can't be decided automatically whether or not you intend this. It's explicitly well-defined behaviour.
Also, removing rvalue references would imply that the compiler can automagically write all the move constructors for you. This is definitely not true. D has a similar scheme, but it's a complete failure, because there are numerous useful situations in which the compiler-generated "move constructor" won't work correctly, but you can't change it.
It would also prevent perfect forwarding, which has other uses.
The Committee makes many stupid mistakes, but rvalue references are not one of them.
Edit:
Consider something like this:
#include <iostream>
#include <memory>
int main() {
    std::unique_ptr<int> x = std::make_unique<int>();
    some_func_that_takes_ownership(x);  // hypothetical: suppose this implicitly moved from x
    int input = 0;
    std::cin >> input;
    if (input == 0)
        some_other_func(x);             // ...then would x still be valid here?
}
Owch. Now what? You can't magic the value of input into being known at compile time. This is doubly a problem if the bodies of some_other_func and some_func_that_takes_ownership are unknown. This is the Halting Problem: you can't prove, in general, whether x is or is not used after some_func_that_takes_ownership.
D fails here. I promised an example. Basically, in D, "move" is "binary copy and don't destruct the old". Unfortunately, consider a class with, say, a pointer into itself: something you will find in most string classes, most node-based containers, and in designs for std::function, boost::variant, and lots of other similar handy value types. The pointer to the internal buffer will be copied, but oh noes! it points to the old buffer, not the new one. The old buffer is deallocated: GG your program.
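In C++ terms, the problematic self-pointing layout looks like this (a sketch; the small-buffer string is the classic case):
#include <cstring>
struct SmallString {
    char buf[16];
    char* data;  // points into buf while the string is short
    SmallString() : data(buf) { buf[0] = '\0'; }
    // A hand-written C++ move constructor can re-point data at the
    // destination's own buffer:
    SmallString(SmallString&& o) : data(buf) {
        std::memcpy(buf, o.buf, sizeof buf);
    }
    // A compiler-generated bitwise move (the D scheme) would instead copy
    // the data pointer verbatim, leaving it aimed at the old object's
    // buffer, which is about to be freed.
};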
It depends on what you mean by "what move does". To satisfy your curiosity, I think what you're looking for is to be told about the existence of uniqueness type systems and linear type systems.
These are type systems that enforce, at compile time (in the type system), that a value is only referenced from one location, or that no new references to it are made. std::unique_ptr is the best approximation C++ can provide, given its rather weak type system.
Let's say we had a new storage-class specifier called uniqueref. This is like const, and specifies that the value has a single unique reference; nobody else has the value. It would enable this:
int main()
{
    int* uniqueref x(new int); // only x has this reference
    // unique type feature: error, would no longer be unique
    auto y = x;
    // linear type feature: okay, x is no longer usable, z is now the unique owner
    auto z = uniquemove(x);
    // linear type feature: error, x is no longer usable
    *x = 5;
}
(It is also interesting to note the immense optimizations that are possible when you know a pointer value is really, truly only referenced through that pointer. It's a bit like C99's restrict in that respect.)
In terms of what you're asking: since we can now say that a value is uniquely referenced, we can guarantee that it's safe to move. That said, move operations are ultimately user-defined and can do all sorts of weird stuff if desired, so implicitly doing this is a bad idea in current C++ anyway.
Everything above is obviously not formally thought-out and specified, but should give you an idea of what such a type system might look like. More generally, you probably want an Effect Type System.
But yes, these ideas do exist and are formally researched. C++ is just too established to add them.
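For comparison, C++'s approximation enforces the uniqueness half statically but leaves the "no longer usable" half unchecked:
#include <memory>
#include <utility>
int main() {
    auto x = std::make_unique<int>(42);
    // auto y = x;          // error: unique_ptr's copy constructor is deleted
    auto z = std::move(x);  // explicit transfer of ownership; x is now null
    // *x = 5;              // still compiles, but dereferences null;
    //                      // a linear type system would reject it
    *z = 5;
}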
Doing this the way you suggest is a lot more complicated than necessary:
std::string s1 = "STL";
std::string s2(s1);
std::cout << s1 << std::endl;
In this case, it is fairly sure that a copy is meant. But if you drop the last line, s1 essentially ends its lifetime after the construction of s2.
In a reference counted implementation, the copy constructor for std::string will only increment the reference counter, while the destructor will decrement and delete if it becomes zero.
So the sequence is
(inlined std::string::string(char const *))
determine string length
allocate memory
copy string
initialize reference counter to 1
initialize pointer in string object
(inlined std::string::string(std::string const &))
increment reference counter
copy pointer to string representation
Now the compiler can flatten that: simply initialize the reference counter to 2 and store the pointer twice. Common subexpression elimination then finds that s1 and s2 hold the same pointer value and merges them into one.
In short, the only difference in generated code should be that the reference counter is initialized to 2.
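A minimal sketch of the reference-counted representation assumed above (illustrative only; note that C++11's std::string requirements rule out copy-on-write for the real class):
#include <cstring>
struct RcString {
    struct Rep { int refs; char* chars; };
    Rep* rep;
    RcString(const char* s)
        : rep(new Rep{1, new char[std::strlen(s) + 1]}) {
        std::strcpy(rep->chars, s);
    }
    RcString(const RcString& o) : rep(o.rep) { ++rep->refs; }  // just bump
    ~RcString() {
        if (--rep->refs == 0) { delete[] rep->chars; delete rep; }
    }
};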
This question is about a different aspect (and is also limited to gcc). My question concerns only unnamed objects. Return value optimization is allowed to change the observable behavior of the resulting program; this also seems to be mentioned in the standard.
However, this "allowed to" wording is confusing. Does it mean that RVO is guaranteed to happen on every compiler? Due to RVO, the code below changes its observable behavior:
#include <iostream>
int global = 0;
struct A {
    A(int *p) {}
    A(const A &obj) { ++global; }
};
A foo() { return A(0); }  // <--- RVO happens
int main() {
    A obj = foo();
    std::cout << "global = " << global << "\n";  // prints 0 instead of 2
}
Is this program supposed to print global = 0 for all implementations, irrespective of compiler optimizations and the body size of foo?
According to the standard, the program can print 0, 1 or 2. The specific paragraph in C++11 is 12.8p31 that starts with:
When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the copy/move constructor and/or destructor for the object have side effects.
Note that both copy elisions are not an optimization that falls in the as-if rule (which requires the behavior of the program to be consistent with the behavior of the same program as-if no optimization had taken place). The standard explicitly allows the implementation to generate different observable behaviors, and it is up to you the programmer to have your program not depend on that (or accept all three possible outcomes).
Note 2: 1 is not mentioned in any of the answers, but it is a possible outcome. There are two potential copies taking place, from the temporary in the function to the returned object, and from the returned object to the object in main; the compiler can elide neither, one, or both of the copies, generating all three possible outputs.
It cannot be guaranteed. If you tried to write such a guarantee coherently, you would find it impossible to do so.
For example, consider this code:
std::string f() {
    std::string first("first");
    std::string second("second");
    return FunctionThatIsAlwaysFalse() ? first : second;
}
The function FunctionThatIsAlwaysFalse always returns false, but you can only tell that if you do inter-module optimizations. Should the standard require every single compiler to do inter-module optimization so that it can use RVO in this case? How would that work? Or should it prohibit any compiler from using RVO when inter-module optimizations are needed? How would that work? How can it stop compilers that are smart enough to see that RVO is possible from doing it and those that are not from not doing it?
Should the standard list every optimization compilers are required to support with RVO? And should it prohibit RVO in other cases? Wouldn't that kind of defeat the point of optimizing compilers?
And what about the cases where the compiler believes RVO will reduce performance? Should the compiler be required to do an optimization it believes is bad? For example:
if (FunctionCompilerKnowsHasNoSideEffectsAndThinksMostlyReturnsFalse())
    return F(3);  // Here RVO is a pessimization
else
{
    Foo j = F(3);
    return Foo(j);
}
Here, if the compiler is not required to do RVO, it can avoid the if and the function call entirely, since without RVO the code is the same in both branches. Should you force the compiler to do an optimization it thinks makes things worse? Why?
There's really no way to make such a guarantee work.
Pedantically speaking, it's implementation-defined. Modern compilers are intelligent enough to do this kind of optimization.
But there is no guarantee that the behavior will be exactly the same across implementations. That's what implementation-defined behavior is all about.
"Allowed to" in this context means that 0, 1, and 2 are all standard-conforming outputs.