Consider the following simple code that makes use of new (I am aware there is no delete[], but it does not pertain to this question):
int main()
{
    int* mem = new int[100];
    return 0;
}
Is the compiler allowed to optimize out the new call?
In my research, g++ (5.2.0) and Visual Studio 2015 do not optimize out the new call, while clang (3.0+) does. All tests were run with full optimizations enabled (-O3 for g++ and clang, Release mode for Visual Studio).
Isn't new making a system call under the hood, making it impossible (and illegal) for a compiler to optimize that out?
EDIT: I have now excluded undefined behaviour from the program:
#include <new>
int main()
{
    int* mem = new (std::nothrow) int[100];
    return 0;
}
clang 3.0 does not optimize that out anymore, but later versions do.
EDIT2:
#include <new>
int main()
{
    int* mem = new (std::nothrow) int[1000];
    if (mem != 0)
        return 1;
    return 0;
}
clang always returns 1.
The history seems to be that clang is following the rules laid out in N3664: Clarifying Memory Allocation, which allows the compiler to optimize around memory allocations, but as Nick Lewycky points out:
Shafik pointed out that seems to violate causality but N3664 started life as N3433, and I'm pretty sure we wrote the optimization first and wrote the paper afterwards anyway.
So clang implemented the optimization, which later became a proposal that was adopted as part of C++14.
The base question is whether this was a valid optimization prior to N3664, and that is a tough question. We would have to go to the as-if rule covered in the draft C++ standard section 1.9 Program execution, which says:
The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.[5]
where note 5 says:
This provision is sometimes called the "as-if" rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the observable behavior of the program are produced.
Since new could throw an exception, which would be observable because it would alter the return value of the program, that would seem to argue against the optimization being allowed by the as-if rule.
However, it could be argued that when to throw an exception is an implementation detail, so clang could decide that in this scenario no exception would be thrown, and eliding the new call would therefore not violate the as-if rule.
It also seems valid under the as-if rule to optimize away the call to the non-throwing version.
But we could have a replacement global operator new in a different translation unit that affects observable behavior, so the compiler would need some way of proving this was not the case; otherwise it could not perform this optimization without violating the as-if rule. Previous versions of clang did indeed optimize in this case, as this godbolt example provided by Casey shows, taking this code:
#include <cstddef>
extern void* operator new(std::size_t n);
template<typename T>
T* create() { return new T(); }
int main() {
    auto result = 0;
    for (auto i = 0; i < 1000000; ++i) {
        result += (create<int>() != nullptr);
    }
    return result;
}
and optimizing it to this:
main:                                   # @main
        movl    $1000000, %eax          # imm = 0xF4240
        ret
This does seem far too aggressive, and later versions do not appear to do this.
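To make the replacement-operator-new caveat concrete, here is a minimal sketch of a replacement global operator new whose side effect (counting calls) is observable. The counter `g_new_calls` is a hypothetical name, and the sketch deliberately skips the `std::get_new_handler()` retry loop that the standard's default behavior performs. If a definition like this lived in another translation unit, a compiler that could not see it would have no way of knowing whether removing a call changes the program's behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter: the kind of side effect a replacement allocator
// might have (logging, leak tracking, a custom pool, ...).
std::size_t g_new_calls = 0;

// Replacement for the replaceable global allocation function.
// Simplified sketch: no new_handler loop; throws std::bad_alloc on failure.
void* operator new(std::size_t size) {
    ++g_new_calls;
    if (size == 0) {
        size = 1;  // operator new(0) must still return a distinct non-null pointer
    }
    if (void* p = std::malloc(size)) {
        return p;
    }
    throw std::bad_alloc();
}

// Matching replacements, so memory obtained from malloc is released with free.
void operator delete(void* p) noexcept {
    std::free(p);
}

void operator delete(void* p, std::size_t) noexcept {
    std::free(p);
}
```

On the reading that N3664's permission applies to new-expressions, a count like this is still not guaranteed to match the number of `new T()` evaluations in C++14; a direct call to `::operator new` is generally understood not to be covered by that permission, so it should bump the counter.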
This is allowed by N3664.
An implementation is allowed to omit a call to a replaceable global allocation function (18.6.1.1, 18.6.1.2). When it does so, the storage is instead provided by the implementation or provided by extending the allocation of another new-expression.
This proposal is part of the C++14 standard, so in C++14 the compiler is allowed to optimize out a new expression (even if it might throw).
If you take a look at the Clang implementation status, it clearly states that they implement N3664.
If you observe this behavior while compiling as C++11 or C++03, you should file a bug.
Notice that before C++14, dynamic memory allocations were part of the observable state of the program (although I cannot find a reference for that at the moment), so a conforming implementation was not allowed to apply the as-if rule in this case.
Bear in mind that the C++ standard tells what a correct program should do, not how it should do it. It can't tell the latter at all, since new architectures can and do arise after the standard is written, and the standard has to be of use to them.
new does not have to be a system call under the hood. There are computers usable without operating systems and without a concept of system calls.
Hence, as long as the end behaviour does not change, the compiler can optimize anything and everything away, including that new.
There is one caveat: a replacement global operator new could have been defined in a different translation unit.
In that case the side effects of new could be such that they can't be optimized away. But if the compiler can guarantee that the new operator has no side effects, as would be the case if the posted code is the whole program, then the optimization is valid.
That new can throw std::bad_alloc does not mean it is required to. In this case, when new is optimized away, the compiler can guarantee that no exception will be thrown and no side effect will happen.
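As an illustration that allocation need not involve an operating system at all, here is a minimal sketch of a bump allocator serving memory from a fixed static buffer, roughly the way a freestanding system might. `arena_allocate`, `arena_reset`, and the 4 KiB capacity are hypothetical; a real replacement operator new built on this would also have to throw `std::bad_alloc` on failure and provide a matching operator delete.

```cpp
#include <cassert>
#include <cstddef>

namespace {

// Hypothetical fixed-size arena; no system call is ever made.
constexpr std::size_t kArenaSize = 4096;
constexpr std::size_t kAlign = alignof(std::max_align_t);

alignas(std::max_align_t) unsigned char g_arena[kArenaSize];
std::size_t g_arena_used = 0;

}  // namespace

// Returns memory aligned to alignof(std::max_align_t), or nullptr when the
// arena cannot satisfy the request. Blocks are never freed individually.
void* arena_allocate(std::size_t size) {
    if (size > kArenaSize) {
        return nullptr;  // also prevents overflow in the rounding below
    }
    // Round up to a multiple of the alignment; zero-size requests still
    // consume one slot so that each call returns a distinct pointer.
    std::size_t rounded = (size + kAlign - 1) / kAlign * kAlign;
    if (rounded == 0) {
        rounded = kAlign;
    }
    if (rounded > kArenaSize - g_arena_used) {
        return nullptr;
    }
    void* p = g_arena + g_arena_used;
    g_arena_used += rounded;
    return p;
}

// Releases everything at once.
void arena_reset() {
    g_arena_used = 0;
}
```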
It is perfectly allowable (but not required) for a compiler to optimize out the allocations in your original example, and even more so in the EDIT1 example, per §1.9 of the standard, which is usually referred to as the as-if rule:
Conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below:
[3 pages of conditions]
A more human-readable representation is available at cppreference.com.
The relevant points (numbered as in that list) are:
You have no volatiles, so 1) and 2) do not apply.
You do not output/write any data or prompt the user, so 3) and 4) do not apply. But even if you did, they would clearly be satisfied in EDIT1 (arguably also in the original example, although from a purely theoretical point of view it is illegal, since the program flow and output would, in theory, differ; but see two paragraphs below).
An exception, even an uncaught one, is well-defined (not undefined!) behavior. However, strictly speaking, if new throws (which is not going to happen; see also the next paragraph), the observable behavior would be different, both in the program's exit code and in any output that might follow later in the program.
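For contrast with "you have no volatiles", here is a small sketch in which the pointer is written to a volatile object. Accesses through volatile glvalues are part of the observable behavior, so the pointer value can no longer be discarded the way the unused `mem` can; in practice compilers then keep the allocation, although a strict reading of N3664 would arguably still let the implementation supply the storage some other way. `g_sink`, `publish_allocation`, and `release_allocation` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical sink: writes to a volatile object are observable behavior,
// so the stored pointer value has to be produced.
int* volatile g_sink = nullptr;

void publish_allocation() {
    g_sink = new int[100]();  // value-initialized array; the pointer escapes
}

void release_allocation() {
    int* p = g_sink;  // a single volatile read
    delete[] p;
    g_sink = nullptr;
}
```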
Now, in the particular case of a single small allocation, you can give the compiler the "benefit of the doubt" that it can guarantee that the allocation will not fail.
Even on a system under very heavy memory pressure, it is not possible to even start a process when less than the minimum allocation granularity is available, and the heap will have been set up before main is called, too. So, if this allocation were to fail, the program would never start, or would already have met an ungraceful end before main is even called.
Therefore, assuming that the compiler knows this, even though the allocation could in theory throw, it is legal to optimize even the original example, since the compiler can practically guarantee that it will not happen.
<slightly undecided>
On the other hand, it is not allowable (and as you can observe, a compiler bug) to optimize out the allocation in your EDIT2 example. The value is consumed to produce an externally observable effect (the return code).
Note that if you replace new (std::nothrow) int[1000] with new (std::nothrow) int[1024*1024*1024*1024ll] (that's a 4TiB allocation!), which is, on present-day computers, guaranteed to fail, it still optimizes out the call. In other words, it returns 1 although you wrote code that must return 0.
@Yakk brought up a good argument against this: as long as the memory is never touched, a pointer can be returned, and no actual RAM is needed. In that sense it would even be legitimate to optimize out the allocation in EDIT2. I am unsure who is right and who is wrong here.
A 4TiB allocation is pretty much guaranteed to fail on a machine that doesn't have at least something like a two-digit gigabyte amount of RAM, simply because the OS needs to create page tables. Now, of course, the C++ standard does not care about page tables or about what the OS does to provide memory; that is true.
But on the other hand, the assumption "this will work if memory is not touched" relies on exactly such a detail, and on something that the OS provides. The assumption that untouched RAM is not actually needed is only true because the OS provides virtual memory, and that implies that the OS needs to create page tables (I can pretend that I don't know about it, but that doesn't change the fact that I rely on it anyway).
Therefore, I think it is not 100% correct to first assume one and then say "but we don't care about the other".
So, yes, the compiler can assume that a 4TiB allocation is in general perfectly possible as long as memory is not touched, and it can assume that it is generally possible to succeed. It might even assume that it's likely to succeed (even when it's not). But I think that in any case, you are never allowed to assume that something must work when there is a possibility of a failure. And not only is there a possibility of failure, in that example, failure is even the more likely possibility.
</slightly undecided>
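For reference, the size in the 4TiB variant works out as sketched below, together with EDIT2 rewritten as an ordinary function that also adds the missing delete[]. The names `kElements`, `kBytes`, and `edit2` are hypothetical, and the byte count assumes a 4-byte int.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Element count from the 4TiB variant. The first two multiplications are
// done in int (2^30 still fits in a 32-bit int); the `ll` suffix on the last
// operand makes the final multiplication long long, so nothing overflows.
constexpr long long kElements = 1024 * 1024 * 1024 * 1024ll;
static_assert(kElements == (1LL << 40), "2^40 elements");

// Total size in bytes; equals 4 TiB (2^42) when sizeof(int) == 4.
constexpr long long kBytes = kElements * static_cast<long long>(sizeof(int));

// EDIT2 as a function with the missing delete[] added: returns 1 if the
// nothrow new-expression yields a non-null pointer, 0 otherwise.
int edit2(std::size_t n) {
    int* mem = new (std::nothrow) int[n];
    const int result = (mem != nullptr) ? 1 : 0;
    delete[] mem;  // delete[] on a null pointer does nothing
    return result;
}
```

On a 64-bit target, `edit2(static_cast<std::size_t>(kElements))` corresponds to the 4TiB case; what it returns depends on the compiler, the optimization level, and the operating system, which is exactly the point being debated.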
The worst that can happen in your snippet is that new throws std::bad_alloc, which is unhandled. What happens then is implementation-defined: std::terminate is called, but whether the stack is unwound first is up to the implementation.
With the best case being a no-op and the worst case not being defined, the compiler is allowed to factor them out of existence. Now, if you actually try to catch the possible exception:
int main() try {
    int* mem = new int[100];
    return 0;
} catch(...) {
    return 1;
}
... then the call to operator new is kept.
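The same idea can be written as an ordinary function, which is easier to reuse or test than a function-try-block on main. This is a sketch with a hypothetical name, `allocate_or_report`; it catches `std::bad_alloc` rather than `...`, which also covers `std::bad_array_new_length` because that class derives from `std::bad_alloc`.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Returns 0 if `new int[n]` succeeds and 1 if it throws std::bad_alloc
// (including std::bad_array_new_length, which derives from it).
int allocate_or_report(std::size_t n) {
    try {
        int* mem = new int[n];
        delete[] mem;
        return 0;
    } catch (const std::bad_alloc&) {
        return 1;
    }
}
```

Because the failure path now produces a different return value, the possibility of the allocation throwing is part of what the program visibly does.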
Consider the following simple code that makes use of new (I am aware there is no delete[], but it does not pertain to this question):
int main()
{
int* mem = new int[100];
return 0;
}
Is the compiler allowed to optimize out the new call?
In my research, g++ (5.2.0) and Visual Studio 2015 do not optimize out the new call, while clang (3.0+) does. All tests have been made with full optimizations enabled (-O3 for g++ and clang, Release mode for Visual Studio).
Isn't new making a system call under the hood, making it impossible (and illegal) for a compiler to optimize that out?
EDIT: I have now excluded undefined behaviour from the program:
#include <new>
int main()
{
int* mem = new (std::nothrow) int[100];
return 0;
}
clang 3.0 does not optimize that out anymore, but later versions do.
EDIT2:
#include <new>
int main()
{
int* mem = new (std::nothrow) int[1000];
if (mem != 0)
return 1;
return 0;
}
clang always returns 1.
The history seems to be that clang is following the rules laid out in N3664: Clarifying Memory Allocation which allows the compiler to optimize around memory allocations but as Nick Lewycky points out :
Shafik pointed out that seems to violate causality but N3664 started life as N3433, and I'm pretty sure we wrote the optimization first and wrote the paper afterwards anyway.
So clang implemented the optimization which later on became a proposal that was implemented as part of C++14.
The base question is whether this is a valid optimization prior to N3664, that is a tough question. We would have to go to the as-if rule covered in the draft C++ standard section 1.9 Program execution which says(emphasis mine):
The semantic descriptions in this International Standard define a
parameterized nondeterministic abstract machine. This International
Standard places no requirement on the structure of conforming
implementations. In particular, they need not copy or emulate the
structure of the abstract machine. Rather, conforming implementations
are required to emulate (only) the observable behavior of the abstract
machine as explained below.5
where note 5 says:
This provision is sometimes called the “as-if” rule, because an
implementation is free to disregard any requirement of this
International Standard as long as the result is as if the requirement
had been obeyed, as far as can be determined from the observable
behavior of the program. For instance, an actual implementation need
not evaluate part of an expression if it can deduce that its value is
not used and that no side effects affecting the observable behavior of
the program are produced.
Since new could throw an exception which would have observable behavior since it would alter the return value of the program, that would seem to argue against it being allowed by the as-if rule.
Although, it could be argued it is implementation detail when to throw an exception and therefore clang could decide even in this scenario it would not cause an exception and therefore eliding the new call would not violate the as-if rule.
It also seems valid under the as-if rule to optimize away the call to the non-throwing version as well.
But we could have a replacement global operator new in a different translation unit which could cause this to affect observable behavior, so the compiler would have to have some way a proving this was not the case, otherwise it would not be able to perform this optimization without violating the as-if rule. Previous versions of clang did indeed optimize in this case as this godbolt example shows which was provided via Casey here, taking this code:
#include <cstddef>
extern void* operator new(std::size_t n);
template<typename T>
T* create() { return new T(); }
int main() {
auto result = 0;
for (auto i = 0; i < 1000000; ++i) {
result += (create<int>() != nullptr);
}
return result;
}
and optimizing it to this:
main: # #main
movl $1000000, %eax # imm = 0xF4240
ret
This indeed seems way too aggressive but later versions do not seem to do this.
This is allowed by N3664.
An implementation is allowed to omit a call to a replaceable global allocation function (18.6.1.1, 18.6.1.2). When it does so, the storage is instead provided by the implementation or provided by extending the allocation of another new-expression.
This proposal is part of the C++14 standard, so in C++14 the compiler is allowed to optimize out a new expression (even if it might throw).
If you take a look at the Clang implementation status it clearly states that they do implement N3664.
If you observe this behavior while compiling in C++11 or C++03 you should fill a bug.
Notice that before C++14 dynamic memory allocations are part of the observable status of the program (although I can not find a reference for that at the moment), so a conformant implementation was not allowed to apply the as-if rule in this case.
Bear in mind the C++ standard tells what a correct program should do, not how it should do it. It can't tell the later at all since new architectures can and do arise after the standard is written and the standard has to be of use to them.
new does not have to be a system call under the hood. There are computers usable without operating systems and without a concept of system call.
Hence, as long as the end behaviour does not change, the compiler can optimize any and everything away. Including that new
There is one caveat.
A replacement global operator new could have been defined in a different translation unit
In that case the side effects of new could be such that can't be optimized away. But if the compiler can guarantee that the new operator has no side effects, as would be the case if the posted code is the whole code, then the optimization is valid.
That new can throw std::bad_alloc is not a requirement. In this case, when new is optimized, the compiler can guarantee that no exception will be thrown and no side effect will happen.
It is perfectly allowable (but not required) for a compiler to optimize out the allocations in your original example, and even more so in the EDIT1 example per §1.9 of the standard, which is usually referred to as the as-if rule:
Conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below:
[3 pages of conditions]
A more human-readable representation is available at cppreference.com.
The relevant points are:
You have no volatiles, so 1) and 2) do not apply.
You do not output/write any data or prompt the user, so 3) and 4) do not apply. But even if you did, they would clearly be satisfied in EDIT1 (arguably also in the original example, although from a purely theoretical point of view, it is illegal since the program flow and output -- theoretically -- differs, but see two paragraphs below).
An exception, even an uncaught one, is well-defined (not undefined!) behavior. However, strictly speaking, in case that new throws (not going to happen, see also next paragraph), the observable behavior would be different, both by the program's exit code and by any output that might follow later in the program.
Now, in the particular case of a singular small allocation, you can give the compiler the "benefit of doubt" that it can guarantee that the allocation will not fail.
Even on a system under very heavy memory pressure, it is not possible to even start a process when you have less than the minimum allocation granularity available, and the heap will have been set up prior to calling main, too. So, if this allocation was to fail, the program would never start or would already have met an ungraceful end before main is even called.
Insofar, assuming that the compiler knows this, even though the allocation could in theory throw, it is legal to even optimize the original example, since the compiler can practically guarantee that it will not happen.
<slightly undecided>
On the other hand, it is not allowable (and as you can observe, a compiler bug) to optimize out the allocation in your EDIT2 example. The value is consumed to produce an externally observable effect (the return code).
Note that if you replace new (std::nothrow) int[1000] with new (std::nothrow) int[1024*1024*1024*1024ll] (that's a 4TiB allocation!), which is -- on present day computers -- guaranteed to fail, it still optimizes out the call. In other words, it returns 1 although you wrote code that must output 0.
#Yakk brought up a good argument against this: As long as the memory is never touched, a pointer can be returned, and not actual RAM is needed. Insofar it would even be legitimate to optimize out the allocation in EDIT2. I am unsure who is right and who is wrong here.
Doing a 4TiB allocation is pretty much guaranteed to fail on a machine that doesn't have at least something like a two-digit gigabyte amount of RAM simply because the OS needs to create page tables. Now of course, the C++ standard does not care about page tables or about what the OS is doing to provide memory, that is true.
But on the other hand, the assumption "this will work if memory is not touched" does rely on exactly such a detail and on something that the OS provides. The assumption that if RAM that is not touched it is actually not needed is only true because the OS provides virtual memory. And that implies that the OS needs to create page tables (I can pretend that I don't know about it, but that doesn't change the fact that I rely on it anyway).
Therefore, I think it is not 100% correct to first assume one and then say "but we don't care about the other".
So, yes, the compiler can assume that a 4TiB allocation is in general perfectly possible as long as memory is not touched, and it can assume that it is generally possible to succeed. It might even assume that it's likely to succeed (even when it's not). But I think that in any case, you are never allowed to assume that something must work when there is a possibility of a failure. And not only is there a possibility of failure, in that example, failure is even the more likely possibility.
</slightly undecided>
The worst that can happen in your snippet is that new throws std::bad_alloc, which is unhandled. What happens then is implementation-defined.
With the best case being a no-op and the worst case not being defined, the compiler is allowed to factor them into non-existence. Now, if you actually try and catch the possible exception :
int main() try {
int* mem = new int[100];
return 0;
} catch(...) {
return 1;
}
... then the call to operator new is kept.
Consider the following simple code that makes use of new (I am aware there is no delete[], but it does not pertain to this question):
int main()
{
int* mem = new int[100];
return 0;
}
Is the compiler allowed to optimize out the new call?
In my research, g++ (5.2.0) and Visual Studio 2015 do not optimize out the new call, while clang (3.0+) does. All tests have been made with full optimizations enabled (-O3 for g++ and clang, Release mode for Visual Studio).
Isn't new making a system call under the hood, making it impossible (and illegal) for a compiler to optimize that out?
EDIT: I have now excluded undefined behaviour from the program:
#include <new>
int main()
{
int* mem = new (std::nothrow) int[100];
return 0;
}
clang 3.0 does not optimize that out anymore, but later versions do.
EDIT2:
#include <new>
int main()
{
int* mem = new (std::nothrow) int[1000];
if (mem != 0)
return 1;
return 0;
}
clang always returns 1.
The history seems to be that clang is following the rules laid out in N3664: Clarifying Memory Allocation which allows the compiler to optimize around memory allocations but as Nick Lewycky points out :
Shafik pointed out that seems to violate causality but N3664 started life as N3433, and I'm pretty sure we wrote the optimization first and wrote the paper afterwards anyway.
So clang implemented the optimization which later on became a proposal that was implemented as part of C++14.
The base question is whether this is a valid optimization prior to N3664, that is a tough question. We would have to go to the as-if rule covered in the draft C++ standard section 1.9 Program execution which says(emphasis mine):
The semantic descriptions in this International Standard define a
parameterized nondeterministic abstract machine. This International
Standard places no requirement on the structure of conforming
implementations. In particular, they need not copy or emulate the
structure of the abstract machine. Rather, conforming implementations
are required to emulate (only) the observable behavior of the abstract
machine as explained below.5
where note 5 says:
This provision is sometimes called the “as-if” rule, because an
implementation is free to disregard any requirement of this
International Standard as long as the result is as if the requirement
had been obeyed, as far as can be determined from the observable
behavior of the program. For instance, an actual implementation need
not evaluate part of an expression if it can deduce that its value is
not used and that no side effects affecting the observable behavior of
the program are produced.
Since new could throw an exception which would have observable behavior since it would alter the return value of the program, that would seem to argue against it being allowed by the as-if rule.
Although, it could be argued it is implementation detail when to throw an exception and therefore clang could decide even in this scenario it would not cause an exception and therefore eliding the new call would not violate the as-if rule.
It also seems valid under the as-if rule to optimize away the call to the non-throwing version as well.
But we could have a replacement global operator new in a different translation unit which could cause this to affect observable behavior, so the compiler would have to have some way a proving this was not the case, otherwise it would not be able to perform this optimization without violating the as-if rule. Previous versions of clang did indeed optimize in this case as this godbolt example shows which was provided via Casey here, taking this code:
#include <cstddef>
extern void* operator new(std::size_t n);
template<typename T>
T* create() { return new T(); }
int main() {
auto result = 0;
for (auto i = 0; i < 1000000; ++i) {
result += (create<int>() != nullptr);
}
return result;
}
and optimizing it to this:
main: # #main
movl $1000000, %eax # imm = 0xF4240
ret
This indeed seems way too aggressive but later versions do not seem to do this.
This is allowed by N3664.
An implementation is allowed to omit a call to a replaceable global allocation function (18.6.1.1, 18.6.1.2). When it does so, the storage is instead provided by the implementation or provided by extending the allocation of another new-expression.
This proposal is part of the C++14 standard, so in C++14 the compiler is allowed to optimize out a new expression (even if it might throw).
If you take a look at the Clang implementation status it clearly states that they do implement N3664.
If you observe this behavior while compiling in C++11 or C++03 you should fill a bug.
Notice that before C++14 dynamic memory allocations are part of the observable status of the program (although I can not find a reference for that at the moment), so a conformant implementation was not allowed to apply the as-if rule in this case.
Bear in mind the C++ standard tells what a correct program should do, not how it should do it. It can't tell the later at all since new architectures can and do arise after the standard is written and the standard has to be of use to them.
new does not have to be a system call under the hood. There are computers usable without operating systems and without a concept of system call.
Hence, as long as the end behaviour does not change, the compiler can optimize any and everything away. Including that new
There is one caveat.
A replacement global operator new could have been defined in a different translation unit
In that case the side effects of new could be such that can't be optimized away. But if the compiler can guarantee that the new operator has no side effects, as would be the case if the posted code is the whole code, then the optimization is valid.
That new can throw std::bad_alloc is not a requirement. In this case, when new is optimized, the compiler can guarantee that no exception will be thrown and no side effect will happen.
It is perfectly allowable (but not required) for a compiler to optimize out the allocations in your original example, and even more so in the EDIT1 example per §1.9 of the standard, which is usually referred to as the as-if rule:
Conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below:
[3 pages of conditions]
A more human-readable representation is available at cppreference.com.
The relevant points are:
You have no volatiles, so 1) and 2) do not apply.
You do not output/write any data or prompt the user, so 3) and 4) do not apply. But even if you did, they would clearly be satisfied in EDIT1 (arguably also in the original example, although from a purely theoretical point of view, it is illegal since the program flow and output -- theoretically -- differs, but see two paragraphs below).
An exception, even an uncaught one, is well-defined (not undefined!) behavior. However, strictly speaking, in case that new throws (not going to happen, see also next paragraph), the observable behavior would be different, both by the program's exit code and by any output that might follow later in the program.
Now, in the particular case of a singular small allocation, you can give the compiler the "benefit of doubt" that it can guarantee that the allocation will not fail.
Even on a system under very heavy memory pressure, it is not possible to even start a process when you have less than the minimum allocation granularity available, and the heap will have been set up prior to calling main, too. So, if this allocation was to fail, the program would never start or would already have met an ungraceful end before main is even called.
Insofar, assuming that the compiler knows this, even though the allocation could in theory throw, it is legal to even optimize the original example, since the compiler can practically guarantee that it will not happen.
<slightly undecided>
On the other hand, it is not allowable (and as you can observe, a compiler bug) to optimize out the allocation in your EDIT2 example. The value is consumed to produce an externally observable effect (the return code).
Note that if you replace new (std::nothrow) int[1000] with new (std::nothrow) int[1024*1024*1024*1024ll] (that's a 4TiB allocation!), which is -- on present day computers -- guaranteed to fail, it still optimizes out the call. In other words, it returns 1 although you wrote code that must output 0.
#Yakk brought up a good argument against this: As long as the memory is never touched, a pointer can be returned, and not actual RAM is needed. Insofar it would even be legitimate to optimize out the allocation in EDIT2. I am unsure who is right and who is wrong here.
Doing a 4TiB allocation is pretty much guaranteed to fail on a machine that doesn't have at least something like a two-digit gigabyte amount of RAM simply because the OS needs to create page tables. Now of course, the C++ standard does not care about page tables or about what the OS is doing to provide memory, that is true.
But on the other hand, the assumption "this will work if memory is not touched" does rely on exactly such a detail and on something that the OS provides. The assumption that if RAM that is not touched it is actually not needed is only true because the OS provides virtual memory. And that implies that the OS needs to create page tables (I can pretend that I don't know about it, but that doesn't change the fact that I rely on it anyway).
Therefore, I think it is not 100% correct to first assume one and then say "but we don't care about the other".
So, yes, the compiler can assume that a 4TiB allocation is in general perfectly possible as long as memory is not touched, and it can assume that it is generally possible to succeed. It might even assume that it's likely to succeed (even when it's not). But I think that in any case, you are never allowed to assume that something must work when there is a possibility of a failure. And not only is there a possibility of failure, in that example, failure is even the more likely possibility.
</slightly undecided>
The worst that can happen in your snippet is that new throws std::bad_alloc, which is unhandled. What happens then is implementation-defined.
With the best case being a no-op and the worst case not being defined, the compiler is allowed to factor them into non-existence. Now, if you actually try and catch the possible exception :
int main() try {
int* mem = new int[100];
return 0;
} catch(...) {
return 1;
}
... then the call to operator new is kept.
I've looked at the standard but couldn't find any indication that simply writing to memory would be considered observable behaviour. If it isn't, that would mean the compiled code need not actually write to that memory. If a compiler chose to optimize away such accesses, anything involving mapped memory or shared memory might not work.
1.9/8 seems to define a very limited set of observable behaviour, but indicates that an implementation may define more. Can one assume that any quality compiler would treat modifying memory as observable behaviour? That is, it might not guarantee atomicity or ordering, but it would guarantee that the data is eventually written.
So, have I overlooked something in the standard, or is writing to memory merely something the compiler chooses to do?
Quotes from the current or the C++0x standard are welcome. Please note I'm not talking about accessing memory through a function; I mean direct access, such as writing data through a pointer (perhaps obtained via mmap or another library function).
This kind of thing is what volatile exists for. Otherwise, writing to memory and never apparently reading it back is not observable behaviour. However, except in relatively trivial examples, it would in general be quite impossible for the optimizer to prove that you never read it back, so it's not usually an issue.
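A minimal sketch of the volatile approach: because accesses through a volatile-qualified lvalue are part of the observable behaviour, the compiler must emit every store in the loop below instead of treating them as dead. The name `fill_volatile` is hypothetical; for real memory-mapped I/O the pointer would come from something like mmap, which is assumed here rather than shown.

```cpp
#include <cstddef>

// Hypothetical helper: writes `value` to n bytes through a volatile pointer.
// Each store is a volatile access, so none of them may be optimized away,
// even if the program never reads the bytes back.
void fill_volatile(volatile unsigned char* dst, std::size_t n, unsigned char value) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = value;
    }
}
```

It can be called with an ordinary buffer too, since `unsigned char*` converts implicitly to `volatile unsigned char*`. Note that volatile does not make the accesses atomic or order them with respect to other threads; std::atomic covers that.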
Can one assume that any quality compiler would treat modifying memory as an observable behaviour?
No. volatile is meant to mark exactly that. However, you cannot fully trust the compiler even after adding the volatile qualifier, at least according to a 2008 paper: http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf
EDIT:
From the C standard (not C++): http://c0x.coding-guidelines.com/5.1.2.3.html
An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
My reading of C99 is that unless you specify volatile, how and when the variable is actually accessed is implementation-defined. If you specify the volatile qualifier, the code must access it according to the rules of the abstract machine.
The relevant parts of the standard are 6.7.3 Type qualifiers (the description of volatile) and 5.1.2.3 Program execution (the definition of the abstract machine).
I have known for some time that many compilers use heuristics to detect when a variable must be reread and when it is okay to use a cached copy. volatile makes it clear to the compiler that every access to the variable must actually be an access to memory. Without volatile, the compiler seems to be free never to reread the variable.
And by the way, wrapping the access in a function doesn't change that, since a function without the inline keyword might still be inlined by the compiler within the same translation unit.
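To illustrate the rereading point: in the sketch below, `read_twice` takes a pointer to a volatile location (a stand-in for, say, a hardware status register; both function names are hypothetical), so the compiler must perform two separate loads. In the non-volatile version it could legally load once and double the value.

```cpp
#include <cstdint>

// Hypothetical helper: reads the same location twice. Because the object is
// accessed as volatile, both loads must actually be performed; the compiler
// may not merge them into a single load.
std::uint32_t read_twice(const volatile std::uint32_t* reg) {
    const std::uint32_t first = *reg;
    const std::uint32_t second = *reg;
    return first + second;
}

// Non-volatile version for comparison: nothing between the two loads can
// modify *p, so the compiler is free to load once and compute 2 * value.
std::uint32_t read_twice_plain(const std::uint32_t* p) {
    const std::uint32_t first = *p;
    const std::uint32_t second = *p;
    return first + second;
}
```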
From your question below:
Assume I use an array on the heap (unspecified where it is allocated), and I use that array to perform a calculation (temp space). The optimizer sees that it doesn't actually need any of that space as it can use strictly registers. Does the compiler nonetheless write the temp values to the memory?
Per MSalters below:
It's not guaranteed, and unlikely. Consider a Static Single Assignment optimizer. This figures out each possible write/read dependency, and then assigns registers to optimize these dependencies. As a side effect, any write that's not followed by a (possible) read creates no dependencies at all, and is eliminated. In your example ("use strictly registers") the optimizer has satisfied all write/read dependencies with registers, so it won't write to memory at all. All reads produce the correct values, so it's a correct optimization.
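The scenario from the question can be sketched as follows (`sum_of_squares` is a hypothetical example, not from the original): a heap array is used purely as scratch space, and only the returned value is observable. An optimizer may therefore keep the intermediate values in registers, skip the stores, and, under the C++14 allocation rules, possibly drop the allocation itself; the returned value must be the same either way.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical example: computes 0^2 + 1^2 + ... + (n-1)^2 via a heap scratch array.
// Nothing reads the array after this function returns, so its stores are not
// observable behaviour and may be eliminated by the optimizer.
long long sum_of_squares(std::size_t n) {
    std::unique_ptr<long long[]> tmp(new long long[n]);
    for (std::size_t i = 0; i < n; ++i) {
        tmp[i] = static_cast<long long>(i) * static_cast<long long>(i);
    }
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += tmp[i];
    }
    return total;
}
```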