Consider the following function
int demofunc() {
    float* ptr = new float[1000];
    if (ptr == nullptr)
    {
        return -1;
    }
    return 0;
}
When compiled with Clang 14, this produces working code. Once -O1 or a more aggressive optimization level is enabled, the if condition gets removed and the output assembly becomes:
demofunc(): # #demofunc()
xor eax, eax
ret
I can declare my pointer as float* volatile and the branch doesn't get removed, but I don't want to handle this case by case.
Now, I've read in the C++ standard that an allocation function without the nothrow property signals failure with a bad_alloc exception, while one with the nothrow property returns nullptr.
I get this assumption of success even if I specify -fno-exceptions, which, I think, should not happen.
I should mention that I am dealing with a custom port of Clang for specific hardware that has no exception support, and my new does return nullptr if the heap is depleted. I am not exactly sure whether this is a problem with the Clang port or with Clang itself.
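For reference, the non-throwing form the standard describes would look like this (just a sketch to illustrate what I mean, not my actual target code):

#include <new>

int demofunc_nothrow() {
    // operator new[](std::size_t, const std::nothrow_t&) returns nullptr on
    // failure instead of throwing std::bad_alloc, so the compiler cannot
    // treat the check as dead code.
    float* ptr = new (std::nothrow) float[1000];
    if (ptr == nullptr)
    {
        return -1;
    }
    delete[] ptr;
    return 0;
}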
The questions are as follows:
Why does the failure-check branch get optimized away?
Can I specify a compiler option to ensure that it doesn't get removed, even with -O1 (or more aggressive)?
Related
How can the following program be calling format_disk if it's never
called in code?
#include <cstdio>

static void format_disk()
{
    std::puts("formatting hard disk drive!");
}

static void (*foo)() = nullptr;

void never_called()
{
    foo = format_disk;
}

int main()
{
    foo();
}
This differs from compiler to compiler. Compiling with Clang with
optimizations on, the function never_called executes at runtime.
$ clang++ -std=c++17 -O3 a.cpp && ./a.out
formatting hard disk drive!
Compiling with GCC, however, this code just crashes:
$ g++ -std=c++17 -O3 a.cpp && ./a.out
Segmentation fault (core dumped)
Compiler versions:
$ clang --version
clang version 5.0.0 (tags/RELEASE_500/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ gcc --version
gcc (GCC) 7.2.1 20171128
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The program contains undefined behavior, as dereferencing a null pointer
(i.e. calling foo() in main without assigning a valid address to it
beforehand) is UB, therefore no requirements are imposed by the standard.
Executing format_disk at runtime is a perfectly valid outcome once
undefined behavior has been hit; it's as valid as just crashing (like
when compiled with GCC). Okay, but why is Clang doing that? If you
compile it with optimizations off, the program will no longer output
"formatting hard disk drive", and will just crash:
$ clang++ -std=c++17 -O0 a.cpp && ./a.out
Segmentation fault (core dumped)
The generated code for this version is as follows:
main: # #main
push rbp
mov rbp, rsp
call qword ptr [foo]
xor eax, eax
pop rbp
ret
It tries to make a call to a function to which foo points, and as foo
is initialized with nullptr (or if it didn't have any initialization,
this would still be the case), its value is zero. Here, undefined
behavior has been hit, so anything can happen at all and the program
is rendered useless. Normally, making a call to such an invalid address
results in segmentation fault errors, hence the message we get when
executing the program.
Now let's examine the same program but compiling it with optimizations on:
$ clang++ -std=c++17 -O3 a.cpp && ./a.out
formatting hard disk drive!
The generated code for this version is as follows:
never_called(): # #never_called()
ret
main: # #main
push rax
mov edi, .L.str
call puts
xor eax, eax
pop rcx
ret
.L.str:
.asciz "formatting hard disk drive!"
Interestingly, somehow optimizations modified the program so that
main calls std::puts directly. But why did Clang do that? And why is
never_called compiled to a single ret instruction?
Let's get back to the standard (N4660, specifically) for a moment. What
does it say about undefined behavior?
3.27 undefined behavior [defns.undefined]
behavior for which this document imposes no requirements
[Note: Undefined behavior may be expected when this document omits
any explicit definition of behavior or when a program uses an erroneous
construct or erroneous data. Permissible undefined behavior ranges
from ignoring the situation completely with unpredictable results, to
behaving during translation or program execution in a documented manner
characteristic of the environment (with or without the issuance of a
diagnostic message), to terminating a translation or execution (with the
issuance of a diagnostic message). Many erroneous program constructs
do not engender undefined behavior; they are required to be diagnosed.
Evaluation of a constant expression never exhibits behavior explicitly
specified as undefined ([expr.const]). — end note]
Emphasis mine.
A program that exhibits undefined behavior becomes useless, as everything
it has done so far and will do further has no meaning if it contains
erroneous data or constructs. With that in mind, remember that
compilers may completely ignore the case when undefined behavior
is hit, and this is actually used as a source of discovered facts when optimizing a
program. For instance, a construct like x + 1 > x (where x is a signed integer) will be optimized away to a constant,
true, even if the value of x is unknown at compile-time. The reasoning
is that the compiler wants to optimize for valid cases, and the only
way for that construct to be valid is when it doesn't trigger arithmetic
overflow (i.e. if x != std::numeric_limits<decltype(x)>::max()). This
is a new learned fact in the optimizer. Based on that, the construct is
proven to always evaluate to true.
Note: this same optimization can't occur for unsigned integers, because overflowing one is not UB. That is, the compiler needs to keep the expression as it is, as it might have a different evaluation when overflow occurs (unsigned arithmetic is modulo 2^N, where N is the number of bits). Optimizing it away for unsigned integers would not be compliant with the standard (thanks aschepler).
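A minimal sketch of that pair of cases (the function names are mine, just for illustration):

bool always_true(int x)
{
    // Signed overflow is UB, so the optimizer may assume x != INT_MAX here
    // and fold the comparison to the constant true.
    return x + 1 > x;
}

bool not_folded(unsigned x)
{
    // Unsigned arithmetic wraps modulo 2^N, so the comparison must remain:
    // it is genuinely false when x == UINT_MAX.
    return x + 1 > x;
}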
This is useful as it allows for tons of optimizations to kick
in. So
far, so good, but what happens if x holds its maximum value at runtime?
Well, that is undefined behavior, so it's nonsense to try to reason about
it, as anything may happen and the standard imposes no requirements.
Now we have enough information in order to better examine your faulty
program. We already know that accessing a null pointer is undefined
behavior, and that's what's causing the funny behavior at runtime.
So let's try and understand why Clang (or technically LLVM) optimized
the program the way it did.
static void (*foo)() = nullptr;
static void format_disk()
{
    std::puts("formatting hard disk drive!");
}

void never_called()
{
    foo = format_disk;
}

int main()
{
    foo();
}
Remember that it's possible to call never_called before the main entry
starts executing. For example, when declaring a top-level variable,
you can call it while initializing the value of that variable:
void never_called();
int x = (never_called(), 42);
If you write this snippet in your program, the program no
longer exhibits undefined behavior, and the message "formatting hard
disk drive!" is displayed, with optimizations either on or off.
So what's the only way this program is valid? There's this never_called
function that assigns the address of format_disk to foo, so we might
find something here. Note that foo is marked as static, which means it
has internal linkage and can't be accessed from outside this translation
unit. In contrast, the function never_called has external linkage, and may
be accessed from outside. If another translation unit contains a snippet
like the one above, then this program becomes valid.
Cool, but no one calls never_called from outside. Even though that is
the case, the optimizer sees that the only way for this program to
be valid is if never_called is called before main executes, otherwise it's
just undefined behavior. That's a new learned fact, so the compiler assumes never_called
is in fact called. Based on that new knowledge, other optimizations that
kick in may take advantage of it.
For instance, when constant
folding is
applied, it sees that the construct foo() is only valid if foo can be properly initialized. The only way for that to happen is if never_called is called outside of this translation unit, so foo = format_disk.
Dead code elimination and interprocedural optimization might find out that if foo == format_disk, then the code inside never_called is unneeded,
so the function's body is transformed into a single ret instruction.
Inline expansion optimization
sees that foo == format_disk, so the call to foo can be replaced
with its body. In the end, we end up with something like this:
never_called():
ret
main:
mov edi, .L.str
call puts
xor eax, eax
ret
.L.str:
.asciz "formatting hard disk drive!"
Which is somewhat equivalent to the output of Clang with optimizations on. Of course, what Clang really did may well be different, but optimizations are nonetheless capable of reaching the same conclusion.
Examining GCC's output with optimizations on, it seems it didn't bother investigating:
.LC0:
.string "formatting hard disk drive!"
format_disk():
mov edi, OFFSET FLAT:.LC0
jmp puts
never_called():
mov QWORD PTR foo[rip], OFFSET FLAT:format_disk()
ret
main:
sub rsp, 8
call [QWORD PTR foo[rip]]
xor eax, eax
add rsp, 8
ret
Executing that program results in a crash (segmentation fault), but if you call never_called in another translation unit before main gets executed, then this program doesn't exhibit undefined behavior anymore.
All of this can change wildly as more and more optimizations are engineered, so do not rely on the assumption that your compiler will take care of code containing undefined behavior; it might just screw you up as well (and format your hard drive for real!)
I recommend you read What every C programmer should know about Undefined Behavior and A Guide to Undefined Behavior in C and C++; both article series are very informative and might help you understand the state of the art.
Unless an implementation specifies the effect of trying to invoke a null function pointer, it could behave as a call to arbitrary code. Such arbitrary code could perfectly well behave like a call to the function foo(). While Annex L of the C Standard would invite implementations to distinguish between "Critical UB" and "non-critical UB", and some C++ implementations might apply a similar distinction, invoking an invalid function pointer would be critical UB in any case.
Note that the situation in this question is very different from e.g.
unsigned short q;
unsigned hey(void)
{
    if (q < 50000)
        do_something();
    return q*q;
}
In the latter situation, a compiler which does not claim to be "analyzable" might recognize that the code will invoke undefined behavior if q is greater than 46,340 when execution reaches the return statement, and thus it might as well invoke do_something() unconditionally. While Annex L is badly written, it would seem the intention is to forbid such "optimizations". In the case of calling an invalid function pointer, however, even straightforwardly-generated code on most platforms might have arbitrary behavior.
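To make that concrete, such a compiler might effectively treat the function above as if it had been written like this (a sketch, reusing the declarations above and assuming the optimizer can also see that do_something() never modifies q):

unsigned hey(void)
{
    // If q >= 50000 then q > 46340, so q*q (evaluated in int after integer
    // promotion) would overflow, which is UB. Assuming that never happens,
    // the branch condition must be true, so the call is made unconditionally.
    do_something();
    return q*q;
}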
Is there any way to convert a NULL pointer access into a C++ exception under Linux? Something similar to the NullPointerException in Java. I hope the following program would return successfully instead of crashing (assume the compiler cannot figure out this NULL pointer access at compile time):
class NullPointerException {};

void accessNullPointer(char* ptr) {
    *ptr = 0;
}

int main() {
    try {
        accessNullPointer(0);
    } catch (NullPointerException&) {
        return 1;
    }
    return 0;
}
I'm not expecting any standard way of doing it, since NULL pointer access in C++ is undefined behavior; I just want to know how to get it done under x86_64 Linux/GCC.
I did some very primitive research into this, and it might be possible:
When a NULL pointer is accessed under Linux, a SIGSEGV is generated.
Inside the SIGSEGV handler, the program's memory and register information are available (if sigaction() is used to register the signal handler). The instruction which caused the SIGSEGV is also available if the program is disassembled.
Modify the program's memory and/or registers, and create/fake an exception instance (maybe by invoking the low-level unwind library functions, like _Unwind_RaiseException, etc.)
Finally, return from the signal handler, hoping the program will start a C++ stack-unwinding process as if a normal exception had been thrown.
Here's a quote from GCC's man page (-fnon-call-exceptions):
Generate code that allows trapping instructions to throw exceptions. Note that this requires platform-specific runtime support that does not exist everywhere. Moreover, it only allows trapping instructions to throw exceptions, i.e. memory references or floating point instructions. It does not allow exceptions to be
thrown from arbitrary signal handlers such as "SIGALRM".
It seems this "platform-specific runtime" is exactly what I want. Anyone knows such a runtime for Linux/x86_64 ? Or give me some information on how to implement such a runtime if no such runtime already exists ?
I want the solution to work in multi-threaded program as well.
No, there's no good way to do that, and there shouldn't be. Exceptions are thrown by a throw statement in source code. That's important for reasoning about exception safety: you can look at the code and see the places where exceptions can be thrown and, perhaps more important, you can look at the code and see the places where exceptions will not be thrown. If pretty much anything you do can throw an exception, it becomes very difficult to write exception-safe code without cluttering it with catch clauses. Microsoft tried this in their early C++ compilers: they piggybacked C++ exception handling on top of their OS's structured exceptions, and the result was a disaster.
Register an alternative signal stack with sigaltstack().
The third argument to a signal handler registered with the SA_SIGINFO flag is a pointer to a ucontext_t, which contains the saved registers. The signal handler should adjust this to simulate a call to a function. That function can then throw the exception.
Potential complications include the need to preserve the value of callee saved registers, the red-zone on x86-64 (which can be disabled) and the return address register on some ISAs, such as ARM.
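Here is a minimal sketch of that scaffolding for x86-64 Linux/glibc (the trampoline name and the buffer size are mine; it glosses over callee-saved registers, stack alignment and signal-safety details, and the faulting code must be built with -fnon-call-exceptions so the unwinder can unwind through a non-call instruction):

#include <signal.h>
#include <ucontext.h>
#include <cstdlib>

struct NullPointerException {};   // as in the question

// Hypothetical trampoline the handler redirects execution to.
[[noreturn]] static void throw_null_pointer_exception()
{
    throw NullPointerException{};
}

static void segv_handler(int, siginfo_t* info, void* uctx)
{
    if (info->si_addr != nullptr)
        std::abort();                       // not a null dereference: give up

    ucontext_t* uc = static_cast<ucontext_t*>(uctx);
    greg_t& rip = uc->uc_mcontext.gregs[REG_RIP];
    greg_t& rsp = uc->uc_mcontext.gregs[REG_RSP];

    // Simulate a call: skip the red zone, push the faulting PC as a return
    // address, then resume execution in the trampoline, which throws.
    rsp -= 128 + 8;
    *reinterpret_cast<greg_t*>(rsp) = rip;
    rip = reinterpret_cast<greg_t>(&throw_null_pointer_exception);
}

static void install_null_pointer_handler()
{
    static char altstack[64 * 1024];        // stand-in for SIGSTKSZ
    stack_t ss = {};
    ss.ss_sp = altstack;
    ss.ss_size = sizeof altstack;
    sigaltstack(&ss, nullptr);

    struct sigaction sa = {};
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sa.sa_sigaction = segv_handler;
    sigaction(SIGSEGV, &sa, nullptr);
}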
A question turned up when debugging some code at work for race conditions: here is a reduced example:
//! Schedules a callable to be executed asynchronously
template<class F> void schedule(F &&f);
int main(void)
{
    bool flag(false);
    // Ignore the fact this is thread unsafe :)
    schedule([&] { flag = true; });
    // Can the compiler assume under strict aliasing that this check
    // for flag being false can be eliminated?
    if (!flag)
    {
        // do something
    }
    return 0;
}
Obviously the code fragment is thread unsafe - that bool flag needs to be a std::atomic, and then the seq_cst memory ordering would force the compiler to always check the value tested by the if. This question isn't about that - it's about whether initialising a capture-all reference lambda tells the compiler that flag may have been aliased, and therefore that the later check of flag's value must not be elided under optimisation.
My own personal guess is that constructing a [&flag] {...} lambda would suggest potential aliasing of flag, while having a [&] {...} mark every automatic variable as potentially aliased sounds like too extreme an anti-optimisation, so I'm guessing no to that. However, I would not be surprised if reference-capturing lambdas don't alias-clobber anything at all.
Over to you, C++ language experts! My thanks in advance.
Edit: I knew that the lack of thread safety would be seen as an answer; however, that is not what I am asking. Let me reduce my example still further:
int main(void)
{
    bool flag(false);
    // Note that this is not invoked, just constructed.
    auto foo = [&] { flag = true; };
    // Can the compiler assume under strict aliasing that this check
    // for flag being false can be eliminated?
    if (!flag)
    {
        // do something
    }
    return 0;
}
Now can that check for flag being false be elided?
Edit: For those of you coming here in the future, my best understanding of the answers below is "yes, the check can be elided", i.e. constructing a lambda which takes a local variable by reference is not considered by the compiler's optimiser as potentially modifying that variable, and therefore the optimiser may legally elide subsequent reloads of that variable's storage. Thanks to everyone for your answers.
You can't ignore the lack of thread safety. Data races yield undefined behaviour and this code has a data race, so the answer to "Can the compiler assume under strict aliasing that this check for flag being false can be eliminated?" is "The compiler can do whatever it wants."
If you fix that and make the code thread safe with a std::atomic<bool>, the question disappears: the compiler cannot discard the check because it has to conform to the memory model requirements of atomic variables.
If instead the schedule call didn't do anything related to multithreading, the compiler has to preserve the semantics of the abstract machine. The abstract machine does check the value of the flag, but a compiler might be able to perform static analysis that proves the flag will always have a certain value (either always false or always true) and skip the check. That's allowed under the as-if rule: it is not possible to write a C++ program that can reliably tell the difference between the two possibilities (optimise or not).
So, for the second example, the answer is "The compiler can do whatever it wants, as long as the observable behaviour is the same as if it performed the check."
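For completeness, a minimal sketch of the thread-safe variant the first paragraph describes, reusing the question's hypothetical schedule() interface:

#include <atomic>

// Hypothetical, as in the question: schedules a callable to run asynchronously.
template<class F> void schedule(F &&f);

int main(void)
{
    std::atomic<bool> flag{false};
    schedule([&] { flag.store(true); });  // seq_cst by default
    if (!flag.load())                     // this load cannot be elided
    {
        // do something
    }
    return 0;
}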
At the moment, I'm inserting variables at the beginning of the entry block using CreateEntryBlockAlloca:
template <typename VariableType>
static inline llvm::AllocaInst *CreateEntryBlockAlloca(BuilderParameter& buildParameters,
                                                       const std::string &VarName) {
    HAssertMsg(1 != 0, "Not Implemented");
};

template <>
inline llvm::AllocaInst *CreateEntryBlockAlloca<double>(BuilderParameter& buildParameters,
                                                        const std::string &VarName) {
    llvm::Function* TheFunction = buildParameters.dag.llvmFunction;
    llvm::IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
                           TheFunction->getEntryBlock().begin());
    return TmpB.CreateAlloca(llvm::Type::getDoubleTy(buildParameters.getLLVMContext()), 0,
                             VarName.c_str());
}
Now I want to add Allocas for non-POD types (that might require a destructor/cleanup function at exit). However, it is not enough to add destructor calls at the end of the exit-scope block, since it is not clear how to have them invoked when a regular DWARF exception is thrown (for the purpose of this argument, let's say that exceptions are thrown from call points that invoke C++ functions which only throw a POD type; so no, in my case, ignorance is bliss, and I would like to stay away from intrinsic LLVM exceptions unless I understand them better).
I was thinking that maybe I could have a table with stack offsets for the Alloca'ed registers, and have the exception handler (at the bottom of the stack, at the invocation point of the JIT function) walk over the offsets in that table and call destructors appropriately.
The thing I don't know is how to query the offset of the Alloca'ed registers created with CreateAlloca. How can I do that reliably?
Also, if you think there is a better way to achieve this, please enlighten me on the path of the LLVM.
Technical comment: the JIT code is being called inside a boost::context which only invokes the JIT code inside a try/catch, and does nothing on the catch; it just exits from the context and returns to the main execution stack. The idea is that if I handle the unwinding in the main execution stack, any function I call (for, say, cleaning up stack variables) will not overwrite those same stack contents from the terminated JIT context, so it will not be corrupted. Hope I'm making enough sense.
The thing I don't know is how to query the offset of the Alloca'ed registers created with CreateAlloca. How can I do that reliably?
You can use the address of an alloca directly... there isn't any simple way to get its offset into the stack frame, though.
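For illustration, here is a plain C++ sketch of that idea: a runtime cleanup table keyed by the alloca addresses themselves rather than by stack offsets. Every name in it is hypothetical; JIT'ed code would emit the equivalent registration calls through the IRBuilder.

#include <vector>

// One entry per constructed, not-yet-destroyed object in the JIT'ed frame.
struct CleanupEntry {
    void* object;                 // address of the alloca'd storage
    void (*destroy)(void*);       // type-erased destructor thunk
};

static thread_local std::vector<CleanupEntry> g_cleanups;

// JIT'ed code would call these right after constructing / destroying an object.
inline void push_cleanup(void* obj, void (*dtor)(void*)) {
    g_cleanups.push_back({obj, dtor});
}
inline void pop_cleanup() {
    g_cleanups.pop_back();
}

// Called by the handler at the invocation point of the JIT function, after the
// JIT context has been abandoned, to run the pending destructors.
inline void run_pending_cleanups() {
    while (!g_cleanups.empty()) {
        CleanupEntry e = g_cleanups.back();
        g_cleanups.pop_back();
        e.destroy(e.object);
    }
}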
Why exactly do you not want to use the intrinsic LLVM exceptions? They really are not that hard to use, especially in the simple case where your code never actually catches anything. You can basically just take the code clang generates in the simple case, and copy-paste it.
Edit:
To see how to use exceptions in IR in the simple case, try pasting the following C++ code into the demo page at http://llvm.org/demo/:
class X { public: ~X() __attribute((nothrow)); };
void a(X* p);
void b() { X x; a(&x); }
It's really not that complicated.
The program I'm working on sometimes crashes trying to read data at the address 0xCCCCCCCC. Google (and Stack Overflow) being my friends, I saw that it's the MSVC debug fill value for uninitialized stack variables. To understand where the problem might come from, I tried to reproduce this behavior; the problem is that I haven't been able to.
Question is: do you have a code snippet showing how a pointer can end up pointing to 0xCCCCCCCC?
Thanks.
int main()
{
    int* p;
}
If you build with the Visual C++ debug runtime, put a breakpoint in main(), and run, you will see that p has a value of 0xcccccccc.
Compile your code with the /GZ compiler switch or the /RTCs switch. Make sure that the /Od switch is also used to disable any optimizations.
/RTCs enables stack frame run-time error checking, as follows:
Initialization of local variables to a nonzero value. This helps identify bugs that do not appear when running in debug mode. There is a greater chance that stack variables will still be zero in a debug build compared to a release build because of compiler optimizations of stack variables in a release build. Once a program has used an area of its stack, it is never reset to 0 by the compiler. Therefore, subsequent, uninitialized stack variables that happen to use the same stack area can return values left over from the prior use of this stack memory.
Detection of overruns and underruns of local variables such as arrays. /RTCs will not detect overruns when accessing memory that results from compiler padding within a structure. Padding could occur by using align (C++), /Zp (Struct Member Alignment), or pack, or if you order structure elements in such a way as to require the compiler to add padding.
Stack pointer verification, which detects stack pointer corruption. Stack pointer corruption can be caused by a calling convention mismatch. For example, using a function pointer, you call a function in a DLL that is exported as __stdcall but you declare the pointer to the function as __cdecl.
I do not have MSVC, but this code should produce the problem and compile with no warnings.
In file f1.c:
void ignore(int **p) { }
In file f2.c:
void ignore(int **p);
int main(int c, char **v)
{
    int *a;
    ignore(&a);
    return *a;
}
The call to ignore makes it look like a might be initialized. I doubt the compiler will warn in this case, because of the risk that the warning might be a false positive.
How about this? Ignore the warning that VC throws while running.
#include <iostream>

struct A {
    int *p;
};

int main() {
    A a;
    std::cout << (void *)a.p;
}