When can optimizations done by the compiler cause my C++ code to exhibit wrong behaviour which would not be present had those optimizations not been performed? For example, not using volatile in certain circumstances can cause the program to behave incorrectly (e.g. not re-reading the value of a variable from memory and instead only reads it once and stores it in register). But are there other pitfalls which one should know about before turning on the most aggressive optimization flag and afterwards wondering why the program doesn't work anymore?
Compiler optimizations should not affect the observable behavior of your program, so in theory, you don't need to worry. In practice, if your program strays in to undefined behavior, anything could already happen, so if your program breaks when you enable optimizations, you've merely exposed existing bugs - it wasn't optimization that broke it.
One common optimization point is the return value optimisation (RVO) and named return value optimization (NRVO) which basically means objects returned by value from functions get constructed directly in the object which is receiving them, rather than making a copy. This adjusts the order and number of constructor, copy constructor and destructor calls - but usually with those functions correctly written, there's still no observable difference in the behavior.
Besides the case you mentioned, timing can change in multi-threaded code such that what appears to be working no longer does. Placement of local variables can vary such that harmful behaviour like a memory buffer overrun occurs in debug but not release, optimized or non-optimized, or vice versa. But all of these are bugs that were there already, just exposed by compiler option changes.
This is assuming the compiler has no bugs in its optimizer.
I've only run into it with floating point math. Sometimes the optimizations for speed can change the answer a little. Of course with floating point math, the definition of "right" is not always easy to come up with so you have to run some tests and see if the optimizations are doing what you're expecting. The optimizations don't necessarily make the result wrong, just different.
Other than that, I've never seen any optimizations break correct code. Compiler writers are pretty smart and know what they're doing.
Bugs caused by compiler optimizations that are not rooted in bugs in your code are not predictable and hard to determine (I managed to find one once when examining the assembly code a compiler had created when optimizing a certain area in my code once). The common case is that if an optimization makes your program unstable, it just reveals a flaw in your program.
Just don't work from the assumption that the optimizer ever destroys your code. It's just not what it was made to do. If you do observe problems then automatically consider unintentional UB.
Yes, threading can play havoc with the kind of assumptions you are used to. You get no help from either the language or the compiler, although that's changing. What you do about that is not piss around with volatile, you use a good threading library. And you use one of its synchronization primitives wherever two or more threads can both touch variables. Trying to take short-cuts or optimizing this yourself is a one-way ticket into threading hell.
Failing to include the volatile keyword when declaring access to a volatile memory location or IO device is a bug in your code; even if the bug is only evident when your code gets optimized.
Your compiler will document any "unsafe" optimizations where it documents the command-line switches and pragmas that turn them on and off. Unsafe optimizations usually related to assumptions about floating point math (rounding, edge cases like NAN) or aliasing as others have already mentioned.
Constant folding can create aliasing making bugs in your code appear. So, for example, if you have code like:
static char *caBuffer = " ";
...
strcpy(caBuffer,...)
Your code is basically an error where you scribble over a constant (literal). Without constant folding, the error won't really effect anything. But much like the volatile bug you mentioned, when your compiler folds constants to save space, you might scribble over another literal like the spaces in:
printf("%s%s%s",cpName," ",cpDescription);
because the compiler might point the literal argument to the printf call at the last 4 characters of the literal used to initialize caBuffer.
As long as your code does not rely on specific manifestations of undefined/unspecified behavior, and as long as the functionality of your code is defined in terms of observable behavior of C++ program, a C++ compiler optimizations cannot possibly destroy the functionality of your code with only one exception:
When a temporary object is created with the only purpose of being immediately copied and destroyed, the compiler is allowed to eliminate the creation of such temporary object even if the constructor/destructor of the object has side-effects affecting the observable behavior of the program.
In the newer versions of C++ standard that permission is extended to cover named object in so called Named Return Value Optimization (NRVO).
That's the only way the optimizations can destroy the functionality of conforming C++ code. If your code suffers from optimizations in any other way, it is either a bug in your code or a bug in the compiler.
One can argue though, that relying on this behavior is actually nothing else than relying on a specific manifestation of unspecified behavior. This is a valid argument, which can be used to support the assertion that under the above conditions optimizations can never break the functionality of the program.
Your original example with volatile is not a valid example. You are basically accusing the compiler of breaking guarantees that never existed in the first place. If your question should be interpreted in that specific way (i.e. what random fake non-existent imaginary guarantees can optimizer possibly break), then the number of possible answers is virtually infinite. The question simply wouldn't make much sense.
Strict aliasing is an issue you might run into with gcc. From what I understand, with certain versions of gcc (gcc 4.4) it gets automatically enabled with optimizations. This site http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html does a very good job at explaining strict aliasing rules.
I just recently saw that (in C++0x) the compiler is allowed to assume that certain classes of loops will always terminate (to allow optimizations). I can't find the reference right now but I'll try to link it if I can track it down. This can cause observable program changes.
At a meta level, if your code uses relies on behavior that is based on undefined aspects of the C++ standard, a standards conforming compiler is free to destroy your C++ code (as you put it). If you don't have a standards conforming compiler, then it can also do non-standard things, like destroy your code anyway.
Most compilers publish what subset of the C++ standard they conform to, so you can always write your code to that particular standard and mostly assume you are safe. However, you can't really guard against bugs in the compiler without having encountered them in the first place, so you still aren't really guaranteed anything.
I do not have the exact details (maybe someone else can chime in), but I have heard tell of a bug caused by loop unrolling/optimization if the loop counter variable is of char/uint8_t type(in a gcc context i.e.).
Related
Say I have C++ project which has been working for years well.
Say also this project might (need to verify) contain undefined behaviour.
So maybe compiler was kind to us and doesn't make program misbehave even though there is UB.
Now imagine I want to add some features to the project. e.g. add Crypto ++ library to it.
But the actual code I add to it say from Crypto++ is legitimate.
Here I read:
Your code, if part of a larger project, could conditionally call some
3rd party code (say, a shell extension that previews an image type in
a file open dialog) that changes the state of some flags (floating
point precision, locale, integer overflow flags, division by zero
behavior, etc). Your code, which worked fine before, now exhibits
completely different behavior.
But I can't gauge exactly what author means. Does he say even by adding say Crypto ++ library to my project, despite the code from Crypto++ I add is legitimate, my project can suddenly start working incorrectly?
Is this realistic?
Any links which can confirm this?
It is hard for me to explain to people involved that just adding library might increase risks. Maybe someone can help me formulate how to explain this?
When source code invokes undefined behaviour, it means that the standard gives no guarantee on what could happen. It can work perfectly in one compilation run, but simply compiling it again with a newer version of the compiler or of a library could make it break. Or changing the optimisation level on the compiler can have same effect.
A common example for that is reading one element past end of an array. Suppose you expect it to be null and by chance next memory location contains a 0 on normal conditions (say it is an error flag). It will work without problem. But suppose now that on another compilation run after changing something totally unrelated, the memory organization is slightly changed and next memory location after the array is no longer that flag (that kept a constant value) but a variable taking other values. You program will break and will be hard to debug, because if that variable is used as a pointer, you could overwrite memory on random places.
TL/DR: If one version works but you suspect UB in it, the only correct way is to consistently remove all possible UB from the code before any change. Alternatively, you can keep the working version untouched, but beware, you could have to change it later...
Over the years, C has mutated into a weird hybrid of a low-level language and a high-level language, where code provides a low-level description of a way of performing a task, and modern compilers then try to convert that into a high-level description of what the task is and then implement efficient code to perform that task (possibly in a way very different from what was specified). In order to facilitate the translation from the low-level sequence of steps into the higher-level description of the operations being performed, the compiler needs to make certain assumptions about the conditions under which those low-level steps will be performed. If those assumptions do not hold, the compiler may generate code which malfunctions in very weird and bizarre ways.
Complicating the situation is the fact that there are many common programming constructs which might be legal if certain parts of the rules were a little better thought-out, but which as the rules are written would authorize compilers to do anything they want. Identifying all the places where code does things which arguably should be legal, and which have historically worked correctly 99.999% of the time, but might break for arbitrary reasons can be very difficult.
Thus, one may wish for the addition of a new library not to break anything, and most of the time one's wish might come true, but unfortunately it's very difficult to know whether any code may have lurking time bombs within it.
So we've all heard the don't-use-register line, the reasoning being that trying to out-optimize a compiler is a fool's errand.
register, from what I know, doesn't actually state anything about CPU registers, just that a given variable can't be referenced indirectly. I'll hazard a guess that it's often referred to as obsolete because compilers can detect a lack of addressing automatically thus making such optimizations transparent.
But if we're firm on that argument, can't it be levelled at every optimization-driven keyword in C? Why do we use inline and C99's restrict for example?
I suppose that some things like aliasing make deducing some optimizations hard or even impossible, so where is the line drawn before we start venturing into Sufficiently Smart Compiler territory?
Where should the line should be drawn in C and C++ between spoon-feeding a compiler optimization information and assuming it knows what it's doing?
EDIT: Jens Gustedt pointed out that my conflating of C and C++ isn't right since two of the keywords have semantic differences and one doesn't exist in standard C++. I had a good link about register in C++ which I'll add if I find it...
I would agree that register and inline are somewhat similar in this respect. If the compiler can see the body of the callee while compiling a call site, it should be able to make a good decision on inlining. The use of the inline keyword in both C and C++ has more to do with the mechanics of making the body of the function visible than with anything else.
restrict, however, is different. When compiling a function, the compiler has no idea of what the call sites are going to be. Being able to assume no aliasing can enable optimizations that would otherwise be impossible.
inline is used in the scenario where you implement a non-templated function within the header then include it from multiple compilation units.
This ensures that the compiler should create just one instance of the function as though it were inlined, so you do not get a link error for multiply defined symbol. It does not however require the compiler to actually inline it.
There are GNU flags I think force-inline or similar but that is a language extension.
register doesn't even say that you can't reference the
variable indirectly (at least in C++). It said that in the
original C, but that has been dropped.
Whether trying to out-optimize the compiler is a fool's errand
depends on the optimization. Not many compilers, for example,
will convert sin(x) * sin(x) + cos(x) * cos(x) into 1.
Today, most compilers ignore register, and no one uses it,
because compilers have become good enough at register allocation
to do a better job than you can with register. In fact,
respecting register would typically make the generated code
slower. This is not the case for inline or restrict: in
both cases, there exist techniques, at least theoretically,
which could result in the compiler doing a better job than you
can. Such techniques are not widespread, however, and (as far
as I know, at least), have a very high compile time overhead,
with in some cases compile times which grow exponentially with
the size of the program (which makes them more or less unusable
on most real programs—compile times which are measured in
years really aren't acceptable).
As to where to draw the line... it changes in time. When
I first started programming in C, register made a significant
difference, and was widely used. Today, no. I imagine that in
time, the same may happen with inline or restrict—some
experimental compilers are very close with inline already.
This is a flame-bait question but I will dive in anyway.
Compilers are a lot better at optimising that your average programmer. There was a time I programmed on a 25MHz 68030 and I got some advantage from the use of register because the compiler's optimizer was so poor. But that was back in 1990.
I see inline as just as bad as register.
In general, measure first before you modify. If you find that you code performs so poorly you want to use register or inline, take a deep breath, stand back and look for a better algorithm first.
In recent times (i.e. the last 5 years) I have gone through code bases and removed inline functions galore with no perceptible change in performance being visible. Code size, however, always benefits from the removal of inline methods. That isn't a big issue for your standard x86-style monster multicore marvel of the modern age but it does matter if you work in the embedded space.
It is a moving target, because compiler technology is improving. (Well, sometimes it is more changing than improving, but that has some of the same effect of rendering your optimization attempts moot, or worse.)
Generally, you should not guess at whether an optimization keyword or other optimization technique is good or not. One has to learn quite a bit about how computers work, including the particular platform you are targeting, and how compilers work.
So a rule about using various optimization techniques is to ask do I know the compiler will not do the best job here? Am I willing to commit to that for a while—will the compiler remain stable while this code is in use, am I willing to rewrite the code when the compiler changes this situation? Typically, you have to be an experienced and knowledgeable software engineer to know when you can do better than the compiler. It also helps if you can talk to the compiler developers.
This means people cannot give you an answer here that has a definite guideline. It depends on what compiler you are using, what your project is, what your resources are, and what your goals are, and so on.
Although some people say not to try to out-optimize the compiler, there are various areas of software engineering where people do better than a compiler and in which it is worth the expense of paying people for this.
The difference is as follows:
register is very local optimization (i.e. inside one function). The register allocation is a relatively solved problem both by smarter compilers and by larger number of register (mostly the former but say x86-64 have more registers then x86 and both have larger number then say 8-bit processor)
inline is harder as it is inter-procedure optimization. However as it involves relatively small depth of recursion and small number of procedures (if inlined procedure is too big there is no sense of inlining it) it may be safely left to the compiler.
restrict is much harder. To fully know the that two pointers don't alias you would need to analyse whole program (including libraries, system, plug-ins etc.) - and even then run into problems. However the information is clearer for programmer AND it is part of specification.
Consider very simple code:
void my_memcpy(void *dst, const void *src, size_t size) {
for (size_t i = 0; i < size; i++) {
((char *)dst)[i] = ((const char *)str)[i];
}
}
Is there a benefit to making this code efficient? Yes - memcpy tend to be very useful (say for copying GC). Can this code be vectorized (here - moved by words - say 128b instead of 8b)? Compiler would have to deduce that dst and src does not alias in any way and regions pointed by them are independent. size may depend on user input or runtime behaviour or other elements which makes the analysis practically impossible - similar problems to Halting Problem - in general we cannot analyse everything without running it. Or it might be part of C library (I assume shared libraries) and is called by program hence all call sites are not even known at compile time. Without such analysis the program would exhibit different behaviour with optimization on. On the other hand programmer might ensure that they are different objects simply by knowing the (even higher-level) design instead of need for bottom-up analysis.
restrict can also be part of documentation as it might be programmer who wrote the procedure in a way that it cannot handle 2 aliasing pointers. For example if we want to copy memory from aliasing locations the above code is incorrect.
So to sum up - Sufficiently Smart Compiler would not be able to deduce the restrict (unless we move to compilers understending the meaning of code) without knowing the whole program. Even then the it would be close to undecidability. However for local optimization the compilers are already sufficiently smart. My guess it that Sufficiently Smart Compiler with whole program analysis would be able to deduce in many interesting cases however.
PS. By local I mean single function. So local optimization cannot assume anything about arguments, global variables etc.
One thing that hasn't been mentioned is that many non-x86 compilers aren't nearly as good at optimizing as gcc and other "modern" C-compilers are.
For instance, the compilers for PIC are absolutely terrible at optimizing. Also, the optimizer for cicc (the CUDA compiler), though much better, still seems to miss a lot of fairly simple optimizations.
For these cases, I've found optimization hints like register, inline, and #pragma unroll to be extremely useful.
From what I have seen back in the days I was more involved with C/C++, these are merely orders directly given to the compiler. Compiler may try to inline a function even if it is not given the direct order to do so. That really depends on the compiler and may even raise some cross-compiler issues. As an example, visual studio provides different levels of optimization which correspond to the different intelligence levels of the compiler. I have read that all class functions are implicitly inline to give compiler a hint to minimize function call overhead. In any case, these directives are extremely helpful when you are using a less intelligent compiler while in intelligent cases, they may be very obvious for the compiler to do some optimization.
Also, be sure that these keywords are guaranteed to be safe. Some compiler optimizations may not work with some libraries such as OpenGL (as I have seen it myself). So in cases where you feel that compiler optimization may be harmful, you can use these keywords to make sure it is done the way you want it to.
The compilers such as g++ these days optimize the code very well. You might as well search for optimization elsewhere, maybe in the methods and algorithm you use or by using TBB or CUDA to make your code parallel.
Is there a good reason why this code compiles without warning (and crashes when run) with Visual C++ 2010:
int a = *((int*)nullptr);
Static analysis should conclude that it will crash, right?
Should this use of nullptr produce a compiler error?
No.
Dereferencing a null pointer results in undefined behavior, but no diagnostic is required.
Static analysis should conclude that it will crash, right?
It might. It doesn't have to. It would certainly be nice if a warning was issued. A dedicated static analysis tool (Klocwork, for example) would probably issue a warning.
Yes, static analysis would show this to always crash. However, this would require the compiler to actually perform this static analysis. Most compilers do not do this (at least none I know of).
So the question is: Why don't C/C++ compilers do more static type checking.
The reason the compiler does not do this is mostly: tradition, and a philosophy of making the compiler as simple as possible.
C (and to a lesser degree C++) were created in an environment where computing power was fairly expensive, and where ease of writing a compiler was important (because there were many different HW architectures).
Since static typechecking analysis will both make a compiler harder to write, and make it compile more slowly, it was not felt at the time to be a priority. Thus most compilers don't have it.
Other languages (e.g.) Java make different tradeoffs, and thus in Java many things are illegal that are allowed in C (e.g. unreachable code is a compile-time error in Java; in C most compilers don't even warn). This really boils down to philosophy.
BTW, note that you can get static typechecking in C if you want it - there are several tools available, e.g. lint (ancient), or see What open source C++ static analysis tools are available? .
.
I would like to know all the possible ways (or atleast the popular ones) in which compilers can/do optimize away our code written in C++? I also would like to know how exactly the optimization is done (in each case)!
So far, I'm aware of two optimizations, viz. Empty Base Optimization (EBO) and Return Value Optimization (RVO). What else are there? I've heard of "const" optimization,"unused variable" optimization. What are they?
.
All possible ways? Surely you're joking. For that, look through years of compiler research and practice.
For concrete examples, look up each of the options here: http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
From Standard docs., Section 1.9,
4) This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International Standard
as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance,
an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the
observable behavior of the program are produced.
So actually the standard compliant compiler can perform any kind of optimizations as long as it produces the desired result.
Incredibly broad, because there are many optimizations and compiler writers are always thinking up more. There are tons of them, some optimizing for run time, others optimizing for binary size. Many aren't specifically C++ either, the general compiler optimization techniques are implemented for many compilers/interpreters for many different languages.
Just a handful:
Dead code removal
Loop unrolling
Constants folding
More info:
Optimizing C++ Applications with Intel C++ Compiler
New Optimizations in VC++ 2010
Generic Compiler Optimizations Wiki
Is there any way to know if you program has undefined behavior in C++ (or even C), short of memorizing the entire spec?
The reason I ask is that I've noticed a lot of cases of programs working in debug but not release being due to undefined behavior. It would be nice if there were a tool to at least help spot UB, so we know there's the potential for problems.
Good coding standards. Protect you from yourself. Here are some ideas:
The code must compile at the highest warning level... without warnings. (In other words, your code must not set off any warnings at all when set to the highest level.) Turn on the error on warning flag for all projects.
This does mean some extra work when you use other peoples' libraries since they may not have done this. You will also find there are some warnings which are pointless... turn those off individually as your team decides.
Always use RAII.
Never use C style casts! Never! - I think there's like a couple rare cases when you have to break this but you will probably never find them.
If you must reinterpret_cast or cast to void then use a wrapper to make sure you're always casting to/from the same type. In other words, wrap your pointer/object in a boost::any and cast a pointer to it into whatever you need and on the other side do the same. Why? Because you will always know what type to reinterpret_cast from and the boost::any will enforce that you've cast to the correct type after that. It's the safest you can get.
Always initialize your variables at the point of declaration (or in constructor initializers when in a class).
There are more but those are some very important ones to start with.
Nobody can memorize the standard. What we intermediate to advanced C++ programmers do is use constructs we know are safe and protect ourselves from our human nature... and we don't use constructs that are not safe unless we have to and then we take extra care to make sure the danger is all wrapped up in a nice safe interface that is tested to hell and back.
One important thing to remember which is universal across all languages is to:
make your constructs easy to use correctly and difficult to use incorrectly
It's not possible to detect undefined behavior in all cases. For example, consider x = x++ + 1;. If you're familiar with the language, you know it's UB. Now, *p = (*p)++ + 1; is obviously also UB, but what about *q = (*p)++ + 1;? That's UB if q == p, but other than that it's defined (if awkward-looking). In a given program, it might well be possible to prove that p and q will never be equal when reaching that line, but that can't be done in general.
To help spot UB, use all of the tools you've got. Good compilers will warn for at least the more obvious cases, although you may have to use some compiler options for best coverage. If you have further static analysis tools, use them.
Code reviews are also very good for spotting such problems. Use them, if you've got more than one developer available.
Static code analysis tools such as PC-Lint can help a lot here
Well, this article covers most aspects..
I think you can use one tool from coverity to spot bugs which are going to lead to undefined behavior.
I guess you could use theorem provers (i only know Coq) to be sure your program does what you want.
clang tries hard to produce warnings when undefined behavior is encountered.
I'm not aware of any software tool to detect all forms of UB. Obviously using your compiler's warnings and possibly lint or another static code checker can help a lot.
The other thing that helps a lot is simply experience: The more you program the language, the more you'll see constructs that appear suspect and be able to catch them earlier in the process.
Unfortunately, there is no way way to detect all UB. You'd have to solve the Halting Problem to do that.
The best you can do is to know as many of the rules as possible, look it up when you're in doubt, and check with other programmers (through pair programming, code reviews or just SO questions)
Compiling with as many warnings as possible, and under multiple compilers can help. And running the code through static analysis tools such as Valgrind can detect many issues.
But ultimately, no tool can detect it all.
An additional problem is that many programs actually have to rely on UB. Some API's require it, and just assume that "it works on all sane compilers". OpenGL does that in one or two cases. The Win32 API won't even compile under a standards compliant compiler.
So even if you had a magic UB-detecting tool, it would still be tripped up by the cases that aren't under your control.
Simple: Don't do things that you don't know that you can do.
When you are unsure or have a fishy feeling, check the reference
A good compiler, such as the Intel C++ compiler, should be able to spot 99% of cases of undefined behaviour. You'll need to investigate the flags and switches to use. As ever, read the manual.