Do C++ compilers (optimizers) guarantee any code/executable invariants?

I am wondering whether any C++ compilers (optimizers) guarantee that code which is heavily "decorated" with meaningless operations (e.g. if (true) or the do-while(false) pattern) will be compiled (with all speed optimizations enabled) into exactly the same executable as the source code without these "decorations".
I hope my question is clear; if not, I will try to rephrase it. Anyway, here is why I am asking, to make it more concrete: I have a 'legacy' piece of C++ code that uses macros quite heavily. Some of them are expressions, so it seemed good to wrap them in do-while(false); but since the macro bodies themselves contain instructions like break or continue, I went even further, to "if(true)[MacroBody]else do{}while(false)". So far, so good; maybe not very clean, but all should be fine. However, I noticed that my executable (MSVC, all optimizations for speed) grows by 2 kB after such "decorating".
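For reference, a minimal sketch of the two wrapping idioms in question (the log helper is made up for illustration):

    #include <cstdio>

    inline void log(const char* msg) { std::puts(msg); }  // hypothetical helper

    // do-while(false): makes the macro usable as a single statement,
    // but a break/continue inside the body would only exit the do-while.
    #define LOG_A(msg) do { log(msg); } while (false)

    // The variant from the question: still swallows the trailing semicolon
    // (the else consumes it), yet a break/continue inside the body would
    // reach the enclosing loop instead of being captured by the macro.
    #define LOG_B(msg) if (true) { log(msg); } else do { } while (false)

    void demo() {
        for (int i = 0; i < 3; ++i) {
            if (i == 1)
                LOG_B("still safe before an else");  // no dangling-else problem
            else
                LOG_A("other iterations");
        }
    }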
(My feeling is that the compiler, after making some initial optimizations, was satisfied with what it had done so far and stopped optimizing further. The bad part is that I was unable to minimize the case: the difference in executable size reproduced only in the real-life code. That's why I would like to ask whether any of you, more specialized in compiler construction, have some knowledge about this. If my perception is right, it may currently work somewhat heuristically, with the compiler coming to a conclusion like: "Looks like I have done quite a lot of optimizing; that may be just enough.")

Related

Compiling via C/C++ and compiler bugs/limits

I'm looking at the possibility of implementing a high-level (Lisp-like) language by compiling via C (or possibly C++ if exceptions turn out to be useful enough); this is a strategy a number of projects have previously used. Of course, this would generate C code unlike, and perhaps in some dimensions exceeding the complexity of, anything you would write by hand.
Modern C compilers are highly reliable in normal usage, but it's hard to know what bugs might be lurking in edge cases under unusual stresses, particularly if you go over some "well no programmer would ever write an X bigger than Y" hidden limit.
It occurs to me the coincidence of these facts might lead to unhappiness.
Are there any known cases, or is there a good way to find cases, of generated code tripping over edge case bugs/limits in reasonably recent versions of major compilers (e.g. GCC, Microsoft C++, Clang)?
This may not quite be the answer you were looking for, but quite a few years ago I worked on a project where parts of the system were written in a higher-level language in which it was easy to build state machines in processes. This language produced C code that was then compiled. The compiler we were using for this project was gcc (version around 2.95 - don't quote me on that, but pre-3.0 for sure). We did run into a couple of code generation bugs, but from my memory those had more to do with using a not-so-popular processor [revealing which processor may reveal something I shouldn't about the project, so I'd rather not say what it was, even if it was a very long time ago].
A colleague close to me was investigating one of those code generation bugs, which was in a function of around 200k lines, all of it one big switch statement, with each case in the switch being around 50-1000 lines (and several layers of sub-switch statements inside).
From my memory of it, the code was crashing because it produced an invalid operation or stored something in a register already occupied by something else, so once you hit the right bit of code it would fail with an illegal memory access. It had nothing to do with the sheer size of the code, because my colleague managed to get it down to about 30 lines eventually (after a lot of "let's cut this out and see if it still goes wrong"), and after a few days we had a new version of the compiler with a fix. Nice to know that the many thousands of dollars you pay for a compiler service contract are worth it at least sometimes...
My point here is that modern compilers tolerate a lot of large code. There are also minimum limits that a compliant compiler must support. For example, I believe (from memory, again) that a compiler needs to support 127 levels of nested statements (that is, a combination of 127 if, switch, while and do-while) within a function. And, from a discussion somewhere (which is where the "127 levels of nested statements" figure comes from), we found that MSVC and GCC both support a whole lot more (enough that we gave up on finding the actual limit...)
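As a rough illustration, a tiny generator along these lines can probe how far a given compiler really goes beyond the guaranteed minimum (the depth and file name are arbitrary):

    #include <fstream>

    // Emit a translation unit with `depth` nested blocks, then try to
    // compile the generated file to see whether the compiler accepts it.
    int main() {
        const int depth = 1000;  // the standard only guarantees 127
        std::ofstream out("nest_probe.cpp");
        out << "int main() {\n";
        for (int i = 0; i < depth; ++i)
            out << "if (true) {\n";
        out << "return 0;\n";
        for (int i = 0; i < depth; ++i)
            out << "}\n";
        out << "}\n";
    }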
Short answer:
You have no choice if you need the performance of a compiler rather than the ease of life of an interpreter (or pre-compiler + interpreter). You will be compiling into some lower level language, and C is the assembly language of today, with C++ being about as available and as apt for the task as C. There is no reason why you should fear this route. It is actually, in some sense, a very common route.
Long answer:
Generated code is in no way unusual. Even relatively modest uses of code generation result in C source that is "unlike what any programmer would ever write", in terms of quantity (trivial code patterns repeated, with small variations, millions of times) or quality (combinations of language features that a human would never use but that are still, say, legal C++). There are also quite a few compilers that compile into C or C++, some of them famous and well known to the people who wrote the C and C++ language standards.
The most common C and C++ compilers cope well with generated code. That said, they are never perfect.
Compilers may have various simple limitations, such as a maximum line length; these tend to be documented, and easy to comply with in your generator once you run into them.
Compilers may have defects.
Some kinds of defects are actually less of a concern for generated code than for hand-written code. Your code generator gives you a degree of freedom to deal with many situations in a systematic way, as soon as you start understanding the pattern and caring about the problem.
Defects that result in the compiler not compiling valid code correctly are quickly discovered, given enough users of the compiler, and are treated by compiler vendors as particularly high-priority. Even if the compiler is essentially dead or serving a niche market, so that no fix is available, plenty of information tends to be available on the Internet, including people's existing experience of working around the defect (a different compiler, a different code construction, different compiler switches... the solution varies a lot and may feel awkward, but nobody gives up on their job just because some software is buggy, right?). So there is usually a choice of searchable solutions.
It's often good to strive for portability across compilers, while also knowing and tracking the limits of that portability. If you have not tested a particular C or C++ compiler very well, do not claim that it works as part of your toolset.
Your question implies a choice between C and C++, but there are more shades of grey here. C++ is a very rich language. You can use almost all C++ features for good purposes within your generator, but in some cases you should ask yourself whether a particular major feature could become a liability, costing you more than it brings. For example, different compilers use different strategies for template instantiation, and implicit instantiation can lead to unforeseen complexity with portable generated code. If templates help you design the generator tremendously, don't hesitate to use them; but if you only have a marginal use case for them, remember that you have a better reason than most people to restrict the language you generate into.
There are all sorts of implementation-defined limits in C. Some are well defined and visible to the programmer (think numeric limits); others are not. In my copy of the draft standard, section 5.2.4.1 details the lower bounds on these limits:
5.2.4 Environmental limits
Both the translation and execution environments constrain the implementation of
language translators and libraries. The following summarizes the language-related
environmental limits on a conforming implementation; the library-related limits are
discussed in clause 7.
5.2.4.1 Translation limits
The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits: 18)
— 127 nesting levels of blocks
— 63 nesting levels of conditional inclusion
— 12 pointer, array, and function declarators (in any combinations) modifying an arithmetic, structure, union, or void type in a declaration
[...]
18) Implementations should avoid imposing fixed translation limits whenever possible.
I can't say for sure whether your translator is likely to hit any of these, or whether the C compiler(s) you're targeting will have a problem even if it does, but I think you'll probably be fine.
As for bugs:
Clang - http://llvm.org/bugs/buglist.cgi?bug_status=all&product=clang
GCC - http://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=all&product=gcc
Visual Studio - https://connect.microsoft.com/VisualStudio/feedback
It may sound obvious, but the only way to really know is by testing. If you can't do all the testing yourself, at least make your product cross-platform so that people can easily test for you! If people like your project, they are usually willing to submit bug reports or even patches for free :)

Do inline functions make it harder to reverse-engineer the compiled binary?

So basically, besides possible performance effects, does inlining functions have any considerable effect on how difficult it is to reverse-engineer the program from its compiled and linked binary?
I mean, it should, since 1) the cracker just sees more machine instructions instead of a nice, understandable "call XXXXX" which he may already have discovered does a certain thing, and 2) inlining gives the compiler more opportunities to optimize code, and that is even more obfuscation, right?
Also, considering the inline keyword is just a suggestion to the compiler, how much effect can this really have? Should we bother? I mean, of course they will crack it eventually, but if by such simple measures we can make the cracker's life harder, why not?
The choice to inline methods or not should not be based on how easy it is to reverse engineer. The difference between inlining and not will be negligible.
The exception is anti-piracy code: inlining it, or even using macros to ensure it is "inlined", can help remove a single point of failure.
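A minimal sketch of that macro approach (the validation routine is a placeholder): every use site gets its own copy of the check, so there is no single call target to patch:

    #include <cstdlib>

    inline bool is_license_valid() { return true; }  // placeholder logic

    // The check is duplicated at every expansion site, so a cracker
    // cannot defeat all of them by patching one function.
    #define CHECK_LICENSE()           \
        do {                          \
            if (!is_license_valid())  \
                std::abort();         \
        } while (false)

    void save_document() { CHECK_LICENSE(); /* ...real work... */ }
    void export_report() { CHECK_LICENSE(); /* ...real work... */ }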
If you're concerned about this, I suggest looking into obfuscation tools that operate on the binary.
It will decrease the similarity between your input and his output, but it generally won't have much effect on his overall effort.

How do I find out how a C++ compiler implements something, other than by inspecting the emitted machine code?

Suppose I crafted a set of classes to abstract something and now I worry whether my C++ compiler will be able to peel off those wrappings and emit really clean, concise and fast code. How do I find out what the compiler decided to do?
The only way I know is to inspect the disassembly. This works well for simple code, but there are two drawbacks: the compiler might do it differently when it compiles the same code again, and also machine code analysis is not trivial, so it takes effort.
How else can I find out how the compiler decided to implement what I wrote in C++?
I'm afraid you're out of luck on this one. You're trying to find out "what the compiler did". What the compiler did is to produce machine code. The disassembly is simply a more readable form of the machine code, but it can't add information that isn't there. You can't figure out how a meat grinder works by looking at a hamburger.
I was actually wondering about that.
I have been quite interested, for the last few months, in the Clang project.
One of Clang's particular attractions, with regard to optimization, is that you can emit the optimized LLVM IR instead of machine code. The IR is a high-level assembly language, with notions of structure and type.
Most of the optimization passes in the Clang compiler suite are indeed performed on the IR (the last round is of course architecture-specific and performed by the backend, depending on the available operations). This means that you can actually see, right in the IR, whether the object creation (as in your linked question) was optimized out or not.
I know it is still assembly (though of higher level), but it does seem more readable to me:
far less opcodes
typed objects / pointers
no "register" things or "magic" knowledge required
Would that suit you :) ?
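For example, a hedged sketch (the wrapper type is made up; -S -emit-llvm are standard Clang flags): compile with clang++ -O2 -S -emit-llvm wrapper.cpp -o wrapper.ll and check whether the wrapper survives in the IR:

    // wrapper.cpp: does the abstraction get peeled off?
    struct Meters {
        double value;
        explicit Meters(double v) : value(v) {}
        Meters operator+(Meters o) const { return Meters(value + o.value); }
    };

    double total(double a, double b) {
        return (Meters(a) + Meters(b)).value;  // expect a bare fadd in the IR
    }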
Timing the code will directly measure its speed and can avoid looking at the disassembly entirely. This will detect when compiler, code modifications or subtle configuration changes have affected the performance (either for better or worse). In that way it's better than the disassembly which is only an indirect measure.
Things like code size can also serve as possible indicators of problems. At the very least they suggest that something has changed. It can also point out unexpected code bloat when the compiler should have boiled down a bunch of templates (or whatever) into a concise series of instructions.
Of course, looking at the disassembly is an excellent technique for developing the code and helping decide if the compiler is doing a sufficiently good translation. You can see if you're getting your money's worth, as it were.
In other words, measure what you expect and then dive in if you think the compiler is "cheating" you.
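A minimal timing sketch along those lines, using std::chrono (the workload is a stand-in for whatever you actually want to measure):

    #include <chrono>
    #include <cstdio>

    volatile double sink;  // keep the result observable so it isn't optimized away

    double work(double x) { return x * x + 1.0; }  // stand-in workload

    int main() {
        using clock = std::chrono::steady_clock;
        const auto start = clock::now();
        double acc = 0.0;
        for (int i = 0; i < 10000000; ++i)
            acc += work(i * 1e-7);
        sink = acc;
        const std::chrono::duration<double> elapsed = clock::now() - start;
        std::printf("elapsed: %.3f s\n", elapsed.count());
    }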
You want to know if the compiler produced "clean, concise and fast code".
"Clean" has little meaning here. Clean code is code which promotes readability and maintainability -- by human beings. Thus, this property relates to what the programmer sees, i.e. the source code. There is no notion of cleanliness for binary code produced by a compiler that will be looked at by the CPU only. If you wrote a nice set of classes to abstract your problem, then your code is as clean as it can get.
"Concise code" has two meanings. For source code, this is about saving the scarce programmer eye and brain resources, but, as I pointed out above, this does not apply to compiler output, since there is no human involved at that point. The other meaning is about code which is compact, thus having lower storage cost. This can have an impact on execution speed, because RAM is slow, and thus you really want the innermost loops of your code to fit in the CPU level 1 cache. The size of the functions produced by the compiler can be obtained with some developer tools; on systems which use GNU binutils, you can use the size command to get the total code and data sizes in an object file (a compiled .o), and objdump to get more information. In particular, objdump -x will give the size of each individual function.
"Fast" is something to be measured. If you want to know whether your code is fast or not, then benchmark it. If the code turns out to be too slow for your problem at hand (this does not happen often) and you have some compelling theoretical reason to believe that the hardware could do much better (e.g. because you estimated the number of involved operations, delved into the CPU manuals, and mastered all the memory bandwidth and cache issues), then (and only then) is it time to have a look at what the compiler did with your code. Barring these conditions, cleanliness of source code is a much more important issue.
All that being said, it can help quite a lot if you have a priori notions of what a compiler can do. This requires some training. I suggest you have a look at the classic Dragon Book; otherwise you will have to spend some time compiling example code and looking at the assembly output. C++ is not the easiest language for this; you may want to begin with plain C. Ideally, once you know enough to be able to write your own compiler, you know what a compiler can do, and you can guess what it will do with a given piece of code.
You might find a compiler that has an option to dump a post-optimisation AST/representation; how readable that would be is another matter. If you're using GCC, there's a chance it wouldn't be too hard, and someone might have already done it - GCCXML does something vaguely similar. It's of little use, though, if the compiler you want to build your production code with can't do it.
After that, some compilers (e.g. gcc with -S) can output assembly language, which might be usefully clearer than reading a disassembly: for example, some compilers interleave the high-level source as comments with the corresponding assembly.
As for the drawbacks you mentioned:
the compiler might do it differently when it compiles the same code again
Absolutely; only the compiler docs and/or source code can tell you the chance of that, though you can put some performance checks into nightly test runs so you'll be alerted if performance suddenly changes.
and also machine code analysis is not trivial, so it takes effort.
Which raises the question: what would be better? I can imagine a process where you run the compiler over your code and it records when variables are cached in registers at points of use, which function calls are inlined, even the maximum number of CPU cycles an instruction might take (where knowable at compile time), etc., and produces some record thereof, then a source viewer/editor that colour-codes and annotates the source correspondingly. Is that the kind of thing you have in mind? Would it be useful? Perhaps some of it more than the rest: for example, black-and-white info on register usage ignores the utility of the various levels of CPU cache (and their utilisation at run time); the compiler probably doesn't even try to model that anyway. Knowing where inlining was really done would give me a warm fuzzy feeling. But profiling seems more promising and useful in general. I fear the benefits are more intuitively appealing than actually real, and compiler writers are better off pursuing C++0x features, run-time instrumentation, introspection, or writing D "on the side" ;-).
The answer to your question was pretty much nailed by Karl. If you want to see what the compiler did, you have to start going through the assembly code it produced; elbow grease is required. As for discovering the "why" behind the "how" of your code's implementation: every compiler (and, potentially, every build), as you mentioned, is different. There are different approaches, different optimizations, etc. However, I wouldn't worry about whether it's emitting clean, concise machine code; cleanliness and concision should be left to the source code. Speed, on the other hand, is pretty much the programmer's responsibility (profiling ftw). More interesting concerns are correctness, maintainability, readability, etc. If you want to see whether it made a specific optimization, the compiler docs might help (if they're available for your compiler). You can also try searching to see if the compiler implements a known technique for optimizing whatever you care about. If those approaches fail, though, you're right back to reading assembly code. Keep in mind that the code you're checking out might have little to no impact on performance or executable size; grab some hard data before diving into any of this.
Actually, there is a way to get what you want, if you can get your compiler to produce DWARF debugging information. There will be a DWARF description for each out-of-line function, and within that description there will (hopefully) be entries for each inlined function. It's not trivial to read DWARF, and sometimes compilers don't produce complete or accurate DWARF, but it can be a useful source of information about what the compiler actually did that's not tied to any one compiler or CPU. Once you have a DWARF reading library, there are all sorts of useful tools you can build around it.
Don't expect to use it with Visual C++, as that uses a different debugging format. (But you might be able to do similar queries through the debug helper library that comes with it.)
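A small sketch of that approach (DW_TAG_inlined_subroutine is the standard DWARF tag for this; the build and inspection commands assume GCC and binutils):

    // inline_probe.cpp: did the compiler really inline helper()?
    // Build and inspect, e.g.:
    //   g++ -O2 -g inline_probe.cpp -o inline_probe
    //   readelf --debug-dump=info inline_probe | grep -A3 inlined_subroutine
    static int helper(int x) { return x * 3; }

    int caller(int y) {
        return helper(y) + 1;  // if inlined, shows up as an inlined_subroutine DIE
    }

    int main() { return caller(2); }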
If your compiler manages to peel off your wrappings and "emit really clean, concise and fast code", the effort required to follow the emitted code should be reasonable.
Contrary to another answer, I feel that emitted assembly code may well be "clean" if it is (relatively) easily mappable to the original source code, doesn't consist of calls all over the place, and its system of jumps is not too complex. With code scheduling and reordering, though, optimized machine code that is also readable is, alas, a thing of the past.

Is it worth writing part of code in C instead of C++ as micro-optimization?

I am wondering whether, with modern compilers and their optimizations, it is still worth writing some critical code in C instead of C++ to make it faster.
I know C++ might lead to bad performance in cases where class objects are copied when they could be passed by reference, or when objects are created automatically by the compiler (typically with overloaded operators, and in many other similar cases); but for a good C++ developer who knows how to avoid all of this, is it still worth writing code in C to improve performance?
I'm going to agree with a lot of the comments. C syntax is supported, intentionally (with divergence only in C99), in C++, so all C++ compilers have to support it. In fact, I think it's hard to find any dedicated C compilers anymore. For example, in GCC you'll actually end up using the same optimization/compilation engine regardless of whether the code is C or C++.
The real question is then: does writing plain C code and compiling it as C++ incur a performance penalty? The answer is, for all intents and purposes, no. There are a few tricky points about exceptions and RTTI, but those are mainly size changes, not speed changes. You'd be so hard pressed to find an example that actually takes a performance hit that it doesn't seem worth it to write a dedicated module.
What was said about which features you use is important. It is very easy in C++ to get sloppy about copy semantics and suffer huge overheads from copying memory. In my experience this is the biggest cost; in C you can also incur this cost, but not as easily, I'd say.
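A minimal sketch of that trap: the by-value version below silently copies the vector, and every string in it, on each call, while the by-reference version copies nothing:

    #include <cstddef>
    #include <string>
    #include <vector>

    // Pass-by-value: the whole vector (and every string) is copied per call.
    std::size_t total_len_by_value(std::vector<std::string> v) {
        std::size_t n = 0;
        for (const std::string& s : v) n += s.size();
        return n;
    }

    // Pass-by-const-reference: no copy at all.
    std::size_t total_len_by_ref(const std::vector<std::string>& v) {
        std::size_t n = 0;
        for (const std::string& s : v) n += s.size();
        return n;
    }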
Virtual function calls are ever so slightly more expensive than normal function calls. At the same time, forced inline functions are cheaper than normal function calls. In both cases it is likely the cost of pushing/popping parameters on the stack that dominates. Worrying about function-call overhead should come quite late in the optimization process, though, as it is rarely a significant problem.
Exceptions are costly at throw time (in GCC, at least), but setting up catch statements and using RAII doesn't have a significant cost associated with it. This was by design in the GCC compiler (and others), so that truly only the exceptional cases are costly.
But to summarize: a good C++ programmer would not be able to make their code run faster simply by writing it in C.
measure! measure before thinking about optimizing, measure before applying optimization, measure after applying optimization, measure!
If you must make your code run 1 nanosecond faster (because it's going to be used by 1000 people, 1000 times a day, for the next 1000 days, and that saved second is very important), anything goes.
Yes! It is worth...
changing languages (C++ to C; Python to COBOL; Matlab to Fortran; PHP to Lisp)
tweaking the compiler (enabling/disabling all the -f options)
using different libraries (even writing your own)
etc., etc.
What you must not forget is to measure!
pmg nailed it. Just measure instead of making global assumptions. Also think of it this way: compilers like gcc separate the front, middle and back ends, so each front end (Fortran, C, C++, Ada, etc.) ends up in the same internal middle language, if you will, and that is where most of the optimization happens. Then that generic middle language is turned into assembler for the specific target, and target-specific optimizations occur there. So the language may or may not induce more code from front to middle when the languages differ greatly, but for C/C++ I would assume it is the same or very similar. Now, binary size is another story: the libraries that get pulled into the binary for C-only vs. C++ (even if it is only C syntax) can and will vary. That doesn't necessarily affect execution performance, but it can bulk up the program file, costing storage and transfer differences as well as memory if the program is loaded as a whole into RAM. Here again, just measure.
To the measure comment I would also add: compile to assembler and/or disassemble the output, and compare the results of your different language/compiler choices. This can and will supplement the timing differences you see when you measure.
The question has been answered to death, so I won't add to that.
Simply as a generic question: assuming you have measured, etc., and have identified that a certain C++ (or other) code segment is not running at optimal speed (which generally means you have not used the right tool for the job), and you know you can get better performance by writing it in C, then yes, definitely, it is worth it.
There is a common mindset of trying to do everything with one tool (Java or SQL or C++). It's not just Maslow's hammer, but the actual belief that one can code a C construct in Java, etc. This leads to all kinds of performance problems. Architecture, as a true profession, is about placing code segments in the appropriate architectural location or platform. It is the correct combination of Java, SQL and C that will deliver performance, and that produces an app that does not need to be revisited: uneventful execution. In which case, it will not matter if or when C++ implements this construct or that.
I am wondering whether, with modern compilers and their optimizations, it is still worth writing some critical code in C instead of C++ to make it faster.
no. keep it readable. if your team prefers c++ or c, prefer that - especially if it is already functioning in production code (don't rewrite it without very good reasons).
I know C++ might lead to bad performance in cases where class objects are copied when they could be passed by reference
then forbid copying and assigning
or when objects are created automatically by the compiler (typically with overloaded operators, and in many other similar cases)
could you elaborate? if you are referring to templates, they don't have additional cost at runtime (although they can lead to additional exported symbols, resulting in a larger binary). in fact, using a template method can improve performance if (for example) a conversion would otherwise be necessary.
but for a good C++ developer who knows how to avoid all of this, is it still worth writing code in C to improve performance?
in my experience, an expert c++ developer can create a faster, more maintainable program.
you have to be selective about the language features that you use (and do not use). if you break c++ features down to the set available in c (e.g., remove exceptions, virtual function calls, rtti) then you're off to a good start. if you learn to use templates, metaprogramming, optimization techniques, avoid type aliasing (which becomes increasingly difficult or verbose in c), etc. then you should be on par or faster than c - with a program which is more easily maintained (since you are familiar with c++).
if you're comfortable using the features of c++, use c++. it has plenty of features (many of which have been added with speed/cost in mind), and can be written to be as fast as c (or faster).
with templates and metaprogramming, you could turn many runtime variables into compile-time constants for exceptional gains. sometimes that goes well into micro-optimization territory.
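A toy sketch of that idea: moving a value from a run-time parameter to a template parameter lets the compiler fold the whole loop into a constant:

    // N is known at compile time: the loop typically folds to the constant 5050.
    template <int N>
    int sum_to() {
        int total = 0;
        for (int i = 1; i <= N; ++i) total += i;
        return total;
    }

    // n is only known at run time: a real loop (unless the optimizer can
    // prove the argument constant at the call site).
    int sum_to_runtime(int n) {
        int total = 0;
        for (int i = 1; i <= n; ++i) total += i;
        return total;
    }

    int main() { return sum_to<100>() - sum_to_runtime(100); }  // expect 0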

How to know what optimizations are done automatically by my compiler

I was going through this link, Will it optimize, and wondered how we can know what optimizations a particular compiler performs.
Like does VC8.0 convert if-else statements to switch-case?
Is such information available on MSDN?
As everyone seems bent on telling the OP that he shouldn't worry about it, here is some useful (although not as specific as the OP requested) information about compiler optimization (options).
You'll have to figure out what flags you're using, especially for MSVC and Intel (a GCC release build should default to -O2), but here are the links:
GCC
MSVC
Intel
This is about as close as you'll get before disassembling your binary after compilation.
It depends on the level of optimization you choose for the compiler.
You can find a very nice article about it here.
First of all, if optimization took place, your program should usually run faster. Beyond that, you can inspect the disassembly to find out what kinds of optimizations were performed.
I don't know anything about VC8.0, so I'm not sure how you would access that information. However, if you are generally interested in the kinds of optimisations that go on and want to experiment, I recommend you use LLVM. You can look at the unoptimised bitcode generated from the default C front end, and then run various optimiser passes over it to see the effect each time. Because it's a nicer, more abstract assembly language, it tends to be a little easier to figure out what is an optimisation derivable from the code and what is a machine-specific optimisation.
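A sketch of that workflow (clang and opt are the standard LLVM tools; mem2reg and instcombine are real pass names, though the flag syntax for selecting passes has varied across LLVM versions):

    // probe.c - a tiny input to watch the optimiser transform:
    int square(int x) { return x * x; }

    // Then, at the command line:
    //   clang -O0 -S -emit-llvm probe.c -o probe.ll      (unoptimised IR)
    //   opt -S -passes=mem2reg,instcombine probe.ll -o probe.opt.ll
    //   diff probe.ll probe.opt.ll                       (effect of the passes)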
Like does VC8.0 convert if-else statements to switch-case?
Compilers do not magically rewrite your source code, and even if they did, what would that tell you? What you really want to know is whether the compiler compiled it into a jump table or into multiple compare operations. Any disassembler will tell you that.
To clarify my point: writing a switch-case statement does not necessarily imply that there will be a jump table in the binary. Not needing to worry about this is the whole point of having compilers.
Instead of figuring out which optimizations are done by the compiler in general, it's probably better NOT to have any dependencies on such compiler-specific knowledge.
Instead, start out with a good design and algorithm, writing (as much as possible) portable code that's easy to follow. Then profile the code if it's too slow and fix the actual hotspots. Compiler optimizations are useful, no doubt, but it is better to investigate what's actually happening in the code. Algorithmic and design improvements at the source level will typically help performance more than the presence or absence of optimizations like transforming if/else into switch-case.
I'm not sure what "convert if/else to switch/case" means. My processor doesn't have a hardware switch/case instruction.
Typical compilers have several different ways to implement switch/case. A well-known one is using a jump table, but this is only done if appropriate.
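For instance (a sketch; the actual output depends on the compiler and target), a dense set of cases is a jump-table candidate, while widely scattered cases usually become a chain or tree of compares:

    // Dense cases 0..4: a jump table is likely.
    int dense(int x) {
        switch (x) {
            case 0: return 10;
            case 1: return 11;
            case 2: return 12;
            case 3: return 13;
            case 4: return 14;
            default: return -1;
        }
    }

    // Widely scattered cases: compare-and-branch is more likely.
    int sparse(int x) {
        switch (x) {
            case 1:      return 10;
            case 1000:   return 11;
            case 500000: return 12;
            default:     return -1;
        }
    }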
For if/else, it is certainly normal for compilers to analyse a digraph of execution flow. I would expect a compiler to notice when each condition references the same variable, and to treat equivalent forms of conditionals the same way in general. But this isn't something I'd worry about.
IIRC, the general policy in GCC is that regressions in optimisation are tolerable so long as preferred improvements result. Optimisation is complex and what is "generally" a good optimisation isn't always that great. Plus for perfect optimisation, the compiler would have to know things it can't know (e.g. what inputs it will encounter in real life).
The point is that it really isn't worthwhile knowing that much about specific optimisations unless you happen to be a compiler developer. If you depend on something being optimised by V8, that particular optimisation might not happen in V9 or V10.