Why is it not possible to create a C++ decompiler that will function as accurately as those made for Java and C#?
There are several reasons:
Inlining. A lot of C++ code gets inlined in optimized builds. That plays havoc with any form of decompiler. To figure out that a function was inlined, the decompiler would have to analyze the specifics of the inlined code and match them up. And post-inlining optimization steps can make code very different, depending on where it was inlined.
Templates. Templates are subject to the inlining problem above, but they create additional problems. It is at least theoretically possible that a function inlined in two places would compile to the same sequence of assembly instructions. But template code instantiated with different template arguments is another matter: different instantiations will usually compile down to different sequences of instructions. It gets harder still, since template code can call different sets of functions depending on the template parameters, and those functions themselves could be inlined.
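As a rough illustration (the scale function and its callers are made up), the same template instantiated for two types typically compiles to unrelated instruction sequences, each of which is then usually inlined away:

    #include <cstdint>

    // One template, two very different instantiations.
    template <typename T>
    T scale(T value) {
        return value * T(3);
    }

    std::int32_t scale_int(std::int32_t v) {
        // Typically becomes an integer multiply (or shift/add tricks),
        // with scale<std::int32_t> inlined away entirely.
        return scale(v);
    }

    double scale_double(double v) {
        // Typically becomes a floating-point multiply; the generated
        // instructions share nothing recognizable with the integer version.
        return scale(v);
    }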
Compile-time execution. Template metaprogramming allows the compiler to actually execute code, and C++11's constexpr provides a more natural way to do some computations at compile time. Obviously, compile-time function calls or metafunction instantiations cannot be part of the compiled executable; only their results will be (since that's kind of the point).
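A minimal sketch (the factorial function is a made-up example) of why compile-time computation leaves nothing behind for a decompiler:

    // The computation happens entirely at compile time, so the executable
    // typically contains only the constant 3628800 -- there is no factorial
    // function left to decompile.
    constexpr unsigned factorial(unsigned n) {
        return n <= 1 ? 1 : n * factorial(n - 1);
    }

    static_assert(factorial(10) == 3628800, "evaluated by the compiler");

    unsigned ten_factorial() {
        return factorial(10);   // usually emitted as "return 3628800;"
    }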
Lack of comprehensive runtime reflection. C# and Java both lace their bytecode with a lot of information about the nature of the original source code. Object definitions are easily detectable, as are object names, member variable types and names, etc. C++ compiles down to machine language, which is not required to carry any such information. And since it isn't required, compilers don't generate it. Even the reflection study group of the ISO C++ committee is focused on compile-time reflection, which is information that won't be available at runtime.
Even std::type_info doesn't offer anything. The reason is that if the compiler does not detect that typeid will be called on a particular type, it doesn't need to generate a std::type_info object for that type. And even if it did, all that gives you is the type's name (and an identifier), nothing more.
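A small example (the Widget type is made up) of how little typeid / std::type_info actually exposes:

    #include <iostream>
    #include <typeinfo>

    struct Widget {                    // hypothetical type
        virtual ~Widget() = default;   // polymorphic, so typeid uses the dynamic type
        int id = 0;
        double weight = 0.0;
    };

    int main() {
        Widget w;
        // All std::type_info gives you is an implementation-defined name and an
        // identity to compare against -- nothing about members, layout, or methods.
        std::cout << typeid(w).name() << '\n';                // e.g. "6Widget" with GCC
        std::cout << (typeid(w) == typeid(Widget)) << '\n';   // 1
    }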
C++ compilers generally do not put any more information into the executable than they absolutely have to (especially not when compiling a release build rather than a debug build), so the information you'd need to accurately decompile the program simply is not present in the executable.
Of course a C++ compiler could be made that does include all of the necessary information in the executable (e.g. in the most naive implementation, it could simply include a copy of the source code itself in the executable), but doing so would make the executables significantly larger, and most non-open-source C++ developers would prefer that other people not be able to decompile the executable, so there isn't a whole lot of demand for that functionality.
I'm working on a C++ project with extensive compile-time computations. Long compilation time is slowing us down. How might I find out the slowest parts of our template meta-programs so I can optimize them? (When we have slow runtime computations, I have many profilers to choose from, e.g. valgrind's callgrind tool. So I tried building a debug GCC and profiling it compiling our code, but I didn't learn much from that.)
I use GCC and Clang, but any suggestions are welcome.
I found profile_templates on Boost's site, but it seems to be thinly documented and requires the jam/bjam build system. If you show how to use it on a non-jam project[1], I will upvote you. https://svn.boost.org/svn/boost/sandbox/tools/profile_templates/ appears to count the number of instantiations, whereas counting time taken would be ideal.
[1] Our project uses CMake and is small enough that hacking together a Jamfile just for template profiling could be acceptable.
I know this is an old question, but there is a newer answer that I would like to give.
There is a clang-based set of projects that targets this particular problem. The first component is an instrumented version of the Clang compiler which produces a complete trace of all the template instantiations that occurred during compilation, with timing values and optionally memory usage counts as well. That tool is called Templight and is accessible here (it currently needs to be compiled against a patched Clang source tree):
https://github.com/mikael-s-persson/templight
A second component is a conversion tool that turns the Templight traces into other formats: easily parsable text-based formats (YAML, XML, plain text, etc.), formats that are easier to visualize, such as GraphViz / GraphML, and, most importantly, callgrind output that can be loaded into KCachegrind to visualize and inspect the meta-call-graph of template instantiations and their compile-time costs (for example, a template instantiation profile of a piece of code that creates a boost::container::vector and sorts it with std::sort).
Check it out here:
https://github.com/mikael-s-persson/templight-tools
Finally, there is also another related project that creates an interactive shell and debugger to be able to interactively walk up and down the template instantiation graph:
https://github.com/sabel83/metashell
I've been working since 2008 on a library that uses template metaprogramming heavily. There is a real need for better tools or approaches for understanding what consumes the most compile time.
The only technique I know of is a divide and conquer approach, either by separating code into different files, commenting out bodies of template definitions, or by wrapping your template instantiations in #define macros and temporarily redefining those macros to do nothing. Then you can recompile the project with and without various instantiations and narrow down.
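A minimal sketch of the macro trick, assuming a hypothetical Parser template; the macro names are made up:

    // Wrap explicit instantiations in a macro so they can be compiled out while
    // you measure build times with and without them.
    #define PROFILE_INSTANTIATIONS 1   // flip to 0 and rebuild to compare timings

    template <typename T>
    class Parser { /* ...expensive template machinery... */ };

    #if PROFILE_INSTANTIATIONS
      #define INSTANTIATE_PARSER(T) template class Parser<T>;
    #else
      #define INSTANTIATE_PARSER(T)   /* expands to nothing */
    #endif

    // Toggle individual lines (or the switch above) to see what each one costs.
    INSTANTIATE_PARSER(int)
    INSTANTIATE_PARSER(double)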
Incidentally, just separating the same code into more, smaller files may make it compile faster. I'm not just talking about the opportunity for parallel compilation - even serially, I observed it to still be faster. I've observed this effect in gcc both when compiling my library and when compiling Boost Spirit parsers. My theory is that some of the symbol resolution, overload resolution, SFINAE, or type inference code in gcc has O(n log n) or even O(n^2) complexity with respect to the number of type definitions or symbols in play in the translation unit.
Ultimately what you need to do is carefully examine your templates and separate what really depends on the type information from what does not, and use type erasure and virtual functions wherever possible on the portion of the code that does not actually require the template types. You need to get stuff out of the headers and into cpp files if that part of the code can be moved. In a perfect world the compiler would be able to figure this out for itself - you shouldn't have to manually move this code to babysit it - but this is the state of the art with the compilers we have today.
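A hedged sketch of that separation, with made-up names (log_value, log_line): the type-dependent part stays a thin template in the header, the rest becomes an ordinary function compiled once in a .cpp file:

    // ---- logger.h (hypothetical) ----
    #include <sstream>
    #include <string>

    void log_line(const std::string& text);   // non-template, defined out of line

    template <typename T>
    void log_value(const char* label, const T& value) {
        // Only the formatting genuinely depends on T; everything else is
        // delegated to the non-template function defined in logger.cpp.
        std::ostringstream os;
        os << label << " = " << value;
        log_line(os.str());
    }

    // ---- logger.cpp (hypothetical) ----
    #include <iostream>

    void log_line(const std::string& text) {
        std::cerr << text << '\n';
    }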
The classic book C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond comes with a 20-page appendix on profiling compile-time costs. It has a companion CD with the code that generates the graphs in that appendix.
Another resource is the paper at http://aszt.inf.elte.hu/~gsd/s/cikkek/profiling/profile.pdf; perhaps that is of use to you.
Yet another, but more labor-intensive, way is to count the number of instantiations of each class from your compiler output.
TAU (http://www.cs.uoregon.edu/research/tau/about.php) might also be of interest: for templated entities, it shows the breakdown of time spent in each instantiation. Other data include how many times each function was called, how many profiled functions each function invoked, and the mean inclusive time per call.
I am aware that the keyword inline has useful properties e.g. for keeping template specializations inside a header file.
On the other hand I have often read that inline is almost useless as hint for the compiler to actually inline functions.
Further, the keyword is of little use inside a cpp file, since the compiler wants to inspect the definition of a function marked with the inline keyword wherever it is called.
Hence I am a little confused about the "automatic" inlining capabilities of modern compilers (namely gcc 4.4.3). When I define a function inside a cpp file, can the compiler inline it anyway if it deems that inlining makes sense for that function, or do I rob it of some optimization opportunities? (Which would be fine for the majority of functions, but important to know for small ones that are called very often.)
Within the compilation unit the compiler will have no problem inlining functions (even if they are not marked as inline). Across compilation units it is harder, but modern compilers can do it.
Use of the inline keyword has little effect on 'modern' compilers and whether they actually inline functions; the compiler has better heuristics than the human mind (unless you specify flags to force it one way or the other, which is usually a bad idea, as humans are bad at making this decision).
Microsoft Visual C++ has been able to do this since at least Visual Studio 2005. They call it "Whole Program Optimization" or "Link-Time Code Generation". In this implementation, the compiler does not actually produce machine code, but writes an intermediate representation of the code into the object files. The linker then merges all of the code into one huge code unit and performs the actual compilation.
GCC has been able to do this since at least version 4.5, with major improvements coming in GCC 4.7. To my knowledge the feature is still considered somewhat experimental (at least insofar as many Linux distributions do not use it). GCC's implementation works very similarly: it first writes its GIMPLE intermediate representation into the object files, then compiles all of the object files into a single object file that is passed to the linker (this allows GCC to continue to work with existing linkers).
Many big C++ projects also do what is now being called "unity builds". Instead of passing hundreds of individual C++ source files into the compiler, one source file is created that includes all the other source files in the project. The original intent behind this is to decrease compilation times (since headers etc. do not have to be parsed over and over), but as a side-effect, it will have the same outcome as the LTO/LTCG techniques mentioned above: giving the compiler perfect visibility into all functions in all compilation units.
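A minimal sketch of such a unity build file; the included file names are made up:

    // unity.cpp -- the build compiles only this file instead of each .cpp
    // separately, so the compiler sees every function definition at once.
    #include "renderer.cpp"
    #include "physics.cpp"
    #include "audio.cpp"
    #include "main.cpp"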
I jump between being impressed by my C++ compiler's (MSVC 2010) ingenuity and its stupidity. Some code that did pixel format conversion via templates, which would have resolved into 5-10 assembly instructions when properly inlined, got bloated into kilobytes(!) of nested function calls. At other times, it inlines so aggressively that whole classes disappear even though they contained non-trivial functionality.
This depends on your compilation flags. With -combine and -fwhole-program, gcc will do function inlining across cpp boundaries. I'm not sure how much the linker will do if you compile into multiple object files.
The standard dictates nothing about how a function has to be inlined, and compilers can inline a function only if they have access to its implementation. If all you have is a header plus compiled binaries, it is impossible. If the call is in the same translation unit, the compiler can inline the function even though it is defined in the cpp file.
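A small sketch (the names are made up) of inlining within one translation unit, even without the inline keyword:

    // example.cpp -- 'square' is not marked inline and lives only in this .cpp
    // file, yet within this translation unit an optimizing compiler is free to
    // inline it.
    static int square(int x) {
        return x * x;
    }

    int sum_of_squares(int a, int b) {
        // At -O2, typical compilers replace both calls with plain multiplies;
        // no call to 'square' needs to remain in the generated code.
        return square(a) + square(b);
    }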
I've been programming with C++ for quite a while and I enjoy using templates a lot. What I've been wondering recently due to my foray into embedded programming is how one should expect the linker to behave with respect to code duplication in template instances where the template parameter is different.
For multiple instances of the same template with same parameters this is well known to be optimized away during link time (see also: How does C++ link template instances)
However in my case I'm interested if the linker will recognize any duplicated code between two templates that were instantiated with different parameters. As they are different types I would assume that they would not automatically be collapsed. However since they might have some functions that do not depend on the template parameters and would thus be identical between the two classes one might assume that a linker could optimize those away and thus save space.
What would be the expected behaviour in this case?
The gold linker does exactly that.
Safe ICF: Pointer Safe and Unwinding Aware Identical Code Folding in Gold:
We have found that large C++ applications and shared libraries tend to have many functions whose code is identical with another function. As much as 10% of the code could theoretically be eliminated by merging such identical functions into a single copy. This optimization, Identical Code Folding (ICF), has been implemented in the gold linker. At link time, ICF detects functions with identical object code and merges them into a single copy.
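A rough sketch of the situation from the question (the Buffer type is made up): a member that does not depend on T compiles to identical code in every instantiation, which gold's --icf option may then fold at link time:

    #include <cstddef>

    template <typename T>
    class Buffer {
    public:
        void clear() { size_ = 0; }      // does not depend on T; identical code
        void push_back(const T& v) {     // genuinely depends on T
            data_[size_++] = v;
        }
    private:
        std::size_t size_ = 0;           // same offset in every instantiation
        T data_[64];
    };

    // Force both instantiations into this object file; linking with
    // "ld.gold --icf=safe" (or --icf=all) may then fold the identical
    // Buffer<int>::clear and Buffer<long>::clear into a single copy.
    template class Buffer<int>;
    template class Buffer<long>;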
I have a few questions about C++ compilers.
Are C++ compilers required to be one-pass compilers? Does the Standard say anything about this anywhere?
In particular, is GCC one-pass compiler? If it is, then why does it generate the following error twice in this example (though the template argument is different in each error message)?
error: declaration of ‘adder<T> item’ shadows a parameter
error: declaration of ‘adder<char [21]> item’ shadows a parameter
A more general question
What are the advantages and disadvantages of one-pass compiler and multi-pass compiler?
Useful links:
A List of C/C++ compilers (wikipedia)
An incomplete list of C++ compilers (Bjarne Stroustrup's site)
The standard sets no requirements whatsoever with regard to how a compiler is implemented. But what do you mean by "one-pass"? Most compilers today read the input file only once. They create an in-memory representation (often in the form of some sort of parse tree), and may make multiple passes over that, and almost certainly make multiple passes over parts of it. The compiler must make a "pass" over the internal representation of a template each time it is instantiated, for example; there's no way of avoiding that. G++ also makes a "pass" over the template when it is defined, before any instantiation, and reports some errors then. (The standards committee expressly designed templates to allow a maximum of error detection at the point of definition. This is the motivation behind the requirement for typename in certain places, for example.) Even without templates, a compiler will generally have to make two passes over a class definition if there are functions defined in it.
With regard to the more general question, again, I think you'd have to define exactly what you mean by "one-pass". I don't know of any compiler today which reads the source file several times, but almost all will visit some or all of the nodes in the parse tree more than once. Is this one-pass or multi-pass? The distinction was more significant in the past, when memory wasn't sufficient to maintain much of the source code in an internal representation. Languages like Pascal and, to a lesser degree, C were sometimes designed to be easy to implement with a single-pass compiler, since a single-pass compiler would be significantly faster. Today, this issue is largely irrelevant, and modern languages, including C++, tend to ignore it; where C++ seems to conform to the needs of a one-pass compiler, it's largely for reasons of C compatibility, and where C compatibility is not an issue (e.g. in a class definition), it often makes order of declaration irrelevant.
From what I know, 30 years ago it was important for a compiler to be one-pass, because reads and writes to disk (or magnetic tape) were very slow and there was not enough memory to hold the whole code (thanks, James Kanze). Also, a single pass is a requirement for scripting/interactive languages.
Nowadays compilers are usually not one-pass; there are several intermediate representations (e.g. an Abstract Syntax Tree or Static Single Assignment form) that the code is transformed into and then analysed/optimised.
Some constructs in C++ cannot be resolved without intermediate steps, e.g. in a class you can reference members which are defined only later in the class body. Also, all templates need to be remembered somehow for later access during instantiation.
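A minimal illustration of that point (the class and member names are made up): when the compiler first parses get(), the member value_ has not been seen yet, so member-function bodies are effectively revisited once the class definition is complete:

    class Counter {
    public:
        int get() const { return value_; }   // refers to a member declared below
    private:
        int value_ = 0;
    };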
What usually does not happen is that the source code is parsed several times - there is no need for that. So you should not see the same syntactic error reported several times.
No; I would be surprised if you found a heavily used single-pass C++ compiler.
No, it does multiple passes and even different optimizations based on the flags you pass it.
Advantages (single-pass): fast! Since all the source only needs to be examined once, the compilation phase (and thus the beginning of execution) can happen very quickly. It is also an attractive model because it makes the compiler easy to understand and often "easier" to implement. (I worked on a single-pass Pascal compiler once, but I don't encounter them often, whereas single-pass interpreters are common.)
Disadvantages (single-pass): optimization, semantic/syntactic analysis. Sometimes a single look at the code lets things through that are easily caught by simple mechanisms in multiple passes. (That's kind of why we have things like JSLint.)
Advantages (multi-pass): optimizations, semantic/syntactic analysis. Even pseudo-interpreted languages like "JRuby" go through a pipeline compilation process to get to Java/JVM bytecode before execution; you could consider this multi-pass, and the multiple looks at the varying representations of the code (and consequently the resulting optimizations) can make it very fast.
Disadvantages (multi-pass): complexity, sometimes time (depending on whether AOT or JIT is being used as your compilation method).
Also, single-pass is pretty common in academia to help learn the aspects of compiler design.
Walter Bright, the developer of the first native C++ compiler, has stated that he believes it is not possible to compile C++ without at least 3 passes. And, yes, that means 3 full text-transforming passes over the source, not just traversals through an internal tree representation. See his Dr. Dobb's magazine article, "Why is C++ compilation so slow?" So any hope of finding a true one-pass compiler seems doomed. (I think this was part of the motivation Bright had to develop D, his C++ alternative.)
The compiler only needs to look at the sources once, top down, but that does not mean it does not have to process the parsed contents more than once. In particular with templates, it has to instantiate the templated code with the concrete type, and that cannot happen until the template is used (or explicitly instantiated by the user), which is the reason for your duplicate errors:
When the template is defined, the compiler detects an error, and at that point the type has not been substituted. When the actual instantiation occurs, it substitutes the template arguments and processes the result, which is what triggers the second error. Note that if the template were specialized for that particular type after the first definition and before the instantiation, the second error need not occur.
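A hypothetical reconstruction (not the original poster's code) of how such a shadowing problem can be reported twice, once for the template itself and once for the instantiation; whether both diagnostics appear depends on the compiler version:

    template <typename T>
    struct adder {
        T value;
    };

    template <typename T>
    void accumulate(adder<T> item) {
        adder<T> item = {};   // ill-formed: redeclares/shadows the parameter 'item';
                              // g++ may diagnose this here, with T still generic...
        (void)item;
    }

    int main() {
        accumulate(adder<char[21]>{});   // ...and again here, as adder<char [21]>,
                                         // when the template is instantiated
    }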