How to see the added code in C++ by the compiler?
E.g., we know that when an object of some class goes out of scope, the destructor for that object is called, but how do you see the specific code that does the destructor call? Is that code still written in C++?
It's compiler-dependent, and the added code is in assembly language. For example, with the Microsoft compiler, compiling with /FAsc will generate a .cod file for each object file containing the assembly code along with the original C++ lines as comments. It will show the calls to constructors/destructors as well.
There's not necessarily any "code" that gets added. C++ is pretty clear on when such things happen, and for the compiler, making a new object clearly means calling its constructor -- no additional "code" anywhere.
You're right, however, that things like calls to the constructor or destructor must end up somewhere in the assembly -- but there's absolutely no guarantee that looking at the assembly reveals much more than what you'd have known without it. C++ compilers are pretty mature in these aspects and inline a lot of things where that makes sense, so the same source code can look different in different places.
The closest thing you'll get is adding debug symbols to your build and using a debugger to get a call graph -- that way you can see exactly when the code you wrote gets called.
You can add flags to the compile command that let you see the output of the various stages the compiler goes through. For example, the -S flag (GCC and Clang) stops after preprocessing and the initial compilation, before the assembler runs, and leaves you with an assembly file. However, this code will not be written in C++.
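As a minimal sketch of something to try this on, the type below logs its own construction and destruction, which makes the implicit destructor call at the closing brace observable. The names `Tracer` and `run_scope` are hypothetical and not from the question. Compiling this with -S (or /FAsc) at a low optimization level should show a call to the `Tracer` destructor near the end of the inner block; with optimization on, it may be inlined and harder to spot.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper type: records when it is constructed and destroyed.
struct Tracer {
    std::vector<std::string>& log;
    explicit Tracer(std::vector<std::string>& l) : log(l) { log.push_back("ctor"); }
    ~Tracer() { log.push_back("dtor"); }
};

// The source contains no explicit destructor call; the compiler emits one
// for `t` at the closing brace of the inner block.
std::vector<std::string> run_scope() {
    std::vector<std::string> log;
    {
        Tracer t(log);
        log.push_back("body");
    }  // <- implicit t.~Tracer() happens here
    assert(log.back() == "dtor");  // the destructor has already run
    log.push_back("after");
    return log;
}
```

Calling `run_scope()` should return the entries "ctor", "body", "dtor", "after" in that order, which matches the order of the calls you would look for in the assembly listing.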
I have studied many articles to understand whether a compiler generates object code or assembly code, and they conflict, even on Stack Overflow. Some say the compiler generates object code, while others say the compiler generates assembly code which is then converted to object code by the assembler. Both answers have up-votes. Is there any clarification or justification for this?
Ultimately, the compiler has to somehow produce object files with the code that will end up in the application, and the linker combines the code from object files and libraries (which are just collections of object files) to produce the application. So it's correct to say that the compiler produces object files and the linker combines them.
On the other hand, there are various ways that the compiler can produce the object files. One way is to directly generate object files. Another way is to generate assembler code and run the assembler to produce the object files. That introduces some flexibility, because the compiler doesn't have to know the details of how object files are laid out; the assembler does that. Yet another way is to generate C code and run the C compiler (which could, in turn generate assembler code and run the assembler) on that to produce object files. That's how cfront worked back in the olden days of C++. It's also how some modern compiler front-ends work. For example, Edison Design Group sells a C++ front-end that vendors can hook up to their own back-end for code generation. They also provide a version that generates C code, for use during compiler development when the back-end isn't yet working. Typically in these cases, the compiler will have a switch to keep the C or assembler file around, so you can examine that to see what's going on.
I am aware that the keyword inline has useful properties e.g. for keeping template specializations inside a header file.
On the other hand I have often read that inline is almost useless as hint for the compiler to actually inline functions.
Further the keyword cannot be used inside a cpp file since the compiler wants to inspect functions marked with the inline keyword whenever they are called.
Hence I am a little confused about the "automatic" inlining capabilities of modern compilers (namely gcc 4.43). When I define a function inside a cpp, can the compiler inline it anyway if it deems that inlining makes sense for the function, or do I rob it of some optimization capabilities? (That would be fine for the majority of functions, but it is important to know for small ones that are called very often.)
Within the compilation unit the compiler will have no problem inlining functions (even if they are not marked as inline). Across compilation units it is harder, but modern compilers can do it.
Use of the inline tag has little effect on whether a 'modern' compiler actually inlines a function; its heuristics are better than the human mind. The exception is when you specify flags to force inlining one way or the other, which is usually a bad idea, as humans are bad at making this decision.
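A small sketch of the same-compilation-unit case, with hypothetical names (`square`, `sum_of_squares`): the helper is not marked inline, yet because its definition is visible to the caller, an optimizing build is free to inline it. Whether it actually does depends on the compiler and flags, so checking the generated assembly is the only way to be sure.

```cpp
#include <cassert>
#include <cstdint>

// Deliberately not marked `inline`. It is defined in the same translation
// unit as its caller, so the optimizer can see its body and may inline it.
static std::uint64_t square(std::uint64_t x) { return x * x; }

// Sums 0^2 + 1^2 + ... + (n-1)^2.
std::uint64_t sum_of_squares(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        total += square(i);  // candidate for inlining in an optimized build
    }
    return total;
}

// Small self-check that can be called from main().
void check_sum_of_squares() {
    assert(sum_of_squares(4) == 14);  // 0 + 1 + 4 + 9
}
```

If `square` were only declared here and defined in another cpp file, the compiler would need link-time or whole-program optimization to inline it.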
Microsoft Visual C++ has been able to do so at least since Visual Studio 2005. They call it "Whole Program Optimization" or "Link-Time Code Generation". In this implementation, the compiler does not actually produce machine code, but writes an intermediate representation of the code into the object files. The linker then merges all of the code into one huge code unit and performs the actual code generation.
GCC has been able to do this since at least version 4.5, with major improvements coming in GCC 4.7. To my knowledge the feature is still considered somewhat experimental (at least insofar as many Linux distributions do not use it). GCC's implementation works very similarly: it first writes the code in its GIMPLE intermediate language into the object files, then compiles all of the object files into a single object file which is then passed to the linker (this allows GCC to continue to work with existing linkers).
Many big C++ projects also do what is now being called "unity builds". Instead of passing hundreds of individual C++ source files into the compiler, one source file is created that includes all the other source files in the project. The original intent behind this is to decrease compilation times (since headers etc. do not have to be parsed over and over), but as a side-effect, it will have the same outcome as the LTO/LTCG techniques mentioned above: giving the compiler perfect visibility into all functions in all compilation units.
I jump between being impressed by my C++ compiler's (MSVC 2010) ingenuity and its stupidity. Some code that did pixel format conversion via templates, which would have resolved into 5-10 assembly instructions when properly inlined, got bloated into kilobytes(!) of nested function calls. At other times, it inlines so aggressively that whole classes disappear even though they contained non-trivial functionality.
This depends on your compilation flags. With -combine and -fwhole-program, gcc will do function inlining across cpp boundaries. I'm not sure how much the linker will do if you compile into multiple object files.
The standard dictates nothing about how a function can be inlined. And compilers can inline functions if they have access to their implementation. If you only have a header with binaries, it would be impossible. If it's in the same module, the compiler can inline the function even if it is in the cpp file.
I have a C++ library which I'm instrumenting using vsinstr.exe and then running vsperfmon.exe. When I open the .coverage file in Visual Studio I am seeing some lines which are not highlighted in any color, and I know for sure that these lines were hit. What could be the reason for this? This doesn't happen when I run the same for C# libraries. It doesn't help that I'm a total newbie in C++, but I have lines with simple code not showing as hit, such as a declaration of a new variable or calls to other methods.
If you run a binary code instrumenter, it can't instrument code that isn't there. So optimized-away code, even if logically executed, can't be seen by a binary instrumenter.
If you instrument the source, then even if the compiler optimizes "away" certain code, the instrumentation (having a side effect) doesn't get optimized away. The logically executed code still vanishes from the object file, but the instrumentation for that code still exists and gets executed at the point where the code would have run. So you get an instrumentation signal indicating that the optimized code was, in effect, "executed".
This happens because source instrumentation takes advantage of the compiler and the fact that it must preserve behaviour while optimizing. Here's another example of this:
for (i=0;i<1000000;i++)
{ executed[5923]=true;
<body>
}
What is shown is instrumented code. The "executed[k]=true;" is the probe (for the "kth" chunk of program code) that says the loop body got executed. A binary instrumenter might do the equivalent of this in the object code. Now when the loop runs, the probe gets executed on every iteration. If this is a critical loop, performance suffers, so instrumentation can affect timing behavior, sometimes badly. (Note that the instrumented object code is thrown away afterwards.)
With source instrumentation, you get this source text. (Just like the object code case, you don't keep this; you just compile and run it, and then throw away the instrumented source code.) The difference is that the optimizing compiler recognizes the probe as having a loop-invariant effect, and rewrites the object code like this:
executed[5923]=true;
for (i=0;i<1000000;i++)
{ <body>
}
The cost of the instrumentation has effectively gone to zero. So source code instrumentation gives execution times which are much closer to the uninstrumented program.
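To make the two variants concrete, here is a compilable sketch of both. The function names and the summing loop body are assumptions standing in for `<body>`; the second function is a hand-written equivalent of what an optimizer may produce, not actual compiler output. Because `n` is a parameter here rather than the constant 1000000, the hoisted probe needs a guard to keep the behavior identical when the loop runs zero times.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical probe table; index 5923 mirrors the probe number used above.
static std::array<bool, 6000> executed{};

// Probe inside the loop, as the instrumented source is written.
long sum_with_probe_in_loop(std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        executed[5923] = true;          // probe runs on every iteration
        total += static_cast<long>(i);  // stands in for <body>
    }
    return total;
}

// Probe hoisted out of the loop: same observable effect, one store.
long sum_with_probe_hoisted(std::size_t n) {
    long total = 0;
    if (n > 0) {
        executed[5923] = true;  // only set if the body would have run
    }
    for (std::size_t i = 0; i < n; ++i) {
        total += static_cast<long>(i);
    }
    return total;
}
```

Both functions return the same value and leave `executed[5923]` in the same state for any `n`, which is why the compiler is allowed to make this transformation.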
Of course, if you test the unoptimized program, then presumably you don't care about the extra overhead of either binary or source instrumentation. In that case, even a binary instrumenter will report code that could have been optimized away (but was not) as executed, if it is executed.
Our Test Coverage tools do source instrumentation for many languages, including C++ (and even for Visual C++ dialects, including C++14). It will show you that optimized-away code as covered. You don't need to do anything special to get the "right" answer.
Consider a situation. We have some specific C++ compiler, a specific set of compiler settings and a specific C++ program.
We compile that specific program with that compiler and those settings two times, doing a "clean compile" each time.
Should the machine code emitted be the same (I don't mean timestamps and other bells and whistles, I mean only real code that will be executed) or is it allowed to vary from one compilation to another?
The C++ standard certainly doesn't say anything to prevent this from happening. In reality, however, a compiler is normally deterministic, so given identical inputs it will produce identical output.
The real question is mostly what parts of the environment it considers as its inputs -- a few compilers seem to assume that characteristics of the build machine reflect characteristics of the target, and vary their output based on "inputs" that are implicit in the build environment instead of explicitly stated, such as via compiler flags. That said, even that is relatively unusual. The norm is for the output to depend on explicit inputs (input files, command line flags, etc.).
Offhand, I can only think of one fairly obvious thing that changes "spontaneously": some compilers and/or linkers embed a timestamp into their output file, so a few bytes of the output file will change from one build to the next--but this will only be in the metadata embedded in the file, not a change to the actual code that's generated.
According to the as-if rule in the standard, as long as a conforming program (e.g., no undefined behavior) cannot tell the difference, the compiler is allowed to do whatever it wants. In other words, as long as the program produces the same output, there is no restriction in the standard prohibiting this.
From a practical point of view, I wouldn't use a compiler that does this to build production software. I want to be able to recompile a release made two years ago (with the same compiler, etc) and produce the same machine code. I don't want to worry that the reason I can't reproduce a bug is that the compiler decided to do something slightly different today.
There is no guarantee that they will be the same. According to http://www.mingw.org/wiki/My_executable_is_sometimes_different:
My executable is sometimes different, when I compile and recompile the same source. Is this normal?
Yes, by default, and by design, MinGW's GCC does not produce consistent output, unless you patch it.
EDIT: Found this post that seems to explain how to make them the same.
I'd bet it would vary every time due to some metadata the compiler writes (for instance, C# compiled DLLs always vary in some bytes even if I do "build" twice in a row without changing anything). In any case, I would never rely on the output not varying.
I have run a code formatting tool on my C++ files. It is supposed to make only formatting changes. Now when I build my code, I see that the size of the object file for some source files has changed. Since my files are very big and the tool has changed almost every line, I don't know whether it has done something disastrous. Now I am worried about checking this code in to the repo, as it might lead to runtime errors caused by the formatting tool. My question is: will the size of the object file change if the code formatting is changed?
The brief answer is no. :)
I would not check your code into the repo without thoroughly checking it first (review, testing).
Pure formatting changes should not change the object file size, unless you've done a debug build (in which case all bets are off). A release build should be not just the same size, but barring your using __DATE__ and such to insert preprocessor content, it should be byte-for-byte the same as well.
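As a small illustration of the `__DATE__` caveat, the sketch below bakes the build date and time into a string literal. The function name `build_stamp` is hypothetical. Any translation unit containing something like this will differ between two builds made at different times, even with identical source and flags.

```cpp
#include <cassert>
#include <cstring>

// __DATE__ expands to "Mmm dd yyyy" and __TIME__ to "hh:mm:ss" when the
// file is preprocessed, so this literal changes from one build to the next.
const char* build_stamp() { return __DATE__ " " __TIME__; }

// Small self-check: the stamp always has the same length, only its
// characters change between builds.
void check_build_stamp() {
    assert(std::strlen(build_stamp()) == 20);  // 11 + 1 + 8 characters
}
```

Because the length stays constant, a macro like this typically changes the bytes of the object file but not its size.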
If the "reformatting" tool has actually done some micro-optimizations for you (caching repeated access to invariants in local vars, or undoing your having done that unnecessarily), that might affect the optimization choices the compiler makes, which can have an effect on the object file. But I wouldn't assume that that was the case.
If the __LINE__ macro is used, reformatting might produce longer strings. How different are the sizes?
(This macro often hides in new and assert messages in debug builds.)
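A minimal sketch of how a line number can end up inside the object file. The `HERE` macro is a hypothetical stand-in for what assert-style or logging macros do internally: it turns `__LINE__` into part of a string literal, so moving a statement to a different line changes the literal, and crossing a digit boundary (say, from line 99 to line 100) changes its length.

```cpp
#include <cassert>
#include <cstring>

// Standard two-level stringification so that __LINE__ is expanded first.
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

// Hypothetical macro in the style of assert/logging helpers.
#define HERE "line " STRINGIFY(__LINE__)

// The two functions are textually identical but sit on different lines,
// so they return different string literals.
const char* first_location() { return HERE; }

const char* second_location() { return HERE; }
```

A formatter that adds or removes line breaks shifts every such literal after the change, which is one innocent explanation for a small difference in object file size.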
Just formatting the code should not change the size of the object file.
It might if you compile with debugging symbols, as it might have added more line number information. Normally it wouldn't though, as has already been pointed out.
Try comparing object files built without debugging symbols.
Try to find a comparison tool that won't care about the formatting changes (like perhaps "diff --ignore-all-space") and check using that before checking in.