Using GCC to output commented & annotated intermediate files

Using GCC to output commented & annotated intermediate files - c++

Is it possible to convince GCC to emit an intermediate file which shows:
comments
original source
expanded macro definitions
optimizations applied by compiler
resulting C or C++ code which will be turned in to assembly code?
I'd rather see intermediate C/C++ instead of assembler, but I can use just assembler too if it's sufficiently annotated.
I am trying to reverse engineer a library composed almost entirely of macros in order to extend it. I'd also like to see the effects of optimization, in order to give the compiler more opportunities to do more optimization. (In other words, to see where my previous attempts have been ineffective)

GCC applies optimizations not in the C++-code directly but in some internal language-independant format (called GIMPLE) which cannot be reverted into C++ code that easily.
Depending on what you want, you can either
just expand macros: g++ -E
or look at an assembler output where you can see which line of C++ code maps to which assembler block:
g++ -g ... && objdump -S output
I don't recommend outputting assembler directly from gcc (with -S) as the generated annotations are almost useless.

1 and 2 are shown in, well, the original source.
3 You can get source with expanded macro definitions (in fact fully preprocessed) with -E.
4 The intermediate code at various stage of optimization can be obtained with -da or various -fdump-rtl-xxx, -fdump-tree-xxx and other -fdump-xxx options.
These are documented here:
http://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/Debugging-Options.html#Debugging-Options
5 I don't think GCC does source-to-source transformations, so the resulting C++ code is the original C++ code.
What transformations GCC does is described here:
http://gcc.gnu.org/onlinedocs/gccint/Passes.html#Passes

Related

-mimplicit-it compiler flag not recognized

I am attempting to compile a C++ library for a Tegra TK1. The library links to TBB, which I pulled using the package manager. During compilation I got the following error
/tmp/cc4iLbKz.s: Assembler messages:
/tmp/cc4iLbKz.s:9541: Error: thumb conditional instruction should be in IT block -- `strexeq r2,r3,[r4]'
A bit of googling and this question led me to try adding -mimplicit-it=thumb to CMAKE_CXX_FLAGS, but the compiler doesn't recognize it.
I am compiling on the tegra with kernal 3.10.40-grinch-21.3.4, and using gcc 4.8.4 compiler (thats what comes back when I type c++ -v)
I'm not sure what the initial error message means, though I think it has something to do with the TBB linked library rather than the source I'm compiling. The problem with the fix is also mysterious. Can anyone shed some light on this?

-mimplicit-it is an option to the assembler, not to the compiler. Thus, in the absence of specific assembler flags in your makefile (which you probably don't have, given that you don't appear to be using a separate assembler step), you'll need to use the -Wa option to the compiler to pass it through, i.e. -Wa,-mimplicit-it=thumb.
The source of the issue is almost certainly some inline assembly - possibly from a static inline in a header file if you're really only linking pre-built libraries - which contains conditionally-executed instructions (I'm going to guess its something like a cmpxchg implementation). Since your toolchain could well be configured to compile to the Thumb instruction set - which requires a preceding it (If-Then) instruction to set up conditional instructions - by default, another alternative might be to just compile with -marm (and/or remove -mthumb if appropriate) and sidestep the issue by not using Thumb at all.

Adding compiler option:
-wa
should solve the problem.

Trace gcc compilation and what code slows it down

I want to find out what code causes slow compilation times in gcc. I previously had a code being compiled slowly and someone told me the command-line switch that makes gcc to print each step that it compiles, including each function/variable/symbol and so on. That helped a lot (I could literally see in console where gcc chokes), but I forgot what was the switch.

I found it (from the gcc man page):
-Q
Makes the compiler print out each function name as it is compiled, and print some statistics about each pass when it finishes.

See also this answer to a quite similar question.
You very probably want to invoke GCC with -time or more probably -ftime-report which gives you the time spent by cc1 or cc1plus ... (the compiler proper started by the gcc or g++command) which shows the time spent in each internal phases or passes of the GCC compiler. Don't forget also optimizations, debugging, and warnings flags (e.g. -Wall -O -g); they all slow down the compilation.
You'll learn that for C programs, parsing is a small fraction of the compilation time, as soon as you ask for some optimization, e.g. -O1 or -O2. (This is less true for C++, when parsing can take half of the time, notably since template expansion is considered as parsing).
Empirically, what slows down GCC are very long function bodies. Better have 50 functions of 1000 lines each than one single function of 50000 lines (and this could happen in programs generating some of their C++ code, e.g. RefPerSys or perhaps -in spring 2021- Bismon).

Try the -v (verbose) compilation.
See this link:
http://www.network-theory.co.uk/docs/gccintro/gccintro_75.html
edit:
I understand. Maybe this will help:
gcc -fdump-tree-all -fdump-rtl-all
and the like (-fdump-passes). See here: http://fizz.phys.dal.ca/~jordan/gcc-4.0.1/gcc/Debugging-Options.html

How to generate machine code with llvm

I'm currently working on a compiler project using llvm. I have followed various tutorials to the point where I have a parser to create a syntax tree and then the tree is converted into an llvm Module using the provided IRBuilder.
My goal is to create an executable, and I am confused as what to do next. All the tutorials I've found just create the llvm module and print out the assembly using Module.dump(). Additionally, the only documentation I can find is for llvm developers, and not end users of the project.
If I want to generate machine code, what are the next steps? The llvm-mc project looks like it may do what I want, but I can't find any sort of documentation on it.
Perhaps I'm expecting llvm to do something that it doesn't. My expectation is that I can build a Module, then there would be an API that I can call with the Module and a target triple and an object file will be produced. I have found documentation and examples on producing a JIT, and I am not interested in that. I am looking for how to produce compiled binaries.
I am working on OS X, if that has any impact.

Use llc -filetype=obj to emit a linkable object file from your IR. You can look at the code of llc to see the LLVM API calls it makes to emit such code. At least for Mac OS X and Linux, the objects emitted in such a manner should be pretty good (i.e. this is not a "alpha quality" option by now).
LLVM does not contain a linker (yet!), however. So to actually link this object file into some executable or shared library, you will need to use the system linker. Note that even if you have an executable consisting of a single object file, the latter has to be linked anyway. Developers in the LLVM community are working on a real linker for LLVM, called lld. You can visit its page or search the mailing list archives to follow its progress.

As you can read on the llc guide, it is indeed intended to just generate the assembly, and then "The assembly language output can then be passed through a native assembler and linker to generate a native executable" - e.g. the gnu assembler (as) and linker (ld).
So the main answer here is to use native tools for assembling and linking.
However, there's experimental support for generating the native object directly from an IR file, via llc:
-filetype - Choose a file type (not all types are supported by all targets):
=asm - Emit an assembly ('.s') file
=obj - Emit a native object ('.o') file [experimental]
Or you can use llvm-mc to assemble it from the .s file:
-filetype - Choose an output file type:
=asm - Emit an assembly ('.s') file
=null - Don't emit anything (for timing purposes)
=obj - Emit a native object ('.o') file
I don't know about linkers, though.
In addition, I recommend checking out the tools/bugpoint/ToolRunner.h file, which exposes a wrapper combining llc and the platform's native C toolchain for generating machine code. From its header comment:
This file exposes an abstraction around a platform C compiler, used to compile C and assembly code.

Check out these functions in llvm-c/TargetMachine.h:
/** Emits an asm or object file for the given module to the filename. This
wraps several c++ only classes (among them a file stream). Returns any
error in ErrorMessage. Use LLVMDisposeMessage to dispose the message. */
LLVMBool LLVMTargetMachineEmitToFile(LLVMTargetMachineRef T, LLVMModuleRef M,
char *Filename, LLVMCodeGenFileType codegen, char **ErrorMessage);
/** Compile the LLVM IR stored in \p M and store the result in \p OutMemBuf. */
LLVMBool LLVMTargetMachineEmitToMemoryBuffer(LLVMTargetMachineRef T, LLVMModuleRef M,
LLVMCodeGenFileType codegen, char** ErrorMessage, LLVMMemoryBufferRef *OutMemBuf);

To run the example BrainF program, compile it and run:
echo ,. > test.bf
./BrainF test.bf -o test.bc
llc -filetype=obj test.bc
gcc test.o -o a.out
./a.out
then type a single letter and press Enter. It should echo that letter back to you. (That's what ,. does.)
The above was tested with LLVM version 3.5.0.

Can we inspect an object file for presence of temporaries introduced by C++ compiler?

Is there a way to inspect object file generated from code below ( file1.o ) for presence of compiler introduced temporary? What tools can we use to obtain such info from object files?
//file1.cpp
void func(const int& num){}
int main(){ func(2); }

The easiest way I can think of to do this is to load up a program that uses the object file and disassemble the function in the debugger. The program code you posted would work fine for this. Just break on the call to func and then display the assembler when you single-step into the function.
In a more complex program you can usually display the assembler code for a given function by name. Check your debugger documentation for how to do this. On Windows (Visual Studio) you can open the Disassembly window and enter the name of the function to display the assembler code.
If you have the source, most compilers allow you to output assembler, sometimes mixed with the source code. For Visual C++ this is /Fa.

If you're on an ELF system and have GNU binutils you can call readelf, probably with the -s switch.

If you have the source available, it is probably easier to look at the assembler file generated by the compiler (-save-temps for gcc). Otherwise, objdump is your friend.

You can use clang -cc1 --ast-print-xml to get a XML representation of a translation unit. The presence of temporaries can be easily detected from the AST.

Detect GCC compile-time flags of a binary

Is there a way to find out what gcc flags a particular binary was compiled with?

A quick look at the GCC documentation doesn't turn anything up.
The Boost guys are some of the smartest C++ developers out there, and they resort to naming conventions because this is generally not possible any other way (the executable could have been created in any number of languages, by any number of compiler versions, after all).
(Added much later): Turns out GCC has this feature in 4.3 if asked for when you compile the code:
A new command-line switch -frecord-gcc-switches ... causes the command line that was used to invoke the compiler to be recorded into the object file that is being created. The exact format of this recording is target and binary file format dependent, but it usually takes the form of a note section containing ASCII text.

Experimental proof:
diciu$ gcc -O2 /tmp/tt.c -o /tmp/a.out.o2
diciu$ gcc -O3 /tmp/tt.c -o /tmp/a.out.o3
diciu$ diff /tmp/a.out.o3 /tmp/a.out.o2
diciu$
I take that as a no as the binaries are identical.

I'm the one who asked Brian to post this originally. My question had to do with the samba binary. I found out that you can run smb -b to get information on how it was built.

I don't think so.
You can see if it has debug symbols, which means -g was used ;) But I can't think of any way how you could find out which directories have been used to search for include headers for example.
Maybe a better answer is possible if you only target for a specific flag; e.g. if you only want to know if the flag "..." was set when this binary was compiled or not. In that case, what flag would this be?

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Using GCC to output commented & annotated intermediate files - c++

Related

-mimplicit-it compiler flag not recognized

Trace gcc compilation and what code slows it down

How to generate machine code with llvm

Can we inspect an object file for presence of temporaries introduced by C++ compiler?

Detect GCC compile-time flags of a binary

Categories

Resources