We recently downloaded, built, and installed the gcc-3.0.4 source. The compiler built successfully and we were able to compile some sample test .cpp files. I would like to know how we can modify the gcc source code to add run-time debugging statements, so that a binary compiled by my gcc prints a statement like the one below to a log file:
filename.cpp::FunctionName#linenumber-statement
or any additional information that I can insert via this tailored compiler code.
Have you looked at the macros __FILE__ and __LINE__? They do that for you without modifying the compiler. See here for more information.
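As a minimal sketch of that approach (the helper format_location and the macro TRACE are hypothetical names made up for this example, not anything GCC provides), the requested "filename.cpp::FunctionName#linenumber-statement" line can be produced from __FILE__, __LINE__ and the standard __func__ identifier:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds "file::function#line-statement".
inline std::string format_location(const char* file, const char* func,
                                   int line, const char* stmt) {
    std::ostringstream out;
    out << file << "::" << func << '#' << line << '-' << stmt;
    return out.str();
}

// Hypothetical macro: writes the location and the statement text to `log`,
// then executes the statement itself. #stmt turns the statement into a string.
#define TRACE(log, stmt)                                                 \
    do {                                                                 \
        (log) << format_location(__FILE__, __func__, __LINE__, #stmt)    \
              << '\n';                                                   \
        stmt;                                                            \
    } while (0)
```

With a std::ofstream (or any std::ostream) named log, TRACE(log, x = 5); would first write the location line and then run x = 5. Statements containing a top-level comma would need extra parentheses, since the preprocessor would otherwise split them into separate macro arguments.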
My general understanding of the GCC architecture is that it is divided into a front end (parser), a middle end (optimization on a special intermediate language), and a back end (generating platform-dependent output). So, for your purposes you would have to look into the back-end part.
Don't do that with an ancient compiler like GCC 3.0.
With a recent GCC 4.9 (as of late 2014 or January 2015) you could customize the compiler, e.g. with a MELT extension that adds a new optimization pass working on Gimple. That pass would insert a Gimple statement (hopefully a call to some debugging print) before each Gimple call statement.
This is non-trivial work (perhaps weeks of it). You need to understand all of Gimple.
Related
I have an existing and working source-to-source code modification tool using libtooling (written in C++). Now I want to integrate this tool into clang, so users can compile the modified source code without actually saving it somewhere.
The modification part isn't problematic, Matchers + Rewriters work the same way with clang, my problem is how to tell the compiler to reparse the source code after my changes.
I'm looking for examples of code that triggers non-determinism in GCC or Clang's compilation process.
One prominent example is the usage of the __DATE__ macro.
GCC and Clang have a plethora of compiler flags to control the outcome of non-deterministic actions within the compiler, e.g. -frandom-seed and -fno-guess-branch-probability.
Are there any small examples that are affected by these flags?
To be more precise:
$ c++ main.cpp -o main && shasum main
aabbccddee
$ c++ main.cpp -o main && shasum main
eeddccbbaa
I'm looking for macro-free code examples where multiple runs of the compiler lead to different outputs, but can be fixed by e.g. -frandom-seed
EDIT:
related: from the gcc docs:
-fno-guess-branch-probability:
Sometimes gcc will opt to use a randomized model to guess branch probabilities,
when none are available from either profiling feedback (-fprofile-arcs)
or __builtin_expect.
This means that different runs of the compiler on the same program
may produce different object code.
The default is -fguess-branch-probability at levels -O, -O2, -O3, -Os.
While old, this question is interesting for reproducible builds.
As you've stated, there are multiple sources of non-determinism when compiling C/C++ source.
Non-determinism in preprocessor
The preprocessor implements a number of predefined macros whose values can change between builds. There are the obvious __DATE__ and __TIME__, but also the less obvious __cplusplus, __STDC_VERSION__, or __GNUC_PATCHLEVEL__, which can change when an OS update upgrades the compiler.
There's also __FILE__, which contains a path from the build environment (and so differs from machine to machine).
Note that for the date and time macros, GCC honors the SOURCE_DATE_EPOCH environment variable to override their values. Other compilers might behave differently.
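To make this concrete, here is a small sketch (the function names build_stamp and build_path are hypothetical, chosen only for illustration). Any translation unit that bakes these macros into a string literal will produce a different object file whenever the build time or the source path changes:

```cpp
#include <string>

// Hypothetical helper: returns the timestamp baked in by the preprocessor.
// __DATE__ expands to "Mmm dd yyyy" and __TIME__ to "hh:mm:ss", so two
// builds started at different seconds embed different string literals.
inline std::string build_stamp() {
    return std::string(__DATE__) + " " + __TIME__;
}

// Hypothetical helper: returns the source path as the compiler saw it.
// The value depends on how and where the compiler was invoked.
inline std::string build_path() {
    return __FILE__;
}
```

With GCC, exporting SOURCE_DATE_EPOCH before compiling should pin the value returned by build_stamp(); the path returned by build_path() still depends on the build environment.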
Non-determinism in the compiler
The compiler might use optimization strategies that rely on a non-deterministic approach. You've cited one in GCC, but others might exist.
For MSVC, you might be interested in the /Brepro compiler flag.
You'll have to RTFM for your compiler to know more.
Non-determinism in the linker
On some platforms, the linked object and/or library will contain a timestamp. macOS is one of them. So for the same set of .o files, you'll get a different resulting executable.
Also, if you use Link Time Optimization, many compilers create intermediate .o files with randomly generated names. Again for GCC, you can use -frandom-seed=31415 to "fix" this randomness, but YMMV.
Non-determinism in the build-process
Sometimes repositories contain additional operations that are performed outside of the compilation stage, such as generating header files based on configuration flags (or other steps).
In that case, these project-specific operations might not be deterministic either.
For a good overview of the deterministic builds, please refer to this post
I have software that can generate C code, which I would like to use in a just-in-time compilation context. From what I understand, LLVM/Clang is the way to go and, for maintainability of the project, I'd like to use the C API of LLVM and Clang (libclang).
I started out by creating a libclang context using clang_createIndex and a translation unit using clang_createTranslationUnitFromSourceFile (it would have been nice to avoid going via the file system and pass the source code as a string instead). But I pretty much get stuck there. How can I go from the libclang translation unit to an LLVM "execution engine", which is what appears to be needed for JIT? Or is this not even possible using the C API?
The best method to learn how to use a body of code is to investigate the examples you're given.
There exist tutorials on how to leverage the clang/llvm tools to compile C++ code and emit LLVM-IR, to compile LLVM-IR to LLVM-Bitcode, and to execute that LLVM-bitcode. All that is necessary to learn to incorporate this functionality in our application is to investigate the execution path of these tools, to find the sequence of methods that accomplish what we want.
Here is an example that uses the command-line tools to compile a .cpp file to LLVM bitcode and then execute it:
clang++ -c -O3 -emit-llvm main.cpp -o main.bc
lli main.bc
This is a great start: we can look at the source behind the tools and investigate the execution path selected by the arguments. Since these tools are merely interfaces exposing functionality available in the llvm/clang libraries, which we can add to our project, following the execution path shallowly will give us a sequence of library methods that we can call within our application to accomplish the same results.
Once that sequence of library methods is established, we can break individual library methods down into their underlying functionality and tease out the exact behavior we want through a relatively small set of modifications here and there, rather than trying to reimplement something from the ground up.
I was told that clang is a driver that works like gcc to do preprocessing, compilation and linkage work. During the compilation and linkage, as far as I know, it's actually llvm that does the optimization ("-O1", "-O2", "-O3", "-Os", "-flto").
But I just cannot understand how llvm is involved.
It seems that compiling source code doesn't even need a static library such as libLLVMCore.a. Instead, on Debian the clang package depends on another package called libllvm-3.4 (the clang version is 3.4), which contains libLLVM-3.4.so(.1). Does clang use this shared library for optimization?
I've checked clang source code for a while and found that include/clang/Driver/Options.td contains the related options, but unfortunately I failed to find the source files that include that file, so I'm still not aware of the mechanism.
I hope someone might give me some hints.
(TL;DontWannaRead - skip to the end of this answer)
To answer your question properly you first need to understand the difference between a compiler's front-end and back-end (especially the first one).
Clang is a compiler front-end (http://en.wikipedia.org/wiki/Clang) for C, C++, Objective C and Objective C++ languages.
Clang's duty is the following:
i.e. translating from C++ source code (or C, or Objective C, etc.) to LLVM IR, a textual lower-level representation of what that code should do. To do this, Clang employs a number of sub-modules whose descriptions you can find in any decent compiler construction book: a lexer, a parser, a semantic analyzer (Sema), etc.
LLVM is a set of libraries whose primary task is the following: suppose we have the LLVM IR representation of the following C++ function
int double_this_number(int num) {
    int result = 0;
    result = num;
    result = result * 2;
    return result;
}
the core of the LLVM passes should optimize LLVM IR code:
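The net effect of such passes on this function can be pictured at the source level. The sketch below is only an illustration of the idea, not actual LLVM IR or real optimizer output, and the name double_this_number_optimized is hypothetical:

```cpp
// The function from the text: it writes and re-reads a local variable
// several times before returning.
int double_this_number(int num) {
    int result = 0;
    result = num;
    result = result * 2;
    return result;
}

// Hypothetical source-level picture of what is left once the redundant
// stores to `result` are removed: a single arithmetic operation.
int double_this_number_optimized(int num) {
    return num * 2;
}
```

Both functions return the same value for every input that does not overflow; the second one simply has no intermediate stores left to eliminate.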
What to do with the optimized LLVM IR code is entirely up to you: you can translate it to x86_64 executable code or modify it and then spit it out as ARM executable code or GPU executable code. It depends on the goal of your project.
The term "back-end" is often confusing, since many papers define the LLVM libraries as a "middle end" in a compiler chain and define the "back end" as the final module that does the code generation (LLVM IR to executable code, or something else that no longer needs processing by the compiler). Other sources refer to LLVM as a back end to Clang. Either way, their role is clear and they offer a powerful mechanism: whatever language you're targeting (C++, C, Objective C, Python, etc.), if you have a front-end that translates it to LLVM IR, you can use the same set of LLVM libraries to optimize it and, as long as you have a back-end for your target architecture, you can generate optimized executable code.
Recalling that LLVM is a set of libraries (not just optimization passes but also data structures, utility modules, diagnostic modules, etc..), Clang also leverages many LLVM libraries during its front-ending process. You can't really tear every LLVM module away from Clang since the latter is built on the former set.
As for why Clang is said to be a "compilation driver": Clang interprets the command-line parameters (descriptions and many declarations are TableGen'd, so they might require a bit more than a simple grep to find in the sources), decides which Jobs and phases are to be executed, sets up the CodeGenOptions according to the desired/possible optimization and transformation levels, and invokes the appropriate modules (clangCodeGen, in BackendUtil.cpp, is the one that populates a module pass manager with the optimizations to apply) and tools (e.g. the Windows ld linker). It steers the compilation process from the very beginning to the end.
Finally, I would suggest reading the Clang and LLVM documentation. It is quite thorough, and it is the first place to look for answers to most of your questions.
It's not exactly like GCC, so don't spend too much time trying to match the two precisely.
The LLVM compiler is a compiler for one specific language, LLVM IR. What Clang does is compile C++ code to LLVM IR, without optimizations. Clang can then invoke the LLVM compiler to compile that LLVM IR to optimized assembly.
My definition of powerful is ability to customize.
I'm familiar with gcc and wanted to try MSVC, so I was searching for the MSVC equivalents of gcc options. I'm unable to find many of them.
Controlling the kind of output
Stop after the preprocessing stage; do not run the compiler proper.
gcc: -E
msvc: ???
Stop after the stage of compilation proper; do not assemble.
gcc: -S
msvc: ???
Compile or assemble the source files, but do not link.
gcc: -c
msvc: /c
Useful for debugging
Print (on standard error output) the commands executed to run the stages of compilation.
gcc: -v
msvc: ???
Store the usual “temporary” intermediate files permanently;
gcc: -save-temps
msvc: ???
Is there some kind of gcc <--> msvc compiler option mapping guide?
The gcc Option Summary lists more options in each section than MSVC's Compiler Options Listed by Category. There are a hell of a lot of important and interesting things missing in msvc. Am I missing something, or is msvc really less powerful than gcc?
Visual Studio is an IDE; gcc is just a compiler. CL (the MSVC compiler) can do most of the steps that you are describing from gcc's point of view. CL /? gives help.
E.g.
Pre-process to stdout:
CL /E
Compile without linking:
CL /c
Generate assembly (unlike gcc, though, this doesn't prevent compiling):
CL /Fa
CL is really just a compiler. If you want to see what commands the IDE generates for compiling and linking, the easiest thing is to look at the command line section of the property pages for an item in the IDE. CL doesn't call a separate preprocessor or assembler, though, so there are no separate commands to see.
For -save-temps, the IDE performs separate compiling and linking, so object files are preserved anyway. To preserve preprocessor output and assembler output, you can enable /P and /Fa through the IDE.
gcc and CL are different, but I wouldn't say that MSVC lacks "a hell lot" of things, certainly not the outputs that you are looking for.
For the equivalent of -E, cl.exe has /P (it doesn't "stop after preprocessing stage" but it outputs the preprocessor output to a file, which is largely the same thing).
For -S, it's a little murkier, since the "compilation" and "assembling" steps happen in multiple places depending on what other options you have specified (for example, if you have whole program optimization turned on, then machine code is not generated until the link stage).
For -v, Visual C++ is not the same as GCC. It executes all stages of compilation directly in cl.exe (and link.exe) so there are no "commands executed" to display. Similarly for -save-temps: because everything happens inside cl.exe and link.exe directly, the only "temporary" files are the .obj files that cl.exe produces and they're always saved anyway.
At the end of the day, though, GCC is an open source project. That means anybody with an itch to scratch can add whatever command-line options they like with relatively little resistance. For Visual C++, a commercial closed-source product, every option needs to have a business case, design meetings, test plans and so on. Every new feature starts with minus 100 points.
Both compilers have a plethora of options for modifying... everything. I suspect that any option not present in either is an option for something not worth doing in the first place. Most "normal" users don't find a use for most of those options anyway.
If you're looking purely at the number of available options as a measure of "power" or "flexibility" then you'll probably find gcc to be the winner, simply because gcc handles many platforms other than Windows and has specific options for many of those platforms that you obviously won't find in MSVC. gcc (well, the gcc toolchain) also compiles a whole lot of languages beyond C and C++; I recently used it for Objective-C, for example.
EDIT: I'm with Dean in questioning the validity of your question. Yes, MSVC (cl) has options for the equivalent of many of gcc's options, but no, the number of options doesn't really mean much.
In short: Unless you're doing something very special, you'll find MSVC easily "powerful enough" on the Windows platform that you will likely not be missing any gcc options.