How is clang able to steer C/C++ code optimization? - c++

I was told that clang is a driver that works like gcc to do preprocessing, compilation and linkage work. During the compilation and linkage, as far as I know, it's actually llvm that does the optimization ("-O1", "-O2", "-O3", "-Os", "-flto").
But I just cannot understand how llvm is involved.
It seems that compiling source code doesn't even need a static library such as libLLVMCore.a. Instead, on Debian the clang package depends on another package called libllvm-3.4 (the clang version is 3.4), which contains libLLVM-3.4.so(.1). Does clang use this shared library for optimization?
I've checked clang source code for a while and found that include/clang/Driver/Options.td contains the related options, but unfortunately I failed to find the source files that include that file, so I'm still not aware of the mechanism.
I hope someone might give me some hints.

(TL;DontWannaRead - skip to the end of this answer)
To answer your question properly you first need to understand the difference between a compiler's front-end and back-end (especially the first one).
Clang is a compiler front-end (http://en.wikipedia.org/wiki/Clang) for C, C++, Objective C and Objective C++ languages.
Clang's duty is the following:
i.e. translating from C++ source code (or C, or Objective C, etc.) to LLVM IR, a textual lower-level representation of what that code should do. In order to do this Clang employs a number of sub-modules whose descriptions you can find in any decent compiler construction book: a lexer, a parser, a semantic analyzer (Sema), etc.
LLVM is a set of libraries whose primary task is the following: suppose we have the LLVM IR representation of the following C++ function
int double_this_number(int num) {
    int result = 0;
    result = num;
    result = result * 2;
    return result;
}
the core of the LLVM passes should optimize LLVM IR code:
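As a rough source-level analogy only (the passes work on LLVM IR, not on C++), the kind of simplification meant here could be pictured like this. The name double_this_number_simplified is made up for illustration, and the exact result depends on the optimization level and the LLVM version.

```cpp
// Unoptimized form from the answer: every step goes through `result`.
int double_this_number(int num) {
    int result = 0;       // dead store: overwritten before it is ever read
    result = num;
    result = result * 2;
    return result;
}

// Hypothetical name. Roughly what is left once the dead store and the
// temporary are gone: a single multiply (or shift/add) on the argument.
int double_this_number_simplified(int num) {
    return num * 2;
}
```

Both functions compute the same value for every input that does not overflow, which is exactly the property an optimization pass has to preserve.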
What to do with the optimized LLVM IR code is entirely up to you: you can translate it to x86_64 executable code or modify it and then spit it out as ARM executable code or GPU executable code. It depends on the goal of your project.
The term "back-end" is often confusing since many papers define the LLVM libraries as a "middle end" in a compiler chain and reserve "back end" for the final module that does the code generation (LLVM IR to executable code or something else which no longer needs processing by the compiler). Other sources refer to LLVM as a back end to Clang. Either way, their role is clear and they offer a powerful mechanism: whatever language you're compiling (C++, C, Objective C, Python, etc.), if you have a front-end which translates it to LLVM IR, you can use the same set of LLVM libraries to optimize it and, as long as you have a back-end for your target architecture, you can generate optimized executable code.
Recall that LLVM is a set of libraries (not just optimization passes but also data structures, utility modules, diagnostic modules, etc.). Clang also leverages many LLVM libraries during its front-end work, so you can't really tear every LLVM module away from Clang: the latter is built on the former.
As for the reason why Clang is said to be a "compilation driver": Clang interprets the command line parameters (the descriptions and many declarations are TableGen'd, so it might take a bit more than a simple grep to swim through the sources), decides which Jobs and phases are to be executed, sets up the CodeGenOptions according to the desired/possible optimization and transformation levels, and invokes the appropriate modules (clangCodeGen, in BackendUtil.cpp, is the one that populates a module pass manager with the optimizations to apply) and tools (e.g. the Windows ld linker). It steers the compilation process from the very beginning to the end.
Finally, I would suggest reading the Clang and LLVM documentation. It is pretty thorough, and it is the first place you should look for answers to most of your questions.

It's not exactly like GCC, so don't spend too much time trying to match the two precisely.
The LLVM compiler is a compiler for one specific language, LLVM IR. What Clang does is compile C++ code to LLVM IR, without optimizations. Clang can then invoke the LLVM compiler to compile that LLVM IR to optimized assembly.

Related

Are g++ and clang++ 100% binary compatible? [duplicate]

If I build a static library with llvm-gcc, then link it with a program compiled using mingw gcc, will the result work?
The same for other combinations of llvm-gcc, clang and normal gcc. I'm interested in how this works out on Linux (using normal non-mingw gcc, of course) and other platforms as well, but the emphasis is on Windows.
I'm also interested in all languages, but with a strong emphasis on C and C++ - obviously clang doesn't support Fortran etc, but I believe llvm-gcc does.
I assume they all use the ELF file format, but what about call conventions, virtual table layouts etc?
Yes, for C code Clang and GCC are compatible (they both use the GNU Toolchain for linking, in fact.) You just have to make sure that you tell clang to create compiled objects and not intermediate bitcode objects. C ABI is well-defined, so the only issue is storage format.
C++ is not portable between compilers in the slightest; different compilers use different virtual table calls, constructors, destruction, name mangling, template implementations, etc. As a rule you should assume objects from one C++ compiler will not work with another.
However yes, at the time of writing Clang++ is able to use GCC/C++ compiled libraries as well; I recently set up a rig to compile C++ programs with clang using G++'s standard runtime library and it compiles+links just fine.
I don't know the answer, but slide 10 in this presentation seems to imply that the ".o" files produced by llvmgcc contain LLVM bytecode (.bc) instead of the usual target-specific object code, so that link-time optimization is possible. However, the LLVM linker should be able to link LLVM code with code produced by "normal" GCC, as the next slide says "link in native .o files and libraries here".
LLVM is a Linux tool; I have sometimes found that Linux compilers don't work quite right on Windows. I would be curious whether you get it to work or not.
I use -m i386pep when linking clang's .o files with ld. LLVM's devotion to integrating with gcc is openly visible at http://dragonegg.llvm.org/ so it's reasonable to guess that the LLVM family will be largely cross-compatible with the gcc tool-chain.
Sorry - I was coming back to llvm after a break, and have never done much more than the tutorial. First time around, I kind of burned out after the struggle getting LLVM 2.6 to build on MinGW GCC - thankfully not a problem with LLVM 2.7.
Going through the tutorial again today I noticed in Chapter 5 of the tutorial not only a clear statement that LLVM uses the ABI (Application Binary Interface) of the platform, but also that the tutorial compiler depends on this to allow access to external functions such as sin and cos.
I still don't know whether the compatible ABI extends to C++, though. That's not an issue of call conventions so much as name mangling, struct layout and vtable layout.
Being able to make C function calls is enough for most things, but there are still a few issues where I care about C++.
Hopefully they fixed it, but I avoid llvm-gcc because I (also) use llvm as a cross compiler, and when you use llvm-gcc -m32 on a 64 bit machine the -m32 is ignored and you get 64 bit ints which have to be faked on your 32 bit target machine. Clang does not have that bug, nor does gcc. Also, the more I use clang the more I like it. As to your direct question: I don't know. In theory, targets these days have well known and widely used calling conventions, and you would hope both gcc and llvm conform to the same ones, but you never know. The simplest way to find out is to write a couple of simple functions, compile and disassemble them using both tool sets, and see how they pass operands to the functions.
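A minimal sketch of such a probe file follows. The file name abi_probe.cpp, the struct and the function names are all made up for illustration. The functions use C linkage so that name mangling stays out of the comparison, and they mix integer widths, a double and a small struct so that the argument and return-value passing is visible in the disassembly.

```cpp
// abi_probe.cpp (hypothetical file name)
#include <cstdint>

// Small aggregate: how it is returned (registers vs. hidden pointer)
// is decided by the platform ABI.
struct Pair {
    std::int32_t a;
    std::int64_t b;
};

// Mixed integer and floating-point arguments.
extern "C" std::int64_t probe_sum(std::int32_t a, std::int64_t b, double c) {
    return a + b + static_cast<std::int64_t>(c);
}

// Struct returned by value.
extern "C" Pair probe_make_pair(std::int32_t a, std::int64_t b) {
    Pair p;
    p.a = a;
    p.b = b;
    return p;
}
```

Compiling this with each compiler using -S (emit assembly) at the same optimization level and comparing the two outputs should show whether both place the operands in the same registers or stack slots.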

LLVM for parsing math expressions

I have some troubles wrapping my head around what LLVM actually does...
Am I right to assume that it could be used to parse mathematical expressions at runtime in a C++ program?
Right now at runtime, I'm getting the math expressions and build a C program out of it, compile it on the fly by doing system call to gcc. Then I dynamically load the .so produced by gcc and extract my eval function...
I'd like to replace this workflow by something simpler, maybe even faster...
Can LLVM help me out? Any resources out there to get me started?
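For reference, the first step of the workflow described above (wrapping an expression into C source before handing it to gcc) might look roughly like this. It is only a sketch: the helper name make_eval_source, the single variable x and the eval signature are assumptions, not taken from the question.

```cpp
#include <string>

// Hypothetical helper: wraps a math expression in `x` into the C source of
// an `eval` function. The returned text would then be written to a file,
// compiled into a shared object and loaded at runtime.
std::string make_eval_source(const std::string& expr) {
    return "#include <math.h>\n"
           "double eval(double x) {\n"
           "    return " + expr + ";\n"
           "}\n";
}
```

The external compiler call and the dynamic loading that follow this step are the parts an in-process code generator would replace.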
You're describing using LLVM as a JIT compiler, which is absolutely possible. If you generate LLVM IR code (in memory) and hand it off to the library, it will generate machine code for you (still in memory). You can then run that code however you like.
If you want to generate LLVM IR from C code, you can also link clang as a library.
Here is a PDF I found at this answer, which has some examples of how to use LLVM as a JIT.

LLVM bitcode cross-platform

Just to be sure: Is LLVM bitcode cross-platform? By which I mean, can the generated IR (".bc") file be distributed and interpreted/JITed on various platforms?
If so, how does Clang convert C++ into platform-independent code, given that in C++ itself the preprocessor is used to determine the target platform before the code is actually compiled?
LLVM IR can be cross-platform, with the obvious exceptions others have listed. However, that does not mean Clang generates cross-platform code. As you note, the preprocessor is almost universally used to only pass parts of the code to the C/C++ compiler, depending on the platform. Even when this is not done in user code, many system headers include a bit or two that's platform-specific, such as typedefs. For example, if you compile C code using size_t to LLVM IR on a platform where size_t is 32 bit, the LLVM IR now uses i32 for that, and there's no way in hell you can reverse engineer that to fix it.
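A small sketch of the size_t point, with made-up function names: the width reported below is fixed when the source is compiled, so any IR produced from total_bytes already carries that width in its integer types.

```cpp
#include <climits>
#include <cstddef>

// Width of size_t in bits on the platform this file is compiled for.
// The value is a compile-time constant, not something decided at run time.
constexpr unsigned size_t_bits() {
    return sizeof(std::size_t) * CHAR_BIT;
}

// Arithmetic on size_t: the front-end lowers this to integer operations of
// exactly size_t_bits() bits (e.g. 32-bit on one target, 64-bit on another).
std::size_t total_bytes(std::size_t count, std::size_t elem_size) {
    return count * elem_size;
}
```

Compiling the same file for two targets with different size_t widths gives two different IR files, even though the source text is identical.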
Google's Portable Native Client project (thanks @willglynn for the link), if I understand it correctly, achieves portability by fixing the ABI for all target platforms. So in that sense, it doesn't solve the aforementioned issues: the LLVM IR is not portable to a platform with a different ABI. The only reason this is more portable is that the clients provide a layer which maps the PNaCl ABI to the actual ABI. In other words, PNaCl code isn't portable to many platforms; the "PNaCl VM" is.
So, bottom line: If you're very careful, you can use LLVM IR across multiple platforms, but not without doing significant additional work (which Clang doesn't do) to abstract over the ABI differences.
Given an IR file, can I be sure it could compile to my target?
You can not assume an arbitrary IR file will always be cross-platform, as there are things in a given file that might not be platform-independent. The most notable example is that the IR can contain actual assembler sequences (via module-level or inline assembly segments), but there are other examples - e.g. usage of target specific intrinsics or calling conventions that are only supported on some targets.
Can I generate an IR file that is guaranteed to compile on all targets?
I don't know, but I believe you can, especially if you avoid specifying things like inline assembly, calling conventions, required / preferred ABI for types, etc. It can affect the optimizations the compiler will perform, though.

How to embed LLVM?

The LLVM Core project consists of:
Compiler - converts source code to LLVM IR
VM - executes compiled IR code
How can I embed the VM to a C++ application?
The LLVM is really a collection of libraries that you can link to, so it's pretty easy to embed. More often the LLVM takes IR that you generate and compiles it directly to machine code. There is also a library available to interpret and execute IR for platforms that do not support JIT compilation.
There's a pretty good tutorial available on the LLVM website here: http://llvm.org/docs/tutorial/. I suggest that you go through that and then ask more specific questions if you have them.
Take a look at the HowToUseJIT example in LLVM.

Run time Debugging

We have recently downloaded, installed and compiled the gcc-3.0.4 code. The gcc compiler built successfully and we were able to compile a sample test cpp file. I would like to know how we can modify the gcc source code to add additional run time debugging statements, so that a binary compiled by my gcc prints the statement below to a log file while it executes:
filename.cpp::FunctionName#linenumber-statement
or any additional information that I can insert via this tailored compiler code.
Have you looked at the macros __FILE__ and __LINE__? They do that for you without modifying the compiler. See here for more information.
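A minimal sketch of that approach, producing the format from the question. The names trace_message and TRACE are made up for illustration, and the output goes to stderr here rather than to a log file.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds "file::function#line-statement".
inline std::string trace_message(const char* file, const char* func,
                                 int line, const char* stmt) {
    return std::string(file) + "::" + func + "#" + std::to_string(line) +
           "-" + stmt;
}

// Hypothetical macro: logs the text of a statement, then executes it.
// __FILE__ and __LINE__ expand at the place the macro is used; __func__
// names the enclosing function; #stmt turns the statement into a string.
#define TRACE(stmt)                                                        \
    do {                                                                   \
        std::fprintf(stderr, "%s\n",                                       \
                     trace_message(__FILE__, __func__, __LINE__, #stmt)    \
                         .c_str());                                        \
        stmt;                                                              \
    } while (0)
```

Writing TRACE(x = 5); inside a function logs one line for that statement and then performs the assignment. Because the statement runs inside its own block, a declaration wrapped in TRACE would not be visible afterwards, so this only suits expressions and calls.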
My general understanding of the GCC architecture is that it is divided into a front-end (parser), a middle part (optimization in a special intermediate language), and a back-end (generating platform dependent output). So, for your purposes you would have to look into the back-end part.
Don't do that with an ancient compiler like GCC 3.0.
With a recent GCC 4.9 (as of the end of 2014 or January 2015) you could customize the compiler, e.g. with a MELT extension, which would add a new optimization pass working on Gimple. That pass would insert a Gimple statement (hopefully a call to some debugging print) before each Gimple call statement.
This is non-trivial work (perhaps weeks of work). You need to understand all of Gimple