Just-in-time compilation using libclang and LLVM C - llvm

I have a software that is able to generate C code that I would like to use in a just-in-time compilation context. From what I understand, LLVM/Clang is the way to go and, for maintainability of the project, I'd like to use the C API of llvm and Clang (libclang).
I started out creating a libclang context using clang_createIndex and a translation unit using createTranslationUnitFromSourceFile (would have been nice to be able to avoid going via the file system and pass the source code as a string instead). But I pretty much get stuck there. How can I go from the libclang translation unit to an LLVM "execution engine" which is what appears to be needed for JIT? Or is this not even possible using the C API?

The best method to learn how to use a body of code is to investigate the examples you're given.
There exist tutorials on how to leverage the clang/llvm tools to compile C++ code and emit LLVM-IR, to compile LLVM-IR to LLVM-Bitcode, and to execute that LLVM-bitcode. All that is necessary to learn to incorporate this functionality in our application is to investigate the execution path of these tools, to find the sequence of methods that accomplish what we want.
Here is an example using the example tools of compiling a cpp file to llvm-bitcode, and executing it.
clang++ -c -O3 -emit-llvm main.cpp -o main.bc
lli main.bc
This is a great start, we can just look at the source behind the tools, and investigate the execution path outlined by the arguments. Since these tools are merely interfaces exposing the underlying functionality available in the llvm/clang libraries that we can add to our project, following the execution path shallowly will give us a sequence of library available methods that we can call within our application to accomplish the same results.
Once the sequence of library methods is trivially established, you can delve into breaking individual library methods into their underlying functionality, and tease out the exact behavior we desire through a relatively small set of modifications here and there, rather than trying to reimplement something from the ground up.

Related

Is it possible to compile Clang LibTooling into standalone library?

I`m trying to find best solution for parsing C++ code for my refactor support tool. I decided to use clang frontend because it is meant to be frontend of C++ compiler. I found the documentation page that describes parsing of C++ code using clang LibTooling. But, unfortunately, it looks for me that LibTooling cannot be used separately from clang infrastructure installed on the tool user side.
So, my question: is it possible to compile Clang LibTooling to standalone parser that my be reused between C++ parse projects without clang infrastructure? Ideal solution for me is shown on the picture. I want to compile LibTooling linking it with my wrapping library (that may simplify / adopt LibTooling API for my purposes) and then reuse it between different projects.
P.S.: Maybe, something like it where performed in Firolino's clang project - but it looks that full clang installation should be used for too.
P.P.S.: As for me, solution that is whanted for my may be useful for all C++ community. So if somebody want to cooperate with me in making this project - you are welcome!

Is there anything like a forwarding C++ preprocessor, that could be used by GCC?

I've been searching around for different custom pre-processor extensions and replacements, but all of them seem to come with 1 of 2 caveats:
Either 1), you generate the code as a separate build-system, them manually put the output into your real (CMake) build system, or 2) you end up losing the builtin preprocessor for GCC.
Is there really no tool that can, say, run each file it gets against some configured script, then through cpp, then pass the result to gcc?
I'd love to use something like Cog by just setting an environment variable for gcc, indicating a tool that runs Cog first and then the standard preprocessor.
Alternatively, is there a straightforward way to accomplish that in CMake, itself? I don't want to have to write a custom script for each file, especially if I have to then hard-code the compiler/preprocessor flags in each target.
edit: For clarity, I am aware of several partial/partially-applicable solutions. For example, how to tell GCC to use a different preprocessor. (Or really, to look in a different place for its own preprocessor, cc1. See: Custom gcc preprocessor) However, that leaves a lot of work to do, to modify files, and then correctly invoke the real cc1, with the correct original arguments.
Since that is effectively a constant/generic problem, I'm just surprised there is no drop in program.
Edit 2: After looking over several proposed solutions, I am not convinced there is an answer to this question. For example, if files are going to be generated by CMake, then they can't be included and browsed by the IDE - due to not yet existing.
As ridiculous as it sounds, I don't think there is any way to extend the preprocessor short of forking Gcc. Everything recommended so far, constitutes incomplete hacks.
The GCC (C++ compiler) is made for compiling C++ programs. As the C++ preprocessor is standardized within the C++ standard there is usually no need for anything like a "plugin" or "extension" there.
Don't listen to the comments, that suggest you using any exotic extension to CMake or change source code of GCC. Running source files through a different program (cog in your case) before compiling is a well known task and all major build systems support it right away.
In CMake you can use the add_custom_command function. If you need this for more than one file, you could use a CMake loop like e.g. suggested in this answer.

Is it possible to compile LLIR to binary without clang?

I'm writing a compiler that embeds the LLVM API. By copying code from the llc tool, I can output assembly language or object files that I can turn into binaries using clang or an assembler.
But I want my compiler to be self contained. Is it possible to turn LLIR into binaries using LLVM? This seems like the sort of thing that should be in the LLVM toolkit.
Yes, it is possible and this is also done by llc with -filetype=obj argument.
You can consult the compileModule function to learn how to use the programmatic API.
Note that this will only generate an object file for a given translation unit. You will also need a linker to convert it into a proper executable or library. The LLVM linker, lld, can also be embedded into client applications as a library, so in the end you will be able to create a self-hosting compiler.

make SCons compile everything in one gcc line?

I have a rather complex SCons script that compiles a big C++ project.
This gcc manual page says:
The compiler performs optimization based on the knowledge it has of the program. Compiling multiple files at once to a single output file mode allows the compiler to use information gained from all of the files when compiling each of them.
So it's better to give all my files to a single g++ invocation and let it drive the compilation however it pleases.
But SCons does not do this. it calls g++ separately for every single C++ file in the project and then links them using ld
Is there a way to make SCons do this?
The main reason to have a build system with the ability to express dependencies is to support some kind of conditional/incremental build. Otherwise you might as well just use a script with the one command you need.
That being said, the result of having gcc/g++ optimize as the manual describe is substantial. In particular if you have C++ templates you use often. Good for run-time performance, bad for recompile performance.
I suggest you try and make your own builder doing what you need. Here is another question with an inspirational answer: SCons custom builder - build with multiple files and output one file
Currently the answer is no.
Logic similar to this was developed for MSVC only.
You can see this in the man page (http://scons.org/doc/production/HTML/scons-man.html) as follows:
MSVC_BATCH When set to any true value, specifies that SCons should
batch compilation of object files when calling the Microsoft Visual
C/C++ compiler. All compilations of source files from the same source
directory that generate target files in a same output directory and
were configured in SCons using the same construction environment will
be built in a single call to the compiler. Only source files that have
changed since their object files were built will be passed to each
compiler invocation (via the $CHANGED_SOURCES construction variable).
Any compilations where the object (target) file base name (minus the
.obj) does not match the source file base name will be compiled
separately.
As always patches are welcome to add this in a more general fashion.
In general this should be left up to the program developer. Trying to compile all together in an amalgamation may introduce unintended behaviour to the program if it even compiles in the first place. Your best bet if you want this kind of optimisation without editing the source yourself is to use a compiler with inter-process optimisation like icc -ipo.
Example where an amalgamation of two .c files would not compile is for example if they use two identical static symbols with different functionality.

binary generation from LLVM

How does one generate executable binaries from the c++ side of LLVM?
I'm currently writing a toy compiler, and I'm not quite sure how to do the final step of creating an executable from the IR.
The only solution I currently see is to write out the bitcode and then call llc using system or the like. Is there a way to do this from the c++ interface instead?
This seems like it would be a common question, but I can't find anything on it.
LLVM does not ship the linker necessary to perform this task. It can only write out as assembler and then invoke the system linker to deal with it. You can see the source code of llvm-ld to see how it's done.