I need to do heavy instrumentation using a LLVM pass. I want to avoid the IR Builder because it is somehow complicated and the code looks really messy. Isn't there a more convenient way to create LLVM IR? I think of a way where I can use for example C/C++ to create the instrumented code.
I do not use the IR Builder for my instrumentation. Most of the instrumentation is accomplished with two steps:
The LLVM pass identifies the instructions of interest and inserts function calls to the requisite instrumentation routine.
The instrumentation routines are written in C files and compiled and linked into the final program. Using link-time optimization (LTO), this approach achieves good performance by removing the function calls and directly inserting the machine code for the library instrumentation routines.
Therefore, most of the instrumentation is C code that clang compiles down to necessary IR.
Other pieces of instrumentation have to be crafted dynamically, so the IR is constructed by invoking the appropriate XYZInst::Create calls, where XYZ is the specific instruction.
The first approach is closer to what you desire; however, it required writing a separate script to act as the compiler driver and managing Makefiles, libraries, et cetera.
REQUIREMENT: For a certain project we have unique requirement. The application supports an expression language that allows the user to define their own complex expressions that can be evaluated at run time (many hundred times a second) and they need to be executed at machine level for performance.
WORKING: Our expression parser translates the script into corresponding assembly language routine perfectly. We checked it by statically linking the object files generated with our C test program and they produce correct result.
Since the client can change the script anytime, our program (at run time) detects the change, calls the parser which generates the corresponding assembly routine. We then call the assembler from back end to create the object code.
PROBLEM
How can we call this assembly routine dynamically from the C++ program
(Loader)?
We are not supposed to call the C++ compiler to link it with the loader because the loader already would have other subroutines running and we cannot take the loader off, recompile and then execute the new loader program.
I tried searching for a solution online but every time the results are littered with .NET assembly dynamic calling. Our app has nothing to do with .NET.
First, the "generated plugin" approach (on Linux; my answer focuses on Linux but could be adapted to Windows with some effort; you could use many-platform frameworks like Qt or POCO or Glib from GTK; then all wrap plugin loading abilities à la dlopen with a common API that you could use on Windows, on Linux, on MacOSX, on Android) :
generate C (or assembly) code in some file /tmp/generated01.c (you might even generate C++ code using standard C++ containers, but its compilation would be significantly slower; beware of name mangling so emit and use extern "C" functions; read the C++ dlopen mini HowTo). See this answer explaining why generating C is worthwhile (and could be better, and more portable, than generating assembler code).
run (using fork+execve+waitpid, or simply system) a compilation of that generated file into a shared object /tmp/genenerated01.so by running gcc -fPIC -Wall -O /tmp/generated01.c -shared -o /tmp/generated01.so command; you practically need to get position-independent code, hence the -fPIC flag. If using dlopen on your generated assembler code you'll need to improve your assembler generator to emit PIC code.
dlopen that new /tmp/generated01.so (so use the dynamic linker), see dlopen(3); you could even remove the now useless generated C file /tmp/generated01.c
dlsym the relevant symbols to get function pointers to the generated code, see dlsym(3); your application would simply call the generated code using these function pointers.
when you are sure that you don't need any functions from it and that no call frame uses it, you could dlclose that shared object library (but you might accept to leak some address space by not calling dlclose at all)
The above approach is worthwhile and can be used a big lot of times (my manydl.c demonstrates that you could dlopen a million different shared objects), and is practically even compatible (even when emitting C code!) with an interactive Read-Eval-Print-Loop -on most current desktops and laptops and servers-, since most of the time the generated /tmp/generated01.c would be quite small (e.g. a few hundred lines at most) to be very quickly generated and compiled (by gcc, etc...). I am even using this in MELT for its REPL mode. On Linux this plugin approach generally requires to link the main application with -rdynamic (so that dlopen-ed plugins can reference and call functions from the main application).
Then, other approaches could be to use some Just-In-Time compilation library, like
GNU lightning (which emits slow machine code very quickly - so very short JIT emission time, but the generated code is running slowly since it is very unoptimized)
asmjit; it is x86-64 specific, and enables you to generate individual x86-64 machine instructions
GNU libjit is available for several platforms, and offer an "interpreter" mode for other platforms
LLVM (part of Clang/LLVM compiler, usable as a JIT library)
GCCJIT (a new JIT library front-end to GCC)
Grossly speaking, the first elements of that list are able to emit JIT machine code fairly quickly, but that code won't run as fast as compiling with gcc -fPIC -O1 or -O2 the equivalent generated C code (but would run typically 2x to 5x slower!); the last two elements (LLVM & GCCJIT) are compiler based: so they are able to optimize and emit efficient code, at the expense of slower JIT code emission. All the JIT libraries are able (like dlsym does for plugins) to give function pointers to newly JIT-constructed functions.
Notice that there is a trade-off to be made: some techniques are able to generate quickly some machine code, if you accept that generated code to later run a bit slowly; other techniques (notably GCCJIT or LLVM) are spending time to optimize the generated machine code, so takes more time to emit the machine code, but that code would later run quickly. You should not expect both (small generation time, quick execution time), since there is no such thing as a free lunch.
I believe that generating manually some assembler code is practically not worthwhile. You won't be able to generate very optimized code (because optimization is a very difficult art, and both GCC and Clang have millions of source line code for optimization passes), unless you spend many years of work for that. Using some JIT library is easier, and "compiling" to C or C++ is also quite easy (you leave the burden of optimization to the C compiler you are calling).
You could also consider rewriting your application into some language with homoiconicity and metaprogramming abilities (e.g. multi-stage programming), such as Common Lisp (and many others, e.g. those providing eval). Its SBCL implementation is always emitting machine code...
You could also embed an interpreter like Lua -perhaps even LuaJit- or Guile in your application. The main advantage of embedding an existing language is that there are resources (books, modules, ...) and community of people knowing them (designing a good language is difficult!). Also, the embedded interpreter library is well designed and probably well debugged (since used a lot), and some of them are fast enough (since using bytecode techniques).
As the comments already suggest, LoadLibrary (Windows) and dlopen (Linux/POSIX) are by far the easiest solution. These are specifically intended to dynamically load code. Equally important, they both allow unloading as well, and there are functions to then get a function entry point by name.
You can dynamically do it. I will take linux case as an example. Since your parser working fine and generates machine code, you should be able to generate .so (for linux) or .dll for windows.
Next, load the library as
handle = dlopen(so_file_name, RTLD_LAZY);
Next get function pointer
func = dlsym(handle, "function_name");
Then you should be able to execute it as func()
One thing you need to experiment (in case you do not get desired result) is close and open the so file or dll file (you need to do only if required, else it may reduce performance)
It sounds like you can generate the proper byte code. So you could just ensure that you generate position independent code, write it into an executable piece of memory, and then call or create thread upon the code. The simplest way would just be to cast the pointer to the base of the memory you wrote the code into as a function pointer, and then call it.
If you write your bytecode to avoid referencing different sections, and instead reference offsets from its loaded base, 'loading' the code is as easy as writing it to executable memory. You could do a call/pop/jmp to find the base of the code once it begins executing.
Conversely, and probably the easiest solution, would be to just write the code out as function expecting arguments, that way you could pass the code's base and any other arguments to it, as you would with any other function, as long as you use the proper typedef for your function pointer, and the generated assembly handles the arguments properly. As long as you avoid creating absolute jumps or data references to absolute addresses, you shouldn't have any issue.
too late but I think it would help someone else.
in case you want to dynamically execute a piece of code, you can create an interpreter for this.
compile your expressions into some byte code then write the interpreter for executing this.
here is a tutorial about writing interpreters, but in python.
https://ruslanspivak.com/lsbasi-part1/
you can write it using c/c++
As you might know, PIN is a dynamic binary instrumentation tool. By using Pin for example, I can instrument every load and store in my application. I was wondering If there is a similar tool which injects code at compile time (Using a higher level of information, not requiring us to write the LLVM pass), rather than at runtime like Pin. I am especially interested for such kind of tool for LLVM.
You could write LLVM passes of your own and apply them on your code to "instrument" it during compile time. These work on LLVM IR and produce LLVM IR, so for some tasks this will be a very natural thing to do and for other tasks it might be cumbersome or difficult (because of the differences between LLVM and IR and the source language). It depends.
Is it possible somehow to write a compiler producing LLVM IR code which user will JIT compile and after compiling it in memory it would be written to disk as binary file?
The idea behind this scenario is that I dont want to compile LLVM IR code and let users to execute it immediately (with lower performance due to JIT compiling). But I want that when users execute this program second time it would be already compiled?
So the question is how to reuse code produced by JIT when generating native binaries? I doubt there is API to do this, but remembering how MC JIT works, it might be relatively easy to implement.
But from my POV it's better to jsut compile LLVM IR into native code on the second run.
I was reading here and there about llvm that can be used to ease the pain of cross platform compilations in c++ , i was trying to read the documents but i didn't understand how can i
use it in real life development problems can someone please explain me in simple words how can i use it ?
The key concept of LLVM is a low-level "intermediate" representation (IR) of your program.
This IR is at about the level of assembler code, but it contains more information to facilitate optimization.
The power of LLVM comes from its ability to defer compilation of this intermediate representation to a specific target machine until just before the code needs to run. A just-in-time (JIT) compilation approach can be used for an application to produce the code it needs just before it needs it.
In many cases, you have more information at the time the program is running that you do back at head office, so the program can be much optimized.
To get started, you could compile a C++ program to a single intermediate representation, then compile it to multiple platforms from that IR.
You can also try the Kaleidoscope demo, which walks you through creating a new language without having to actually write a compiler, just write the IR.
In performance-critical applications, the application can essentially write its own code that it needs to run, just before it needs to run it.
Why don't you go to the LLVM website and check out all the documentation there. They explain in great detail what LLVM is and how to use it. For example they have a Getting Started page.
LLVM is, as its name says a low level virtual machine which have code generator. If you want to compile to it, you can use either gcc front end or clang, which is c/c++ compiler for LLVM which is still work in progress.
It's important to note that a bunch of information about the target comes from the system header files that you use when compiling. LLVM does not defer resolving things like "size of pointer" or "byte layout" so if you compile with 64-bit headers for a little-endian platform, you cannot use that LLVM source code to target a 32-bit big-endian assembly output pater.
There is a good chapter in a book explaining everything nicely here: www.aosabook.org/en/llvm.html