I've been reading about LLVM and clang and I understand that the LLVM framework generates object files for the target machine and the system linker (ld, for linux) does the dynamic linking.
But I don't understand how things work when JIT compilation takes place. Is the system linker still invoked in this case? I see that there is a RuntimeDyld.cpp file. Is it to link between different bitcode files being JIT compiled, or does it link to other shared libraries that are loaded dynamically?
I wrote a simple toy language compiler frontend that generates LLVM IR using llvm-sys (Rust bindings for LLVM's C library). I then generated an object file by creating a LLVMTargetMachine based on the machine's target triple then calling LLVMTargetMachineEmitToFile, which successfully generates an executable. However, running the executable produces zsh: exec format error: ./a.out.
I figured out that I had to run ld -lSystem ./a.out after generating the executable to make it work. How should I automatically call the linker in code?
Currently using LLVM 9.0 on macOS Catalina.
Indeed, LLVMTargetMachineEmitToFile produces an object file that still needs to be linked - either into an executable or shared library.
You need a linker for this, which is not, strictly speaking, a part of LLVM. LLVM's integrated linker operates on LLVM IR, not native machine code.
However, just as there is an LLVM-related C compiler, Clang, there is also an LLVM-related native linker called LLD. AFAIK, it can be used as a library, so you can give your compiler an integrated linker.
It is worth noting that native compilers follow a "pipeline" architecture, in which the compiler proper and the linker (and sometimes the assembler too) are completely decoupled from one another. In such an architecture, the compiler executable (like clang or g++) is actually a driver program that invokes other programs (cc1, the real compiler, and ld, the linker) to produce the final binary.
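To illustrate the driver idea, here is a minimal sketch of how a toy compiler could run the link step itself on a POSIX system (macOS or Linux). It assumes a C compiler driver named cc is on PATH and lets that driver choose the platform's default libraries instead of calling ld directly. The function names buildLinkCommand and runCommand and the file names are hypothetical.

```cpp
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Build the argument list for linking one object file into an executable.
// Assumption: "cc" is on PATH and knows the default libraries and startup
// files for this platform, so no explicit -lSystem or -lc is passed.
std::vector<std::string> buildLinkCommand(const std::string& objectPath,
                                          const std::string& outputPath) {
    return {"cc", objectPath, "-o", outputPath};
}

// Run a command and return its exit code, or -1 if it could not be run.
int runCommand(const std::vector<std::string>& args) {
    if (args.empty()) return -1;

    // execvp expects a null-terminated array of char pointers.
    std::vector<char*> argv;
    for (const std::string& arg : args) {
        argv.push_back(const_cast<char*>(arg.c_str()));
    }
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv.data());
        _exit(127);  // only reached if exec failed
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

After writing the object file (say, to a.o), a call such as runCommand(buildLinkCommand("a.o", "a.out")) would perform the link. In Rust, std::process::Command plays the same role as the fork/exec pair here.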
I'm searching for a C or C++ library that can load and link object files (it doesn't matter whether they are ELF or .obj) dynamically at runtime. I spent some time searching for such a library, but without success.
What I tried:
LLVM:
Currently my best solution! I used Clang to generate .obj files in LLVM's bitcode format and used LLVM's JIT functions to dynamically load and execute the function. But LLVM is huge, and my PC at home doesn't have the power to compile the complete LLVM just for the JIT. I also ran into problems with relocation overflows and unimplemented relocation types.
libjit:
I read that it can load .elf files and link them too. Sadly, I couldn't compile it for Windows, so I couldn't try it.
Nanojit and NativeJit:
It seems like they don't support JITting an object file.
So... What can I do? Do I have to stick around with the LLVM? Are there any alternatives?
I suppose that, as a first approximation, a .bc file is similar to a .o (or .obj) file: it is just the translation of C++ code into an intermediate language, and it can contain references to functions that are not defined in it and must be found in libraries.
And the JIT-ted code is similar to a DLL, in the sense that it is linked dynamically into the executable in which it will run.
You do not need to compile LLVM: you can download prebuilt binaries of LLVM and assorted utilities (like clang) from the LLVM Download Page.
I have a hostapp.cpp that loads a shared object, object.so, at run-time. The shared object is compiled using only the needed .h files from the host app, but at run-time it needs to access functions that are defined in the host app.
Compiling the host app with -rdynamic apparently solves this issue but it unnecessarily exposes the object to the full symbol table of the host app, even though it only needs to resolve a few of them.
How can I specify exactly what host-app symbols will be known by the shared object?
Edit: I'm building and running on GNU/Linux with the GNU toolchain.
Your question is under-specified: you never said what platform you are building for, what linker you use, etc.
Assuming you build for Linux, you can specify symbols to export from the main executable using one of the following methods:
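If you are using gold (an ELF linker shipped with GNU binutils), --export-dynamic-symbol will do what you need.
If you are using the classic binutils linker (GNU ld), you can use a linker version script to do the same. GNU ld also accepts --dynamic-list=FILE, which lists the symbols to export from an executable.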
You can mark the symbols to be exported with __attribute__((visibility("default"))), compile with -fvisibility=hidden, and link with -rdynamic. That should hide most of the symbols, but will not work well if you link in libraries which you can't recompile.
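A minimal sketch of the visibility approach from the last item, assuming GCC or Clang on GNU/Linux. The macro HOST_API and the functions host_add and host_internal are hypothetical names chosen for illustration; they are not part of the original host app.

```cpp
// hostapp.cpp (fragment)

// Hypothetical convenience macro around the GCC/Clang visibility attribute.
#define HOST_API __attribute__((visibility("default")))

// Annotated: keeps default visibility, so with -rdynamic it lands in the
// executable's dynamic symbol table and object.so can resolve it.
extern "C" HOST_API int host_add(int a, int b) { return a + b; }

// Not annotated: becomes hidden when the file is compiled with
// -fvisibility=hidden, so -rdynamic does not export it.
extern "C" int host_internal(int x) { return x * 2; }

// Assumed build line for the host app (GNU toolchain):
//   g++ -fvisibility=hidden -rdynamic hostapp.cpp -o hostapp -ldl
```

With such a build, running nm -D hostapp should list host_add but not host_internal. Both functions remain callable from inside the host app itself.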
I am using a Windows host machine to cross compile programs on to a Linux RT platform using a GCC cross compiler.
Assume, the C program I write, links to a shared library libShared.so as I am using functions defined in that library inside my program.
In the Eclipse editor I provide the library name (libShared.so) and its path under the linker options.
Now when I compile the program, I get compilation errors because libShared.so links to multiple other libraries (e.g. lib11.so, lib12.so, lib13.so), even though I am not explicitly calling any functions from these libraries.
My question is: why should the compiler generate errors when I don't explicitly use functions defined in those libraries?
However, when I specify the names and paths of the libraries linked by libShared.so, the compilation passes.
Is the Linker part of the Operating System or the Compiler/IDE?
It is part of the compiler/IDE. Or to be precise, the compiler and the linker are separate programs (invoked at separate phases of building an executable), but usually the whole bunch (which includes several other executables) is referred to as a compiler, e.g. gcc.
The linker is not part of the OS, although some OSs (such as Linux) may come bundled with one (or even multiple) linker(s) as part of some compiler toolchain(s). Regardless of this, you can install and use several different compilers (which include their own linker each) on the same OS. E.g. on a Windows OS you can have both gcc and msvc installed, although gcc can't be used with the Visual Studio IDE, as it is bundled only with msvc. But AFAIK Eclipse can use either.
Update: you seem to be confused by the name similarity between the linker in the compiler toolchain and the dynamic linker of an OS.
The linker of the compiler toolchain does its job during the build process, when it needs to patch together different compilation units to form a coherent executable program. Often, the code contains calls to external libraries; these libraries can be either static or dynamic. Static libraries are basically storages of executable methods, which the linker can physically copy into an executable. Dynamic libraries contain methods which need not be copied; instead, the linker stores a sort of reference to the library method into the executable. When the executable is run, the dynamic library is loaded with the help of the OS, and the library method is then called. This is accomplished by a part of the OS which is, rather unfortunately, called the dynamic linker - however this is entirely different from the linker in the compiler toolchain, and should rather be called loader.
Dynamic libraries can be shared in memory, i.e. the same library code can be used by multiple executables in parallel (hence they are also referred to as shared libraries), whereas code copied from static libraries is duplicated across all executables.
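The run-time side described above can also be driven explicitly from a program. The following is a minimal sketch, assuming a POSIX system with dlopen and dlsym available; the helper name callUnary is hypothetical. It asks the dynamic linker to load a shared library, looks up a function of type double(double), and calls it.

```cpp
#include <dlfcn.h>

// Load libraryPath at run time, look up symbolName as a double(double)
// function, call it with x, and store the result in *result.
// Returns false (leaving *result untouched) if the library or the symbol
// cannot be found.
bool callUnary(const char* libraryPath, const char* symbolName,
               double x, double* result) {
    // RTLD_NOW asks the dynamic linker to resolve all symbols immediately.
    void* handle = dlopen(libraryPath, RTLD_NOW);
    if (handle == nullptr) return false;

    using UnaryFn = double (*)(double);
    UnaryFn fn = reinterpret_cast<UnaryFn>(dlsym(handle, symbolName));
    bool found = (fn != nullptr);
    if (found) *result = fn(x);

    dlclose(handle);
    return found;
}
```

On a glibc-based Linux system, a call such as callUnary("libm.so.6", "cos", 0.0, &y) would be one way to try it; the library file name is platform-specific, and older glibc versions need -ldl on the link line.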
The linker is part of the compiler tool chain (preprocessor -> compiler -> assembler -> linker).
It is usually part of the compiler. Technically, the compiler and the linker are different tools, but they usually come together.