Load and link obj files at runtime in C/C++ (JIT)

I'm searching for a C or C++ library which can load and link object files (it doesn't matter whether ELF or .obj) dynamically at runtime. I spent some time searching for such a library, but without success.
What I tried:
LLVM:
Currently my best solution! I used Clang to generate object files in LLVM's bitcode format and used its JIT functions to dynamically load and execute the function. But LLVM is huge, and my PC at home doesn't have the power to compile all of LLVM just for the JIT. I also ran into problems with relocation overflows and unimplemented relocation types.
libjit:
I read that it can load .elf files and link them too. Sadly, I couldn't compile it for Windows, so I couldn't try it.
Nanojit and NativeJit:
It seems like they don't support JITting an object file.
So... What can I do? Do I have to stick around with the LLVM? Are there any alternatives?
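To make the "relocation overflow" error concrete: when an object file is loaded at runtime, the loader must patch every relocation site once it knows where the sections and symbols ended up. Below is a minimal sketch of two x86-64 ELF relocation types, using the psABI notation S (symbol address), A (addend) and P (address being patched). It assumes a little-endian host, the function names are hypothetical, and it is not how LLVM's own loader is written; a real loader supports many more relocation types. R_X86_64_PC32 can only reach targets within about +/-2 GiB, which is one common cause of overflows when JIT-ted code and the host functions it calls are mapped far apart.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// R_X86_64_64: store the 64-bit absolute value S + A at section[offset].
void apply_abs64(std::uint8_t* section, std::uint64_t offset,
                 std::uint64_t S, std::int64_t A) {
    assert(section != nullptr);
    std::uint64_t value = S + static_cast<std::uint64_t>(A);
    std::memcpy(section + offset, &value, sizeof value);  // little-endian host assumed
}

// R_X86_64_PC32: store the 32-bit PC-relative value S + A - P, where P is the
// final address of the patched field. Returns false on a relocation overflow,
// i.e. when the result does not fit in a signed 32-bit field.
bool apply_pc32(std::uint8_t* section, std::uint64_t section_addr,
                std::uint64_t offset, std::uint64_t S, std::int64_t A) {
    assert(section != nullptr);
    std::uint64_t P = section_addr + offset;
    std::int64_t value =
        static_cast<std::int64_t>(S + static_cast<std::uint64_t>(A) - P);
    if (value < std::numeric_limits<std::int32_t>::min() ||
        value > std::numeric_limits<std::int32_t>::max())
        return false;  // target too far away for a 32-bit displacement
    std::int32_t v32 = static_cast<std::int32_t>(value);
    std::memcpy(section + offset, &v32, sizeof v32);
    return true;
}
```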

I suppose a first-approximation analogy is that a .bc file is similar to a .o (or .obj) file: it is just the translation of C++ code into an intermediate language, and it can contain references to functions not defined in it, which are to be looked up in libraries.
And the JIT-ted code is similar to a DLL, in the sense that it will be dynamically linked into the executable in which it runs.
You don't need to compile LLVM: you can download prebuilt binaries of LLVM and assorted utilities (like clang) from the LLVM Download Page.
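Following the DLL analogy above, the core of "linking" JIT-ted code into a running program is resolving each undefined reference against symbols the host already provides. The sketch below models only that step with a hypothetical SymbolTable (a plain name-to-address map) and a hypothetical resolve function; a real JIT typically gets these addresses from the process's dynamic symbol table or from a user-supplied resolver.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical symbol table: symbol name -> address it resolves to, e.g.
// functions and data exported by the host executable or loaded libraries.
using SymbolTable = std::unordered_map<std::string, const void*>;

// Resolve every undefined reference of a module against the table.
// Addresses are written to `out` in the same order as `undefined` (nullptr if
// not found); the names that could not be resolved are returned, like the
// "unresolved external symbol" errors a linker would report.
std::vector<std::string> resolve(const std::vector<std::string>& undefined,
                                 const SymbolTable& table,
                                 std::vector<const void*>& out) {
    std::vector<std::string> missing;
    out.clear();
    for (const auto& name : undefined) {
        auto it = table.find(name);
        out.push_back(it != table.end() ? it->second : nullptr);
        if (it == table.end()) missing.push_back(name);
    }
    return missing;
}
```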

Related

How to use llvm libraries

I am working in a project that consist of some C++ teams. Each team delivers libraries and our team is integrating these libraries into a front end application.
The application is cross-platform, which means the other teams have to provide the same (static) libraries compiled for different platforms/CPU architectures/configurations, e.g. Visual Studio 2015/2013, 32-bit/64-bit, Linux, Debug/Release, etc.
It would be nice to reduce the number of these static library "manifests", so I was looking into Clang/LLVM. The idea would be to compile the static libraries into LLVM bitcode and use the llvm-ar tool to create an LLVM static library. When we have to make the binaries for a specific platform, we would use llc (the LLVM static compiler) to create the native-code static library and do the linking with the platform linker.
Questions:
is there a better way to do what I want to achieve?
llc does not seem to support compiling a static library, only individual translation units (.bc -> .o). Of course I can extract each individual bitcode file, compile it to a native object file, and use the platform librarian tool (lib/ar) to make the static library, but I wonder if there is a more streamlined solution.
the gold linker seems to do what I need, but it appears to be restricted to the ELF format. I have to support Windows/Linux and maybe iOS.
LLVM IR generated from a target-specific and platform-specific language (C/C++) won't be target-neutral. Think about type sizes, alignments, ABI requirements, etc. Not to mention pure source-code features like the preprocessor. So, no, the approach you thought about won't work at all.
See LLVM bitcode cross-platform for some more information.
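As a concrete illustration of the type-size point: clang bakes sizeof(long) and struct offsets into the IR it emits, and these differ between 64-bit Linux (LP64, 8-byte long) and 64-bit Windows (LLP64, 4-byte long). The sketch computes the layout of a hypothetical struct { char c; long l; } under both data models, assuming long is naturally aligned (alignment equal to its size), so bitcode compiled with one layout cannot simply be reused on the other platform.

```cpp
#include <cassert>
#include <cstddef>

// Layout of the hypothetical struct { char c; long l; }.
struct Layout { std::size_t offset_l; std::size_t size; };

constexpr std::size_t align_up(std::size_t n, std::size_t a) {
    return (n + a - 1) / a * a;
}

// Assumes natural alignment: alignof(long) == sizeof(long).
constexpr Layout layout_char_long(std::size_t sizeof_long) {
    std::size_t off = align_up(1, sizeof_long);                   // padding after c
    std::size_t size = align_up(off + sizeof_long, sizeof_long);  // tail padding
    return {off, size};
}

// LP64 (e.g. x86-64 Linux): long is 8 bytes.
static_assert(layout_char_long(8).offset_l == 8 && layout_char_long(8).size == 16, "LP64");
// LLP64 (64-bit Windows): long is 4 bytes.
static_assert(layout_char_long(4).offset_l == 4 && layout_char_long(4).size == 8, "LLP64");
```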

Difference between code object and executable file

I'm a C++ beginner and I'm studying the basics of the language. There is a topic in my book about the compiler, and my problem is that I cannot understand what the text is trying to say:
C++ is a compiled language, so you need to translate the source code into
a file that the computer can execute. This file is generated by the
compiler and is called the object code (.obj), but a program like
the "hello world" program is composed of a part that we wrote and a part
from the C++ library. The linker links these two parts of a program and
produces an executable file (.exe).
Why does my book say that the file executed by the computer is the one with the .obj suffix (the object code), and then say that it is the one with the .exe suffix?
Object files are source compiled into binary machine language, but they contain unresolved external references (such as printf, for instance). They may need to be linked against other object files, third-party libraries, and almost always against the C/C++ runtime library.
On Unix, object files and executables share the same file format (COFF on some older systems, ELF on modern ones such as Linux). The only difference is that object files have unresolved external references, while executables such as a.out don't.
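The idea of unresolved external references can be modeled in a few lines. This is a toy sketch (ObjectFile and unresolved are hypothetical names, not a real linker API): each object file lists the symbols it defines and the ones it only references, and linking can succeed only when every reference is defined by some input, for example the C runtime library providing printf.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy object file: symbols it defines and external symbols it references.
struct ObjectFile {
    std::string name;
    std::set<std::string> defined;
    std::set<std::string> undefined;
};

// Very simplified "link step": collect all definitions, then report each
// reference that no input defines. An empty result means the inputs could be
// combined into an executable. Real linkers also apply relocations, merge
// sections, search libraries on demand, detect duplicate definitions, etc.
std::set<std::string> unresolved(const std::vector<ObjectFile>& inputs) {
    std::set<std::string> all_defined;
    for (const auto& obj : inputs)
        all_defined.insert(obj.defined.begin(), obj.defined.end());
    std::set<std::string> missing;
    for (const auto& obj : inputs)
        for (const auto& sym : obj.undefined)
            if (!all_defined.count(sym)) missing.insert(sym);
    return missing;
}
```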
The C++ specification is a technical document in English. For C++11 have a look inside n3337 (or spend a lot of money to buy the paperback ISO standard). In theory you don't need a computer to run a C++ program (you could use a bunch of human slaves, but that would be unethical, inefficient, and unreliable).
You could have a C++ implementation which is an interpreter, not a compiler (e.g. Ch by SoftIntegration).
If you install Linux on your laptop (which I recommend to every student), then you could have several free software C++ compilers, in particular GCC and Clang/LLVM (using the g++ and clang++ commands respectively). Source files are suffixed .cc, .cxx, .cpp, or even .C (I prefer .cc), and you could ask the compiler to treat a file with some other suffix as a C++ source file (but that is not conventional). Then both object files (suffixed .o) and executables share the same ELF format. Conventionally, executables don't have any suffix (e.g. g++ is a binary executable that does little except start other processes such as cc1plus (the compiler proper), as (the assembler), and ld (the linker), etc.).
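You can see this distinction in the ELF header itself: the 16-bit e_type field that follows the 16-byte e_ident array is ET_REL (1) for an object file and ET_EXEC (2) for a classic executable, while position-independent executables and shared libraries are both ET_DYN (3). The sketch below decodes it from a buffer (elf_kind is a hypothetical helper name); readelf -h reports the same field as "Type:". Reading the first 18 bytes of hello.o or hello with std::ifstream and passing them in is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Classify an ELF file from its first bytes by reading the e_type field.
std::string elf_kind(const unsigned char* buf, std::size_t len) {
    if (len < 18 || buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return "not ELF";
    // e_ident[EI_DATA] at offset 5: 1 = little-endian, 2 = big-endian.
    // e_type is the 16-bit field at offset 16, right after e_ident.
    std::uint16_t type = (buf[5] == 2) ? (buf[16] << 8 | buf[17])
                                       : (buf[16] | buf[17] << 8);
    switch (type) {
        case 1: return "relocatable object (ET_REL)";
        case 2: return "executable (ET_EXEC)";
        case 3: return "shared object or PIE executable (ET_DYN)";
        case 4: return "core dump (ET_CORE)";
        default: return "other";
    }
}
```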
In all cases I strongly recommend:
to enable all warnings and debug info during compilation (e.g. use g++ -Wall -g ....)
to improve your source code until you get no warnings
to learn how to use the debugger (gdb)
to be able to build your program on the command line
to use a version control system like git
to use a good editor like emacs, gedit, geany, or gvim
once you are writing programs in several source files, learn how to use a builder like make
to learn C++11 (or even perhaps C++14) rather than older C++ standards
to also learn other programming languages (Ocaml, Scheme, Haskell, Prolog, Scala, ....) since they would improve your thinking and your way of coding in C++
to study the source code of several free software projects coded in C++
to read the documentation of every function that you are using, e.g. on cppreference or in man pages (for Linux)
to understand what undefined behavior is (the fact that your program sometimes works does not make it correct).
Concretely, on Linux you could edit your Hello World program (file hello.cc) with gedit or emacs (with a command like gedit hello.cc), compile it with the g++ -Wall -g hello.cc -o hello command, debug it using gdb ./hello, and repeat (don't forget to use git commands for version control).
Sometimes it makes sense to generate some C++ code, e.g. by some shell, Python, or awk script (or even by your own program coded in C++ which generates C++ code!).
Also, understand that an IDE is not a compiler (but runs the compiler for you).
The basic steps for creating an application from a C or C++ source file are as follows:
(1) the source files are created (by a person or generated by a program); (2) the source files are compiled (which is really two steps, preprocessing and compilation) into object code; (3) the object files created by the C/C++ compiler are linked to create the .exe.
So you have these steps of transforming one version of the computer program, the source files, to another, the executable. The C++ source is compiled to produce the object files. The object files are then linked to produce the executable file.
In most cases there are several different programs involved in the compile and link process with C and C++. Each program takes in certain files and creates new files.
C/C++ Preprocessor takes in source code files and generates source code files
C/C++ Compiler takes in source code files and generates object code files
the linker takes in object code files and libraries and generates executable files
See What is the difference between - 1) Preprocessor,linker, 2)Header file,library? Is my understanding correct?
Most compiler installations have a driver program that runs these various tools for you. So if you are using gcc, the gcc program will first run the C++ preprocessor, then the C++ compiler, and then the linker. However, you can modify what gcc does with command-line options, telling it to only run the C++ preprocessor (-E), to only compile the source files without linking them (-c), or to only link the object code files.
A brief history of computer languages and programming
The languages used for programming computers along with the various software development tools have evolved over the years.
The first computers were programmed with numbers entered by switches on a console.
Then people started developing languages and software that could be used to create software more easily and more quickly. The first major development was assembly language, where each line of source was converted by a computer program (the assembler) into a machine code instruction. Along with this came the development of linkers (which link pieces of machine code together into larger pieces). Assemblers were improved by adding a macro or preprocessor facility somewhat like the C/C++ preprocessor, though designed for assembly language.
Then people created programming languages that looked more like human languages than assembly (FORTRAN, COBOL, and ALGOL, for instance). These languages were easier to read, and a single line of source might be converted into several machine instructions, so it was more productive to write programs in these languages than in assembly.
The C programming language was a later refinement using lessons learned from early programming languages such as FORTRAN, and C reused existing software development tools such as linkers. Still later, C++ was invented, starting off as a refinement of C that introduced object-oriented facilities. In fact, the first C++ compiler (Cfront) was really a translator that converted C++ source code to C source code, which was then compiled with a C compiler. However, modern C++ is compiled straight to machine code in order to provide the full functionality of the C++ standard with templates, lambdas, and all the other features of C++11 and later.
linkers and loaders
When you run a program you run the executable file. The executable file contains several kinds of information. The first is the machine instructions that are the result of compiling the C++ source code. The other is information that the loader uses in order to know how to load the executable into memory.
In the old days, long long ago all libraries and object files were linked together into an executable file and the executable file was loaded by the loader and the loader was pretty simple.
Then people invented shared libraries and dynamic link libraries and this required the linker to be more complex and the loader to be more complex.
The linker became more complex because it had to be able to recognize the difference between using a shared library and a static library and be able to generate an executable file that not only contains the linked object code but also information for the loader about any dynamic libraries.
The loader became more complex because not only does the loader have to load the executable file into memory so that it can start running, the loader must also find any shared libraries or dynamic link libraries that are also needed and load those too. And the loader also has to do a certain amount of linking of the additional components, the shared libraries, so the loader does a lot more than it used to do.
See also
Difference between shared objects (.so), static libraries (.a), and DLL's (.so)?
What is an application binary interface (ABI)?
How to make a SIMPLE C++ Makefile
Object code (within an object file): Output from a compiler intended as input for a linker (for the linker to produce executable code).
Executable: A program ready to be run (executed) on a computer

Finding all libraries and header files forming a C++ executable

If I have a C++ source file, gcc can give all its dependencies, in a tree structure, using the -H option. But given only the C++ executable, is it possible to find all libraries and header files that went into its compilation and linking?
If you've compiled the executable with debugging symbols, then yes, you can use the symbols to get the files.
If you have .pdb files (Visual Studio creates them to store debugging information separately), you can use all kinds of programs to open them and see the source files and methods.
You can even open one with a text editor, and you'll see, among the gibberish, a list of the functions and source files.
If you're using Linux (or GNU compilers in general), you can use gdb (again, only if you enabled debug symbols at compile time).
Run gdb on your executable, then run the command: info sources
That's an important reason why you should always remove the debug-info flag (-g with GCC) when going into production. You don't want clients to mess around with your sources, functions, and code.
You cannot do that, because that executable might have been built on a machine on which the header files (or the C++ code, or the libraries) are private or even generated. Also, if a static library is linked in, you have no reliable way to find out.
In practice, however, on Linux, using nm, objdump, or ldd on the executable will often (but not always) give you a good clue about the needed libraries.
Also, some executables dynamically load a plugin, e.g. using dlopen, so your question might not make sense (since that plugin is known only at runtime).
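For reference, this is roughly what such plugin loading looks like with the POSIX dlopen/dlsym API (Linux, macOS). To keep the sketch self-contained, it opens the main program (dlopen with a null path) and looks up strlen from the C library; a real host would pass a plugin path chosen at runtime (for example read from a config file), which is exactly why no static tool can list it. The function name strlen_via_dlsym is hypothetical, and on glibc older than 2.34 you also need to link with -ldl.

```cpp
#include <cassert>
#include <cstddef>
#include <dlfcn.h>

// Look up a symbol at runtime. A plugin host would call something like
// dlopen("./plugin.so", RTLD_NOW) instead (hypothetical path); nullptr gives
// a handle to the main program and the libraries it has already loaded.
std::size_t strlen_via_dlsym(const char* s) {
    void* handle = dlopen(nullptr, RTLD_NOW);
    if (!handle) return static_cast<std::size_t>(-1);
    using strlen_fn = std::size_t (*)(const char*);
    auto fn = reinterpret_cast<strlen_fn>(dlsym(handle, "strlen"));
    std::size_t n = fn ? fn(s) : static_cast<std::size_t>(-1);
    dlclose(handle);  // the C library stays loaded; it was not loaded by us
    return n;
}
```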
Notice also that you might not know if an executable is obtained by compiling some C++ code (you might not be able to tell if it was obtained from C, C++, D, or Ocaml, ... source code, or a mixture of them).
On Linux, if you build an executable with static linking and stripping, people won't be able to easily guess the source programming language that you have used.
BTW, on Linux distributions, it is the role of the package management system to deal with such dependencies.
As answered by Yochai Timmer if the executable contains debug information (e.g. in DWARF format) you should be able to get a lot more information.

position independent executable (-pie) for arm(cortex-m3)

I'm programming for the STM32 (Cortex-M3) with CodeSourcery G++ Lite (based on GCC 4.7.2), and I want the executables to be loaded dynamically.
I know I have two options available:
1. relocatable ELF, which needs an ELF parser.
2. position independent code (PIC) with a global offset register
I prefer PIC with a global offset register, because it seems easier to implement and I'm not familiar with ELF or any ELF library. Also, it's easy to generate a .bin file from an ELF file with some tools.
I've tried building my program with the "-msingle-pic-base -fpic" compiler options and the "-pie" linker option, but then I got a linking error:
...path...ld.exe: ...path...thumb2\libstdc++.a(pure.o): relocation
R_ARM_THM_MOVW_ABS_NC against `a local symbol' can not be used when
making a shared object; recompile with -fPIC
I don't quite understand the error message. It seems the default standard C/C++ library can't be used with my options, and I need to get the source of the library and rebuild it for my own purpose.
So,
1. Could anyone provide any useful information or links on how to work with position-independent executables?
2. with the -msingle-pic-base option, I don't need to care too much about the GOT and ld script anymore, right?
Note: Without the "-pie" linker option I can build the program, but it fails when calling a C++ virtual function (when I'm using the IDE's (Keil) simulator to debug it). I don't understand what's going on or what I've been missing.
----------------------------------------------------------------------
-- added 20130314
with the -msingle-pic-base option, I don't need to care too much about the GOT and ld script anymore, right?
From my experiments, the register (r9 in my program) should point to the beginning of the .got.plt section. If I remove the "-pie" option, linking succeeds and (with r9 properly set) the C++ virtual function is called successfully. However, I still think the "-pie" option is important, as it may ensure that the current standard library is position independent. Could anyone explain this to me?
----------------------------------------------------------------------
-- added 20130315
I took a look at the ABI documents on ARM's website, but they were of little help because they don't target a specific platform. There seems to be a concept of EABI (I'm using Sourcery's arm-none-eabi edition), but I couldn't find any documentation on "EABI" on ARM's website. Nor can I find documentation on this topic from Sourcery or GCC. There is more than one implementation of PIC, so which one does Sourcery G++ use in the none-eabi case? I think the behavior of the "-msingle-pic-base", "-fpie", and "-pie" options is very poorly documented!
-----------------------------------------------------------------------
From the disassembly, I just figured out that with "-msingle-pic-base", r9 should point to the base address of the ".got" section, the pointers in the .got section are absolute pointers, and variable addressing is similar to the description in the article Position Independent Code (PIC) in shared libraries. So I still need to modify the ".got" section at load time. I don't know what the ".got.plt" section is used for in my program. It seems that function calls use PC-relative addressing.
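The addressing scheme described above can be modeled in plain C++: code built this way reaches a global by first loading the global's address from a GOT slot relative to the PIC register, then dereferencing it. In this hypothetical sketch, got_base stands in for r9 and slot for the GOT index the compiler chose; it shows why r9 must point at the .got and why the absolute addresses stored there must match the actual load address. The instruction comments are only a rough correspondence, not actual compiler output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of a PIC load of a global variable through the GOT.
// got_base plays the role of the PIC register (r9); each GOT slot holds the
// absolute address of one global.
int load_global_via_got(const std::uintptr_t* got_base, std::size_t slot) {
    // Roughly "ldr rX, [r9, #offset]": fetch the variable's address.
    std::uintptr_t address = got_base[slot];
    // Roughly "ldr rY, [rX]": fetch the variable's value.
    return *reinterpret_cast<const int*>(address);
}
```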
How to build with the "-pie" or how to link a standard library compiled with "-fpic" is still a problem for me.
The error message tells you to recompile the libstdc++ library, which is usually built when the gcc compiler itself is built.
Thus you must recompile your standard libraries (libstdc++, libgcc_*, libc, libm and all the rest) with -fPIC and link your project against them.
If you rely on prebuilt compiler packages, you're mostly out of the game in the microcontroller world. If you build your compiler yourself (which is, by the way, not too difficult, but an advanced/expert task) you are on the go.
It is also possible to compile your standard libraries yourself with the compiler you have. You will need the sources of the libraries, and you will have to figure out how the compiler package's build system builds them and mimic it. Perhaps there are some experts here who can advise you on this path.
There's a nice blog post on this topic, published eight years after the question was first asked, but it's there: https://mcuoneclipse.com/2021/06/05/position-independent-code-with-gcc-for-arm-cortex-m/
The general outline is that you have to:
Set up GOT from linker-generated information
Set up PLT from Program Header information
Implement a binder based on the GOT entries
Compile your library as a shared relocatable binary: -msingle-pic-base -mpic-register=r9 -mno-pic-data-is-text-relative -fPIC
Set R9 accordingly
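The "set up GOT" step above (and the asker's observation that the .got holds absolute pointers) boils down to adding the load offset to every GOT entry before the image runs. A minimal sketch, assuming every entry is an address inside the image and that both the link and load addresses are known; relocate_got, link_base and load_base are hypothetical names, and on the target got and count would come from symbols defined in your linker script.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Shift every GOT entry from the address the image was linked at to the
// address it was actually loaded at. Entries that do not point into the
// image, if any, would need different handling in a real loader.
void relocate_got(std::uint32_t* got, std::size_t count,
                  std::uint32_t link_base, std::uint32_t load_base) {
    assert(got != nullptr || count == 0);
    const std::uint32_t delta = load_base - link_base;  // unsigned wrap handles negative shifts
    for (std::size_t i = 0; i < count; ++i)
        got[i] += delta;
    // Afterwards the loader would set the PIC register (r9 with
    // -mpic-register=r9) to the GOT's load address before calling into the
    // image; that needs target-specific assembly and is omitted here.
}
```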

Converting COFF lib file to OMF format

Is there any way to convert a COFF library (.lib file) to an OMF library for use with C++Builder 6? This COFF file is not just an import library; it contains some code.
When I try to convert it using Borland's coff2omf.exe, I get a 1 KB file from a 15 KB file.
Instead of the Digital Mars converter, you may use the object file converter objconv, available at agner.org/optimize:
This utility can be used for converting object files between COFF/PE,
OMF, ELF and Mach-O formats for all 32-bit and 64-bit x86 platforms.
Can modify symbol names in object files. Can build, modify and convert
function libraries across platforms. Can dump object files and
executable files. Also includes a very good disassembler supporting
the SSE4, AVX, AVX2, AVX512, FMA3, FMA4, XOP and Knights Corner
instruction sets. Source code included (GPL).
This is a great site for low-level optimization, and the associated manual PDF contains a lot of useful information about library formats across several platforms.
It's fairly typical for an OMF object file to be a lot smaller than an equivalent COFF object, so what you're getting may well be valid.
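A quick way to check what you actually got is to look at the first bytes of each file. The sketch below (object_format is a hypothetical helper) uses well-known markers: ar-style libraries (MSVC COFF .lib and Unix .a) start with "!<arch>\n", OMF libraries start with a LIBHDR record (0xF0), OMF object modules start with a THEADR (0x80) or LHEADR (0x82) record, and a COFF object starts with its little-endian Machine field (0x014C for x86, 0x8664 for x64). It is a heuristic, not a validator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Rough identification of a .obj/.lib file from its first bytes.
std::string object_format(const unsigned char* buf, std::size_t len) {
    if (len >= 8 && std::memcmp(buf, "!<arch>\n", 8) == 0)
        return "COFF/ar library";       // MSVC .lib and Unix .a archives
    if (len >= 1 && buf[0] == 0xF0)
        return "OMF library";           // LIBHDR record
    if (len >= 1 && (buf[0] == 0x80 || buf[0] == 0x82))
        return "OMF object";            // THEADR / LHEADR record
    if (len >= 2) {
        unsigned machine = buf[0] | buf[1] << 8;  // little-endian Machine field
        if (machine == 0x014C) return "COFF object (x86)";
        if (machine == 0x8664) return "COFF object (x64)";
    }
    return "unknown";
}
```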
If you find that it's really not, you can probably break the lib file into individual object files, disassemble the object files, re-assemble them to OMF object files, and put those together into an OMF lib file.
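Splitting the library into individual object files starts with walking the archive. COFF .lib files use the same ar container as Unix .a files: an 8-byte "!<arch>\n" signature followed by members, each with a 60-byte text header (name, date, uid, gid, mode, decimal size, and the terminator "`\n") and data padded to an even length. The sketch below lists member names, offsets and sizes from an in-memory buffer; Member and list_members are hypothetical names, and resolving long names through the "//" member (and skipping the "/" linker members of MSVC libraries) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

struct Member { std::string name; std::size_t offset; std::size_t size; };

// List the members of an in-memory ar-format archive.
std::vector<Member> list_members(const unsigned char* buf, std::size_t len) {
    std::vector<Member> out;
    if (len < 8 || std::memcmp(buf, "!<arch>\n", 8) != 0) return out;
    std::size_t pos = 8;
    while (pos + 60 <= len) {
        const char* hdr = reinterpret_cast<const char*>(buf + pos);
        if (hdr[58] != '`' || hdr[59] != '\n') break;   // bad header terminator
        std::string name(hdr, 16);                      // bytes 0-15: name
        name.erase(name.find_last_not_of(' ') + 1);     // strip space padding
        std::string size_field(hdr + 48, 10);           // bytes 48-57: decimal size
        std::size_t size = std::strtoul(size_field.c_str(), nullptr, 10);
        std::size_t data = pos + 60;
        if (data + size > len) break;                   // truncated archive
        out.push_back({name, data, size});
        pos = data + size + (size & 1);                 // data is 2-byte aligned
    }
    return out;
}
```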
This is rather late, but if anyone is looking for an answer, you can check out COFFIMPLIB from Digital Mars. COFF2OMF is available on the same site, but it looks older.
It may be worth noting that in newer versions of Delphi (>= XE2), the compiler accepts COFF as well as OMF. It's probably also true for C++ Builder. The 64 bit compilers use only COFF.
See here for more information about linking COFF.
Integrating Delphi (OMF) and Ada (gcc, COFF) required a lot of effort until I gave up on doing it in a single exe.
I honestly tried to split the gcc RTL and Ada RTL .a (COFF) libraries into lots of .o object files and convert them via coff2omf (there were DMD's coff2omf and, iirc, another one called convobj or similar). Some of the COFF .o files failed to convert to .obj, so I can't say whether it was a reliable approach at all.
Assembler-level conversion is not so simple when it comes to exceptions and other deep details.
It's a pity I haven't tried a tool named
ftp://ftp.styx.cabel.net/pub/UniLink/
It's not obvious, but UniLink can probably be used to achieve the goal. One of its targets is the C++Builder package format (both dynamic and static); unilink -Tpp -GI should do the trick.