Object code relocation and Intel Pin interaction - c++

I am working on a multiprocessor architectural simulator that uses Intel Pin to instrument C++ executable binaries and report interesting events (e.g., some function calls, thread create/finish, etc.). Basically, I build an instruction-decode cache of all instructions when their images are loaded and analyze instruction execution afterwards. So it is important for instruction addresses at image-load time to be the same as (or at least get updated synchronously with) instruction addresses at run-time.
Intel Pin API (e.g., IMG_AddInstrumentFunction) enables me to get information about the loaded images (executables and shared libraries) such as entry points, low/high address, etc.
However, I noticed that the instrumented program executes instructions at addresses that do not belong to any of the loaded images. By inspection, I suspect that the dynamic loader (image /lib64/ld-linux-x86-64.so.2 on 64-bit CentOS 6.3) is relocating the main executable in memory by calling the routine _dl_relocate_object.
I understand the need for relocatable code and all that. I just need pointers to good documentation (or a brief description/advice) on how and when these relocations might happen (at load time and at run time) so that I can take them into account in my architectural simulator. In other words, the mechanism used to achieve it (library functions that I need to instrument, the conditions involved, any randomization that may be in play, g++ compiler switches that can be used to suppress relocation, etc.).
P.S.: I am only targeting x86/x86_64 architectures

Relocations are processor-specific, so ARM, x86-64, and x86 have different relocations (because their instruction sets are different).
Relocations are also operating-system-specific, but some related OSes try to use the same relocations, e.g. Solaris and Linux on x86-64.
They are described in detail in the ABI (application binary interface) specification "System V Application Binary Interface AMD64 Architecture Processor Supplement". The original x86-64 ABI used to be on http://www.x86-64.org/documentation.html
but that site has not been responding for several weeks. An old copy is on this link and a newer one is here.
There is also the x32 ABI.
See also this question.
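On the Pin side, note that the image callbacks hand you the addresses at which Pin actually sees each image mapped, i.e. after the dynamic loader has placed it, so one workable approach is to (re)build your decode cache from those addresses at load time and drop it again on unload. Below is a minimal Pintool sketch along those lines (it assumes the standard Pin kit headers and build setup; it is an illustration, not a complete tool):

// Log every image Pin loads/unloads, with its run-time address range.
#include <iostream>
#include "pin.H"

static VOID ImageLoad(IMG img, VOID *)
{
    std::cerr << "load   " << IMG_Name(img)
              << (IMG_IsMainExecutable(img) ? " (main)" : "")
              << " [0x" << std::hex << IMG_LowAddress(img)
              << ", 0x" << IMG_HighAddress(img) << "]"
              << std::dec << std::endl;
    // A decode cache keyed by these run-time addresses stays consistent
    // with the addresses later observed during execution inside this image.
}

static VOID ImageUnload(IMG img, VOID *)
{
    std::cerr << "unload " << IMG_Name(img) << std::endl;
    // Invalidate any cached decodes for this image's address range here.
}

int main(int argc, char *argv[])
{
    if (PIN_Init(argc, argv)) return 1;   // parse Pin/tool command line
    IMG_AddInstrumentFunction(ImageLoad, 0);
    IMG_AddUnloadFunction(ImageUnload, 0);
    PIN_StartProgram();                   // never returns
    return 0;
}

Keep in mind that instruction fetches outside every reported image can also come from memory that is not backed by an on-disk image at all (for example the kernel's vdso page or JIT-generated code), and no amount of relocation bookkeeping will account for those.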

Related

Cross-platform binary in C++

I have read that a binary is the same for Windows and Linux (I am not sure if it has a different format).
What are the differences between binaries for Linux and binaries for Windows (when speaking about format)?
And if there are none, what stops us from making a single binary for both operating systems? (To be clear, I mean the final file that you actually run, not the source code.)
Since there are differences in the binary format, how are the operating systems themselves compiled?
If I'm not mistaken Microsoft uses Windows to compile the next Windows version/update.
How are these binaries executable by the machine (even the kernel is one of those)?
Aren't they in the same format? (For such a low-level program.)
As mentioned in the comments by @AlanBirtles, this is in fact possible. See Actually Portable Executable, Redbean, and the article about the two.
Even though the binary formats are different, it's possible to come up with a file that's valid in several different formats.
But, this being an obscure hack, I would stay away from it in production (or at all).
I have read that a binary is the same for Windows and Linux (not sure if it has a different format)
Then you have read wrong. They are completely different formats.
What are the differences between binaries for Linux and binaries for Windows (when speaking about format)?
Windows: Portable Executable (PE)
Linux: Executable and Linkable Format (ELF)
And if there are none
There are many differences.
What stops us from making a single binary for both operating systems?
The fact that different OSes require different binary formats for their respective executable files.
Since there are differences in the binary format, how are the operating systems themselves compiled?
Ask the OS manufacturers for details. But basically, they are compiled like any other program (just very complex programs). Like any other program, their source code is compiled into appropriately formatted executable files for each platform they are targeting, and even for each target CPU. For instance, Windows uses the PE format for all of its executables, but the underlying machine code inside each executable differs depending on whether the executable targets an x86 CPU, an x64 CPU, or an ARM CPU.
If I'm not mistaken Microsoft uses Windows to compile the next Windows version/update
Yes. That is commonly known as "dog-fooding" or "bootstrapping". For instance, VC++ running on Windows is used to develop new versions of VC++, Windows, etc.
How are these binaries executable by the machine (even the kernel is one of those)?
When an executable file is run (by the user, by a program, etc.), a request goes to the OS, and the OS's executable loader then parses the file's format as needed to locate the file's machine code, and then runs that code on the target CPU(s) as needed.
As for running the user's OS itself, there is an even more fundamental OS running on the machine (i.e., the BIOS) which (amongst other things) loads the user's OS at machine startup and starts it running on the available CPU(s). See Booting an Operating System for more details about that.
Aren't they in the same format? (For such a low-level program.)
No.
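To make the format difference concrete, here is a small sketch of my own (not from the answers above) that classifies a file purely by its magic bytes: ELF images begin with 0x7F 'E' 'L' 'F', while Windows PE images begin with the DOS "MZ" stub (the PE header proper is located via an offset stored at byte 0x3C).

// Classify a file as ELF or MZ/PE by inspecting its first bytes.
#include <cstdio>
#include <cstring>

int main(int argc, char *argv[])
{
    if (argc != 2) { std::fprintf(stderr, "usage: %s <file>\n", argv[0]); return 2; }

    std::FILE *f = std::fopen(argv[1], "rb");
    if (!f) { std::perror("fopen"); return 2; }

    unsigned char magic[4] = {0};
    std::fread(magic, 1, sizeof magic, f);
    std::fclose(f);

    if (magic[0] == 0x7f && std::memcmp(magic + 1, "ELF", 3) == 0)
        std::puts("ELF (Linux/Unix executable or shared object)");
    else if (magic[0] == 'M' && magic[1] == 'Z')
        std::puts("MZ/PE (Windows executable or DLL)");
    else
        std::puts("neither ELF nor PE");
    return 0;
}

Curiosities like Actually Portable Executable work by crafting a file whose header is acceptable to several such loaders at once.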
An executable binary compiled for Linux x86-64 (see elf(5)...) won't run on the same computer with Windows (unless you use some emulator which luckily works for your binary, like Wine or QEMU).
Using emulators slows down execution, and some emulators do not emulate every aspect of a computer in 2022.
This is one of the reasons I prefer open-source application software (e.g. RefPerSys or GCC): with effort and good design you can (in principle) port them to another platform.
Some corporations have designed various kinds of fat binary formats. They did not see much success.
Levine's book Linkers and Loaders explains several different binary formats.

How do you change the target machine configuration in fortran?

I am using the Intel Fortran Compiler on Linux. I know that if I type in "ifort -dumpmachine" it will report the target machine configuration for the compilation (e.g. "x86_64-linux-gnu"), but I need to know how to change this if I want to compile for a different operating system (e.g. a different version of Linux). The "-arch" compiler option allows you to change the processor architecture, but I need to know how to also change the operating system.
Cross-compilation is highly processor- and platform-dependent, so there is no general Fortran answer. As far as I know, Intel Fortran is available only for a limited number of architectures: x86 and x86-64. There are separate products for Linux, Windows and OS X, and you cannot cross-compile between them.
You did not specify what you mean by a different version of Linux. The manual for your version of the compiler should tell you which kernel version the resulting executables require; in principle you can then target all distributions with such a kernel. There may also be problems with getting the right Intel runtime libraries and glibc, as well as any other libraries you use. You can solve this by statically linking your libraries on your machine (use -static, or -static-intel for the Intel runtime libraries only), but be aware that they also have to be compatible with your target architecture (in particular, if they require an advanced instruction set such as SSE(2) or AVX).

Is it possible to compile a C/C++ source code that executes in all Linux distributions without recompilation?

Is it possible to compile a C/C++ source code that executes in all Linux distributions without recompilation?
If the answer is yes, can I use any external (non-standard C/C++) libraries?
I want to distribute my binary application instead of distributing the source code.
No, you can't compile an executable that executes in all Linux distributions. However, you can compile an executable that works on most distributions that people will tend to care about.
1. Compile 32-bit. Compile for the minimum CPU level you're willing to support.
2. Build your own version of glibc. Use the --enable-kernel option to set the minimum kernel version you're willing to support.
3. Compile all other libraries you plan to use yourself. Use the headers from your glibc build and your chosen CPU/compiler flags.
4. Link statically.
5. For anything you couldn't link to statically (for example, if you need access to the system's default name resolution or you need PAM), you have to design your own helper process and API. Release the source to the helper process and let them (or your installer) compile it.
6. Test thoroughly on all the platforms you need to support.
You may need to tweak some libraries if they call functions that cannot work with this mechanism. That includes dlopen, gethostbyname, iconv_open, and so on. (These kinds of functions fundamentally rely on dynamic linking. See step 5 above. You will get a warning when you link for these.)
Also, time zones tend to break if you're not careful because your code may not understand the system's zone format or zone file locations. (You will get no warning for these. It just won't work.)
Most people who do this build with a minimum supported CPU of a Pentium 4 and a minimum supported kernel version of 2.6.0. A rough sketch of such a build follows.
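As an illustration of steps 1-4, the sketch below shows a trivial program together with the shape of the build commands involved; the sysroot path, glibc version and exact flags are hypothetical and only meant to show the idea, not a recipe to copy verbatim.

// hello.cpp - trivial program used to illustrate a "portable" static build.
//
// Step 2 (sketch): build your own glibc with a minimum kernel version:
//   .../glibc-x.y/configure --prefix=/opt/portable-sysroot --enable-kernel=2.6.0
//   make && make install
// Step 3 (sketch): build your other libraries against that tree with the
// same CPU baseline (32-bit, nothing newer than a Pentium 4):
//   g++ -m32 -march=pentium4 --sysroot=/opt/portable-sysroot ...
// Step 4 (sketch): link the application statically:
//   g++ -m32 -march=pentium4 --sysroot=/opt/portable-sysroot -static -o hello hello.cpp
#include <iostream>

int main()
{
    std::cout << "hello from a statically linked binary" << std::endl;
    return 0;
}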
There are two kinds of differences among installations: architecture and libraries.
Having one binary for different architectures is not directly possible; there was an attempt to put binaries for multiple architectures in one file (FatELF), but it is not widely used and unlikely to gain momentum. So at the very least you have to distribute separate binaries for ia32, amd64, arm, ... (most if not all amd64 distributions ship a kernel compiled with support for running ia32 code, though).
Distributions contain different versions of libraries. You're fine as long as the API does not change; you can then link to that library. Some libraries ensure binary backwards-compatibility within a major version (so a GTK 2.2 app will run fine with the GTK 2.30 library, but not necessarily vice versa). If you want to be sure, you have to link statically with all the libraries that you use, except the most basic ones (probably only libc6, which is binary-compatible across distributions AFAIK). This can increase the size of the binary, and it is one of the reasons why e.g. Acrobat Reader is a relatively big download, although the app itself is not especially rich functionality-wise.
There was a transitional period for the C++ ABI, which changed between gcc 2.9 and 3 (IIRC), but the old ABI would really only be found on ancient installations. This should no longer be an issue for you, and if you link statically, it is irrelevant anyway.
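As a small glibc-specific illustration of the library-version point: the same dynamically linked binary can end up running against different C library versions on different distributions, and glibc lets you query which one you actually got. gnu_get_libc_version() is a glibc extension declared in <gnu/libc-version.h>, so this sketch only builds against glibc.

// Print the C library version this process is running against (glibc only).
#include <cstdio>
#include <gnu/libc-version.h>

int main()
{
    std::printf("glibc version: %s\n", gnu_get_libc_version());
    return 0;
}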
Generally no.
There are several barriers.
Different architectures
While a 32-bit binary will run on an x86_64 system, it won't work the other way around. Plus, there are a lot of ARM systems.
Kernel ABI
The kernel ABI changes very slowly, but it does change, so you can't really support all possible versions. Note that in some places kernel 2.2 is still in use.
What you can do is create a statically linked binary. Such a binary will include all the libraries your app depends on, and it will work on all systems with the same architecture and a reasonably similar kernel version.
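Even a statically linked binary still depends on the kernel ABI of whatever system it lands on, so it can be worth checking (or at least logging) the kernel release at run time. A minimal sketch using uname(2):

// Print the running kernel's release string and machine architecture.
#include <cstdio>
#include <sys/utsname.h>

int main()
{
    struct utsname u;
    if (uname(&u) != 0) { std::perror("uname"); return 1; }
    std::printf("%s %s on %s\n", u.sysname, u.release, u.machine);
    return 0;
}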

Compiling a C program with a specific architecture

I was recently fighting some problems trying to compile an open source library on my Mac that depended on another library and got some errors about incompatible library architectures. Can somebody explain the concept behind compiling a C program for a specific architecture? I have seen the -arch compiler flag before and have seen values passed to it such as ppc, i386 and x86_64 which I assume maps to the CPU "language", but my understanding stops there. If one program uses a particular architecture, do all libraries that it loads need to be on the same architecture as well? How can I tell what architecture a given program/process is running under?
Can somebody explain the concept behind compiling a C program for a specific architecture?
Yes. The idea is to translate C to a sequence of native machine instructions, which have the program coded into binary form. The meaning of "architecture" here is "instruction-set architecture", which is how the instructions are coded in binary. For example, every architecture has its own way of coding for an instruction that adds two integers.
The reason to compile to machine instructions is that they run very, very fast.
If one program uses a particular architecture, do all libraries that it loads need to be on the same architecture as well?
Yes. (Exceptions exist but they are rare.)
How can I tell what architecture a given program/process is running under?
If a process is running on your hardware, it is running on the native architecture which on Unix you can discover by running the command uname -m, although for the human reader the output from uname -a may be more informative.
If you have an executable binary or a shared library (.so file), you can discover its architecture using the file command:
% file /lib/libm-2.10.2.so
/lib/libm-2.10.2.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
% file /bin/ls
/bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, stripped
You can see that these binaries have been compiled for the very old 80386 architecture, even though my hardware is a more modern i686. The i686 (Pentium Pro) is backward compatible with 80386 and runs 80386 binaries as well as native binaries. To make this backward compatibility possible, Intel went to a great deal of trouble and expense—but they practically cornered the market on desktop CPUs, so it was worth it!
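To tie this back to compiling for a specific architecture: the same source compiled for different targets yields different machine code, and the compiler's predefined macros tell you which target was selected. The sketch below uses the usual GCC/Clang macro names and the -m32/-m64 flags; on a Mac toolchain the -arch flag from the question plays the same role.

// demo.cpp - report the architecture this binary was compiled for.
// Example builds (sketch):
//   g++ -m64 -o demo64 demo.cpp      (x86_64)
//   g++ -m32 -o demo32 demo.cpp      (32-bit x86; needs 32-bit libraries)
#include <iostream>

int main()
{
#if defined(__x86_64__)
    std::cout << "built for x86_64\n";
#elif defined(__i386__)
    std::cout << "built for 32-bit x86\n";
#elif defined(__aarch64__)
    std::cout << "built for 64-bit ARM\n";
#elif defined(__arm__)
    std::cout << "built for 32-bit ARM\n";
#elif defined(__powerpc__) || defined(__ppc__)
    std::cout << "built for PowerPC\n";
#else
    std::cout << "built for some other architecture\n";
#endif
    return 0;
}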
One thing that may be confusing here is that the Mac platform has what they call a universal binary, which is really two binaries in one archive, one for the Intel architecture and the other for PPC. Your computer will automatically decide which one to run. You can (sometimes) run a binary for another architecture in an emulation mode, and some architectures are supersets of others (i.e. i386 code will usually run on an i486, i586, i686, etc.), but for the most part the only code you can run is code for your processor's architecture.
For cross-compiling, not only the program, but all the libraries it uses, need to be compatible with the target processor. Sometimes this means having a second compiler installed; sometimes it is just a question of having the right extra module for the compiler available. The cross-compiler for gcc is actually a separate executable, though it can sometimes be accessed via a command-line switch. The gcc cross-compilers for various architectures are most likely separate installs.
To build for a different architecture than the native one of your CPU, you will need a cross-compiler, which means that the code generated cannot run natively on the machine you're sitting on. GCC can do this fine. To find out which architecture a program is built for, check out the file command. On Linux-based systems at least, a 32-bit x86 program will require 32-bit x86 libs to go along with it. I guess it's the same for most OSes.
Does ldd help in this case?

What does -fPIC mean when building a shared library?

I know the '-fPIC' option has something to do with resolving addresses and independence between individual modules, but I'm not sure what it really means. Can you explain?
PIC stands for Position Independent Code.
To quote man gcc:
If supported for the target machine, emit position-independent code, suitable for dynamic linking and avoiding any limit on the size of the global offset table. This option makes a difference on AArch64, m68k, PowerPC and SPARC.
Use this when building shared objects (*.so) on those mentioned architectures.
The f is the gcc prefix for options that "control the interface conventions used in code generation".
The PIC stands for "Position Independent Code"; it is a specialization of -fpic for the m68k and SPARC.
Edit: After reading page 11 of the document referenced by 0x6adb015, and the comment by coryan, I made a few changes:
This option only makes sense for shared libraries, and you're telling the OS you're using a Global Offset Table (GOT). This means all your address references are relative to the GOT, and the code can be shared across multiple processes.
Otherwise, without this option, the loader would have to modify all the offsets itself.
Needless to say, we almost always use -fpic/PIC.
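A minimal sketch of the usual workflow (file names and commands are illustrative, not prescriptive): compile the library's objects with -fPIC, link them into a shared object, then use it either at link time or at run time via dlopen. On x86-64, trying to link ordinary non-PIC objects into a .so usually fails with a relocation error telling you to recompile with -fPIC.

// plugin.cpp - build as a shared object, e.g.:
//   g++ -fPIC -shared -o libplugin.so plugin.cpp
extern "C" int plugin_answer()
{
    return 42;
}

And a small client that loads it at run time:

// main.cpp - load the shared object at run time, e.g.:
//   g++ -o main main.cpp -ldl
//   ./main            (with libplugin.so in the current directory)
#include <cstdio>
#include <dlfcn.h>

int main()
{
    void *handle = dlopen("./libplugin.so", RTLD_NOW);
    if (!handle) { std::fprintf(stderr, "dlopen: %s\n", dlerror()); return 1; }

    // Inside the .so, -fPIC-generated code reaches globals and functions
    // through the GOT/PLT, which is what keeps the library position-independent.
    typedef int (*fn_t)();
    fn_t fn = reinterpret_cast<fn_t>(dlsym(handle, "plugin_answer"));
    if (fn) std::printf("plugin_answer() = %d\n", fn());

    dlclose(handle);
    return 0;
}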
man gcc says:
-fpic
Generate position-independent code (PIC) suitable for use in a shared
library, if supported for the target machine. Such code accesses all
constant addresses through a global offset table (GOT). The dynamic
loader resolves the GOT entries when the program starts (the dynamic
loader is not part of GCC; it is part of the operating system). If
the GOT size for the linked executable exceeds a machine-specific
maximum size, you get an error message from the linker indicating
that -fpic does not work; in that case, recompile with -fPIC instead.
(These maximums are 8k on the SPARC and 32k on the m68k and RS/6000.
The 386 has no such limit.)
Position-independent code requires special support, and therefore
works only on certain machines. For the 386, GCC supports PIC for
System V but not for the Sun 386i. Code generated for the
IBM RS/6000 is always position-independent.
-fPIC
If supported for the target machine, emit position-independent code,
suitable for dynamic linking and avoiding any limit on the size of
the global offset table. This option makes a difference on the m68k
and the SPARC.
Position-independent code requires special support, and therefore
works only on certain machines.