I was recently fighting some problems trying to compile an open source library on my Mac that depended on another library and got some errors about incompatible library architectures. Can somebody explain the concept behind compiling a C program for a specific architecture? I have seen the -arch compiler flag before and have seen values passed to it such as ppc, i386 and x86_64 which I assume maps to the CPU "language", but my understanding stops there. If one program uses a particular architecture, do all libraries that it loads need to be on the same architecture as well? How can I tell what architecture a given program/process is running under?
Can somebody explain the concept behind compiling a C program for a specific architecture?
Yes. The idea is to translate C into a sequence of native machine instructions, which encode the program in binary form. The meaning of "architecture" here is "instruction-set architecture", which is how the instructions are coded in binary. For example, every architecture has its own way of coding for an instruction that adds two integers.
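To make the "adds two integers" point concrete, here is a small sketch in Python that compares the raw bytes of a register-to-register add on two instruction sets. The byte values are the standard encodings of `add eax, ebx` (32-bit x86) and `add r0, r0, r1` (32-bit ARM); the dictionary and the function name are made up for illustration only.

```python
# Machine-code bytes for "add one 32-bit register to another" on two ISAs.
# The table layout and names are illustrative, not part of any toolchain.
ADD_ENCODINGS = {
    # x86 (32-bit): add eax, ebx -> opcode 0x01, ModRM byte 0xD8
    "i386": bytes([0x01, 0xD8]),
    # ARM (A32, stored little-endian): add r0, r0, r1 -> word 0xE0800001
    "arm": (0xE0800001).to_bytes(4, "little"),
}


def describe(arch: str) -> str:
    """Return a one-line summary of the encoding for the given architecture."""
    code = ADD_ENCODINGS[arch]
    return f"{arch}: {len(code)} bytes: {code.hex(' ')}"


for arch in ADD_ENCODINGS:
    print(describe(arch))
```

The same operation ends up as two unrelated byte sequences of different lengths, which is why a CPU for one architecture cannot make sense of code compiled for the other.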
The reason to compile to machine instructions is that they run very, very fast.
If one program uses a particular architecture, do all libraries that it loads need to be on the same architecture as well?
Yes. (Exceptions exist but they are rare.)
How can I tell what architecture a given program/process is running under?
If a process is running on your hardware, it is running on the native architecture which on Unix you can discover by running the command uname -m, although for the human reader the output from uname -a may be more informative.
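If you want the same information from inside a program, a minimal Python sketch could look like the following. `platform.machine()` reports roughly what `uname -m` does, and the pointer size tells you whether the current process itself is 32-bit or 64-bit (the two can differ, e.g. a 32-bit process on a 64-bit kernel). The function names are my own.

```python
import platform
import struct


def native_machine() -> str:
    """Return the machine name reported by the OS, similar to `uname -m`."""
    return platform.machine()


def pointer_bits() -> int:
    """Return the pointer width of the running interpreter process, in bits."""
    return struct.calcsize("P") * 8


print(native_machine(), pointer_bits())
```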
If you have an executable binary or a shared library (.so file), you can discover its architecture using the file command:
% file /lib/libm-2.10.2.so
/lib/libm-2.10.2.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
% file /bin/ls
/bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, stripped
You can see that these binaries have been compiled for the very old 80386 architecture, even though my hardware is a more modern i686. The i686 (Pentium Pro) is backward compatible with the 80386 and runs 80386 binaries as well as native binaries. Intel went to a great deal of trouble and expense to make this backward compatibility possible, but they practically cornered the market on desktop CPUs, so it was worth it!
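For the curious, the architecture that file prints comes from a few fixed fields at the start of the ELF header. The sketch below is not how file is implemented (it uses a much larger magic database); it only reads the class, byte order, type and machine fields, and the lookup table covers just a handful of machine numbers from elf.h. Function and table names are my own.

```python
import struct

# A few e_machine values from elf.h; the real list is much longer.
ELF_MACHINES = {3: "Intel 80386", 20: "PowerPC", 40: "ARM",
                62: "x86-64", 183: "AArch64"}


def describe_elf(header: bytes) -> str:
    """Summarize the first 20 bytes of an ELF file, loosely like `file`."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown class")
    order = {1: "<", 2: ">"}.get(header[5])  # EI_DATA: 1 = LSB, 2 = MSB
    if order is None:
        raise ValueError("unknown byte order")
    endian = "LSB" if order == "<" else "MSB"
    # e_type and e_machine are two 16-bit fields right after e_ident.
    e_type, e_machine = struct.unpack_from(order + "HH", header, 16)
    kind = {2: "executable", 3: "shared object"}.get(e_type, f"type {e_type}")
    machine = ELF_MACHINES.get(e_machine, f"machine {e_machine}")
    return f"ELF {bits} {endian} {kind}, {machine}"


def describe_elf_file(path: str) -> str:
    """Read the header of the file at `path` and describe it."""
    with open(path, "rb") as handle:
        return describe_elf(handle.read(20))


# Hand-built header of a 32-bit little-endian shared object for the 80386.
sample = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<HH", 3, 3)
print(describe_elf(sample))
```

On a Linux box you could call `describe_elf_file("/bin/ls")` and compare the result with what file says.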
One thing that may be confusing here is that the Mac platform has what they call a universal binary, which is really two binaries in one archive, one for the Intel and the other for the PPC architecture. Your computer will automatically decide which one to run. You can (sometimes) run a binary for another architecture in an emulation mode, and some architectures are supersets of others (i.e., i386 code will usually run on an i486, i586, i686, etc.), but for the most part the only code you can run is code for your processor's architecture.
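To see the "several binaries in one archive" idea in practice, here is a rough Python sketch that lists the architectures in a universal binary's header. It assumes the classic 32-bit fat header layout (big-endian magic 0xCAFEBABE, a count, then 20-byte entries); the CPU type table is a small excerpt and the function name is my own. Note that Java class files happen to start with the same magic, so a real tool needs more checks than this.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # big-endian magic of a universal (fat) binary

# Excerpt of CPU type constants from mach/machine.h.
CPU_TYPES = {7: "i386", 0x01000007: "x86_64", 18: "ppc",
             0x01000012: "ppc64", 12: "arm", 0x0100000C: "arm64"}


def fat_architectures(data: bytes) -> list:
    """Return the architecture names listed in a fat header."""
    if len(data) < 8:
        raise ValueError("too short for a fat header")
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal binary")
    names = []
    for index in range(count):
        # Each entry: cputype, cpusubtype, offset, size, align (32 bits each).
        cputype, _sub, _offset, _size, _align = struct.unpack_from(
            ">5I", data, 8 + 20 * index)
        names.append(CPU_TYPES.get(cputype, hex(cputype)))
    return names


# Hand-built header describing a ppc slice and an i386 slice.
sample = (struct.pack(">II", FAT_MAGIC, 2)
          + struct.pack(">5I", 18, 0, 4096, 100, 12)
          + struct.pack(">5I", 7, 3, 8192, 100, 12))
print(fat_architectures(sample))
```

The loader does essentially this lookup, picks the slice that matches the CPU, and then treats that slice as an ordinary single-architecture binary.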
For cross compiling, not only the program but all the libraries it uses need to be compatible with the target processor. Sometimes this means having a second compiler installed; sometimes it is just a question of having the right extra module for the compiler available. The cross compiler for gcc is actually a separate executable, though it can sometimes be accessed via a command line switch. The gcc cross compilers for various architectures are most likely separate installs.
To build for a different architecture than the native one of your CPU, you will need a cross-compiler, which means that the code generated cannot run natively on the machine you're sitting at. GCC can do this fine. To find out which architecture a program is built for, check out the file command. In Linux-based systems at least, a 32-bit x86 program will require 32-bit x86 libs to go along with it. I guess it's the same for most OSes.
Does ldd help in this case?
I have read that a binary is the same for Windows and Linux (I am not sure if it has a different format).
What are the differences between binaries for Linux and binaries for Windows (when speaking about format)?
And if there are none, what stops us from making a single binary for both operating systems (note that I mean the final file that you actually run, not the source code)?
Since there are differences in the binary format, how are the operating systems themselves compiled?
If I'm not mistaken Microsoft uses Windows to compile the next Windows version/update.
How are these binaries executable by the machine (even the kernel is one of those)?
Aren't they in the same format? (For such a low-level program.)
As mentioned in the comments by @AlanBirtles, this is in fact possible. See Actually Portable Executable, Redbean, and the article about the two.
Even though the binary formats are different, it's possible to come up with a file that's valid in several different formats.
But, this being an obscure hack, I would stay away from it in production (or at all).
I have read that a binary is the same for Windows and Linux (not sure if it has a different format)
Then you have read wrong. They are completely different formats.
What are the differences between binaries for Linux and binaries for Windows (when speaking about format)?
Windows: Portable Executable (PE)
Linux: Executable and Linkable Format (ELF)
And if there are none
There are many differences.
What stops us from making a single binary for both operating systems?
The fact that different OSes require different binary formats for their respective executable files.
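You can see the difference in the very first bytes of the file. As a minimal sketch (function name is mine, and it only distinguishes these two formats), ELF files start with the bytes 0x7F 'E' 'L' 'F', while PE files start with an MS-DOS stub ("MZ") whose field at offset 0x3C points to a "PE\0\0" signature:

```python
import struct


def executable_format(data: bytes) -> str:
    """Classify the leading bytes of a file as "ELF", "PE" or "unknown"."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # The DOS header stores the offset of the PE signature at 0x3C.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\0\0":
            return "PE"
    return "unknown"


# Two tiny hand-built samples, just enough to carry the signatures.
elf_sample = b"\x7fELF" + bytes(60)
pe_sample = b"MZ" + bytes(0x3A) + struct.pack("<I", 0x40) + b"PE\0\0"
print(executable_format(elf_sample), executable_format(pe_sample))
```

Each OS loader only accepts its own signature, so a file laid out for one is rejected by the other before any machine code is even looked at.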
Since there are differences in the binary format, how are the operating systems themselves compiled?
Ask the OS manufacturers for details. But basically, they are compiled like any other program (just very complex programs). Like any other program, their source code is compiled into appropriately-formatted executable files for each platform they are targeting, and even for each target CPU. For instance, Windows uses the PE format for all of its executables, but the underlying machine code inside each executable is different whether the executable is running on an x86 CPU vs an x64 CPU vs an ARM CPU.
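The "same container, different machine code" point shows up in the PE header itself: right after the "PE\0\0" signature there is a 16-bit Machine field that says which CPU the code is for. Here is a rough sketch in Python; the table holds only a few of the documented machine constants and the function name is my own.

```python
import struct

# A few IMAGE_FILE_MACHINE_* values from the PE format documentation.
PE_MACHINES = {0x014C: "x86", 0x8664: "x64",
               0x01C4: "ARM (Thumb-2)", 0xAA64: "ARM64"}


def pe_machine(data: bytes) -> str:
    """Return the target CPU named in a PE file's COFF header."""
    if data[:2] != b"MZ" or len(data) < 0x40:
        raise ValueError("not a PE file")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    if len(data) < pe_offset + 6:
        raise ValueError("truncated PE header")
    # The Machine field is the first 16 bits after the signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return PE_MACHINES.get(machine, hex(machine))


# Hand-built minimal header claiming to contain x64 code.
stub = b"MZ" + bytes(0x3A) + struct.pack("<I", 0x40)
print(pe_machine(stub + b"PE\0\0" + struct.pack("<H", 0x8664)))
```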
If I'm not mistaken Microsoft uses Windows to compile the next Windows version/update
Yes. That is commonly known as "dog-fooding" or "bootstrapping". For instance, VC++ running on Windows is used to develop new versions of VC++, Windows, etc.
How are these binaries executable by the machine (even the kernel is one of those)?
When an executable file is run (by the user, by a program, etc.), a request goes to the OS, and the OS's executable loader then parses the file's format as needed to locate the file's machine code, and then runs that code on the target CPU(s) as needed.
As for running the user's OS itself, there is even more fundamental software running on the machine (the firmware, e.g. the BIOS) which (amongst other things) loads the user's OS at machine startup and starts it running on the available CPU(s). See Booting an Operating System for more details about that.
Aren't they in the same format? (For such a low-level program.)
No.
An executable binary compiled for Linux x86-64 (see elf(5)) won't run on the same computer under Windows, unless you use some emulator or compatibility layer that happens to work for your binary (QEMU, for example; Wine goes the other way and runs Windows binaries on Linux).
Using emulators will slow down the execution. Some emulators don't emulate every aspect of a computer in 2022.
This is one of the reasons I prefer open source application software (e.g. RefPerSys or GCC): with effort and good design you can (in principle) port it to another platform.
Some corporations have designed fat binary formats of one kind or another. They have not been very successful.
Levine's book Linkers and Loaders explains several different binary formats.
I am using the Intel Fortran Compiler on Linux. I know that if I type in "ifort -dumpmachine" it will provide the target machine configuration for the compilation (e.g. "x86_64-linux-gnu") but I need to know how to change this if I want to compile for a different operating system (e.g. a different version of Linux). The "-arch" compiler option allows you to change the processor architecture but I need to know how to also change the operating system.
Cross compilation is highly processor-dependent, so there is no general Fortran answer. As far as I know Intel Fortran is available only for a limited number of architectures: x86 and x86-64. There are separate products for Linux, Windows and OS X and you cannot cross-compile between them.
You did not specify what you mean by a different version of Linux. The manual for your version of the compiler should say which kernel version the resulting executables need. In principle you can then target all distributions that meet that requirement. There may also be problems with the right Intel runtime libraries, glibc, and the other libraries you use. You can solve this by statically linking your libraries on your machine (use -static, or -static-intel for the Intel runtime libraries only), but be aware that they also have to be compatible with your target architecture (in particular if they require an advanced instruction set such as SSE(2) or AVX).
Is it possible to compile a C/C++ source code that executes in all Linux distributions without recompilation?
If the answer is yes, can I use any external (non-standard C/C++) libraries?
I want to distribute my application as a binary instead of distributing the source code.
No, you can't compile an executable that executes in all Linux distributions. However, you can compile an executable that works on most distributions that people will tend to care about.
1. Compile 32-bit. Compile for the minimum CPU level you're willing to support.
2. Build your own version of glibc. Use the --enable-kernel option to set the minimum kernel version you're willing to support.
3. Compile all other libraries you plan to use yourself. Use the headers from your glibc build and your chosen CPU/compiler flags.
4. Link statically.
5. For anything you couldn't link to statically (for example, if you need access to the system's default name resolution or you need PAM), you have to design your own helper process and API. Release the source to the helper process and let them (or your installer) compile it.
6. Test thoroughly on all the platforms you need to support.
You may need to tweak some libraries if they call functions that cannot work with this mechanism. That includes dlopen, gethostbyname, iconv_open, and so on. (These kinds of functions fundamentally rely on dynamic linking. See step 5 above. You will get a warning when you link for these.)
Also, time zones tend to break if you're not careful because your code may not understand the system's zone format or zone file locations. (You will get no warning for these. It just won't work.)
Most people who do this are building with the minimum supported CPU being a Pentium 4 and the minimum supported kernel version being 2.6.0.
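If you ship a binary built with a minimum kernel version in mind, it can be handy for an installer or launcher script to check the running kernel up front and fail with a clear message. A minimal sketch in Python, assuming the 2.6.0 floor mentioned above (the constant and function names are made up for this example):

```python
import platform
import re

MIN_KERNEL = (2, 6, 0)  # assumed minimum, matching the 2.6.0 figure above


def parse_kernel_release(release: str) -> tuple:
    """Turn a release string like "5.15.0-91-generic" into (5, 15, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing third component (e.g. "6.1") is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def kernel_is_supported(release: str, minimum: tuple = MIN_KERNEL) -> bool:
    """Return True if the given kernel release is at least `minimum`."""
    return parse_kernel_release(release) >= minimum


if platform.system() == "Linux":
    print(platform.release(), kernel_is_supported(platform.release()))
```

This only checks the kernel; it says nothing about the CPU level or the libraries, which still need the testing described above.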
Installations differ in two ways: architecture and libraries.
Having one binary for different architectures is not directly possible; there was an attempt to have a binary for multiple archs in one file (fatelf), but it is not widely used and unlikely to gain momentum. So at least you have to distribute separate binaries for ia32, amd64, arm, ... (most if not all amd64 distros have a kernel compiled with support for running ia32 code, though).
Distributions contain different versions of libraries. As long as the API does not change, you can link to that library. Some libs ensure binary backwards-compatibility within a major version number (so a GTK2.2 app will run fine with the GTK2.30 lib, but not necessarily vice versa). If you want to be sure, you have to link statically with all libs that you use, except the most basic ones (probably only libc6, which is binary-compatible across distros AFAIK). This can increase the size of the binary, and it is one of the reasons why e.g. Acrobat Reader is a relatively big download, although the app itself is not especially rich functionality-wise.
There was a transitional period for the C++ ABI, which changed between gcc 2.95 and 3 (IIRC), but the old ABI would really only be found on ancient installations. This should no longer be an issue for you, and if you link statically, it is irrelevant anyway.
Generally no.
There are several barriers.
Different architectures
While a 32-bit binary will run on an x86_64 system, the reverse won't work. Plus there are a lot of ARM systems.
Kernel ABI
Kernel ABI changes very slowly, but it does change, therefore you can't really support all possible versions. Note that in some places kernel 2.2 is still in use.
What you can do is create a statically linked binary. Such a binary will include all libraries your app depends on, and it will work on all systems with the same architecture and a reasonably similar kernel version.
I currently have a program I have compiled for x86_64, and it relies on quite a few libraries also compiled for x86_64 (so recompiling them all would be a big project). I am looking to run an i386 dylib, however whenever I load it using dlopen I get an error saying it was not built for my architecture. Is there any way to either convert the i386 lib directly to x86_64 (I do not have the source code for this) or run it on an x86_64 architecture?
You cannot load an i386 library in an x86_64 executable.
The only way to get an x86_64 library out of an i386 one is to recompile it for the right target. If you don't have the source code, this cannot be done.
You can recompile all your code for i386 and use the library though.
You can't load a 32-bit (i386) library (dylib) into a 64-bit (x86_64) process, nor vice versa.
The machine can run either 32-bit or 64-bit processes; what you can't do is mix 32-bit and 64-bit code in a single process.
If that library is irreplaceable, you can't recompile it and you really need the rest of the program to be x86_64, you can run it in a separate process and use some form of IPC to call the code and pass results.
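The separate-process approach can be sketched roughly like this. Everything here is a stand-in: in the real setup the helper would be a small i386 executable that loads the 32-bit dylib and calls into it, and the IPC could be pipes, sockets or whatever suits you. To keep the example runnable anywhere, the "helper" below is just a second Python process that adds two numbers, talking JSON over stdin/stdout. The names HELPER_SOURCE and call_helper are hypothetical.

```python
import json
import subprocess
import sys

# Stand-in for the 32-bit helper program. A real helper would load the i386
# library here and call the function the main program needs.
HELPER_SOURCE = """
import json, sys
request = json.load(sys.stdin)
result = request["a"] + request["b"]  # placeholder for the library call
json.dump({"result": result}, sys.stdout)
"""


def call_helper(a: int, b: int) -> int:
    """Send one request to the helper process and return its reply."""
    completed = subprocess.run(
        [sys.executable, "-c", HELPER_SOURCE],
        input=json.dumps({"a": a, "b": b}),
        capture_output=True,
        text=True,
        check=True,  # raise if the helper exits with an error
    )
    return json.loads(completed.stdout)["result"]


print(call_helper(2, 3))
```

Starting a process per call is the simplest thing that works; if the library is called often, a long-running helper that handles many requests over one pipe or socket would cut the overhead.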
In a lot of cases though, it may be easier to rewrite the library or replace it with something else that does a similar job.
I'm creating a library in C++. It links against Windows libraries on Windows and Linux libraries on Linux. It's abstracted, all is well.
However, is it feasible to dynamically detect, load and use libraries (and copying header files for use) so it could be used on any platform if it was running under LLVM JIT?
Unfortunately, the LLVM intermediate representation in the bitcode files is not completely machine independent. You could probably get away with x86 Linux and Windows, but that same bitcode would probably not run on x86_64 systems, for example.