c++ compiles to portable assembly?

All the generated binaries seem to be only OS dependent, and not hardware dependent.
I thought the assembly for each CPU was different, which would mean you have to compile for each different CPU type.
So then why is there compatibility?

Your question appears completely unclear: (cross-)compiled binaries are of course OS/machine dependent.
"so then why is there compatibility?"
At the portable language level (plain standard c++ functions and classes), you can compile your code to run on various OS/machine architectures.
This doesn't mean you can just copy artifacts compiled for a particular OS/machine environment to another one without recompiling from source there (or using a cross-compiler).
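For illustration, here is a minimal sketch (a made-up example, not code from the question) of the kind of plain standard C++ that compiles unchanged for many targets, even though each resulting binary is tied to the OS and CPU it was built for:

```cpp
#include <iostream>
#include <numeric>
#include <vector>

// Only standard headers and standard facilities are used, so this source
// builds with g++, clang++ or MSVC on x86, ARM, etc. The *binary* produced
// by each of those builds, however, only runs on its own target.
int main() {
    std::vector<int> v{1, 2, 3, 4, 5};
    std::cout << "sum = " << std::accumulate(v.begin(), v.end(), 0) << '\n';
}
```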

Related

Building C++ code with different version of Visual Studio produces different file size of .exe?

I can build my own .sln manually on my machine, or have Azure DevOps build it on a remote machine. The software is targeting .NET Core 3.1, and using C++17. I had noticed that building the same code, from the same branch, produced a different size .exe: the remote one had 9 KB less than the local one.
I finally got the same result when I upgraded the remote machine's version of Visual Studio 2019 to match mine (from 16.8.something up to 16.11.14). But how can this difference be explained? Is there something missing from the smaller file? It should have all the same methods, logic, and functionality. There were no errors, so no part of it could have failed to compile.
I also have to build Java projects with Maven and have heard that it can be built "slightly differently" depending on Maven versions. That made sense at first, but in hindsight I don't know exactly what that means.
Has anyone been really in the weeds with software builds (specifically with Visual Studio and its C++ compiler) who can explain this concept of "slightly different builds", or has a good idea?
Would both versions be functionally identical, or is there no easy way to tell?
The C++ standard does not dictate the machine code that should be produced by the compiler. It just specifies the expected observable behavior.
So if, for example, you have a for loop, the standard dictates the behavior (initializing, checking the condition, etc.), but it can be translated to machine code in various ways, e.g. using different registers, or executing the statements in a different order (as long as the observable behavior is the same).
This principle is called the as-if rule.
So different compilers (or compiler versions) can produce different machine code. The cause might be either different optimizations, or different ways of translating C++ into machine code (as the mapping between C++ and machine code is not 1-1).
Examples related to optimizations:
If you have various statements in the code that are not dependent on each other (e.g. you modify different, unrelated variables), the compiler/optimizer might re-order them if the memory access pattern would be more efficient. Or the compiler/optimizer might eliminate statements that have no observable behavior (like incrementing a variable that is never read afterwards). Another example is whether functions are inlined, which is entirely up to the compiler/optimizer and affects the binary code.
Therefore there's no guarantee for the size (or content) of a compiled binary file.
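As a rough, made-up illustration (not code from the question), consider the function below: the as-if rule lets one compiler keep the loop, another remove the dead variable, and yet another fold the whole call to a constant, so the emitted machine code, and with it the binary size, can legitimately differ between compiler versions.

```cpp
#include <iostream>

int sum_to(int n) {
    int unused = 0;           // never read again: the optimizer may delete it entirely
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;           // may be unrolled, vectorized, or folded to n*(n+1)/2
        ++unused;             // no observable effect, so it can be removed
    }
    return total;             // only the returned value is observable behavior
}

int main() {
    std::cout << sum_to(10) << '\n';  // any conforming build must print 55
}
```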

How does the C++ compiler know which CPU architecture is being used

with reference to : http://www.cplusplus.com/articles/2v07M4Gy/
During the compilation phase,
This phase translates the program into low-level assembly code. The compiler takes the preprocessed file (without any directives) and generates an object file containing assembly-level code. The object file created is in binary form; in it, each entry describes one low-level machine instruction.
Now, if I am correct, then different CPU architectures work with different assembly languages/syntaxes.
My question is: how does the compiler come to know which assembly language syntax the source code has to be translated into? In other words, how does the C++ compiler know which CPU architecture is in the machine it is working on?
Is there any mapping used by the assembler w.r.t. the CPU architecture for generating assembly code for different CPU architectures?
N.B.: I am a beginner!
Each compiler needs to be "ported" to the given system. For each system supported, a "compiler port" needs to be programmed by someone who knows the system in-depth.
WARNING : This is extremely simplified
In short, there are three main parts to a compiler :
"Front-end" : This part reads the language (in this case c++) and converts it to a sort of pseudo-code specific to the compiler. (An Abstract Syntactic Tree, or AST)
"Optimizer/Middle-end" : This part takes the AST and make a non-architecture-dependant optimized one.
"Back-end" : This part takes the AST, and converts it to binary executable code, specific to the architecture you want to compile your language on.
When you download a c++ compiler for your platform, you, in fact, download the c++ frontend with the linux-amd64 backend, for example.
This coding architecture is extremely helpful, because it allows porting the compiler to another architecture without rewriting the whole parsing/optimizing machinery. It also allows someone to create another optimizer, or even another frontend supporting a whole different language, and, as long as it outputs a correct AST, it will be compatible with every single backend ever written for this compiler.
Simply put, the knowledge of the target system is coded into the compiler.
So you might have a C compiler that generates SPARC binaries, and a C compiler that generates VAX binaries. They both accept the same input language (as defined in the C standard), but produce different programs from it.
Often we just refer to "the compiler", meaning the one that will generate binaries for our current environment.
In modern times, the distinction has become less obvious with compiler collections such as GCC. Now the "different compilers" are often the same compiler program, just set up with different configurations (these are the "target description files").
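One visible consequence of the target being baked into the compiler is the set of predefined macros it exposes. The sketch below assumes the common GCC/Clang and MSVC macro names (which macros exist depends on your toolchain); the program detects nothing at run time, the compiler simply hard-codes one branch when it builds for its configured target:

```cpp
#include <iostream>

int main() {
    // These macros are defined by the compiler according to the backend/target
    // it was configured for, not by anything in this source file.
#if defined(__x86_64__) || defined(_M_X64)
    std::cout << "compiled for x86-64\n";
#elif defined(__aarch64__) || defined(_M_ARM64)
    std::cout << "compiled for 64-bit ARM\n";
#elif defined(__i386__) || defined(_M_IX86)
    std::cout << "compiled for 32-bit x86\n";
#else
    std::cout << "compiled for some other architecture\n";
#endif
}
```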
Just to complete the answers given here:
The target architecture is indeed coded into the specific compiler instance you're using. This is also important for a process called "cross-compiling" - compiling, on one system, an executable that will run on another system/architecture.
Consider working on an embedded system-on-chip that uses a completely different instruction set than your own - You're working on an x86/64 Linux system, but need to compile a mobile app running on an ARM micro-processor, or some other type of assembly architecture.
It would be unreasonable to compile your code on the target system, which might be so limited in CPU and memory that it can't feasibly run a compiler - and so you can use a GCC (or any other compiler) port for that target architecture on your favorite system.
It's also quite critical to remember that the entire tool-chain has to be compatible with the target system, for instance when shared libraries such as libc come into play - the target OS could be a different release of Linux with different versions of common functions - in which case it's common to use tool-chains that contain all the necessary libraries, and to use something like chroot or mock to compile in the "target environment" from within your system.

Why can't object (.obj) files be moved across platforms?

Why can't we move .obj files from C compilation across OS platforms and use them to build the executable file at the end?
If we can do so can we call C a platform independent language like Java?
The C language is platform independent.
The files generated by the compiler, the object and executable files, are platform dependent. This is due to the fact that the ultimate goal of a compiler is to generate an executable file for the target architecture only, not for every known architecture.
Java class files are platform independent because Sun was the only designer of Java: it made all the rules (from bytecode to file format and VM behavior) and everyone else had to adapt.
This didn't happen with native binaries: every OS made its own executable format, every compiler made its own object format, and every CPU has its own ISA.
There is absolutely nothing in any specification that says this CAN'T be the case. (Note that the languages C and C++ are both platform independent, but the OBJECT files produced by C and C++ are what is NOT platform independent)
However, because C and C++ are both languages designed for performance, most compilers produce machine code for the target system. And you may then say "but my Linux machine runs on the same processor as my Windows machine", but of course, that's not the ONLY difference between object files or executable files on different OS architectures. And whilst it may be possible to convert object files containing machine code for the same processor from one format to another, it's fraught with problems like "what to do with inlined system calls" (in other words, someone called gettimeofday via the std::chrono interface, and the compiler inlined this call, which is a call directly into the OS - well, Windows has no idea what gettimeofday is, its equivalent is GetSystemTime or some such, and the method of calling the OS is completely different...).
If you want an OS independent system, then all object files must be "pure" - and of course, both systems need to support the same object file format (or support conversion of them).
One could make a C or C++ compiler that does what Java (and C#, etc.) does, where the compiler doesn't produce machine code for the target system, but produces an "intermediate form" - but that would be a little contrary to the ideas of C and C++, which are designed to be VERY efficient and not have a lot of overhead. If portability is more important than performance, maybe you want to use Java? Or some other portable language...
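To make the gettimeofday/GetSystemTime point concrete, here is a minimal sketch: the source below is entirely portable standard C++, yet the object code emitted for it ends up wrapping different OS calls on each platform (roughly clock_gettime/gettimeofday on Linux, GetSystemTime-style APIs on Windows), which is one reason the object file itself is not portable.

```cpp
#include <chrono>
#include <iostream>

// Portable standard C++: this source compiles unchanged on Linux, Windows, macOS...
// The compiled object file, however, contains OS-specific library/system calls
// underneath std::chrono, wrapped in a platform-specific object file format.
int main() {
    auto since_epoch = std::chrono::system_clock::now().time_since_epoch();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(since_epoch).count()
              << " ms since the epoch\n";
}
```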
Different platforms use different object file formats (ELF for Linux, COFF/PE for Windows), so an object file built on one platform may not be usable on another.
Remember that an object file is (usually) native machine code, just not in an executable form.
C is cross-platform at the source code level.
Once it is compiled, the binary is subject to many factors.
At the architecture level, it is possible to generate intermediate code (like LLVM IR) and do JIT on the target machine so that the code fits the target architecture.
However, unless you are doing free-standing development, platform dependencies come into play and prevent you from directly running the code. These dependencies may include linking parameters, differences in the implementation of standard libraries, platform-specific features, etc.
There are still exceptions: if the OS provides binary-level compatibility (like BSD), you may indeed run code compiled for another platform directly.

Will statically linked c++ binary work on every system with same architecture?

I'm making a very simple program with c++ for linux usage, and I'd like to know if it is possible to make just one big binary containing all the dependencies that would work on any linux system.
If my understanding is correct, any compiler turns source code into machine instructions, but since there are often common parts of code that can be reused by different programs, most programs depend on other libraries.
However, if I have the source code for all my dependencies, I should be able to compile a binary in a way that would not require anything from the system? Will I be able to run stuff compiled on a 64-bit system on a 32-bit system?
In short: Maybe.
The longer answer is:
It depends. You can't, for example, run a 64-bit binary on a 32-bit system; that's just not even nearly possible. Yes, it's the same processor family, but the 64-bit system has twice as many registers, and they are twice as wide. What is the 32-bit processor going to "give back" for the values in those bits and registers that don't exist in its hardware? It just plain won't work. Some of the instructions also completely change meaning, so the system really needs to be "right" for the compiled code, or it won't work - fortunately, Linux will check this and plain refuse if it's not right.
You can BUILD a 32-bit binary on a 64-bit system (assuming you have all the right libraries etc. installed for both 64- and 32-bit).
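As a small, hypothetical illustration of how deeply the word size is baked into a build: the same source, compiled as 32-bit or 64-bit, hard-codes different pointer (and sometimes integer) widths into the binary.

```cpp
#include <iostream>

int main() {
    // The values printed are decided at compile time by the target ABI,
    // not detected at run time: void* is 4 bytes in a 32-bit build and
    // 8 bytes in a 64-bit build; long is 8 on LP64 Linux but 4 on
    // 64-bit Windows (LLP64), so even "64-bit" is not one single ABI.
    std::cout << "sizeof(void*) = " << sizeof(void*) << '\n';
    std::cout << "sizeof(long)  = " << sizeof(long)  << '\n';
}
```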
Similarly, if you try to run ARM code on an x86 processor, or MIPS code on an ARM processor, it simply has no chance of working, because the actual instructions are completely different (or they would be in breach of some patent/copyright or similar, because processor instruction sets contain portions that are "protected intellectual property" in nearly all cases - so designers have to make sure they do NOT do "the same as someone else's design"). Like for 32-bit and 64-bit, you simply won't get a chance to run the wrong binary here, it just won't work.
Sometimes there are subtle differences; for example, ARM code can be compiled with "hard" or "soft" floating point. If the code is compiled for hard float and there isn't the right support in the OS, the OS won't run the binary. Worse yet, if you compile on x86 for SSE instructions and try to run on a non-SSE processor, the code will simply crash [unless you specifically build code to "check for SSE, and display an error if not present"].
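A hedged sketch of that kind of check, using the GCC/Clang x86 builtins __builtin_cpu_init/__builtin_cpu_supports (other compilers would need __cpuid or similar, and the check only helps if the checking code itself isn't compiled to require SSE):

```cpp
#include <cstdio>
#include <cstdlib>

int main() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                       // populate the CPU feature data
    if (!__builtin_cpu_supports("sse4.2")) {    // bail out instead of hitting an illegal instruction
        std::fprintf(stderr, "This build requires SSE4.2, which this CPU lacks.\n");
        return EXIT_FAILURE;
    }
#endif
    std::puts("CPU check passed (or skipped on a non-x86 / non-GCC build).");
    return EXIT_SUCCESS;
}
```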
So, assuming you have a binary that passes the above criteria: the Linux system tends to change a tiny bit between releases, and different distributions have subtle "fixes" that change things. Most of the time, these are completely benign (they fix some obscure corner case that someone found during testing, but the general, non-corner-case behaviour is "normal"). However, if you go from Linux version 2.2 to Linux version 3.15, there will be some substantial differences between the two versions, and the binary from the old one may very well be incompatible with the newer (and almost certainly the other way around) - it's hard to know exactly which versions are and aren't compatible. Within releases that are close, it should work OK as long as you are not specifically relying on some feature that is present in only one of them (after all, new things ARE added to the Linux kernel from time to time). Here the answer is "maybe".
Note that the above also covers your implementation of the C and C++ runtime: if you have a "new" C or C++ runtime library that uses Linux kernel feature X, and you try to run it on an older kernel from before feature X was implemented (or before it worked correctly for the case the C or C++ runtime is trying to use), it will fail.
Static linking is indeed a good way to REDUCE the dependency on different releases. And a good way to make your binary huge, which may prevent people from downloading it.
Making the code open source is a much better way to solve this problem: you just distribute your source code and a list of "minimum requirements", and let other people deal with recompiling it.
In practice, it depends on "sufficiently simple". If you're using C++11, you'll quickly find that the C++11 libraries have dependencies on modern libc releases. In turn, those only ship with modern Linux distributions. I'm not aware of any "Long Term Support" Linux distribution which today (June 2014) ships with libc support for GCC 4.8.
The short answer is no, at least not without serious hacks.
Different Linux distributions may have different glue code between user space and the kernel. For instance, a hello world seemingly without dependencies, built on Ubuntu, could not be executed under CentOS.
EDIT: Thanks for the comment. I re-verified this and the cause was that I was using a 32-bit VM. Sorry for causing confusion. However, as noted above, the rule of thumb is that even the same Linux distribution may sometimes break compatibility in order to deploy a bugfix, so the conclusion stands.

Does building the compiler from source result in better optimization?

Consider this simple case scenario:
I download the pre-built binaries of a C++ compiler (say Clang or GCC or anything else) for my generic OS (that is not Windows). I compile my code, which consists of some computationally expensive mathematical calculation, with optimization flag -O3, and I have an execution time of T1.
On a different attempt, this time instead of using pre-built binaries I download the source code and build the compiler myself on my generic machine. I compile the same code with the same optimization flag, achieving execution time T2.
Will T2 < T1 or they will be more or less the same?
In other words, is the execution time independent from the way that compiler is built?
The compiler's optimization of your code is the result of the behavior of the compiler, not the performance of the compiler.
As long as the compiler has the same behavioral design, it will produce exactly the same output.
Generally the same compiler version should generate the same assembler code given the same C or C++ input. However, there are certain things that might further affect the code that is generated when you run the compiler.
A distro might have backported (or even created its own) patches from other versions.
Modern compilers often have library dependencies (e.g. cloog) that may behave differently in different versions, causing the compiler to make code generation decisions based on essentially different data.
These libraries may (in some compiler versions) be optional at compile time (you might need to give --enable switches to configure, or configure tries to autodetect them).
Compiler switches like -march=native will look at what hardware you compile on and try to optimize accordingly.
A time limit in the compiler's optimizer may trigger, essentially producing better optimizations on faster machines; or the same for memory (I don't think that's found in modern compilers anymore, though).
That said, even the same assembler code might perform differently on your machine and theirs, e.g. because one is optimized for AMD, the other for Intel.
In my opinion, and in theory, compilation speed can be faster, since you can tell the "compiler which compiles the compiler": "please target my computer, and you can use my computer's processor's own machine code to optimize".
But I think the compiler's optimization of your code cannot be better. To make the compiler's optimization better, I think we need to put something like new technology into the compiler, not just re-compile it.
That depends on how that compiler is implemented and on your platform, but the answer will most likely be "no".
If your platform provides specific functionality that can improve the performance of your program, the optimizer in your compiler might use that functionality to produce a faster program. The optimizer can do so only if the compiler writer was aware of the functionality and has implemented special treatment for your platform in the optimizer. If that is the case, the detection might be done dynamically in the optimizer, meaning any build of the optimizer can detect the platform and optimize your code. Only if the detection has to occur at compile time of the optimizer for some reason would recompiling it on your platform give that advantage. But if such a better build for your platform exists, the compiler vendor has most likely already provided binaries for it.
So, with all these ifs, it's unlikely that your program will be any faster when you recompile the compiler on your platform. There is a chance, however, that the compiler itself will be a bit faster if it is optimized for your platform rather than being a generic binary, resulting in shorter compile times.