What does a C++ program require to run?

What does a C++ program require to run? - c++

This question has been bothering me for a while now. Let's consider the two following programs:
#incude <iostream>
int main()
{
std::cout << "Hello, World!";
}
and
int main()
{
int x = 5;
int y = x*x;
}
Windows:
The first example, naturally, requires some system .dll's for the console. I understand that. What about the second? Does it need anything to run? Some runtime libraries? By the way, what do runtime libraries actually do?
Linux:
No idea, can you enlighten me?
I know it depends on the compiler and OS, but I need either a general answer or particular examples. TIA.

As a general answer, the first will require the C++ runtime libraries (the stuff you need to support the standard library calls). These form an interface of sorts between the language and the support libraries, which in turn know how to achieve what they do in the given environment.
The second makes no use of the runtime libraries. It will use the C startup and termination code (that initialises and tears down the C environment) but it's a discussion point as to whether or not these are considered part of the runtime libraries. If you consider them a part, then , yes, they will be used. It will probably be a very small part used since there's usually a big difference in size between startup code and the streams stuff.
You can link your code statically (binding at link time) with runtime libraries or dynamically (so that the actual binding is done at load time). That's true for both Windows and Linux.

For Windows applications you can use the Dependency Walker to see all dependencies.

The first program performs stream I/O, which means it has to interact with resources (console, gui) managed by the OS. So, ultimately, the OS has to be invoked via an API implemented in a system dll.
On windows the second program requires no libraries. I'm fairly sure the same is true on Linux.

Compile them with GCC, and get executable named 'hi', in console write:
ldd hi
will give you the shared objects(dynamic libraries) which are connected to your program.
Just for quick answer here is an output:
ldd tifftest
libtiff.so.3 => /usr/lib/libtiff.so.3 (0x4001d000)
libc.so.6 => /lib/libc.so.6 (0x40060000)
libjpeg.so.62 => /usr/lib/libjpeg.so.62 (0x40155000)
libz.so.1 => /usr/lib/libz.so.1 (0x40174000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

Well, let's look at this from a more general point of view:
To start with, you'll need a computer with a compatible CPU that works with target machine of the compiler output. You might think this is obvious but assuming that code compiles to x86 machine code, it won't run on an Alpha CPU which uses different instructions. Alternatively, if you compile to x64 machine code, it won't run on an x86-only CPU. So the correct hardware is necessary to run the C++ program, in contrast to virtual-machine based languages like Java, which abstract that away.
You will also need the correct operating system. I'm not an expert on porting programs but I don't think it's possible to build a single executable that runs on multiple operating systems in C++. For example, compiling even your second example to Windows will have a lot of runtime-library code behind the scenes before and after the actual call to your main() function. This will do things like prepare the heap and initialise the CRT library. The CRT for Windows is implemented via the Windows API. You can static link the library so no CRT DLL is required, but the code in your program still makes calls to the Windows API, so is still platform dependent. As an experiment I compiled an empty program with static linking on Windows with Visual Studio, and according to Dependency Walker it still references KERNEL32.DLL for functions like HeapCreate and ExitProcess. So the 'empty' program still does a whole bunch of operating system stuff for you, in preparation for doing something useful (regardless of whether or not your program does anything useful).
Also note there may well be a minimum operating system version: Visual Studio 2010 requires Windows XP SP2 or above for even an empty program, due to calls made to EncodePointer and DecodePointer. See this question.
The system will have to have the memory available to launch your program. You may think it does nothing, but as above demonstrates, before main() is called a whole load of OS initialization calls are made by your program's library. These probably require some memory, and the processing time necessary to execute it.
Depending on the configuration of the operating system, you may need sufficient security privileges to launch executable programs.
So, in short, to run an empty C++ program even with static linking, you need the right CPU, operating system, permission to run the executable, and memory/processing time to complete the program. Compared to VM technologies like Java or .NET, the requirements would reduce to probably just the correct virtual machine, necessary privilege, and necessary memory/CPU time to run the program. This may not be as simple as it sounds: you might need the correct version of a virtual machine, such as .NET framework 4.0. This can complicate your distribution process since updating the entire JVM or .NET framework for a machine can be a time consuming process requiring administrator privileges and maybe an internet connection. In some narrow cases, this could be a deal breaker, since on rare occasions it may be easier to be able to say "it will run on any x86-compatible Windows operating system since XP" as opposed to "any machine with the latest virtual machine that was only released yesterday". For most purposes, though, the fact the virtual machine allows you to (in theory) forget about the CPU and operating system makes the distribution of the program easier; with C++, you are required to at least compile separate executables for each combination of platform and CPU you want to support, regardless of the additional requirements of the libraries you're using.

C programs on Windows require CRT libraries that come with Windows. C++ sometimes require so called "C++ redistributable". They can be embedded in app via linking but this will make EXE bigger.

For the 1st part of your question - you have been answered by several members.
But What I am saying is general and required for both cases - (in case you are not aware of)
For any program to run, it has to be provided with resources it needs. While answering 1st part team has already listed several items.
But in general, what it needs is well defined address space (in main memory), its properties and CPU time. Operating System ensures you get that when you execute your program. Unless there is some ridiculous conflict your program will get that (and that's why I guess Chubsdad commented "you need luck").
OS scheduling, CPU asking to fetch instructions/data from memory and then the executing it... all forms a "machine" that executes your program.
Finding the entry point (or first point in your program to execute) is all that is decided either at compile time (main function for example) or while you load your program using some system call like exec() (in Unix) / CreateProcess() (in windows).

On Linux, any C program is statically linked to some CRT libraries. The true entry point of the program is the _start() function defined in /usr/lib/crt1.o. This function calls some libc functions like __libc_start_main(). Thus you still need the libc library...
You could do without libc, but it's tricky. You would need to rename your entry point _start(), or instruct the linker to start at main(). And you would also need some inline assembly to issue the _exit() system call when the program is done, otherwise it would just crash. And of course, do the link explicitly with the ld command instead of through the gcc frontend.

Related

What affects generated machine code at each step of the compilation process?

I am almost certain this question has been asked before, but I can not seem to find the right keywords to search for to get an answer. My apologies if this is a duplicate.
I am better trying to understand the compilation process of say a C++ file as it goes from the C++ syntax to the binary machine code. In addition I am trying to understand what influences the resulting machine code.
First, I am nearly certain that the following are the only factors (for most systems) that dictate the final machine code (please correct me if I am wrong here)
The tools used to compile, assemble, and link.
Things like gnu c compiler, clang, visual studio, nasm, ect.
The kernel of the system being used.
Whether its a specific version of the linux kernel, windows microkernel, or some other kernel like a mac os x one.
The operating system being used.
This one I am less clear about. I am unsure if machines running the same linux kernel, but different os, in this case let's say debian vs centos, will they produce different binaries.
Lastly the hardware architecture.
Different cpu architectures like arm 64, x86, power pc, ect. take different op codes so obviously the machine code should be different.
So with that being said here is my understanding of the compilation process and where each of these dependencies show up.
I write a C++ file and use code that my system can understand. A good example might be using <winsock.h> on windows and <sys/socket.h> on linux.
The preprocessor runs and executes any preprocessor macros.
Here I know that different preprocessors will define different macros but for now I will assume this is not too machine dependent. (This might be wrong to assume).
The compiler tools run to produce assembly file outputs.
Here the assembly produced depends on the compiler and what optimizations or choices it makes.
It also depends on the kernel because different kernels have different system calls and store files in different locations. This means the assembly might make changes such as different branching when calling functions specific to that kernel.
The operating system? Still unsure how the operating system fits in to this. If two machines have the same kernel, what does the operating system do to the binaries?
Finally the assembly code depends on the cpu architecture. I think that is a pretty obvious statement.
Once the compiler produces an assembly. We can then invoke the assembler to turn our assembly code into almost complete machine code. (I think machine code is identical to binary opcodes a cpu manual lists but this might be wrong).
The corresponding machine code files (often called object files I think) contain nearly all the instructions needed to run or reference other machine code files which will be linked in the next step.
This machine code usually has some format (I think ELF is a popular format for linux) and this format is dependent on the linker for sure.
I don't think the kernel, operating system, or hardware affect the layout/format of the object file but this is probably wrong. If they do please correct this.
The hardware will affect the actual machine code produced because again I think it is a 1 to 1 mapping of machine code instructions to opcodes for a cpu.
I am unsure if the kernel or operating system affect the linking process because I thought their changes were already incorporated in the compiling step.
Finally the linking step occurs.
I think this is as simple as the linker looking for all the referenced machine code and injecting it into one complete machine code file which can be executed.
I have no clue what affects this besides the linker tool itself.
So with all that, I need help identifying inaccuracies with the procedure I described above, and any dependencies I might have missed whether it be cpu, os, kernel, or tool ones.
Thank you and sorry for the long winded question. This probably should have been broken up into multiple questions but I am too far in. If this does not go well I may ask each part in individual questions.
EDIT:
Questions with more focus.
What components of a machine affect the machine code produced given a C++ file input?

Actually that is a lot of questions and usually you're question would be much too broad for SO (as you managed to recognize by yourself). But on the other hand you showed a deep interest (just by writing such a long and profound question) and also a lot of correct understanding of the process of compiling a program. The things you are missing or not understanding correctly (and you are probably the most interested in) are those things, that I myself found hard to learn. Thus I will provide you with some important points, that I think you are missing in the big picture.
Note that I am very much used to Linux, so I will mostly describe how things work on Linux. But I believe that most things also happen in a similar way on other operating systems.
Let's begin with the hardware. A modern computer has a CPU of some architecture. There are lots of different of CPU architectures. You mentioned some of them like arm, x86, etc. which are families of similar CPUs and can be divided into smaller groups by bit width and/or supported extensions. Ultimately your processor has a specified instruction set that defines which opcodes it supports and what those opcodes do. If a native (compiled) program runs, there are raw opcodes in the memory and the CPU directly executes them following its architecture specification.
Aside from the CPU there is a lot more hardware connected to your computer. Usually communicating with this hardware is complicated and not standardized. If a user program for example gets input keystrokes from the keyboard, in does not have to directly communicate with the keyboard, but rather does this via the operating system kernel. This works by a mechanism called syscall interrupt. The kernel installs an handler routine, that is called if a user program triggers such an interrupt with a special CPU instruction. You can think of it like a language agnostic function call from the program into the kernel. For example for Linux you can find a list of all syscalls at the syscall(2) man page. The syscalls form the kernel's Application Binary Interface (kernel ABI). Reading and writing from a terminal or using a filesystem are examples for syscall functionality.
As you can see, there are already very high level functions, that are implemented in the kernel. However the functionality is still quite limited for most typical applications. To encapsulate the syscalls and provide functions for memory management, utility functions, mathematical functions and many other things you probably use in your daily programs, there is usually another layer between the program and the kernel. This thing is called the C standard library, and it is a shared library (we will cover what exactly this is in a moment). On GNU/Linux it is the glibc which is the single most important library on a GNU/Linux system (and notably not part of the kernel 1). While it implements all the features that are required by the C standard (for example functions like malloc() or strcpy()), it also ships a lot of additional functions which are a superset of the ISO C standard library, the POSIX standard and some extensions. This interface is usually called the Application Programming Interface (API) of the operating system. While it is in principle possible to bypass the API and directly use the syscalls, almost all programs (even when written in other languages than C or C++) use the C library.
Now get yourself a coffee and a few minutes of rest. We now have enough background information to look at how a C++ program is transformed into a binary, and how exactly this binary is executed.
A C++ program consists of different compilation units (usually each different source file is a compilation unit). Each compilation unit undergoes the following steps
The preprocessor is run on the file. It includes header, expands macros and does some other stuff. As you wrote in your question this is rather platform independent. The preprocessor actions are standardized in the C++ standard.
The resulting code is compiled. That means C++ code is translated into assembly code. Because assembly code directly reflects the CPU instructions, this step is dependent on the target CPU architecture, that the compiler was configured for (usually the host CPU). The compiler is allowed to optimize and translate the program in any way it wants, as long as it follows the as-if rule. Thus this step is also higly dependent on the compiler you are using.
Note: Symbols (especially functions) that are not defined, are left undefined. If you say call the malloc() function, this will not be compiled, but left unevaluated until later. Thus this step is also not much dependent on the operating system.
Assembling takes place. This is very straightforward. The assembly code usually can be converted directly into binary CPU instructions. Local symbols (such as goto labels etc.) are resolved and replaced by their corresponding addresses. Unknown external symbols such as the mentioned malloc() call still are left unevaluated and are stored in the object file's symbol table. Because most of the syscalls are wrapped in library functions, the assembly code will usually not directly contain syscall code. Thus this step is depended on the CPU architecture. It is however dependent on the ABI2, which in term is dependent on the compiler and the OS.
Linking takes place. The different compilation units are combined into a single executable binary in an OS-dependent format (e.g. GNU/Linux uses ELF). Here yet more symbols are resolved. For example if one compilation calls a function in another compilation unit, this call is resolved and the symbol is replaced by the function address. If you link to a library statically, this is just treated like another compilation unit and included into the executable with its symbols resolved.
Shared libraries are checked for the needed symbols, but not linked yet. For example in case of the malloc() call, the linker checks, that there is a malloc symbol in the glibc, but the symbol in the executable still remains unresolved.
At this point you have a executable binary. As you might noticed, there might still be unresolved symbols in that binary. Thus you cannot just load that binary into RAM and let the CPU execute it. A final step called dynamic linking is needed. On Linux the program that performs this step is called the dynamic linker/loader. Its task is to load the executable ELF file into memory, look up all the needed dynamic libraries, load them into memory as well (a list is stored in the ELF file) and resolve the remaining symbols. This last step happens each time the program is executed. Now finally the malloc() symbol is resolved with the address in the glibc shared library.
You have pure CPU instructions in memory, the CPU's program counter register (the one that tracks the next instruction) is set to the entry point, and the program can begin to run. Every now and then it is interrupted either because it makes a syscall, or because it is interrupted by the kernel scheduler to let another program run on that CPU core.
I hope I could answer some of your questions and satisfy your curiosity. I think the most important part you were missing, was how dynamic linking happens. This is a very interesting topic which is related to concepts like position independent code. I wish you could luck learning.
1 this is also one reason why some people insist on calling Linux based systems GNU/Linux. The glibc library (together with many other GNU programs) defines much of the operating system structure, interacts with supplementary programs and configuration files etc. There are however Linux based systems without glibc. One of them is Android, using Googles bionic libc.
2 The ABI is related to the calling convention. This is a mixture of operating system, programming language and compiler specification. It is one of the reasons (besides name mangling, see the comment of PeterCordes below) you need those extern "C" {...} scopes in C++ header files, that declare C functions in shared libraries. It basically is a convention on how to pass parameters and return values between functions.

Neither operating system nor kernel are directly involved in any of this.
Their limited involvement is in that if you want to build Linux 64 bit binaries for x86 using gnu tools then you need to in some way (download and install or build yourself) build the gnu tools themselves for that target processor and that operating system. As system calls are specific to the operating system and target, and also the binaries supported by that operating system. Not strictly just the elf file format, that is just a container, but the linking and possibly bootstrap is also specific to the operating systems loader. (or if building something for the kernel that would have other rules). For example, does the application loader initialize .bss and .data for you from specific information in the .elf file, or like on an mcu does the bootstrap code itself have to do this?
The builder for gnu tools for a target like linux and ideally a pre-built binary for your os and target, would have paths setup in some way. The c library would have a default linker script and its intimate partner the bootstrap.
After that point, it is just a dumb toolchain. Include files be they at the system level, compiler level, or programmer level are just includes in the C language. The default paths and gcc knows where it was executed from so it knows where in a normal build the gcc and other libraries live.
gcc itself is not a compiler actually it calls other programs like the preprocessor, the compiler itself, the assembler and linker.
The preprocessor is going to do the search and replace for includes and defines and end up with one great big cpp file, then pass that to the compiler.
The compiler front end (C++ language for gcc for example) turns that into an internal language, allocate an int with this name, and another add the two and blah. A pseudo code if you will. This gets a lot of the optimization work done on it then eventually the back end (which for gnu could be x86, mips, arm, etc independent to some extent of the front and middle). The LLVM tools, are at least capable of exposing that middle, internal, language to external files (external to the memory used by the compiler to do the compilation) and you can combine and optimize those bytecode files and then convert them to assembly or direct to object in the llvm world. I think this is an exception not a rule, others just use internal tables.
While I think it is wise and sane to use an assembly language step. Not all compilers do and do not assume that all compilers do. Some output objects.
Yes that assembly is naturally partial, external functions (labels) and variables (labels) cannot be resolved at the object level. The linker has to do that.
So the target (x86, arm, etc) does affect the construction of the elf file as
there are certain items, magic numbers specific to the target. As mentioned the operating system and or kernel do affect the elf in that there are rules for construction of the binary for that kernel or operating system. Remember that elf is just a container like tar or zip or mkv etc. Do not assume that the operating system can handle every possible choice you want to make with the contents that the linker will allow (the tools are dumb, do what they are told).
So your source.
All the relevant sources that go with it including system includes, compiler includes and your includes.
gcc/g++ is a wrapper program that manages the steps.
calls the pre-processor expands includes and defines into one file (no magic here)
call the compiler to parse that one file into internal tables, think pseudo code and data
many, many possible optimizers that operate on these structures
backend, including peephole optimizer, turns the tables into assembly language (for gnu at least)
assembler is called to turn the asm into an object
If all the objects are specified and gcc is told to link, then...
Linker combines all the objects for the binary, including the bootstrap, including already built libraries, stubs, etc, and command line or more likely a linker script (linker script and bootstrap have an intimate relationship they are not assumed to be separable and not part of the compiler they are part of a C library, etc).
Kernel module loader or operating system application loader fed the file and per the rules of that loader loads and runs the program.

Understanding static & dynamic library linking [duplicate]

Are there any compelling performance reasons to choose static linking over dynamic linking or vice versa in certain situations? I've heard or read the following, but I don't know enough on the subject to vouch for its veracity.
1) The difference in runtime performance between static linking and dynamic linking is usually negligible.
2) (1) is not true if using a profiling compiler that uses profile data to optimize program hotpaths because with static linking, the compiler can optimize both your code and the library code. With dynamic linking only your code can be optimized. If most of the time is spent running library code, this can make a big difference. Otherwise, (1) still applies.

Dynamic linking can reduce total resource consumption (if more than one process shares the same library (including the version in "the same", of course)). I believe this is the argument that drives its presence in most environments. Here "resources" include disk space, RAM, and cache space. Of course, if your dynamic linker is insufficiently flexible there is a risk of DLL hell.
Dynamic linking means that bug fixes and upgrades to libraries propagate to improve your product without requiring you to ship anything.
Plugins always call for dynamic linking.
Static linking, means that you can know the code will run in very limited environments (early in the boot process, or in rescue mode).
Static linking can make binaries easier to distribute to diverse user environments (at the cost of sending a larger and more resource-hungry program).
Static linking may allow slightly faster startup times, but this depends to some degree on both the size and complexity of your program and on the details of the OS's loading strategy.
Some edits to include the very relevant suggestions in the comments and in other answers. I'd like to note that the way you break on this depends a lot on what environment you plan to run in. Minimal embedded systems may not have enough resources to support dynamic linking. Slightly larger small systems may well support dynamic linking because their memory is small enough to make the RAM savings from dynamic linking very attractive. Full-blown consumer PCs have, as Mark notes, enormous resources, and you can probably let the convenience issues drive your thinking on this matter.
To address the performance and efficiency issues: it depends.
Classically, dynamic libraries require some kind of glue layer which often means double dispatch or an extra layer of indirection in function addressing and can cost a little speed (but is the function calling time actually a big part of your running time???).
However, if you are running multiple processes which all call the same library a lot, you can end up saving cache lines (and thus winning on running performance) when using dynamic linking relative to using static linking. (Unless modern OS's are smart enough to notice identical segments in statically linked binaries. Seems hard, does anyone know?)
Another issue: loading time. You pay loading costs at some point. When you pay this cost depends on how the OS works as well as what linking you use. Maybe you'd rather put off paying it until you know you need it.
Note that static-vs-dynamic linking is traditionally not an optimization issue, because they both involve separate compilation down to object files. However, this is not required: a compiler can in principle, "compile" "static libraries" to a digested AST form initially, and "link" them by adding those ASTs to the ones generated for the main code, thus empowering global optimization. None of the systems I use do this, so I can't comment on how well it works.
The way to answer performance questions is always by testing (and use a test environment as much like the deployment environment as possible).

1) is based on the fact that calling a DLL function is always using an extra indirect jump. Today, this is usually negligible. Inside the DLL there is some more overhead on i386 CPU's, because they can't generate position independent code. On amd64, jumps can be relative to the program counter, so this is a huge improvement.
2) This is correct. With optimizations guided by profiling you can usually win about 10-15 percent performance. Now that CPU speed has reached its limits it might be worth doing it.
I would add: (3) the linker can arrange functions in a more cache efficient grouping, so that expensive cache level misses are minimised. It also might especially effect the startup time of applications (based on results i have seen with the Sun C++ compiler)
And don't forget that with DLLs no dead code elimination can be performed. Depending on the language, the DLL code might not be optimal either. Virtual functions are always virtual because the compiler doesn't know whether a client is overwriting it.
For these reasons, in case there is no real need for DLLs, then just use static compilation.
EDIT (to answer the comment, by user underscore)
Here is a good resource about the position independent code problem http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/
As explained x86 does not have them AFAIK for anything else then 15 bit jump ranges and not for unconditional jumps and calls. That's why functions (from generators) having more then 32K have always been a problem and needed embedded trampolines.
But on popular x86 OS like Linux you do not need to care if the .so/DLL file is not generated with the gcc switch -fpic (which enforces the use of the indirect jump tables). Because if you don't, the code is just fixed like a normal linker would relocate it. But while doing this it makes the code segment non shareable and it would need a full mapping of the code from disk into memory and touching it all before it can be used (emptying most of the caches, hitting TLBs) etc. There was a time when this was considered slow.
So you would not have any benefit anymore.
I do not recall what OS (Solaris or FreeBSD) gave me problems with my Unix build system because I just wasn't doing this and wondered why it crashed until I applied -fPIC to gcc.

Dynamic linking is the only practical way to meet some license requirements such as the LGPL.

I agree with the points dnmckee mentions, plus:
Statically linked applications might be easier to deploy, since there are fewer or no additional file dependencies (.dll / .so) that might cause problems when they're missing or installed in the wrong place.

One reason to do a statically linked build is to verify that you have full closure for the executable, i.e. that all symbol references are resolved correctly.
As a part of a large system that was being built and tested using continuous integration, the nightly regression tests were run using a statically linked version of the executables. Occasionally, we would see that a symbol would not resolve and the static link would fail even though the dynamically linked executable would link successfully.
This was usually occurring when symbols that were deep seated within the shared libs had a misspelt name and so would not statically link. The dynamic linker does not completely resolve all symbols, irrespective of using depth-first or breadth-first evaluation, so you can finish up with a dynamically linked executable that does not have full closure.

1/ I've been on projects where dynamic linking vs static linking was benchmarked and the difference wasn't determined small enough to switch to dynamic linking (I wasn't part of the test, I just know the conclusion)
2/ Dynamic linking is often associated with PIC (Position Independent Code, code which doesn't need to be modified depending on the address at which it is loaded). Depending on the architecture PIC may bring another slowdown but is needed in order to get benefit of sharing a dynamically linked library between two executable (and even two process of the same executable if the OS use randomization of load address as a security measure). I'm not sure that all OS allow to separate the two concepts, but Solaris and Linux do and ISTR that HP-UX does as well.
3/ I've been on other projects which used dynamic linking for the "easy patch" feature. But this "easy patch" makes the distribution of small fix a little easier and of complicated one a versioning nightmare. We often ended up by having to push everything plus having to track problems at customer site because the wrong version was token.
My conclusion is that I'd used static linking excepted:
for things like plugins which depend on dynamic linking
when sharing is important (big libraries used by multiple processes at the same time like C/C++ runtime, GUI libraries, ... which often are managed independently and for which the ABI is strictly defined)
If one want to use the "easy patch", I'd argue that the libraries have to be managed like the big libraries above: they must be nearly independent with a defined ABI that must not to be changed by fixes.

Static linking is a process in compile time when a linked content is copied into the primary binary and becomes a single binary.
Cons:
compile time is longer
output binary is bigger
Dynamic linking is a process in runtime when a linked content is loaded. This technic allows to:
upgrade linked binary without recompiling a primary one that increase an ABI stability[About]
has a single shared copy
Cons:
start time is slower(linked content should be copied)
linker errors are thrown in runtime
[iOS Static vs Dynamic framework]

It is pretty simple, really. When you make a change in your source code, do you want to wait 10 minutes for it to build or 20 seconds? Twenty seconds is all I can put up with. Beyond that, I either get out the sword or start thinking about how I can use separate compilation and linking to bring it back into the comfort zone.

Best example for dynamic linking is, when the library is dependent on the used hardware. In ancient times the C math library was decided to be dynamic, so that each platform can use all processor capabilities to optimize it.
An even better example might be OpenGL. OpenGl is an API that is implemented differently by AMD and NVidia. And you are not able to use an NVidia implementation on an AMD card, because the hardware is different. You cannot link OpenGL statically into your program, because of that. Dynamic linking is used here to let the API be optimized for all platforms.

Dynamic linking requires extra time for the OS to find the dynamic library and load it. With static linking, everything is together and it is a one-shot load into memory.
Also, see DLL Hell. This is the scenario where the DLL that the OS loads is not the one that came with your application, or the version that your application expects.

On Unix-like systems, dynamic linking can make life difficult for 'root' to use an application with the shared libraries installed in out-of-the-way locations. This is because the dynamic linker generally won't pay attention to LD_LIBRARY_PATH or its equivalent for processes with root privileges. Sometimes, then, static linking saves the day.
Alternatively, the installation process has to locate the libraries, but that can make it difficult for multiple versions of the software to coexist on the machine.

Another issue not yet discussed is fixing bugs in the library.
With static linking, you not only have to rebuild the library, but will have to relink and redestribute the executable. If the library is just used in one executable, this may not be an issue. But the more executables that need to be relinked and redistributed, the bigger the pain is.
With dynamic linking, you just rebuild and redistribute the dynamic library and you are done.

Static linking includes the files that the program needs in a single executable file.
Dynamic linking is what you would consider the usual, it makes an executable that still requires DLLs and such to be in the same directory (or the DLLs could be in the system folder).
(DLL = dynamic link library)
Dynamically linked executables are compiled faster and aren't as resource-heavy.

static linking gives you only a single exe, inorder to make a change you need to recompile your whole program. Whereas in dynamic linking you need to make change only to the dll and when you run your exe, the changes would be picked up at runtime.Its easier to provide updates and bug fixes by dynamic linking (eg: windows).

There are a vast and increasing number of systems where an extreme level of static linking can have an enormous positive impact on applications and system performance.
I refer to what are often called "embedded systems", many of which are now increasingly using general-purpose operating systems, and these systems are used for everything imaginable.
An extremely common example are devices using GNU/Linux systems using Busybox. I've taken this to the extreme with NetBSD by building a bootable i386 (32-bit) system image that includes both a kernel and its root filesystem, the latter which contains a single static-linked (by crunchgen) binary with hard-links to all programs that itself contains all (well at last count 274) of the standard full-feature system programs (most except the toolchain), and it is less than 20 megabytes in size (and probably runs very comfortably in a system with only 64MB of memory (even with the root filesystem uncompressed and entirely in RAM), though I've been unable to find one so small to test it on).
It has been mentioned in earlier posts that the start-up time of a static-linked binaries is faster (and it can be a lot faster), but that is only part of the picture, especially when all object code is linked into the same file, and even more especially when the operating system supports demand paging of code direct from the executable file. In this ideal scenario the startup time of programs is literally negligible since almost all pages of code will already be in memory and be in use by the shell (and and init any other background processes that might be running), even if the requested program has not ever been run since boot since perhaps only one page of memory need be loaded to fulfill the runtime requirements of the program.
However that's still not the whole story. I also usually build and use the NetBSD operating system installs for my full development systems by static-linking all binaries. Even though this takes a tremendous amount more disk space (~6.6GB total for x86_64 with everything, including toolchain and X11 static-linked) (especially if one keeps full debug symbol tables available for all programs another ~2.5GB), the result still runs faster overall, and for some tasks even uses less memory than a typical dynamic-linked system that purports to share library code pages. Disk is cheap (even fast disk), and memory to cache frequently used disk files is also relatively cheap, but CPU cycles really are not, and paying the ld.so startup cost for every process that starts every time it starts will take hours and hours of CPU cycles away from tasks which require starting many processes, especially when the same programs are used over and over, such as compilers on a development system. Static-linked toolchain programs can reduce whole-OS multi-architecture build times for my systems by hours. I have yet to build the toolchain into my single crunchgen'ed binary, but I suspect when I do there will be more hours of build time saved because of the win for the CPU cache.

Another consideration is the number of object files (translation units) that you actually consume in a library vs the total number available. If a library is built from many object files, but you only use symbols from a few of them, this might be an argument for favoring static linking, since you only link the objects that you use when you static link (typically) and don't normally carry the unused symbols. If you go with a shared lib, that lib contains all translation units and could be much larger than what you want or need.

C++ Windows Compiler for smallest executables

guys I want to start programing with C++. I have written some programs in vb6, vb.net and now I want to gain knowledge in C++, what I want is a compiler that can compile my code to the smallest windows application. For example there is a Basic language compiler called PureBasic that can make Hello world standalone app's size 5 kb, and simple socket program which i compiled was only 12kb (without any DLL-s and Runtime files). I know it is amazing, so I want something like this for C++.
If I am wrong and there is not such kind of windows compiler can someone give me a website or book that can teach me how to reduce C++ executable size, or how to use Windows API calls?

Taking Microsoft Visual C++ compiler as example, if you turn off linking to the C runtime (/NODEFAULTLIB) your executable will be as small as 5KB.
There's a little problem though: you won't be able to use almost anything from the standard C or C++ libraries, nor standard features of C++ like exception handling, new and delete operators, floating point arithmetics, and more. You'll need to use only the features directly provided by WinAPI (e.g. create files with CreateFile, allocate memory with HeapAlloc, etc...).
It's also worth noting that while it's possible to create small executables with C++ using these methods, you may not be using most of C++ features at this point. In fact typical C++ code have some significant bloat due to heavy use of templates, polymorphism that prevents dead code elimination, or stack unwinding tables used for exception handling. You may be better off using something like C for this purpose.

I had to do this many years ago with VC6. It was necessary because the executable was going to be transmitted over the wire to a target computer, where it would run. Since it was likely to be sent over a modem connection, it needed to be as small as possible. To shrink the executable, I relied on two techniques:
Do not use the C or C++ runtime. Tell the compiler not to link them in. Implement all necessary functionality using a subset of the Windows API that was guaranteed to be available on all versions of Windows at the time (98, Me, NT, 2000).
Tell the linker to combine all code and data segments into one. I don't remember the switches for this and I don't know if it's still possible, especially with 64-bit executables.
The final executable size: ~2K

Reduction of the executable size for the code below from 24k to 1.6k bytes in Visual C++
int main (char argv[]) {
return 0;
}
Linker Switches (although the safe alignment is recommended to be 512):
/FILEALIGN:16
/ALIGN:16
Link with (in the VC++ project properties):
LIBCTINY.LIB
Additional pragmas (this will address Feruccio's suggestion)
However, I still see a section of ASCII(0) making a third of the executable, and the "Rich" Windows signature. (I'm reading the latter is not really needed for program execution).
#ifdef NDEBUG
#pragma optimize("gsy",on)
#pragma comment(linker,"/merge:.rdata=.data")
#pragma comment(linker,"/merge:.text=.data")
#pragma comment(linker,"/merge:.reloc=.data")
#pragma comment(linker,"/OPT:NOWIN98")
#endif // NDEBUG
int main (char argv[]) {
return 0;
}

I don't know why you are interested in this kind of optimization before learning the language, but anyways...
It doesn't make much difference of what compiler you use, but on how you use it. Chose a compiler like the Visual Studio C++'s or MinGW for example, and read its documentation. You will find information of how to optimize the compilation for size or performance (usually when you optimize for size, you lose performance, and vice-versa).
In Visual Studio, for example, you can minimize the size of the executable by passing the /O1 parameter to the compiler (or Project Properties/ C-C++ /Optimization).
Also don't forget to compile in "release" mode, or your executable may be full of debugging symbols, which will increase the size of your executable.

A modern desktop PC running Windows has at least 1Gb RAM and a huge hard drive, worrying about the size of a trivial program that is not representative of any real application is pointless.
Much of the size of a "Hello world" program in any language is fixed overhead to do with establishing an execution environment and loading and starting the code. For any non-trivial application you should be more concerned with the rate the code size increases as more functionality is added. And in that sense it is likley that C++ code in any compiler is pretty efficient. That is to say your PureBasic program that does little or nothing may be smaller than an equivalent C++ program, but that is not necessarily the case by the time you have built useful functionality into the code.
#user: C++ does produce small object code, however if the code for printf() (or cout<<) is statically linked, the resulting executable may be rather larger because printf() has a lot of functionality that is not used in a "hello world" program so is redundant. Try using puts() for example and you may find the code is smaller.
Moreover are you sure that you are comparing apples with apples? Some execution environments rely on a dynamically linked runtime library or virtual machine that is providing functionality that might be statically linked in a C++ program.

I don't like to reply to a dead post, but since none of the responses mentions this (except Mat response)...
Repeat after me: C++ != ( vb6 || vb.net || basic ). And I'm not only mentioning syntax, C++ coding style is typically different than the one in VB, as C++ programmers try to make things usually better designed than vb programmers...
P.S.: No, there is no place for copy-paste in C++ world. Sorry, had to say this...

A boot loader in C++

I have messed around a few times by making a small assembly boot loader on a floppy disk and was wondering if it's possible to make a boot loader in c++ and if so where might I begin? For all I know im not sure it would even use int main().
Thanks for any help.

If you're writing a boot loader, you're essentially starting from nothing: a small chunk of code is loaded into memory, and executed. You can write the majority of your boot loader in C++, but you will need to bootstrap your own C++ runtime environment first.
Assembly is really the only option for the first stage, as you need to set up a sensible environment for running anything higher-level. Doing enough to run C code is fairly straightforward -- you need:
code and data loaded in the right place;
there may be an additional part of the data area which must be zero-initialised;
you need to point the stack pointer at a suitable area of memory for the stack.
Then you can jump into the code at an appropriate point (e.g. main()) and expect that the basic language features will work. (It's possible that any features of the standard library that may have been implemented or linked in might require additional initialisation at this stage.)
Getting a suitable environment going for C++ requires more effort, as it needs more initialisation here, and also has core language features which require runtime support (again, this is before considering library features). These include:
running static constructors;
memory allocation to support new and delete;
support for run-time type information (RTTI);
support for exceptions;
probably some other things I've forgotten to mention.
None of these are required until the C environment is up and running, so the code that handles these can be written in C rather than assembler (or even in a subset of C++ that does not make use of the above features).
(The same principles apply in embedded systems, and it's not uncommon for such systems to make use of C++, but only in a limited way -- e.g. no exceptions and/or RTTI because the runtime support isn't implemented.)

It's been a while since I played with writing bootloaders, so I'm going off memory.
For an x86 bootloader, you need to have a C++ compiler that can emit x86 assembly, or, at the very least, you need to write your own preamble in 16-bit assembly that will put the CPU into 32-bit protected (or 64-bit long) mode, before you can call your C++ functions.
Once you've done that, though, you should be able to make use of most, if not all, of C++'s language features, so long as you stay away from things that require an underlying libc. But statically link everything without the CRT and you're golden.

Bootloaders don't have "int main()"s, unless you write assembly code to call it.
If you are writing a stage 1 bootloader, then it is seriously discouraged.
Otherwise, the osdev.org has great documentation on the topic.
While it is probably possible to make a bootloader in C++, remember not to link your code to any dynamic libraries, and remember that just because it is C++, that doesn't mean you can/should use the STL, etc.

Yes it is possible. You have elements of answer and usefull links in this question
You also can have a look here, there is a C++ bootloader example.
The main thing to understand is that you need to create a flat binary instead of the usual fancy executable file formats (PE on windows, or ELF on Unixes), because these file format need an OS to load them, and in a boot loader you don't have an OS yet.
Using library is not a problem if you link statically (no dynamic link because again of the above executable problem). But obviously all OS API related entry points are not available...

Compile and optimize for different target architectures

Summary: I want to take advantage of compiler optimizations and processor instruction sets, but still have a portable application (running on different processors). Normally I could indeed compile 5 times and let the user choose the right one to run.
My question is: how can I can automate this, so that the processor is detected at runtime and the right executable is executed without the user having to chose it?
I have an application with a lot of low level math calculations. These calculations will typically run for a long time.
I would like to take advantage of as much optimization as possible, preferably also of (not always supported) instruction sets. On the other hand I would like my application to be portable and easy to use (so I would not like to compile 5 different versions and let the user choose).
Is there a possibility to compile 5 different versions of my code and run dynamically the most optimized version that's possible at execution time? With 5 different versions I mean with different instruction sets and different optimizations for processors.
I don't care about the size of the application.
At this moment I'm using gcc on Linux (my code is in C++), but I'm also interested in this for the Intel compiler and for the MinGW compiler for compilation to Windows.
The executable doesn't have to be able to run on different OS'es, but ideally there would be something possible with automatically selecting 32 bit and 64 bit as well.
Edit: Please give clear pointers how to do it, preferably with small code examples or links to explanations. From my point of view I need a super generic solution, which is applicable on any random C++ project I have later.
Edit I assigned the bounty to ShuggyCoUk, he had a great number of pointers to look out for. I would have liked to split it between multiple answers but that is not possible. I'm not having this implemented yet, so the question is still 'open'! Please, still add and/or improve answers, even though there is no bounty to be given anymore.
Thanks everybody!

Yes it's possible. Compile all your differently optimised versions as different dynamic libraries with a common entry point, and provide an executable stub that that loads and runs
the correct library at run-time, via the entry point, depending on config file or other information.

Can you use script?
You could detect the CPU using script, and dynamically load the executable that is most optimized for architecture. It can choose 32/64 bit versions too.
If you are using a Linux you can query the cpu with
cat /proc/cpuinfo
You could probably do this with a bash/perl/python script or windows scripting host on windows. You probably don't want to force the user to install a script engine. One that works on the OS out of the box IMHO would be best.
In fact, on windows you probably would want to write a small C# app so you can more easily query the architecture. The C# app could just spawn whatever executable is fastest.
Alternatively you could put your different versions of code in a dll's or shared object's, then dynamically load them based on the detected architecture. As long as they have the same call signature it should work.

If you wish this to cleanly work on Windows and take full advantage in 64bit capable platforms of the additional 1. Addressing space and 2. registers (likely of more use to you) you must have at a minimum a separate process for the 64bit ones.
You can achieve this by having a separate executable with the relevant PE64 header. Simply using CreateProcess will launch this as the relevant bitness (unless the executable launched is in some redirected location there is no need to worry about WoW64 folder redirection
Given this limitation on windows it is likely that simply 'chaining along' to the relevant executable will be the simplest option for all different options, as well as making testing an individual one simpler.
It also means you 'main' executable is free to be totally separate depending on the target operating system (as detecting the cpu/OS capabilities is, by it's nature, very OS specific) and then do most of the rest of your code as shared objects/dlls.
Also you can 'share' the same files for two different architectures if you currently do not feel that there is any point using the differing capabilities.
I would suggest that the main executable is capable of being forced into making a specific choice so you can see what happens with 'lesser' versions on a more capable machine (or what errors come up if you try something different).
Other possibilities given this model are:
Statically linking to different versions of the standard runtimes (for ones with/without thread safety) and using them appropriately if you are running without any SMP/SMT capabilities.
Detect if multiple cores are present and whether they are real or hyper threading (also whether the OS knows how the schedule effectively in those cases)
checking the performance of things like the system timer/high performance timers and using code optimized to this behaviour, say if you do anything where you look for a certain amount of time to expire and thus can know your best possible granularity.
If you wish to optimize you choice of code based on cache sizing/other load on the box. If you are using unrolled loops then more aggressive unrolling options may depend on having a certain amount level 1/2 cache.
Compiling conditionally to use doubles/floats depending on the architecture. Less important on intel hardware but if you are targetting certain ARM cpu's some have actual floating point hardware support and others require emulation. The optimal code would change heavily, even to the extent you just use conditional compilation rather than using the optimizing compiler(1).
Making use of co-processor hardware like CUDA capable graphics cards.
detect virtualization and alter behaviour (perhaps trying to avoid file system writes)
As to doing this check you have a few options, the most useful one on Intel being the the cpuid instruction.
Windows
Use someone else's implementation but you'll have to pay
Use a free open source one
Linux
Use the built in one
You could also look at open source software doing the same thing
Pixman does a fair amount of this and is a permissive licence.
Alternatively re-implement/update an existing one using available documentation on the features you need.
Quite a lot of separate documents to work out how to detect things:
Intel:
SSE 4.1/4.2
SSE3
MMX
A large part of what you would be paying for in the CPU-Z library is someone doing all this (and the nasty little issues involved) for you.
be careful with this - it is hard to beat decent optimizing compilers on this

Have a look at liboil: http://liboil.freedesktop.org/wiki/ . It can dynamically select implementations of multimedia-related computations at run-time. You may find you can liboil itself and not just its techniques.

Since you mention you are using GCC, I'll assume your code is in C (or C++).
Neil Butterworth already suggested making separate dynamic libraries, but that requires some non-trivial cross-platform considerations (manually loading dynamic libraries is different on Linux, Windows, OSX, etc., and getting it right will likely take some time).
A cheap solution is to simply write all of your variants using unique names, and use a function pointer to select the proper one at runtime.
I suspect the extra dereference caused by the function pointer will be amortized by the actual work you are doing (but you'll want to confirm that).
Also, getting different compiler optimizations will likely require different .c/.cpp files, as well as some twiddling of your build tool. But it's probably less overall work than separate libraries (which needed this already in one form or another).

Since you didn't specify whether you have limits on the number of files, I propose another solution: compile 5 executables, and then create a sixth executable that launches the appropriate binary. Here is some pseudocode, for Linux
int main(int argc, char* argv[])
{
char* target_path[MAXPATH];
char* new_argv[];
char* specific_version = determine_name_of_specific_version();
strcpy(target_path, "/usr/lib/myapp/versions");
strcat(target_path, specific_version);
/* append NULL to argv */
new_argv = malloc(sizeof(char*)*(argc+1));
memcpy(new_argv, argv, argc*sizeof(char*));
new_argv[argc] = 0;
/* optionally set new_argv[0] to target_path */
execv(target_path, new_argv);
}
On the plus side, this approach allows to provide the user transparently with both 32-bit and 64-bit binaries, unlike any library methods that have been proposed. On the minus side, there is no execv in Win32 (but a good emulation in cygwin); on Windows, you have to create a new process, rather than re-execing the current one.

Lets break the problem down to its two constituent parts. 1) Creating platform dependent optimized code and 2) building on multiple platforms.
The first problem is pretty straightforward. Encapsulate the platform dependent code in a set of functions. Create a different implementation of each function for each platform. Put each implementation in its own file or set of files. It's easiest for the build system if you put each platform's code in a separate directory.
For part two I suggest you look at Gnu Atuotools (Automake, AutoConf, and Libtool). If you've ever downloaded and built a GNU program from source code you know you have to run ./configure before running make. The purpose of the configure script is to 1) verify that your system has all of the required libraries and utilities need to build and run the program and 2) customize the Makefiles for the target platform. Autotools is the set of utilities for generating the configure script.
Using autoconf, you can create little macros to check that the machine supports all of the CPU instructions your platform dependent code needs. In most cases, the macros already exists, you just have to copy them into your autoconf script. Then, automake and autoconf can set up the Makefiles to pull in the appropriate implementation.
All this is a bit much for creating an example here. It takes a little time to learn. But the documentation is all out there. There is even a free book available online. And the process is applicable to your future projects. For multi-platform support, this is really the most robust and easiest way to go, I think. A lot of the suggestions posted in other answers are things that Autotools deals with (CPU detection, static & shared library support) without you have to think about it too much. The only wrinkle you might have to deal with is finding out if Autotools are available for MinGW. I know they are part of Cygwin if you can go that route instead.

You mentioned the Intel compiler. That is funny, because it can do something like this by default. However, there is a catch. The Intel compiler didn't insert checks for the approopriate SSE functionality. Instead, they checked if you had a particular Intel chip. There would still be a slow default case. As a result, AMD CPUs would not get suitable SSE-optimized versions. There are hacks floating around that will replace the Intel check with a proper SSE check.
The 32/64 bits difference will require two executables. Both the ELF and PE format store this information in the exectuables header. It's not too hard to start the 32 bits version by default, check if you are on a 64 bit system, and then restart the 64 bit version. But it may be easier to create an appropriate symlink at installation time.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js