Optimization: .cpp or .obj/.o or .lib/.a - c++

I have this chunk of code that could be placed in a separate library, but I'm unsure how that will affect the compiler's ability to optimize.
Option 1: include the code directly in the projects and compile it together with everything else.
Option 2: build the .obj/.o files and simply use them when building the projects.
Option 3: create a static library (.lib or .a) and link with that when building the projects.
Now, my question is: which of these will give the best performance? If you could discuss/explain the consequences of each of the options with regard to compiler optimization that would be super awesome!
Thanks in advance :-)

There should be no difference in performance:
An .a file is simply an archive of .o files. They are treated the same by the linker (except that .a files need to be unpacked first).
Directly compiling all sources together will still result in all compilation units being compiled separately and subsequently linked together. It’s just that the compiler hides this and calls the linker behind your back. Nevertheless, the work is the same as when first compiling the compilation units separately and then linking them together in an explicit step.

There's no difference in the optimization a compiler can do. In every case, the object can be built with as much or as little optimization as you want.
The only difference you might see is when you build a shared library. Then you have a call overhead, which you do not have when linking the objects or a static library directly into the executable.

If by Option 1 you mean #include the code via header files, then the compiler may be able to optimise slightly better than linking multiple objects together, as in Options 2 and 3. This is because the compiler can see the entire source code, rather than just the object code, and may be able to inline functions.
There is no difference between Options 2 and 3, as an archive file - *.a - is just a collection of object files - *.o.
All this being said, The Architecture of Open Source Applications: LLVM implies that you can build LLVM IR code objects, which when linked can be optimised properly, including inlining of functions. So, if you are using clang++, this may be an option.
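To illustrate the inlining point, here is a minimal sketch. The file names and the functions square and sum_of_squares are made up for illustration. Because the body of square is visible at the call site, the compiler is free to inline it without any help from the linker. If the definition lived only in a separate .cpp file, an ordinary build would typically emit a real call, unless link-time optimization (for example -flto with GCC or Clang) is enabled.

```cpp
// --- square.hpp (hypothetical header) ---
// Defined inline in the header, so every translation unit that includes it
// sees the body and the optimizer may inline the call.
inline int square(int x) { return x * x; }

// --- user.cpp (hypothetical source file) ---
// The loop can be optimized together with square() because the definition
// is visible in this translation unit.
int sum_of_squares(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += square(i);
    }
    return total;
}
```

Whether the call is actually inlined is up to the optimizer and the chosen optimization level; the sketch only shows what makes it possible.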

Related

In C or C++, does the compiler do implicit linking?

How does some std-lib, external-libs or any other pre-compiled src code such as the well-known header file <iostream> with its corresponding object file or static/dll lib get linked into my own application automatically? Does the compiler do it implicitly/under-the-hood or something like a compiler pre-linked list operation?
If that is the case, how can we use this functionality ourselves? Is there a way to put my own obj, dll, static-lib or src file into that implicit list by writing some special syntax, without changing the directories of those files and without the help of IDE configuration or outside software? The goal is to avoid specifying the linking explicitly at the terminal; I want to do this configuration inside the source code.
Does every std-lib have some direct/inline special source code that performs this kind of operation? If so, how do we take advantage of it? Or, if everything is done by the compiler and the mechanism is generic, it could be modified with little trouble, but the dilemma is that it is fixed in the compiler and I would hate to forcefully modify or fork it. Is there already a way to do this without tinkering with the compiler, doing it only in your own source code at write time? As I asked at the start of this paragraph: "Does every std-lib/external-lib have some direct/inline special source code that performs this kind of operation?"
// a.cpp
#include<iostream>
// there's no linking on iostream obj, src, dll, static-lib file
// love to have this kind of special features to our own none-std-lib/etc.
>c/cpp-compiler -c a.cpp
>c/cpp-compiler -o a a.o
Note: some of my terminology is based on my own experience, so watch out and be open-minded. As I have grown in the coding community, I have found terminology and standard ways of communicating to be a mess, especially when moving between low-level and high-level programming languages.
It depends on what you call "the compiler".
Most modern toolchains - including gcc, clang, Visual C++ - are based on a "compile then link" model, with several components: a preprocessor (which does text substitution on C or C++ source code, to produce some modified source code), a "compiler" that translates preprocessed source code into object files, utility programs that produce libraries from sets of object files, a linker that produces an executable file from a set of object files and libraries, and - last but by no means least - a driver program that coordinates execution of other components.
The specifics are different between toolchains - e.g. VC++ does things quite differently than gcc/g++ or clang. The concepts are similar.
In what follows, I'll give a very over-simplistic (imprecise, details omitted) discussion of what gcc and g++ (in the gnu compiler collection) do.
When you use gcc or g++ at the command line you're actually using a driver program, that orchestrates execution of a bunch of other programs (the preprocessor, the compiler, the linker, etc). Depending on what options you provide, the result produced differs. For example, gcc -E only completes preprocessing of source files, g++ -c means the process stops after compiling source files to produce object files. If used to produce an executable, the driver program will use the linker to (well!) link object files and libraries together to produce an executable.
So, if you think of gcc or g++ (the program you execute directly) as the compiler then you could claim the compiler does implicit linking. When being used to create an executable, both execute the linker - and provide it information needed (e.g. names of libraries). gcc automatically links in libraries needed by C programs (e.g. the C standard library) while g++ automatically links in libraries needed by C++ programs (e.g. parts of the C++ standard library as well as the C standard library).
However, if you take a narrower view of the compiler - it is the program that only translates source files into object files - then there is no implicit linking of libraries by the compiler. It is the driver program that orchestrates compiling and linking, not the compiler that orchestrates linking.
If you read documentation for your favourite toolchain, it will describe various means (extensions of source files, settings, command line options, values of environment variables, etc) to control what it does. There is typically flexibility to do preprocessing only, compilation only, output assembler, linking only, or a complete "compile multiple source files then link them together to produce an executable" process.
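As a rough sketch of those stages, the hypothetical file below lists, in comments, g++ invocations that stop after each step. The options -E, -S and -c are standard GCC driver options; the file names and the function greet are assumptions made for this example.

```cpp
// a.cpp (hypothetical file name)
//
//   g++ -E a.cpp -o a.ii     preprocess only
//   g++ -S a.cpp -o a.s      stop after generating assembly
//   g++ -c a.cpp -o a.o      compile to an object file, do not link
//   g++ a.o main.o -o a      link only (main.o is assumed to provide main());
//                            the driver adds the C++ and C standard
//                            libraries without being asked
#include <string>

// Uses std::string, so the final link needs the C++ standard library,
// which the g++ driver passes to the linker implicitly.
std::string greet(const std::string& name) {
    return "Hello, " + name + "!";
}
```

Running the last step with the gcc driver instead of g++ would typically fail with unresolved symbols, because gcc does not add the C++ standard library by default.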
The linker searches for libraries in a defined order of directories, which includes the standard library folders.
Some libraries, such as glibc, are linked by default.
This way you don't need to tell the linker to link with the standard libraries.
GCC even has flags for not linking with some standard libraries (for example -nostdlib and -nodefaultlibs).
https://docs.oracle.com/cd/E19205-01/819-5262/auto29/index.html
Note that, while it is not standard, Microsoft's Visual C++ has a #pragma-based language extension that allows specifying files to link in the source:
#pragma comment(lib, "yourfile.lib") // or yourfile.obj
The comment pragma can also be used to specify a few other linker command line options, for example:
#pragma comment(linker,"\"/manifestdependency:type='win32' name='Microsoft.Windows.Common-Controls' version='6.0.0.0' processorArchitecture='*' publicKeyToken='6595b64144ccf1df' language='*'\"")
Note that the list of linker options that can be specified this way is fairly limited and that while there are a few other legal 'comment' types only lib and linker really have meaning.
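A minimal sketch of how this might be used in portable code, assuming the pragma should only be seen by compilers that define _MSC_VER. The constant kAutoLinkRequested and the function link_mechanism are hypothetical names introduced for illustration, and user32.lib is only an example of a library name.

```cpp
#include <string>

#if defined(_MSC_VER)
// MSVC extension: records a request for user32.lib in the object file, so
// the linker picks it up without a command-line entry.
#pragma comment(lib, "user32.lib")
constexpr bool kAutoLinkRequested = true;
#else
// This sketch assumes no in-source equivalent on other compilers, so the
// library has to be named on the link command line instead.
constexpr bool kAutoLinkRequested = false;
#endif

// Hypothetical helper that reports which mechanism this build relies on.
std::string link_mechanism() {
    return kAutoLinkRequested ? "pragma comment(lib)" : "linker command line";
}
```

Guarding the pragma keeps other compilers from warning about an unknown pragma, at the cost of having to maintain the library list in two places for cross-platform projects.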
In C or C++, does the compiler do implicit linking?
As the "compiler" (understood as the whole group of tools in the chain that generate the final executable) has whole control over creating that final executable, it does everything related to every stage of compilation, including implicit linking.
How does some std-lib, external-libs or any other pre-compiled src code such as the well-known header file <iostream> with its corresponding object file or static/dll lib get linked into my own application automatically?
The same as any other library is linked - linker searches the library for symbols and uses them.
Does the compiler do it implicitly/under-the-hood or something like a compiler pre-linked list operation?
Yes (for the compilers I worked with).
But it's very specific to the compiler. From the point of C++ language, there is no requirement on compiler command line options. If the compiler -needs-this-option-to-link-with-standard-library, it's fine and specific to that compiler. It's a quality of implementation issue. Surely users would want some things to be done implicitly with sane defaults for that compiler.
how do we use its functionality in our accord, Is there a way to put my own obj/dll/static-lib/src file into that ideal list via writing some special syntax without changing the initial directories of each of i
Because the compiler does it implicitly, you have to modify the compiler. How that is done depends strongly on the compiler and the system, each of which has its own very specific configuration and build settings.
For example, on Linux with gcc you can use the method in "Enable AddressSanitizer by default in gcc". You can also use the method in "Custom gcc preprocessor", but overwrite the collect2 stage.

C++: Does the order that obj files are linked matter?

Taking the classic example of a header file and implementation file that declare and define a simple function, and a second implementation file that contains a main() that calls the function, a compiler will generate two object files.
1) When linking these files to produce an executable, does the order matter?
This question has an answer that suggests the order does not matter.
This site explicitly agrees, giving an example using GCC.
2) If the order does matter, how does an IDE like Visual Studio determine the appropriate link order?
I have distinct memories of encountering unresolved symbol errors when building with gcc/g++ and needing to alter the order of object files in the makefile to fix this. However, I may be misremembering linking library files.
The order in which object files are linked does not matter. Library order indeed does matter, and that is the responsibility of the developer.
To be honest, linkers are pretty antique. Modern languages don't have linkers, but GCC in particular goes to great lengths to stay compatible with the past.

make SCons compile everything in one gcc line?

I have a rather complex SCons script that compiles a big C++ project.
This gcc manual page says:
The compiler performs optimization based on the knowledge it has of the program. Compiling multiple files at once to a single output file mode allows the compiler to use information gained from all of the files when compiling each of them.
So it's better to give all my files to a single g++ invocation and let it drive the compilation however it pleases.
But SCons does not do this. It calls g++ separately for every single C++ file in the project and then links them using ld.
Is there a way to make SCons do this?
The main reason to have a build system with the ability to express dependencies is to support some kind of conditional/incremental build. Otherwise you might as well just use a script with the one command you need.
That being said, the result of having gcc/g++ optimize as the manual describes is substantial, in particular if you have C++ templates you use often. Good for run-time performance, bad for recompile performance.
I suggest you try and make your own builder doing what you need. Here is another question with an inspirational answer: SCons custom builder - build with multiple files and output one file
Currently the answer is no.
Logic similar to this was developed for MSVC only.
You can see this in the man page (http://scons.org/doc/production/HTML/scons-man.html) as follows:
MSVC_BATCH When set to any true value, specifies that SCons should
batch compilation of object files when calling the Microsoft Visual
C/C++ compiler. All compilations of source files from the same source
directory that generate target files in a same output directory and
were configured in SCons using the same construction environment will
be built in a single call to the compiler. Only source files that have
changed since their object files were built will be passed to each
compiler invocation (via the $CHANGED_SOURCES construction variable).
Any compilations where the object (target) file base name (minus the
.obj) does not match the source file base name will be compiled
separately.
As always patches are welcome to add this in a more general fashion.
In general this should be left up to the program developer. Trying to compile everything together in an amalgamation may introduce unintended behaviour to the program, if it even compiles in the first place. Your best bet if you want this kind of optimisation without editing the source yourself is to use a compiler with interprocedural optimisation, like icc -ipo.
An example where an amalgamation of two .c files would not compile is when they define two identically named static symbols with different functionality.
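The static-symbol clash can be sketched as follows. Each namespace stands in for one source file; the namespaces and the names helper, from_a and from_b are illustration devices, not taken from any real project.

```cpp
// Contents of a hypothetical a.c: helper() is private to this file.
namespace file_a {
static int helper(int x) { return x + 1; }
int from_a(int x) { return helper(x); }
}  // namespace file_a

// Contents of a hypothetical b.c: same name, different meaning.
namespace file_b {
static int helper(int x) { return x * 2; }
int from_b(int x) { return helper(x); }
}  // namespace file_b
```

Compiled as separate files, both helper functions coexist because static gives them internal linkage. Pasting both files into one translation unit without the namespace wrappers would produce a redefinition error for helper.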

Visual Studio: What exactly are lib files (used for)?

I'm learning C++ and came across those *.lib files that are obviously used by the linker. I had to set some additional dependencies for OpenGL.
What exactly are library files in this context used for?
What are their contents?
How are they generated?
Is there anything else worth knowing about them?
Or are they nothing more than relocatable object code similar to *.obj files?
In simple terms, yes - .lib files are just a collection of .obj files.
There is a slight complication on Windows that you can have two classes of lib files.
Static lib files essentially contain a collection of .obj and are linked with your program to provide all the functions inside the .lib. They are mainly a convenience to save you having as many files to deal with.
There are also stub .lib files (import libraries), which provide just references to functions that are contained in a .dll file.
The .lib file is used at link time to tell the linker what to expect from the function, but the code is loaded at run time from the dll.
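A common way to produce such a DLL and its stub .lib is an export macro in the library header. The sketch below assumes a hypothetical library with the macro names MYLIB_BUILDING_DLL and MYLIB_API and the function mylib_add; __declspec(dllexport) and __declspec(dllimport) are the real Microsoft keywords.

```cpp
// Normally defined by the build system only while building the DLL itself;
// defined here so that the sketch compiles as a single file.
#define MYLIB_BUILDING_DLL

// --- mylib.h (hypothetical header) ---
#if defined(_WIN32)
#  if defined(MYLIB_BUILDING_DLL)
#    define MYLIB_API __declspec(dllexport)  // building the DLL: export it
#  else
#    define MYLIB_API __declspec(dllimport)  // using the DLL: import it
#  endif
#else
#  define MYLIB_API  // no annotation in this sketch on other platforms
#endif

MYLIB_API int mylib_add(int a, int b);

// --- mylib.cpp (hypothetical source, compiled into the DLL) ---
int mylib_add(int a, int b) { return a + b; }
```

When the DLL is linked with exported symbols, the Microsoft linker also emits the matching stub .lib, and client programs link against that file rather than against the DLL directly.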
.lib files are "libraries" and contain "collections" of compiled code so-to-speak. So it is a way to provide software components, without giving away the internal source-code for example. They can be generated as "output" of a "build" just like executables are.
The specific contents depend on your platform / development environment, but they will contain symbols for the linker to "hook up" function-calls provided by e.g. the header-file of the library.
Some libraries are "dynamic" (.DLL's on Windows), which means the "hooking" of function-calls is setup when the executable using the library is loaded, allowing the library implementation to be changed without rebuilding the executable.
One last thing. You say you're learning C++, and a common confusing point is, that "symbols" generated by C++ compilers are "mangled" (in order to allow e.g. function overloading), and this "mangling" is not standardized across different compilers, so libraries often resort to C for the "API" of the library (just like OpenGL), even though the library may be implemented in C++ internally.
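As a small sketch of that last point (all function names here are hypothetical), a library can keep overloaded C++ functions internally and expose unmangled wrappers through extern "C":

```cpp
// Two C++ overloads: the compiler encodes the parameter types into each
// symbol name (mangling), and the encoding differs between compilers.
int scale(int value, int factor) { return value * factor; }
double scale(double value, double factor) { return value * factor; }

// extern "C" switches mangling off, so these symbols keep essentially their
// plain names and can be used from C or from a different C++ compiler.
// Overloading is not possible for such functions, hence the type suffixes.
extern "C" int mylib_scale_int(int value, int factor) {
    return scale(value, factor);
}

extern "C" double mylib_scale_double(double value, double factor) {
    return scale(value, factor);
}
```

Inspecting the resulting object file with a symbol-listing tool (for example nm or dumpbin) should show decorated names for the two scale overloads and readable names for the wrappers.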
I hope this sheds some light on .lib files. Happy OpenGL coding :-)
What exactly are library files in this context used for?
They are compiled code, just like the code in your executable. They're called static libraries, and other programs link to them at build time. In the case of OpenGL, you link to their libraries to build an executable that can run OpenGL code. Dynamic libraries (DLLs) are another type of library that executables link against, except at runtime.
What are their contents?
Static libs contain object code, much like the object code that ends up in an exe. The *.obj files are the object code that the compiler generates for the linker.
How are they generated?
When the compiler has created the object files, a librarian tool (rather than the linker) bundles them into the .lib. You can create them in your development environment, just like executables.
Is there anything else worth knowing about them?
Yeah, they're used everywhere, so it doesn't hurt to get used to them.

Why are my Visual Studio .obj files massive in size compared to the output .exe?

As a background, I am a developer of an opensource project, a c++ library called openframeworks, that is a wrapper for different libraries, like opengl, quicktime, freeImage, etc. In the next release, we've added a c++ library called POCO, which is similar to boost in some ways in that it's an alternative for java foundation library type functionality.
I've just noticed, that in this latest release where I've added the POCO library as a statically linked library, the .obj files that are produced during the act of compilation are really massive - for example, several .obj files for really small .cpp files are 2mb each. The overall compiled .obj files are about 12mb or so. On the flip side, the exes that are produced are small - 300k to 1mb.
In comparison, the same library compiled in code::blocks produces .obj files that are roughly the same size as the exe - they are all fairly small.
Is there something happening with linking, and the .obj process in Visual Studio, that I don't understand? For example, is it doing some kind of smart prelinking, or some other thing, that's adding to the .obj size? I've experimented a bit with settings, such as incremental linking, etc., and not seen any changes.
thanks in advance for any ideas to try or insights !
-zach
note: thanks much! I just tried, dumpbin, which says "anonymous object" and doesn't return info about the object. this might be the reason why....
note 2, after checking out the above link, removing LTCG (link time code generation - /GL) the .obj files are much smaller and dumpbin understands them. thanks again !!
I am not a Visual Studio expert by any stretch of imagination, having hardly used it, but I believe Visual Studio employs link-time optimizations, which can make the resulting code run faster, but can cost a lot of space in the libraries. Also, it may be (I don't know the internals) that debugging information isn't stripped until the actual linking phase.
I'm sure someone's going to come with a better/more detailed answer anyway.
Possibly the difference is debug information.
The compiler outputs the debug information into the .obj, but the linker does not put that data into the .exe or .dll. It is either discarded or put into a .pdb.
In any case use the Visual Studio DUMPBIN utility on the .obj files to see what's in them.
Object files need to contain sufficient information for linking. In C++, this is name-based. Two object files refer to the same object (data/function/class) if they use the same name. This implies that all object files must contain names for all objects that might be referenced by other object files. The executable, however, needs only the names that are visible from outside. In the case of a DLL, this means only the exported names. The saving is twofold: there are fewer names, and those names are present only once in the DLL.
Modern C++ libraries will use namespaces. These namespaces mean that object names become longer, as they include the names of the encapsulating namespaces too.
The compiled library obj files will be huge because they must contain all of the functions, classes and templates that your end users might eventually use.
Executables which link to your library will be smaller because they will include only the compiled code that they require to run. This will usually be a tiny subset of the library.