C++: Does the order that obj files are linked matter?

Taking the classic example of a header file and implementation file that declare and define a simple function, and a second implementation file that contains a main() that calls the function, a compiler will generate two object files.
1) When linking these files to produce an executable, does the order matter?
This question has an answer that suggests the order does not matter.
This site explicitly agrees, giving an example using GCC.
2) If the order does matter, how does an IDE like Visual Studio determine the appropriate link order?
I have distinct memories of encountering unresolved symbol errors when building with gcc/g++ and needing to alter the order of object files in the makefile to fix this. However, I may be misremembering, and the order that actually mattered may have been that of library files rather than object files.

The order in which object files are linked does not matter: the linker pulls every object file into the executable unconditionally, so each one's symbols are available regardless of where it appears on the command line. Library order indeed does matter, because a traditional Unix linker scans an archive only once, at the point where it appears, pulling in just the members that resolve symbols that are undefined so far. Getting that order right is the responsibility of the developer.
To be honest, linkers are pretty antique technology. Many modern languages resolve modules inside the compiler or at run time rather than in a separate link step, but GCC in particular goes to great lengths to stay compatible with the past.
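A quick way to convince yourself, sketched below with hypothetical file names: both object orders link, but moving a static archive before the object that needs it breaks the link with a traditional Unix linker.

    // answer.cpp -- a minimal sketch (file names are hypothetical)
    int answer() { return 42; }

    // main.cpp
    int answer();
    int main() { return answer(); }

    // Object file order does not matter:
    //   g++ -c answer.cpp main.cpp
    //   g++ main.o answer.o -o app    // links
    //   g++ answer.o main.o -o app    // also links
    //
    // Static library order does matter (GNU ld scans archives left to
    // right, pulling in only members that resolve symbols missing so far):
    //   ar rcs libanswer.a answer.o
    //   g++ -L. main.o -lanswer -o app   // links: 'answer' is undefined
    //                                    // when the archive is scanned
    //   g++ -L. -lanswer main.o -o app   // fails: nothing is undefined yet,
    //                                    // so the archive contributes nothing,
    //                                    // then main.o needs 'answer'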

Related

How is linking failing: Undefined reference to library

I have inherited a code project that contains several individual libraries of code that compile separately and are then linked into the compiled tools. It's supposed to be a Chinese menu of what each tool wants. This is all written on Linux, in C++, with Qt. There are several issues with the current design, but I'm learning to deal.
This latest issue has me really dumbfounded. The main library is the Utilities. It contains a handful of classes and it compiles into a .a library file. Another library is our DatabaseInterfaces. It has class files that refer to header files from Utilities (they are all shared) but the CPP files are not included in DatabaseInterfaces. DatabaseInterfaces also compiles into a .a library. Finally, we have a CMDPrompt tool that imports both the Utilities.a and DatabaseInterfaces.a libraries. CMDPrompt does not compile. Instead, I get several errors indicating that I have an undefined reference for one of the objects in Utilities.
After several different attempts to fix this, I finally included the CPP file directly in the CMDPrompt.pro. It worked, or at least it is now finding new undefined references for other classes in Utilities. This confirms to me that somehow the projects are not linking correctly. I would have expected that, because the Utilities library is linked in, I would have gotten all of the H/CPP goodness with it. I suspect the problem is that the DatabaseInterfaces library is compiling against the H files only and needs the same Utilities.a library. I tried adding that LIB into the DatabaseInterfaces.pro, but it didn't have any effect.
I am not a C++ programmer by training and while I believe I understand the main points of the linking process, I am obviously missing something. Here are my questions. Given the relationship between the different libraries, how should the linker work? Why is the DatabaseInterfaces.a compiling at all with just the H files? What is the best way to resolve this issue?
You need to link the libraries in the right order - it sounds like Utilities.a needs to be linked last, after DatabaseInterfaces.a, so that the references DatabaseInterfaces leaves unresolved can still be satisfied when the linker reaches Utilities. (That also answers why DatabaseInterfaces.a builds at all: creating a static library is just archiving object files, with no link step, so missing definitions only surface when the final executable is linked.) I don't know offhand how you do that in Qt, but that's what you need to achieve.
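For a GNU toolchain the principle looks like this (library names from the question, paths hypothetical); with qmake, the same ordering applies to the entries in the LIBS variable of CMDPrompt.pro:

    // A dependent archive must come before the archive it depends on,
    // so DatabaseInterfaces precedes Utilities on the link line:
    //   g++ main.o -L../libs -lDatabaseInterfaces -lUtilities -o CMDPrompt
    //
    // The qmake equivalent is the order of the LIBS entries:
    //   LIBS += -L../libs -lDatabaseInterfaces -lUtilities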

What does it mean to link against something?

I commonly hear the term "to link against a library".
I'm new to compilers and thus linking, so I would like to understand this a bit more.
What does it mean to link against a library and when would not doing so cause a problem?
A library is an "archive" that contains already compiled code. Typically, you want to use a ready-made library for functionality that you don't want to implement on your own (e.g. decoding JPEGs, parsing XML, providing GUI widgets, you name it).
Typically, in C and C++, using a library goes like this: you #include some headers of the library that contain the function/class declarations - i.e. they tell the compiler that the symbols you need do exist somewhere, without actually providing their code. Whenever you use them, the compiler places a placeholder in the object file which says that the function call is to be resolved at link time, when the rest of the object modules will be available.
Then, at the moment of linking, you have to specify the actual library where the compiled code for the functions of the library is to be found; the linker then will link this compiled code with yours and produce the final executable (or, in the case of dynamic libraries, it will add the relevant information for the loader to perform the dynamic linking at runtime).
If you don't specify that the library is to be linked against, the linker will have unresolved references - i.e. it will see that some functions were declared and that you used them in your code, but their implementation is nowhere to be found; this is the cause of the infamous "undefined reference" errors.
Notice that all this process is identical to what normally happens when you compile a project that is made of multiple .cpp files: each .cpp is compiled independently (knowing of the functions defined in the others only via prototypes, typically written in .h files), and at the end everything is linked together to produce the final executable.
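As a minimal sketch (names hypothetical): the header lets the code compile, but without the library the link step fails.

    // mylib.h -- declaration only; the definition lives in the library
    int twice(int x);

    // main.cpp
    #include "mylib.h"
    int main() { return twice(21); }

    // Compiling succeeds, linking without the library does not:
    //   g++ -c main.cpp                  // fine: the declaration satisfies
    //                                    // the compiler
    //   g++ main.o -o app                // error: undefined reference to
    //                                    // 'twice(int)'
    //   g++ main.o -L. -lmylib -o app    // fine: the linker finds the
    //                                    // definition in libmylib.a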

Optimization: .cpp or .obj/.o or .lib/.a

I have this chunk of code that could be placed in a separate library, but I'm unsure how that will affect the compiler's ability to optimize.
Option 1: include the code directly in the projects and compile it together with everything else.
Option 2: build the .obj/.o files and simply use them when building the projects.
Option 3: create a static library (.lib or .a) and link with that when building the projects.
Now, my question is: which of these will give the best performance? If you could discuss/explain the consequences of each of the options with regard to compiler optimization that would be super awesome!
Thanks in advance :-)
There should be no difference in performance:
An .a file is simply an archive of .o files. They are treated the same by the linker (except that .a files need to be unpacked first).
Directly compiling all sources together will still result in all compilation units being compiled separately and subsequently linked together. It's just that the compiler hides this and calls the linker behind your back. Nevertheless, the work is the same as when first compiling the compilation units separately and then linking them together in an explicit step.
There's no difference in the optimization a compiler can do. In every case, the objects can be built with as much or as little optimization as you want.
The only difference you might see is when you build a shared library. Then calls carry an indirection overhead which you do not have when linking the objects or a static library directly into the executable.
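You can verify the "archive of .o files" point directly; a sketch with hypothetical names:

    //   g++ -O2 -c a.cpp b.cpp
    //   ar rcs libfoo.a a.o b.o      // pack the objects into an archive
    //   ar t libfoo.a                // prints: a.o b.o -- nothing more inside
    //   g++ main.o -L. -lfoo -o app  // same result as: g++ main.o a.o b.o -o app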
If by Option 1 you mean #include the code via header files, then the compiler may be able to optimise slightly better than linking multiple objects together, as in Options 2 and 3. This is because the compiler can see the entire source code, rather than just the object code, and may be able to inline functions.
There is no difference between Options 2 and 3, as an archive file - *.a - is just a collection of object files - *.o.
All this being said, The Architecture of Open Source Applications: LLVM describes how you can build objects containing LLVM IR, which can then be optimised properly at link time, including inlining of functions across object boundaries - i.e. link-time optimisation. So, if you are using clang++, this may be an option.
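With GCC and Clang this is exposed via the -flto flag; a sketch of a typical invocation (with clang++, the intermediate objects contain LLVM IR):

    //   g++ -O2 -flto -c lib.cpp
    //   g++ -O2 -flto -c main.cpp
    //   g++ -O2 -flto lib.o main.o -o app   // the optimiser runs again at
    //                                       // link time and can inline
    //                                       // across the objects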

linking vs including a file

When working with a large codebase, I've seen that when a certain object is being used, that object's header file is included. At other times that object's library is linked in the makefile.
What is the reason for doing one or the other? If you have access to the source code, why not include all the files whose objects you are using instead of linking against their *.a library files?
Normally, you need to do both. Header files tell the compiler what functions are available and what they look like. They have to be present when you compile. Libraries contain the implementation, and must be linked with the application in order for the compiler-generated calls to work.
In a few rare cases, the "library" may consist of just header files; C++ still requires the implementation of a template to be present in the header, and not in a library, so a library which consists of nothing but templates may be header-only. In such cases, it is sufficient to include the headers; there's nothing more to link. (Of course, such libraries drive compile times through the roof.)
There isn't necessarily a one-to-one relationship between header files and binaries. In fact, there quite commonly isn't. For example, just because you see foo.h being included doesn't necessarily mean there's going to be a foo.obj or foo.lib. The reverse is also true; ie, you might see foo.lib being linked but there is no foo.h.
Using Windows as an example, you need quite a few header files to use anything found in kernel32.lib, but there is no kernel32.h.
One good argument for using libraries is that they are easier to consume: compiling a large codebase from source requires all the right dependencies to be available, and there may be specific build steps that are not relevant to the task at hand. And of course, the compile time is cut away.

Why can CImg achieve this kind of effect?

The compilation is done on the fly: only CImg functionalities really used by your program are compiled and appear in the compiled executable program. This leads to very compact code, without any unused stuffs.
Does anyone know the principle behind this?
CImg is a header-only library, and they use templates liberally, which is what they're referring to.
If they used a precompiled library of some kind (.dll/.lib/.a/.so) the library file would have to contain the entire CImg library, regardless of which bits of it you actually use.
In the case of a statically linked library (.lib or .a), the linker can then strip out unused symbols, but that may depend on optimization settings.
When the entire library is included in one or two headers, it is only actually compiled when you #include it, and so it is part of the same compilation process as the rest of your program, and the compiler can easily determine which parts of the library are used, and which ones aren't.
And because the CImg API uses templates, no code is generated for functions that are never called.
They are overselling it a bit though, because as the other answers point out, unused symbols will usually be stripped out anyway.
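The template point can be seen in miniature (a hypothetical sketch): a member function of a class template that is never called is never instantiated, so it produces no object code at all, and does not even need a definition.

    // image.h -- sketch of the header-only, template-based style
    template <typename T>
    struct Image {
        T pixel{};
        void blur();                   // never called below: never
                                       // instantiated, no code emitted,
                                       // no definition required
        T at() const { return pixel; } // called below: instantiated
    };

    int main() {
        Image<int> img;
        return img.at();  // only at() ends up in the executable
    }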
Sounds like fairly standard behaviour to me - a C++ linker will usually throw away any unused library references rather than including uncallable code. Similarly, an optimized build will not include uncallable code.
This sounds like MSVC's Eliminate Unreferenced Data (/OPT:REF) linker option; GCC has something similar to this too.
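GCC's counterpart is section-based garbage collection; a sketch of the usual flags:

    //   g++ -ffunction-sections -fdata-sections -c main.cpp  // one section
    //                                                        // per function/object
    //   g++ -Wl,--gc-sections main.o -o app   // linker discards sections
    //                                         // that are never referenced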