g++ linking order dependency when linking c code to c++ code - c++

Prior to today I had always believed that the order that objects and libraries were passed to g++ during the linking stage was unimportant. Then, today, I tried to link from c++ code to c code. I wrapped all the C headers in an extern "C" block but the linker still had difficulties finding symbols which I knew were in the C object archives.
Perplexed, I created a relatively simple example to isolate the linking error but much to my surprise, the simpler example linked without any problems.
After a little trial and error, I found that by emulating the linking pattern used in the simple example, I could get the main code to link OK. The pattern was object code first, object archives second eg:
g++ -o serverCpp serverCpp.o algoC.o libcrypto.a
Can anyone shed some light on why this might be so? I've never seen this problem when linking ordinary C++ code.

The order in which you specify object files and libraries is VERY important in GCC - if you haven't been bitten by this before you have led a charmed life. The linker resolves symbols in the order that the files appear, so if you have a source file that contains a call to a library function, you need to put it before the library, or the linker won't know that it has to resolve it. Complex use of libraries can mean that you have to specify a library more than once, which is a royal pain to get right.

The order in which libraries are passed to gcc/g++ does actually matter. If A depends on B, A must be listed first. The reason is that the linker only pulls in from a library what it needs to resolve symbols that have already been referenced, so if it sees library B first, and nothing has referenced it at that point, then it won't link in anything from it at all.

A static library is a collection of object files grouped into an archive. When linking against it, the linker only chooses the objects it needs to resolve any currently undefined symbols. Since the objects are linked in order given on the command line, objects from the library will only be included if the library comes after all the objects that depend on it.
So the link order is very important; if you're going to use static libraries, then you need to be careful to keep track of dependencies, and don't introduce cyclic dependencies between libraries.
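To make the rule above concrete, here is a minimal sketch of a single left-to-right pass, written as a small C++ model rather than real linker code. The types ObjectFile and LinkInput, the function unresolvedAfterLink, and any symbol names used with it are hypothetical, and the model ignores everything a real linker does apart from tracking defined and undefined symbols.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One object file: the symbols it defines and the symbols it references.
struct ObjectFile {
    std::set<std::string> defines;
    std::set<std::string> references;
};

// One command-line input (hypothetical model type). A plain object is always
// linked; an archive's members are linked only when they are needed.
struct LinkInput {
    bool isArchive;
    std::vector<ObjectFile> members;
};

// Simplified single pass over the command line, left to right.
// Returns the symbols that are still undefined at the end.
std::set<std::string> unresolvedAfterLink(const std::vector<LinkInput>& commandLine) {
    std::set<std::string> defined;
    std::set<std::string> undefined;

    auto linkIn = [&](const ObjectFile& obj) {
        for (const auto& s : obj.defines) {
            defined.insert(s);
            undefined.erase(s);
        }
        for (const auto& s : obj.references) {
            if (!defined.count(s)) undefined.insert(s);
        }
    };

    for (const auto& input : commandLine) {
        if (!input.isArchive) {
            for (const auto& obj : input.members) linkIn(obj);
            continue;
        }
        // Archive: keep pulling members that define a currently undefined
        // symbol, until a full sweep over the archive adds nothing new.
        std::vector<bool> used(input.members.size(), false);
        bool pulled = true;
        while (pulled) {
            pulled = false;
            for (std::size_t i = 0; i < input.members.size(); ++i) {
                if (used[i]) continue;
                bool needed = false;
                for (const auto& s : input.members[i].defines) {
                    if (undefined.count(s)) needed = true;
                }
                if (needed) {
                    linkIn(input.members[i]);
                    used[i] = true;
                    pulled = true;
                }
            }
        }
        // The archive is not revisited after this point.
    }

    // Model invariant: a symbol is never both defined and undefined.
    for (const auto& s : undefined) assert(!defined.count(s));
    return undefined;
}
```

In this model, an object that references a symbol followed by an archive that defines it leaves nothing unresolved. With the two inputs swapped, the archive is scanned while nothing is undefined yet, so none of its members are taken and the symbol stays unresolved.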

You can use --start-group archives --end-group
and write the 2 dependent libraries instead of archives:
gcc main.o -L. -Wl,--start-group -lobj_A -lobj_b -Wl,--end-group

Related

Is it possible to artificially induce object file extraction for a given static library?

I was recently reading this answer and noticed that it seems inconvenient for users to have to link static libraries in the correct order.
Is there some flag or #pragma I can pass to gcc when compiling my library so that my library's object files will always be included?
To be more specific, I want it to be case that the user can link my static library before another library that depends on it, and not wind up with unresolved symbols, in the same way that object files which are explicitly specified on the linker line are.
In particular, I am looking for solutions where the user of the library does not need to do anything special, and merely needs to pass -lMylibrary the way they would link any other library.
Is there some flag or #pragma I can pass to gcc
No.
I want it to be case that the user can link my static library before another library that depends on it, and not wind up with unresolved symbols, in the same way that object files which are explicitly specified on the linker line are.
Ship your "library" as a single object file. In other words, instead of:
ar ru libMyLibrary.a ${OBJS}
use:
ld -r -o libMyLibrary.a ${OBJS}
In particular, I am looking for solutions where the user of the library does not need to do anything special, and merely needs to pass -lMylibrary the way they would link any other library.
You can name your object file libMyLibrary.a. I believe the linker will search for it using the usual rules, but when it finds it, it will discover that this is an object file, and treat it as such, despite it being "misnamed". This should work at least on Linux and other ELF platforms. I am not sure whether it will work on Windows.

Why does the order of passing parameters to g++ matter

Recently, I was trying to build an application that uses some libraries available in the form of shared object files. I wasted a lot of time compiling the C++ code and it didn't work.
Below is the command I was previously using to compile the code:
g++ -I/opt/ros/indigo/include/ -I/usr/include/eigen3/ -L/opt/ros/indigo/lib/ -lorocos-kdl -lkdl_parser test.cpp -o test
The above command always produced many undefined reference errors. Just out of curiosity, I changed the order of the parameters. Below is the command that works:
g++ -L/opt/ros/indigo/lib -I/opt/ros/indigo/include -I/usr/include/eigen3 test.cpp -lorocos-kdl -lkdl_parser -o test
I posted the complete code and solution here.
My question is why does the order of passing parameters to g++ matter? Is there any alternative to avoid such problems in future?
Generally the order of arguments doesn't matter, but there are of course exceptions. For example if you provide multiple -O flags it will be the last one that is used, the same for other flags.
Libraries are a little different though, because for them the order is significant. If object file or library A depends on library B, then A must come before B on the command line. This is because of how the linker scans for symbols: When you use a library the linker will check if there are any symbols that could be resolved. Once this scan is over the library is discarded and will not be searched again.
This means that when you have -lorocos-kdl -lkdl_parser test.cpp the linker will scan the libraries orocos-kdl and kdl_parser first, notice that nothing depends on these libraries yet, so no symbols from them are needed, and continue with the object file generated from the source file.
When you change the order to test.cpp -lorocos-kdl -lkdl_parser the linker will be able to resolve the undefined symbols referenced by test.cpp when it comes to the libraries.
You can (at least in some versions of gcc) use parentheses around the libraries if you don't want to care about the order.
See:
Why does the order in which libraries are linked sometimes cause errors in GCC?
Specifically:
If a static library depends on another library, but the other library again depends on the former library, there is a cycle. You can resolve this by enclosing the cycling dependent libraries by -( and -), such as -( -la -lb -) (you may need to escape the parens, such as -\( and -\)). The linker then searches those enclosed lib multiple times to ensure cycling dependencies are resolved.

Linking a library twice and size of executable

When compiling a program with static libraries, it was suggested to me from many sources (including SO community) to include the library twice.
As in:
gcc main.c -lslA -lslB -lslC -lslA -lslB -o final
Does this result in a bigger executable (i.e. is the linker smart enough to avoid double inclusion)?
Is this (multiple inclusion) the proper solution or a workaround (i.e. will there always exist a more proper, even if harder, way to handle it)?
The only reason to include the library multiple times is, for example, if slA requires a symbol defined by slB but slB requires a symbol defined by slA. The linker does a single pass to resolve symbols, but repeating your library causes, in effect, a second pass on that library. It won't change the size of your output, but it's not necessary either:
Instead of presenting your libraries multiple times, you can tell the gcc linker to group certain libraries together -- letting it do what it needs to resolve the symbols within that group. For example:
gcc main.c -Wl,--start-group -lslA -lslB -lslC -Wl,--end-group -o final
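As a rough illustration of why repeating a library gives a second pass without growing the output, here is a small C++ model of scanning archives in command-line order. It is only a sketch: Member, Archive, LinkResult, linkArchives and the symbol names are all hypothetical, and the model assumes that each archive member is linked at most once no matter how often its archive is listed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One archive member: the symbols it defines and the symbols it references.
struct Member {
    std::set<std::string> defines;
    std::set<std::string> references;
};
using Archive = std::vector<Member>;

struct LinkResult {
    std::set<std::string> undefined;  // symbols still unresolved at the end
    std::size_t membersLinked;        // distinct archive members in the output
};

// Scans the archives named in `order` from left to right. `wanted` holds the
// symbols that the program's own object files leave undefined.
LinkResult linkArchives(const std::set<std::string>& wanted,
                        const std::vector<std::string>& order,
                        const std::map<std::string, Archive>& archives) {
    std::set<std::string> defined;
    std::set<std::string> undefined = wanted;
    std::set<std::pair<std::string, std::size_t>> linked;

    for (const std::string& name : order) {
        const Archive& archive = archives.at(name);
        bool pulled = true;
        while (pulled) {  // sweep this archive until nothing new is taken
            pulled = false;
            for (std::size_t i = 0; i < archive.size(); ++i) {
                if (linked.count(std::make_pair(name, i))) continue;  // already in the output
                bool needed = false;
                for (const auto& s : archive[i].defines) {
                    if (undefined.count(s)) needed = true;
                }
                if (!needed) continue;
                for (const auto& s : archive[i].defines) {
                    defined.insert(s);
                    undefined.erase(s);
                }
                for (const auto& s : archive[i].references) {
                    if (!defined.count(s)) undefined.insert(s);
                }
                linked.emplace(name, i);
                pulled = true;
            }
        }
    }

    // Model invariant: a symbol is never both defined and undefined.
    for (const auto& s : undefined) assert(!defined.count(s));
    return {undefined, linked.size()};
}
```

With a cycle where slA needs a symbol from slB and slB needs a different member of slA, the order slA slB leaves one symbol undefined in this model, while slA slB slA resolves everything. Listing the libraries even more often does not raise the member count, which matches the claim that the output size does not change.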

How to make gcc/ld iterate over many '-l library' when using -static?

I want to compile pdf2svg statically so I will be able to use the newest version in stable Debian. The ./configure script doesn't offer an --enable-static option, so I manually added the -static option for the linker in the Makefile.
Unfortunately the result wasn't quite what I expected. The linking gave me an enormous number of undefined reference errors. After some googling I figured out that the problem is caused by the wrong order of -lsome_lib. The gcc linker tries to statically link in each library once, when it first sees it - info and Stackoverflow question: Why does the order in which libraries are linked sometimes cause errors in GCC?.
Is there a possibility of making linker make multiple passes through the list of libraries?
Maybe this is what you are looking for (from the GNU ld manpage):
-( archives -)
--start-group archives --end-group
The archives should be a list of archive files. They may be either explicit file names, or -l options.
The specified archives are searched repeatedly until no new undefined references are created. Normally, an archive is searched only once in the order that it is specified on the command line. If a symbol in that archive is needed to resolve an undefined symbol referred to by an object in an archive that appears later on the command line, the linker would not be able to resolve that reference. By grouping the archives, they will all be searched repeatedly until all possible references are resolved.
Using this option has a significant performance cost. It is best to use it only when there are unavoidable circular references between two or more archives.
A trick, whenever possible, is to add a static reference to an object of the class (or to the function) that was not linked in, from another .cpp file of the same library (or from another library that is already used).
I have this situation:
library A with class clsA in clsA.cpp that gives the error
library A with foo.cpp that gives no reference errors
library B that uses class clsA
Application uses both libraries and uses classes/functions from foo.cpp
I get the unresolved reference in Application while using the object in library B that uses the clsA class.
Linking Application with library A and B gives me the error. Since I use CodeLite, it's hard to change the library order. I simply put a static object in foo.cpp:
#include "clsA.h"
clsA objA;
The linker now sees that clsA is referenced inside library A (by foo.cpp) and will link correctly in the application, because foo.cpp was already being linked in.
The trick works even if the object is created in a dummy function that is never called, so the object is never actually allocated:
// foo.cpp
#include "clsA.h"
// Never called; it exists only so that foo.o references clsA.
void dummyf()
{
    clsA objA;
}

Shared libraries and .h files

I have some doubts about how programs use shared libraries.
When I build a shared library ( with -shared -fPIC switches) I make some functions available from an external program.
Usually I do a dlopen() to load the library and then dlsym() to link the said functions to some function pointers.
This approach does not involve including any .h file.
Is there a way to avoid doing dlopen() & dlsym() and just including the .h of the shared library?
I guess this may be how C++ programs use code stored in system shared libraries, i.e. by just including stdlib.h etc.
Nick, I think all the other answers are actually answering your question, which is how you link libraries, but the way you phrase your question suggests you have a misunderstanding of the difference between header files and libraries. They are not the same. You need both, and they are not doing the same thing.
Building an executable has two main phases, compilation (which turns your source into an intermediate form, containing executable binary instructions, but is not a runnable program), and linking (which combines these intermediate files into a single running executable or library).
When you do gcc -c program.c, you are compiling, and you generate program.o. This step is where headers matter. You need to #include <stdlib.h> in program.c to (for example) use malloc and free. (Similarly you need #include <dlfcn.h> for dlopen and dlsym.) If you don't do that the compiler will complain that it doesn't know what those names are, and halt with an error. But if you do #include the header the compiler does not insert the code for the function you call into program.o. It merely inserts a reference to it. The reason is to avoid duplication of code: the program only needs one copy of that code, however many parts of it call the function, so if you had further files (module1.c, module2.c and so on), even if they all used malloc you would merely end up with many references to a single copy of malloc. That single copy is present in the standard library in either its shared or static form (libc.so or libc.a) but these are not referenced in your source, and the compiler is not aware of them.
The linker is. In the linking phase you do gcc -o program program.o. The linker will then search all libraries you pass it on the command line and find the single definition of all functions you've called which are not defined in your own code. That is what the -l does (as the others have explained): tell the linker the list of libraries you need to use. Their names often have little to do with the headers you used in the previous step. For example to get use of dlsym you need libdl.so or libdl.a, so your command-line would be gcc -o program program.o -ldl. To use malloc or most of the functions in the std*.h headers you need libc, but because that library is used by every C program it is automatically linked (as if you had done -lc).
Sorry if I'm going into a lot of detail but if you don't know the difference you will want to. It's very hard to make sense of how C compilation works if you don't.
One last thing: dlopen and dlsym are not the normal method of linking. They are used for special cases where you want to dynamically determine what behavior you want based on information that is, for whatever reason, only available at runtime. If you know what functions you want to call at compile time (true in 99% of the cases) you do not need to use the dl* functions.
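To illustrate the contrast drawn above, here is a minimal C++ sketch that reaches the same libc function in both ways. The function names atoiViaDlsym and atoiViaHeader are hypothetical, and the sketch assumes a POSIX system that provides <dlfcn.h> and a dynamically linked executable; on older glibc versions the program may also need -ldl on the link line.

```cpp
#include <cassert>
#include <cstdlib>
#include <dlfcn.h>

using AtoiFn = int (*)(const char*);

// Runtime lookup: ask the dynamic loader for the symbol "atoi" by name.
// No declaration of atoi is needed for this path, only its name and the
// function pointer type we choose to cast to.
// Returns false if the handle or the symbol could not be obtained.
bool atoiViaDlsym(const char* text, int& out) {
    assert(text != nullptr);
    // A null file name asks for a handle to the running program together
    // with the shared libraries it already has loaded (libc among them).
    void* handle = dlopen(nullptr, RTLD_LAZY);
    if (handle == nullptr) return false;

    void* symbol = dlsym(handle, "atoi");
    if (symbol == nullptr) {
        dlclose(handle);
        return false;
    }
    AtoiFn fn = reinterpret_cast<AtoiFn>(symbol);
    out = fn(text);
    dlclose(handle);
    return true;
}

// Normal method: the header declares atoi for the compiler, and the linker
// binds the call to libc, which is linked automatically.
int atoiViaHeader(const char* text) {
    assert(text != nullptr);
    return std::atoi(text);
}
```

Both functions end up calling the same code in libc. The second form is what you would normally write; the first only pays off when the name of the library or function is not known until runtime.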
You can link shared libraries like static ones. They are then searched for when launching the program. As a matter of fact, by default -lXXX will prefer libXXX.so to libXXX.a.
You need to give the linker the proper instructions to link your shared library.
The shared library names are like libNAME.so, so for linking you should use -lNAME
Call it libmysharedlib.so and then link your main program as:
gcc -o myprogram myprogram.c -lmysharedlib
If you use CMake to build your project, you can use
TARGET_LINK_LIBRARIES(targetname libraryname)
As in:
TARGET_LINK_LIBRARIES(myprogram mylibrary)
To create the library "mylibrary", you can use
ADD_LIBRARY(targetname sourceslist)
As in:
ADD_LIBRARY(mylibrary ${mylibrary_SRCS})
Additionally, this method is cross-platform (whereas simply passing flags to gcc is not).
Shared libraries (.so) are binary files in which the compiled code of the functions/classes/... is stored.
Header files (.h) are files that declare those functions/classes/... so that the compiler knows their names and signatures when it compiles the main code; the definitions themselves come from the .so at link time.
Therefore, you need both of them.