Why doesn't the linker complain of duplicate symbols? - c++

I have a dummy.hpp
#ifndef DUMMY
#define DUMMY
void dummy();
#endif
and a dummy.cpp
#include <iostream>
void dummy() {
std::cerr << "dummy" << std::endl;
}
and a main.cpp which uses dummy()
#include "dummy.hpp"
int main(){
dummy();
return 0;
}
Then I compiled dummy.cpp to three libraries, libdummy1.a, libdummy2.a, libdummy.so:
g++ -c -fPIC dummy.cpp
ar rvs libdummy1.a dummy.o
ar rvs libdummy2.a dummy.o
g++ -shared -fPIC -o libdummy.so dummy.cpp
When I try to compile main and link the dummy libs
g++ -o main main.cpp -L. -ldummy1 -ldummy2
There is no duplicate symbol error produced by the linker. Why does this happen when I link two identical libraries statically?
When I try
g++ -o main main.cpp -L. -ldummy1 -ldummy
There is also no duplicate symbol error. Why?
The loader seems always to choose dynamic libs and not the code compiled in the .o files.
Does it mean the same symbol is always loaded from the .so file if it is both in a .a and a .so file?
Does it mean symbols in the static symbol table in static library never conflict with those in the dynamic symbol table in a .so file?

There's no error in either Scenario 1 (dual static libraries) or Scenario 2 (static and shared libraries) because the linker takes the first object file from a static library, or the first shared library, that it encounters that provides a definition of a symbol it does not yet have a definition for. It simply ignores any later definitions of the same symbol because it already has a good one. In general, the linker only takes what it needs from a library. With static libraries, that's strictly true. With shared libraries, all the symbols in the shared library become available once it satisfies any missing symbol; with some linkers, the symbols of the shared library may be available regardless, but other versions only record the use of a shared library if that shared library provides at least one definition.
It's also why you need to link libraries after object files. You could add dummy.o to your linking commands and as long as that appears before the libraries, there'll be no trouble. Add the dummy.o file after libraries and you'll get doubly-defined symbol errors.
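For example, with the files from the question (a sketch; the exact diagnostic wording differs between linkers):
g++ -c main.cpp
g++ -o main main.o dummy.o -L. -ldummy1    # fine: dummy.o already provides dummy(), so nothing is pulled from libdummy1.a
g++ -o main main.o -L. -ldummy1 dummy.o    # multiple definition: libdummy1.a(dummy.o) has already been pulled in when dummy.o is added unconditionally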
The only time you run into problems with these double definitions is if there's an object file in Library 1 that defines both dummy and extra, there's an object file in Library 2 that defines both dummy and alternative, and the code needs the definitions of both extra and alternative. Then you have duplicate definitions of dummy that cause trouble. Indeed, the object files could be in a single library and would still cause trouble.
Consider:
/* file1.h */
extern void dummy();
extern int extra(int);
/* file1.cpp */
#include "file1.h"
#include <iostream>
void dummy() { std::cerr << "dummy() from " << __FILE__ << '\n'; }
int extra(int i) { return i + 37; }
/* file2.h */
extern void dummy();
extern int alternative(int);
/* file2.cpp */
#include "file2.h"
#include <iostream>
void dummy() { std::cerr << "dummy() from " << __FILE__ << '\n'; }
int alternative(int i) { return -i; }
/* main.cpp */
#include "file1.h"
#include "file2.h"
int main()
{
return extra(alternative(54));
}
You won't be able to link the object files from the three source files shown because of the double-definition of dummy, even though the main code does not call dummy().
Regarding:
The loader seems always to choose dynamic libs and not the code compiled in the .o files.
No; the linker always attempts to load object files unconditionally. It scans libraries as it encounters them on the command line, collecting definitions it needs. If the object files precede the libraries, there's no problem unless two of the object files define the same symbol (does 'one definition rule' ring any bells?). If some of the object files follow libraries, you can run into conflicts if the libraries define symbols that the later object files also define. Note that when it starts out, the linker is looking for a definition of main. It collects the defined symbols and referenced symbols from each object file it is told about, and keeps adding code (from libraries) until all the referenced symbols are defined.
Does it mean the same symbol is always loaded from the .so file if it is both in a .a and a .so file?
No; it depends which was encountered first. If the .a was encountered first, the .o file is effectively copied from the library into the executable (and the symbol in the shared library is ignored because there's already a definition for it in the executable). If the .so was encountered first, the definition in the .a is ignored because the linker is no longer looking for a definition of that symbol — it's already got one.
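One way to see which choice the linker made is to look at the executable's symbol table (assuming Linux/ELF and GNU binutils; _Z5dummyv is the name g++ produces for void dummy()):
g++ -o main main.cpp -L. -ldummy1 -ldummy   # .a first: dummy.o is copied into the executable
nm main | grep dummy                        # expect 'T _Z5dummyv' (defined in the executable)
g++ -o main main.cpp -L. -ldummy -ldummy1   # .so first: the reference stays dynamic
nm main | grep dummy                        # expect 'U _Z5dummyv' (resolved from libdummy.so at run time)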
Does it mean symbols in the static symbol table in a static library never conflict with those in the dynamic symbol table in a .so file?
You can have conflicts, but the first definition encountered resolves the symbol for the linker. It only runs into conflicts if the code that satisfies the reference causes a conflict by defining other symbols that are needed.
If I link 2 shared libs, can I get conflicts so that the link phase fails?
As I noted in a comment:
My immediate reaction is "Yes, you can". It would depend on the content of the two shared libraries, but you could run into problems, I believe. […cogitation…] How would you show this problem? … It's not as easy as it seems at first sight. What is required to demonstrate such a problem? … Or am I overthinking this? … […time to go play with some sample code…]
After some experimentation, my provisional, empirical answer is "No, you can't" (or "No, on at least some systems, you don't run into a conflict"). I'm glad I prevaricated.
Taking the code shown above (2 headers, 3 source files), and running with GCC 5.3.0 on Mac OS X 10.10.5 (Yosemite), I can run:
$ g++ -O -c main.cpp
$ g++ -O -c file1.cpp
$ g++ -O -c file2.cpp
$ g++ -shared -o libfile2.so file2.o
$ g++ -shared -o libfile1.so file1.o
$ g++ -o test2 main.o -L. -lfile1 -lfile2
$ ./test2
$ echo $?
239
$ otool -L test2
test2:
libfile2.so (compatibility version 0.0.0, current version 0.0.0)
libfile1.so (compatibility version 0.0.0, current version 0.0.0)
/opt/gcc/v5.3.0/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.21.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1213.0.0)
/opt/gcc/v5.3.0/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
$
It is unconventional to use .so as the extension on Mac OS X (it's usually .dylib), but it seems to work.
Then I revised the code in the .cpp files so that extra() calls dummy() before the return, and so do alternative() and main(). After recompiling and rebuilding the shared libraries, I ran the programs. The first line of output is from the dummy() called by main(). Then you get the other two lines produced by alternative() and extra(), in that order, because the calling sequence for return extra(alternative(54)); demands that.
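The revised bodies looked roughly like this (a sketch, since the revision isn't shown in full; the only change is the added calls to dummy()):
int extra(int i) { dummy(); return i + 37; }              /* file1.cpp */
int alternative(int i) { dummy(); return -i; }            /* file2.cpp */
int main() { dummy(); return extra(alternative(54)); }    /* main.cpp */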
$ g++ -o test2 main.o -L. -lfile1 -lfile2
$ ./test2
dummy() from file1.cpp
dummy() from file2.cpp
dummy() from file1.cpp
$ g++ -o test2 main.o -L. -lfile2 -lfile1
$ ./test2
dummy() from file2.cpp
dummy() from file2.cpp
dummy() from file1.cpp
$
Note that the function called by main() is the first one that appears in the libraries it is linked with. But (on Mac OS X 10.10.5 at least) the linker does not run into a conflict. Note, though, that the code in each shared object calls 'its own' version of dummy() — there is disagreement between the two shared libraries about which function is dummy(). (It would be interesting to have the dummy() function in separate object files in the shared libraries; then which version of dummy() gets called?) But in the extremely simple scenario shown, the main() function manages to call just one of the dummy() functions. (Note that I'd not be surprised to find differences between platforms for this behaviour. I've identified where I tested the code. Please let me know if you find different behaviour on some platform.)

Related

Difference between linking libfdisk.a directly and through -lfdisk

I've made a program which formats storage devices. However, when I created a library (for a Python GUI) based on this program, it started to show the error:
/usr/bin/ld: fdisk/libfdisk.a(la-label.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; recompile with -fPIC
The libfdisk.a which I use has been built from the util-linux-2.35 sources.
And when -lfdisk is used instead of libfdisk.a, it compiles with no errors.
Compiles with errors:
g++ file1.cpp file2.cpp ... -o package.name ... libfdisk.a
Compiles correctly:
g++ file1.cpp file2.cpp ... -o package.name ... -lfdisk
What is the difference between these two ways?
But there is another, optional, question about fdisk. When I compile my program (not the library) with -lfdisk, the program cannot create 2 partitions due to error 28 (returned from fdisk_add_partition(...)).
I'll share the code if needed.
With -lfdisk the linker is asked to figure out which library file exactly to use.
The usual linkers on Linux will prefix lib and then search the library path for files ending in .so or .a; because you didn't specify any path, that will be the system library path (probably /usr/lib/ or similar). If a .so is found, it will be preferred for linking if a static link wasn't requested.
Your other method explicitly adds in the file named libfdisk.a from the current directory. That is a static library, not a shared one, and if you try to build a shared library from it, then you need to have compiled libfdisk.a with -fPIC, or, if you try to build a PIE executable, at least with -fPIE. If you are trying to build a non-PIE executable, then neither flag is required. GCC may be configured to build PIE by default (as a hardening measure).
So you are probably linking two completely different files.
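As a sketch of the difference (the object and output names here are placeholders standing in for the real build line):
# names the archive explicitly: its objects were built without -fPIC, hence the relocation error when a shared object is being made
g++ -shared wrapper.o fdisk/libfdisk.a -o gui_module.so
# lets the linker search the library path: an installed shared libfdisk.so (which is position-independent) gets picked instead
g++ -shared wrapper.o -lfdisk -o gui_module.so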

GNU linker: Adapt to change of name mangling algorithm

I am trying to re-compile an existing C++ application.
Unfortunately, I must rely on a proprietary library I only have a pre-compiled static archive of.
I use g++ version 7.3.0 and ld version 2.30.
Whatever GCC version it was compiled with, it is ancient.
The header file defines the method:
class foo {
int bar(int & i);
};
As nm lib.a shows, the library archive contains the corresponding exported function:
T bar__4fooRi
nm app.o shows my recent compiler employing a different kind of name mangling:
U _ZN4foo9barERi
Hence the linker cannot resolve the symbols provided by the library.
Is there any option to choose the name mangling algorithm?
Can I introduce a map or define the mangled names explicitly?
@Botje's suggestion led me to write a linker script like this (the spaces in the PROVIDE stanza are significant):
EXTERN(bar__4fooRi);
PROVIDE(_ZN4foo9barERi = bar__4fooRi);
As far as I understood, this will regard bar__4fooRi as an externally defined symbol (which it is). If _ZN4foo9barERi is searched for, but not defined, bar__4fooRi will take its place.
I am calling the linker from the GNU toolchain like this (mind the order – the script needs to be after the dependent object but before the defining library):
g++ -o application application.o script.ld -lfoo
It looks like this could work.
At least in theory.
The linker now pulls in other parts of the library, which in turn depend on other unresolvable symbols, including (but not limited to) __throw, __cp_pop_exception, and __builtin_delete. I have no idea where these functions are defined nowadays. Joxean Koret shows some locations in this blog post, based on guesswork (__builtin_new is probably malloc), but I am not that confident.
These findings lead me to the conclusion that the library relies on a different style of exception handling and probably memory management, too.
EDIT: Although the result may be purely academic due to ABI changes, as pointed out by @eukaryota, a linker script can indeed be used to "alias" symbols. Here is a complete minimal example:
foo.h:
class Foo {
public:
int bar(int);
};
foo.cpp:
#include "foo.h"
int Foo::bar(int i) {
return i+21;
}
main.cpp:
class Foo {
public:
int baa(int); // use in-place "header" to simulate different name mangling algorithm
};
int main(int, char**) {
Foo f;
return f.baa(21);
}
script.ld:
EXTERN(_ZN3Foo3barEi);
PROVIDE(_ZN3Foo3baaEi = _ZN3Foo3barEi); /* declare "alias" */
Build process:
g++ -o libfoo.o -c foo.cpp
ar rvs libfoo.a libfoo.o # simulate building a library
g++ -c main.cpp
g++ -o app main.o -L. script.ld -lfoo
app is compiled, can be executed, and returns the expected result.
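A quick check of the result (f.baa(21) is redirected to Foo::bar(21), which returns 21 + 21):
./app
echo $?   # expect 42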

How do I correctly link in static libraries with g++?

I have a rather complex build I'm trying to do, but I'll simplify it a little bit for this question. I have three C++ files (main.cpp, file2.cpp, and file3.cpp) that I am trying to compile and link against 3 static libs (libx.a, liby.a, libz.a) to produce an executable.
There are many dependencies involved.
All three C++ files are dependent on all 3 libs. libx is dependent on liby and libz. And finally, libx is also dependent on several callback functions contained in file2.cpp.
What command line would build this correctly? I have tried dozens of variations and nothing has satisfied the linker yet.
If it matters, the libs are pure C code compiled with gcc. The sources are C++ and I'm compiling/linking with g++. I have this working correctly as a Visual Studio project, and am trying to port it to Linux.
From your post:
g++ main.cpp file2.cpp file3.cpp -lx -ly -lz
However, if static linking is causing you problems, or you need to distribute any of the libs, then you may consider making them shared objects (.so files, commonly called DSOs). In that case, when you build libx.a, for example, compile all the sources to object files, and then combine them with
g++ -shared *.o -o libx.so -ly -lz
(this version assumes that liby.a and libz.a are still static, and will be combined into libx.so).
You may need to use extern "C" { } in your .cpp files to include the header for the C libs.
See Including C Headers in C++ in How To Mix C and C++.
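A minimal sketch of that wrapping (the header and function names here are invented for illustration, not taken from the question):
// main.cpp
extern "C" {
#include "x.h" // headers of the C libs, wrapped so their functions get C linkage
#include "y.h"
#include "z.h"
}
int main()
{
x_register_callbacks(); // hypothetical C function from libx.a
return 0;
}
The callbacks in file2.cpp that libx calls back into would likewise need to be declared extern "C", so that libx can find them by their unmangled names.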

linking a self-registering, abstract factory

I've been working with and testing a self-registering, abstract factory based upon the one described here:
https://stackoverflow.com/a/582456
In all my test cases, it works like a charm, and provides the features and reuse I wanted.
Linking in this factory in my project using cmake has been quite tricky (though it seems to be more of an ar problem).
I have the identical base.hpp, derivedb.hpp/cpp, and an equivalent deriveda.hpp/cpp to the example linked. In main, I simply instantiate the factory and call createInstance() twice, once each with "DerivedA" and "DerivedB".
The executable created by the line:
g++ -o testFactory main.cpp derivedb.o deriveda.o
works as expected. Moving my derived classes into a library (using cmake, but I have tested this with ar alone as well) and then linking fails:
ar cr libbase.a deriveda.o derivedb.o
g++ -o testFactory libbase.a main.cpp
only calls the first static instantiation (from derivedA.cpp) and never the second static instantiation, i.e.
// deriveda.cpp (if listed first in the "ar" line, this gets called)
DerivedRegister<DerivedA> DerivedA::reg("DerivedA");
// derivedb.cpp (if listed second in the "ar" line, this does not get called)
DerivedRegister<DerivedB> DerivedB::reg("DerivedB");
Note that swapping the two in the ar line calls only the derivedb.cpp static instantiation, and not the deriveda.cpp instantiation.
Am I missing something with ar or static libraries that somehow do not play nice with static variables in C++?
Contrary to intuition, including an archive in a link command is not the same as including all of the object files that are in the archive. Only those object files within the archive that are necessary to resolve undefined symbols are included. This is a good thing if you consider that there was once no dynamic linking, and otherwise the entirety of every library (think the C library) would be duplicated into each executable. Here's what the ld(1) manpage (GNU ld on Linux) has to say:
The linker will search an archive only once, at the location where it is specified on the command line. If the archive defines a symbol which was undefined in some object which appeared before the archive on the command line, the linker will include the appropriate file(s) from the archive. However, an undefined symbol in an object appearing later on the command line will not cause the linker to search the archive again.
Unfortunately there's no standard way to include every member of an archive in the linked executable. On linux you can use g++ -Wl,-whole-archive and on Mac OS X you can use g++ -all_load.
So with GNU binutils ld, the link command should be
g++ -o testFactory -Wl,-whole-archive libbase.a -Wl,-no-whole-archive main.cpp
the -Wl,-no-whole-archive ensures that any archive appearing later in the final link command generated by g++ will be linked in the normal way.
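To check that both registration objects really ended up in the executable, you can look for the static registrar members (assuming GNU nm; -C demangles C++ names):
nm -C testFactory | grep '::reg'
# expect entries for both DerivedA::reg and DerivedB::reg once the whole-archive option is used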

Using created DLL-file (call function from application)

I have got a dummy question. I have three C files: f1.c f2.c f3.c, which contain:
// f1.c
int f1()
{
return 2;
}
// f2.c
int f2()
{
return 4;
}
// f3.c
int f3()
{
return 10;
}
I already have the 3 object files, produced by running the following command (I use MinGW under Windows 7):
gcc -c f1.c f2.c f3.c
and I create the DLL file:
gcc f1.o f2.o f3.o -o test1.dll -shared
Using DLL Export Viewer, I have opened this file and can see that f1, f2, and f3 are exported.
How can I use this file in my application (cross-platform)? How can I call the functions f1, f2, f3?
Sorry for my bad English
Assuming you have the header, the only thing that you are missing is the import library (the .a archive), which the linker needs in order to figure out what the DLL provides.
Change this:
gcc f1.o f2.o f3.o -o test1.dll -shared
To this:
gcc f1.o f2.o f3.o -o test1.dll -shared -Wl,--out-implib,libtest1.a
Then, to link with the shared library, pass -ltest1 to gcc.
You will still need to recompile the library for each platform and architecture (x86, AMD64, IA64, etc.). Windows uses DLLs, but Linux uses Shared Objects, for example.
See http://www.mingw.org/wiki/sampleDLL for more information.
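A minimal sketch of a caller (the question shows no header, so the functions are simply declared by hand; with a header you would include it instead):
/* main.c */
int f1(void);
int f2(void);
int f3(void);
int main(void)
{
return f1() + f2() + f3(); /* 2 + 4 + 10 = 16 */
}
compiled and linked against the import library with something like:
gcc -c main.c
gcc main.o -o app.exe -L. -ltest1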
Include the header file in the calling program.
Link the calling program to the .lib file associated with the library.
Make sure that the library is in the library search path.