linking with two symbols. one defined in an archive file

I noticed that gtest provides a way to link against gtest_main so that the end user doesn't need to write their own main function. It works in the following way, with a small example file named hello.cpp:
#include <gtest/gtest.h>
TEST(Hello, Basic) {}
One can compile this with:
g++ hello.cpp -lgtest -lgtest_main
and everything works out fine. The reason this works is that there is a main function defined in gtest_main.cc from which the libgtest_main.a is generated.
Now here is the thing. If I change my hello.cpp to
#include <gtest/gtest.h>
TEST(Hello, Basic) {}
int main(int argc, char** argv) {
    testing::InitGoogleTest(&argc, argv);
    return RUN_ALL_TESTS();
}
everything still works with the same command line! There are two main symbols now, and the linker has conveniently chosen the one main function which I defined in my hello.cpp.
What is the magic going on here?

No magic is going on. What you have observed is the normal default behaviour of
the linker.
A static library libxy.a is an ar archive of
object files x.o, y.o,...
If an object file x.o appears in the linker inputs of a program, the linker links it
into the program unconditionally.
If a static library libxy.a appears in the linker inputs, the linker examines the
archive to find any object files that provide definitions for symbols that have
already been referenced, but not already defined, in files already linked into
the program. It extracts just those object files, if any, from the archive and links
them into the program exactly as if they were individually named linker inputs
and the static library was not mentioned at all.
The usual reason that we offer a set of object files to the linker in a static library,
rather than as individual inputs, is so that the linker will select just the ones
it needs to obtain definitions for unresolved symbol references, rather than simply
linking all of them into the program whether they are needed or not.
Here is an elementary illustration in C [1]:
main.c
extern void x(void);
int main(void)
{
    x();
    return 0;
}
lib_main.c
extern void y(void);
int main(void)
{
    y();
    return 0;
}
x.c
#include <stdio.h>
void x(void)
{
    puts(__func__);
}
y.c
#include <stdio.h>
void y(void)
{
    puts(__func__);
}
Compile all those to object files:
$ gcc -Wall -c main.c lib_main.c x.c y.c
Make a static library containing lib_main.o, x.o and y.o:
$ ar rcs libmxy.a lib_main.o x.o y.o
Link a program prog like this:
$ gcc -o prog main.o libmxy.a
It runs like:
$ ./prog
x
So the definition of main provided by main.o was linked and the other
definition of main in libmxy.a(lib_main.o) was ignored. Repeating the linkage
with some diagnostics sheds more light.
$ gcc -o prog main.o libmxy.a -Wl,-trace,-trace-symbol=main,-trace-symbol=x
/usr/bin/ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o
main.o
(libmxy.a)x.o
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o: reference to main
main.o: definition of main
main.o: reference to x
libmxy.a(x.o): definition of x
The -trace option asks the linker to show us what files were actually used in
the linkage. -trace-symbol=name asks the linker to show us the files in which
symbol name was defined or referenced. Most of the files linked are boilerplate
that gcc adds to the linker commandline by default. The ones that we built are:
main.o
(libmxy.a)x.o
The linker found the symbol main first referenced in the boilerplate object
file /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o. Then
it found a definition of main in the object file main.o, which was linked
unconditionally. That resolved main. The linker didn't search libmxy.a for
another definition of main because it didn't need one.
In main.o it found an undefined reference to x and the next linker input
was libmxy.a. So it searched the object files in that archive for one that
defines x. It found libmxy.a(x.o) and extracted and linked it. Then it was
done.
The other object files that we offered to the linker in libmxy.a:
libmxy.a(lib_main.o)
libmxy.a(y.o)
were not needed. They might as well not have existed. The linkage is exactly
the same as:
$ gcc -o prog main.o x.o
$ ./prog
x
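The selection rule demonstrated above can be sketched as a small C++ model. This is only an illustration under simplifying assumptions, not how any real linker is implemented: the ObjectFile and LinkerInput types and the select_objects function are hypothetical names invented for the sketch, symbols are plain strings, and duplicate definitions are not diagnosed.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of an object file: the symbols it defines and references.
struct ObjectFile {
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> references;
};

// A linker input is either one object file or an archive of object files.
struct LinkerInput {
    bool is_archive;
    std::vector<ObjectFile> members;  // exactly one member for a plain object file
};

// Returns the names of the object files that end up in the link, in order.
std::vector<std::string> select_objects(const std::vector<LinkerInput>& inputs) {
    std::set<std::string> defined, undefined;
    std::vector<std::string> linked;

    auto link_object = [&](const ObjectFile& obj) {
        linked.push_back(obj.name);
        for (const auto& sym : obj.defines) {
            defined.insert(sym);
            undefined.erase(sym);
        }
        for (const auto& sym : obj.references) {
            if (!defined.count(sym)) undefined.insert(sym);
        }
    };

    for (const auto& input : inputs) {
        if (!input.is_archive) {
            link_object(input.members.front());  // object files are linked unconditionally
            continue;
        }
        // Archive: keep extracting members that define a currently undefined
        // symbol until a complete pass over the archive extracts nothing new.
        std::set<std::string> extracted;
        bool progress = true;
        while (progress) {
            progress = false;
            for (const auto& member : input.members) {
                if (extracted.count(member.name)) continue;
                bool needed = false;
                for (const auto& sym : member.defines) {
                    if (undefined.count(sym)) { needed = true; break; }
                }
                if (needed) {
                    extracted.insert(member.name);
                    link_object(member);
                    progress = true;
                }
            }
        }
    }
    return linked;
}
```

Given a startup object that references main, then main.o, then an archive holding lib_main.o, x.o and y.o, the model selects only x.o from the archive. If main.o is left out, the same archive contributes lib_main.o and y.o instead, because main is still undefined when the archive is examined.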
What is more interesting about libgtest_main.a...
... is the fact that here you have a static library that contains a member (libgtest_main.a(gtest_main.cc.o)) that will be linked
into your program even if your linkage does not input any object files before
libgtest_main.a:
$ g++ -o prog -lgtest_main -lgtest -pthread
links successfully, and prog will run just to say that it has nothing to do.
If -lgtest_main is the very first linker input, then when the linker considers
it, it cannot have discovered any undefined references in files already linked,
since there are none, and therefore has no need to link any object file within
libgtest_main.a. But it does, and that behaviour might be described as a bit of
magic.
But we've already seen the explanation in the diagnostic output of:
$ gcc -o prog main.o libmxy.a -Wl,-trace,-trace-symbol=main,-trace-symbol=x
which informed us that main is first referenced in /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o.
That boilerplate object file is the GCC C runtime startup code, which performs standard initializations for program
execution and finishes by calling main. This is an object file, so it will be linked
unconditionally, and GCC places it before all other inputs in the generated linker commandline. Link in verbose
mode (gcc -v ...) to see that. So in fact there is always an object file, first in the program's linkage,
that makes reference to main, no matter what object files you explicitly link. And if you
do not yourself input an object file that defines main before you input libraries, then
the linker will search libraries for a definition of main. libgtest_main exploits that fact.
Of course, it is only practical to exploit this fact for googletest because for all normal
programs that link googletest, the definition of main is identical.
[1] The choice of C rather than C++ makes no difference, except that in C we
don't have to bother about name-mangling.

Related

Same object file in different static libraries when linking

clang++ ... foo.cpp ... -o dir1/foo.o
clang++ ... foo.cpp ... -o dir2/foo.o
//The only difference between the above two clang++ command lines
//is the output directory
llvm-ar ... dir1/lib1.a ... dir1/foo.o ...
llvm-ar ... dir2/lib2.a ... dir2/foo.o ...
clang++ ... dir1/lib1.a dir2/lib2.a ... -o lib.so
What happens to the duplicated symbols from foo.cpp when generating lib.so? Is any flag required to avoid symbol duplication errors?

Linking multiple static libraries, when the same object file occurs in more than one of the provided libraries, will not result in any duplicate symbol errors (by default).
This is because the linker does not "combine the static libraries" into a final executable. It only combines the provided object files into the executable. The linker processes the list of object files and archive libraries left to right. When a static library is encountered, the linker checks whether any of the object files in the library defines a currently undefined symbol. Then, and only then, will it pull in that object file.
In your example:
clang++ ... dir1/lib1.a dir2/lib2.a ... -o lib.so
consider two additional object files:
clang++ obj1.o dir1/lib1.a dir2/lib2.a obj2.o -o lib.so
If obj1.o references a symbol that exists in foo.cpp:
The linker will process and add obj1.o to the lib.so, noting that said symbol is undefined.
The linker will open dir1/lib1.a and check if any object files contained in the archive define said symbol. Because foo.o defines the symbol, foo.o will be added to lib.so and the symbol will be marked defined.
The linker will open dir2/lib2.a. But there are no currently undefined symbols so the duplicate object file will be ignored.
The linker will process and add obj2.o to the lib.so. The linker does not go back and re-process lib1.a or lib2.a.
Therefore no duplicate symbol error should be raised (by default, on Linux). To change this behaviour, you can use the linker option --whole-archive:
clang++ ... -Wl,--whole-archive dir1/lib1.a dir2/lib2.a -Wl,--no-whole-archive ... -o lib.so
With --whole-archive all object files from the specified archive libraries will be added to the output. The above command then results in a "multiple definition" error for any symbols in foo.cpp.
This answer describes the behaviour on Linux. I believe AIX is different and always adds all encountered object files (from static libraries) to the output.
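Why --whole-archive turns the duplicated foo.o into an error can be sketched with a small C++ check. This is a hedged illustration only: ObjectFile and multiple_definitions are hypothetical names, the symbol name foo is an assumption (the question does not say what foo.cpp defines), and only strong definitions are modelled.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: an object file name and the strong symbols it defines.
struct ObjectFile {
    std::string name;
    std::set<std::string> defines;
};

// Given object files that are ALL linked (as with --whole-archive, or when
// they are named individually), return the symbols defined more than once.
std::set<std::string> multiple_definitions(const std::vector<ObjectFile>& linked) {
    std::map<std::string, int> definition_count;
    for (const auto& obj : linked) {
        for (const auto& sym : obj.defines) {
            ++definition_count[sym];
        }
    }
    std::set<std::string> duplicates;
    for (const auto& entry : definition_count) {
        if (entry.second > 1) duplicates.insert(entry.first);
    }
    return duplicates;
}
```

Passing both copies of foo.o reports the assumed symbol foo as multiply defined. Passing only the copy from lib1.a, which is all the default archive search would select, reports nothing.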

Can I have two main functions in two C++ static libraries when I link both of them?

I have three files:
test.cpp (it is empty) :
main1.cpp
#include <cstdio>
int main()
{
    printf("main_1\n");
    return 0;
}
main2.cpp
#include <cstdio>
int main()
{
    printf("main_2\n");
    return 0;
}
Then I create two static libraries, main1.a and main2.a:
g++ -c main1.cpp
ar r main1.a main1.o
g++ -c main2.cpp
ar r main2.a main2.o
I found that the output differs depending on the order of main1.a and main2.a.
When main1.a is in front of main2.a:
$ g++ -o out test.cpp main1.a main2.a
$ ./out
the output is "main_1".
When main2.a is in front of main1.a:
$ g++ -o out test.cpp main2.a main1.a
$ ./out
the output is "main_2".
Why is there no "multiple definition of `main'" error, as there is with this command?
g++ -o out test.cpp main1.cpp main2.cpp

The linker handles these two cases very differently (by default). When you compile code into object files and link them directly, the linker will not allow two strong definitions of the same symbol to exist. A strong definition is, for example, a function or a variable with an initial value. The linker will allow one strong definition alongside multiple weak ones (for example, a variable with an initial value in one object and the same variable with no initial value in another).
With static libraries, the linker goes through them in the order they are given. It looks up the symbols each library exports; if any of them is needed by the objects linked so far, it takes in the object file that defines it and repeats the process for that file (to find whatever the newly added function itself needs). So when the linker gets to the first library, it checks its exports, sees main, determines that it is needed and takes it in. Then it goes to the second library, sees main, does not find it in the list of undefined symbols and simply skips it.
More information can be found on Eli Bendersky's website.

Using a shared library in another shared library

I am creating a shared library from a class from an example I got here C++ Dynamic Shared Library on Linux. I would like to call another shared library from the shared library created and then use it in the main program. So I have the myclass.so library and I want to call another library say anotherclass.so from the myclass.so library and then use this myclass.so library in the main program. Any idea on how I can do this please.

There is more than one way in which multiple shared libraries may be added to
the linkage of a program, if you are building all the libraries, and the program,
yourself.
The elementary way is simply to explicitly add all of the libraries to the
linkage of the program, and this is the usual way if you are building only the
program and linking libraries built by some other party.
If an object file foo.o in your linkage depends on a library libA.so, then
foo.o should precede libA.so in the linkage sequence. Likewise if libA.so
depends on libB.so then libA.so should precede libB.so. Here's an illustration.
We'll make a shared library libsquare.so from the files:
square.h
#ifndef SQUARE_H
#define SQUARE_H
double square(double d);
#endif
and
square.cpp
#include <square.h>
#include <cmath>
double square(double d)
{
    return pow(d,2);
}
Notice that the function square calls pow, which is declared in the
Standard header <cmath> and defined in the math library, libm.
Compile the source file square.cpp to a position-independent object file
square.o:
$ g++ -Wall -fPIC -I. -c square.cpp
Then link square.o into a shared library libsquare.so:
$ g++ -shared -o libsquare.so square.o
Next we'll make another shared library libcube.so from these files:
cube.h
#ifndef CUBE_H
#define CUBE_H
double cube(double d);
#endif
and
cube.cpp
#include <cube.h>
#include <square.h>
double cube(double d)
{
    return square(d) * d;
}
See that the function cube calls square, so libcube.so is going to
depend on libsquare.so. Build the library as before:
$ g++ -Wall -fPIC -I. -c cube.cpp
$ g++ -shared -o libcube.so cube.o
We haven't bothered to link libsquare with libcube, even though libcube
depends on libsquare, and even though we could have, since we're building libcube.
For that matter, we didn't bother to link libm with libsquare. By default the
linker will let us link a shared library containing undefined references, and it
is perfectly normal. It won't let us link a program with undefined references.
Finally let's make a program, using these libraries, from this file:
main.cpp
#include <cube.h>
#include <iostream>
int main()
{
    std::cout << cube(3) << std::endl;
    return 0;
}
First, compile that source file to main.o:
$ g++ -Wall -I. -c main.cpp
Then link main.o with all three required libraries, making sure to list
the linker inputs in dependency order: main.o, libcube.so, libsquare.so, libm.so:
$ g++ -o prog main.o -L. -lcube -lsquare -lm
libm is a system library so there's no need to tell the linker where to look for
it. But libcube and libsquare aren't, so we need to tell the linker to look for
them in the current directory (.), because that's where they are. -L. does that.
We've successfully linked ./prog, but:
$ ./prog
./prog: error while loading shared libraries: libcube.so: cannot open shared object file: No such file or directory
It doesn't run. That's because the runtime loader doesn't know where to find libcube.so (or libsquare.so, though it didn't get that far).
Normally, when we build shared libraries we then install them in one of the loader's default
search directories (the same ones as the linker's default search directories), where they're available to any program, so this wouldn't happen. But I'm not
going to install these toy libraries on my system, so as a workaround I'll tell the loader where to look
for them by setting LD_LIBRARY_PATH in my shell.
$ export LD_LIBRARY_PATH=.
$ ./prog
27
Good. 3 cubed = 27.
Another and better way to link a program with shared libraries that aren't located
in standard system library directories is to link the program using the linker's
-rpath=DIR option. This will write some information into the executable to tell
the loader that it should search for required shared libraries in DIR before it tries
the default places.
Let's relink ./prog that way (first deleting the LD_LIBRARY_PATH from the shell so that it's not effective any more):
$ unset LD_LIBRARY_PATH
$ g++ -o prog main.o -L. -lcube -lsquare -lm -Wl,-rpath=.
And rerun:
$ ./prog
27
To use -rpath with g++, prefix it with -Wl, because it's an option for the linker, ld,
that the g++ frontend doesn't recognise: -Wl tells g++ just to pass the
option straight through to ld.

I would like to add some points to the response of @Mike.
Because you do not link libcube with libsquare, you are creating a sort of "incomplete library". By incomplete, I mean that when you link your application you must link it with both libcube and libsquare, even though the application does not use any symbol directly from libsquare.
It is better to link libcube directly with libsquare. That link creates the library with a NEEDED entry like:
readelf -d libcube.so
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libsquare.so]
Then when you link your application you can do:
g++ -o prog main.o -L. -lcube
However, this will not link as it stands, because the linker tries to locate the NEEDED library libsquare and does not know where to find it. You must specify its path by adding -Wl,-rpath-link=. to the linking command:
g++ -o prog main.o -L. -lcube -Wl,-rpath-link=.
Note: At runtime, you must still set LD_LIBRARY_PATH or link with -rpath, as mentioned by @Mike.
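How a NEEDED entry lets the application name only libcube can be sketched as a walk over a dependency graph. The following C++ is a rough model, not the loader's actual code: NeededMap and libraries_to_load are hypothetical names, the map contents stand in for what readelf -d would report, and the breadth-first traversal is a simplifying assumption of the sketch.

```cpp
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the NEEDED entries of each file.
using NeededMap = std::map<std::string, std::vector<std::string>>;

// Returns every library reachable from `root` through NEEDED entries,
// in breadth-first order, without listing `root` itself.
std::vector<std::string> libraries_to_load(const std::string& root,
                                           const NeededMap& needed) {
    std::vector<std::string> order;
    std::set<std::string> seen{root};
    std::deque<std::string> queue{root};
    while (!queue.empty()) {
        const std::string current = queue.front();
        queue.pop_front();
        auto it = needed.find(current);
        if (it == needed.end()) continue;  // no recorded dependencies
        for (const auto& lib : it->second) {
            if (seen.insert(lib).second) {  // first time we meet this library
                order.push_back(lib);
                queue.push_back(lib);
            }
        }
    }
    return order;
}
```

With prog needing libcube.so and libcube.so needing libsquare.so, the walk yields libcube.so followed by libsquare.so, even though prog never names libsquare.so. Each of those files must still be findable, which is why LD_LIBRARY_PATH or -rpath is still required.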

If your library uses another shared library, then the user of your library also depends on that other library. When creating your library you can use -l so that the linker records the dependency on the shared library, and it will be linked when required.
When you deliver your library, because it depends on the other library you need to ship that library along with yours, and provide an environment variable or linker flag so that it is loaded from a specified path (your exported package). That avoids any discrepancy; otherwise, if the dependency shares a name with a standard library function, the user might pick up a definition from some other library on their system, with disastrous results.

Simply use the library like you'd use it in any other application. You don't have to link to anotherclass.so, just to myclass.so.
However, you will have to make both libraries (myclass.so and anotherclass.so) available for your later application's runtime. If one of them is missing you'll get runtime errors just like it is with any other application.

symbol resolutions when creating (and linking) libraries

Suppose a.cc defines a function f_a() that uses a function f_b() defined in b.cc. From a.cc and b.cc I create a dynamic library libdynamic.so.
Suppose the file main.cc uses f_a, I'd compile it as follows:
g++ -o main main.cc -ldynamic
How does the dynamic linker bring the definition of f_a (and subsequently f_b) into the executable? Is the definition of f_a in libdynamic.so already resolved with f_b? Or the dynamic linker will also resolve this (internal) dependency at runtime?

Since you're using a shared library (*.so), the definition is not brought into the executable. It remains in the library itself and is resolved at run time, which is why if you remove the shared library the program will not function correctly.
On the other hand, all the internal symbols in the library (in your example, f_a and f_b) must be resolved when the library is built. This is evident from the compilation process:
g++ -fPIC -c a.cc
g++ -fPIC -c b.cc
g++ -shared -Wl,-soname,libdynamic.so -o libdynamic.so a.o b.o
In the last stage, g++ calls the linker (ld) to link a.o and b.o. In fact, you could (probably) call the linker directly instead:
ld -shared -soname=libdynamic.so -o libdynamic.so a.o b.o
If you're still curious about the whole process and all its gory details, here is a useful reference article: Linkers and Loaders, by Sandeep Grover.

Basically, dynamic libraries are linked with the executable at run time (that is, when you run ./main). The dynamic linker takes care of resolving the dependency at run time. You can check whether a symbol is defined or still undefined with the nm command. The default information that nm provides is:
Virtual address of the symbol
A character that depicts the symbol type. If the character is lower case the symbol is local; if it is upper case the symbol is external
Name of the symbol
For more information, see the nm manual page.
After compiling your program, just run nm on the executable (in your case, nm main).

g++: In what order should static and dynamic libraries be linked?

Let's say we have a main executable called "my_app" that uses several other libraries: three are linked statically and the other three are linked dynamically.
In which order should they be linked against "my_app"?
Let's say we have libSA (as in Static A), which depends on libSB, and libSC, which depends on libSB:
libSA -> libSB -> libSC
and three dynamic libraries: libDA -> libDB -> libDC (libDA is the most basic, libDC is the highest)
In which order should these be linked? The most basic one first or last?
g++ ... -g libSA libSB libSC -lDA -lDB -lDC -o my_app
seems like the correct order, but is that so? What if there are dependencies between a dynamic library and a static one, or the other way round?

In the static case, it doesn't really matter, because you don't actually link static libraries: all you do is pack some object files together in one archive. All you have to do is compile your object files, and you can create static libraries right away.
The situation with dynamic libraries is more convoluted. There are two ways to handle it:
1. A shared library can be built exactly the same way as a static library (except for shared segments, if they are present), which means you can just link your shared library as soon as you have the object files. This means, for example, that symbols from libDA will appear as undefined in libDB.
2. You can specify the libraries to link to on the command line when linking shared objects. This has the same effect as 1., but marks libDB as needing libDA.
The difference is that if you use the former way, you have to specify all three libraries (-lDA, -lDB, -lDC) on the command line when linking the executable. If you use the latter, you just specify -lDC and it will pull in the others automatically at link time. Note that for shared libraries, link time is just before your program runs (which means you can get different versions of symbols, even from different libraries).
This all applies to UNIX; Windows DLL work quite differently.
Edit after clarification of the question:
Quote from the ld info manual.
The linker will search an archive only once, at the location where it is specified on the command line. If the archive defines a symbol which was undefined in some object which appeared before the archive on the command line, the linker will include the appropriate file(s) from the archive. However, an undefined symbol in an object appearing later on the command line will not cause the linker to search the archive again.
See the `-(' option for a way to force the linker to search archives multiple times.
You may list the same archive multiple times on the command line.
This type of archive searching is standard for Unix linkers. However, if you are using `ld' on AIX, note that it is different from the behaviour of the AIX linker.
That means:
Any static library or object that depends on another library should be placed before it on the command line. If static libraries depend on each other circularly, you can, for example, use the -( command line option, or place the libraries on the command line twice (-lDA -lDB -lDA). The order of dynamic libraries doesn't matter.
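The ordering rule for static libraries amounts to a topological sort: every library is listed before the libraries it depends on. Below is a minimal C++ sketch of that idea, assuming the dependencies are known up front. DependencyMap, visit and link_order are hypothetical names, and the library names used with it are placeholders rather than anything from the question.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Maps each library to the libraries it depends on (hypothetical input format).
using DependencyMap = std::map<std::string, std::vector<std::string>>;

// Depth-first visit that appends a library only after all of its dependencies.
static void visit(const std::string& lib, const DependencyMap& deps,
                  std::set<std::string>& done, std::set<std::string>& in_progress,
                  std::vector<std::string>& post_order) {
    if (done.count(lib)) return;
    if (!in_progress.insert(lib).second) {
        // A cycle has no valid single-pass order; see -( or repeated archives.
        throw std::runtime_error("circular dependency involving " + lib);
    }
    auto it = deps.find(lib);
    if (it != deps.end()) {
        for (const auto& dep : it->second) {
            visit(dep, deps, done, in_progress, post_order);
        }
    }
    in_progress.erase(lib);
    done.insert(lib);
    post_order.push_back(lib);
}

// Returns the libraries with dependents first and their dependencies after them.
std::vector<std::string> link_order(const DependencyMap& deps) {
    std::set<std::string> done, in_progress;
    std::vector<std::string> post_order;
    for (const auto& entry : deps) {
        visit(entry.first, deps, done, in_progress, post_order);
    }
    // Reversing the post-order puts each library ahead of what it depends on.
    return std::vector<std::string>(post_order.rbegin(), post_order.rend());
}
```

For a chain in which libtop depends on libmid and libmid depends on libbase, the result is libtop, libmid, libbase, which is the order to give on the command line. A circular dependency makes the function throw, which corresponds to the case where the -( option or repeating the libraries is needed.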

This is the sort of question that's best solved by a trivial example. Really! Take 2 minutes, code up a simple example, and try it out! You'll learn something, and it's faster than asking.
For example, given files:
a1.cc
#include <stdio.h>
void a1() { printf("a1\n"); }
a2.cc
#include <stdio.h>
extern void a1();
void a2() { printf("a2\n"); a1(); }
a3.cc
#include <stdio.h>
extern void a2();
void a3() { printf("a3\n"); a2(); }
aa.cc
extern void a3();
int main()
{
    a3();
}
Running:
g++ -Wall -g -c a1.cc
g++ -Wall -g -c a2.cc
g++ -Wall -g -c a3.cc
ar -r liba1.a a1.o
ar -r liba2.a a2.o
ar -r liba3.a a3.o
g++ -Wall -g aa.cc -o aa -la1 -la2 -la3 -L.
Shows:
./liba3.a(a3.o)(.text+0x14): In function `a3()':
/tmp/z/a3.C:4: undefined reference to `a2()'
Whereas:
g++ -Wall -g -c a1.cc
g++ -Wall -g -c a2.cc
g++ -Wall -g -c a3.cc
ar -r liba1.a a1.o
ar -r liba2.a a2.o
ar -r liba3.a a3.o
g++ -Wall -g aa.cc -o aa -la3 -la2 -la1 -L.
Succeeds. (Just the -la3 -la2 -la1 parameter order is changed.)
PS:
nm --demangle liba*.a
liba1.a:
a1.o:
U __gxx_personality_v0
U printf
0000000000000000 T a1()
liba2.a:
a2.o:
U __gxx_personality_v0
U printf
U a1()
0000000000000000 T a2()
liba3.a:
a3.o:
U __gxx_personality_v0
U printf
U a2()
0000000000000000 T a3()
From man nm:
If lowercase, the symbol is local; if uppercase, the symbol is global (external).
"T" The symbol is in the text (code) section.
"U" The symbol is undefined.

I worked in a project with a bunch of internal libraries that unfortunately depended on each other (and it got worse over time). We ended up "solving" this by setting up SCons to specify all libs twice when linking:
g++ ... -la1 -la2 -la3 -la1 -la2 -la3 ...

The dependencies for linking a library or executable have to be present at link-time, so you cannot link libXC before libXB is present. It doesn't matter if statically or dynamically.
Start with the most basic one, which has no (or just outside of your project) dependencies.

It's good practice to keep libraries independent of each other to avoid link order issues.
