linking a self-registering, abstract factory - c++

I've been working with and testing a self-registering, abstract factory based upon the one described here:
https://stackoverflow.com/a/582456
In all my test cases, it works like a charm, and provides the features and reuse I wanted.
Linking in this factory in my project using cmake has been quite tricky (though it seems to be more of an ar problem).
I have the identical base.hpp, derivedb.hpp/cpp, and an equivalent deriveda.hpp/cpp to the example linked. In main, I simply instantiate the factory and call createInstance() twice, once each with "DerivedA" and "DerivedB".
The executable created by the line:
g++ -o testFactory main.cpp derivedb.o deriveda.o
works as expected. Moving my derived classes into a library (using cmake, but I have tested this with ar alone as well) and then linking fails:
ar cr libbase.a deriveda.o derivedb.o
g++ -o testFactory libbase.a main.cpp
only calls the first static instantiation (from derivedA.cpp) and never the second static instantiation, i.e.
// deriveda.cpp (if listed first in the "ar" line, this gets called)
DerivedRegister<DerivedA> DerivedA::reg("DerivedA");
// derivedb.cpp (if listed second in the "ar" line, this does not get called)
DerivedRegister<DerivedB> DerivedB::reg("DerivedB");
Note that swapping the two in the ar line calls only the derivedb.cpp static instantiation, and not the deriveda.cpp instantiation.
Am I missing something about ar or static libraries that somehow does not play nicely with static variables in C++?

Contrary to intuition, including an archive in a link command is not the same as including all of the object files that are in the archive. Only those object files within the archive that are necessary to resolve undefined symbols are included. This is a good thing if you consider that there was once no dynamic linking, and that otherwise the entirety of every library (think of the C library) would be duplicated in each executable. Here's what the ld(1) manpage (GNU ld on Linux) has to say:
The linker will search an archive only once, at the location where it is specified on the command line. If the archive defines a symbol which was undefined in some object which appeared before the archive on the command line, the linker will include the appropriate file(s) from the archive. However, an undefined symbol in an object appearing later on the command line will not cause the linker to search the archive again.
Unfortunately there's no standard way to include every member of an archive in the linked executable. On linux you can use g++ -Wl,-whole-archive and on Mac OS X you can use g++ -all_load.
So with GNU binutils ld, the link command should be
g++ -o testFactory -Wl,-whole-archive libbase.a -Wl,-no-whole-archive main.cpp
the -Wl,-no-whole-archive ensures that any archive appearing later in the final link command generated by g++ will be linked in the normal way.
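To make the failure mode concrete, here is a minimal single-file sketch of the self-registration idiom the question relies on. It is not the code from the linked answer: registry() and the Creator alias are names assumed for illustration, and both derived classes are kept in one translation unit so that the sketch compiles on its own. What it shows is that nothing outside deriveda.cpp or derivedb.cpp ever names DerivedA::reg or DerivedB::reg, so once those objects sit in an archive the linker has no undefined symbol that would pull them in.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Base class for everything the factory can create.
struct Base {
    virtual ~Base() = default;
    virtual std::string name() const = 0;
};

// Registry of creator functions, keyed by class name (assumed helper).
// The function-local static avoids depending on the initialization order
// of globals in different translation units.
using Creator = std::function<std::unique_ptr<Base>()>;
inline std::map<std::string, Creator>& registry() {
    static std::map<std::string, Creator> instance;
    return instance;
}

// Constructing a DerivedRegister<T> adds T to the registry as a side effect.
template <typename T>
struct DerivedRegister {
    explicit DerivedRegister(const std::string& key) {
        bool inserted = registry()
            .emplace(key, [] { return std::unique_ptr<Base>(new T); })
            .second;
        assert(inserted && "duplicate registration key");
        (void)inserted;
    }
};

// Returns nullptr when nothing was registered under `key`, which is what
// happens when the object file holding the registration is never linked.
inline std::unique_ptr<Base> createInstance(const std::string& key) {
    auto it = registry().find(key);
    return it == registry().end() ? nullptr : it->second();
}

struct DerivedA : Base {
    std::string name() const override { return "DerivedA"; }
    static DerivedRegister<DerivedA> reg;
};

struct DerivedB : Base {
    std::string name() const override { return "DerivedB"; }
    static DerivedRegister<DerivedB> reg;
};

// In the question these two definitions live in deriveda.cpp and
// derivedb.cpp. No other file refers to DerivedA::reg or DerivedB::reg,
// so no undefined symbol ever asks the linker for those archive members.
DerivedRegister<DerivedA> DerivedA::reg("DerivedA");
DerivedRegister<DerivedB> DerivedB::reg("DerivedB");
```

Calling createInstance("DerivedA")->name() from main returns "DerivedA" here because both definitions are compiled into the program. When they are archive members that the linker skips, the constructors never run and createInstance returns a null pointer.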

Related

Passing `-l<libname>` vs passing `lib<libname>.a` directly to linker?

Suppose I have two files
// a.c
int a() {return 1;}
// b.c
int a();
int b() {return a();}
and I compile them to a.o and b.o, respectively.
In an attempt to make an executable or shared library, one can call gcc a.o b.o -o libab.so -shared. But I also noticed that one can also call gcc b.o -L. -l:a.o -o libab.so -shared to generate (apparently) the same output. To my surprise, even running gcc a.o -L. -l:b.o -shared results in a library that has both a() and b(). (Shouldn't linker discard the unused library b.o since a.o does not depend on it?)
The latter two presumably pass the object file in as if it were a library. Now if I run ar rcs liba.a a.o, then gcc b.o -L. -l:liba.a -shared and gcc b.o liba.a -shared both run without any problem and give the same output.
However, I have also seen cases where this trick doesn't work and results in undefined references. My question is therefore as the title says: what are the differences between passing an object as a library and as a normal object file, and are there any differences when it comes to C++?
The problem arose in a much larger project. Sorry for lacking mcve because I can't seem to isolate the problem.
[How does] Passing -l<libname> vs [differ from] passing lib<libname>.a directly to linker?
Passing -l<libname> makes the GNU linker search the library only once, at the point where the option appears on the command line (when it is not preceded by the --whole-archive option, which includes every member). Naming the .a file directly does not change that: the archive is read at its position on the command line in the same way, so in both cases only references that are already undefined at that point can pull in its members.
From the GCC Linker options (emphasis mine):
-llibrary
...
It makes a difference where in the command you write this option; the linker searches and processes libraries and object files in the order they are specified. Thus, ‘foo.o -lz bar.o’ searches library ‘z’ after file foo.o but before bar.o. If bar.o refers to functions in ‘z’, those functions may not be loaded.
From binutils ld options:
-l namespec
...
The linker will search an archive only once, at the location where it is specified on the command line. If the archive defines a symbol which was undefined in some object which appeared before the archive on the command line, the linker will include the appropriate file(s) from the archive. However, an undefined symbol in an object appearing later on the command line will not cause the linker to search the archive again.
what are the differences between passing an object as a library and as a normal object file, and are there any differences when it comes to C++?
That depends on the implementation. In the most general sense, Unix-style linkers such as you are asking about search for objects named via -l options in a library search path, whereas if you name a file directly, you must specify the exact file.
Additionally, if you use an -l option to specify a file to link then, in the general case, the linker constructs a filename from the argument by prepending "lib" and appending ".a", or in some other way, such as by searching also or instead for ".so" files. (The GNU linker that you appear to be using provides an exception to this behavior when the first character of the argument is :. In that case it takes the rest of the argument as an exact file name, and searches for that.)
Many linkers also accept explicit library names specified on the command line (e.g. libfoo.a instead of -lfoo), so these need to be able to determine what type of file each is. Normally this is by examining the file, not by relying on its name. And GNU ld, at least, extends this file type detection to files specified via -l options.
The order in which objects and libraries are specified on the command line, by whatever specific form, matters to typical linker implementations. For example, the docs for GNU ld specify that
options which refer to files, such as ‘-l’ or ‘-T’, cause the file to
be read at the point at which the option appears in the command line,
relative to the object files and other file options
which is important because
The linker will search an archive only once, at the location where it
is specified on the command line. If the archive defines a symbol
which was undefined in some object which appeared before the archive
on the command line, the linker will include the appropriate file(s)
from the archive. However, an undefined symbol in an object appearing
later on the command line will not cause the linker to search the
archive again.
But of course
You may list the same archive multiple times on the command line.
The docs are not altogether clear on this, but empirically, the use of the term "archive" in the above is significant. It is effectively only archive files -- static libraries -- to which the "searched only once" provision applies. To a first approximation, the relative order of different ordinary object files and shared libraries on the GNU linker's command line, no matter how designated, does not impact symbol resolution.
So yes, it does matter whether you specify regular object files or static archives or shared libraries to the (GNU) linker, and their order matters to some extent, but the manner in which you specify them does not matter.
I have also seen cases where this trick doesn't work and results in undefined references.
With the GNU linker, that will be because of genuinely missing libraries or objects, or because of an unsuitable order of static archives relative to other object files or archives. Some other linkers are more sensitive.
Short answers:
The -L and -l options provide a shortcut for locating library archives (and shared libraries). But once you've used -l to locate a library (in the standard locations, or in a location specified by -L), the reading of that library is identical to the way it would be read if you specified its filename (e.g. /lib/libx.a) at the same spot on the command line explicitly.
When you specify a single object (.o) file, the entire contents of that file get loaded unconditionally. When you specify a library archive (.a) file, only those objects within it that are necessary (to satisfy outstanding undefined references) are loaded.
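The rule in the short answer can be played through with a toy model. The sketch below is an illustration under simplifying assumptions, not how GNU ld is implemented: Object, Input, LinkResult and toy_link are hypothetical names, symbols are plain strings, and shared libraries, sections and weak symbols are ignored. It loads every plain object unconditionally and pulls a member out of an archive only when that member defines a symbol that is still undefined when the archive is reached.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One object file: the symbols it defines and the symbols it references.
struct Object {
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> references;
};

// One command-line input: a lone object file, or an archive of several.
struct Input {
    bool is_archive;
    std::vector<Object> members;  // exactly one member when !is_archive
};

struct LinkResult {
    std::vector<std::string> loaded;  // object files in the output, in load order
    std::set<std::string> undefined;  // references still unresolved at the end
};

// Toy single-pass link. Each archive is rescanned until it contributes
// nothing more, and is then never visited again.
LinkResult toy_link(const std::vector<Input>& command_line) {
    LinkResult result;
    std::set<std::string> defined;

    // Loading an object resolves what it defines and records what it needs.
    auto load = [&](const Object& obj) {
        result.loaded.push_back(obj.name);
        for (const auto& s : obj.defines) {
            defined.insert(s);
            result.undefined.erase(s);
        }
        for (const auto& s : obj.references)
            if (!defined.count(s)) result.undefined.insert(s);
    };

    for (const auto& input : command_line) {
        if (!input.is_archive) {
            assert(input.members.size() == 1);
            load(input.members.front());  // plain objects are always loaded
            continue;
        }
        std::vector<bool> taken(input.members.size(), false);
        bool progress = true;
        while (progress) {
            progress = false;
            for (std::size_t i = 0; i < input.members.size(); ++i) {
                if (taken[i]) continue;
                const Object& member = input.members[i];
                bool needed = false;
                for (const auto& s : member.defines)
                    if (result.undefined.count(s)) needed = true;
                if (needed) {
                    load(member);
                    taken[i] = true;
                    progress = true;
                }
            }
        }
    }
    return result;
}
```

With a.o defining a, and b.o defining b and referencing a, the input order b.o, liba.a loads both files in this model, while liba.a, b.o loads only b.o and leaves a undefined. Passing both as plain objects loads both in either order.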

Linking to shared and static libraries with c++ on a Linux system

I am messing around with a test project, let's call it mytest. It has a .cpp and a .h file; the contents are not really important - imagine it contains a few simple hello_world() type functions...
So, I was making a generic makefile to compile this into the various library outputs where an ls -l on my output folder gives:
libmytest.a
libmytest.so -> libmytest.so.1.0
libmytest.so.1 -> libmytest.so.1.0
libmytest.so.1.0
All good so far, my shared / static libraries are created.
Now I have a make install target in my make file, which basically copies the header to /usr/local/include and all of these library files to /usr/local/lib
Then I made another test cpp file called usertest.cpp (sorry for the not-very-imaginative/descriptive names), which links to the library files.
I compiled in various ways:
g++ -Wall -Werror -I. -lmytest
g++ -Wall -Werror -I. -lmytest -static
Then I deleted the libmytest.so* files so I only had the libmytest.a library file in /usr/local/lib Then I did the same test:
g++ -Wall -Werror -I. -lmytest
g++ -Wall -Werror -I. -lmytest -static
Finally I deleted the libmytest.a file and copied back the .so files so I only had the libmytest.so* library files in /usr/local/lib Then I did the same test:
g++ -Wall -Werror -I. -lmytest
g++ -Wall -Werror -I. -lmytest -static
The file size results (in bytes) are:
1. 7736 - Makes sense, all libs dynamically linked
2. 19674488 - Makes sense, all libs statically linked
3. 64908 - hmm... not really sure why
4. 19674488 - Makes sense, same as 2.
5. 7736 - Makes sense, same as 1.
6. failed - Makes sense, no .so files!
I have three file sizes: the small one (7736) is fully dynamically linked, and the large one is statically linked... so what is the medium one (64908)? So I have questions:
for 1. I assume the system looks for .so libraries first and .a libraries second?
For 3. What happened here? - is it dynamically linking the system libs but when it sees my .a lib it dynamically links it?
Note all outputs run fine and call functions from the library.
For 1. I assume the system looks for .so libraries first and .a libraries second?
That's roughly right, but read on.
For 3. What happened here? - is it dynamically linking the system libs but when it sees my .a lib it dynamically links it?
A static library cannot be dynamically linked: it is statically linked. The shared ( = dynamic) system libraries are linked,
assuming that the system libraries that the linker finds and prefers are in fact shared libraries.
By default, the linkage option -lmytest directs the linker to search for an input file called libmytest.so (shared library)
or libmytest.a (static library), first in the search directories you have specified in the commandline with
the -Ldirname option, in the order specified, and then in its default search directories, in the configured order.
It stops searching when it finds either of those files in one of those directories. If it finds both of them in
the same directory then it selects the shared library, libmytest.so. The selected file, if any, is input to the linkage.
If the search is unsuccessful the linker gives an error: cannot find -lmytest.
This default behaviour can be changed by the option -static. If it appears anywhere in the commandline, the linker
ignores all shared libraries: then -lmytest can only be satisfied by finding libmytest.a, and static system libraries must also be found.
/usr/local/lib is one of the linker's default search directories. So when you execute:
g++ -Wall -Werror -I. -lmytest
in scenario (3), where /usr/local/lib/libmytest.a is found by the linker and /usr/local/lib/libmytest.so is not,
libmytest.a satisfies -lmytest and is input to the linkage. The linker's default preference for shared libraries is unaffected.
The contribution that the linkage of libmytest.a makes to the size of the executable is not obvious.
A static library - quite unlike a shared library - is not an ELF binary that the linker has produced. It is
an ar archive of object files, produced by ar: it is a bag of files that just
happen to be object files.
By default, when an ar archive is input to the linker, it looks in the bag to find any object files that
provide definitions for any undefined symbol references that have accrued from object files already
linked into the output file (program or shared library) when the archive was inspected. If it finds any
such object files, it extracts them from the archive and links them into the output file, exactly as if they
had been individually listed in the commandline and the archive not mentioned at all. Except as a bag from which
object files may be selected, the archive contributes nothing to the linkage.
Thus, if there are N object files in libmytest.a, inputting that archive to a linkage might
contribute between 0 and N object files to the output file, depending on what undefined references into members of
that set of object files accrue earlier in the linkage, and which object files provide definitions for those
references.
And even if you know exactly which object files in libmytest.a will be required in your linkage, you cannot
conclude that the sum of their sizes will be added to the size of the output file. An object file is
partitioned into sections by the compiler, a section being the smallest unit of input and output that the linker
recognizes. By default the linker will retain an input section for output only if that section provides the linker's selected definition of some symbol that the linkage must define. If an input section is of no such use, the
linker will just discard it. So, even if an object file is linked, the linker might omit redundant sections
within it from the output file.
The behaviour of the -l | --library linker option is documented in 2.1 Command Line Options
of the GNU ld manual
Most probably libmytest.a is not what plays the major role in the binary size increase; the bigger standard libraries are (which explains why the size didn't grow much in 3.).
You can investigate all the dynamic dependencies of your binary using ldd:
ldd a.out
(and which of them are disappearing after using -static).

linking library for creating static library

I have written some code in Lib_file.h and Lib_file.cpp. I wish to convert this code to a static library. I am able to compile the code (using the command g++ -I <necessary include files> -o Lib_file.o Lib_file.cpp) to get Lib_file.o. I am also able to add it to an archive using the ar rvs Lib_file.a Lib_file.o command. Now when I try to use this library in some other code using the -L option, I get undefined reference errors. These errors point to the code in my Lib_file.o. So my question is: how do I get the code in my Lib_file.cpp to link to the libraries that it uses?
I have tried the following options so far
I. After creating the Lib_file.o, I tried the following command
g++ -L<include path> -l<.a files> Lib_file.o . On executing this command, I get the following error
/usr/lib/../lib64/crt1.o: In function `_start':
init.c:(.text+0x20): undefined reference to `main'
collect2: ld returned 1 exit status
II. I tried to include all the necessary .a files in a new archive along with my Lib_file.o using the ar command. Still I get the undefined reference error when I try to use the Lib_file.a library with my application
Please help me out here
First of all, libraries are normally named something like libxyz.a, where xyz is the name of the library.
Secondly, you are trying to create a program using only the object file you used for the library, while also linking it with itself. This will of course not work, since the library has no main function, which normal programs need. You have to create another program, and link that one with the library.
Like
gcc myotherprogram.c -o myotherprogram -L/some/path -lxyz
As you can see in my command line above, I placed the library last on the command line. This is needed because the linker resolves undefined references from left to right: a library has to come after the files that use it.
Edit: Linking your static library with other libraries: You don't. A static library is completely standalone, and if it needs other libraries itself to work then they have to be present on the command line when linking the actual program.
For example, let's say that library xyz depends on the standard math library (i.e. the m library). You can't "link" with it when creating the xyz library, as you don't actually link static libraries; you just put a collection of object files together in an archive (ar and the .a extension stand for archive). When you build the actual application that needs the xyz library, you also need to link with whatever libraries xyz needs:
gcc myotherprogram.c -o myotherprogram -L/some/path -lxyz -lm

A g++ link error - why the workaround works

I am working on a big project and have now encountered a link error.
This error can be avoided by a workaround, but I cannot figure out why it works.
Here is the file structure related to my problem:
project
|-package_a
|--a.cpp
|--...
|-package_b
|--b.cpp
|--c.cpp
|--...
|-package_others
All the *.o in package_a will be packed into a.a and the *.o in package_b
will be packed into b.a
"g++ -o exec -Bstatic b.a a.a ..." is used to generate the binary.
In package_b/b.cpp, I added a function foo().
And in package_a/a.cpp, I used this function.
But here I get a link error reporting an undefined reference to foo() in a.o.
I can verify (with objdump) that foo() is already in b.o.
By changing the link command to "g++ -o exec -Bstatic a.a b.a ...", the binary can be built successfully. I now understand that the linker does care about the order of the link list. But please understand that this is a big project and I have no permission to change the project configuration, so the original link order must be kept.
Then I added a dummy function bar() in package_b/c.cpp, which does nothing
but call foo(), and then the original "g++ -o exec -Bstatic b.a a.a ..." runs
through without any link error.
Can anybody show me some light why just adding a dummy function in
the same package will work in this case?
I'm using g++ 4.4.4 and linux 2.6.18-194.el5
Any comment will be appreciated
This is normal. When linking, the linker goes through the list of object files, finding undefined references which are then satisfied by other object files/libraries coming after it.
You can change this behaviour by either
including one of the archives twice, as in
g++ -o exec a.a b.a a.a
using the -( construct (passed through -Wl and escaped, since the shell treats parentheses specially)
g++ -o exec -Wl,-\( a.a b.a -Wl,-\)
But please understand that this is a big project and I have no permission to change the project configuration, so the original link order must be kept.
Tough luck... Maybe the manager or whoever just doesn't want you to use functions in b from a.
Then I added a dummy function bar() in package_b/c.cpp, which does nothing but call foo(), and then the original "g++ -o exec -Bstatic b.a a.a ..." runs through without any link error.
Could be that another function of package_b/c.cpp was already referenced, and the linker took bar() with it (because they are in the same file) and this referenced foo(), which was subsequently included in the output, too. It succeeded, because foo was in b.a too.
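The guess above can be checked against a small model of archive scanning. This is a sketch under stated assumptions rather than a description of the real project: helper and use_a are hypothetical symbols standing in for whatever the earlier object files already reference in c.o and a.o, and Member, Archive, scan_in_order and make_archives are names invented for the example. Archives are scanned once each, in command-line order, and within one archive the scan repeats until no further member is pulled in.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// An archive member: the symbols it defines and the symbols it references.
struct Member {
    std::set<std::string> defines;
    std::set<std::string> references;
};
using Archive = std::map<std::string, Member>;  // member name -> contents

// Scans each archive once, in order, starting from the references left
// undefined by the objects that precede the archives on the command line.
// Returns the symbols that are still undefined afterwards.
std::set<std::string> scan_in_order(std::set<std::string> undefined,
                                    const std::vector<Archive>& archives) {
    std::set<std::string> defined;
    for (const Archive& archive : archives) {
        std::set<std::string> pulled;
        bool progress = true;
        while (progress) {  // repeat until this archive adds nothing new
            progress = false;
            for (const auto& entry : archive) {
                if (pulled.count(entry.first)) continue;
                bool needed = false;
                for (const auto& s : entry.second.defines)
                    if (undefined.count(s)) needed = true;
                if (!needed) continue;  // member stays in the archive
                bool fresh = pulled.insert(entry.first).second;
                assert(fresh && "a member is loaded at most once");
                (void)fresh;
                progress = true;
                for (const auto& s : entry.second.defines) {
                    defined.insert(s);
                    undefined.erase(s);
                }
                for (const auto& s : entry.second.references)
                    if (!defined.count(s)) undefined.insert(s);
            }
        }
    }
    return undefined;
}

// Builds the question's layout in its original order, b.a before a.a.
// b.a holds b.o (defines foo) and c.o; a.a holds a.o, which uses foo.
// `helper` and `use_a` are hypothetical symbols assumed to be referenced
// by objects that appear before the archives.
std::vector<Archive> make_archives(bool with_dummy_bar) {
    Archive b_a, a_a;
    b_a["b.o"] = Member{{"foo"}, {}};
    b_a["c.o"] = Member{{"helper"}, {}};
    if (with_dummy_bar) {
        b_a["c.o"].defines.insert("bar");     // the dummy function...
        b_a["c.o"].references.insert("foo");  // ...which only calls foo()
    }
    a_a["a.o"] = Member{{"use_a"}, {"foo"}};
    return {b_a, a_a};
}
```

Starting from the undefined set {helper, use_a}, make_archives(false) leaves foo unresolved in this model, because b.o is skipped before a.o asks for foo. With make_archives(true), c.o brings in a reference to foo while b.a is still being scanned, so b.o is loaded and the later reference from a.o is already satisfied.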
You may like to read up on how linkers work. BTW, the -Bstatic flag is unnecessary because .a object file archives can only be linked statically (as if the required object files contained in the .a had been specified on the command line instead of the .a).
Alternatively, you can always wrap a list of archives to link with --start-group/ --end-group options to make the linker scan the list of archives multiple times, so that no ordering of archives is required (like MS VC++ does):
g++ -o exec -Wl,--start-group a.a b.a -Wl,--end-group
See man ld:
-( archives -)
--start-group archives --end-group
The archives should be a list of archive files. They may be either
explicit file names, or -l options.
The specified archives are searched repeatedly until no new
undefined references are created. Normally, an archive is searched
only once in the order that it is specified on the command line.
If a symbol in that archive is needed to resolve an undefined
symbol referred to by an object in an archive that appears later on
the command line, the linker would not be able to resolve that
reference. By grouping the archives, they all be searched
repeatedly until all possible references are resolved.
Using this option has a significant performance cost. It is best
to use it only when there are unavoidable circular references
between two or more archives.
GCC, unlike the Visual C++ linker, requires static libraries to be supplied in an order such that each library comes after the files that reference its symbols. Don't ask me why, but you will always have to check that you are listing the files to be linked in the correct order with GCC.
There is an in-depth explanation here.
When you use a function from a static library, you must first place on the command line the file that uses the function, and then the library in which the function is defined. Otherwise, if you place the definition first, gcc (or more specifically, ld) discards the "unused" function. That's how gcc works, sorry.

What does static linking against a library actually do?

Say I had a library called libfoo which contained a class, a few static variables, possibly something with 'C' linkage, and a few other functions.
Now I have a main program which looks like this:
int main() {
return 5+5;
}
When I compile and link this, I link against libfoo.
Will this have any effect? Will my executable increase in size? If so, why? Do the static variables or their addresses get copied into my executable?
Apologies if there is a similar question to this or if I'm being particularly stupid in any way.
It won't do anything in a modern linker, because it knows the executable doesn't actually use libfoo's symbols. With gcc 4.4.1 and ld 2.20 on my system:
g++ linker_test.cpp -static -liberty -lm -lz -lXp -lXpm -o linker_test_unnecessary
g++ linker_test.cpp -static -o linker_test_none
ls -l linker_test_unnecessary linker_test_none
They are both 626094 bytes. Note this also applies to dynamic linking, though the size they both are is much lower.
A library contains previously compiled object code - basically a static library is an archive of .o or .obj files.
The linker looks at your object code and checks whether there are any unresolved names. If so, it looks for them in the library; if it finds them, it includes the object file that contains them and repeats the process.
Thus only the parts of the static library that are needed are included in your executable.
Thus in your case nothing from libfoo will be added to your executable.