Why is order (e.g. source.cxx -lstatic) enforced when linking with a static library? - c++

When linking with a static library, why is the order enforced?
g++ -ldynamic -lstatic src.cxx //ERROR
g++ -lstatic src.cxx -ldynamic //ERROR
g++ src.cxx -ldynamic -lstatic //SUCCESS
g++ -ldynamic src.cxx -lstatic //SUCCESS
Is there a technical reason why static libraries cannot be linked like dynamic libraries (in any order)?
Why can't library linking be made generic (perhaps by specifying the kind while compiling/linking, e.g. -ls for static and -ld for dynamic)?

The As Needed schism in Linux linkage
Your example:
g++ -ldynamic -lstatic src.cxx # ERROR
g++ -ldynamic src.cxx -lstatic # SUCCESS
indicates that your Linux distro belongs to the RedHat clan. Let's just confirm
that, say on CentOS 7:
$ cat /proc/version
Linux version 3.10.0-693.el7.x86_64 (builder@kbuilder.dev.centos.org) \
(gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) \
#1 SMP Tue Aug 22 21:09:27 UTC 2017
$ cat foo.c
#include <stdio.h>
void foo(void)
{
    puts(__func__);
}
$ cat bar.c
#include <stdio.h>
void bar(void)
{
    puts(__func__);
}
$ cat main.c
extern void foo(void);
extern void bar(void);
int main(void)
{
    foo();
    bar();
    return 0;
}
$ gcc -Wall -fPIC -c foo.c
$ gcc -shared -o libfoo.so foo.o
$ gcc -Wall -c bar.c
$ ar cr libbar.a bar.o
$ gcc -Wall -c main.c
$ gcc -o prog -L. -lfoo -lbar main.o -Wl,-rpath=$(pwd)
main.o: In function `main':
main.c:(.text+0xa): undefined reference to `bar'
collect2: error: ld returned 1 exit status
# :(
$ gcc -o prog -L. -lfoo main.o -lbar -Wl,-rpath=$(pwd)
$ # :)
$ ./prog
foo
bar
So you're right there.
Now let's check it out on a distro from the Debian clan:
$ cat /proc/version
Linux version 4.13.0-32-generic (buildd@lgw01-amd64-016) \
(gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3)) \
#35-Ubuntu SMP Thu Jan 25 09:13:46 UTC 2018
Here, it all goes the same as far as:
$ gcc -o prog -L. -lfoo -lbar main.o -Wl,-rpath=$(pwd)
main.o: In function `main':
main.c:(.text+0x5): undefined reference to `foo'
main.c:(.text+0xa): undefined reference to `bar'
collect2: error: ld returned 1 exit status
when it gets different. Now the linkage can't resolve either foo - from the
shared library libfoo.so - or bar - from the static library libbar.a. And
to fix that we need:
$ gcc -o prog -L. main.o -lfoo -lbar -Wl,-rpath=$(pwd)
$ ./prog
foo
bar
with all the libraries mentioned after the object file(s) - main.o - that
reference the symbols they define.
The CentOS 7 (RedHat) linkage behaviour is old-school. The Ubuntu 17.10 (Debian)
linkage behaviour was introduced in Debian 7, 2013, and trickled down
to the Debian-derived distros. As you see, it abolishes the distinction
between shared libraries and static libraries as regards the library needing,
or not needing, to appear in the linkage sequence after all input
files that reference it. They all must appear in dependency order (DO[1]),
shared libraries and static libraries alike.
This comes down to how the distro decides to build their version of the GCC
toolchain - how they choose the default options that get passed to the system
linker (ld) behind the scenes when it is invoked by one of the language
front-ends (gcc, g++, gfortran etc.) to execute a linkage for you.
Specifically, it comes down to whether the linker option --as-needed
is, or is not, inserted by default into the ld commandline before the libraries
are inserted.
If --as-needed is not in effect when a shared library libfoo.so is arrived at,
then it will be linked regardless of whether the linkage has so far accrued any
unresolved references to symbols that the shared library defines. In short,
it will be linked regardless of any proven need to link it. Maybe the further
progress of the linkage, to subsequent inputs, will give rise to unresolved references
that libfoo.so resolves, justifying its linkage. But maybe not. It gets linked
anyhow. That's the RedHat way.
If --as-needed is in effect when a libfoo.so is arrived at, then it
will be linked if and only if it exports a definition for at least one symbol
to which an unresolved reference has already accrued in the linkage, i.e.
there is a proven need to link it. It cannot end up linked if there is
no need to link it. That's the Debian way.
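The two policies can be put side by side in a few lines. This is only a toy sketch of the decision described above, not ld's actual implementation; the function name records_dependency and its parameters are made up for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model, not ld's real code: decides whether a shared library gets
// linked (recorded as a runtime dependency) when the linker reaches it.
// `undefined_so_far` holds the references accrued from earlier inputs.
bool records_dependency(bool as_needed,
                        const std::set<std::string>& undefined_so_far,
                        const std::set<std::string>& exported_by_library)
{
    if (!as_needed) {
        return true;  // linked regardless of any proven need
    }
    for (const std::string& symbol : exported_by_library) {
        if (undefined_so_far.count(symbol) != 0) {
            return true;  // proven need: an earlier input references it
        }
    }
    return false;  // nothing seen so far needs it, so it is skipped
}

// Mirrors `-lfoo main.o` from the transcripts: nothing is undefined yet
// when libfoo.so is reached.
void demo_lfoo_before_main()
{
    const std::set<std::string> nothing_undefined_yet;
    const std::set<std::string> libfoo_exports = {"foo"};
    assert(records_dependency(false, nothing_undefined_yet, libfoo_exports));
    assert(!records_dependency(true, nothing_undefined_yet, libfoo_exports));
}
```

With as_needed set to false the library is always recorded, which is why -lfoo could precede main.o on CentOS 7; with it set to true the same command line leaves foo unresolved, as on Ubuntu.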
The RedHat way with shared library linkage was prevalent until Debian 7
broke ranks. But the linkage of static libraries has always conformed to the as needed principle
by default. There's no --as-needed option that applies to static libraries.
Instead there's the opposite, --whole-archive:
you need to use that to override the default behaviour and link object files from static libraries regardless of need.
So folks like you, in RedHat land, observe this puzzling difference: by default static libraries
have to be linked in DO; for shared libraries, any order will do by default.
Folks in Debian land see no such difference.
Why so?
Since the RedHat way has this puzzling difference - a stumbling block for
the linkage efforts of the uninitiated - it's natural to ask why, historically,
it was as needed for static libraries, but not as needed for shared libraries,
as a matter of course, and why it still goes in RedHat land.
Simplifying grossly, the linker assembles a program (or shared library) by
incrementally populating sections and dynamic dependency records (DDRs[2]) in
a structure of sections and DDRs that starts off empty and
ends up being a binary file that the OS loader can parse and successfully map
into a process address space: e.g. an ELF executable or DSO. (Section
here is a genuine technical term. Dynamic dependency record isn't.
I've just coined it now for convenience.)
Loosely speaking, the linker inputs that drive this process are object files,
shared libraries, or static libraries. But strictly speaking, they are either
object files or shared libraries. Because a static library is simply
an ar archive of files that happen to be
object files. As far as the linker is concerned it is just a sequence of object
files that it might or might not need to use, archived together with a symbol table
by which the linker can query which of the object files, if any, defines a symbol.
When the linker arrives at an object file, that object file is always linked
into the program. The linker never asks whether it needs an object file
(whatever that might mean). Any object file is an unconditional source of linkage
needs, that further inputs have to satisfy.
When an object file is input, the linker has to dismantle it into
the input sections of which it is composed and merge them into the output
sections in the program. When an input section S appears in one object
file, the chances are that a section S will appear in other object files;
maybe all of them. The linker has to stitch together all the input S sections
into a single output S section in the program, so it isn't finally done
with composing an output section S till the linkage is finished.
When a shared library libfoo.so is input to the linkage, the linker outputs
a DDR into the program (if it decides that the library is needed, or doesn't care). That is essentially a memo that will be read at runtime by
the loader, telling it that libfoo.so is a dependency of the process that is
under construction; so it will locate libfoo.so by its standard search algorithm,
load it and map it into the process.
Consuming an object file is a relatively costly bit of linkage; consuming
a shared library is relatively cheap - especially if the linker does not
have to bother beforehand figuring out whether the shared library is needed.
The input/output section processing of an object file is typically more bother than writing out a DDR.
But more important than the effort, linking an object file typically makes the program significantly
larger, and can make it arbitrarily larger. Linking a shared library adds
only a DDR, which is always a tiny thing.
So there's a respectable rationale for a linkage strategy to reject the linking of
an object file unless it is needed, but to tolerate the linking of a shared library
without need. Linking an unnecessary object file adds an arbitrary amount of dead
weight to the program, at a proportional burden on the linkage. But
if the linker doesn't have to prove that a shared library is needed, then it
can link it in a jiffy adding negligibly to the bulk of the program. And if the developer has chosen to add the shared library to the linkage, chances are good it will be needed. The RedHat
camp still thinks that rationale is good enough.
The Debian camp also has a respectable rationale of course. Yes, a Debian linkage
involves the extra effort of determining whether libfoo.so, when it is
reached, defines any symbol to which there is an unresolved reference at that
point. But by only linking shared libraries that are needed: -
At runtime, the loader is spared the wastage of either loading redundant
dependencies, or figuring out that they are redundant so as not to load them.
Package management with respect to runtime dependencies is eased if
redundant runtime dependencies are weeded out at linktime.
Developers, like you, do not get tripped up by the inconsistent linkage rules
for static and shared libraries! - a snag that's aggravated by the fact that
-lfoo in the linker commandline does not reveal whether it will resolve
to libfoo.so or libfoo.a.
There are thornier pros and cons for each side in the schism.
Now consider how the linker uses a static library, libabc.a - a list of object files a.o, b.o, c.o.
The as needed principle is applied, like this: When the linker arrives at libabc.a,
it has 0 or more unresolved symbol references in hand that it has carried
forward from the 0 or more object files and shared libraries it has already linked
into the program. The linker's question is: Are there any object files in
this archive that provide definitions for any of these unresolved symbol references?
If there are 0 such unresolved references, then the answer is trivially No. So
there's no need to look in the archive. libabc.a is passed over. The linker moves
on to the next input. If it has got some unresolved symbol references in hand, then the
linker inspects the symbols that are defined by the object files in the archive. It
extracts just those object files - if any - that provide symbol definitions that it needs
[3] and inputs those object files to the linkage, exactly as if they were individually
named in the commandline and libabc.a was not mentioned at all. Then it moves
on to the next input, if any.
It's obvious how the as needed principle for static libraries implies DO. No
object file will be extracted from a static library and linked unless an unresolved
reference to some symbol that the object file defines has accrued from some
object file (or shared library) already linked.
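The procedure described above can be sketched as a small simulation. It is a toy model under stated assumptions: the types ObjectFile, LinkInput and LinkState and the function link_in_order are invented for illustration, symbols are plain names, and shared libraries are left out. It does include the recursive extraction mentioned in footnote [3].

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy types; a real linker tracks far more than symbol names.
struct ObjectFile {
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> references;
};

struct LinkInput {
    bool is_archive;                  // false: a single object file
    std::vector<ObjectFile> members;  // exactly one member when !is_archive
};

struct LinkState {
    std::set<std::string> defined;
    std::set<std::string> undefined;
    std::vector<std::string> linked;  // names of linked object files, in order
};

// Link one object file unconditionally into the state.
void link_object(const ObjectFile& object, LinkState& state)
{
    state.linked.push_back(object.name);
    for (const std::string& symbol : object.defines) {
        state.defined.insert(symbol);
        state.undefined.erase(symbol);
    }
    for (const std::string& symbol : object.references) {
        if (state.defined.count(symbol) == 0) {
            state.undefined.insert(symbol);
        }
    }
}

// True if the member defines a symbol that is currently undefined.
bool is_needed(const ObjectFile& member, const LinkState& state)
{
    for (const std::string& symbol : member.defines) {
        if (state.undefined.count(symbol) != 0) return true;
    }
    return false;
}

// Process the inputs once, left to right, as on the command line.
LinkState link_in_order(const std::vector<LinkInput>& inputs)
{
    LinkState state;
    for (const LinkInput& input : inputs) {
        if (!input.is_archive) {
            assert(input.members.size() == 1);
            link_object(input.members.front(), state);
            continue;
        }
        // Archive: keep extracting needed members until none is needed.
        std::vector<bool> taken(input.members.size(), false);
        bool extracted = true;
        while (extracted) {
            extracted = false;
            for (std::size_t i = 0; i < input.members.size(); ++i) {
                if (!taken[i] && is_needed(input.members[i], state)) {
                    link_object(input.members[i], state);
                    taken[i] = true;
                    extracted = true;
                }
            }
        }
    }
    return state;
}
```

Feeding it an archive holding bar.o followed by main.o leaves bar in the undefined set, because the archive is passed over while nothing is yet unresolved; the reverse order extracts bar.o and resolves everything.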
Must static libraries be As Needed ?
In RedHat land, where DO is waived for shared libraries, what we do in
its absence is just link every shared library that is mentioned. And as we've
seen, this is tolerably cheap in linkage resource and program size. If we
also waived DO for static libraries, the equivalent strategy would
be to link every object file in every static library that is mentioned. But
that is exorbitantly costly, in linkage resource and program dead weight.
If we wanted to be free of DO for static libraries, but still not link
object files with no need, how could the linker proceed?
Maybe like this?:-
Link all of the object files that are explicitly mentioned into the program.
Link all the shared libraries mentioned.
See if there remain any unresolved references. If so, then -
Extract all of the object files from all of the static libraries that are mentioned
into a pool of optional object files.
Then carry on the linkage on an as needed basis against this pool of optional
object files, to success or failure.
But nothing like this will fly. The first definition of a symbol that the linker
sees is the one that's linked. This means that the order in which object files
are linked matters, even as between two different linkage orders that are both
successful.
Say that object files a.o, b.o have already been linked; unresolved references
remain and then the linker has a choice of optional object files c.o, d.o, e.o, f.o
to continue with.
There may be more than one ordering of c.o, d.o, e.o, f.o that
resolves all references and gives us a program. It might be the case that linking,
say, e.o first resolves all outstanding references and yields no new ones,
giving a program; while linking say c.o first also resolves all outstanding
references but produces some new ones, which require the linking of some or
all of d.o, e.o, f.o - depending on the order - with each possible linkage
resulting in yet another different program.
That's not all. There may be more than one ordering of c.o, d.o, e.o, f.o such that, after some object file is linked - point P - all
previously outstanding references are resolved, but where:-
1. Some of those orderings either produce no new references at point P or produce only references that some further linkage order can resolve.
2. Other ones produce new references at point P that no further linkage order can resolve.
So, whenever the linker discovers it has made a type 2 choice at some earlier point, it would need
to backtrack to that point and try one of the other choices that were then available, that it hasn't already tried,
and only conclude that the linkage fails when it has tried them all unsuccessfully.
Linking like this against a pool of N optional object files will take time proportional
to factorial N to fail.
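To make the factorial cost concrete, here is a brute-force sketch that tries every ordering of a pool of optional object files. It is a simplification under stated assumptions: the names OptionalObject, SearchResult and search_all_orderings are hypothetical, symbols defined by earlier inputs are not modelled, and a plain enumeration of permutations stands in for the backtracking described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <set>
#include <string>
#include <vector>

struct OptionalObject {
    std::set<std::string> defines;
    std::set<std::string> references;
};

struct SearchResult {
    bool linked;                  // true if some ordering resolved everything
    std::size_t orderings_tried;  // how many orderings were examined
};

// Try one ordering: an object is linked only if it is needed when reached.
bool order_resolves(const std::vector<OptionalObject>& pool,
                    const std::vector<std::size_t>& order,
                    std::set<std::string> undefined)
{
    assert(order.size() == pool.size());
    std::set<std::string> defined;
    for (std::size_t index : order) {
        const OptionalObject& object = pool[index];
        bool needed = false;
        for (const std::string& symbol : object.defines) {
            if (undefined.count(symbol) != 0) needed = true;
        }
        if (!needed) continue;
        for (const std::string& symbol : object.defines) {
            defined.insert(symbol);
            undefined.erase(symbol);
        }
        for (const std::string& symbol : object.references) {
            if (defined.count(symbol) == 0) undefined.insert(symbol);
        }
    }
    return undefined.empty();
}

// Examine orderings until one works or all N! have been tried.
SearchResult search_all_orderings(const std::vector<OptionalObject>& pool,
                                  const std::set<std::string>& undefined)
{
    std::vector<std::size_t> order(pool.size());
    std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, ..., N-1
    std::size_t tried = 0;
    do {
        ++tried;
        if (order_resolves(pool, order, undefined)) return {true, tried};
    } while (std::next_permutation(order.begin(), order.end()));
    return {false, tried};
}
```

With a pool of four objects, none of which defines the missing symbol, the search examines all 24 orderings before it can report failure.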
Living with DO for static libraries as we do now, we specify object files and/or static libraries Ij in the
linker commandline in some order:
I0, I1, ... In
and this equates to an ordering of object files which for argument's sake might
resemble:
O0, O1, [O2,... O2+j], O2+j+1, [O2+j+2,... O2+j+k] ...
where [Oi...] is a sub-sequence of optional object files (i.e. a static library) that will be
available to the linker at that point.
Whether we know it or not when we compose the commandline, we are asserting not just that this order is
a good DO ordering that can be linked to yield some program, but also that this
ordering yields the program that we intend.
We might be mistaken on the first count ( = linkage failure). We might even be
mistaken on the second ( = a mean linkage bug). But if we stop caring about the order of these
inputs and just leave it to the linker somehow to find a good DO over them, or prove that there isn't one,
then:
We have actually stopped caring about which program we will get, if any.
We have stopped caring about whether linkage will terminate in any feasible time.
This is not going to happen.
Couldn't we get a warning for broken DO?
In a comment you asked why the linker could not at least warn us if our
static object files and static libraries are not in DO.
That would be in addition to failing the linkage, as it does now. But to give us this
additional warning the linker would have to prove that the linkage failed
because the object files and static libraries are not in DO, and not just because
there are references in the linkage that nothing in the linkage defines. And it
could only prove that the linkage failed because of broken DO by proving that
some program can be linked by some permutation of the object files and static libraries.
That's a factorial-scale task, and we don't care if some program can be linked,
if the program we intend to link can't be: the linker has no information about
what program we intend to link except the inputs we give it, in the order that
we give them.
It would be easy to make the linker (or more plausibly, the GCC frontends) emit a warning if any library was mentioned
before any object file in the commandline. But it would have some nuisance value, because
such a linkage isn't necessarily going to fail and might in fact be the intended
linkage. "Libraries after object files" is just pretty good guidance for routine
invocations of the linker via a GCC frontend. Worse, such a warning would only be practical for object files after libraries and not for cases of broken DO between libraries, so it would only do some of the job.
[1] My abbreviation.
[2] Also my abbreviation.
[3] More precisely, object file extraction from a static library is recursive.
The linker extracts any object files that define unresolved references it
already had in hand, or any new unresolved references that accrue while
linking object files extracted from the library.

When the linker loads a static library, it checks whether any symbols from it are needed. It will use the symbols that are needed, and discard the rest. That of course means that if no symbols are needed from the library then all are discarded.
This is the reason that putting the library in front of the object files that depend on it will not work.
As a rule of thumb, always place libraries (even dynamic ones) at the end of the command line, in order of dependencies: if module (object file or library) A depends on module B, always put A before B.

Related

Passing `-l<libname>` vs passing `lib<libname>.a` directly to linker?

Suppose I have two files
// a.c
int a() {return 1;}
// b.c
int a();
int b() {return a();}
and I compile them to a.o and b.o, respectively.
In an attempt to make an executable or shared library, one can call gcc a.o b.o -o libab.so -shared. But I also noticed that one can also call gcc b.o -L. -l:a.o -o libab.so -shared to generate (apparently) the same output. To my surprise, even running gcc a.o -L. -l:b.o -shared results in a library that has both a() and b(). (Shouldn't linker discard the unused library b.o since a.o does not depend on it?)
The latter two presumably pass a in as if a.o was a library. Now if I run ar rcs liba.a a.o, gcc b.o -L. -l:liba.a -shared and gcc b.o liba.a -shared both run without any problem and give the same output.
However, I have also seen cases where this trick doesn't work and results in undefined references. My question is therefore, as the title says: what are the differences between passing an object as a library and as a normal object file, and are there any differences when it comes to C++?
The problem arose in a much larger project. Sorry for the lack of an MCVE; I can't seem to isolate the problem.
[How does] Passing -l<libname> vs [differ from] passing lib<libname>.a directly to linker?
Passing -llibname.so will make GNU linker traverse the library only once when searching for a symbol (when not after --whole-archive option). Specifying .a file directly to the linker makes it search for every symbol in all the object files inside the .a file for every symbol, not only once.
From the GCC Linker options (emphasis mine):
-llibrary
...
It makes a difference where in the command you write this option; the linker searches and processes libraries and object files in the order they are specified. Thus, ‘foo.o -lz bar.o’ searches library ‘z’ after file foo.o but before bar.o. If bar.o refers to functions in ‘z’, those functions may not be loaded.
From binutils ld options:
-l namespec
...
The linker will search an archive only once, at the location where it is specified on the command line. If the archive defines a symbol which was undefined in some object which appeared before the archive on the command line, the linker will include the appropriate file(s) from the archive. However, an undefined symbol in an object appearing later on the command line will not cause the linker to search the archive again.
what are the differences between passing an object as a library and as a normal object file, and are there any differences when it comes to C++?
That depends on the implementation. In the most general sense, Unix-style linkers such as you are asking about search for objects named via -l options in a library search path, whereas if you name a file directly, you must specify the exact file.
Additionally, if you use an -l option to specify a file to link then, in the general case, the linker constructs a filename from the argument by prepending "lib" and appending ".a", or in some other way, such as by searching also or instead for ".so" files. (The GNU linker that you appear to be using provides an exception to this behavior when the first character of the argument is :. In that case it takes the rest of the argument as an exact file name, and searches for that.)
Many linkers also accept explicit library names specified on the command line (e.g. libfoo.a instead of -lfoo), so these need to be able to determine what type of file each is. Normally this is by examining the file, not by relying on its name. And GNU ld, at least, extends this file type detection to files specified via -l options.
The order in which objects and libraries are specified on the command line, by whatever specific form, matters to typical linker implementations. For example, the docs for GNU ld specify that
options which refer to files, such as ‘-l’ or ‘-T’, cause the file to
be read at the point at which the option appears in the command line,
relative to the object files and other file options
which is important because
The linker will search an archive only once, at the location where it
is specified on the command line. If the archive defines a symbol
which was undefined in some object which appeared before the archive
on the command line, the linker will include the appropriate file(s)
from the archive. However, an undefined symbol in an object appearing
later on the command line will not cause the linker to search the
archive again.
But of course
You may list the same archive multiple times on the command line.
The docs are not altogether clear on this, but empirically, the use of the term "archive" in the above is significant. It is effectively only archive files -- static libraries -- to which the "searched only once" provision applies. To a first approximation, the relative order of different ordinary object files and shared libraries on the GNU linker's command line, no matter how designated, does not impact symbol resolution.
So yes, it does matter whether you specify regular object files or static archives or shared libraries to the (GNU) linker, and their order matters to some extent, but the manner in which you specify them does not matter.
I have also seen cases where this trick doesn't work and results in undefined references.
With the GNU linker, that will be because of genuinely missing libraries or objects, or because of an unsuitable order of static archives relative to other object files or archives. Some other linkers are more sensitive.
Short answers:
The -L and -l options provide a shortcut for locating library archives (and shared libraries). But once you've used -l to locate a library (in the standard locations, or in a location specified by -L), the reading of that library is identical to the way it would be read if you specified its filename (e.g. /lib/libx.a) at the same spot on the command line explicitly.
When you specify a single object (.o) file, the entire contents of that file get loaded unconditionally. When you specify a library archive (.a) file, only those objects within it that are necessary (to satisfy outstanding undefined references) are loaded.

Linking to shared and static libraries with c++ on a Linux system

I am messing around with a test project, let's call it mytest. It has a .cpp and a .h file; the contents are not really important - imagine it contains a few simple hello_world() type functions...
So, I was making a generic makefile to compile this into the various library outputs where an ls -l on my output folder gives:
libmytest.a
libmytest.so -> libmytest.so.1.0
libmytest.so.1 -> libmytest.so.1.0
libmytest.so.1.0
All good so far, my shared / static libraries are created.
Now I have a make install target in my makefile, which basically copies the header to /usr/local/include and all of these library files to /usr/local/lib.
Then I made another test cpp file called usertest.cpp (sorry for the not-very-imaginative/descriptive names), which links to the library files.
I compiled in various ways:
g++ -Wall -Werror -I. -lmytest
g++ -Wall -Werror -I. -lmytest -static
Then I deleted the libmytest.so* files so I only had the libmytest.a library file in /usr/local/lib. Then I did the same test:
g++ -Wall -Werror -I. -lmytest
g++ -Wall -Werror -I. -lmytest -static
Finally I deleted the libmytest.a file and copied back the .so files so I only had the libmytest.so* library files in /usr/local/lib. Then I did the same test:
g++ -Wall -Werror -I. -lmytest
g++ -Wall -Werror -I. -lmytest -static
The file size results(in bytes) are:
1. 7736 - Makes sense, all libs dynamically linked
2. 19674488 - Makes sense, all libs statically linked
3. 64908 - hmm... not really sure why
4. 19674488 - Makes sense, same as 2.
5. 7736 - Makes sense, same as 1.
6. failed - Makes sense, no .so files!
I have three file sizes: the small (7736) is fully dynamically linked. The large is statically linked... what is this medium one (64908)? So I have questions:
for 1. I assume the system looks for .so libraries first and .a libraries second?
For 3. What happened here? - is it dynamically linking the system libs but when it sees my .a lib it dynamically links it?
Note all outputs run fine and call functions from the library.
For 1. I assume the system looks for .so libraries first and .a libraries second?
That's roughly right, but read on.
For 3. What happened here? - is it dynamically linking the system libs but when it sees my .a lib it dynamically links it?
A static library cannot be dynamically linked: it is statically linked. The shared ( = dynamic) system libraries are linked,
assuming that the system libraries that the linker finds and prefers are in fact shared libraries.
By default, the linkage option -lmytest directs the linker to search for an input file called libmytest.so (shared library)
or libmytest.a (static library), first in the search directories you have specified in the commandline with
the -Ldirname option, in the order specified, and then in its default search directories, in the configured order.
It stops searching when it finds either of those files in one of those directories. If it finds both of them in
the same directory then it selects the shared library, libmytest.so. The selected file, if any, is input to the linkage.
If the search is unsuccessful the linker gives an error: cannot find -lmytest.
This default behaviour can be changed by the option -static. If it appears anywhere in the commandline, the linker
ignores all shared libraries: then -lmytest can only be satisfied by finding libmytest.a, and static system libraries must also be found.
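The search rule just described can be sketched as follows. This is a simplified model, not the GNU ld implementation: the Directories map is a hypothetical stand-in for the filesystem, resolve_library is an invented name, and features such as the -l:filename form are ignored.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the filesystem: directory -> file names in it.
using Directories = std::map<std::string, std::set<std::string>>;

// Returns the path chosen for -l<name>, or "" if the search fails.
// `search_path` lists the -L directories first, then the default ones.
std::string resolve_library(const std::string& name,
                            const std::vector<std::string>& search_path,
                            const Directories& directories,
                            bool static_only)
{
    assert(!name.empty());
    const std::string shared_name = "lib" + name + ".so";
    const std::string static_name = "lib" + name + ".a";
    for (const std::string& dir : search_path) {
        const auto found = directories.find(dir);
        if (found == directories.end()) continue;
        const std::set<std::string>& files = found->second;
        // Within one directory the shared library wins, unless -static.
        if (!static_only && files.count(shared_name) != 0) {
            return dir + "/" + shared_name;
        }
        if (files.count(static_name) != 0) {
            return dir + "/" + static_name;
        }
    }
    return "";  // the linker would report: cannot find -l<name>
}
```

In terms of the question: with both files in /usr/local/lib the shared library is chosen (case 1), with only libmytest.a present the archive is chosen even without -static (case 3), and with only libmytest.so present plus -static the search fails (case 6).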
/usr/local/lib is one of the linker's default search directories. So when you execute:
g++ -Wall -Werror -I. -lmytest
in the scenario (3) where, /usr/local/lib/libmytest.a is found by the linker and /usr/local/lib/libmytest.so is not,
libmytest.a satisfies -lmytest and is input to the linkage. The linker's default preference for shared libraries is unaffected.
The contribution that the linkage of libmytest.a makes to the size of the executable is not obvious.
A static library - quite unlike a shared library - is not an ELF binary that the linker has produced. It is
an ar archive of object files, produced by ar: it is just a bag of files that
happen to be object files.
By default, when an ar archive is input to the linker, it looks in the bag to find any object files that
provide definitions for any undefined symbol references that have accrued from object files already
linked into the output file (program or shared library) when the archive was inspected. If it finds any
such object files, it extracts them from the archive and links them into the output file, exactly as if they
had been individually listed in the commandline and the archive not mentioned at all. Except as a bag from which
object files may be selected, the archive contributes nothing to the linkage.
Thus, if there are N object files in libmytest.a, inputting that archive to a linkage might
contribute between 0 and N object files to the output file, depending on what undefined references into members of
that set of object files accrue earlier in the linkage, and which object files provide definitions for those
references.
And even if you know exactly which object files in libmytest.a will be required in your linkage, you cannot
conclude that the sum of their sizes will be added to the size of the output file. An object file is
partitioned into sections by the compiler, a section being the smallest unit of input and output that the linker
recognizes. By default the linker will retain an input section for output only if that section provides the linker's selected definition of some symbol that the linkage must define. If an input section is of no such use, the
linker will just discard it. So, even if an object file is linked, the linker might omit redundant sections
within it from the output file.
The behaviour of the -l | --library linker option is documented in 2.1 Command Line Options
of the GNU ld manual.
Most probably libmytest.a is not what plays the major role in the binary size increase; the bigger standard libraries are (which explains why the size didn't grow much in 3.).
You can investigate all the dynamic dependencies of your binary using ldd:
ldd a.out
(and which of them are disappearing after using -static).

linking a self-registering, abstract factory

I've been working with and testing a self-registering, abstract factory based upon the one described here:
https://stackoverflow.com/a/582456
In all my test cases, it works like a charm, and provides the features and reuse I wanted.
Linking in this factory in my project using cmake has been quite tricky (though it seems to be more of an ar problem).
I have the identical base.hpp, derivedb.hpp/cpp, and an equivalent deriveda.hpp/cpp to the example linked. In main, I simply instantiate the factory and call createInstance() twice, once each with "DerivedA" and "DerivedB".
The executable created by the line:
g++ -o testFactory main.cpp derivedb.o deriveda.o
works as expected. Moving my derived classes into a library (using cmake, but I have tested this with ar alone as well) and then linking fails:
ar cr libbase.a deriveda.o derivedb.o
g++ -o testFactory libbase.a main.cpp
only calls the first static instantiation (from derivedA.cpp) and never the second static instantiation, i.e.
// deriveda.cpp (if listed first in the "ar" line, this gets called)
DerivedRegister<DerivedA> DerivedA::reg("DerivedA");
// derivedb.cpp (if listed second in the "ar" line, this does not get called)
DerivedRegister<DerivedB> DerivedB::reg("DerivedB");
Note that swapping the two in the ar line calls only the derivedb.cpp static instantiation, and not the deriveda.cpp instantiation.
Am I missing something with ar or static libraries that somehow do not play nice with static variables in C++?
Contrary to intuition, including an archive in a link command is not the same as including all of the object files that are in the archive. Only those object files within the archive necessary to resolve undefined symbols are included. This is a good thing if you consider that once there was no dynamic linking and otherwise the entirety of any libraries (think the C library) would be duplicated into each executable. Here's what the ld(1) manpage (GNU ld on Linux) has to say:
The linker will search an archive only once, at the location where it is specified on the command line. If the archive defines a symbol which was undefined in some object which appeared before the archive on the command line, the linker will include the appropriate file(s) from the archive. However, an undefined symbol in an object appearing later on the command line will not cause the linker to search the archive again.
Unfortunately there's no standard way to include every member of an archive in the linked executable. On Linux you can use g++ -Wl,-whole-archive and on Mac OS X you can use g++ -all_load.
So with GNU binutils ld, the link command should be
g++ -o testFactory -Wl,-whole-archive libbase.a -Wl,-no-whole-archive main.cpp
The trailing -Wl,-no-whole-archive ensures that any archive appearing later in the final link command generated by g++ is linked in the normal way.
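To see why an archive member can be dropped silently, it helps to look at the shape of the self-registration pattern. Below is a minimal single-file sketch, assuming a registry keyed by class name. Only DerivedRegister, DerivedA, DerivedB and reg come from the question; Base, Creator, registry() and create() are hypothetical names made up for illustration. The point is that the registrar objects do all their work in a constructor, so if nothing else refers to a symbol in the file that defines them, the linker has no reason to pull that file out of an archive.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical base class; the question does not show the real one.
struct Base {
    virtual ~Base() = default;
    virtual std::string name() const = 0;
};

using Creator = std::function<std::unique_ptr<Base>()>;

// Function-local static: it is constructed on first use, so it is ready
// before any registrar below touches it.
std::map<std::string, Creator>& registry() {
    static std::map<std::string, Creator> r;
    return r;
}

// The constructor's side effect is the only purpose of a registrar object.
template <typename T>
struct DerivedRegister {
    explicit DerivedRegister(const std::string& key) {
        assert(registry().count(key) == 0);  // each name is registered once
        registry()[key] = [] { return std::unique_ptr<Base>(new T()); };
    }
};

struct DerivedA : Base {
    std::string name() const override { return "DerivedA"; }
    static DerivedRegister<DerivedA> reg;
};

struct DerivedB : Base {
    std::string name() const override { return "DerivedB"; }
    static DerivedRegister<DerivedB> reg;
};

// In the question these definitions live in deriveda.cpp and derivedb.cpp.
DerivedRegister<DerivedA> DerivedA::reg("DerivedA");
DerivedRegister<DerivedB> DerivedB::reg("DerivedB");

// Create an object by name; returns nullptr if the name was never registered.
std::unique_ptr<Base> create(const std::string& key) {
    auto it = registry().find(key);
    if (it == registry().end()) return nullptr;
    return it->second();
}
```

Built as a single translation unit, both registrations run and create("DerivedB") yields an object. Once the two definitions sit in separate archive members, that only holds if each member actually ends up in the link, which is what -whole-archive forces.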

Resolving circular dependencies by linking the same library twice?

We have a code base broken up into static libraries. Unfortunately, the libraries have circular dependencies; e.g., libfoo.a depends on libbar.a and vice-versa.
I know the "correct" way to handle this is to use the linker's --start-group and --end-group options, like so:
g++ -o myApp -Wl,--start-group -lfoo -lbar -Wl,--end-group
But in our existing Makefiles, the problem is typically handled like this:
g++ -o myApp -lfoo -lbar -lfoo
(Imagine this extended to ~20 libraries with complex interdependencies.)
I have been going through our Makefiles changing the second form to the first, but now my co-workers are asking me why... And other than "because it's cleaner" and a vague sense that the other form is risky, I do not have a good answer.
So, can linking the same library multiple times ever create a problem? For example, could the link fail with multiply-defined symbols if the same .o gets pulled in twice? Or is there any risk we could wind up with two copies of the same static object, creating subtle bugs?
Basically, I want to know if there is any possibility of link-time or run-time failures from linking the same library multiple times; and if so, how to trigger them. Thanks.
All I can offer is a lack of counter-example. I've actually never seen the first form before (even though it's clearly better) and always seen this solved with the second form, and haven't observed problems as a result.
Even so I would still suggest changing to the first form because it clearly shows the relationship between the libraries rather than relying on the linker behaving in a particular way.
That said, I would suggest at least considering if there's a possibility of refactoring the code to pull out the common pieces into additional libraries.
The problem with
g++ -o myApp -lfoo -lbar -lfoo
is that there is no guarantee that two passes over libfoo and one pass over libbar are enough.
The approach with -Wl,--start-group ... -Wl,--end-group is better because it is more robust.
Consider the following scenario (all symbols are in different object-files):
myApp needs symbol fooA defined in libfoo.
Symbol fooA needs symbol barB defined in libbar.
Symbol barB needs symbol fooC defined in libfoo. This is the circular dependency, which can be handled by -lfoo -lbar -lfoo.
Symbol fooC needs symbol barD defined in libbar.
To be able to build in the case above, we would need to pass -lfoo -lbar -lfoo -lbar to the linker. Why?
The linker sees libfoo for the first time and uses the definition of symbol fooA but not that of fooC, because so far it sees no need to include fooC in the binary. It does, however, start looking for a definition of barB, because fooA needs it.
The linker sees -lbar, includes the definition of barB (but not barD) and starts looking for a definition of fooC.
The definition of fooC is found in libfoo when it is processed for the second time. Now it becomes evident that the definition of barD is needed as well, but it is too late: there is no libbar left on the command line.
The example above can be extended to an arbitrary dependency depth (but this happens seldom in real life).
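The walk-through above can be replayed with a toy model of the linker. This is only a sketch under simplifying assumptions, not how ld is implemented: an object file is reduced to two sets of symbol names, an archive is a list of such objects, and all type and function names (Object, Archive, LinkState, singlePassLink and so on) are made up for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One object file: the symbols it defines and the symbols it refers to.
struct Object {
    std::set<std::string> defines;
    std::set<std::string> needs;
};
using Archive = std::vector<Object>;  // toy static library: a list of objects

struct LinkState {
    std::set<std::string> defined;
    std::set<std::string> undefined;
};

// Link one object unconditionally, as for a .o named on the command line.
void addObject(LinkState& st, const Object& obj) {
    for (const auto& s : obj.defines) {
        st.defined.insert(s);
        st.undefined.erase(s);
    }
    for (const auto& s : obj.needs)
        if (!st.defined.count(s)) st.undefined.insert(s);
}

// Visit one archive once: keep pulling in members that define a currently
// undefined symbol until no member of this archive is wanted any more.
void scanArchive(LinkState& st, const Archive& ar) {
    for (bool progress = true; progress;) {
        progress = false;
        for (const Object& member : ar) {
            bool wanted = false;
            for (const auto& s : member.defines)
                if (st.undefined.count(s)) wanted = true;
            if (wanted) {
                addObject(st, member);
                progress = true;
            }
        }
    }
}

// Process the archives strictly left to right, never going back.
// Returns the symbols that are still undefined at the end.
std::set<std::string> singlePassLink(const Object& app,
                                     const std::vector<Archive>& commandLine) {
    LinkState st;
    addObject(st, app);
    for (const Archive& ar : commandLine) scanArchive(st, ar);
    return st.undefined;
}

// The scenario from the text: every symbol lives in its own object file.
Archive libfoo() { return {Object{{"fooA"}, {"barB"}}, Object{{"fooC"}, {"barD"}}}; }
Archive libbar() { return {Object{{"barB"}, {"fooC"}}, Object{{"barD"}, {}}}; }
Object app() { return Object{{"main"}, {"fooA"}}; }

void checkScenario() {
    const Archive foo = libfoo();
    const Archive bar = libbar();
    // -lfoo -lbar -lfoo: barD is discovered after the last visit to libbar.
    const std::set<std::string> left = singlePassLink(app(), {foo, bar, foo});
    assert(left.size() == 1 && left.count("barD") == 1);
    // -lfoo -lbar -lfoo -lbar: one more visit resolves it.
    assert(singlePassLink(app(), {foo, bar, foo, bar}).empty());
}
```

In this model, each extra level of back-and-forth dependency costs one more repetition on the command line.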
Thus using
g++ -o myApp -Wl,--start-group -lfoo -lbar -Wl,--end-group
is the more robust approach, because the linker passes over the library group as often as needed: it rescans the group until a pass creates no new undefined references, and only then moves on to the next library on the command line.
There is, however, a small performance penalty to pay: in the first example, -lbar is scanned once more than with the manual command line -lfoo -lbar -lfoo. I am not sure whether that is worth worrying about, though.
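The group behaviour amounts to a fixed-point loop. The following is a rough, self-contained sketch of that idea under toy assumptions: object files are just sets of symbol names, and Object, Archive, GroupResult and linkGroup are hypothetical names, not part of any real linker. The pass count it reports belongs to this model only and says nothing precise about the cost inside ld.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One object file: the symbols it defines and the symbols it refers to.
struct Object {
    std::set<std::string> defines;
    std::set<std::string> needs;
};
using Archive = std::vector<Object>;

struct GroupResult {
    std::set<std::string> undefined;  // still unresolved after the group
    int passes = 0;                   // full passes over the whole group
};

// Visit one archive and pull in every member that defines a currently
// undefined symbol. Returns true if that created a new undefined reference.
bool scanArchive(const Archive& ar, std::set<std::string>& defined,
                 std::set<std::string>& undefined) {
    bool createdUndefined = false;
    for (bool progress = true; progress;) {
        progress = false;
        for (const Object& member : ar) {
            bool wanted = false;
            for (const auto& s : member.defines)
                if (undefined.count(s)) wanted = true;
            if (!wanted) continue;
            progress = true;
            for (const auto& s : member.defines) {
                defined.insert(s);
                undefined.erase(s);
            }
            for (const auto& s : member.needs)
                if (!defined.count(s) && undefined.insert(s).second)
                    createdUndefined = true;
        }
    }
    return createdUndefined;
}

// Rescan the whole group until a pass creates no new undefined references.
GroupResult linkGroup(const std::set<std::string>& initiallyUndefined,
                      const std::vector<Archive>& group) {
    GroupResult r;
    r.undefined = initiallyUndefined;
    std::set<std::string> defined;
    bool changed = true;
    while (changed) {
        changed = false;
        ++r.passes;
        for (const Archive& ar : group)
            if (scanArchive(ar, defined, r.undefined)) changed = true;
    }
    return r;
}

// The four-symbol scenario from the text, with myApp needing fooA.
void checkGroup() {
    const Archive foo = {Object{{"fooA"}, {"barB"}}, Object{{"fooC"}, {"barD"}}};
    const Archive bar = {Object{{"barB"}, {"fooC"}}, Object{{"barD"}, {}}};
    const GroupResult r = linkGroup({"fooA"}, {foo, bar});
    assert(r.undefined.empty());  // everything resolves without repeating -l
    assert(r.passes == 3);        // the last pass only confirms nothing changed
}
```

The final pass that finds nothing new is the price of not having to work out the required repetition count by hand.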
Since it is a legacy application, I bet the structure of the libraries is inherited from some arrangement which probably does not matter any more, such as being used to build another product which you no longer do.
Even if structural reasons for the inherited library layout remain, it would almost certainly still be acceptable to build one more library from the legacy arrangement. Just put all the modules from the 20 libraries into a new library, liballofthem.a. Then every single application is simply g++ -o myApp -lallofthem ...

What does static linking against a library actually do?

Say I had a library called libfoo which contained a class, a few static variables, possibly something with 'C' linkage, and a few other functions.
Now I have a main program which looks like this:
int main() {
return 5+5;
}
When I compile and link this, I link against libfoo.
Will this have any effect? Will my executable increase in size? If so, why? Do the static variables or their addresses get copied into my executable?
Apologies if there is a similar question to this or if I'm being particularly stupid in any way.
It won't do anything in a modern linker, because it knows the executable doesn't actually use libfoo's symbols. With gcc 4.4.1 and ld 2.20 on my system:
g++ linker_test.cpp -static -liberty -lm -lz -lXp -lXpm -o linker_test_unnecessary
g++ linker_test.cpp -static -o linker_test_none
ls -l linker_test_unnecessary linker_test_none
They are both 626094 bytes. Note that this also applies to dynamic linking, though both executables are then much smaller.
A library contains previously compiled object code - basically a static library is an archive of .o or .obj files.
The linker looks at your object code for unresolved names. If there are any, it searches the library for them; when it finds one, it includes the object file that defines it, and repeats the process for any names that object file leaves unresolved.
Thus only the parts of the static library that are needed are included in your executable.
Thus in your case nothing from libfoo will be added to your executable.