Linking to shared and static libraries with c++ on a Linux system - c++

I am messing around with a test project, let's call it mytest. It has a .cpp and a .h file; the contents are not really important, imagine it contains a few simple hello_world() type functions.
So, I was making a generic makefile to compile this into the various library outputs where an ls -l on my output folder gives:
libmytest.a
libmytest.so -> libmytest.so.1.0
libmytest.so.1 -> libmytest.so.1.0
libmytest.so.1.0
All good so far, my shared / static libraries are created.
Now I have a make install target in my make file, which basically copies the header to /usr/local/include and all of these library files to /usr/local/lib
Then I made another test cpp file called usertest.cpp (sorry for the not-very-imaginative/descriptive names), which links to the library files.
I compiled in various ways:
g++ -Wall -Werror -I. -lmytest
g++ -Wall -Werror -I. -lmytest -static
Then I deleted the libmytest.so* files so I only had the libmytest.a library file in /usr/local/lib. Then I did the same test:
g++ -Wall -Werror -I. -lmytest
g++ -Wall -Werror -I. -lmytest -static
Finally I deleted the libmytest.a file and copied back the .so files so I only had the libmytest.so* library files in /usr/local/lib. Then I did the same test:
g++ -Wall -Werror -I. -lmytest
g++ -Wall -Werror -I. -lmytest -static
The file size results (in bytes) are:
1. 7736 - Makes sense, all libs dynamically linked
2. 19674488 - Makes sense, all libs statically linked
3. 64908 - hmm... not really sure why
4. 19674488 - Makes sense, same as 2.
5. 7736 - Makes sense, same as 1.
6. failed - Makes sense, no .so files!
I have three file sizes: the small one (7736) is fully dynamically linked, and the large one is statically linked. What is the medium one (64908)? So I have questions:
For 1. I assume the system looks for .so libraries first and .a libraries second?
For 3. What happened here? - is it dynamically linking the system libs but when it sees my .a lib it dynamically links it?
Note all outputs run fine and call functions from the library.

For 1. I assume the system looks for .so libraries first and .a libraries second?
That's roughly right, but read on.
For 3. What happened here? - is it dynamically linking the system libs but when it sees my .a lib it dynamically links it?
A static library cannot be dynamically linked: it is statically linked. The shared ( = dynamic) system libraries are linked,
assuming that the system libraries that the linker finds and prefers are in fact shared libraries.
By default, the linkage option -lmytest directs the linker to search for an input file called libmytest.so (shared library)
or libmytest.a (static library), first in the search directories you have specified in the commandline with
the -Ldirname option, in the order specified, and then in its default search directories, in the configured order.
It stops searching when it finds either of those files in one of those directories. If it finds both of them in
the same directory then it selects the shared library, libmytest.so. The selected file, if any, is input to the linkage.
If the search is unsuccessful the linker gives an error: cannot find -lmytest.
This default behaviour can be changed by the option -static. If it appears anywhere in the commandline, the linker
ignores all shared libraries: then -lmytest can only be satisfied by finding libmytest.a, and static system libraries must also be found.
/usr/local/lib is one of the linker's default search directories. So when you execute:
g++ -Wall -Werror -I. -lmytest
in scenario (3), where /usr/local/lib/libmytest.a is found by the linker and /usr/local/lib/libmytest.so is not,
libmytest.a satisfies -lmytest and is input to the linkage. The linker's default preference for shared libraries is unaffected.
The contribution that the linkage of libmytest.a makes to the size of the executable is not obvious.
A static library - quite unlike a shared library - is not an ELF binary that the linker has produced. It is
an ar archive of object files, produced by ar: it is just a bag of files that
happen to be object files.
By default, when an ar archive is input to the linker, it looks in the bag to find any object files that
provide definitions for any undefined symbol references that have accrued from object files already
linked into the output file (program or shared library) when the archive was inspected. If it finds any
such object files, it extracts them from the archive and links them into the output file, exactly as if they
had been individually listed in the commandline and the archive not mentioned at all. Except as a bag from which
object files may be selected, the archive contributes nothing to the linkage.
Thus, if there are N object files in libmytest.a, inputting that archive to a linkage might
contribute between 0 and N object files to the output file, depending on what undefined references into members of
that set of object files accrue earlier in the linkage, and which object files provide definitions for those
references.
And even if you know exactly which object files in libmytest.a will be required in your linkage, you cannot
conclude that the sum of their sizes will be added to the size of the output file. An object file is
partitioned into sections by the compiler, a section being the smallest unit of input and output that the linker
recognizes. By default the linker will retain an input section for output only if that section provides the linker's selected definition of some symbol that the linkage must define. If an input section is of no such use, the
linker will just discard it. So, even if an object file is linked, the linker might omit redundant sections
within it from the output file.
The behaviour of the -l | --library linker option is documented in 2.1 Command Line Options
of the GNU ld manual

Most probably libmytest.a is not what accounts for most of the increase in binary size; the bigger standard libraries are (which explains why the size didn't grow much in 3.).
You can investigate all the dynamic dependencies of your binary using ldd:
ldd a.out
(and which of them are disappearing after using -static).

Related

Passing `-l<libname>` vs passing `lib<libname>.a` directly to linker?

Suppose I have two files
// a.c
int a() {return 1;}
// b.c
int a();
int b() {return a();}
and I compile them to a.o and b.o, respectively.
In an attempt to make an executable or shared library, one can call gcc a.o b.o -o libab.so -shared. But I also noticed that one can also call gcc b.o -L. -l:a.o -o libab.so -shared to generate (apparently) the same output. To my surprise, even running gcc a.o -L. -l:b.o -shared results in a library that has both a() and b(). (Shouldn't linker discard the unused library b.o since a.o does not depend on it?)
The latter two presumably pass a in as if a.o was a library. Now if I run ar rcs liba.a a.o, gcc b.o -L. -l:liba.a -shared and gcc b.o liba.a -shared both run without any problem and give the same output.
However, I have also seen cases where this trick doesn't work and results in undefined references. My question is therefore, as the title says: what are the differences between passing an object as a library and as a normal object file, and are there any differences when it comes to C++?
The problem arose in a much larger project. Sorry for lacking mcve because I can't seem to isolate the problem.
[How does] Passing -l<libname> vs [differ from] passing lib<libname>.a directly to linker?
Passing -l<libname> makes the GNU linker search the library only once, at the point where it appears on the command line (unless it follows a --whole-archive option, in which case every member is linked). Naming the .a file directly does not change this: the archive is still searched once, at its position on the command line.
From the GCC Linker options (emphasis mine):
-llibrary
...
It makes a difference where in the command you write this option; the linker searches and processes libraries and object files in the order they are specified. Thus, ‘foo.o -lz bar.o’ searches library ‘z’ after file foo.o but before bar.o. If bar.o refers to functions in ‘z’, those functions may not be loaded.
From binutils ld options:
-l namespec
...
The linker will search an archive only once, at the location where it is specified on the command line. If the archive defines a symbol which was undefined in some object which appeared before the archive on the command line, the linker will include the appropriate file(s) from the archive. However, an undefined symbol in an object appearing later on the command line will not cause the linker to search the archive again.
what are the differences between passing an object as a library and as a normal object file, and are there any differences when it comes to C++?
That depends on the implementation. In the most general sense, Unix-style linkers such as you are asking about search for objects named via -l options in a library search path, whereas if you name a file directly, you must specify the exact file.
Additionally, if you use an -l option to specify a file to link then, in the general case, the linker constructs a filename from the argument by prepending "lib" and appending ".a", or in some other way, such as by searching also or instead for ".so" files. (The GNU linker that you appear to be using provides an exception to this behavior when the first character of the argument is :. In that case it takes the rest of the argument as an exact file name, and searches for that.)
Many linkers also accept explicit library names specified on the command line (e.g. libfoo.a instead of -lfoo), so these need to be able to determine what type of file each is. Normally this is by examining the file, not by relying on its name. And GNU ld, at least, extends this file type detection to files specified via -l options.
The order in which objects and libraries are specified on the command line, by whatever specific form, matters to typical linker implementations. For example, the docs for GNU ld specify that
options which refer to files, such as ‘-l’ or ‘-T’, cause the file to
be read at the point at which the option appears in the command line,
relative to the object files and other file options
which is important because
The linker will search an archive only once, at the location where it
is specified on the command line. If the archive defines a symbol
which was undefined in some object which appeared before the archive
on the command line, the linker will include the appropriate file(s)
from the archive. However, an undefined symbol in an object appearing
later on the command line will not cause the linker to search the
archive again.
But of course
You may list the same archive multiple times on the command line.
The docs are not altogether clear on this, but empirically, the use of the term "archive" in the above is significant. It is effectively only archive files -- static libraries -- to which the "searched only once" provision applies. To a first approximation, the relative order of different ordinary object files and shared libraries on the GNU linker's command line, no matter how designated, does not impact symbol resolution.
So yes, it does matter whether you specify regular object files or static archives or shared libraries to the (GNU) linker, and their order matters to some extent, but the manner in which you specify them does not matter.
I have also seen case where this trick doesn't work and results undefined references.
With the GNU linker, that will be because of genuinely missing libraries or objects, or because of an unsuitable order of static archives relative to other object files or archives. Some other linkers are more sensitive.
Short answers:
The -L and -l options provide a shortcut for locating library archives (and shared libraries). But once you've used -l to locate a library (in the standard locations, or in a location specified by -L), the reading of that library is identical to the way it would be read if you specified its filename (e.g. /lib/libx.a) at the same spot on the command line explicitly.
When you specify a single object (.o) file, the entire contents of that file get loaded unconditionally. When you specify a library archive (.a) file, only those objects within it that are necessary (to satisfy outstanding undefined references) are loaded.

Why order(e.g. source.cxx -lstatic) is enforced while linking with static library?

While linking with static library, why the order is enforced ?
g++ -ldynamic -lstatic src.cxx  # ERROR
g++ -lstatic src.cxx -ldynamic  # ERROR
g++ src.cxx -ldynamic -lstatic  # SUCCESS
g++ -ldynamic src.cxx -lstatic  # SUCCESS
Is there a technical reason why static library cannot be linked like dynamic libraries (at any order ) ?
Why linking libraries cannot be made generic (may be by mentioning while compiling/linking e.g. for static : -ls and for dynamic : -ld etc.) ?
The As Needed schism in Linux linkage
Your example:
g++ -ldynamic -lstatic src.cxx # ERROR
g++ -ldynamic src.cxx -lstatic # SUCCESS
indicates that your linux distro belongs to the RedHat clan. Let's just confirm
that, say on CentOS 7:
$ cat /proc/version
Linux version 3.10.0-693.el7.x86_64 (builder@kbuilder.dev.centos.org) \
(gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) \
#1 SMP Tue Aug 22 21:09:27 UTC 2017
$ cat foo.c
#include <stdio.h>
void foo(void)
{
    puts(__func__);
}
$ cat bar.c
#include <stdio.h>
void bar(void)
{
    puts(__func__);
}
$ cat main.c
extern void foo(void);
extern void bar(void);
int main(void)
{
    foo();
    bar();
    return 0;
}
$ gcc -Wall -fPIC -c foo.c
$ gcc -shared -o libfoo.so foo.o
$ gcc -Wall -c bar.c
$ ar cr libbar.a bar.o
$ gcc -Wall -c main.c
$ gcc -o prog -L. -lfoo -lbar main.o -Wl,-rpath=$(pwd)
main.o: In function `main':
main.c:(.text+0xa): undefined reference to `bar'
collect2: error: ld returned 1 exit status
# :(
$ gcc -o prog -L. -lfoo main.o -lbar -Wl,-rpath=$(pwd)
$ # :)
$ ./prog
foo
bar
So you're right there.
Now let's check it out on a distro from the Debian clan:
$ cat /proc/version
Linux version 4.13.0-32-generic (buildd@lgw01-amd64-016) \
(gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3)) \
#35-Ubuntu SMP Thu Jan 25 09:13:46 UTC 2018
Here, it all goes the same as far as:
$ gcc -o prog -L. -lfoo -lbar main.o -Wl,-rpath=$(pwd)
main.o: In function `main':
main.c:(.text+0x5): undefined reference to `foo'
main.c:(.text+0xa): undefined reference to `bar'
collect2: error: ld returned 1 exit status
when it gets different. Now the linkage can't resolve either foo - from the
shared library libfoo.so - or bar - from the static library libbar.a. And
to fix that we need:
$ gcc -o prog -L. main.o -lfoo -lbar -Wl,-rpath=$(pwd)
$ ./prog
foo
bar
with all the libraries mentioned after the object file(s) - main.o - that
reference the symbols they define.
The Centos-7 (RedHat) linkage behaviour is old-school. The Ubuntu 17.10 (Debian)
linkage behaviour was introduced in Debian 7, 2013, and trickled down
to the Debian-derived distros. As you see, it abolishes the distinction
between shared libraries and static libraries as regards the library needing,
or not needing, to appear in the linkage sequence after all input
files that reference it. They all must appear in dependency order (DO[1]),
shared libraries and static libraries alike.
This comes down to how the distro decides to build their version of the GCC
toolchain - how they choose the default options that get passed to the system
linker (ld) behind the scenes when it is invoked by one of the language
front-ends (gcc, g++, gfortran etc.) to execute a linkage for you.
Specifically, it comes down to whether the linker option --as-needed
is, or is not, inserted by default into the ld commandline before the libraries
are inserted.
If --as-needed is not in effect when a shared library libfoo.so is arrived at,
then it will be linked regardless of whether the linkage has so far accrued any
unresolved references to symbols that the shared library defines. In short,
it will be linked regardless of any proven need to link it. Maybe the further
progress of the linkage, to subsequent inputs, will give rise to unresolved references
that libfoo.so resolves, justifying its linkage. But maybe not. It gets linked
anyhow. That's the RedHat way.
If --as-needed is in effect when a libfoo.so is arrived at, then it
will be linked if and only if it exports a definition for at least one symbol
to which an unresolved reference has already accrued in the linkage, i.e.
there is a proven need to link it. It cannot end up linked if there is
no need to link it. That's the Debian way.
The RedHat way with shared library linkage was prevalent until Debian 7
broke ranks. But the linkage of static libraries has always conformed to the as needed principle
by default. There's no --as-needed option that applies to static libraries.
Instead there's the opposite, --whole-archive:
you need to use that to override the default behaviour and link object files from static libraries regardless of need.
So folks like you, in RedHat land, observe this puzzling difference: by default static libraries
have to be linked in DO; for shared libraries, any order will do by default.
Folks in Debian land see no such difference.
Why so?
Since the RedHat way has this puzzling difference - a stumbling block for
the linkage efforts of the uninitiated - it's natural to ask why, historically,
it was as needed for static libraries, but not as needed for shared libraries,
as a matter of course, and why it still goes in RedHat land.
Simplifying grossly, the linker assembles a program (or shared library) by
incrementally populating sections and dynamic dependency records (DDRs[2]) in
a structure of sections and DDRs that starts off empty and
ends up being a binary file that the OS loader can parse and successfully map
into a process address space: e.g. an ELF executable or DSO. (Section
here is a genuine technical term. Dynamic dependency record isn't.
I've just coined it now for convenience.)
Loosely speaking, the linker inputs that drive this process are object files,
shared libraries, or static libraries. But strictly speaking, they are either
object files or shared libraries. Because a static library is simply
an ar archive of files that happen to be
object files. As far as the linker is concerned it is just a sequence of object
files that it might or might not need to use, archived together with a symbol table
by which the linker can query which of the object files, if any, defines a symbol.
When the linker arrives at an object file, that object file is always linked
into the program. The linker never asks whether it needs an object file
(whatever that might mean). Any object file is an unconditional source of linkage
needs, that further inputs have to satisfy.
When an object file is input, the linker has to dismantle it into
the input sections of which it is composed and merge them into the output
sections in the program. When an input section S appears in one object
file, the chances are that a section S will appear in other object files;
maybe all of them. The linker has to stitch together all the input S sections
into a single output S section in the program, so it isn't finally done
with composing an output section S till the linkage is finished.
When a shared library libfoo.so is input to the linkage, the linker outputs
a DDR into the program (if it decides that the library is needed, or doesn't care). That is essentially a memo that will be read at runtime by
the loader, telling it that libfoo.so is a dependency of the process that is
under construction; so it will locate libfoo.so by its standard search algorithm,
load it and map it into the process.
Consuming an object file is a relatively costly bit of linkage; consuming
a shared library is relatively cheap - especially if the linker does not
have to bother beforehand figuring out whether the shared library is needed.
The input/output section processing of an object file is typically more bother than writing out a DDR.
But more important than the effort, linking an object file typically makes the program significantly
larger, and can make it arbitrarily larger. Linking a shared library adds
only a DDR, which is always a tiny thing.
So there's a respectable rationale for a linkage strategy to reject the linking of
an object file unless it is needed, but to tolerate the linking of a shared library
without need. Linking an unnecessary object file adds an arbitrary amount of dead
weight to the program, at a proportional burden on the linkage. But
if the linker doesn't have to prove that a shared library is needed, then it
can link it in a jiffy adding negligibly to the bulk of the program. And if the developer has chosen to add the shared library to the linkage, chances are good it will be needed. The RedHat
camp still thinks that rationale is good enough.
The Debian camp also has a respectable rationale of course. Yes, a Debian linkage
involves the extra effort of determining whether libfoo.so, when it is
reached, defines any symbol to which there is an unresolved reference at that
point. But by only linking shared libraries that are needed: -
At runtime, the loader is spared the wastage of either loading redundant
dependencies, or figuring out that they are redundant so as not to load them.
Package management with respect to runtime dependencies is eased if
redundant runtime dependencies are weeded out at linktime.
Developers, like you, do not get tripped up by the inconsistent linkage rules
for static and shared libraries! - a snag that's aggravated by the fact that
-lfoo in the linker commandline does not reveal whether it will resolve
to libfoo.so or libfoo.a.
There are thornier pros and cons for each side in the schism.
Now consider how the linker uses a static library, libabc.a - a list of object files a.o, b.o, c.o.
The as needed principle is applied, like this: When the linker arrives at libabc.a,
it has 0 or more unresolved symbol references in hand that it has carried
forward from the 0 or more object files and shared libraries it has already linked
into the program. The linker's question is: Are there any object files in
this archive that provide definitions for any of these unresolved symbol references?
If there are 0 such unresolved references, then the answer is trivially No. So
there's no need to look in the archive. libabc.a is passed over. The linker moves
on to the next input. If it has got some unresolved symbol references in hand, then the
linker inspects the symbols that are defined by the object files in the archive. It
extracts just those object files - if any - that provide symbol definitions that it needs[3]
and inputs those object files to the linkage, exactly as if they were individually
named in the commandline and libabc.a was not mentioned at all. Then it moves
on to the next input, if any.
It's obvious how the as needed principle for static libraries implies DO. No
object file will be extracted from a static library and linked unless an unresolved
reference to some symbol that the object file defines has accrued from some
object file (or shared library) already linked.
Must static libraries be As Needed ?
In RedHat land, where DO is waived for shared libraries, what we do in
its absence is just link every shared library that is mentioned. And as we've
seen, this is tolerably cheap in linkage resource and program size. If we
also waived DO for static libraries, the equivalent strategy would
be to link every object file in every static library that is mentioned. But
that is exorbitantly costly, in linkage resource and program dead weight.
If we wanted to be free of DO for static libraries, but still not link
object files with no need, how could the linker proceed?
Maybe like this?:-
Link all of the object files that are explicitly mentioned into the program.
Link all the shared libraries mentioned.
See if there remain any unresolved references. If so, then -
Extract all of the object files from all of the static libraries that are mentioned
into a pool of optional object files.
Then carry on the linkage on an as needed basis against this pool of optional
object files, to success or failure.
But nothing like this will fly. The first definition of a symbol that the linker
sees is the one that's linked. This means that the order in which object files
are linked matters, even as between two different linkage orders that are both
successful.
Say that object files a.o, b.o have already been linked; unresolved references
remain and then the linker has a choice of optional object files c.o, d.o, e.o, f.o
to continue with.
There may be more than one ordering of c.o, d.o, e.o, f.o that
resolves all references and gives us a program. It might be the case that linking,
say, e.o first resolves all outstanding references and yields no new ones,
giving a program; while linking say c.o first also resolves all outstanding
references but produces some new ones, which require the linking of some or
all of d.o, e.o, f.o - depending on the order - with each possible linkage
resulting in yet another different program.
That's not all. There may be more than one ordering of c.o, d.o, e.o, f.o such that, after some object file is linked - point P - all
previously outstanding references are resolved, but where:-
1. Some of those orderings either produce no new references at point P or produce only references that some further linkage order can resolve.
2. Other ones produce new references at point P that no further linkage order can resolve.
So, whenever the linker discovers it has made a type 2 choice at some earlier point, it would need
to backtrack to that point and try one of the other choices that were then available, that it hasn't already tried,
and only conclude that the linkage fails when it has tried them all unsuccessfully.
Linking like this against a pool of N optional object files will take time proportional
to factorial N to fail.
Living with DO for static libraries as we do now, we specify object files and/or static libraries Ij in the
linker commandline in some order:
I0, I1, ... In
and this equates to an ordering of object files which for argument's sake might
resemble:
O0, O1, [O2,... O2+j], O2+j+1, [O2+j+2,... O2+j+k] ...
where [Oi...] is a sub-sequence of optional object files (i.e. a static library) that will be
available to the linker at that point.
Whether we know it or not when we compose the commandline, we are asserting not just that this order is
a good DO ordering that can be linked to yield some program, but also that this
ordering yields the program that we intend.
We might be mistaken on the first count ( = linkage failure). We might even be
mistaken on the second ( = a mean linkage bug). But if we stop caring about the order of these
inputs and just leave it to the linker somehow to find a good DO over them, or prove that there isn't one,
then:
We have actually stopped caring about which program we will get, if any.
We have stopped caring about whether linkage will terminate in any feasible time.
This is not going to happen.
Couldn't we get a warning for broken DO?
In a comment you asked why the linker could not at least warn us if our
static object files and static libraries are not in DO.
That would be in addition to failing the linkage, as it does now. But to give us this
additional warning the linker would have to prove that the linkage failed
because the object files and static libraries are not in DO, and not just because
there are references in the linkage that nothing in the linkage defines. And it
could only prove that the linkage failed because of broken DO by proving that
some program can be linked by some permutation of the object files and static libraries.
That's a factorial-scale task, and we don't care if some program can be linked,
if the program we intend to link can't be: the linker has no information about
what program we intend to link except the inputs we give it, in the order that
we give them.
It would be easy to make the linker (or more plausibly, the GCC frontends) emit a warning if any library was mentioned
before any object file in the commandline. But it would have some nuisance value, because
such a linkage isn't necessarily going to fail and might in fact be the intended
linkage. "Libraries after object files" is just pretty good guidance for routine
invocations of the linker via a GCC frontend. Worse, such a warning would only be practical for object files after libraries and not for cases of broken DO between libraries, so it would only do some of the job.
[1] My abbreviation.
[2] Also my abbreviation.
[3] More precisely, object file extraction from a static library is recursive.
The linker extracts any object files that define unresolved references it
already had in hand, or any new unresolved references that accrue while
linking object files extracted from the library.
When the linker loads the static library, it will see if any symbols from it are needed. It will use the symbols that are needed, and discard the rest. That of course means that if no symbols are needed from the library then all are discarded.
This is the reason that putting the library in front of the object files that depends on it will not work.
As a rule of thumb, always place libraries (even dynamic) at the end of the command line. And in order of dependencies. If module (object file or library) A depends on module B, always put A before B.

Static and Dynamic/Shared Linking with MinGW

I want to start with a simple linking usage to explain my problem. Let's assume that there is a library z which could be compiled to a shared library libz.dll (D:/libs/z/shared/libz.dll) or to a static library libz.a (D:/libs/z/static/libz.a).
Say I want to link against it; then I do this:
gcc -o main.exe main.o -LD:/libs/z/static -lz
According to this documentation, gcc would search for libz.a, which is
archive files whose members are object files
I also can do the following:
gcc -o main.exe main.o -LD:/libs/z/shared -lz
It is not mentioned in the documentation above that -l flag will search for lib<name>.so.
What will happen if libz.a and libz.dll are in the same directory? How will the library be linked with a program? Why do I need the flags -Wl,-Bstatic and -Wl,-Bdynamic if -l searches for both shared and static libraries?
Why do some developers provide .a files with .dll files for the same modules, if I compile a shared library distribution?
For example, Qt provides .dll files in the bin directory with .a files in the lib directory. Is it the same library, but built as shared and static, respectively? Or are the .a files some kind of dummy libraries which provide linking with shared libraries, where the real library implementations are?
Another example is the OpenGL library on Windows. Why must every compiler provide a static OpenGL lib like libopengl32.a in MinGW?
What are files with .dll.a and .la extensions used for?
P.S. There are a lot of questions here, but I think each one depends on the previous one and there is no need to split them into several questions.
Please, have a look at ld and WIN32 (cygwin/mingw). Especially, the direct linking to a dll section for more information on the behavior of -l flag on Windows ports of LD. Extract:
For instance, when ld is called with the argument -lxxx it will attempt to find, in the first directory of its search path,
libxxx.dll.a
xxx.dll.a
libxxx.a
cygxxx.dll (*)
libxxx.dll
xxx.dll
before moving on to the next directory in the search path.
(*) Actually, this is not cygxxx.dll but in fact is <prefix>xxx.dll, where <prefix> is set by the ld option -dll-search-prefix=<prefix>. In the case of cygwin, the standard gcc spec file includes -dll-search-prefix=cyg, so in effect we actually search for cygxxx.dll.
NOTE: If you have ever built Boost with MinGW, you probably recall that the naming of Boost libraries exactly obeys the pattern described in the link above.
In the past there were issues in MinGW with direct linking to *.dll, so it was advised to create a static library lib*.a with exported symbols from *.dll and link against it instead. The link to this MinGW wiki page is now dead, so I assume that it should be fine to link directly against *.dll now. Furthermore, I did it myself several times with the latest MinGW-w64 distribution, and had no issues, yet.
You need link flags -Wl,-Bstatic and -Wl,-Bdynamic because sometimes you want to force static linking, for example, when the dynamic library with the same name is also present in a search path:
gcc object1.o object2.o -lMyLib2 -Wl,-Bstatic -lMyLib1 -Wl,-Bdynamic -o output
The above snippet guarantees that the default linking priority of -l flag is overridden for MyLib1, i.e. even if MyLib1.dll is present in the search path, LD will choose libMyLib1.a to link against. Notice that for MyLib2 LD will again prefer the dynamic version.
NOTE: If MyLib2 depends on MyLib1, then MyLib1 is dynamically linked too, regardless of -Wl,-Bstatic (i.e. it is ignored in this case). To prevent this you would have to link MyLib2 statically too.

linking a self-registering, abstract factory

I've been working with and testing a self-registering, abstract factory based upon the one described here:
https://stackoverflow.com/a/582456
In all my test cases, it works like a charm, and provides the features and reuse I wanted.
Linking in this factory in my project using cmake has been quite tricky (though it seems to be more of an ar problem).
I have the identical base.hpp, derivedb.hpp/cpp, and an equivalent deriveda.hpp/cpp to the example linked. In main, I simply instantiate the factory and call createInstance() twice, once each with "DerivedA" and "DerivedB".
The executable created by the line:
g++ -o testFactory main.cpp derivedb.o deriveda.o
works as expected. Moving my derived classes into a library (using cmake, but I have tested this with ar alone as well) and then linking fails:
ar cr libbase.a deriveda.o derivedb.o
g++ -o testFactory libbase.a main.cpp
only calls the first static instantiation (from derivedA.cpp) and never the second static instantiation, i.e.
// deriveda.cpp (if listed first in the "ar" line, this gets called)
DerivedRegister<DerivedA> DerivedA::reg("DerivedA");
// derivedb.cpp (if listed second in the "ar" line, this does not get called)
DerivedRegister<DerivedB> DerivedB::reg("DerivedB");
Note that swapping the two in the ar line calls only the derivedb.cpp static instantiation, and not the deriveda.cpp instantiation.
Am I missing something with ar or static libraries that somehow do not play nice with static variables in C++?
Contrary to intuition, including an archive in a link command is not the same as including all of the object files that are in the archive. Only those object files within the archive that are needed to resolve undefined symbols are included. This is a good thing if you consider that there was once no dynamic linking, and otherwise the entirety of every library (think of the C library) would be duplicated into each executable. Here's what the ld(1) manpage (GNU ld on Linux) has to say:
The linker will search an archive only once, at the location where it is specified on the command line. If the archive defines a symbol which was undefined in some object which appeared before the archive on the command line, the linker will include the appropriate file(s) from the archive. However, an undefined symbol in an object appearing later on the command line will not cause the linker to search the archive again.
Unfortunately there's no standard way to include every member of an archive in the linked executable. On Linux you can use g++ -Wl,-whole-archive and on Mac OS X you can use g++ -all_load.
So with GNU binutils ld, the link command should be
g++ -o testFactory -Wl,-whole-archive libbase.a -Wl,-no-whole-archive main.cpp
The -Wl,-no-whole-archive ensures that any archive appearing later in the final link command generated by g++ is linked in the normal way.
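To make the failure mode concrete, here is a minimal single-file sketch of the registration pattern from the question. Only DerivedRegister, createInstance, DerivedA, DerivedB and reg come from the question; registry(), Creator and the name() method are assumptions added for illustration. When everything is in one translation unit (or every .o is named on the link line), both registrars run before main. Once derivedb.o sits in an archive and nothing references it, the linker leaves it out and its registrar never runs.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string name() const = 0;  // assumed method, for illustration
};

using Creator = std::function<std::unique_ptr<Base>()>;

// Function-local static: constructed on first use, so registrars in other
// translation units can safely call this during static initialization.
std::map<std::string, Creator>& registry() {
    static std::map<std::string, Creator> instance;
    return instance;
}

// Constructing one of these registers T under the given key.
template <typename T>
struct DerivedRegister {
    explicit DerivedRegister(const std::string& key) {
        assert(registry().count(key) == 0);  // each key registered once
        registry()[key] = [] { return std::unique_ptr<Base>(new T()); };
    }
};

// Returns a new object for a registered key, or a null pointer otherwise.
std::unique_ptr<Base> createInstance(const std::string& key) {
    auto it = registry().find(key);
    if (it == registry().end()) return nullptr;
    return it->second();
}

struct DerivedA : Base {
    std::string name() const override { return "DerivedA"; }
    static DerivedRegister<DerivedA> reg;
};
// This definition is the only thing that registers DerivedA. If the object
// file holding it is never linked in, the registration silently disappears.
DerivedRegister<DerivedA> DerivedA::reg("DerivedA");

struct DerivedB : Base {
    std::string name() const override { return "DerivedB"; }
    static DerivedRegister<DerivedB> reg;
};
DerivedRegister<DerivedB> DerivedB::reg("DerivedB");
```

With both registrations linked in, createInstance("DerivedA") and createInstance("DerivedB") each return an object, and an unknown key returns a null pointer.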

What does static linking against a library actually do?

Say I had a library called libfoo which contained a class, a few static variables, possibly something with 'C' linkage, and a few other functions.
Now I have a main program which looks like this:
int main() {
    return 5 + 5;
}
When I compile and link this, I link against libfoo.
Will this have any effect? Will my executable increase in size? If so, why? Do the static variables or their addresses get copied into my executable?
Apologies if there is a similar question to this or if I'm being particularly stupid in any way.
It won't do anything in a modern linker, because it knows the executable doesn't actually use libfoo's symbols. With gcc 4.4.1 and ld 2.20 on my system:
g++ linker_test.cpp -static -liberty -lm -lz -lXp -lXpm -o linker_test_unnecessary
g++ linker_test.cpp -static -o linker_test_none
ls -l linker_test_unnecessary linker_test_none
They are both 626094 bytes. Note that this also applies to dynamic linking, though both executables are then much smaller.
A library contains previously compiled object code - basically a static library is an archive of .o or .obj files.
The linker looks at your object code and checks whether there are any unresolved names. If there are, it looks for them in the library; when it finds one, it includes the object file that defines it, and then repeats the process for any new unresolved names that object file brings in.
Thus only the parts of the static library that are needed are included in your executable.
So in your case nothing from libfoo will be added to your executable.
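The selection loop described above can be modelled in a few lines of C++. This is a toy model under simplifying assumptions, not how ld is implemented: ObjectFile and selectMembers are hypothetical names, symbols are plain strings, and only a single archive is considered.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for one .o member of a static library.
struct ObjectFile {
    std::string name;
    std::set<std::string> defined;    // symbols this member provides
    std::set<std::string> undefined;  // symbols this member still needs
};

// Given the symbols the program leaves undefined, return the names of the
// archive members that get pulled in, in the order they are selected.
std::vector<std::string> selectMembers(std::set<std::string> undefined,
                                       const std::vector<ObjectFile>& archive) {
    std::set<std::string> resolved;
    std::vector<std::string> pulled;
    std::vector<bool> used(archive.size(), false);

    bool progress = true;
    while (progress) {  // repeat until a full pass adds nothing new
        progress = false;
        for (std::size_t i = 0; i < archive.size(); ++i) {
            if (used[i]) continue;
            bool needed = false;
            for (const std::string& sym : archive[i].defined) {
                if (undefined.count(sym)) { needed = true; break; }
            }
            if (!needed) continue;  // defines nothing we are missing: skip it

            used[i] = true;
            pulled.push_back(archive[i].name);
            for (const std::string& sym : archive[i].defined) {
                resolved.insert(sym);
                undefined.erase(sym);
            }
            // The member may itself need symbols from other members.
            for (const std::string& sym : archive[i].undefined) {
                if (!resolved.count(sym)) undefined.insert(sym);
            }
            progress = true;
        }
    }
    assert(pulled.size() <= archive.size());
    return pulled;
}
```

A program like the main above leaves nothing from libfoo undefined, so the undefined set is empty and no member is selected. If the program referenced a symbol defined in one member, that member would be pulled in along with any other members it depends on, and the rest would stay out of the executable.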