Can I have the gcc linker create a static library? - c++

I have a library consisting of some 300 c++ files.
The program that consumes the library does not want to dynamically link to it. (For various reasons, but the best one is that some of the supported platforms do not support dynamic linking)
When I use g++ and ar to create a static library (.a), the file contains all symbols of all those files, including ones that the library doesn't want to export.
I suspect linking the consuming program with this library takes an unnecessarily long time, as all the .o files inside the .a still need to have their references resolved, and the linker has more symbols to process.
When creating a dynamic library (.dylib / .so) you can actually use a linker, which can resolve all intra-lib symbols, and export only those that the library wants to export. The result however can only be "linked" into the consuming program at runtime.
I would like to somehow get the benefits of dynamic linking, but use a static library.
If my google searches are correct in thinking this is indeed not possible, I would love to understand why this is not possible, as it seems like something that many c and c++ programs could benefit from.

Static libraries are just archives (hence ".a"), a collection of .o files. Like a tar archive, just even more plain. Since ar is not a linker, there is no conglomeration (as "ld -r" would do) and thus no intralibrary symbol elimination.
That's why shared libraries were invented in the first place, and they are pretty common now, so people just ignore the drawbacks of static libraries. They simply go by "it compiles? ship it.".

I haven't tried or tested this, but it looks like ld's ability to perform incremental or partial linking might be what you're looking for. Check whether the --relocatable option (you might also need to look at the -Ur option if dealing with C++), applied to the object files that would go into the library, does what you want.
I think you should then be able to use the output of that operation as an object file (or have it in a static library itself) for your program's final link step.
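To make the idea concrete, here is a minimal sketch of what a relocatable link does to the symbol tables, written as a toy model in C++ rather than as real linker code. ObjectFile and partial_link are hypothetical names made up for this illustration, and the model only tracks symbol names. It does not touch relocations and it does not hide non-exported symbols.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified view of an object file: symbol names only.
struct ObjectFile {
    std::set<std::string> defined;    // symbols this object provides
    std::set<std::string> undefined;  // symbols it references but does not provide
};

// Toy model of a relocatable link: merge several objects into one, so that
// references satisfied inside the group are no longer undefined.
ObjectFile partial_link(const std::vector<ObjectFile>& inputs) {
    ObjectFile merged;
    for (const ObjectFile& obj : inputs) {
        for (const std::string& name : obj.defined) {
            // A real linker reports duplicate strong definitions as an error.
            const bool inserted = merged.defined.insert(name).second;
            assert(inserted && "duplicate symbol definition");
            (void)inserted;
        }
        merged.undefined.insert(obj.undefined.begin(), obj.undefined.end());
    }
    // Anything defined somewhere in the group is resolved within the group.
    for (const std::string& name : merged.defined) {
        merged.undefined.erase(name);
    }
    return merged;
}
```

If one object defines api_open and needs helper, and a second object defines helper, the merged object no longer lists helper as undefined. Only references to symbols outside the library remain for the final link.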

Related

How to create a static library which includes another static library

I have a C++ project called testlib.pro (using Qtcreator) which will create a static library libtest.a.
The project also includes staticlib.a (example) and staticlib1.a, i.e. I'm creating one static library from some other static libraries. After creating the static library, I'm creating a C wrapper (testApi.c) to use the C++ code through the static library. I am compiling testApi.c using the option below:
gcc -o demo testApi.c -L ./testlib -ltest
But it gives linker errors which state that it requires the static libraries I used to link libtest.a. So I recompiled the program with the command below and it works fine:
gcc -o demo testApi.c -L ./testlib -ltest -lstaticlib -lstaticlib1
My understanding is that if I ship libtest.a to some other machine and try to compile testApi.c there, it may require staticlib.a and staticlib1.a on that machine. But I would like to use only the newly created static library libtest.a. Am I missing anything?
NOTE: I have included staticlib.a, staticlib1.a using -l option in my testlib.pro
If your library uses other static libraries then they will also need to be provided at link time.
There is an ugly way around it (on Linux). You can unpack existing static library (or several of them) and repack them into a new library. So you could, if you felt particularly frisky, unpack those libraries that yours depends on, then pack their contents along with your own stuff into a new library. Ugly, confusing, possibly causing all sorts of other problems, but if that is the way you want to go...
The static library concept is archaic in nature. When a program had a lot of modules, it was sometimes impossible to put them all on the command line for the linker to add them to the program. Also, for libraries, as the .o modules were all being included in the final executable, there had to be some mechanism to let the linker select only the needed modules and not include all of them in the final executable, or the executables would grow a lot by including modules that the program will not use. Both things were solved with the introduction of dynamic shared objects, so using .a files is somewhat deprecated today and it is only used for statically linking programs.
Anyway, the algorithm to select the object modules in the linker is not recursive, so when it opens a .a library to search for dependent files to be included in the final executable, it searches only for .o (and probably .so, but I have not tested this), and it will ignore any .a file it finds in there. Many systems include an index file in the archive that has a mapping between provided identifiers and the name of the module that provides them, so in one pass the linker knows which archived objects need to be extracted. That index file would have to be appended to (and rebuilt) if a library (with its own index) were included in the file, which justifies not using recursion at all in the library search.
The solution for this problem is to link all the libraries you need to make the final executable, or, as you have already been told, to extract the .o files from the library and put them in another library. There is still a third solution: the linker allows you to specify a file that has options (and you can specify library names, and .o files you want it to scan) and it will read that file to check the set of libraries you want it to scan.
Another point is that the linker never includes a library as such. A library is just an archive (like a .tar or .zip file) in which the linker explores and extracts the files it needs, so there's no need to make the search algorithm recursive at all. And there's no difference between an archived file in a library and that same file out of the archive.

C/C++: What is the difference between a statically-linked library and an object file?

I understand that code included in an executable at compile-time can come from object files (.o files) and statically-linked libraries (.lib/.a files). What is fundamentally and conceptually the difference between these two? Why is there a different concept between "object code" and a "statically-linked library"? What are the advantages and disadvantages to each, and why use one as opposed to the other? Can statically-linked library(ies) be made from object file(s), and vice versa, can object file(s) be made from statically-linked library(ies)?
Object files are compiled but unlinked code. Libraries contain object files. Thus your question becomes, "Why use statically-linked libs if I can just use object files?" Here's why.
Unlike a collection of objects, each of which has its own symbol table, a library has a single, unified symbol table, created when ar is called by the library developer using the s switch. The s switch is equivalent to running ranlib, which creates a unified symbol table for all objects in that archive.
Running ranlib in shell shows in the first line of help text:
Generate an index to speed access to archives.
And from the generic ranlib docs:
An archive with such an index speeds up linking to the library and
allows routines in the library to call each other without regard to
their placement in the archive.
See also the FreeBSD ranlib docs - different wording, same idea: Speed of linkage.
A library is simply a file containing many object files, which can be searched to resolve symbols.
So typically, when you link objects together, you get all the objects in one executable (though some optimising linkers can throw out unused ones).
When you give a library to the linker, it examines each of the object files within it and brings in those that are needed to satisfy unresolved symbols (and will probably continue to bring them in until either all symbols are resolved or no more can be).
It's just a way of efficiently packaging up a lot of objects into a single file so that the linker can do more of your work - you don't have to worry about which objects you need.
If you think of the C library, you may have a printf.o, puts.o, fopen.o as a result of keeping your source well separated. You don't want the user to have to explicitly list every single object file they want so you package the whole lot up into libc.a and tell them they just need to link with that single file.
The statically-linked bit is irrelevant here, it just decides that the objects should go into the executable at link time rather than being dynamically loaded at run time. It's explained here.

How static libraries work? (C/C++)

I know how to use and create them, but I can't find a text about how they are implemented, how a function call happens, and so on. Can someone help me with that information? I want to understand them, not just know what they are and how to use them.
As you may know, when you compile a source file you get an object file. Depending on your platform its extension may be .o or .obj or anything else. A static library is basically a collection of object files, kind of like a .zip file but probably not compressed. The linker, when trying to generate an executable tries to resolve the referenced symbols, i.e. locate in which object file (be it in a library or otherwise) they are defined and links them together. So, a static library may also contain an index of defined symbols in order to facilitate this. Exact implementation depends on the specific linker and library file format but the basic architecture is as mentioned.
You may want to check the italicized keywords in Wikipedia or something for more info on them.
I think wikipedia explains it well:
In computer science, a static library or statically-linked library is
a set of routines, external functions and variables which are resolved
in a caller at compile-time and copied into a target application by a
compiler, linker, or binder, producing an object file and a
stand-alone executable. This executable and the process of compiling
it are both known as a static build of the program. Historically,
libraries could only be static. Static libraries are either merged
with other static libraries and object files during building/linking
to form a single executable, or they may be loaded at run-time into
the address space of the loaded executable at a static memory offset
determined at compile-time/link-time.
A static library is purely a collection of .o files, put together in an archive that's something like a zip file (with no compression). When you use it for linking, the linker will search the library for .o files that provide any of the missing symbols in the main program, and pull in those .o files for linking, as if they had been included on the command line like .o files in your main program. This process is applied recursively, so if any of the .o files pulled in from the library have unresolved symbols, the library is searched again for other .o files that provide the definitions.
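The search described above can be sketched as a small fixed-point loop. This is a toy model, not how any particular linker is implemented: Member, LinkResult and select_members are hypothetical names, and object files are reduced to lists of symbol names. Symbols defined by the main program itself are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified archive member: a name plus its symbol names.
struct Member {
    std::string name;
    std::set<std::string> defined;    // symbols the member provides
    std::set<std::string> undefined;  // symbols the member itself still needs
};

struct LinkResult {
    std::vector<std::string> pulled;   // members taken from the archive, in order
    std::set<std::string> unresolved;  // symbols the archive could not satisfy
};

// Keep scanning the archive until no remaining member defines a symbol
// that is still unresolved.
LinkResult select_members(const std::vector<Member>& archive,
                          std::set<std::string> unresolved) {
    LinkResult result;
    std::set<std::string> resolved;
    std::vector<bool> taken(archive.size(), false);
    bool progress = true;
    while (progress) {
        progress = false;
        for (std::size_t i = 0; i < archive.size(); ++i) {
            if (taken[i]) continue;
            const Member& member = archive[i];
            bool needed = false;
            for (const std::string& sym : member.defined) {
                if (unresolved.count(sym) != 0) { needed = true; break; }
            }
            if (!needed) continue;
            taken[i] = true;
            result.pulled.push_back(member.name);
            for (const std::string& sym : member.defined) {
                // A real linker would report a multiple-definition error here.
                assert(resolved.count(sym) == 0 && "duplicate definition");
                unresolved.erase(sym);
                resolved.insert(sym);
            }
            // The new member may bring unresolved references of its own.
            for (const std::string& sym : member.undefined) {
                if (resolved.count(sym) == 0) unresolved.insert(sym);
            }
            progress = true;
        }
    }
    result.unresolved = unresolved;
    return result;
}
```

With an archive holding vfprintf.o, printf.o and fopen.o, and a program that only needs printf, the loop pulls printf.o first, then goes around again to pull vfprintf.o, and never touches fopen.o.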

Why does the C++ linker require the library files during a build, even though I am dynamically linking?

I have a C++ executable and I'm dynamically linking against several libraries (Boost, Xerces-c and custom libs).
I understand why I would require the .lib/.a files if I choose to statically link against these libraries (relevant SO question here). However, why do I need to provide the corresponding .lib/.so library files when linking my executable if I'm dynamically linking against these external libraries?
The compiler isn't aware of dynamic linking, it just knows that a function exists via its prototype. The linker needs the lib files to resolve the symbol. The lib for a DLL contains additional information like what DLL the functions live in and how they are exported (by name, by ordinal, etc.). The lib files for DLLs contain much less information than lib files that contain the full object code: libcmmt.lib on my system is 19.2 MB, but msvcrt.lib is "only" 2.6 MB.
Note that this compile/link model is nearly 40 years old at this point, and predates dynamic linking on most platforms. If it were designed today, dynamic linking would be a first class citizen (for instance, in .NET, each assembly has rich metadata describing exactly what it exports, so you don't need separate headers and libs.)
Raymond Chen wrote a couple blog entries about this specific to Windows. Start with The classical model for linking and then follow-up with Why do we have import libraries anyway?.
To summarize, history has defined the compiler as the component that knows about detailed type information, whereas the linker only knows about symbol names. So the linker ends up creating the .DLL without type information, and therefore programs that want to link with it need some sort of metadata to tell it about how the functions are exported and what parameter types they take and return.
The reason .DLLs don't have all the information you need to link with them directly is historic, and not a technical limitation.
For one thing, the linker inserts the versions of the libraries that exist at link time so that you have some chance of your program working if library versions are updated. Multiple versions of shared libraries can exist on a system.
The linker has the job of validating that all your undefined symbols are accounted for, either with static content or dynamic content.
By default, then, it insists on all your symbols being present.
However, that's just the default. See -z, and --allow-shlib-undefined, and friends.
Perhaps this dynamic linking is done via import libraries (the function has __declspec(dllimport) before its declaration).
If so, the compiler expects an __imp_symbol entry to exist, and that entry is responsible for forwarding the call to the right dynamically loaded library.
Those entries are generated when symbols marked with the __declspec(dllimport) keyword are linked.
Here is a very SIMPLIFIED description that may help. Static linking puts all of the code needed to run your program into the executable so everything is found. Dynamic linking means some of the required code does not get put into the executable and will be found at runtime. Where do I find it? Is function x() there? How do I make a call to function x()? That is what the library tells the linker when you are dynamically linking.
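The "where do I find it, and how do I call it" part can be pictured as a table of exported names on the library side and a pointer slot on the program side that gets filled in at load time. The sketch below is only an analogy in plain C++: ExportEntry, kExports, imported_x, find_export, bind_imports and call_x are all hypothetical names, and no real loader or import library format works exactly like this.

```cpp
#include <cassert>
#include <cstring>

using IntFn = int (*)(int);

// Stand-in for code that would live in the shared library.
static int library_x(int value) { return value * 2; }

// What the library advertises: exported names and their addresses.
struct ExportEntry {
    const char* name;
    IntFn address;
};
static const ExportEntry kExports[] = {{"x", library_x}};

// Slot in the executable that is filled in when the library is loaded.
static IntFn imported_x = nullptr;

// Look a name up in the export table; returns nullptr if it is not exported.
IntFn find_export(const char* name) {
    for (const ExportEntry& entry : kExports) {
        if (std::strcmp(entry.name, name) == 0) return entry.address;
    }
    return nullptr;
}

// Plays the role of the loader: bind every import the program recorded.
bool bind_imports() {
    imported_x = find_export("x");
    return imported_x != nullptr;
}

// Program code never calls library_x directly, only through the slot.
int call_x(int value) {
    assert(imported_x != nullptr && "bind_imports() must run first");
    return imported_x(value);
}
```

The link step only needs to know that a name like x will be available and which library provides it. The actual address is written into the slot later, when the program starts.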

Dynamic and Static Libraries in C++

In my quest to learn C++, I have come across dynamic and static libraries.
I generally get the gist of them: compiled code to include into other programs.
However, I would like to know a few things about them:
Is writing them any different than a normal C++ program, minus the main() function?
How does the compiled program get to be a library? It's obviously not an executable, so how do I turn, say 'test.cpp' into 'test.dll'?
Once I get it to its format, how do I include it in another program?
Is there a standard place to put them, so that whatever compilers/linkers need them can find them easily?
What is the difference (technically and practically) between a dynamic and static library?
How would I use third party libraries in my code (I'm staring at .dylib and .a files for the MySql C++ Connector)
Everything I have found relating to libraries seems to be targeting those who already know how to use them. I, however, don't. (But would like to!)
Thanks!
(I should also note I'm using Mac OS X, and although would prefer to remain IDE-neutral or command-line oriented, I use QtCreator/Netbeans)
Is writing them any different than a normal C++ program, minus the main() function?
No.
How does the compiled program get to be a library? It's obviously not an executable, so how do I turn, say 'test.cpp' into 'test.dll'?
Pass the -dynamiclib flag when you're compiling. (The name of the result is still by default a.out. On Mac OS X you should name your dynamic libraries as lib***.dylib, and on Linux, lib***.so (shared objects))
Once I get it to its format, how do I include it in another program?
First, make a header file that the other program can #include to know what functions can be used in your dylib.
Second, link to your dylib. If your dylib is named as libblah.dylib, you pass the -lblah flag to gcc.
Is there a standard place to put them, so that whatever compilers/linkers need them can find them easily?
/usr/lib or /usr/local/lib.
What is the difference (technically and practically) between a dynamic and static library?
Basically, for a static lib, the whole library is embedded into the file it "links" to.
How would I use third party libraries in my code (I'm staring at .dylib and .a files for the MySql C++ Connector)
See the 3rd answer.
Is writing them any different than a normal C++ program, minus the main() function?
Except for the obvious difference that a library provides services for other programs to use, usually (*) there isn't a difference.
* in gcc classes/functions are exported by default - this isn't the case in VC++, there you have to explicitly export using __declspec(dllexport).
How does the compiled program get to be a library? It's obviously not an executable, so how do I turn, say 'test.cpp' into 'test.dll'?
This depends on your compiler. In Visual Studio you specify this in your project configuration. In gcc to create a static library you compile your code normally and then package it in an archive using ar. To create a shared library you compile first (with the -fpic flag to enable position independent code generation, a requirement for shared libraries), then use the -shared flag on the object files. More info can be found in the man pages.
Once I get it to its format, how do I include it in another program?
Again this is a little compiler-dependent. In VS, if it's a shared library, the class/function you wish to use should be marked with __declspec(dllimport) where it is declared (this is usually done with ifdefs) and you have to specify the .lib file of the shared library for linkage. For a static library you only have to specify the .lib file (no export/import needed since the code will end up in your executable).
In gcc you only need to specify the library which you link against using -llibrary_name.
In both cases you will need to provide your client some header files with the functions/classes that are intended for public use.
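The "usually done with ifdefs" part commonly looks something like the sketch below. MYLIB_API, MYLIB_BUILDING and mylib_clamp are hypothetical names chosen for this example, and the declaration and definition are shown in one file only so the snippet compiles on its own. In a real project the macro and declaration would sit in the public header and the definition in a .cpp file.

```cpp
#include <cassert>

// Normally defined by the library's own build (e.g. on the compiler command
// line). Defined here only so this single-file sketch compiles as the library.
#define MYLIB_BUILDING

#if defined(_WIN32)
#  if defined(MYLIB_BUILDING)
#    define MYLIB_API __declspec(dllexport)  // building the DLL: export
#  else
#    define MYLIB_API __declspec(dllimport)  // using the DLL: import
#  endif
#else
   // GCC/Clang: mark the symbol visible even if the build hides others.
#  define MYLIB_API __attribute__((visibility("default")))
#endif

// Public header part: what clients of the library get to see.
MYLIB_API int mylib_clamp(int value, int low, int high);

// Implementation part: would live in the library's .cpp file.
int mylib_clamp(int value, int low, int high) {
    assert(low <= high);
    if (value < low) return low;
    if (value > high) return high;
    return value;
}
```

A client only includes the header and calls mylib_clamp. Whether the macro expands to an export, an import or a visibility attribute is decided by the platform and by whether MYLIB_BUILDING is defined.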
Is there a standard place to put them, so that whatever compilers/linkers need them can find them easily?
If it's your own library then it's up to you. Usually you can tell the linker about additional folders to look in. We have a lib folder in our source tree where all .lib (or .a/.so) files end up and we add that folder to the additional folders to look in.
If you're shipping a library on UNIX the common place is usually /usr/lib (or /usr/local/lib), this is also where gcc searches in by default.
What is the difference (technically and practically) between a dynamic and static library?
When you link a program to static libraries the code of the libraries ends up in your executable. Practically this makes your executable larger and makes it harder to update/fix a static library for obvious reasons (requires a new version of your executable).
Shared libraries are separate from your executable and are referenced by your program and (usually) loaded at runtime when needed.
It's also possible to load shared libraries without linking to them. It requires more work since you have to manually load the shared library and any symbol you wish to use. On Windows this is done using LoadLibrary/GetProcAddress and on POSIX systems using dlsym/dlopen.
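On the POSIX side, manual loading might look roughly like this. It is a minimal sketch: UnaryFn and load_unary are hypothetical names, error reporting via dlerror() is left out, and on some systems you need to add -ldl to the link line.

```cpp
#include <cassert>
#include <dlfcn.h>

// Signature we expect the looked-up symbol to have (an assumption made by
// the caller; dlsym itself cannot check it).
using UnaryFn = double (*)(double);

// Load `library` and look up `symbol` in it. Returns nullptr if either the
// library or the symbol cannot be found.
UnaryFn load_unary(const char* library, const char* symbol) {
    assert(library != nullptr && symbol != nullptr);
    void* handle = dlopen(library, RTLD_NOW);
    if (handle == nullptr) return nullptr;
    void* address = dlsym(handle, symbol);
    if (address == nullptr) {
        dlclose(handle);
        return nullptr;
    }
    // The handle is deliberately kept open so the returned pointer stays valid.
    return reinterpret_cast<UnaryFn>(address);
}
```

For example, on a typical Linux system load_unary("libm.so.6", "cos") should give you a pointer you can call like cos. The library file name is platform specific, so treat that one as an assumption.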
How would I use third party libraries in my code?
This is usually accomplished by including the necessary header files and linking with the appropriate library.
A simple example to link with a static library foo would look like this: g++ main.cpp -o main -L/folder/where/libfoo.a/is/at -lfoo.
Most open source projects have a readme that gives more detailed instructions, I'd suggest to take a look at it if there is one.
Is writing [libraries] any different than a normal C++ program, minus the main() function?
That depends on your definition of "different." From the language's point of view, you write a file or collection of files, don't put in a main() and you tell the compiler to generate a library instead of an executable.
However, designing libraries is much harder because you have no control over the code that calls you. Libraries need to be more robust against failure than normal code. You can't necessarily delete pointers somebody passes to your function. You can't tell what macros will mess with your code. You also must not pollute the global namespace (e.g., don't put using namespace std at the beginning of your header files).
How does the compiled program get to be a library? It's obviously not an executable, so how do I turn, say 'test.cpp' into 'test.dll'?
That depends on the compiler. In Visual C++ this is a project config setting. In gcc (going from memory) it's something like gcc -shared -fPIC foo.c -o libfoo.so.
Once I get it to its format, how do I include it in another program?
That depends on your compiler and linker. You make sure the header files are available via a project setting or environment variable, and you make sure the binaries are available via a different project setting or compiler variable.
Is there a standard place to put them, so that whatever compilers/linkers need them can find them easily?
That depends on the operating system. In UNIX you're going to put things in places like /usr/lib, /usr/local/lib. On Windows people used to put DLLs in places like C:\WINDOWS but that's no longer allowed. Instead you put it in your program directory.
What is the difference (technically and practically) between a dynamic and static library?
Static libraries are the easier, original model. At compile time the linker puts all the functions from the library into your executable. You can ship the executable without the library, because the library is baked in.
Dynamic libraries (also called shared libraries) involve the compiler putting enough information in the executable that at runtime the linker will be able to find the correct libraries and call the methods in there. The libraries are shared across the whole system among the programs that use them. Using dynamic loading (dlsym() et al.) adds a few details to the picture.
How would I use third party libraries in my code (I'm staring at .dylib and .a files for the MySql C++ Connector)
That's going to depend on your platform, and unfortunately I can't tell you much about .dylib files. .a files are static libraries, and you simply need to add them to your final call to gcc (gcc main.c foo.a -o main if you know where foo.a is, or gcc main.c -lfoo -o main if the system knows where libfoo.a or libfoo.so are). Generally you make sure the compiler can find the library and leave the linker to do the rest.
The difference between a static and dynamic library is that the linking is done at compile time for static libraries, embedding the executable code into your binary, while for dynamic libraries linking is done dynamically at program start. The advantages of dynamic libraries are that they can be distributed and updated separately, and that the code (memory) can be shared among several programs.
To use a library you simply pass -l<name> to g++, which picks up lib<name>.a or lib<name>.so.
I'm writing this to be more pragmatic than technically correct. It's enough to give you the general idea of what you're after.
Is writing them any different than a normal C++ program, minus the main() function?
For a static library, there's really not much difference.
For a dynamic library, the most likely difference you'll need to be aware of is that you may need to export the symbols you want to be available outside your library. Basically everything you don't export is invisible to users of your library. Exactly how you export, and whether you even need to by default, depends on your compiler.
For a dynamic library you also need to have all symbols resolved, which means the library can't depend on a function or variable that comes from outside the library. If my library uses a function called foo(), I need to include foo() in my library by writing it myself or by linking to another library that supplies it. I can't use foo() and just assume the user of my library will supply it. The linker won't know how to call a foo() that doesn't yet exist.
How does the compiled program get to be a library? It's obviously not an executable, so how do I turn, say 'test.cpp' into 'test.dll'?
It's similar to how you turn test.cpp into test.exe - compile and link. You pass options to the compiler to tell it whether to create an executable, a static library, or a dynamic library.
Once I get it to its format, how do I include it in another program?
In your source code, you include header files necessary to use the library, much as you would include a header file for code that's not in a library. You'll also need to include the library on your link line, telling the linker where to find the library. For many systems, creating a dynamic library generates two files, the shared library and a link library. It's the link library that you include on the link line.
Is there a standard place to put them, so that whatever compilers/linkers need them can find them easily?
There is an environment variable that tells the linker where to look for libraries. The name of that variable is different from one system to another. You can also tell the linker about additional places to look.
What is the difference (technically and practically) between a dynamic and static library?
A static library gets copied into the thing it is linked to. An executable will include a copy of the static library and can be run on another machine without also copying the static library.
A dynamic library stays in a separate file. The executable loads that separate file when it runs. You have to distribute a copy of the dynamic library with your program or it won't run. You can also replace the dynamic library with a new version, and as long as the new library has the same interface it will still run with the old executable. It also may save space if several executables use the same dynamic library. In fact dynamic libraries are often called shared libraries.
How would I use third party libraries in my code
Same as you would use one you created yourself, as described above.