Shared libraries and .h files - c++

I have some doubt about how do programs use shared library.
When I build a shared library ( with -shared -fPIC switches) I make some functions available from an external program.
Usually I do a dlopen() to load the library and then dlsym() to link the said functions to some function pointers.
This approach does not involve including any .h file.
Is there a way to avoid doing dlopen() & dlsym() and just including the .h of the shared library?
I guess this may be how c++ programs uses code stored in system shared library. ie just including stdlib.h etc.

Nick, I think all the other answers are actually answering your question, which is how you link libraries, but the way you phrase your question suggests you have a misunderstanding of the difference between headers files and libraries. They are not the same. You need both, and they are not doing the same thing.
Building an executable has two main phases, compilation (which turns your source into an intermediate form, containing executable binary instructions, but is not a runnable program), and linking (which combines these intermediate files into a single running executable or library).
When you do gcc -c program.c, you are compiling, and you generate program.o. This step is where headers matter. You need to #include <stdlib.h> in program.c to (for example) use malloc and free. (Similarly you need #include <dlfcn.h> for dlopen and dlsym.) If you don't do that the compiler will complain that it doesn't know what those names are, and halt with an error. But if you do #include the header the compiler does not insert the code for the function you call into program.o. It merely inserts a reference to them. The reason is to avoid duplication of code: The code is only going to need to be accessed once by every part of your program, so if you needed further files (module1.c, module2.c and so on), even if they all used malloc you would merely end up with many references to a single copy of malloc. That single copy is present in the standard library in either it's shared or static form (libc.so or libc.a) but these are not referenced in your source, and the compiler is not aware of them.
The linker is. In the linking phase you do gcc -o program program.o. The linker will then search all libraries you pass it on the command line and find the single definition of all functions you've called which are not defined in your own code. That is what the -l does (as the others have explained): tell the linker the list of libraries you need to use. Their names often have little to do with the headers you used in the previous step. For example to get use of dlsym you need libdl.so or libdl.a, so your command-line would be gcc -o program program.o -ldl. To use malloc or most of the functions in the std*.h headers you need libc, but because that library is used by every C program it is automatically linked (as if you had done -lc).
Sorry if I'm going into a lot of detail but if you don't know the difference you will want to. It's very hard to make sense of how C compilation works if you don't.
One last thing: dlopen and dlsym are not the normal method of linking. They are used for special cases where you want to dynamically determine what behavior you want based on information that is, for whatever reason, only available at runtime. If you know what functions you want to call at compile time (true in 99% of the cases) you do not need to use the dl* functions.

You can link shared libraries like static one. They are then searched for when launching the program. As a matter of fact, by default -lXXX will prefer libXXX.so to libXXX.a.

You need to give the linker the proper instructions to link your shared library.
The shared library names are like libNAME.so, so for linking you should use -lNAME
Call it libmysharedlib.so and then link your main program as:
gcc -o myprogram myprogram.c -lmysharedlib

If you use CMake to build your project, you can use
TARGET_LINK_LIBRARIES(targetname libraryname)
As in:
TARGET_LINK_LIBRARIES(myprogram mylibrary)
To create the library "mylibrary", you can use
ADD_LIBRARY(targetname sourceslist)
As in:
ADD_LIBRARY(mylibrary ${mylibrary_SRCS})
Additionally, this method is cross-platform (whereas simply passing flags to gcc is not).

Shared libraries (.so) are object files where the actual source code of function/class/... are stored (in binary)
Header files (.h) are files indicating (the reference) where the compiler can find function/class/... (in .so) that are required by the main code
Therefore, you need both of them.

Related

Is it possible to artificially induce object file extraction for a given static library?

I was recently reading this answer and noticed that it seems inconvenient for users to have to link static libraries in the correct order.
Is there some flag or #pragma I can pass to gcc when compiling my library so that my library's object files will always be included?
To be more specific, I want it to be case that the user can link my static library before another library that depends on it, and not wind up with unresolved symbols, in the same way that object files which are explicitly specified on the linker line are.
In particular, I am looking for solutions where the user of the library does not need to do anything special, and merely needs to pass -lMylibrary the way they would link any other library.
Is there some flag or #pragma I can pass to gcc
No.
I want it to be case that the user can link my static library before another library that depends on it, and not wind up with unresolved symbols, in the same way that object files which are explicitly specified on the linker line are.
Ship your "library" as a single object file. In other words, instead of:
ar ru libMyLibrary.a ${OBJS}
use:
ld -r -o libMyLibrary.a ${OBJS}
In particular, I am looking for solutions where the user of the library does not need to do anything special, and merely needs to pass -lMylibrary the way they would link any other library.
You can name your object file libMyLibrary.a. I believe the linker will search for it using the usual rules, but when it finds it, it will discover that this is an object file, and treat it as such, despite it being "misnamed". This should work at least on Linux and other ELF platforms. I am not sure whether it will work on Windows.

How to make a library in c++ like stl

I have made my own implementations of many of the STL features like Vectors, Lists, BST, Queue, Stack and given them all the functions that an STL corresponding library has....
Now i want to use this library by
#include "myLibName.h"
What I Did :
g++ -o -c myLib myLib.cpp
From This I got the object file...
But when i compile programs i have to link the object file myself...
Is there any way that i can do without linking...like the iostream and the other libraries are linked automatically.
I know that a SHARED OBJECT file (eg. libc.so in C) is where all the implementations are held in C....
If that's the solution then how do i make any and use it like other standard libraries in C++ without linking object file every time.
PS: After a lot of efforts i have created these libraries myself...Now Struck at the final step...Pls Help...
You can't unless you're going to write your own toolchain. GCC links in its runtime and standard library because it's GCC and knows that it should; it won't magically do the same with your library.
Conventionally, either make your library header-only or ship a .a/.so/.dll for devs to link against at linktime. In the latter two cases you'll also need to ship the .so/.dll for users to link against at runtime.
To make your build process cleaner for large projects in which you need to link multiple projects, you can use Makefiles.
After that you need to just type make at the terminal to compile and build the whole project.
Another solution is the following, although many people don't recommend it,
header.h
class Foo
{
// some variable and method declarations.
}
header.h is your header file which will contain your declarations.
implement.cpp // this is the implementation file
#include "header.h"
// Now implement various methods you declared in your "header.h" file.
implement.cpp is your implementation file which contains the implementation and the definition of static members.
main.cpp
#include "header.cpp"
// use your methods.
Now you don't need to link your object files, just do g++ -Wall main.cpp
First of all, you should probably differentiate between STL and the Standard C++ Library.
Each compiler comes with its own implementation of the Standard Library, some of them being (at least mostly) compatible (see clang++ and g++). So basically your way to go would be to modify the compiler you are using.
If you are writing header-only implementations, then no library is needed to be built and you can use it without linking. But in that case your work has to be distributed as source and not as library + header.
If you want to simply distribute your library and do not mind to link against the shared or static library you distributed, you should build a shared or static library, depending on the case. But you will have to link it when it is used.

g++ linking order dependency when linking c code to c++ code

Prior to today I had always believed that the order that objects and libraries were passed to g++ during the linking stage was unimportant. Then, today, I tried to link from c++ code to c code. I wrapped all the C headers in an extern "C" block but the linker still had difficulties finding symbols which I knew were in the C object archives.
Perplexed, I created a relatively simple example to isolate the linking error but much to my surprise, the simpler example linked without any problems.
After a little trial and error, I found that by emulating the linking pattern used in the simple example, I could get the main code to link OK. The pattern was object code first, object archives second eg:
g++ -o serverCpp serverCpp.o algoC.o libcrypto.a
Can anyone shed some light on why this might be so?. I've never seen this problem when linking ordinary c++ code.
The order you specify object files and libraries is VERY important in GCC - if you haven't been bitten by this before you have lead a charmed life. The linker searches symbols in the order that they appear, so if you have a source file that contains a call to a library function, you need to put it before the library, or the linker won't know that it has to resolve it. Complex use of libraries can mean that you have to specify the library more than once, which is a royal pain to get right.
The library order pass to gcc/g++ does actually matter. If A depends on B, A must be listed first. The reason is that it optimizes out symbols that aren't referenced, so if it sees library B first, and no one has referenced it at that point then it won't link in anything from it at all.
A static library is a collection of object files grouped into an archive. When linking against it, the linker only chooses the objects it needs to resolve any currently undefined symbols. Since the objects are linked in order given on the command line, objects from the library will only be included if the library comes after all the objects that depend on it.
So the link order is very important; if you're going to use static libraries, then you need to be careful to keep track of dependencies, and don't introduce cyclic dependencies between libraries.
You can use --start-group archives --end-group
and write the 2 dependent libraries instead of archives:
gcc main.o -L. -Wl,--start-group -lobj_A -lobj_b -Wl,--end-group

static link library

I am writing a hello world c++ application, in the instruction #include help the compiler or linker to import the c++ library. My " cout << "hello world"; " use a cout in the library. The question is after compile and generated exe is about 96k in size, so what instructions are actually contained in this exe file, does this file also contains the iostream library?
Thanks
In the general case, the linker will only bring in what it needs. Once the compiler phase has turned your source code into an object file, it's treated much the same as all other object files. You have:
the C start-up code which prepares the execution environment (sets up argv, argv and so on) then calls your main or equivalent.
your code itself.
whatever object files need to be dragged in from libraries (dynamic linking is a special case of linking that happens at runtime and I won't cover that here since you asked specifically about static linking).
The linker will include all the object files you explicitly specify (unless it's a particularly smart linker and can tell you're not using the object file).
With libraries, it's a little different. Basically, you start with a list of unresolved symbols (like cout). The linker will search all the object files in all the libraries you specify and, when it finds an object file that satisfies that symbol, it will drag it in and fix up the symbol references.
This may, of course, add even more unresolved symbols if, for example, there was something in the object file that relies on the C printf function (unlikely but possible).
The linker continues like this until all symbols are satisfied (when it gives you an executable) or one cannot be satisfied (when it complains to you bitterly about your coding practices).
So as to what is in your executable, it may be the entire iostream library or it may just be the minimum required to do what you asked. It will usually depend on how many object files the iostream library was built into.
I've seen code where an entire subsystem went into one object file so, that if you wanted to just use one tiny bit, you still got the lot. Alternatively, you can put every single function into its own object file and the linker will probably create an executable as small as possible.
There are options to the linker which can produce a link map which will show you how things are organised. You probably won't generally see it if you're using the IDE but it'll be buried deep within the compile-time options dialogs under MSVC.
And, in terms of your added comment, the code:
cout << "hello";
will quite possibly bring in sizeable chunks of both the iostream and string processing code.
Use cl /EHsc hello.cpp -link /MAP. The .map file generated will give you a rough idea which pieces of the static library are present in the .exe.
Some of the space is used by C++ startup code, and the portions of the static library that you use.
In windows, the library or part of the libraries (which are used) are also usually included in the .exe, the case is different in case of Linux. However, there are optimization options.
I guess this Wiki link will be useful : Static Libraries

Includes with the Linux GCC Linker

I don't understand how GCC works under Linux. In a source file, when I do a:
#include <math.h>
Does the compiler extract the appropriate binary code and insert it into the compiled executable OR does the compiler insert a reference to an external binary file (a-la Windows DLL?)
I guess a generic version of this question is: Is there an equivalent concept to Windows DLLs under *nix?
Well. When you include math.h the compiler will read the file that contains declarations of the functions and macros that can be used. If you call a function declared in that file (header), then the compiler inserts a call instruction into that place in your object file that will be made from the file you compile (let's call it test.c and the object file created test.o). It also adds an entry into the relocation table of that object-file:
Relocation section '.rel.text' at offset 0x308 contains 1 entries:
Offset Info Type Sym.Value Sym. Name
0000001c 00000902 R_386_PC32 00000000 bar
This would be a relocation entry for a function bar. An entry in the symbol table will be made noting the function is yet undefined:
9: 00000000 0 NOTYPE GLOBAL DEFAULT UND bar
When you link the test.o object file into a program, you need to link against the math library called libm.so . The so extension is similar to the .dll extension for windows. It means it is a shared object file. The compiler, when linking, will fix-up all the places that appear in the relocation table of test.o, replacing its entries with the proper address of the bar function. Depending on whether you use the shared version of the library or the static one (it's called libm.a then), the compiler will do that fix-up after compiling, or later, at runtime when you actually start your program. When finished, it will inject an entry in the table of shared libraries needed for that program. (can be shown with readelf -d ./test):
Dynamic section at offset 0x498 contains 22 entries:
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libm.so.6]
0x00000001 (NEEDED) Shared library: [libc.so.6]
... ... ...
Now, if you start your program, the dynamic linker will lookup that library, and will link that library to your executable image. In Linux, the program doing this is called ld.so. Static libraries don't have a place in the dynamic section, as they are just linked to the other object files and then they are forgotten about; they are part of the executable from then on.
In reality it is actually much more complex and i also don't understand this in detail. That's the rough plan, though.
There are several aspects involved here.
First, header files. The compiler simply includes the content of the file at the location where it was included, nothing more. As far as I know, GCC doesn't even treat standard header files differently (but I might be wrong there).
However, header files might actually not contain the implementation, only its declaration. If the implementation is located somewhere else, you've got to tell the compiler/linker that. By default, you do this by simply passing the appropriate library files to the compiler, or by passing a library name. For example, the following two are equivalent (provided that libcurl.a resides in a directory where it can be found by the linker):
gcc codefile.c -lcurl
gcc codefile.c /path/to/libcurl.a
This tells the link editor (“linker”) to link your code file against the implementation of the static library libcurl.a (the compiler gcc actually ignores these arguments because it doesn't know what to do with them, and simply passes them on to the linker). However, this is called static linking. There's also dynamic linking, which takes place at startup of your program, and which happens with .dlls under Windows (whereas static libraries correspond to .lib files on Windows). Dynamic library files under Linux usually have the file extension .so.
The best way to learn more about these files is to familiarize yourself with the GCC linker, ld, as well as the excellent toolset binutils, with which you can edit/view library files effortlessly (any binary code files, really).
Is there an equivalent concept to Windows DLLs under *nix?
Yes they are called "Shared Objects" or .so files. They are dynamically linked into your binary at runtime. In linux you can use the "ldd" command on your executable to see which shared objects your binary is linked to. You can use ListDLLs from sysinternals to accomplish the same thing in windows.
The compiler is allowed to do whatever it pleases, as long as, in effect, it acts as if you'd included the file. (All the compilers I know of, including GCC, simply include a file called math.h.)
And no, it doesn't usually contain the function definitions itself. That's libm.so, a "shared object", similar to windows .DLLs. It should be on every system, as it is a companion of libc.so, the C runtime.
Edit: And that's why you have to pass -lm to the linker if you use math functions - it instructs it to link against libm.so.
There is. The include does a textual include of the header file (which is standard C/C++ behavior). What you're looking for is the linker . The -l argument to gcc/g++ tells the linker what library(ies) to add in. For math (libm.so), you'd use -lm. The common pattern is:
source file: #include <foo.h>
gcc/g++ command line: -lfoo
shared library: libfoo.so
math.h is a slight variation on this theme.