I've got some .so libraries that I'd like to combine into one shared library so that it doesn't depend on the original .so files anymore.
The .so files depend on each other.
How can I do this? Can I do this?
This assumes you have the source code to all shared objects:
Provided there are no name space conflicts (which there should not be if the two co-exist as it is), it would not be too terribly hard to build them into one shared object.
If the shared libraries themselves depend on code from another library, order is going to matter. The real work is just getting the dependencies worked out in the makefile. I've never seen circular dependencies between SOs (i.e. foo() depends on bar(), which depends on foo()) link successfully, so I doubt that you have them to begin with.
I've done this several times, though the libraries themselves were trivial. I took parts from ustr (string handler), a configuration file handler, some other custom parsers and other utility functions and created a custom mash up.
The real pain is bringing in upstream improvements to each once you have combined them, however I'm not sure if that's an issue for you.
So if you have:
libfoo.so: $(LIB_FOO_OBJECTS) $(LIB_BAR_OBJECTS) $(LIBFOOBAR_OBJECTS)
Where:
LIB_FOO_OBJECTS = \
    $(libfoo)/foo.o \
    $(libfoo)/strings.o
LIB_BAR_OBJECTS = \
    $(libbar)/bar.o
....
... and the order is correct .. the rest is pretty easy. Note I didn't show header deps, everyone does that a little differently. They are important when making mash-ups though, as you'd probably want to avoid recompiling the whole library every time one header changes.
NB: If all three projects are using autotools .. your task just got exponentially easier (or harder) depending.
If you DON'T have the source code
If there is a static version of each library, you may be able to extract the objects and use them. I.e.:
$ cp /usr/lib/foo.a ./foo.a
$ ar x foo.a
$ gcc -fPIC -shared *.o -o foo.so
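Of course, it's quite a bit more involved than illustrated. In particular, the objects inside the archive must already have been compiled as position-independent code; passing -fPIC at link time does not convert them, and on many platforms the link will fail if they were not.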
I have never tried that and don't know how to handle SOs that have main() when it comes to linking in that case.
I have a large C++ project with around 250 cpp files.
I didn't write this code; I'm just trying to write a test for fuzzing purposes. Therefore:
I wrote my own main cpp file, called wrapper.cpp, containing the int main()
I included in this file some header files needed
I compiled after removing the initial main from the Makefile and adding my wrapper.cpp
It works and produces a functional executable. However, the binary is quite large. I'm pretty sure I can reduce its size, as a lot of object files are linked but not used. Therefore, I built all the object files and am now trying to work out how to link only the needed ones into the executable. After many attempts, it seems impossible:
The executable is linked against the object files, some static libraries and some dynamic libraries
The order matters for the static libs (interdependencies between them and some *.o files)
There are several definitions for some symbols, which is allowed by the -z muldefs linker option
Thus, I first tried to create one big static lib from all the object files and to link the executable against it, assuming the linker would pick only the right .o files. I didn't think about the order problem: some of these object files need symbols contained in the other static libs and vice versa (interdependencies). No matter where I place the static lib I created, there will be issues. So I can't go this way; it is too complex.
Then I tried adding the -Wl,--start-group/-Wl,--end-group linker options. That allows me to link, but the binary segfaults. I guess this is because the -z muldefs option allows multiple definitions, so the order is really important.
So I was wondering whether there is a way to do this, maybe an obvious one that I'm missing? It seems a pretty common use case to me (imagine you want to test a single function), but I cannot find anything online.
Thank you in advance for your precious help
As I saw here (Combine static libraries) I may combine more than one static library using libtool
libtool -static -o new.a old1.a old2.a
As far as I know, this will concatenate every single function from the old libraries into the new one. But what I really want are the functions from the new.a library; the others are there only for dependency purposes. Is there a way to combine only the parts of the other libraries that new.a requires, without carrying a bunch of unnecessary code?
You can extract from the old libraries those object files you wish to incorporate in the new. But there really isn't much point in worrying about it; the linker will only link those object files that are necessary, unlike a shared library where all the symbols defined in the shared library are available to the executable (not that it uses them all, usually).
The old-fashioned way to do the job would be:
mkdir new
cd new
ar x ../old1.a
ar x ../old2.a
ar rv ../new.a *.o
cd ..
rm -fr new
After the two x operations, you can weed and whittle the object files to keep what you want for use in new.a.
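The point that the linker only pulls in the archive members it actually needs can be illustrated with a toy model. The sketch below is not a real linker: it assumes a traditional single left-to-right pass over the archives, and every object and symbol name in it (main.o, parse, log_msg and so on) is made up for illustration.

```python
# Toy model of static-archive resolution; not a real linker.
# Each object file is a tuple: (name, symbols it defines, symbols it needs).

def link(objects, archives):
    """Simulate one left-to-right pass and return (linked names, unresolved symbols).

    objects:  object files given directly on the command line, always linked.
    archives: list of archives, each a list of member tuples.
    """
    linked, defined, undefined = [], set(), set()

    def add(member):
        name, defs, needs = member
        linked.append(name)
        defined.update(defs)
        undefined.difference_update(defs)       # these are now resolved
        undefined.update(set(needs) - defined)  # these are newly required

    for obj in objects:
        add(obj)

    for archive in archives:
        pulled = True
        while pulled:  # rescan this archive until nothing new is pulled in
            pulled = False
            for member in archive:
                # A member is extracted only if it defines a missing symbol.
                if member[0] not in linked and undefined & set(member[1]):
                    add(member)
                    pulled = True
    return linked, undefined


# Hypothetical inputs: main.o needs parse(), parse.o needs log_msg().
main_o = ("main.o", ["main"], ["parse"])
old1 = [("parse.o", ["parse"], ["log_msg"]), ("unused.o", ["unused"], [])]
old2 = [("log.o", ["log_msg"], []), ("extra.o", ["extra"], [])]

print(link([main_o], [old1, old2]))  # unused.o and extra.o are never linked
print(link([main_o], [old2, old1]))  # wrong order: log_msg stays unresolved
```

In the first call only parse.o and log.o are pulled in, which is why carrying extra members in new.a costs nothing in the final executable. The second call shows why the order of interdependent static libraries matters under this model.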
Super-simple, totally boring setup: I have a directory full of .hpp and .cpp files. Some of these .cpp files need to be built into executables; naturally, these .cpp files #include some of the .hpp files in the same directory, which may then include others, etc. etc. Most of those .hpp files have corresponding .cpp files, which is to say: if some_application.cpp #includes foo.hpp, either directly or transitively, then chances are there's also a foo.cpp file that needs to be compiled and linked into the some_application executable.
Super-simple, but I'm still clueless about what the "best" way to build it is, either in SCons or CMake (neither of which I have any expertise in yet, other than staring at documentation for the last day or so and becoming sad). I fear that the sort of solution I want may actually be impossible (or at least grossly overcomplicated) to pull off in most build systems, but if so, it'd be nice to know that so I can just give up and be less picky. Naturally, I'm hoping I'm wrong, which wouldn't be surprising given how ignorant I am about build systems (in general, and about CMake and SCons in particular).
CMake and SCons can, of course, both automatically detect that some_application.cpp needs to be recompiled whenever any of the header files it depends on (either directly or transitively) changes, since they can "parse" C++ files well enough to pick out those dependencies. OK, great: we don't have to list each .cpp-#includes-.hpp dependency by hand. But: we still need to decide what subset of object files need to get sent to the linker when it's time to actually generate each executable.
As I understand it, the two most straightforward alternatives to dealing with that part of the problem are:
A. Explicitly and laboriously enumerating the "anything using this object file needs to use these other object files too" dependencies by hand, even though those dependencies are exactly mirrored by the corresponding-.cpp-transitively-includes-the-corresponding-.hpp dependencies that the build system already went to the trouble of figuring out for us. Why? Because computers.
B. Dumping all the object files in this directory into a single "library", and then having all executables depend on and link in that one library. This is much simpler, and what I understand most people would do, but it's also kinda sloppy. Most of the executables don't actually need everything in that library, and wouldn't actually need to be rebuilt if only the contents of one or two .cpp files changed. Isn't this setting up exactly the kind of unnecessary computation a supposed "build system" should be avoiding? (I suppose maybe they wouldn't need to be rebuilt if the library were dynamically linked, but suffice it to say I dislike dynamically linked libraries for other reasons.)
Can either CMake or SCons do better than this in any remotely straightforward fashion? I see a bunch of limited ways to twiddle the automatically generated dependency graph, but no general-purpose way to do so interactively ("OK, build system, what do you think the dependencies are? Ah. Well, based on that, add the following dependencies and think again: ..."). I'm not too surprised about that. I haven't yet found a special-purpose mechanism in either build system for dealing with the super-common case where link-time dependencies should mirror corresponding compile-time #include dependencies, though. Did I miss something in my (admittedly somewhat cursory) reading of the documentation, or does everyone just go with option (B) and quietly hate themselves and/or their build systems?
Your statement in point A) "anything using this object file needs to use these other object files too" is something that will indeed need to be done by hand. Compilers don't automatically find the object files needed by a binary. You have to list them explicitly at link time. If I understand your question correctly, you don't want to have to list the objects needed by a binary explicitly, but want the build tool to find them automatically. I doubt there is any build tool that does this: SCons and CMake definitely don't.
If you have an application some_application.cpp that includes foo.hpp (or other headers used by these cpp files), and subsequently needs to link the foo.cpp object, then in SCons, you will need to do something like this:
env = Environment()
env.Program(target = 'some_application',
            source = ['some_application.cpp', 'foo.cpp'])
This will relink only when 'some_application.cpp', 'foo.hpp', or 'foo.cpp' has changed. Assuming g++, it effectively translates to something like the following, independently of SCons or CMake.
g++ -c foo.cpp -o foo.o
g++ some_application.cpp foo.o -o some_application
You mention you have "a directory full of .hpp and .cpp files". I would suggest you organize those files into libraries: not all in one library, but logically grouped into smaller, cohesive libraries. Your applications/binaries would then link only the libraries they need, which minimizes rebuilds caused by changes to objects they don't use.
I had more or less the same problem as you have and I solved it as follows:
import SCons.Scanner
import os

def header_to_source(header_file):
    """Specify the location of the source file corresponding to a given
    header file."""
    return header_file.replace('include/', 'src/').replace('.hpp', '.cpp')

def source_files(main_file, env):
    """Returns list of source files the given main_file depends on. With
    the function header_to_source one must specify where to look for
    the source file corresponding to a given header. The resulting
    list is filtered for existing files. The resulting list contains
    main_file as first element."""
    ## get the dependencies
    node = File(main_file)
    scanner = SCons.Scanner.C.CScanner()
    path = SCons.Scanner.FindPathDirs("CPPPATH")(env)
    deps = node.get_implicit_deps(env, scanner, path)

    ## collect corresponding source files
    root_path = env.Dir('#').get_abspath()
    res = [main_file]
    for dep in deps:
        source_path = header_to_source(
            os.path.relpath(dep.get_abspath(), root_path))
        if os.path.exists(os.path.join(root_path, source_path)):
            res.append(source_path)

    return res
The header_to_source function is the one you need to modify so that it returns the source file corresponding to a given header file. The source_files function then gives you all the source files you need to build the given main_file (with main_file itself as the first element). Source files that do not exist are filtered out automatically. So the following should be sufficient to define the target for an executable:
env.Program(source_files('main.cpp', env))
I am not sure whether this works in all possible setups, but at least for me it works.
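The same idea can be tried outside SCons. The following is a minimal sketch, not the code above and not an SCons API: it assumes that only quoted #include directives matter, that each include is resolved relative to the including file, and that foo.cpp sits next to foo.hpp. The regular expression and the root parameter are assumptions made for the example.

```python
import os
import re

# Matches quoted includes such as: #include "foo.hpp"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)

def header_to_source(header_file):
    # Assumed layout: foo.cpp lives next to foo.hpp.
    return os.path.splitext(header_file)[0] + ".cpp"

def source_files(main_file, root):
    """Return main_file plus every existing source file whose header is
    reachable from it through quoted #include directives."""
    result, seen, todo = [main_file], set(), [main_file]
    while todo:
        current = todo.pop()
        with open(os.path.join(root, current)) as handle:
            text = handle.read()
        for header in INCLUDE_RE.findall(text):
            header = os.path.normpath(
                os.path.join(os.path.dirname(current), header))
            if header in seen or not os.path.exists(os.path.join(root, header)):
                continue
            seen.add(header)
            todo.append(header)  # follow includes made by the header itself
            source = header_to_source(header)
            if source not in result and os.path.exists(os.path.join(root, source)):
                result.append(source)
                todo.append(source)  # foo.cpp may include more than foo.hpp does
    return result
```

Calling source_files('main.cpp', '.') in a directory laid out this way returns a list that starts with main.cpp; headers without a matching .cpp file contribute nothing to it.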
I have access to a large C++ project, full of files and with a very complicated makefile courtesy of automake & friends.
Here is an idea of the directory structure.
otherproject/
    folder1/
        some_headers.h
        some_files.cpp
    ...
    folderN/
        more_headers.h
        more_files.cpp
    build/
        lots_of things here
        objs/
            lots_of_stuff.o
        an_executable_I_dont_need.exe
my_stuff/
    my_program.cpp
I want to use a class from the big project, declared in say, "some_header.h"
/* my_program.cpp */
#include "some_header.h"
int main()
{
    ThatClass x;
    x.frobnicate();
}
I managed to compile my file by painstakingly passing lots of "-I" options to gcc so that it could find all the header files
g++ my_program.cpp -c -o myprog.o -I../other/folder1 ... -I../other/folderN
When it comes to linking, I have to manually include all his ".o"s, which is probably overkill
g++ -o my_executable myprog.o ../other/build/objs/*.o
However, not only do I have to do things like manually removing his "main.o" from the list, but this isn't even enough since I forgot to also link against all the libraries that he happened to use.
otherproject/build/objs/StreamBuffer.h:50: undefined reference to `gzread'
At this point I am starting to feel I am probably doing something very wrong. How should I proceed? What is the usual approach to this kind of issue, and what is the best one?
I need this to work on Linux in case something platform-specific needs to be done.
Generally the project's .o files should come grouped together into a library (on Linux, .a file if it's a static library, or .so if it's a dynamic library), and you link to the library using the -L option to specify the location and the -l option to specify the library name.
For example, if the library file is at /path/to/big_project/libbig_project.a, you would add the options -L /path/to/big_project -l big_project to your gcc command line.
If the project doesn't have a library file that you can link to (e.g. it's not a library but an executable program and you just want some of the code used by the executable program), you might want to try asking the project's author to create such a library file (for someone familiar with "automake and friends" it shouldn't be too much trouble), or try doing so yourself.
EDIT Another suggestion: you said the project comes with a makefile. Try building it with that makefile and see what its compiler command line looks like. Does it have many includes and individual object files as well?
Treating an application which was not developed as a library as if it was a library isn't likely to work. As an offhand example, omitting the main might wind up cutting out initialization code that the class you want depends upon.
The responsible thing to do here is to read the code, understand it, and turn the functionality you want into a proper library. Build the "exe you don't need" with debug symbols and set breakpoints in the constructors and methods of the class. Step into them so you get a grasp on the functionality and what parts of the program are relevant and irrelevant to your needs.
Hopefully the code is under some kind of version control system that supports branching (such as Git). If not, make your own repository that does. Edit the files until you've organized them into a library and code that uses the library. Make sure it works properly within the context of the original program. Then turn around and use this library in your own program.
If you've done a good job, you might be able to convince the original authors to accept the separation back into their original codebase. If not, at least version control has your back so you can manage integration of future changes.
I have some questions about how programs use shared libraries.
When I build a shared library ( with -shared -fPIC switches) I make some functions available from an external program.
Usually I do a dlopen() to load the library and then dlsym() to link the said functions to some function pointers.
This approach does not involve including any .h file.
Is there a way to avoid doing dlopen() & dlsym() and just including the .h of the shared library?
I guess this may be how C++ programs use code stored in system shared libraries, i.e. by just including stdlib.h etc.
Nick, I think all the other answers are actually answering your question, which is how you link libraries, but the way you phrase your question suggests you have a misunderstanding of the difference between header files and libraries. They are not the same. You need both, and they do different jobs.
Building an executable has two main phases, compilation (which turns your source into an intermediate form, containing executable binary instructions, but is not a runnable program), and linking (which combines these intermediate files into a single running executable or library).
When you do gcc -c program.c, you are compiling, and you generate program.o. This step is where headers matter. You need to #include <stdlib.h> in program.c to (for example) use malloc and free. (Similarly you need #include <dlfcn.h> for dlopen and dlsym.) If you don't do that the compiler will complain that it doesn't know what those names are, and halt with an error. But if you do #include the header the compiler does not insert the code for the function you call into program.o. It merely inserts a reference to it. The reason is to avoid duplication of code: the code only needs to exist once, however many parts of your program use it, so if you had further files (module1.c, module2.c and so on), even if they all used malloc you would merely end up with many references to a single copy of malloc. That single copy is present in the standard library in either its shared or static form (libc.so or libc.a), but these are not referenced in your source, and the compiler is not aware of them.
The linker is. In the linking phase you do gcc -o program program.o. The linker will then search all libraries you pass it on the command line and find the single definition of all functions you've called which are not defined in your own code. That is what the -l does (as the others have explained): tell the linker the list of libraries you need to use. Their names often have little to do with the headers you used in the previous step. For example to get use of dlsym you need libdl.so or libdl.a, so your command-line would be gcc -o program program.o -ldl. To use malloc or most of the functions in the std*.h headers you need libc, but because that library is used by every C program it is automatically linked (as if you had done -lc).
Sorry if I'm going into a lot of detail but if you don't know the difference you will want to. It's very hard to make sense of how C compilation works if you don't.
One last thing: dlopen and dlsym are not the normal method of linking. They are used for special cases where you want to dynamically determine what behavior you want based on information that is, for whatever reason, only available at runtime. If you know what functions you want to call at compile time (true in 99% of the cases) you do not need to use the dl* functions.
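For a feel of what the runtime-lookup case looks like, Python's ctypes module can stand in for dlopen and dlsym, since on Unix-like systems it is built on them. This is only a sketch under assumptions: it assumes a Linux-like platform where passing None to ctypes.CDLL exposes the C library already loaded into the process, and call_by_name is a hypothetical helper written for this example.

```python
import ctypes

# Passing None asks for the symbols already loaded into this process
# (comparable to dlopen(NULL)); assumed here to include the C library.
libc = ctypes.CDLL(None)

def call_by_name(symbol, ch):
    """Look up a C function of type int f(int) by name at run time and call it."""
    func = getattr(libc, symbol)   # name-based lookup, roughly what dlsym() does
    func.argtypes = [ctypes.c_int]
    func.restype = ctypes.c_int
    return chr(func(ord(ch)))

# The function name is an ordinary string, so it could come from user input
# or a configuration file rather than being fixed when the program is built.
print(call_by_name("toupper", "a"))
print(call_by_name("tolower", "Q"))
```

When the function to call is known in advance, none of this is needed: including the header and linking with -l is enough.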
You can link shared libraries just like static ones. They are then searched for when the program is launched. As a matter of fact, by default -lXXX will prefer libXXX.so to libXXX.a.
You need to give the linker the proper instructions to link your shared library.
The shared library names are like libNAME.so, so for linking you should use -lNAME
Call it libmysharedlib.so and then link your main program as:
gcc -o myprogram myprogram.c -lmysharedlib
If you use CMake to build your project, you can use
TARGET_LINK_LIBRARIES(targetname libraryname)
As in:
TARGET_LINK_LIBRARIES(myprogram mylibrary)
To create the library "mylibrary", you can use
ADD_LIBRARY(targetname sourceslist)
As in:
ADD_LIBRARY(mylibrary ${mylibrary_SRCS})
Additionally, this method is cross-platform (whereas simply passing flags to gcc is not).
Shared libraries (.so) are binary files that hold the compiled code of the functions/classes/...
Header files (.h) declare those functions/classes/... so that the compiler knows how the main code may use them; the linker then finds the actual definitions in the .so
Therefore, you need both of them.