I have access to a large C++ project, full of files and with a very complicated makefile courtesy of automake & friends.
Here is an idea of the directory structure:
otherproject/
    folder1/
        some_headers.h
        some_files.cpp
    ...
    folderN/
        more_headers.h
        more_files.cpp
    build/
        lots_of things here
        objs/
            lots_of_stuff.o
        an_executable_I_dont_need.exe
my_stuff/
    my_program.cpp
I want to use a class from the big project, declared in, say, "some_header.h":
/* my_program.cpp */
#include "some_header.h"

int main()
{
    ThatClass x;
    x.frobnicate();
}
I managed to compile my file by painstakingly passing lots of "-I" options to gcc so that it could find all the header files:
g++ my_program.cpp -c -o myprog.o -I../other/folder1 ... -I../other/folderN
When it comes to linking, I have to manually include all his ".o"s, which is probably overkill:
g++ -o my_executable myprog.o ../other/build/objs/*.o
However, not only do I have to do things like manually removing his "main.o" from the list, but this isn't even enough, since I forgot to also link against all the libraries that he happened to use:
otherproject/build/objs/StreamBuffer.h:50: undefined reference to `gzread'
At this point I am starting to feel I am probably doing something very wrong. How should I proceed? What is the usual approach, and what is the best approach, to this kind of issue?
I need this to work on Linux in case something platform-specific needs to be done.
Generally the project's .o files should come grouped together into a library (on Linux, .a file if it's a static library, or .so if it's a dynamic library), and you link to the library using the -L option to specify the location and the -l option to specify the library name.
For example, if the library file is at /path/to/big_project/libbig_project.a, you would add the options -L /path/to/big_project -l big_project to your gcc command line.
If the project doesn't have a library file that you can link to (e.g. it's not a library but an executable program and you just want some of the code used by the executable program), you might want to try asking the project's author to create such a library file (if they are familiar with "automake and friends" it shouldn't be too much trouble for them), or try doing so yourself.
EDIT Another suggestion: you said the project comes with a makefile. Try running make on it, and see what its compiler command line looks like. Does it have many includes and individual object files as well?
Treating an application which was not developed as a library as if it was a library isn't likely to work. As an offhand example, omitting the main might wind up cutting out initialization code that the class you want depends upon.
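To make that hazard concrete, here is a minimal sketch. Only the names ThatClass and frobnicate come from the question; the namespace bigproject, the flag g_initialized, and the functions initialize() and demo_initialization() are hypothetical stand-ins for whatever setup the real program's main performs.

```cpp
#include <cassert>

namespace bigproject {  // hypothetical namespace, for illustration only

// Hypothetical global state that the application's own main() sets up.
bool g_initialized = false;

// Hypothetical setup routine, normally called once from that main().
void initialize() { g_initialized = true; }

class ThatClass {
public:
    // Returns true on success; fails if the setup step was skipped.
    bool frobnicate() const { return g_initialized; }
};

// Shows both situations: linking the class without the original main(),
// and then performing the setup by hand.
void demo_initialization() {
    ThatClass x;
    assert(!x.frobnicate());  // setup never ran, so the call fails
    initialize();
    assert(x.frobnicate());   // works once the setup has been done
}

}  // namespace bigproject
```

If you drop the project's main.o and link only the class, nothing calls initialize() for you, so you have to find the equivalent setup in the real code and call it yourself.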
The responsible thing to do here is to read the code, understand it, and turn the functionality you want into a proper library. Build the "exe you don't need" with debug symbols and set breakpoints in the constructors and methods of the class. Step into them so you get a grasp on the functionality and what parts of the program are relevant and irrelevant to your needs.
Hopefully the code is under some kind of version control system that supports branching (such as Git). If not, make your own repository that does. Edit the files until you've organized them into a library and code that uses the library. Make sure it works properly within the context of the original program. Then turn around and use this library in your own program.
If you've done a good job, you might be able to convince the original authors to accept the separation back into their original codebase. If not, at least version control has your back so you can manage integration of future changes.
I'm doing the "Hello World" in the GTKMM tutorial, the "app" uses three files, the main.cc, helloworld.h and helloworld.cc.
At the beginning I thought that compiling the main.cc :
g++ -o HW main.cc $(pkg-config ... )
would be enough, but it gives an error (undefined reference to Helloworld::Helloworld), etc.
In other words, it compiles the main and the header, but not the HW class, and this makes sense because the header is included in main.cc but helloworld.cc is not. The thing is I'm kinda scared of including it because I read in another question that "including everything was a bad practice".
That being said, when I compile using all the files in the same command:
g++ -o HW main.cc helloworld.cc $(pkg-config ... )
the "app" works without errors.
So, since using the last command works, is compiling in this way a good practice?
What happens if my app uses a big ton of classes?
Must I manually write them all down in the command?
If not, must I use #include?
Is it good practice using #include for all cc used files?
Is it normal to list all the cpp/cc files when compiling with g++?
Yes, completely.
How else will it know what source code you want it to compile?
The thing is I'm kinda scared of including it because I read in other question that including everything was a bad practice.
#includeing excess headers is bad practice.
Passing your complete source code to the compiler is not.
Is it good practice using #include for all cc used files?
Absolutely not.
What happens if my app uses a big ton of classes? Must I manually write them all down in the command?
No. You should be using a build system that handles this for you. That could be an IDE which takes all the files in your project and passes them to the compiler in turn, or it could be a CMakeLists.txt/Makefile with a *.cpp wildcard in it (although I actually recommend listing source files explicitly, one by one; it's not hard).
Invoking g++ manually on the command-line is fine for a quick test, but for real usage you don't want to be clowning around with such machinery.
Is it good practice using #include for all cc used files?
It's not just bad practice; never do it.
In order to create an executable you actually have to do two things:
Compile all the source code files to object files or libraries.
Link all the object files and needed libraries into an executable.
You seem to be missing the point that the link phase is where symbols defined in separate source files are resolved or linked.
Must I manually write them all down in the command?
For the compiler and linker to find the DEFINITION of the symbols DECLARED in your headers, you must pass them all the source files (or the object files built from them). Exceptions to this rule can be (but are not limited to) headers containing template metaprogramming (TMP) code, which usually exists entirely in header files.
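The declaration/definition split can be sketched in a single file, with comments marking where each piece would normally live. The file names follow the question; the class Greeter and the functions twice() and use_both() are hypothetical and are not part of the GTKMM tutorial.

```cpp
#include <cassert>

// --- what would live in helloworld.h: declarations only ---
class Greeter {              // hypothetical class, for illustration only
public:
    int answer() const;      // DECLARED here; the body lives in a .cc file
};

// Templates are the usual exception: the full definition sits in the header,
// because the compiler must see it wherever the template is instantiated.
template <typename T>
T twice(T value) { return value + value; }

// --- what would live in helloworld.cc: the DEFINITION the linker looks for ---
int Greeter::answer() const { return 42; }

// --- what would live in main.cc: code that only needs the declarations ---
int use_both() {
    Greeter g;
    assert(twice(21) == g.answer());  // both evaluate to 42
    return twice(g.answer());         // 84
}
```

If the definition of Greeter::answer() were left out of the link, the code that calls it would still compile, and the linker would then report an undefined reference, which is the error described in the question.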
What happens if my app uses a big ton of classes?
Most large C++ projects use build configuration tools such as CMake to generate their makefiles for them.
I have read several posts on stack overflow and read about dynamic linking online. And this is what I have taken away from all those readings -
Dynamic linking is an optimization technique that was employed to take full advantage of the virtual memory of the system. One process can share its pages with other processes. For example the libc++ needs to be linked with all C++ programs but instead of copying over the executable to every process, it can be linked dynamically with many processes via shared virtual pages.
However this leads me to the following questions
When a C++ program is compiled. It needs to have references to the C++ library functions and code (say for example the code of the thread library). How does the compiler make the executable have these references? Does this not result in a circular dependency between the compiler and the operating system? Since the compiler has to make a reference to the dynamic library in the executable.
How and when would you use a dynamic library? How do you make one? What is the specific compiling command that is used to produce such a file from a standard *.cpp file?
Usually when I install a library, there is a lib/ directory with *.a files and *.dylib (on mac-OSX) files. How do I know which ones to link to statically as I would with a regular *.o file and which ones are supposed to be dynamically linked with? I am assuming the *.dylib files are dynamic libraries. Which compiler flag would one use to link to these?
What are the -L and -l flags for? What does it mean to specify for example a -lusb flag on the command line?
If you feel like this question is asking too many things at once, please let me know. I would be completely ok with splitting this question up into multiple ones. I just ask them together because I feel like the answer to one question leads to another.
When a C++ program is compiled. It needs to have references to the C++
library functions and code (say for example the code for the thread library).
Assume we have a hypothetical shared library called libdyno.so. You'll eventually be able to peek inside it using objdump or nm.
objdump --syms libdyno.so
You can do this today on your system with any shared library. objdump on a Mac is called gobjdump and comes with brew in the binutils package. Try this on a Mac...
gobjdump --syms /usr/lib/libz.dylib
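You can now see that the symbols are contained in the shared object. When you link with the shared object you typically use something like the following (the library is listed after the source file that uses it, because the linker resolves symbols in command-line order)
g++ -Wall -g -pedantic DynoLib_main.cpp -ldyno -o dyno_main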
Note the -ldyno in that command. This is telling the compiler (really the linker, ld) to look for a shared object file called libdyno.so wherever it normally looks for them. Once it finds that object it can then find the symbols it needs. There's no circular dependency because you, the developer, asked for the dynamic library to be loaded by specifying the -l flag.
How and when would you use a dynamic library? How do you make one? As in what
is the specific compiling command that is used to produce such a file from a
standard .cpp file
Create a file called DynoLib.cpp
#include "DynoLib.h"
DynamicLib::DynamicLib() {}
int DynamicLib::square(int a) {
return a * a;
}
Create a file called DynoLib.h
#ifndef DYNOLIB_H
#define DYNOLIB_H
class DynamicLib {
public:
DynamicLib();
int square(int a);
};
#endif
Compile them to be a shared library as follows. This is linux specific...
g++ -Wall -g -pedantic -shared -fPIC -std=c++11 DynoLib.cpp -o libdyno.so
You can now inspect this object using the command I gave earlier ie
objdump --syms libdyno.so
Now create a file called DynoLib_main.cpp that will be linked with libdyno.so and use the function we just defined in it.
#include "DynoLib.h"
#include <iostream>

int main() {
    DynamicLib lib;  // automatic storage, so there is nothing to delete
    std::cout << "Square " << lib.square(1729) << std::endl;
    return 0;  // 0 reports success to the shell
}
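Compile and run it as follows. Setting LD_LIBRARY_PATH lets the dynamic loader find libdyno.so in the current directory at run time.
g++ -Wall -g -pedantic DynoLib_main.cpp -L. -ldyno -o dyno_main
LD_LIBRARY_PATH=. ./dyno_main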
Square 2989441
You can also have a look at the main binary using nm. In the following I'm checking whether anything contains the string square, i.e. whether the symbol I need from libdyno.so is referenced in my binary.
nm dyno_main | grep square
U _ZN10DynamicLib6squareEi
The answer is yes. The uppercase U means undefined, but this is the symbol name for our square method in the DynamicLib class that we created earlier. The odd-looking name is due to name mangling, which is its own topic.
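If you want to turn a mangled name like that back into something readable from inside a program, one option is the abi::__cxa_demangle function that GCC and Clang provide in <cxxabi.h>. The following is a minimal sketch; the helper names demangle() and demo_demangle() are made up for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <cxxabi.h>  // GCC/Clang-specific; not part of standard C++

// Returns the human-readable form of a mangled symbol, or the input
// unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer that the caller must free.
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string readable(raw);
    std::free(raw);
    return readable;
}

// Demangles the symbol that nm reported for the square method.
void demo_demangle() {
    assert(demangle("_ZN10DynamicLib6squareEi") == "DynamicLib::square(int)");
}
```

On the command line, nm -C or piping the output through c++filt does the same job.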
How do I know which ones to link to statically as I would with a regular
.o file and which ones are supposed to be dynamically linked with?
You don't need to know. You specify what you want to link with and let the compiler (and linker etc.) do the work. Note that the -l flag names the library and the -L flag tells it where to look. There's a decent write-up on how the compiler finds things here:
gcc Linkage option -L: Alternative ways how to specify the path to the dynamic library
Or have a look at man ld.
What are the -L and -l flags for? What does it mean to specify
for example a -lusb flag on the command line?
See the above link. This is from man ld..
-L searchdir
Add path searchdir to the list of paths that ld will search for
archive libraries and ld control scripts. You may use this option any
number of times. The directories are searched in the order in which
they are specified on the command line. Directories specified on the
command line are searched before the default directories. All -L
options apply to all -l options, regardless of the order in which the
options appear. -L options do not affect how ld searches for a linker
script unless -T option is specified.
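As a rough mental model of how -L and -l combine, the sketch below lists the file names a linker would try for a given -l name. It is a simplification of what GNU ld does on ELF systems (within each directory it looks for libNAME.so before libNAME.a); the default system directories and options such as -static are left out, and the function name candidate_paths() is made up for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// For "-l<name>", list the files a linker would try, in order: each search
// directory in turn, preferring lib<name>.so over lib<name>.a within it.
std::vector<std::string> candidate_paths(
        const std::vector<std::string>& search_dirs,  // from -L, in order
        const std::string& name) {                    // from -l, e.g. "usb"
    std::vector<std::string> candidates;
    for (const std::string& dir : search_dirs) {
        candidates.push_back(dir + "/lib" + name + ".so");
        candidates.push_back(dir + "/lib" + name + ".a");
    }
    return candidates;
}

// With "-L. -ldyno", the first file tried is ./libdyno.so.
void demo_candidates() {
    const std::vector<std::string> tried = candidate_paths({"."}, "dyno");
    assert(tried.size() == 2);
    assert(tried[0] == "./libdyno.so");
    assert(tried[1] == "./libdyno.a");
}
```

So -lusb means "find a file called libusb.so or libusb.a in one of the search directories", and the first match wins.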
If you managed to get here, it pays dividends to learn about the linker, i.e. ld. It does an important job and is the source of a ton of confusion, because most people start out dealing with a compiler and think that compiler == linker, and this is not true.
The main difference is that statically linked libraries are included in your app: they are linked in when you build it. Dynamic libraries are linked at run time, so you do not need to include them in your app. These days dynamic libraries are used to reduce the size of apps by relying on libraries that are already installed on everyone's computer.
Dynamic libraries also allow users to update libraries without re-building the client apps. If a bug is found in a library that you use in your app and it is statically linked, you will have to rebuild your app and re-issue it to all your users. If a bug is found in a dynamically linked library, all your users just need to update their libraries and your app does not need an update.
I've got some .so libraries that I'd like to combine into one shared library so that it doesn't depend on the original .so files anymore.
The .so files have dependencies to each other.
How can I do this? Can I do this?
This assumes you have the source code to all shared objects:
Provided there are no name space conflicts (which there should not be if the two co-exist as it is), it would not be too terribly hard to build them into one shared object.
If the shared libraries themselves depend on code from another library, order is going to matter. The real work is just getting the dependencies worked out in the makefile. I've never seen circular dependencies in SO's successfully link, so I doubt that you have them to begin with. I.e. foo() depends on bar() which depends on foo().
I've done this several times, though the libraries themselves were trivial. I took parts from ustr (string handler), a configuration file handler, some other custom parsers and other utility functions and created a custom mash up.
The real pain is bringing in upstream improvements to each once you have combined them, however I'm not sure if that's an issue for you.
So if you have:
libfoo.so: $(LIB_FOO_OBJECTS) $(LIB_BAR_OBJECTS) $(LIBFOOBAR_OBJECTS)
Where:
LIB_FOO_OBJECTS = \
$(libfoo)/foo.o \
$(libfoo)/strings.o
LIB_BAR_OBJECTS = \
$(libbar)/bar.o
....
... and the order is correct .. the rest is pretty easy. Note I didn't show header deps, everyone does that a little differently. They are important when making mash-ups though, as you'd probably want to avoid recompiling the whole library every time one header changes.
NB: If all three projects are using autotools .. your task just got exponentially easier (or harder) depending.
If you DON'T have the source code
If there is a static version of each library, you may be able to extract the objects and use them. I.e.:
$ cp /usr/lib/foo.a ./foo.a
$ ar x foo.a
$ gcc -fPIC -shared *.o -o foo.so
Of course it's quite a bit more involved than illustrated.
I have never tried that and don't know how to handle SOs that have main() when it comes to linking in that case.
Super-simple, totally boring setup: I have a directory full of .hpp and .cpp files. Some of these .cpp files need to be built into executables; naturally, these .cpp files #include some of the .hpp files in the same directory, which may then include others, etc. etc. Most of those .hpp files have corresponding .cpp files, which is to say: if some_application.cpp #includes foo.hpp, either directly or transitively, then chances are there's also a foo.cpp file that needs to be compiled and linked into the some_application executable.
Super-simple, but I'm still clueless about what the "best" way to build it is, either in SCons or CMake (neither of which I have any expertise in yet, other than staring at documentation for the last day or so and becoming sad). I fear that the sort of solution I want may actually be impossible (or at least grossly overcomplicated) to pull off in most build systems, but if so, it'd be nice to know that so I can just give up and be less picky. Naturally, I'm hoping I'm wrong, which wouldn't be surprising given how ignorant I am about build systems (in general, and about CMake and SCons in particular).
CMake and SCons can, of course, both automatically detect that some_application.cpp needs to be recompiled whenever any of the header files it depends on (either directly or transitively) changes, since they can "parse" C++ files well enough to pick out those dependencies. OK, great: we don't have to list each .cpp-#includes-.hpp dependency by hand. But: we still need to decide what subset of object files need to get sent to the linker when it's time to actually generate each executable.
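To give a feel for what that kind of scanning involves, here is a deliberately naive sketch that pulls the quoted #include names out of a piece of source text. It is not how SCons or CMake actually implement their scanners: it ignores comments, conditional compilation, and angle-bracket includes, and the function name quoted_includes() is made up for this example.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Returns the header names from lines of the form: #include "name"
std::vector<std::string> quoted_includes(const std::string& source) {
    // Optional whitespace, '#', optional whitespace, "include", then "name".
    static const std::regex include_line(R"re(^\s*#\s*include\s*"([^"]+)")re");
    std::vector<std::string> headers;
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line)) {
        std::smatch match;
        if (std::regex_search(line, match, include_line)) {
            headers.push_back(match[1].str());
        }
    }
    return headers;
}

// Only the quoted include is reported; the angle-bracket one is skipped.
void demo_scan() {
    const std::vector<std::string> found =
        quoted_includes("#include \"foo.hpp\"\n#include <iostream>\n");
    assert(found.size() == 1);
    assert(found[0] == "foo.hpp");
}
```

Applying the same scan recursively to each header it finds yields the transitive compile-time dependencies; what it cannot tell you is which .cpp files hold the matching definitions, and that is the gap this question is about.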
As I understand it, the two most straightforward alternatives to dealing with that part of the problem are:
A. Explicitly and laboriously enumerating the "anything using this object file needs to use these other object files too" dependencies by hand, even though those dependencies are exactly mirrored by the corresponding-.cpp-transitively-includes-the-corresponding-.hpp dependencies that the build system already went to the trouble of figuring out for us. Why? Because computers.
B. Dumping all the object files in this directory into a single "library", and then having all executables depend on and link in that one library. This is much simpler, and what I understand most people would do, but it's also kinda sloppy. Most of the executables don't actually need everything in that library, and wouldn't actually need to be rebuilt if only the contents of one or two .cpp files changed. Isn't this setting up exactly the kind of unnecessary computation a supposed "build system" should be avoiding? (I suppose maybe they wouldn't need to be rebuilt if the library were dynamically linked, but suffice it to say I dislike dynamically linked libraries for other reasons.)
Can either CMake or SCons do better than this in any remotely straightforward fashion? I see a bunch of limited ways to twiddle the automatically generated dependency graph, but no general-purpose way to do so interactively ("OK, build system, what do you think the dependencies are? Ah. Well, based on that, add the following dependencies and think again: ..."). I'm not too surprised about that. I haven't yet found a special-purpose mechanism in either build system for dealing with the super-common case where link-time dependencies should mirror corresponding compile-time #include dependencies, though. Did I miss something in my (admittedly somewhat cursory) reading of the documentation, or does everyone just go with option (B) and quietly hate themselves and/or their build systems?
Your statement in point A), "anything using this object file needs to use these other object files too", is something that will indeed need to be done by hand. Compilers don't automatically find the object files needed by a binary; you have to list them explicitly at link time. If I understand your question correctly, you don't want to have to explicitly list the objects needed by a binary, but want the build tool to find them automatically. I doubt there is any build tool that does this: SCons and CMake definitely don't.
If you have an application some_application.cpp that includes foo.hpp (or other headers used by these cpp files), and subsequently needs to link the foo.cpp object, then in SCons, you will need to do something like this:
env = Environment()
env.Program(target = 'some_application',
source = ['some_application.cpp', 'foo.cpp'])
This will only link when 'some_application.cpp', 'foo.hpp', or 'foo.cpp' have changed. Assuming g++, this will effectively translate to something like the following, independently of SCons or CMake.
g++ -c foo.cpp -o foo.o
g++ some_application.cpp foo.o -o some_application
You mention you have "a directory full of .hpp and .cpp files". I would suggest you organize those files into libraries. Not all in one library, but logically organized into smaller, cohesive libraries. Then your applications/binaries would link only the libraries they need, thus minimizing rebuilds caused by objects they don't use.
I had more or less the same problem as you have and I solved it as follows:
import SCons.Scanner
import os

def header_to_source(header_file):
    """Specify the location of the source file corresponding to a given
    header file."""
    return header_file.replace('include/', 'src/').replace('.hpp', '.cpp')

def source_files(main_file, env):
    """Returns list of source files the given main_file depends on. With
    the function header_to_source one must specify where to look for
    the source file corresponding to a given header. The resulting
    list is filtered for existing files. The resulting list contains
    main_file as first element."""
    ## get the dependencies
    node = File(main_file)
    scanner = SCons.Scanner.C.CScanner()
    path = SCons.Scanner.FindPathDirs("CPPPATH")(env)
    deps = node.get_implicit_deps(env, scanner, path)
    ## collect corresponding source files
    root_path = env.Dir('#').get_abspath()
    res = [main_file]
    for dep in deps:
        source_path = header_to_source(
            os.path.relpath(dep.get_abspath(), root_path))
        if os.path.exists(os.path.join(root_path, source_path)):
            res.append(source_path)
    return res
The header_to_source function is the one you need to modify so that it returns the source file corresponding to a given header file. The source_files function then gives you all the source files you need to build the given main_file (including main_file itself as the first element). Nonexistent files are filtered out automatically. So the following should be sufficient to define the target for an executable:
env.Program(source_files('main.cpp', env))
I am not sure whether this works in all possible setups, but at least for me it works.
Say I'm working on a library, foo. Within my library's source files, I'd like to include headers the same way a user of my library would:
#include <foo/bar.hpp>
// code defining bar methods here
In boost for example, includes of other headers within boost are done that way, e.g. <boost/shared_ptr.hpp>, rather than the relative quoted "../shared_ptr.hpp" style. I looked at how some other libraries accomplish this and it appears they add a redundant directory to their file layout in order to do it, e.g. the boost code lives in "boost_1_4_1/boost" rather than just "boost_1_4_1/".
Switching to that scheme is annoying if you already have source control using an existing layout. What's the best way with GNU make to layer it on? My only thought is to add a target that all build targets depend on that makes a hidden folder with a symlink inside to my source tree, and add that hidden folder to the include path. Perhaps there's a less obfuscated way?
Couldn't you use gcc's -I option, or an INC variable in your Makefile?
gcc:
gcc -c -I/home/joseph/dev/foo/headers
Makefile:
INC=-I/home/joseph/dev/foo/headers
In this case you would have only one place to make this change, Makefile.