Fast compilation in C++

I have a C++ program with many functions, and each function is in its own .cpp file. From the main program, I only supply a few parameters and call the functions. However, compiling the whole thing takes a lot of time. For each compilation I only change a few parameters in the main program and leave all the functions as they are.
Is there any way to speed up the compilation?

You are recompiling unnecessary code. IDEs usually handle this automatically. Otherwise, it depends on how you compile your code. For example, command lines like these:
g++ *.cpp
or
g++ -o program a.cpp b.cpp c.cpp
are terribly slow, because on every compilation, you recompile everything.
If you are writing Makefiles, write them carefully to avoid unnecessary recompilation. For example:
.PHONY: all
all: program
program: a.o b.o c.o
	g++ -o $@ $^ $(LDFLAGS)
%.o: %.cpp
	g++ $(CXXFLAGS) -c -o $@ $<
# other dependencies:
a.o: a.h
b.o: b.h a.h
c.o: c.h
In the above example, changing c.cpp causes c.cpp to be recompiled and the program to be relinked. Changing a.h causes a.o and b.o to be rebuilt and the program to be relinked. That is, on each build, you compile the minimum number of files needed to bring the program up to date.
Side note: be careful when writing Makefiles. If you miss a dependency, you may not recompile enough files and may end up with hard-to-spot segmentation faults (at best). Also look at the -M* options in the GCC manual: gcc itself can generate the dependencies, and you can then include the generated output in the Makefile.

Try to minimize the code affected by your parameter changes; ideally, change only one source file that nothing else depends on (main.cpp).
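A minimal sketch of this tip, assuming the tunable values can be passed as function arguments. The names integrate_decay and run_with_current_params, and the decay model itself, are hypothetical stand-ins for the real functions; the comments mark which file each part would live in (a solver.h would declare integrate_decay). Only the main.cpp part changes between runs, so make recompiles main.o and just relinks solver.o.

```cpp
// --- solver.cpp (hypothetical): the stable, expensive-to-compile part. ---
// Parameters arrive as arguments, so this file never changes between runs
// and its object file (solver.o) is simply relinked.
double integrate_decay(double rate, double dt, int steps) {
    double y = 1.0;
    for (int i = 0; i < steps; ++i) {
        y -= rate * y * dt;  // explicit Euler step for dy/dt = -rate * y
    }
    return y;
}

// --- main.cpp (hypothetical): the only file edited between runs. ---
// Changing these values recompiles main.cpp alone.
namespace params {
const double rate = 0.5;
const double dt = 0.1;
const int steps = 10;
}  // namespace params

// In a real main.cpp this call would sit inside main().
double run_with_current_params() {
    return integrate_decay(params::rate, params::dt, params::steps);
}
```

If the parameters instead lived in a header included by every function file, each edit would trigger a full rebuild, which is exactly what the tip is trying to avoid.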
Check your includes: do you really need them all? Use forward declarations where possible; for your own classes, forward declare what you can instead of including their headers.
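A sketch of the forward-declaration tip, using hypothetical Report and Logger classes and shown as one translation unit with comments marking the files each part would belong to. Because report.h only needs to know that Logger exists, it does not include logger.h, so editing logger.h does not force every file that includes report.h to recompile.

```cpp
#include <string>

// --- report.h (hypothetical): Report only holds a pointer to Logger and
// takes a reference, so a forward declaration replaces #include "logger.h".
class Logger;  // forward declaration: Logger is an incomplete type here

class Report {
public:
    explicit Report(Logger& log) : log_(&log) {}
    void write(const std::string& line);  // body needs the full Logger, so it lives in report.cpp
private:
    Logger* log_;  // pointers and references to incomplete types are allowed
};

// --- logger.h (hypothetical): the full definition. ---
class Logger {
public:
    void append(const std::string& s) { ++count_; last_ = s; }
    int count() const { return count_; }
    const std::string& last() const { return last_; }
private:
    int count_ = 0;
    std::string last_;
};

// --- report.cpp (hypothetical): the only file that needs both headers. ---
void Report::write(const std::string& line) { log_->append(line); }
```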
Try using the clang (llvm.org) compiler. It sometimes compiles faster than gcc (assuming you're on linux/unix) and gives more readable errors.
Edit: I was assuming you were only recompiling what's needed. As others suggested, use a build system (Makefile, IDE, CMake...) to run the minimal number of compiles.

Maybe this will help and maybe it won't, but I run code over ssh and I know it takes forever to run and compile. If you are reading from data files, instead of running over the entire data set, run over only a file or two to check that you get the intended result. This gives a sample of your final result that should still be accurate (just with lower statistics). Once you've tweaked your code to your satisfaction, run everything. Usually you will have no problems that way, and each test cycle is much quicker by comparison.

Related

Makefile with multiple executables and dependencies

So far, I have a Makefile that looks like this:
# Program list:
# a.cpp, a.h,
# b.cpp, b.h,
# c.cpp, c.h,
# d.cpp, d.h,
# commonA.cpp, commonA.h,
# commonB.cpp, commonB.h,
# mainA.cpp, mainB.cpp,
# mainC.cpp, mainD.cpp.
CXX=g++
CXXFLAGS = -std=c++11 -g -Wall
prog1: mainA.cpp a.cpp
	$(CXX) $(CXXFLAGS) programOne mainA.cpp a.cpp
prog2: mainB.cpp b.cpp
	$(CXX) $(CXXFLAGS) programTwo mainB.cpp b.cpp
prog3: mainC.cpp c.cpp commonA.cpp
	$(CXX) $(CXXFLAGS) programThree mainC.cpp c.cpp commonA.cpp
prog4: mainD.cpp d.cpp commonA.cpp commonB.cpp
	$(CXX) $(CXXFLAGS) programFour mainD.cpp d.cpp commonA.cpp commonB.cpp
# etc...
I've had a look at the GNU make documentation, and it's fairly daunting; I've tried to understand as much of it as I can. Given my example, how could I shorten this even more? Is there a good rule I could use to make object files for all the files in the program list above, and then include the particular objects for each program? Any help is much appreciated!
Here's a tip: make has built-in rules that know how to create all kinds of different targets. Taking advantage of them will give you the shortest possible makefile.
Here's another tip: make uses string comparison to match up targets, prerequisites, and rules. So choosing the names of your files and programs wisely will give the best chance of taking advantage of make's built-in rules.
For example, you know already that a source file foo.cpp will be compiled into an object file foo.o. Make has a built-in rule that will handle that for you. If you also choose to name one of your source files so that it maps to the name of the program you want to create, you can take advantage of a built-in rule that will link your program as well.
As mentioned in the comments above, your rules are broken because the makefile target is named one thing (e.g., prog1) but the link line generates a different thing: programOne (here I'm assuming you simply forgot the -o option in your link line when you posted your code... please remember that when asking for help on SO, or really anywhere, it's best to create an SSCCE). That should never happen for a non-special target; every rule should create a file with exactly the same name as its target.
Typically for C/C++ programs, the source file containing main() is named the same as the program you want to create. So for example if you want to create a program programOne then the source file containing main() should be named programOne.cpp.
If you follow these recommendations, then your entire makefile can be just:
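CXX = g++
CXXFLAGS = -std=c++11 -g -Wall
# The built-in link rule runs $(CC); point it at the C++ compiler so libstdc++ is linked.
CC = $(CXX)
all: programOne
programOne: programOne.o a.o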
and that's it. Make knows how to build those targets for you so you don't have to tell it. If you want more programs, just add them as well:
all: programOne programTwo
programOne: programOne.o a.o
programTwo: programTwo.o b.o
The one remaining issue is header-file prerequisites. If you want, you can declare them yourself like this:
programOne.o: programOne.h a.h
etc. That's very simple but tedious to maintain. If you want make to generate them for you, you can, but it's not simple. See this discussion for some ideas.

Understanding makefiles

I was looking at this flow diagram to understand how makefiles really operate but I'm still struggling to 100% understand what's going on.
I have a main.cpp file that calls upon some function that is defined in function.h and function.cpp. Then, I'm given the makefile:
main: main.cpp function.o
	g++ main.cpp function.o -o main
mainAssembly: main.cpp
	g++ -S main.cpp
function.o: function.cpp
	g++ -c function.cpp
clean:
	rm -f *.o *.s main
linkerError: main.cpp function.o
	g++ main.cpp function.o -o main
What's going on? From what I understand so far, we are compiling function.cpp, which turns into an object file? Why is this necessary?
I don't know what the mainAssembly part is really doing. I tried reading about the g++ flags but I still have trouble understanding what this is. Is this just compiling main.cpp with the headers? Shouldn't we also convert main into an object file as well?
I guess main is simply linking everything together into an exe called main? And I'm completely lost on what clean and linkerError are trying to do. Can someone help me understand what is going on?
That flowchart confuses more than it explains as it seems needlessly complicated. Each step is actually quite simple in isolation, and there's no point in jamming them all into one chart.
Remember a Makefile simply establishes a dependency chain, an order of operations which it tries to follow, where the file on the left is dependent on the files on the right.
Here's your first part where function.o is the product of function.cpp:
function.o: function.cpp
	g++ -c function.cpp
If function.cpp changes, then the .o file must be rebuilt. This is perhaps incomplete if function.h exists, as function.cpp might #include it, so the correct definition is probably:
function.o: function.cpp function.h
	g++ -c function.cpp
Now if you're wondering why you'd build a single .cpp into a single .o file, consider programs at a much larger scale. You don't want to recompile every source file every time you change anything, you only want to compile the things that are directly impacted by your changes. Editing function.cpp should only impact function.o, and not main.o as that's unrelated. However, changing function.h might impact main.o because of a reference in main.cpp. It depends on how things are referenced with #include.
This part is a little odd:
mainAssembly: main.cpp
	g++ -S main.cpp
That just dumps out the compiled assembly code for main.cpp (into main.s). This is an optional step and isn't necessary for building the final executable.
This part ham-fistedly assembles the two parts:
main: main.cpp function.o
	g++ main.cpp function.o -o main
I say that because normally you'd compile all .cpp files to .o and then link the .o files together with your libstdc++ library and any other shared libraries you're using with a tool like ld, the linker. The final step in any typical compilation is linking to produce a binary executable or library, though g++ will silently do this for you when directed to, like here.
I think there are much better examples to work from than what you have here. This file is just full of confusion.

c++ linking and compiling flags

I may have a stupid question, but as no question is stupid, I'll ask it... Let's imagine I have the files matrix.hpp and matrix.cpp. In those files I use assert(...) to make sure that some conditions are respected. I compile this file and get a matrix.o file. Now I will use this matrix.o file in many different programs; some of them are only tests and need to check the assert(...) conditions, others are working programs that don't need these checks.
My question is: can I compile matrix.o without the -DNDEBUG flag, so that in general the assert(...) conditions are checked, but when I link the .o files for a program that doesn't need the checks, add this flag without recompiling matrix.o?
To be more precise, would this do what I want:
# the test program with the "assert(..)" checks
test:test.o matrix.o
	g++ -o $@ $^
test.o:test.cpp matrix.hpp
	g++ -c $<
# the real program without the "assert(..)" checks
prog:prog.o matrix.o
	g++ -o $@ $^ -DNDEBUG
prog.o:prog.cpp matrix.hpp
	g++ -c -DNDEBUG $<
# the matrix.o that can be either checked or not if the -DNDEBUG flag
# is given when the .o files are linked
matrix.o:matrix.cpp matrix.hpp
	g++ -c $<
OK, thank you for your answer! So I can't do that simply by using the -DNDEBUG flag. What if, each time I use assert(...) in the matrix files, I add:
#ifdef CHECK
assert(...)
#endif
and then use a CHECK flag when I compile the test program but not the prog program? I guess it won't work either...
The short answer is no. Depending on your exact circumstances there might be some clever tricks you can pull (e.g. linking in a different "assert failed" function).
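One version of the "different assert-failed function" trick, as a sketch: matrix.cpp calls a hook that it only declares, and each program links in its own definition. The names matrix_check_failed, checked_get, and g_check_failures are hypothetical. Note that the condition is still evaluated in every program; only the reaction to a failure is chosen at link time.

```cpp
#include <cstddef>

// --- matrix.h (hypothetical): matrix.cpp calls this hook but never defines it. ---
void matrix_check_failed(const char* what);

// --- matrix.cpp (hypothetical): compiled once into matrix.o, shared by both programs. ---
double checked_get(const double* data, std::size_t size, std::size_t i) {
    if (i >= size) {
        matrix_check_failed("index out of range");  // resolved by the linker
        return 0.0;                                  // reached only if the hook returns
    }
    return data[i];
}

// --- test_hooks.cpp (hypothetical): linked only into `test`. ---
// This sketch just counts failures so they can be inspected; a real test
// build might print the message and call std::abort() instead.
int g_check_failures = 0;
void matrix_check_failed(const char* /*what*/) { ++g_check_failures; }
```

A prog_hooks.cpp (also hypothetical) linked into prog could define matrix_check_failed with an empty body, and matrix.o would not need to be rebuilt.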
Have you considered throwing an exception instead of asserting? Then, 'prog' and 'test' could take different approaches to handling it.
No, not with GCC. I see two options:
compile two versions of matrix.o and link the appropriate version into each program, or
replace assert with a manual check that throws an exception.
The latter option obviously has some runtime cost even in the non-test programs, so use it with care (not inside an inner loop).
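A sketch of the exception approach, using a hypothetical minimal Matrix class: at() always checks and throws std::out_of_range, so the same matrix.o serves both programs, while operator() skips the runtime check for inner loops and keeps only an ordinary assert, which is still decided when matrix.cpp is compiled. This mirrors the split between std::vector::at and std::vector::operator[].

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    // Checked access: always compiled in, so matrix.o behaves the same in
    // `test` and `prog`; each program decides whether to catch the exception.
    double& at(std::size_t r, std::size_t c) {
        if (r >= rows_ || c >= cols_) {
            throw std::out_of_range("Matrix::at: index out of range");
        }
        return data_[r * cols_ + c];
    }

    // Fast access for hot loops: no exception, only a debug-build assert.
    double& operator()(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};
```

In test, an uncaught std::out_of_range terminates the program much like a failed assert; prog can wrap the calls it wants to recover from in a try/catch.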

Compile Time & Memory Usage of a large C++ Project?

Suppose one has about 50,000 different .cpp files.
Each .cpp file contains just one class of about 1,000 lines of code (the code itself is not complicated: it involves in-memory operations on matrices and vectors, i.e., no special libraries are used).
I need to build a project (in a Linux environment) that will have to import & use all of these 50,000 different .cpp files.
A couple of questions come to mind:
How long will it roughly take to compile this? What will be the approx. size of the compiled file?
What would be a better approach: keep 50,000 different .so files (compiled extensions) and have the main program import them one by one, or unite these 50,000 different .cpp files into one large .cpp file and deal with that? Which method will be faster / more efficient?
Any insights are greatly appreciated.
There is no answer, just advice.
Right back at you: What are you really trying to do? Are you trying to make a code library from different source files? Or is that an executable? Did you actually code that many .cpp files?
50,000 source files is, well... a massive project. Are you trying to do something common across all files (e.g., every source file represents a resource, record, image, or something else unique)? Or is it just 50K disparate code files?
Most of your compile time will not depend on the size of each source file. It will depend on the number of header files (and the headers they include) that each .cpp file pulls in. Headers, while usually containing just declarations rather than implementations, still have to be compiled, and redundant headers across the code base can slow your build down.
Large projects at that kind of scale use precompiled headers. You can include all the commonly used header files in one header file (common.h) and build common.h. Then all the other source files just include "common.h". The compiler can be configured to automatically use the compiled header file when it sees the #include "common.h" for each source.
(i) There are far too many factors involved in determining this; even an approximation is impossible. Compilation can be memory-, CPU-, or disk-bound. The complexity of the files matters (from your description, yours is low).
(ii) The typical way of doing this is to make a library and let the system figure out linking or loading. You can choose static or dynamic linking.
static linking
Assuming you are using gcc, this would look like this:
g++ -c file1.cpp -o file1.o
g++ -c file2.cpp -o file2.o
...
g++ -c filen.cpp -o filen.o
ar -rc libvector.a file1.o file2.o ... filen.o
Then, when you build your own code, your final link looks like this:
g++ myfile.cpp libvector.a -o mytask
dynamic linking
Again, assuming you are using gcc, this would look like this:
g++ -c file1.cpp -fPIC -o file1.o
g++ -c file2.cpp -fPIC -o file2.o
...
g++ -c filen.cpp -fPIC -o filen.o
g++ -shared file1.o file2.o ... filen.o -o libvector.so
Then, when you build your own code, your final link looks like this:
g++ myfile.cpp libvector.so -o mytask
You will need libvector.so to be in the loader's path for your executable to work.
In any case, as long as the 50,000 files don't change, you will only need to do the last command (which will be much faster).
You can build each object file from a .cpp file while keeping the .h files full of (and I MEAN LOTS of) forward declarations, so that when you change a .h file you do not need to recompile the rest of the program. Usually a function or method only needs the name of a class for its parameters or return type. If it needs other details, then yes, the header must be included.
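Taking this idea one step further gives what is usually called the pimpl ("pointer to implementation") idiom, a related technique rather than what this answer spells out. Sketched here with a hypothetical Widget class: the header forward-declares a private Impl struct, so the class's data members can change without touching the header that every user includes.

```cpp
#include <memory>
#include <string>

// --- widget.h (hypothetical): users only see a forward-declared Impl. ---
class Widget {
public:
    Widget();
    ~Widget();  // declared here, defined below where Impl is complete
    void set_name(const std::string& name);
    std::string name() const;
private:
    struct Impl;                  // forward declaration only
    std::unique_ptr<Impl> impl_;  // Widget's size never depends on Impl
};

// --- widget.cpp (hypothetical): the private data lives here. Adding or
// changing members of Impl edits only this file, so code that includes
// widget.h does not need to be recompiled.
struct Widget::Impl {
    std::string name;
};

Widget::Widget() : impl_(new Impl()) {}  // std::make_unique would need C++14
Widget::~Widget() = default;
void Widget::set_name(const std::string& name) { impl_->name = name; }
std::string Widget::name() const { return impl_->name; }
```

The cost is one heap allocation per object and an extra indirection on every access, so it suits classes whose headers are included widely rather than small value types.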
Please get a book by Scott Meyers; it will help you a lot.
Oh, and when trying to eat a big cake, divide it up. The slices are more manageable.
We can't really say the time it will take to compile, but what you should do is compile each .cpp/.h pair into a .o file:
$ g++ -c -o test.o test.cpp ...
Once you have all of these, you compile the main program like so:
$ g++ -c -o main.o main.cpp
$ g++ -o main main.o test.o blah.o otherThings.o foo.o bar.o baz.o etc...
Your idea of using .so files is pretty much asking "how quickly can I crash the program and possibly the OS?". Shared libraries are meant for large libraries in small numbers, not 50,000 .so files linked to one binary (especially if you load them dynamically... that would be BAD).

Automatically choosing object files for compilation

I have recently begun writing unit tests (using GoogleTest) for a C++ project. Building the main project is fairly simple: I use GCC's -MM and -MD flags to automatically generate dependencies for my object files, then I link all of the object files together for the output executable. No surprises.
But as I'm writing unit tests, is there a way to have make or GCC figure out which object files are needed to compile each test? Right now, I have a fairly naive solution (if you can call it that) which compiles ALL available object files together for EVERY unit test, which is obviously wasteful (in terms of both time and space). Is there a way (with make, gcc, sed, or whatever else) to divine which object files are needed for a given unit test in a fashion similar to how dependencies are generated for the original source files?
It sounds like you have two groups of source files: one that actually implements your program, and another that's all the unit tests. I assume each unit test has its own main function, and unit tests never need to call each other.
If all that's true, you can put all the files from the first group in a static library, and link each of the unit tests against that library. The linker will automatically pull from the library only the object files that are needed.
In concrete Makefile terms:
LIBRARY_OBJECTS = a.o b.o c.o d.o # etc
UNIT_TESTS = u1 u2 u3 u4 # etc
UNIT_TEST_OBJECTS = $(UNIT_TESTS:=.o)
libprogram.a: $(LIBRARY_OBJECTS)
	ar cr $@ $?
# -L. lets the linker find libprogram.a here; put the GoogleTest libraries in LDLIBS
$(UNIT_TESTS): %: %.o libprogram.a
	$(CXX) $(CXXFLAGS) -o $@ $< -L. -lprogram $(LDLIBS)
You should look to a higher-level project management tool, like CMake or GNU Automake.
In your Makefile:
SOURCES.cpp = a.cpp b.cpp ...
OBJECTS = $(SOURCES.cpp:%.cpp=%.o)
all: program
program: $(OBJECTS)
	$(CXX) $(LDFLAGS) -o $@ $(OBJECTS) $(LDLIBS)
Maybe, depending on how orderly your test system is.
If there's a nice one-to-one relationship between header and source files, then you can use some text-converting functions (or a call to sed) to convert the machine-generated rule you already have:
foo.o: foo.cc foo.h bar.h gaz.h
into a rule for the corresponding test:
unit_test_foo: unit_test_foo.o foo.o stub_bar.o stub_gaz.o
Or if you use a lot of stubs without corresponding headers (which is a warning sign) you can link with every stub except stub_foo.o. These object files are small and don't change often, so it's cheap.
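A sketch of what such a stub can look like, with hypothetical names following the answer's foo/bar pattern: unit_test_foo links foo.o together with stub_bar.o instead of the real bar.o, and both bar definitions satisfy the same declaration from bar.h. It is shown as one translation unit, with comments marking the files.

```cpp
// --- bar.h (hypothetical): a dependency of foo. The real bar.cpp might read
// files or talk to a server, which a unit test should avoid.
int bar_lookup(int key);

// --- foo.cpp (hypothetical): the code under test, compiled into foo.o. ---
int foo_twice_lookup(int key) { return 2 * bar_lookup(key); }

// --- stub_bar.cpp (hypothetical): linked into unit_test_foo in place of bar.o.
// It satisfies the same declaration with a fixed, predictable answer.
int bar_lookup(int key) { return key + 100; }
```

Because foo.o only refers to the symbol bar_lookup, the choice between bar.o and stub_bar.o is made entirely on the link line, and foo.cpp is compiled once for both the program and the test.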