Does #include affect program size?

Does #include affect program size? - c++

When my cpp file uses #include to add some header, does my final program's size gets bigger? Header aren't considered as compilation units, but the content of the header file is added to the actual source file by the preprocessor, so will the size of the output file (either exe or dll) be affected by this?
Edit: I forgot to mention that the question is not about templates/inline functions. I meant what will happen if I place an #include to a header that doesn't have any implementation detail of functions. Thanks.

It depends on the contents, and how your compiler is implmented. It is quite possible that if you don't use anything in the header your compiler will be smart enough to not add any of it to your executable.
However, I wouldn't count on that. I know that back in the VC++ 6 days we discovered that meerly #including Windows.h added 64K to the excecutable for each source file that did it.

You clarified that:
[The header has no] templates/inline functions... doesn't have any implementation detail of functions.
Generally speaking, no, adding a header file won't affect program size.
You could test this. Take a program that already builds, and check the executable size. Then go into each .cpp file and include a standard C or C++ header file that isn't actually needed in that file. Build the program and check the executable size again - it should be the same size as before.
By and large, the only things that affect executable size are those that cause the compiler to either generate different amounts of code, global/static variable initializations, or DLLs/shared library usages. And even then, if any such items aren't needed for the program to operate, most modern linkers will toss those things out.
So including header files that only contain things like function prototypes, class/struct definitions without inlines, and definitions of enums shouldn't change anything.
However, there are certainly exceptions. Here are a few.
One is if you have an unsophisticated linker. Then, if you add a header file that generates things the program doesn't actually need, and the linker doesn't toss them out, the executable size will bloat. (Some people deliberately build linkers this way because the link time can become insanely fast.)
Many times, adding a header file that adds or changes a preprocessor symbol definition will change what the compiler generates. For instance, assert.h (or cassert) defines the assert() macro. If you include a header file in a .c/.cpp file that changes the definition of the NDEBUG preprocessor symbol, it will change whether assert() usages generate any code, and thus change the executable size.
Also, adding a header file that changes compiler options will change the executable size. For instance, many compilers let you change the default "packing" of structs via something like a #pragma pack line. So if you add a header file that changes structure packing in a .c/.cpp file, the compiler will generate different code for dealing with structs, and hence change the executable size.
And as someone else pointed out, when you're dealing with Visual C++/Visual Studio, all bets are off. Microsoft has, shall we say, a unique perspective around their development tools which is not shared by people writing compiler systems on other platforms.

With modern compilers included files only affect the binaries size if they contain static data or if you use normal or inline function that are defined in them.

Yes, because for example inline functions can be defined in header files, and the code of those inline functions will ofcourse be added to your program's code when you call those functions. Also template instantiations will be added to the code of your program.

And you can also define global variables in header files (although I wouldn't recommend it). If you do, you should surround them with #ifdef/#end blocks so that they don't get defined in more than one compilation unit (or the linker would complain). In any case, this would increase the program's size.

Remember, #define sentences can bloat the code at compile time into a huge, yet functional compiled file.

For headers that are entirely declarative (as they generally should be), no. However the pre-processor and compiler have to take time to parse the headers, so headers can increase compile time; especially large and deeply nested ones such as - which is why some compilers use 'precompiled headers'.
Even inline and template code does not increase code size directly. If the template is never instantiated or an inline function not called, the code will not be generated. However if it is called/instantiated, code size can grow rapidly. If the compiler actually inlines the code (a compiler is not obliged to do so, and most don't unless forced to), the code duplication may be significant. Even if it is not truly inlined by the compiler, it is still instantiated statically in every object module that references it - it takes a smart linker to remove duplicates, and that is not a given - if separate object files were compiled with different options, inline code from the same source may not generate identical code in each object file, so would not even be duplicated. In the case of templates, and separate instantiation will be created for each type in is invoked for.

It's good practice to limit the #includes in a file to those that are necessary. Besides affecting the executable size, having extra #includes will cause a larger list of compile-time dependencies which will increase your build-time if you change a commonly #included header file.

Related

Disadvantages of condensing .cpp files?

When compiling a 'static library' project in MSVC++, I often get .lib files that are several MB in size. If I use conditional macros and include directives to "condense" all my .cpp files in one .cpp file at compile time, the .lib file size decreases considerably.
Are there any disadvantages with this practice?

The main problem of Unity Builds as they are called is that they break the way C++ works.
In C++, a source file, with its includes preprocessed, is called a Translation Unit. Some symbols are "private" to this translation unit:
symbols declared static at namespace level
anything declared in anonymous namespace
If you merge several C++ files, then the compiler will share those private symbols among all the files that are merged together since from its point of view this has become a single Translation Unit.
You will get an error if two local classes suddenly have the same name, and idem for constants. Annoying as hell, but at least you are notified.
For functions however, it may break silently because of overload. When before the compiler would pick static void launch(short u); for your call to launch(1), then suddenly it will shift to static void launch(int i, Target t = "Irak");. Oups ?
Unity Builds are dangerous. What you are looking for is called WPO (Whole Program Optimization) or LTO (Link Time Optimization), look into the innards of your compiler manual to know how to activate it.

A disadvantage would be if you change a single line in the cpp you have to compile the whole code.

Your file might get more complex and you'll have to recompile everyting even if you just change one single source file. Other then that, there's no real disadvantage, unless the files are redefining local functions or variables that might screw you up, when merging everything (e.g. due to multiple definitions within one translation unit).
The size decrease you notice is due to advanced optimizations that become available that way (e.g. reusing more code). Depending on your code you might get similar results by enabling all optimizations for size as well as link time optimizations, which might result in some acceptable solution between both approaches.

It's usually a confusing practice to include cpp to another cpp (at least you should leave explanatory comment about why did you do this).

Are functions in C/C++ headers a no-go?

I'm working on a very tiny piece of C/C++ source code. The program reads input values from stdin, processes them with an algorithm and writes the results to stdout.
I would just implement all that in a single file, but I also want test cases for the algorithm (not the input/output reading), so I have the following files in my project:
main.cpp
sort.hpp
sort_test.cpp
I implement the algorithm in sort.hpp right away, no sort.cpp. It's rather short and doesn't have any dependencies.
Would you say that, in some cases, functions defined in headers are okay, even if they are sophisticated algorithms and not just simple accessors/mutators? Or is there a reason I should avoid this? When should I move code from header to source file?

There is nothing wrong with having functions in header files, as long as you understand the tradeoff. Putting them in a header file means they'll have to be compiled (and recompiled) in any translation unit that includes the header. (and they have to be declared inline, or you will get linker errors.)
In projects with many translation units, that may add up to a noticeable slowdown in compile times, if you do it a lot.
On the other hand, it ensures that the function definition is visible everywhere the function is called -- and that means that it can be trivially inlined, so the resulting program may run faster.
And finally, with function templates, you typically have no realistic alternative. The definition must be visible at the call site, and the only practical way to achieve that is to put it in a header.
A final consideration is that header-only libraries are easier to deploy and use. You don't need to link against anything, you don't have to worry about ABI's or anything else. You just add the headers to your project, include them and off you go.
Quite a few popular libraries use a header-only strategy.

When you put functions in headers you have to make sure to declare them inline. This is required to avoid a duplicate definition warning when more than one .cpp file include that header file. Generally you should only put small functions inside header files because it will be compiled for each cpp file that includes the header which will slow down compilation time and also results in code bloat; a larger executable file.

It's OK to put any function in the header as long as it's inline. Things such as functions defined inside class { } and templates are implicitly inline.
If the resulting application becomes too large, then optimize the code size. Optimizing before there is a problem is an anti-pattern, especially when there is a benefit to doing it "your way," and the fix is as simple as moving from one file to another and erasing inline.
Of course, if you want to distribute the code as a library, then deciding between a header, static library, or dynamic library binary is an important decision affecting the users.

The vast majority of the boost libraries are header-only, so I'd say: Yes, this is an established and accepted practice. Just don't forget to inline.

That really is a stile choice. But putting it in the header does mean that it will be inline code rather than a function. If you wanted that same functionality, you could use the inline keyword:
inline int max(int a, int b)
{
return (a > b) ? a : b;
}
http://en.wikipedia.org/wiki/Inline_function

The reason you should avoid this in general (for non inline functions) is because multiple source files will be including your header, creating linker errors.
It doesn't matter if you have a pramga once or similar trick - the duplication will show up if you have more than one compilation unit (e.g. cpp files) including the same header.

If you wish to inline the function, it MUST be in the header else it can't get inlined.
If you publish a header with your libraries and the header has some sort of implementation in it, you can be sure that after a few years if you change the implementation and it doesn't work exactly the same way as it did before, some peoples code will break since thay will have come to rely on the implementation they saw in the header. Yeah i know one should not do it but many people do look in header for the implementation and other behaviour they can exploit/use in a not intended way to overcome some problem they are having.
If you are planning to use templates then you have no choice but to put it all in header. (this might not be necessary if you compiler supports export templates but there is only 1 i know of).

Its ok to have the implementation in the header. It depends on what you need. If you separate the definition to a different file then the compiler will create symbols with external linkage if you dont want that you can define the functions inside the header itself. But you would be wasting some amount of memory for the code segment. If you include this header file in two different files then both files codes segment will have this function definition.
If other header file is going to have a function with similar name then its going to be a problem. Then you have to use inline.

Why is including a header file such an evil thing?

I have seen many explanations on when to use forward declarations over including header files, but few of them go into why it is important to do so. Some of the reasons I have seen include the following:
compilation speed
reducing complexity of header file management
removing cyclic dependencies
Coming from a .net background I find header management frustrating. I have this feeling I need to master forward declarations, but I have been scrapping by on includes so far.
Why cannot the compiler work for me and figure out my dependencies using one mechanism (includes)?
How do forward declarations speed up compilations since at some point the object referenced will need to be compiled?
I can buy the argument for reduced complexity, but what would a practical example of this be?

"to master forward declarations" is not a requirement, it's a useful guideline where possible.
When a header is included, and it pulls in more headers, and yet more, the compiler has to do a lot of work processing a single translation module.
You can see how much, for example, with gcc -E:
A single #include <iostream> gives my g++ 4.5.2 additional 18,560 lines of code to process.
A #include <boost/asio.hpp> adds another 74,906 lines.
A #include <boost/spirit/include/qi.hpp> adds 154,024 lines, that's over 5 MB of code.
This adds up, especially if carelessly included in some file that's included in every file of your project.
Sometimes going over old code and pruning unnecessary includes improves the compilation dramatically just because of that. Replacing includes with forward declarations in the translation modules where only references or pointers to some class are used, improves this even further.

Why cannot the compiler work for me and figure out my dependencies using one mechanism (includes)?
It cannot because, unlike some other languages, C++ has an ambiguous grammar:
int f(X);
Is it a function declaration or a variable definition? To answer this question the compiler must know what does X mean, so X must be declared before that line.

Because when you're doing something like this :
bar.h :
class Bar {
int foo(Foo &);
}
Then the compiler does not need to know how the Foo struct / class is defined ; so importing the header that defines Foo is useless. Moreover, importing the header that defines Foo might also need importing the header that defines some other class that Foo uses ; and this might mean importing the header that defines some other class, etc.... turtles all the way.
In the end, the file that the compiler is working against is almost like the result of copy pasting all the headers ; so it will get big for no good reason, and when someone makes a typo in a header file that you don't need (or import , or something like that), then compiling your class starts to take waaay too much time (or fail for no obvious reason).
So it's a good thing to give as little info as needed to the compiler.

How do forward declarations speed up compilations since at some point the object referenced will need to be compiled?
1) reduced disk i/o (fewer files to open, fewer times)
2) reduced memory/cpu usage
most translations need only a name. if you use/allocate the object, you'll need its declaration.
this is probably where it will click for you: each file you compile compiles what is visible in its translation.
a poorly maintained system will end up including a ton of stuff it does not need - then this gets compiled for every file it sees. by using forwards where possible, you can bypass that, and significantly reduce the number of times a public interface (and all of its included dependencies) must be compiled.
that is to say: the content of the header won't be compiled once. it will be compiled over and over. everything in this translation must be parsed, checked that it's a valid program, checked for warnings, optimized, etc. many, many times.
including lazily only adds significant disk/cpu/memory increase, which turns into intolerable build times for you, while introducing significant dependencies (in non-trivial projects).
I can buy the argument for reduced complexity, but what would a practical example of this be?
unnecessary includes introduce dependencies as side effects. when you edit an include (necessary or not), then every file which includes it must be recompiled (not trivial when hundreds of thousands of files must be unnecessarily opened and compiled).
Lakos wrote a good book which covers this in detail:
http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620/ref=sr_1_1?ie=UTF8&s=books&qid=1304529571&sr=8-1

Header file inclusion rules specified in this article will help reduce the effort in managing header files.

I used forward declarations simply to reduce the amount of navigation between source files done. e.g. if module X calls some glue or interface function F in module Y, then using a forward declaration means the writing the function and the call can be done by only visiting 2 places, X.c and Y.c not so much of an issue when a good IDE helps you navigate, but I tend to prefer coding bottom-up creating working code then figuring out how to wrap it rather than through top down interface specification.. as the interfaces themselves evolve it's handy to not have to write them out in full.
In C (or c++ minus classes) it's possible to truly keep structure details Private by only defining them in the source files that use them, and only exposing forward declarations to the outside world - a level of black boxing that requires performance-destroying virtuals in the c++/classes way of doing things. It's also possible to avoid needing to prototype things (visiting the header) by listing 'bottom-up' within the source files (good old static keyword).
The pain of managing headers can sometimes expose how modular your program is or isn't - if its' truly modular, the number of headers you have to visit and the amount of code & datastructures declared within them should be minimized.
Working on a big project with 'everything included everywhere' through precompiled headers won't encourage this real modularity.
module dependancies can correlate with data-flow relating to performance issues, i.e. both i-cache & d-cache issues. If a program involves many modules that call each other & modify data at many random places, it's likely to have poor cache-coherency - the process of optimizing such a program will often involve breaking up passes and adding intermediate data.. often playing havoc with many'class diagrams'/'frameworks' (or at least requiring the creation of many intermediates datastructures). Heavy template use often means complex pointer-chasing cache-destroying data structures. In its optimized state, dependancies & pointer chasing will be reduced.

I believe forward declarations speed up compilation because the header file is ONLY included where it is actually used. This reduces the need to open and close the file once. You are correct that at some point the object referenced will need to be compiled, but if I am only using a pointer to that object in my other .h file, why actually include it? If I tell the compiler I am using a pointer to a class, that's all it needs (as long as I am not calling any methods on that class.)
This is not the end of it. Those .h files include other .h files... So, for a large project, opening, reading, and closing, all the .h files which are included repetitively can become a significant overhead. Even with #IF checks, you still have to open and close them a lot.
We practice this at my source of employment. My boss explained this in a similar way, but I'm sure his explanation was more clear.

How do forward declarations speed up compilations since at some point the object referenced will need to be compiled?
Because include is a preprocessor thing, which means it is done via brute force when parsing the file. Your object will be compiled once (compiler) then linked (linker) as appropriate later.
In C/C++, when you compile, you've got to remember there is a whole chain of tools involved (preprocessor, compiler, linker plus build management tools like make or Visual Studio, etc...)

Good and evil. The battle continues, but now on the battle field of header files. Header files are a necessity and a feature of the language, but they can create a lot of unnecessary overhead if used in a non optimal way, e.g. not using forward declarations etc.
How do forward declarations speed up
compilations since at some point the
object referenced will need to be
compiled?
I can buy the argument for reduced
complexity, but what would a practical
example of this be?
Forward declarations are bad ass. My experience is that a lot of c++ programmers are not aware of the fact that you don't have to include any header file, unless you actually want to use some type, e.g. you need to have the type defined so the compiler understands what you want to do. It's important to try and refrain from including header files in other header files.
Just passing around a pointer from one function to another, only requires a forward declaration:
// someFile.h
class CSomeClass;
void SomeFunctionUsingSomeClass(CSomeClass* foo);
Including someFile.h does not require you to include the header file of CSomeClass, since you are merely passing a pointer to it, not using the class. This means that the compiler only needs to parse one line (class CSomeClass;) instead of an entire header file (that might be chained to other header files etc etc).
This reduces both compile time and link time, and we are talking big optimizations here if you have many headers and many classes.

using macro defined in header files

I have a macro definition in header file like this:
// header.h
ARRAY_SZ(a) = ((int) sizeof(a)/sizeof(a[0]));
This is defined in some header file, which includes some more header files.
Now, i need to use this macro in some source file that has no other reason to include header.h or any other header files included in header.h, so should i redefine the macro in my source file or simply include the header file header.h.
Will the latter approach affect the code size/compile time (I think yes), or runtime (i think no)?
Your advice on this!

Include the header file or break it out into a smaller unit and include that in the original header and in your code.
As for code size, unless your headers do something incredibly ill-advised, like declaring variables or defining functions, they should not affect the memory footprint much, if at all. They will affect your compile time to a certain degree as well as polluting your name space.

Including the header in the source file might affect compile time slightly unless you are using pre-compiled headers. It shouldn't affect the code size though. Redefining the macro shouldn't have any effect on compile time or size. It is more of a maintenance and consistency issue though.

should i redefine the macro in my source file or simply include the header file header.h.
Neither. Instead you should clean up the code and break header.h so that one can use ARRAY_SZ() without also getting unrelated stuff.

You ask:
Will the latter approach affect the
code size/compile time (I think yes)
In the case of the specific macro, the answer is "no" to the size, because the sizeof expression can be evaluated at compile time, and therefore "yes" to the time. Neither are likely to be remotely significant.

Unless you're running this on a really limited bit of hardware, or this is called billions and billions of times, you won't notice any difference between the two at either compile time or run time.
Go for whatever seems more readable / maintainable.
Personally, I'd suggest there are better ways of achieving what you're doing there without involving macros (namely inline functions and/or function templates). You have to be careful using your solution because there are a few gotchas you need to keep an eye on.

Including that header and all other headers included into it will increase the compile time. It can affect runtime if there're other definitions that will change how your code compiles - if your code compiles differently because of those defines it will of course run differently. Although the latter is not usual be careful.

Writing function definition in header files in C++

I have a class which has many small functions. By small functions, I mean functions that doesn't do any processing but just return a literal value. Something like:
string Foo::method() const{
return "A";
}
I have created a header file "Foo.h" and source file "Foo.cpp". But since the function is very small, I am thinking about putting it in the header file itself. I have the following questions:
Is there any performance or other issues if I put these function definition in header file? I will have many functions like this.
My understanding is when the compilation is done, compiler will expand the header file and place it where it is included. Is that correct?

If the function is small (the chance you would change it often is low), and if the function can be put into the header without including myriads of other headers (because your function depends on them), it is perfectly valid to do so. If you declare them extern inline, then the compiler is required to give it the same address for every compilation unit:
headera.h:
inline string method() {
return something;
}
Member functions are implicit inline provided they are defined inside their class. The same stuff is true for them true: If they can be put into the header without hassle, you can indeed do so.
Because the code of the function is put into the header and visible, the compiler is able to inline calls to them, that is, putting code of the function directly at the call site (not so much because you put inline before it, but more because the compiler decides that way, though. Putting inline only is a hint to the compiler regarding that). That can result in a performance improvement, because the compiler now sees where arguments match variables local to the function, and where argument doesn't alias each other - and last but not least, function frame allocation isn't needed anymore.
My understanding is when the compilation is done, compiler will expand the header file and place it where it is included. Is that correct?
Yes, that is correct. The function will be defined in every place where you include its header. The compiler will care about putting only one instance of it into the resulting program, by eliminating the others.

Depending on your compiler and it's settings it may do any of the following:
It may ignore the inline keyword (it
is just a hint to the compiler, not a
command) and generate stand-alone
functions. It may do this if your
functions exceed a compiler-dependent
complexity threshold. e.g. too many
nested loops.
It may decide than your stand-alone
function is a good candidate for
inline expansion.
In many cases, the compiler is in a much better position to determine if a function should be inlined than you are, so there is no point in second-guessing it. I like to use implicit inlining when a class has many small functions only because it's convenient to have the implementation right there in the class. This doesn't work so well for larger functions.
The other thing to keep in mind is that if you are exporting a class in a DLL/shared library (not a good idea IMHO, but people do it anyway) you need to be really careful with inline functions. If the compiler that built the DLL decides a function should be inlined you have a couple of potential problems:
The compiler building the program
using the DLL might decide to not
inline the function so it will
generate a symbol reference to a
function that doesn't exist and the
DLL will not load.
If you update the DLL and change the
inlined function, the client program
will still be using the old version
of that function since the function
got inlined into the client code.

There will be an increase in performance because implementation in header files are implicitly inlined. As you mentioned your functions are small, inline operation will be so beneficial for you IMHO.
What you say about compiler is also true.There is no difference for compiler—other than inlining—between code in header file or .cpp file.

If your functions are that simple, make them inline, and you'll have to stick them in the header file anyway. Other than that, any conventions are just that - conventions.
Yes, the compiler does expand the header file where it encounters the #include statements.

It depends on the coding standards that apply in your case but:
Small functions without loops and anything else should be inlined for better performance (but slightly larger code - important for some constrained or embedded applications).
If you have the body of the function in the header you will have it by default inline(d) (which is a good thing when it comes to speed).
Before the object file is created by the compiler the preprocessor is called (-E option for gcc) and the result is sent to the compiler which creates the object out of code.
So the shorter answer is:
-- Declaring functions in header is good for speed (but not for space) --

C++ won’t complain if you do, but generally speaking, you shouldn’t.
when you #include a file, the entire content of the included file is inserted at the point of inclusion. This means that any definitions you put in your header get copied into every file that includes that header.
For small projects, this isn’t likely to be much of an issue. But for larger projects, this can make things take much longer to compile (as the same code gets recompiled each time it is encountered) and could significantly bloat the size of your executable. If you make a change to a definition in a code file, only that .cpp file needs to be recompiled. If you make a change to a definition in a header file, every code file that includes the header needs to be recompiled. One small change can cause you to have to recompile your entire project!
Sometimes exceptions are made for trivial functions that are unlikely to change (e.g. where the function definition is one line).
Source: http://archive.li/ACYlo (previous version of Chapter 1.9 on learncpp.com)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Does #include affect program size? - c++

With modern compilers included files only affect the binaries size if they contain static data or if you use normal or inline function that are defined in them.

Yes, because for example inline functions can be defined in header files, and the code of those inline functions will ofcourse be added to your program's code when you call those functions. Also template instantiations will be added to the code of your program.

Remember, #define sentences can bloat the code at compile time into a huge, yet functional compiled file.

It's good practice to limit the #includes in a file to those that are necessary. Besides affecting the executable size, having extra #includes will cause a larger list of compile-time dependencies which will increase your build-time if you change a commonly #included header file.

Related

Disadvantages of condensing .cpp files?

Are functions in C/C++ headers a no-go?

Why is including a header file such an evil thing?

using macro defined in header files

Writing function definition in header files in C++

Categories

Resources