Are unnecessary include files an overhead? - c++

I have seen a couple of questions on how to detect unnecessary #include files in a C++ project. This question has often intrigued me, but I have never found a satisfactory answer.
If some header files are included but not actually used in a C++ project, is that an overhead? I understand that before compilation the contents of all included header files are copied into the source files that include them, and that this can result in a lot of unnecessary compilation.
How far does this kind of overhead spread to the compiled object files and binaries?
Aren't compilers able to do some optimizations to make sure that this kind of overhead is not transferred to the resulting object files and binaries?
Considering that I probably know nothing about compiler optimization, I still want to ask this in case there is an answer.
As a programmer who uses a wide variety of C++ libraries for his work, what kind of programming practices should I follow to avoid such overheads? Is making myself intimately familiar with each library's workings the only way out?

It does not affect the performance of the binary or even the contents of the binary file, for almost all headers. Declarations generate no code at all, inline/static/anonymous-namespace definitions are optimized away if they aren't used, and no header should include externally visible definitions (that breaks if the header is included by more than one translation unit).
As @T.C. points out, the exception is internally visible static objects with nontrivial constructors. <iostream> does this, for example. The program must behave as if the constructor is called, and the compiler usually doesn't have enough information to optimize the constructor away.
It does, however, affect how long compilation takes and how many files will be recompiled when a header is changed. For large projects, this is enough incentive to care about unnecessary includes.
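To illustrate the exception mentioned above, here is a minimal sketch (the Logger type and its header are hypothetical) of a header whose mere inclusion costs something at runtime, because every translation unit that includes it constructs a static object with a nontrivial constructor:

// logger.h (hypothetical): including this header is not free at runtime
#include <cstdio>
struct Logger {
    Logger() { std::puts("logger ready"); }   // observable side effect
};
static Logger logger_instance;                // constructed once per including translation unit

Because the constructor's side effect is observable, the compiler must keep the call even if logger_instance is never referenced, which is the same reason the <iostream> initialization cannot simply be dropped.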

Besides the obviously longer compile times, there might be other issues. The most important one, IMHO, is dependencies on external libraries. You don't want your program to depend on more libraries than necessary.
You also then need to install those libraries on every system you want the program to build on. This can become a nightmare, especially when the next programmer needs to install some database client library even though the program never uses a database.
Also, library headers in particular often define macros. Sometimes those macros have very generic names that will break your code or be incompatible with other library headers you actually need.

Of course any #include is an overhead. The compiler needs to parse that file.
So avoid them. Use forward declarations wherever possible.
It will speed up compilation. See Scott Meyers' book on the subject.
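As a minimal sketch of what forward declarations buy you (Widget and Registry are made-up names): a header that only uses pointers or references to a type can forward-declare it and leave the #include to its own .cpp file.

// registry.h - only pointers/references to Widget are needed, so forward-declare it
class Widget;                     // forward declaration instead of #include "widget.h"

class Registry {
public:
    void add(const Widget& w);    // a reference to an incomplete type is fine here
private:
    Widget* last_ = nullptr;      // a pointer to an incomplete type is fine too
};

// registry.cpp - the full definition is needed only here
// #include "widget.h"
// #include "registry.h"
// void Registry::add(const Widget& w) { /* uses the complete Widget type */ }

Every file that includes registry.h no longer depends on widget.h, so edits to widget.h stop triggering their recompilation.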

The simple answer is YES, it is an overhead as far as compilation is concerned, but at runtime it will hardly make any difference. The reason: let's say you add #include <iostream> (just as an example) and assume you don't use any of its functions; then g++ 4.5.2 has some additional 18,560 lines of code to process during compilation. But as far as runtime overhead is concerned, I hardly think it creates a performance issue.
You can also refer to Are unused includes harmful in C/C++?, where I really liked this point made by David Young:
Any singletons declared as external in a header and defined in a source file will be included in your program. This obviously increases memory usage and possibly contributes to a performance overhead by causing one to access their page file more often (not much of a problem now, as singletons are usually small-to-medium in size and because most people I know have 6+ GB of RAM).

Related

Modern Compilers Inlining Across Cpp Files and PImpl Idiom Overhead

I was taught that in general practice it's best not to try to beat the compiler, at least until it's proven to be stupid. So in general, and since it's only a hint anyway, the inline keyword has never been a priority for me, because I was taught that the compiler is pretty good at understanding the trade-offs your system will have to make when it comes to inlining versus, for example, looped calls through a function pointer.
But then I recently read that functions cannot be inlined unless they are defined in the header file, because otherwise the compiler won't see them when it works on other header and cpp files. I did some more research and found people saying that this is the old way things worked, and that modern compilers perform "Whole Program Optimization" or "Link-Time Code Generation": the obj files I'm used to seeing are still built, but they can then be linked together and optimized further, for example by letting the compiler see functions in an obj file that were once "hidden" in a cpp file and inline them where appropriate. This sounds great, but I'm wondering which source to trust and would like an additional opinion on the matter.
Further research led me to a deeper understanding of the Pimpl idiom and why it is used: most obviously to save on compile-time costs and to improve readability and stability. But as I continued reading, I noticed that the cppreference.com/cpp/pimpl documentation states that separating header information from implementation details can incur runtime overhead, since calls to the implementation require at least one level of indirection, in addition to increased space cost due to the extra pointer. So if I were to create a real-time application where runtime performance is a high priority, is it best to avoid the Pimpl idiom? And if so, should I use their suggested alternative and mark all functions inline and place them in the header file (if I'm understanding that alternative correctly), or is there another option I'm not seeing, such as using only cpp files without causing LNK2005 errors (maybe a single large cpp file)?
Thanks for the help in advance; I know it was a bit of a text crawl to get to this point.
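For reference, a minimal sketch of the Pimpl idiom under discussion (the Engine and Impl names are illustrative); the extra pointer per object and the indirection on every call are exactly the runtime costs being weighed against faster rebuilds:

// engine.h - clients never see the implementation details
#include <memory>
class Engine {
public:
    Engine();
    ~Engine();                    // must be defined where Impl is complete
    void step();                  // forwarded through a pointer at runtime
private:
    struct Impl;                  // defined only in engine.cpp
    std::unique_ptr<Impl> impl_;  // one extra pointer per object
};

// engine.cpp
// struct Engine::Impl { int state = 0; void step() { ++state; } };
// Engine::Engine() : impl_(std::make_unique<Impl>()) {}
// Engine::~Engine() = default;
// void Engine::step() { impl_->step(); }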

Does storing code in header files cause memory management issues in C++?

Not so long ago one man told me that storing all the code in .h files produces memory management issues, because I get too many duplicates of one class, and that I'll avoid this issue if I store the code in .h/.cpp files. Is that true?
I've already googled some info on this subject and read that it's all just a matter of habit. So what about memory problems when all the code is stored in .h files?
... one man told me that storing all the code in .h files produces memory management issues. [...] And I'll avoid this issue if I store the code in .h/.cpp files. Is that true?
Memory management usually refers to handling of dynamic memory at runtime. For clarity: writing all your code in headers has nothing to do with that. However, doing so may indeed increase the amount of memory the compiler uses. So if that's what the one man meant, then yes, it could potentially be true - but it's not the biggest problem with the approach.
Because I get too many duplicates of one class
This is a silly argument. In fact, using only header files for definitions means that there is exactly one translation unit where the class definition is included exactly once.
The potential problem is that your single translation unit includes all definitions in all header files. Since everything is processed in one go, the maximum memory required by the compiler is potentially higher. It's probably insignificant to most projects, but for something like LibreOffice it could become a problem.
A bigger problem with defining all functions in headers - i.e. using a single translation unit - is that any change to any header, however small, will cause the single massive translation unit to change, and you'll be required to compile it again. With multiple, smaller, translation units, only ones that are affected by the change will need recompilation.
So, the problem with your approach will be that every recompilation is as slow as compiling from scratch. Sure, if the compilation takes a minute, it doesn't matter, but for projects that take hours to compile from scratch, this is essential.
Your "one man" doesn't know what he's talking about.
Use of header files can only (potentially) cause the compiler (or the programs that implement phases of compilation) to consume more memory during compilation. That depends on how the compiler is implemented, though, and a quality compiler will deal with it. If some compilation unit has enough code to cause a compiler to run out of memory, then that's another concern, and largely unrelated to the use of header files.
If you write code that causes memory management concerns in your program, putting that code into header files makes no difference - bad code will cause a program to have memory management concerns (leaks, use of dangling pointers, etc.) regardless of whether parts of that code are in header files.
If header files are used carelessly, there are some situations where multiple object files can each contain a definition of some functions. Technically, this can increase memory usage of a program, but there are also coding techniques to mitigate that - albeit some details of the techniques may differ a little if using header files. In other words, it is writing poor code that makes the difference, not (on its own) putting code into header files.
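A hedged sketch of the duplication and the usual mitigation meant here (helper.h and grow are made-up names): a function defined in a header that several .cpp files include either fails to link or gets duplicated, depending on how it is declared.

// helper.h, included by several .cpp files
// int grow(int x) { return x + 1; }         // plain definition: multiple-definition link error
// static int grow(int x) { return x + 1; }  // links, but every object file gets its own copy
inline int grow(int x) { return x + 1; }     // links, and the duplicates are merged at link time

The static variant is the "can increase memory usage" case, since each object file carries its own grow; marking the function inline lets the linker fold the copies together.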
There are other concerns with use of header files (or the preprocessor more generally). But there are benefits to using header files, and techniques to manage such concerns. And those concerns are practically unrelated to memory management issues anyway.
Practically, it is usually better to split code sensibly between header files and compilation units (aka .h and .cpp files). But there are also cases where putting a library's function definitions in header files is beneficial (e.g. template libraries).

Are unused includes harmful in C/C++?

What are the negative consequences of unused includes?
I'm aware they result in increased binary size (or do they?), anything else?
Increases compilation time (potentially serious issue)
Pollutes global namespace.
Potential clash of preprocessor names.
If unused headers from third-party libraries are included, those libraries may end up being maintained as dependencies unnecessarily.
They don't necessarily increase binary size, but will increase compile time.
The main problem is clutter. These are the three main aspects in which the clutter manifests:
Visual pollution; it gets in the way while you are trying to figure out which other includes you do need.
Logical pollution; collisions of functions become more likely, and compile time increases (the cost may be really small for a couple of includes, but if not cleaning up unneeded includes becomes the "policy", it can grow into a significant hurdle).
Dependency opacity; since there are more headers to analyze, it is harder to determine the dependency cycles in your code. Knowing what your code's dependencies are is crucial once your codebase grows beyond the hobbyist level.
Generally speaking, yes, it does cause some problems. Logically speaking, if you don't need it then don't include it.
Any singletons declared as external in a header and defined in a source file will be included in your program. This obviously increases memory usage and possibly contributes to a performance overhead by causing one to access their page file more often (not much of a problem now, as singletons are usually small-to-medium in size and because most people I know have 6+ GB of RAM).
Compilation time is increased, and for large commercial projects where one compiles often, this can cause a loss of money. It might only add a few seconds to your total time, but multiply that by the several hundred compiles you might need to test and debug, and you've got a huge waste of time, which translates into a loss of profit.
The more headers you have, the higher the chance of a preprocessor collision with a macro you defined in your program or in another header. This can be avoided via correct use of namespaces, but it's still a hassle to find. Again, lost profit.
Contributes to code bloat (longer files and thus more to read) and can significantly increase the number of results in your IDE's autocomplete tool (some people are religiously against these tools, but they do increase productivity to an extent).
You can accidentally link other external libraries into your program without even knowing it.
You may inadvertently cause the end of the world by doing this.
I'll assume the headers can all be considered "sincere", that is, not written with the aim of sabotaging your code.
It will usually slow down compilation (precompiled headers mitigate this point).
It implies dependencies where none really exist (this is a semantic error, not an actual error).
Macros will pollute your code (mitigated by prefixing macros with namespace-like names, as in BOOST_FOREACH instead of FOREACH; see the sketch after this list).
A header could imply a link to another library. In some cases, an unused header could ask the linker to link your code with an external library (see MSVC's #pragma comment(lib, "")). I believe a good linker would not keep the library's reference if it's not used (IIRC, MSVC's linker will not keep the reference to an unused library).
A removed header is one less source of unexpected bugs. If you don't trust the header (some coders are better than others...), then removing it removes a risk (you won't like including a header that changes the struct alignment of everything after it: the generated bugs are... illuminating...).
A static variable defined in a header will pollute your code: each such definition results in a separate copy of the variable in every translation unit that includes the header.
C symbol names will pollute your code. The declarations in the header will pollute your global or struct namespace (and more probably both, as structs are usually typedef'd to bring their type into the global namespace). This is mitigated by libraries prefixing their symbols with some kind of "namespace name", like SDL_CreateMutex for SDL.
Non-namespaced C++ symbol names will pollute your code, for the same reasons as above. The same goes for headers making incorrect use of the using namespace statement. Correct C++ code will namespace its symbols. Yes, this means that you should usually not trust a C++ header declaring its symbols in the global namespace...
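As a hedged sketch of the macro-pollution point in the list above (the header name and the check macro are invented), an unprefixed function-like macro dragged in by an include you don't even use can rewrite unrelated code before the compiler sees it:

// thirdparty_config.h (hypothetical), pulled in by an otherwise unused include
#define check(x) ((x) != 0)          // very generic name, no library prefix

// your code, compiled after that header was included
struct Account {
    bool check(int amount) const;    // the preprocessor expands this to ((int amount) != 0) const;
};                                   // which no longer compiles

The classic real-world case is the min and max macros that <windows.h> defines unless NOMINMAX is set, which break calls to std::min and std::max.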
Whether or not they increase the binary size really depends on what's in them.
The main side-effect is probably the negative impact on compilation speed. Again, how big an impact depends on what's in them, how much and whether they include other headers.
Well, for one, leaving them there only prolongs the compile time and adds unnecessary compilation dependencies.
They represent clumsy design.
If you are not sure what to include and what not to include, it shows the developer had no idea what they were doing.
Include files are meant to be included only when they are needed. It may not be that much of an issue now that computer memory and speed are growing by leaps and bounds, but it once was.
If an include is not strictly needed but included anyhow, I would recommend putting a comment next to it saying why you included it. A new developer who gets onto your code will appreciate it if you have done it the right way.
An include means you are adding more declarations, so when you write your own global function you need to be careful about whether that name is already declared in an included header.
For example, if you write your own class auto_ptr {} without including <memory>, it will work fine. But once <memory> is included (and its names are brought into your scope, e.g. via using namespace std;), the compiler reports an error, as auto_ptr has already been declared in the <memory> header.
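A minimal sketch of that kind of collision, assuming a pre-C++17 standard library (std::auto_ptr was removed in C++17) and a using namespace std; directive that brings the library's name into the same scope as yours:

#include <memory>                     // declares std::auto_ptr in pre-C++17 libraries
using namespace std;                  // pulls the std names into scope

template <class T> class auto_ptr {}; // your own auto_ptr: fine on its own

int main() {
    auto_ptr<int> p;                  // error: ambiguous between ::auto_ptr and std::auto_ptr
    return 0;
}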
Yes, they can increase binary size because of unused extern variables.
//---- in unused includes ----
extern int /* or a big class */ unused_var;
//---- in third party library ----
int unused_var = 13;

How to find compilation bottlenecks?

How do I find which parts of code are taking a long time to compile?
I am already using precompiled headers for all of my headers, and they definitely improve the compilation speed. Nevertheless, whenever I make a change to my C++ source file, compiling it takes a long time (this is CPU/memory-bound, not I/O-bound -- it's all cached). Furthermore, this is not related to the linking portion, just the compilation portion.
I've tried turning on /showIncludes, but of course, since I'm using precompiled headers, nothing is getting included after stdafx.h. So I know it's only the source code that takes a while to compile, but I don't know what part of it.
I've also tried doing a minimal build, but it doesn't help. Neither does /MP, because it's a single source file anyway.
I could try dissecting the source code and figuring out which part is a bottleneck by adding/removing it, but that's a pain and doesn't scale. Furthermore, it's hard to remove something and still let the code compile -- error messages, if any, come back almost immediately.
Is there a better way to figure out what's slowing down the compilation?
Or, if there isn't a way: are there any language constructs (e.g. templates?) that take a lot longer to compile?
What I have in my C++ source code:
Three (relatively large) ATL dialog classes (including the definitions/logic).
They could very well be the cause, but they are the core part of the program anyway, so obviously they need to be recompiled whenever I change them.
Random one-line (or similarly small) utility functions, e.g. a byte-array-to-hex converter
References to (inline) classes found inside my header files. (One of the header files is gigantic, but it uses templates only minimally, and of course it's precompiled. The other one is the TR1 regex -- it's huge, but it's barely used.)
Note:
I'm looking for techniques that I can apply more generally in figuring out the cause of these issues, not specific recommendations for my very particular situation. Hopefully that would be more useful to other people as well.
Two general ways to improve compilation time:
instead of including headers in headers, use forward declarations (include the headers only in the source files)
minimize templated code (if you can avoid using templates)
These two rules alone will greatly improve your build time.
You can find more tricks in "Large-Scale C++ Software Design" by Lakos.
For Visual Studio (I am not sure whether it is too old), take a look at this: How should I detect unnecessary #include files in a large C++ project?
Template code generally takes longer to compile.
You could investigate using "compiler firewalls", which reduce the frequency of a .cpp file having to build (they can reduce time to read included files as well because of the forward declarations).
You can also shift time spent doing code generation from the compiler to the linker by using Link-Time Code Generation and/or Whole Program Optimization, though generally you lose time in the long run.

What's the rationale behind headers?

I don't quite understand the point of having a header; it seems to violate the DRY principle! All the information in a header is (can be) contained in the implementation.
It simplifies the compilation process. When you want to compile units independently, you need something to describe the parts that will be linked to without having to import the entirety of all the other files.
It also allows for code hiding. One can distribute a header to allow others to use the functionality without having to distribute the implementation.
Finally, it can encourage the separation of interface from implementation.
They are not the only way to solve these problems, but 30 years ago they were a good one. We probably wouldn't use header files for a language today, but they weren't invented in 2009.
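A tiny sketch of the "compile units independently" point (file names are invented): the header carries just enough of the module's shape for other units to be compiled against it without ever seeing the implementation.

// mathutil.h - what other translation units need to know
int triple(int x);

// mathutil.cpp - compiled on its own, e.g. g++ -c mathutil.cpp
// #include "mathutil.h"
// int triple(int x) { return 3 * x; }

// main.cpp - also compiled on its own, then linked against mathutil.o
// #include "mathutil.h"
// int main() { return triple(14); }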
The architects of many modern languages such as Java, Eiffel and C# clearly agree with you -- those languages extract the metadata about a module from the implementation. However, per se, the concept of headers doesn't preclude that -- it would obviously be a simple task for a compiler to extract a .h file while compiling a .c, for example, just like the compilers for those other languages do implicitly. The fact that typical current C compilers do not do it is not a language design issue -- it's an implementation issue; apparently there's no demand by users for such a feature, so no compiler vendor bothers implementing it.
As a language design choice, having separate .h files (in a human-readable and editable text format) gives you the best of both worlds: you can start separately compiling client code based on a module implementation that doesn't yet exist, if you wish, by writing the .h file by hand; or you (assuming by absurd a compiler implementation that supplies it;-) can get the .h file automatically from the implementation as a side effect of compiling it.
If C, C++, &c, keep thriving (apparently they're still doing fine today;-), and demand like yours for not manually writing headers grows, eventually compiler writers will have to supply the "header generation" option, and the "best of both worlds" won't stay theoretical!-)
It helps to think a bit about the capabilities of the computers that were available when, say, C was written. Main memory was measured in kilowords, and not necessarily very many of them. Disks were bigger, but not much. Serious storage meant reel-to-reel tapes, mounted by hand by grumpy operators who really wanted you to go away so they could play Hunt the Wumpus. A 1 MIPS machine was screaming fast. And with all these limitations you had to share it, possibly with a score of other users.
Anything that reduced the space or time complexity of compilation was a big win. And headers do both.
Don't forget the documentation a header provides. There is usually everything in it you need to know to use the module. I, for my part, don't want to scan through a long source file to learn what there is to use and how to call it... You would extract this information anyway, which effectively results in a header file. This is no longer an issue with modern IDEs, of course, but working with some old C code I really love to have hand-crafted header files that include comments about usage and about pre- and postconditions.
Keeping source, header and additional documentation in sync is still another can of worms...
The whole idea of inspecting the binary output files of language processors would have been hard to comprehend when C invented .h files. There was a system called JOVIAL that did something like it, but it was exotic and confined more-or-less exclusively to military projects. (I've never seen a JOVIAL program, I've only heard about it.)
So when C came out the usual design pattern for modularity was "no checks whatsoever". There might be a restriction that .text symbols could only link to .text and .data to .data, but that was it. That is, the compilers of the day typically processed one source file at a time and then linkers put them together without the slightest level of error checking other than, if you were lucky, "I'm a function symbol" vs "I'm a data symbol".
So the idea of actually having the compiler understand the thing you were calling was somewhat new.
Even today, if you make a totally bogus header, no one catches you in most AOT compilers. Clever things like CLR languages and Java actually do encode things in the class files.
So yes, in the long run, we probably won't have header files.
No, you don't have headers in Java -- but you do have interfaces, and every serious Java guru recommends that you define anything used by other projects/systems as an interface plus an implementation.
Let's see: a Java interface definition contains call signatures, type definitions and constants.
Most C header files contain call signatures, type definitions and constants.
So for all practical purposes, C/C++ header files are just interface definitions and should thus be considered a Good Thing. Now, I know it's possible to define a myriad of other things in header files as well (macros, constants, etc.), but that's just part of the whole wonderful world of C:
int target() {
    // Default for shoot
    return FOOT;
}
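For comparison, a hedged sketch of a C/C++ header acting purely as an interface definition (the mutex names are made up), containing exactly the three kinds of things listed above: call signatures, type definitions and constants.

// mutex_api.h - an "interface" in header form
typedef struct MutexHandle MutexHandle;     // type definition (opaque to callers)
extern const int MUTEX_DEFAULT_TIMEOUT_MS;  // constant, defined in the implementation file
MutexHandle* mutex_create(void);            // call signatures only; no implementation here
void mutex_destroy(MutexHandle* m);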
For more detail, read this:
A header file commonly contains forward declarations of classes, subroutines, variables, and other identifiers. Programmers who wish to declare standardized identifiers in more than one source file can place such identifiers in a single header file, which other code can then include whenever the header contents are required.
The C standard library and C++ standard library traditionally declare their standard functions in header files.
And what if you want to give somebody else the declarations to use your library without giving them the implementation?
As another answer points out, the original reason for headers was to make parsing and compilation easier on platforms with very simple and limited tools. It was a great step forward to have a machine with two floppies, so you could have the compiler on one and your code on the other; that made things a lot easier.
When you divide code into header and source files, you separate declaration from definition. Looking at the header files shows you what you have, and if you want to see implementation details you go to the source file.