Detecting superfluous #includes in C/C++? - c++

I often find that the headers section of a file get larger and larger all the time but it never gets smaller. Throughout the life of a source file classes may have moved and been refactored and it's very possible that there are quite a few #includes that don't need to be there and anymore. Leaving them there only prolong the compile time and adds unnecessary compilation dependencies. Trying to figure out which are still needed can be quite tedious.
Is there some kind of tool that can detect superfluous #include directives and suggest which ones I can safely remove?
Does lint do this maybe?

Google's cppclean (links to: download, documentation) can find several categories of C++ problems, and it can now find superfluous #includes.
There's also a Clang-based tool, include-what-you-use, that can do this. include-what-you-use can even suggest forward declarations (so you don't have to #include so much) and optionally clean up your #includes for you.
Current versions of Eclipse CDT also have this functionality built in: going under the Source menu and clicking Organize Includes will alphabetize your #include's, add any headers that Eclipse thinks you're using without directly including them, and comments out any headers that it doesn't think you need. This feature isn't 100% reliable, however.

Also check out include-what-you-use, which solves a similar problem.

It's not automatic, but doxygen will produce dependency diagrams for #included files. You will have to go through them visually, but they can be very useful for getting a picture of what is using what.

The problem with detecting superfluous includes is that it can't be just a type dependency checker. A superfluous include is a file which provides nothing of value to the compilation and does not alter another item which other files depend. There are many ways a header file can alter a compile, say by defining a constant, redefining and/or deleting a used macro, adding a namespace which alters the lookup of a name some way down the line. In order to detect items like the namespace you need much more than a preprocessor, you in fact almost need a full compiler.
Lint is more of a style checker and certainly won't have this full capability.
I think you'll find the only way to detect a superfluous include is to remove, compile and run suites.

I thought that PCLint would do this, but it has been a few years since I've looked at it. You might check it out.
I looked at this blog and the author talked a bit about configuring PCLint to find unused includes. Might be worth a look.

The CScout refactoring browser can detect superfluous include directives in C (unfortunately not C++) code. You can find a description of how it works in this journal article.

Sorry to (re-)post here, people often don't expand comments.
Check my comment to crashmstr, FlexeLint / PC-Lint will do this for you. Informational message 766. Section 11.8.1 of my manual (version 8.0) discusses this.
Also, and this is important, keep iterating until the message goes away. In other words, after removing unused headers, re-run lint, more header files might have become "unneeded" once you remove some unneeded headers. (That might sound silly, read it slowly & parse it, it makes sense.)

I've never found a full-fledged tool that accomplishes what you're asking. The closest thing I've used is IncludeManager, which graphs your header inclusion tree so you can visually spot things like headers included in only one file and circular header inclusions.

You can write a quick script that erases a single #include directive, compiles the projects, and logs the name in the #include and the file it was removed from in the case that no compilation errors occurred.
Let it run during the night, and the next day you will have a 100% correct list of include files you can remove.
Sometimes brute-force just works :-)
edit: and sometimes it doesn't :-). Here's a bit of information from the comments:
Sometimes you can remove two header files separately, but not both together. A solution is to remove the header files during the run and not bring them back. This will find a list of files you can safely remove, although there might a solution with more files to remove which this algorithm won't find. (it's a greedy search over the space of include files to remove. It will only find a local maximum)
There may be subtle changes in behavior if you have some macros redefined differently depending on some #ifdefs. I think these are very rare cases, and the Unit Tests which are part of the build should catch these changes.

I've tried using Flexelint (the unix version of PC-Lint) and had somewhat mixed results. This is likely because I'm working on a very large and knotty code base. I recommend carefully examining each file that is reported as unused.
The main worry is false positives. Multiple includes of the same header are reported as an unneeded header. This is bad since Flexelint does not tell you what line the header is included on or where it was included before.
One of the ways automated tools can get this wrong:
In A.hpp:
class A {
// ...
};
In B.hpp:
#include "A.hpp
class B {
public:
A foo;
};
In C.cpp:
#include "C.hpp"
#include "B.hpp" // <-- Unneeded, but lint reports it as needed
#include "A.hpp" // <-- Needed, but lint reports it as unneeded
If you blindly follow the messages from Flexelint you'll muck up your #include dependencies. There are more pathological cases, but basically you're going to need to inspect the headers yourself for best results.
I highly recommend this article on Physical Structure and C++ from the blog Games from within. They recommend a comprehensive approach to cleaning up the #include mess:
Guidelines
Here’s a distilled set of guidelines from Lakos’ book that minimize the number of physical dependencies between files. I’ve been using them for years and I’ve always been really happy with the results.
Every cpp file includes its own header file first. [snip]
A header file must include all the header files necessary to parse it. [snip]
A header file should have the bare minimum number of header files necessary to parse it. [snip]

If you are using Eclipse CDT you can try http://includator.com which is free for beta testers (at the time of this writing) and automatically removes superfluous #includes or adds missing ones. For those users who have FlexeLint or PC-Lint and are using Elicpse CDT, http://linticator.com might be an option (also free for beta test). While it uses Lint's analysis, it provides quick-fixes for automatically remove the superfluous #include statements.

This article explains a technique of #include removing by using the parsing of Doxygen. That's just a perl script, so it's quite easy to use.

CLion, the C/C++ IDE from JetBrains, detects redundant includes out-of-the-box. These are grayed-out in the editor, but there are also functions to optimise includes in the current file or whole project.
I've found that you pay for this functionality though; CLion takes a while to scan and analyse your project when first loaded.

Here is a simple brute force way of identifying superfluous header includes. It's not perfect but eliminates the "obvious" unnecessary includes. Getting rid of these goes a long way in cleaning up the code.
The scripts can be accessed directly on GitHub.

Maybe a little late, but I once found a WebKit perl script that did just what you wanted. It'll need some adapting I believe (I'm not well versed in perl), but it should do the trick:
http://trac.webkit.org/browser/branches/old/safari-3-2-branch/WebKitTools/Scripts/find-extra-includes
(this is an old branch because trunk doesn't have the file anymore)

There is a free tool Include File Dependencies Watcher which can be integrated in the visual studio. It shows superfluous #includes in red.

There's two types of superfluous #include files:
A header file actually not needed by
the module(.c, .cpp) at all
A header file is need by the module
but being included more than once, directly, or indirectly.
There's 2 ways in my experience that works well to detecting it:
gcc -H or cl.exe /showincludes (resolve problem 2)
In real world,
you can export CFLAGS=-H before make,
if all the Makefile's not override
CFLAGS options. Or as I used, you
can create a cc/g++ wrapper to add -H
options forcibly to each invoke of
$(CC) and $(CXX). and prepend the
wrapper's directory to $PATH
variable, then your make will all
uses you wrapper command instead. Of
course your wrapper should invoke the
real gcc compiler. This tricks
need to change if your Makefile uses
gcc directly. instead of $(CC) or
$(CXX) or by implied rules.
You can also compile a single file by tweaking with the command line. But if you want to clean headers for the whole project. You can capture all the output by:
make clean
make 2>&1 | tee result.txt
PC-Lint/FlexeLint(resolve problem
both 1 and 2)
make sure add the +e766 options, this warning is about:
unused header files.
pclint/flint -vf ...
This will cause pclint output included header files, nested header files will be indented appropriately.

clangd is doing that for you now. Possibly clang-tidy will soon be able to do that as well.

To end this discussion: the c++ preprocessor is turing complete. It is a semantic property, whether an include is superfluous. Hence, it follows from Rice's theorem that it is undecidable whether an include is superfluous or not. There CAN'T be a program, that (always correctly) detects whether an include is superfluous.

Related

Is there any way to know which headers are automatically included in C++

This is a follow-up question for this which says that
In C++, unlike C, standard headers are allowed to #include other standard headers.
Is there any way to know which headers were automatically included, since it may be difficult to guess which symbols are defined in which headers.
Motivation: My homework compiles and works correctly on my computer but TA told me it was not compiling and needed couple of headers (mutex and algorithm) to compile. How I can be sure the code I submit in future be bulletproof.
My compiler is not giving any warning about implicit declaration.
I'm using clang++ -std=c++11 to compile my code.
The standard lists the symbols made available by each header. There are no guarantees beyond that, neither that symbols which are obviously used nor that there not all symbols are declared. You'll need to include each header for any name you are using. You should not rely on indirect includes.
On the positive side, there is no case in the standard library where any of the standard library headers requires extra headers.
If you want to know what other headers a particular header file pulls, the easiest way to do so is to run the include file through the compiler's preprocessor phase only, instead of compiling it fully. For example, if you want to know what <iostream> pulls in, create a file containing only:
#include <iostream>
then preprocess it. With gcc, the -E option runs the preprocessor only, without compiling the file, and dumps the preprocessed file to standard output. The resulting output begins with:
# 1 "t.C"
That's my one-line source file.
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
Apparently, gcc automatically pulls in this header file, no matter what. This can be ignored.
# 1 "<command-line>" 2
# 1 "t.C"
# 1 "/usr/include/c++/6.2.1/iostream" 1 3
Ok, now we finally get to the actual #include statement in my one-line source file. That's where my <iostream> is:
# 36 "/usr/include/c++/6.2.1/iostream" 3
# 37 "/usr/include/c++/6.2.1/iostream" 3
# 1 "/usr/include/c++/6.2.1/x86_64-redhat-linux/bits/c++config.h" 1 3
Ok, so iostream itself #includes this "c++-config.h" header file, obviously an internal compiler header.
If I keep going, I can see that <iostream> pulls in, unsurprisingly, <ios>, <type_traits>, as well as C header files like stdio.h.
It shouldn't be too hard to write a quick little script that takes a header file, runs the compiler in preprocessing phase, and produces a nice, formatted list of all header files that got pulled in.
As far as I know, there is no way to do what you want.
If you try to compile your code on several example platforms, and it is successful, there is a greater chance that it will compile on any other platform, but there is no easy way to be sure.
In my experience, MinGW C++ headers use fewer #includes to each other. So MinGW can be a practical tool for checking portability.
This question has considerable overlap with Are there tools that help organizing #includes? , and the latter would nowadays be considered as "Off topic" because it's asking for external tools/resources. Strictly speaking, this question is only about a way to find indirect includes, but the goal is certainly the same.
The problem of using "the right" include statements is distressingly hard. What was described in the question is mainly about indirect inclusion: One header includes a certain symbol, but on another machine with another compiler and another header, the symbol might not be included.
Some of the comments and other answers suggested seemingly pragmatic but unrealistic approaches:
Try it out with another compiler: This is brittle, and does not tell you anything about a possible third compiler
Read the included headers, to see which other headers they include: No. The idea of headers is exactly the opposite, namely not having to read them, but just to use the API that they offer
Generate preprocessor output and write a tool to solve the problem semi-automatically: Nope. This won't solve the problem of different header versions anyhow.
Explicitly include the headers that you need for the thing that you are using: This is basically impossible to maintain. When you move one function from your file to another during a refactoring, you never know which includes you can safely remove in one file, and which ones you have to add to the other one.
(And when you change your includes, you'll likely break third-party code that included your headers, because everybody is facing the same problem...)
All this does not cover the caveat that indirect inclusion is not always a problem, but sometimes intended: When you want to use std::vector, you'd #include <vector>, and not #include <bits/stl_vector.h>, even though the latter contains the definition...
So the only realistic solution for this is to rely on tools that support the developer. Some tools are mentioned in the answers of Are there tools that help organizing #includes? , and originally, I mentioned CDT in a comment to the question three years ago, but I think it's worth being mentioned here as well:
Eclipse CDT (C/C++ Development Tooling)
This is an IDE that offers an "Organize Includes" feature. The features is described in the article at https://www.eclipse.org/community/eclipse_newsletter/2013/october/article3.php , which points out the difficulties and caveats (and I mentioned some of them above), and how they are tackled in CDT.
I have tried this out (quite a while ago), although only in a very simple test project. So I can say that it basically works, but considering the complexity of the task, this comes with a disclaimer: It's close to magic, but there might still be cases where it fails.
C++ is not intended to be parsed and compiled. This statement is true, because one of my previous answers here on stack overflow contained the line #define not certainly.
The article also points to another tool:
Include What You Use
I have not tried this out, but it seems to be quite actively maintained. The documentation pages also list some of the difficulties and goals. From a first glance, it does not seem to be as powerful and configurable as CDT (and of course, does not come in an IDE), but some might want to try it out.

Why does Xcode 4 include iostream in every header?

I noticed that newly created CPP files in Xcode ~4 all #include <iostream>. I never use any iostream functionality, so usually strip them out (hearing they can gradually slow build times from the Google blink team blog). Are there any useful generic functions of iostream that make including it all the time valuable? Such as instrumentation or reflection features that not having it everywhere would break?
It seems like a bold step to add everywhere - especially given how conservative many groups software engineering! - so feel there must be something important I'm missing.
Does anyone know the reason why this header has become so important it must be everywhere?
I sign juanchopanzas statement: There is no header that needs to be included everywhere. Each #include should only be in a file when it is really needed.

Fast way to identify necessary includes for C++ [duplicate]

This question already has answers here:
Is there anyway to figure out what STL header file has not been included directly?
(2 answers)
Closed 9 years ago.
On Linux, what is a fast way to identify what are the necessary #include statements that I need for a C++ project?
I mean, let's say someone gives you a snippet from the web, but fails to provide the necessary #include statements. Is there potentially a way where you can run a Linux command or compiler command option and identify which functions or classes are missing, and, as a bonus, identify on the hard drive where I might have these things in a header file.
Basically you need some analyzer to parse your sources and headers and build a complete dependency graph which it spits out in the end for you to read and process further.
I'd follow john's advice on g++ and Clang for this purpose but I highly doubt they got what it takes.
What you actually can do, at least with g++, is print out a graph for already existing includes. Use the -H option to print a tree or -M to get a list.
I also refer you to this related topic: Tool to track #include dependencies
Not exactly what you want, but the tools mentioned there might be helpful.
I think Clang's "include-what-you-use" tool is what you want.
If, by necessary you mean minimal (i.e. if A includes B and B includes C then A doesn't need to include C) I don't know of a fast way.
One good approach, however, is for each cpp file to include its own header file first (after any precompiled headers.) That insures that each header file includes (directly or indirectly) all the header files it needs to define the symbols used in the header.
Also a project of reasonable size should be designed in layers such that Layer A knows about/depends on layer B which depends on layer C, etc, but lower layers never include higher layers (i.e. C never includes anything from layer A)
In that case the includes in each cpp or hpp should be in Layer order (A, B, C). If you do this it is fairly easy to check to see if any of the layer C headers can be eliminated (comment them out temporarily) because one of the includes that comes before them has already included them. This happens quite a lot and can significantly reduce the number of #includes in each file.
Having said all of that, this is a much less critical issue than it used to be because compilers are smarter. A combination of #pragma once and precompiled headers can keep build times down without requiring that you spend a lot of time optimizing includes.
The best way I know of to find undefined identifiers in a program is just to try to compile it. Depending on exactly what compiler you’re using, you might be able simply to pipe the output of GCC or Clang into grep, looking for phrases like “undeclared identifier.”
As for determining where the symbols are defined, I would recommend as a starting point looking at Ctags to parse your system headers (best managed using a Makefile) and using the resulting tags table to look up anything grep catches from GCC.
The fastest way... that's not how you should think of it.
https://stackoverflow.com/a/18544093/2112028
I wrote a lovely (I'm quite proud :P) answer there talking about how linking works (with templates) and proving it works and such, understand that.
The goal of #include directives is to create a "translation unit" where every symbol is declared (even if not defined) there's an example in my answer where I simply copy and paste the prototype into a code file, rather than use include.
You ought not worry about the "fastest" way if you use something called "Header guards" (these are mentioned briefly right at the bottom, but this isn't sufficient detail) they go like this:
#ifndef __WHATEVER_H
#define __WHATEVER_H
/*Your code here*/
#endif
So now you can include "whatever.h" AS MANY times as you like. the first time IN THE TRANSLATION UNIT, will define __WHATEVER_H, so the next file that includes it (however many includes deep from the file being compiled) will be empty. as everything between the #ifndef and #endif will be gone.
Hope this helps.
Also if you have unnecessary inputs, use -Wextra and -Wall, GCC will tell you about unused functions, typedefs and so forth. you can use the pragma error push and pop things to control this. For example wxWidget's header files may contain a lot of unused things, so you push the warnings onto the stack, remove the unused warning flags, include the file, pop the warnings stack (turning them back on), less you get thousands of lines of warnings.

Abolish include-files in C++

Suppose i have the following code (literally) in a C++ source file:
// #include <iostream> // superfluous, commented-out
using std::cout;
using std::endl;
int main()
{
cout << "Hello World" << endl;
return 0;
}
I can compile this code even though #include <iostream> is commented-out:
g++ -include my_cpp_std_lib_hack source.cpp
Where my_cpp_std_lib_hack is a file in some central location that includes all the files of the C++ Standard Library:
#include <ciso646>
#include <climits>
#include <clocale>
...
#include <valarray>
#include <vector>
Of course, i can use proper compilation options for all compilers i care about (that being MS Visual Studio and maybe a few others), and i also use precompiled headers.
Using such a hack gives me the following advantages:
Fast compilation (because all of the Standard Library is precompiled)
No need to add #includes when all i want is adding some debugging output
No need to remember or look up all the time where the heck std::max is declared
A feeling that the STL is magically built-in to the language
So i wonder: am i doing something very wrong here?
Will this hack break down when writing large projects?
Maybe everyone else already uses this, and no one told me?
So i wonder: am i doing something very wrong here?
Yes. Sure, your headers are precompiled, but the compiler still has to do things like name lookups on the entire included mass of stuff which slows down compilation.
Will this hack break down when writing large projects?
Yes, that's pretty much the problem. Plus, if anyone else looks at that code, they're going to be wondering where std::cout (well, assume that's a user defined type) came from. Without the #includes they're going to have no idea whatsoever.
Not to mention, now you have to link against a ton of standard library features that you may have (probably could have) avoided linking against in the first place.
If you want to use precompilation that's fine, but someone should be able to build each and every implementation file even when precompilation is disabled.
The only thing "wrong" is that you are relying upon a compiler-specific command-line flag to make the files compilable. You'd need to do something different if not using GCC. Most compilers probably do provide an equivalent feature, but it is best to write portable source code rather than to unnecessarily rely on features of your specific build environment.
Other programmers shouldn't have to puzzle over your Makefiles (or Ant files, or Eclipse workspaces, or whatever) to figure out how things are working.
This may also cause problems for users of IDE's. If the IDE doesn't know what files are being included, it may not be able to provide automatic completion, source browsing, refactoring, and other such features.
(FWIW, I do think it is a good idea to have one header file that includes all of the Standard Library headers that you are using in your project. It makes precompilation easier, makes it easier to port to a non-standard environment, and also helps deal with those issues that sometimes arise when headers are included in different orders in different source files. But that header file should be explicitly included by each source file; there should be no magic.)
Forget the compilation speed-up - a precompiled header with templates isn't really "precompiled" except for the name and the parse, as far as I've heard. I won't believe in the compilation speed up until I see it in the benchmarks. :)
As for the usefulness:
I prefer to have an IDE which handles my includes for me (this is still bad for C++, but Eclipse already adds known includes with ctrl+shift+n with... well, acceptable reliability :)).
Doing 'clandestine' includes like this would also make testing more difficult. You want to compile a smallest-possible subset of code when testing a particular component. Figuring out what that subset is would be difficult if the headers/sources aren't being honest about their dependencies, so you'd probably just drag your my_cpp_std_lib_hack into every unit test. This would increase compilation time for your test suites a lot. Established code bases often have more than three times as much test code as regular code, so this is likely to become an issue as your code base grows.
From the GCC manual:
-include file
Process file as if #include "file" appeared as the first line of the
primary source file. However, the
first directory searched for file is
the preprocessor's working directory
instead of the directory containing
the main source file. If not found
there, it is searched for in the
remainder of the #include "..." search
chain as normal.
So what you're doing is essentially equivalent to starting each file with the line
#include "my_cpp_std_lib_hack"
which is what Visual Studio does when it gathers up commonly-included files in stdafx.h. There are some benefits to that, as outlined by others, but your approach hides this include in the build process, so that nobody who looked directly at one of your source files would know of this hidden magic. Making your code opaque in this way does not seem like a good style to me, so if you're keen on all the precompiled header benefits I suggest you explicitly include your hack file.
You are doing something very wrong. You are effectively including lots of headers that may not be needed. In general, this is a very bad idea, because you are creating unnecessary dependencies, and a change in any header would require recompilation of everything. Even if you are avoiding this by using precompiled headers, you are still linking to lots of object that you may not need, making your executable much larger than it needs to be.
There is really nothing wrong with the standard way of using headers. You should include everything you are using, and no more (forward declarations are your friends). This makes code easier to follow, and helps you keep dependencies under control.
We try not to include the unused or even the rarely used stuff for example in VC++ there is
#define WIN32_LEAN_AND_MEAN //exclude rarely used stuff
and what we hate in MFC is that if u want to make a simple application u will produce large executable file with the whole library (if statically linked), so it's not a good idea what if u only want to use the cout while the other no??
another thing i don't like to pass arguments via command line coz i may leave the project for a while, and forget what are the arguments... e.g. i prefer using
#pragma (comment, "xxx.lib")
than using it in command line, it reminds me at least with what file i want
That's is my own opinion make your code stable and easy to compile in order to to rot as code rotting is a very nasty thing !!!!!

Optimizing Headers

I'm looking for a tool which can do at least one of two things.
Guess what headers might be unused and can be removed.
Guess which headers should be included in a file, but are included indirectly through inclusion of other files. Thus allows proper compilation of the file.
Is there such a tool?
You could use GCC warnings "-Wmissing-declarations" and "-Wredundant-decls". It's not precisely what you want, but might help a lot.
A colleague of mine wrote a very simple script to achieve part of this (and slow too...).
Basically the idea is to try to comment each include in turn and then try to compile the object, it doesn't deal with include within headers but already remove a substantial number of unused files :)
EDIT:
Pseudo code of the algorithm
for s in sourceFiles:
while t := commentNextInclude(s):
if compilationOk(): s := t
As I said, comment each #include in turn, and each time check if the program still compiles, if it does, validate the commenting, and move on to the next one.
I don't have the rights to disclose the script source though.