Getting only the necessary headers out of Boost - C++

I need to submit an assignment, but I only want to include the header files from Boost that I actually used (I made use of boost::shared_ptr and boost::function). I tried doing so manually, but I'm missing some header files, and every time I go to add them, it turns out I'm missing more. Is there a quick, easy way to find out exactly which headers I actually need?
Thanks

The bcp command is made for this:
NAME
   bcp - extract subsets of Boost
SYNOPSIS
   bcp --list [options] module-list
   bcp [options] module-list output-path
   bcp --report [options] module-list html-file
   bcp --help
DESCRIPTION
   Copies all the files, including dependencies, found in module-list to
   output-path. output-path must be an existing path.
But you will probably be surprised to see just how interdependent these Boost headers are.
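For example, to extract just the pieces needed for shared_ptr and function (a sketch; run it from the Boost source directory, or point at it with --boost=<path>; as the man page says, the output directory must already exist):
mkdir boost-subset
bcp shared_ptr function boost-subset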

There is a tool called bcp to do exactly that -- copy out the parts of Boost you need and no more.

There is actually another solution to your issue: the preprocessor.
The compiler you use should have a switch to run only the preprocessor: -E on gcc and clang. Given this, you can preprocess the two files you include, and stash the result of each run into a header file of your own.
Add header guards, include the already preprocessed headers in lieu of the regular boost headers, and you're done.
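Concretely, generating the two preprocessed headers might look like this (a sketch; gcc/clang syntax):
echo '#include <boost/shared_ptr.hpp>' | g++ -E -x c++ - > my_shared_ptr.h
echo '#include <boost/function.hpp>' | g++ -E -x c++ - > my_function.h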
Of course there might be some repetition between the two headers; a diff tool could potentially help you spot it and factor it out into another common header... but for an assignment I would certainly not bother.
You might also consider telling your teacher that, since they do not ask you to provide the standard library headers you compiled with, they should not be asking for the Boost headers you used.

Correct installation of config.h for shared library using autotools

I am converting a C++ program which uses the autotools build system to use a shared library, introducing the use of libtool. Most of the program functionality is being placed in the shared library, which is loaded by the main program, so that the common code can be accessed by other programs in future.
Throughout the program and library sources the autoheader generated config.h is used with the usual macro:
#if HAVE_CONFIG_H
# include <config.h>
#endif
In configure.ac I use the macro to generate it:
AC_CONFIG_HEADERS([config.h])
My question is, do I need to install config.h for others to be able to use my library, and if so, what is the appropriate way to do it, and should it be renamed to avoid conflicts etc?
The most information I have found on this is here:
http://www.openismus.com/documents/linux/building_libraries/building_libraries#installingheaders
But this is hardly an official source.
Never ever install autoheader's config.h.
The last thing the users of your library need is interference from the macros leaking out of your config.h. Your library may define HAVE_FOOBAR, but my software might be compiled in a way that foobar is disabled, so that HAVE_FOOBAR will break my compilation.
The AX_PREFIX_CONFIG_H macro from the Autoconf Archive is a workaround, where everything gets prefixed.
A better approach is to create a template file (e.g. blargconfig.h.in) with lines like:
typedef @BLARG_TYPE@ blarg_int_t;
@BLARG_RANDOM_INCLUDE@
And then AC_SUBST() those variables in configure.ac:
AC_SUBST(BLARG_TYPE, ["unsigned short"])
AC_SUBST(BLARG_RANDOM_INCLUDE, ["#include <somerandomheader.h>"])
Then list it as an output file:
AC_CONFIG_FILES([Makefile
                 src/Makefile
                 ...
                 include/blargconfig.h])
The .h file should be listed with nodist_include_HEADERS; the .h.in file will be automatically distributed because it's listed in AC_CONFIG_FILES.
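With the substitutions above, the generated include/blargconfig.h would then contain something like (illustrative):
/* blargconfig.h -- generated by configure */
typedef unsigned short blarg_int_t;
#include <somerandomheader.h>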
Destination for such files is commonly $libdir/packagename/include. See GLib for example, although they generate glibconfig.h without a template (by writing the whole creation code inline in configure.ac, as the autobook suggests). I find this approach less maintainable than using AC_SUBST, but it's more flexible.
Of course, to help the compiler find the platform-dependent header you'll probably also want to write a pkgconfig script, like GLib does.
You will need to install config.h if it affects the interface. In practical terms: if the #defines are required by the header(s), not just the .cc implementation / compilation units.
If config.h is a problem, you can specify another name in the AC_CONFIG_HEADERS macro. e.g., AC_CONFIG_HEADERS([foo_config.h]).
The easiest way to install the header, assuming automake, is with:
nodist_include_HEADERS = foo_config.h
in the top-level Makefile.am. The nodist prefix tells automake that foo_config.h is generated rather than distributed with the package.
If not using automake, install foo_config.h in $includedir. $exec_prefix/include is arguably a more correct location for a generated header, but in practice the former location is fine.
I avoid using config.h by passing relevant definitions in CPPFLAGS or foo_CPPFLAGS (via AC_SUBST) to Makefile.am for source builds, or by AC_SUBSTing into a foo.h.in to generate headers at configure time. A lot of config.h is test-generated noise. This requires more infrastructure, but it's what I prefer. I wouldn't recommend this approach unless you're comfortable with the autotools.
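A minimal sketch of that setup (the FOO_WITH_THREADS flag is made up):
# configure.ac
AC_SUBST([FOO_CPPFLAGS], ["-DFOO_WITH_THREADS=1"])
# Makefile.am
AM_CPPFLAGS = $(FOO_CPPFLAGS)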

Using 3rd party header files with Rcpp

I have a header file called coolStuff.h that contains a function awesomeSauce(arg1) that I would like to use in my cpp source file.
Directory Structure:
RworkingDirectory
  sourceCpp
    theCppFile.cpp
  cppHeaders
    coolStuff.h
The Code:
#include <Rcpp.h>
#include <cppHeaders/coolStuff.h>
using namespace Rcpp;
// [[Rcpp::export]]
double someFunctionCpp(double someInput){
  double someOutput = awesomeSauce(someInput);
  return someOutput;
}
I get the error:
theCppFile.cpp:2:31: error: cppHeaders/coolStuff.h: No such file or directory
I have moved the file and directory all over the place and can't seem to get this to work. I see examples all over the place of using 3rd party headers that say just do this:
#include <boost/array.hpp>
(That's from Hadley/devtools)
https://github.com/hadley/devtools/wiki/Rcpp
So what gives? I have been searching all morning and can't find an answer to what seems to me like a simple thing.
UPDATE 01.11.12
Ok, now that I have figured out how to build packages that use Rcpp in RStudio, let me rephrase the question. I have a stand-alone header file coolStuff.h that contains a function I want to use in my cpp code.
1) Where should I place coolStuff.h in the package directory structure so the function it contains can be used by theCppFile.cpp?
2) How do I include coolStuff.h in the .cpp files? Thanks again for your help. I learned a lot from the last conversation.
Note: I read the vignette "Writing a package that uses Rcpp" and it does not explain how to do this.
The Answer:
Ok, let me summarize the answer to my question, since it is scattered across this page. If I get a detail wrong, feel free to edit this or let me know and I will edit it:
So you found a .h or .cpp file that contains a function or some other bit of code you want to use in a .cpp file you are writing to use with Rcpp.
Let's keep calling this found code coolStuff.h and call the function you want to use awesomeSauce(). Let's call the file you are writing theCppFile.cpp.
(I should note here that the code in .h files and in .cpp files is all C++ code, and the difference between them is for the C++ programmer to keep things organized in the proper way. I will leave a discussion of the difference out here, but a simple search here on SO will lead you to discussions of it. For you, the R programmer needing to use a bit o' code you found, there is no real difference.)
IN SHORT: You can use a file like coolStuff.h, provided it calls no other libraries, by either cut-and-pasting it into theCppFile.cpp, or, if you create a package, by placing the file in the src/ directory with theCppFile.cpp and putting #include "coolStuff.h" at the top of the file you are writing. The latter is more flexible and allows you to use the functions in coolStuff.h in other .cpp files.
DETAILS:
1) coolStuff.h must not call other libraries. That means it cannot have any #include statements at the top. If it does, what I detail below probably will not work, and the use of found code that calls other libraries is beyond the scope of this answer.
2) If you want to compile the file with sourceCpp(), you need to cut and paste coolStuff.h into theCppFile.cpp. I am told there are exceptions, but sourceCpp() is designed to compile one .cpp file, so that's the best route to take.
(NOTE: I make no guarantees that a simple cut and paste will work out of the box. You may have to rename variables, or more likely switch the data types being used to be consistent with those you are using in theCppFile.cpp. But so far, cut-and-paste has worked with minimal fuss for me with 6 different simple .h files)
3) If you only need to use code from coolStuff.h in theCppFile.cpp and nowhere else, then you should cut and paste it into theCppFile.cpp.
(Again I make no guarantees see the note above about cut-and-paste)
4) If you want to use code contained in coolStuff.h in theCppFile.cpp AND other .cpp files, you need to look into building a package. This is not hard, but can be a bit tricky, because the information out there about building packages with Rcpp ranges from the exhaustive, thorough documentation you want with any R package (but that is above your head as a newbie) to newbie-sensitive introductions (that may leave out a detail you happen to need).
Here is what I suggest:
A) First get a version of theCppFile.cpp with the code from coolStuff.h cut-and-pasted in that compiles with sourceCpp() and works as you expect. This is not a must, but if you are new to Rcpp OR packages, it is nice to make sure your code works in this simple situation before you move to the more complicated case below.
B) Now build your package using Rcpp.package.skeleton() or use the Build functionality in RStudio (HIGHLY recommended). You can find details about using Rcpp.package.skeleton() in hadley/devtools or the Rcpp Attributes vignette. The full documentation for writing packages with Rcpp is in the vignette Writing a package that uses Rcpp; however, it assumes you know your way around C++ fairly well and does not use the new "attributes" way of doing Rcpp.
Don't forget to "Build & Reload" if using RStudio or compileAttributes() if you are not in RStudio.
C) Now you should see in your R/ directory a file called RcppExports.R. Open it and check it out. In RcppExports.R you should see the R wrapper functions for all the .cpp files you have in your src/ directory. Pretty sweet.
D) Try out the R function that corresponds to the function you wrote in theCppFile.cpp. Does it work? If so move on.
E) With your package built, you can move coolStuff.h into the src folder with theCppFile.cpp.
F) Now you can remove the cut-and-paste code from theCppFile.cpp and, at the top of theCppFile.cpp (and any other .cpp file you want to use code from coolStuff.h in), put #include "coolStuff.h" just after #include <Rcpp.h>. Note that there are no angle brackets around coolStuff.h; rather there are "". This is the C++ convention when including local files provided by the user, rather than a library file like Rcpp or the STL.
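So the top of theCppFile.cpp ends up looking like this (sketch):
#include <Rcpp.h>
#include "coolStuff.h" // local header in src/, hence the quotes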
G) Now you have to rebuild the package. In RStudio this is just "Build & Reload" in the Build menu. If you are not using RStudio, you should run compileAttributes().
H) Now try the R function again just as you did in step D), hopefully it works.
The problem is that sourceCpp is expressly designed to build only a single standalone source file. If you want sourceCpp to have dependencies then they need to either be:
In the system include directories (i.e., /usr/local/include or /usr/include); or
In an R package which you list in an Rcpp::depends attribute
As Dirk said, if you want to build more than one source file then you should consider using an R package rather than sourceCpp.
Note that if you are working on a package and perform a sourceCpp on a file within the src directory of the package it will build it as if it's in the package (i.e. you can include files from the src directory or inst/include directory).
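For the package-dependency route, a minimal sketch (the BH package ships the Boost headers):
// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
#include <boost/shared_ptr.hpp>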
I was able to link any library (MPFR in this case) by setting two environment variables before calling sourceCpp:
Sys.setenv("PKG_CXXFLAGS"="-I/usr/include")
Sys.setenv("PKG_LIBS"="-L/usr/lib/x86_64-linux-gnu/ -lm -lmpc -lgmp -lmpfr")
The first variable contains the path of the library headers. The second one gives the linker search path plus the libraries to link against; in this case other dependent libraries are also required. For more details, check the g++ compilation and link flags. This information can usually be obtained using pkg-config:
pkg-config --cflags --libs mylib
For a better understanding, I recommend using sourceCpp with verbose output in order to print the g++ compilation and linking commands:
sourceCpp("mysource.cpp", verbose=TRUE, rebuild=TRUE)
I was able to link a boost library using the following global command in R before calling sourceCpp
Sys.setenv("PKG_CXXFLAGS"="-I/path-to-boost")
Basically mirroring this post but with a different compiler option: http://gallery.rcpp.org/articles/first-steps-with-C++11/
A couple of things:
"Third-party header libraries", as in your subject, makes no sense.
Third-party headers can work via templated code where headers are all you need, i.e., there is only an include step and the compiler resolves things.
Once you need libraries and actual linking of object code, you may not be able to use the powerful and useful sourceCpp unless you give it meta-information via plugins (or environment variables).
So in that case, write a package.
Easy and simple things are just that with Rcpp and the new attributes, or the older inline and cxxfunction. More complex use --- and external libraries are more complex --- means you need to consult the documentation. We added several vignettes to Rcpp for that.
Angle brackets <> are for system includes, such as the standard libs.
For files local to your own project, use quotes: "".
Also, if you are placing headers in a different directory, the header path should be specified relative to the source file that includes it.
So for your example this ought to work:
#include "../cppHeaders/coolStuff.h"
You can configure the search paths such that the file could be found without doing that, but it's generally only worth doing that for stuff you want to include across multiple projects, or otherwise would expect someone to 'install'.
We can add it by writing the path to the header in the PKG_CXXFLAGS variable of the ~/.R/Makevars file, as shown below. The following is an example that adds the header files of xtensor installed with Anaconda on macOS.
$ cat ~/.R/Makevars
CC=/usr/local/bin/gcc-7
CXX=/usr/local/bin/g++-7
CPLUS_INCLUDE_PATH=/opt/local/include:$CPLUS_INCLUDE_PATH
PKG_CXXFLAGS=-I/Users/kuroyanagi/.pyenv/versions/miniconda3-4.3.30/include
LD_LIBRARY_PATH=/opt/local/lib:$LD_LIBRARY_PATH
CXXFLAGS= -g0 -O3 -Wall
MAKE=make -j4
This worked for me in Windows:
Sys.setenv("PKG_CXXFLAGS"='-I"C:/boost/boost_1_66_0"')
Edit: Actually you don't need this if you use the Boost headers from the BH package (thanks to Ralf Stubner):
// [[Rcpp::depends(BH)]]

Using Boost unordered_map

I want to include boost::unordered_map in my project without downloading the whole Boost package. How can I do that?
Use bcp: http://www.boost.org/doc/libs/1_52_0/tools/bcp/doc/html/index.html
cd $BOOST_DIR
bcp unordered_map /tmp/TEST
Now /tmp/TEST contains only the things required for unordered_map, in my case 15Mb (as opposed to 734Mb for the full boost library)
You will need more than one header, because the Boost libraries depend on each other. You might want to select only the needed header files by hand, but it will really be a pain in the neck and will take you many hours. The algorithm is:
Include only boost/unordered_map.hpp.
While the preprocessor complains about a header that is not found:
Add that header.
Recompile.
You will end up with only the necessary headers. But I can't see any advantage of this solution.
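That loop can be semi-automated with a small shell script (a sketch: it assumes the Boost sources live under $BOOST_ROOT, copies headers into the current directory, and relies on gcc-style "No such file or directory" messages):
while ! g++ -I. -c main.cpp 2> err.log; do
    # grab the first missing boost header named in the error log
    missing=$(grep -o 'boost/[^:]*\.hpp' err.log | head -n 1)
    [ -z "$missing" ] && break   # some unrelated compile error: stop
    mkdir -p "$(dirname "$missing")"
    cp "$BOOST_ROOT/$missing" "$missing"
done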

Combining C++ header files

Is there an automated way to take a large amount of C++ header files and combine them in a single one?
This operation must, of course, concatenate the files in the right order so that no types, etc. are defined before they are used in upcoming classes and functions.
Basically, I'm looking for something that allows me to distribute my library in two files (libfoo.h, libfoo.a), instead of the current bunch of include files + the binary library.
As your comment says:
.. I want to make it easier for library users, so they can just do one single #include and have it all.
Then you could just spend some time including all your headers in a "wrapper" header, in the right order. 50 headers are not that many. Just do something like:
// libfoo.h
#include "header1.h"
#include "header2.h"
// ..
#include "headerN.h"
This will not take that much time if you do it manually.
Also, adding new headers later is a matter of seconds: just add them to this wrapper header.
In my opinion, this is the simplest, cleanest working solution.
A little bit late, but here it is. I recently stumbled into this same problem myself and coded this solution: https://github.com/rpvelloso/oneheader
How does it work?
Your project's folder is scanned for C/C++ headers and a list of the headers found is created;
For every header in the list, its #include directives are analyzed and a dependency graph is assembled in the following way:
If the included header is not located inside the project's folder, then it is ignored (e.g., if it is a system header);
If the included header is located inside the project's folder, then an edge is created in the dependency graph, linking the included header to the current header being analyzed;
The dependency graph is topologically sorted to determine the correct order in which to concatenate the headers into a single file. If a cycle is found in the graph (i.e., it is not a DAG), the process is interrupted;
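The topological sort in step 3 is the heart of the tool. A minimal C++ sketch of the idea (hypothetical code, not taken from oneheader):
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// includes[h] = the project headers that h itself #includes;
// h must appear after all of them in the combined file.
// Assumes every project header appears as a key in the map.
std::vector<std::string> headerOrder(
    const std::map<std::string, std::set<std::string>>& includes)
{
    std::map<std::string, std::vector<std::string>> includedBy;
    std::map<std::string, std::size_t> pending; // unmet dependencies per header
    for (const auto& entry : includes) {
        pending[entry.first] = entry.second.size();
        for (const auto& dep : entry.second)
            includedBy[dep].push_back(entry.first);
    }
    std::vector<std::string> order;
    std::vector<std::string> ready; // headers whose dependencies are all emitted
    for (const auto& p : pending)
        if (p.second == 0) ready.push_back(p.first);
    while (!ready.empty()) {
        const std::string header = ready.back();
        ready.pop_back();
        order.push_back(header);
        for (const auto& user : includedBy[header])
            if (--pending[user] == 0) ready.push_back(user);
    }
    if (order.size() != includes.size())
        throw std::runtime_error("include cycle detected: not a DAG");
    return order;
}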
Limitations:
It currently only detects single-line #include directives;
It does not handle headers with the same name in different paths;
It only gives you the correct order in which to combine all the headers; you still need to concatenate them yourself (maybe you want to remove or modify some of them prior to merging).
Compiling:
g++ -Wall -ggdb -std=c++1y -lstdc++fs oneheader.cpp -o oneheader[.exe]
Usage:
./oneheader[.exe] project_folder/ > file_sequence.txt
(Adapting an answer to my dupe question:)
There are several other libraries which aim for a single-header form of distribution, but are developed using multiple files; and they too need such a mechanism. For some (most?) it is opaque and not part of the distributed code. Luckily, there is at least one exception: Lyra, a command-line argument parsing library; it uses a Python-based include file fuser/joiner script, which you can find here.
The script is not well-documented, but the way you use it is with 3 command-line arguments:
--src-include - The include file to convert, i.e. to merge its include directives into its body. In your case it's libfoo.h which includes the other files.
--dst-include - The output file to write - the result of the merging.
--src-include-dir - The directory relative to which include files are specified (i.e. an "include search path" of one directory; the script doesn't support the complex mechanism of multiple include paths and search priorities which the C++ compiler offers)
The script acts recursively, so if file1.h includes another file under the --src-include-dir, that should be merged in as well.
Now, I could nitpick at the code of that script, but - hey, it works and it's FOSS - distributed with the Boost license.
If your library is so big that you cannot build and maintain a single wrapping header file like Kiril suggested, this may mean that it is not architected well enough.
So if your library is really huge (above a million lines of source code), you might consider automating that, with tools like:
GCC's make-dependency-generator preprocessor options (-M, -MD, -MF, etc.), with another hand-made script to sort the output
expensive commercial static analysis tools like Coverity
customizing a compiler through plugins or (for GCC 4.6) MELT extensions
But I don't understand why you want an automated way of doing this. If the library is of reasonable size, you should understand it and be able to write and maintain a wrapping header by hand. Automating that task will take you some efforts (probably weeks, not minutes) so is worthwhile only for very large libraries.
If you have a master include file that includes all others available, you could simply hack a C preprocessor re-implementation in Perl. Process only ""-style includes and recursively paste the contents of these files. Should be a twenty-liner.
If not, you have to write one up yourself or try at random. Automatic dependency tracking in C++ is hard. Like in "let's see if this template instantiation causes an implicit instantiation of the argument class" hard. The only automated way I see is to shuffle your include files into a random order, see if the whole bunch compiles, and re-shuffle them until it compiles. Which will take n! time, you might be better off writing that include file by hand.
While the first variant is easy enough to hack, I doubt the sensibility of this hack, because you want to distribute on a package level (source tarball, deb package, Windows installer) instead of a file level.
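For reference, the first variant -- a recursive paster for ""-style includes -- might look like this in Python rather than Perl (a sketch; no search paths, and a visited set stands in for include guards):
import re
import sys

def paste(path, seen=None):
    # recursively inline local ("")-style includes; leave <> includes untouched
    if seen is None:
        seen = set()
    if path in seen:                 # crude stand-in for include guards
        return ""
    seen.add(path)
    out = []
    with open(path) as f:
        for line in f:
            m = re.match(r'\s*#\s*include\s*"([^"]+)"', line)
            out.append(paste(m.group(1), seen) if m else line)
    return "".join(out)

if __name__ == "__main__":
    sys.stdout.write(paste(sys.argv[1]))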
You really need a build script to generate this as you work, and a preprocessor flag to disable use of the amalgamation (that switch being for your own use).
To simplify this script/program, it helps to have your header structures and include hygiene in top form.
Your program/script will need to know your discovery paths (hint: minimise the count of search paths to one if possible).
Run the script or program (which you create) to replace include directives with header-file contents.
Assuming your headers are all guarded, as is typical, you can keep track of which files you have already physically included and perform no action if they are requested again. If a header is not found, leave it as-is (as an include directive) -- this is required for system/third-party headers -- unless you use a separate header for external includes (which is not at all a bad idea).
It's good to have a build phase/translation unit that includes the header alone and produces zero warnings or errors (warnings as errors).
Alternatively, you can create a special distribution repository so they never need to do more than pull from it occasionally.
What you want to do sounds "javascriptish" to me :-). But if you insist, there is always cat (or the equivalent in Windows):
$ cat file1.h file2.h file3.h > my_big_file.h
Or if you are using gcc, create a file my_decent_lib_header.h with the following contents:
#include "file1.h"
#include "file2.h"
#include "file3.h"
and then use
$ gcc -C -E my_decent_lib_header.h -o my_big_file.h
and this way you even get file/line directives that will refer to the original files (although that can be disabled, if you wish).
As for how automatic this is with respect to file order: not at all; you have to decide the order yourself. In fact, I would be surprised to hear of a tool that orders header dependencies correctly in all cases for C/C++.
Usually you don't want to include every bit of information from all your headers in the special header that enables the potential user to actually use your library. The non-trivial removal of type definitions, further includes, or defines that are not necessary for the user of your interface cannot, as far as I know, be automated.
Short answer to your main question:
No.
My suggestions:
manually make a new header that contains all the relevant information (nothing more, nothing less) for the user of your library interface. Add nice documentation comments for each component it contains.
use forward declarations where possible, instead of full-fledged included definitions. Put the actual includes in your implementation files. The fewer include statements you have in your headers, the better (see the sketch after this list).
don't build a deeply nested hierarchy of includes. This makes it extremely hard to keep an overview of the contents of every bit you include. The user of your library will look into the header to learn how to use it, and will probably not be able to distinguish relevant code from irrelevant at first sight. You want to maximize the ratio of relevant code to total code in the main header of your library.
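A minimal sketch of the forward-declaration suggestion (all names are made up):
// foo.h -- the public interface header
class FooImpl;               // forward declaration: no #include "foo_impl.h" needed

class Foo {
public:
    Foo();
    ~Foo();                  // defined in foo.cpp, where FooImpl is complete
    void doStuff();
private:
    FooImpl* impl;           // a pointer to an incomplete type is fine here
};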
EDIT
If you really do have a toolkit library, and the order of inclusion really does not matter, and you have a bunch of independent headers that you want to enumerate into a single header just for convenience, then you can use a simple script, like the following Python (untested):
import glob

with open("convenience_header.h", 'w') as f:
    for header in glob.glob("*.h"):
        if header != "convenience_header.h":  # don't include the output file itself
            f.write('#include "%s"\n' % header)

Header file inclusion static analysis tools?

A colleague recently revealed to me that a single source file of ours includes over 3,400 headers at compile time. We have over 1,000 translation units that get compiled in a build, resulting in a huge performance penalty from headers that surely aren't all needed.
Are there any static analysis tools that would be able to shed light on the trees in such a forest, specifically giving us the ability to decide which ones we should work on paring out?
UPDATE
Found some interesting information on the cost of including a header file (and the types of include guards to optimize its inclusion) here, originating from this question.
The output of gcc -w -H <file> might be useful, if you parse it and add some counts. The -w is there to suppress all warnings, which might otherwise be awkward to deal with.
From the gcc docs:
-H
    Print the name of each header file used, in addition to other normal activities. Each name is indented to show how deep in the #include stack it is. Precompiled header files are also printed, even if they are found to be invalid; an invalid precompiled header file is printed with ...x and a valid one with ...!.
The output looks like this:
. /usr/include/unistd.h
.. /usr/include/features.h
... /usr/include/bits/predefs.h
... /usr/include/sys/cdefs.h
.... /usr/include/bits/wordsize.h
... /usr/include/gnu/stubs.h
.... /usr/include/bits/wordsize.h
.... /usr/include/gnu/stubs-64.h
.. /usr/include/bits/posix_opt.h
.. /usr/include/bits/environments.h
... /usr/include/bits/wordsize.h
.. /usr/include/bits/types.h
... /usr/include/bits/wordsize.h
... /usr/include/bits/typesizes.h
.. /usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5.2/include/stddef.h
.. /usr/include/bits/confname.h
.. /usr/include/getopt.h
. /usr/include/stdio.h
.. /usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5.2/include/stddef.h
.. /usr/include/libio.h
... /usr/include/_G_config.h
.... /usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5.2/include/stddef.h
.... /usr/include/wchar.h
... /usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5.2/include/stdarg.h
.. /usr/include/bits/stdio_lim.h
.. /usr/include/bits/sys_errlist.h
Multiple include guards may be useful for:
/usr/include/bits/confname.h
/usr/include/bits/environments.h
/usr/include/bits/predefs.h
/usr/include/bits/stdio_lim.h
/usr/include/bits/sys_errlist.h
/usr/include/bits/typesizes.h
/usr/include/gnu/stubs-64.h
/usr/include/gnu/stubs.h
/usr/include/wchar.h
If you are using gcc/g++, the -M or -MM option will output a line with the information you seek. (The former will include system headers while the latter will not. There are other variants; see the manual.)
$ gcc -M -c foo.c
foo.o: foo.c /usr/include/stdint.h /usr/include/features.h \
/usr/include/sys/cdefs.h /usr/include/bits/wordsize.h \
/usr/include/gnu/stubs.h /usr/include/gnu/stubs-64.h \
/usr/include/bits/wchar.h
You would need to remove the foo.o: foo.c at the beginning, but the rest is a list of all headers that the file depends on, so it would not be too hard to write a script to gather these and summarize them.
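For example, this pipeline (a sketch; assumes gcc-style -M output and GNU userland) counts how often each header is pulled in across all translation units:
gcc -M *.c | tr ' \\' '\n\n' | grep -E '\.(h|hpp)$' | sort | uniq -c | sort -rn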
Of course this suggestion is only useful on Unix and only if nobody else has a better idea. :-)
A few things:
Use "preprocess only" mode to look at your preprocessor output: the -E option on gcc; other compilers have the same function.
Use precompiled headers.
gcc's -H option displays the full include tree, and MSVC has the /showIncludes option, found under the Advanced C++ property page.
Also, Displaying the #include hierarchy for a C++ file in Visual Studio
"Large Scale C++ Software Design" by John Lakos had tools that extracted the compile-time dependencies among source files.
Unfortunately, their repository on Addison-Wesley's site is gone (along with AW's site itself), but I found a tarball here:
http://prdownloads.sourceforge.net/introspector/LSC-rpkg-0.1.tgz?download
I found it useful several jobs ago, and it has the virtue of being free.
BTW, if you haven't read Lakos's book, it sounds like your project would benefit. (The current edition is a bit dated, but I hear that Lakos has another book coming out in 2012.)
GCC has a -M flag that will output a list of dependencies for a given source file. You could use that information to figure out which of your files have the most dependencies, which files are most depended on, etc.
Check out the man page for more information. There are several variants of -M.
Personally, I don't know whether there is a tool that will say "remove this file". It's really a complex matter that depends on a lot of things. Looking at a tree of include statements is surely going to drive you nuts... it would drive me crazy, as well as ruin my eyes. There are better ways to reduce your compile times:
De-inline your class methods.
After de-inlining them, re-examine your include statements and attempt to remove them. It is usually helpful to delete them all and start over.
Prefer forward declarations as much as possible. If you de-inline the methods in your header files, you can do this a lot.
Break up large header files into smaller files. If a class in a file is used more often than the rest, put it in a header file all by itself.
1000 translation units is actually not very much. We have between 10 and 20 thousand. :)
Get Incredibuild if your compile times are still too long.
I have heard there are some tools that do this, but I don't use them.
I created a tool, https://sourceforge.net/p/headerfinder, which may be useful. Unfortunately, it is a "home made" tool with the following issues:
Developed in VB.Net;
The source code needs to be compiled;
Very slow, and it consumes a lot of memory;
No help available.
GCC has a flag (-save-temps) with which you can save intermediate files. These include the .ii files, which are the results of the preprocessor (so, before compilation). You can write a script to parse these and determine the weight/cost/size of what is included, as well as the dependency tree.
I wrote a Python script to do just this (publicly available here: https://gitlab.com/p_b_omta/gcc-include-analyzer).