Can a makefile enforce dependency restrictions in C++?

We are refactoring our code base, and trying to limit the direct dependencies between different components. Our source tree has several top level directories: src/a, src/b and src/c.
We want to enforce a set of restrictions:
Files in a cannot depend on files in b or c
Files in b can depend on files in a, but not on files in c
Files in c can depend directly on files in b, but not on files in a
Enforcing the first one is simple. I have an implicit rule like this:
build/a/%.o : src/a/%.cpp
$(CXX) -I src/a $(OTHER_FLAGS) -c -o $@ $<
If a file under a tries to include a header file from b or c, the build fails as the header is not found.
The second restriction is enforced by a similar rule, which specifies src/a and src/b as the include directories. The problem arises with building c. The following is allowed.
src/c/C.cpp
#include "b.h"
void C() { ... }
src/b/b.h
#include "a.h"
class B { ... };
src/a/a.h
class A { ... };
Here, a file from c includes a file from b (allowed), which in turn includes a file from a (also allowed). We want to prevent code like this:
src/c/C_bad.cpp
// Direct inclusion of a
#include "a.h"
src/c/c_bad.h
// Direct inclusion of a
#include "a.h"
For the allowed case to compile, the compile command for building files in src/c must include -Isrc/a, but that also allows the second, disallowed cases to compile.
I suspect that the answer to my problem is writing a script which looks at the dependencies generated by the compiler, finds potentially illegal dependencies, and then looks at the source files to determine whether they are direct dependencies. Is there a reasonable way to do this by combining compiler and/or makefile constructs?
If it matters, we are using GNU Make 3.81 and g++ 4.5.3, but would like to be portable if possible.
Update
We are looking for something where it takes effort to violate the rules, not one where it takes effort to follow the rules. (Past experience has shown that the latter is unlikely to work.) While there are some good ideas in the other answers, I'm accepting the one that says to write a script, since that is the one that takes the most effort to work around.
Thanks to everyone for your answers.

Considering the fact that you're applying this to an existing code base, I would opt for the "validation script" approach.
So instead of modifying the build process and severing dependencies one at a time as the build fails, you get presented with a list of files that are non-compliant. You can then refactor your codebase with the "big picture" in mind, and any changes you make will be built using the same Makefiles as before, thus simplifying testing and debugging.
Once the code is refactored, the analysis script can continue to be used as a compliance checker to validate future updates.
A possible starting point for such an analysis would be to use makedepend or cpp -MM. For example, using the cpp/h files you've listed in the question:
[me#home]$ find .
.
./b
./b/b.h
./a
./a/a.h
./c
./c/C_bad.cpp
./c/C.cpp
./c/c_bad.h
[me#home]$ cpp -MM -Ia -Ib -Ic */*.cpp
C_bad.o: c/C_bad.cpp a/a.h
C.o: c/C.cpp b/b.h a/a.h
[me#home]$ # This also works for header files
[me#home]$ cpp -Ia -Ib -Ic -MM c/c_bad.h
c_bad.o: c/c_bad.h a/a.h
It should be reasonably straightforward to parse that output to determine the dependencies of each cpp file and flag up those that are non-compliant.
The drawback to this approach is that it cannot differentiate between direct and indirect dependencies, so if that matters you may need to include an extra step to inspect the source and pick out direct dependencies.
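As a rough sketch of that extra step in shell (assuming the src/a, src/b, src/c layout from the question; the matching is deliberately simplistic and only catches literal #include "..." lines):
#!/bin/sh
# Flag files under src/c that directly include a header living in src/a.
status=0
for f in src/c/*.cpp src/c/*.h; do
    for hdr in src/a/*.h; do
        name=`basename "$hdr"`
        if grep -qF "#include \"$name\"" "$f"; then
            echo "ILLEGAL: $f directly includes $name (from src/a)"
            status=1
        fi
    done
done
exit $status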

You can make the -I options target-specific:
build/b/%.o: CPPFLAGS += -Isrc/a
build/c/%.o: CPPFLAGS += -Isrc/b
This is specific to gnu-make, though, so it's not portable.

Yes. But it takes some manual effort and discipline.
When building c, you can allow dependencies on the headers in src/b/*.h.
Inside project B, any header files in the main directory should be self-contained, with no dependencies on other projects. You also need a private subdirectory inside B, such as src/b/detail/. Header files in there are allowed to include src/a/*.h and src/b/*.h, but they are a private implementation detail, available only to the source files of the b project.
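For illustration, such a layout might look like this (file names made up):
src/b/b.h // public: self-contained, safe for c to include
src/b/detail/b_impl.h // private: may include src/a/*.h and src/b/*.h
src/b/b.cpp // implementation: may include src/b/detail/*.h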

The easiest way is to change your include path to -Isrc for everything. Include statements then have the complete relative path
#include <a/a.h>
for example. This makes it much easier to check the code automatically (perhaps in a commit hook rather than the makefile).
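For instance, with this scheme a hypothetical pre-commit check for the "c must not use a directly" rule boils down to a grep (paths as in the question):
# Reject the commit if anything under src/c includes an a/ header directly.
if grep -rn '#include <a/' src/c; then
    echo "error: src/c must not include src/a headers directly" >&2
    exit 1
fi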
Alternatively, you could do something nasty with macros in the A and B headers:
// src/a/a.h
#ifndef SRC_A_H
#define SRC_A_H
#ifndef ALLOW_A
#error "you're not allowed to include A headers here"
#endif
//...
and
// src/b/b.h
#ifndef SRC_B_H
#define SRC_B_H
#ifdef ALLOW_A_INDIRECT
#define ALLOW_A
#endif
#include <a/a.h>
//...
#ifdef ALLOW_A_INDIRECT
#undef ALLOW_A
#endif
#endif // include guard
Now these make rules will allow A and B to build ok:
build/a/%.o: CPPFLAGS += -DALLOW_A
build/b/%.o: CPPFLAGS += -DALLOW_A
and this will allow C access only via B (and the macros in B's headers)
build/c/%.o: CPPFLAGS += -DALLOW_A_INDIRECT
Note this requires some discipline especially in B's headers, but I suppose if it sits alongside existing include guards, it ... ok, it's actually still pretty nasty.

Related

Is there a way to tell the g++ compiler not to look for include headers in a certain -I path?

I'm trying to compile a cpp file that includes headers from two folder locations. Both folders have a lot of headers that are necessary for my file.
Now, one of the header files is present in both folders, but the two copies are different versions. Hence the functions in that common header have the same names but different signatures.
Something like this:
Folder A:
foo.hpp
bar1.hpp
bar2.hpp
bar3.hpp
Folder B:
foo.hpp
bar4.hpp
bar5.hpp
bar6.hpp
API of function foobar from foo.hpp of folder A:
void foobar(arg1, arg2);
API of function foobar from foo.hpp of folder B:
void foobar(arg1, arg2, arg3);
#include "foo.hpp"
#include "bar1.hpp"
...
#include "bar4.hpp"
...
...
int main(){
...
foobar (arg1, arg2);
...
}
g++ main.cpp -o MyExe -I<path-to-folder-A> -I<path-to-folder-B>
This throws errors like "multiple redefinition of function" and "no matching function call" for various functions in the header.
So, my question is: Are there any flags to tell the compiler only to consider the definition found from folder A and ignore the one from folder B?
Note on code limitations: I cannot alter the folders or files of A and B in any manner. Neither can I give absolute paths to the headers instead of -I.
Assuming that you can give more elaborate relative paths:
If folder B is a subfolder of main.cpp's directory, you can just use
#include "<relative-path-to-folder-B>/foo.hpp"
If folder B isn't a subfolder and you can add another -I directive, then add another -I directive:
g++ main.cpp -o MyExe -I<path-to-folder-before-B> -I<path-to-folder-A> -I<path-to-folder-B>
and then add the include
#include "B/foo.hpp"
One (very brittle, hack-ish and possibly slow-to-compile) way of solving this relies on the manual include guards (#ifndef MY_HEADER etc.) that are presumably present in the headers in question:
Gather all the header files that you want to use (excluding all the ones you don't want/need).
Create a central include file that #includes all of these files (using as specific a path as you can, i.e. make sure this picks up the correct foo.hpp gathered above, not one of the "original" ones).
Tell your compiler to force-include this central include file. Realistically, you should just make this a precompiled header (or include it in yours).
Due to force-including all these headers at the start of every translation unit, all the include guard defines are already set before you ever reach the point of the "wrong" headers being included. The compiler will still copy-paste them in there, but the include guards will prevent any code in them from being considered.
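With GCC, for example, the force-include step can be done with the -include option (the central header name below is made up):
g++ -include all_wanted_headers.hpp -I<path-to-folder-A> -I<path-to-folder-B> main.cpp -o MyExe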

How to structure a "library" of C++ source?

I'm developing a collection of C++ classes and am struggling with how to share the code in a way that maintains organization without compromising ease of compilation for a user of the collection.
Options that I have seen include:
Distribute compiled library file
Put the source in the header file (with implicit inline as discussed in this answer)
Use symbolic links to allow the compiler to find the files.
I'm currently using the third option: for each class I want to include, I symlink the class's header and source files (e.g. ln -s <path_to_class_folder>/myclass.cpp). This works well except that I can't move the project folder (it breaks all the symlinks), and I have to have all those symlinked files hanging around.
I like the second option (it has the appearance of Java), but I'm worried about code size bloat if everything is declared inline.
A user of the collection will create a project folder somewhere, and somehow include the collection into their compilation process.
I'd like a few things to be possible:
Easy compilation (something like gcc *.cpp from the project folder)
Easy distribution of library in uncompiled form.
Library organization by module.
Compiled code size is not bloated.
I'm not worried about documentation (Doxygen takes care of that) or compile time: the overall modules are small and even the largest projects on the slowest machines won't take more than a few seconds to compile.
I'm using the GCC compiler, if it makes any difference.
A library is the best option (in my opinion) of the three you raised. Then provide the header file(s) in the include path and the library in the linker path.
Since you also want to distribute the library in source code form, I would be inclined to provide a compressed archive (gzip, 7-zip, tarball, or other preferred format) in a central repository.
If I understand correctly, you do not want users to have to include the .cpp files in their build, but instead just want them to use either: (i) the headers directly, (ii) use a compiled form of the lib.
Your requirements are a bit unusual, but they can be achieved. It seems to me like you could organize your code in the following manner. First, have a global define that dictates whether or not you are compiling the library:
// global.h
// ...
#define LIB_SOURCE
// ...
Then in every header file you check whether that define is set: if the library is distributed as a static/shared lib, the definitions are not included; otherwise, the .cpp file is included from the header file.
// A.h
#ifndef _A_H
#define _A_H
#include "global.h"
#ifdef LIB_SOURCE
#include "A.cpp"
#endif
// ...
#endif
where 'A.cpp' would contain the actual implementation.
Again, this is a very strange way of doing things and I would actually advise against such practice. A better way (but one which requires more work) is to always distribute a shared library. But to keep things independent of the compiler, write a C layer around it. This way, you have a portable, maintainable library.
As for some of the other requirements:
Keep the build process simple by providing a Makefile
If you worry about the code size of the compiled library, look into gcc's optimization options (-Os). If you worry about the code size of the library when distributed in source-form in the headers, this is more tricky. Since the (inlined) code will actually be in the headers, the code will obviously grow with each inclusion in a .cpp file by the user.
I ended up using inline headers for all of the code. You can see the library here:
https://github.com/libpropeller/libpropeller/tree/master/libpropeller
The library is structured as:
library folder
  class A
    classA.h
    classA.test.h
  class B
    classB.h
    classB.test.h
  class C
    ...
With this structure I can distribute the library as source, and all the user has to do is add -I/path/to/library to their makefile and #include "library/classA/classA.h" in their source files.
And, as it turns out, having inline headers actually reduces the code size. I've done a full analysis of this, and it turns out that inline code in the headers allows the compiler to make the final binary roughly 5% smaller.

Generalizing include statements in c++ files when building with make

Hello (I am using Windows, the MinGW g++ compiler, and mingw32-make)
To generalize my question I would like to learn how to write a c++ source file as follows:
Assuming that foo.cpp depends on foo.h where foo.cpp is in src\ and foo.h is in include\
// foo.cpp
#include "foo.h"
Normally I would just write it like this
//foo.cpp
#include "..\include\foo.h"
but I have found that, as my project grows and I begin to need more organization, this method isn't flexible enough. The reason is that I have to change every include in every file if I want to move foo.h to a new directory (say include\bar\foo.h). Is there a way for make to achieve this? If so, can it be done for header file dependencies as well?
As a side note, I am new to makefiles. I am not even sure that make knows these includes are there, since they are within the code (in fact, from what I understand, it doesn't). That leads me to an unfortunate secondary question: can make see these includes? If not, is it possible to change it so that it can? Feel free to answer how you would approach this problem, because I have a feeling I am going about it the wrong way by putting the includes in the files rather than handling them in the makefile.
The compiler always looks in a set of default paths for .h files, and you can add your own path.
For example, gcc accepts multiple -I arguments, each containing a path. In your foo.cpp you write:
#include "foo.h"
and when compiling you say:
g++ -I../include foo.cpp -c [other options]
Regarding the second part of your question: the makefile and the call to make do not normally know anything about the files to be compiled or about your project. However, there are several default variables and implicit rules in make that can give that impression: it could be that, in your environment, you only need to add the -I argument to the CFLAGS or CPPFLAGS variable and it will work.
Patrick B has answered very well on how to make the compiler know where to include from, but not the following bit:
As a side note I am new to makefiles. I am not even sure that it knows
these includes are there since they are within the code (in fact from
what I understand it doesn't). That would lead me to an unfortunate
secondary question, which is can make see these includes? If not is it
possible to change it so that it can?
No, make doesn't understand what your source files contain, or how they depend on other files. (Make also doesn't really care whether you are programming in C, C++, Fortran, Pascal, Ada, Lisp, Cobol or Haskell - as long as there is an "if you have a file like this, and want a file like that, do something" relationship between files, make will sort it out for you.)
There are several ways to do this. You can manually add the dependency:
foo.o: foo.h
Or you can use a dependency file for your include-file, and let make built it automatically, by adding this, for example:
SOURCES = foo.cpp # Add any further source files here.
INCLUDES = -I../include # Add other include directories if needed.
CFLAGS += ${INCLUDES}
TARGET = foo.exe # on Windows; just foo on Linux/macOS.
all: ${TARGET} deps.mk
${TARGET}: ${SOURCES}
    gcc ${CFLAGS} -o $@ $^
deps.mk: ${SOURCES}
    gcc -MM ${INCLUDES} $^ > $@
include deps.mk
Note that makefiles RELY on recipe lines being indented with tabs. This post uses spaces, so you will need to "tabify" the recipes. Also note that in a "proper" makefile, you'd build foo.o from foo.cpp, etc., and link all the different .o files together; that way, recompiles are a fair bit quicker for large projects. I've simplified it for readability.
Maybe I should expand a little bit:
gcc -MM gives a list (on standard output) of the file being "compiled" and all of its dependencies. It doesn't actually compile the code, and as long as the code is at least SOMEWHAT close to being compilable, it will happily process your files.
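For example, with the foo.cpp/foo.h layout from the question, the output would look something like this (illustrative):
$ gcc -MM -I../include foo.cpp
foo.o: foo.cpp ../include/foo.h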
For more details on gcc -MM and related, have a look at the GCC invocation documentation.
The $@ and $^ are what make calls "automatic variables" - they expand to the "target" (easy to remember, as @ looks sort of like a target for shooting arrows at) and "all dependencies" (no visual clue here, I'm afraid - and every now and again I have to remind myself) respectively. Check the GNU make documentation on automatic variables for more details.

Combining C++ header files

Is there an automated way to take a large number of C++ header files and combine them into a single one?
This operation must, of course, concatenate the files in the right order so that no types, etc. are defined before they are used in upcoming classes and functions.
Basically, I'm looking for something that allows me to distribute my library in two files (libfoo.h, libfoo.a), instead of the current bunch of include files + the binary library.
As your comment says:
.. I want to make it easier for library users, so they can just do one single #include and have it all.
Then you could just spend some time including all your headers in a "wrapper" header, in the right order. 50 headers is not that much. Just do something like:
// libfoo.h
#include "header1.h"
#include "header2.h"
// ..
#include "headerN.h"
This will not take much time if you do it manually.
Also, adding new headers later is a matter of seconds: just add them to this "wrapper" header.
In my opinion, this is the simplest and cleanest working solution.
A little bit late, but here it is. I just recently stumbled into this same problem myself and coded this solution: https://github.com/rpvelloso/oneheader
How does it work?
Your project's folder is scanned for C/C++ headers and a list of headers found is created;
For every header in the list it analyzes its #include directives and assembles a dependency graph in the following way:
If the included header is not located inside the project's folder then it is ignored (e.g., if it is a system header);
If the included header is located inside the project's folder then an edge is created in the dependency graph, linking the included header to the current header being analyzed;
The dependency graph is topologically sorted to determine the correct order in which to concatenate the headers into a single file. If a cycle is found in the graph, the process is interrupted (i.e., the graph is not a DAG);
Limitations:
It currently only detects single-line #include directives (e.g., #include "foo.h" on a line of its own);
It does not handle headers with the same name in different paths;
It only gives you a correct order in which to combine all the headers; you still need to concatenate them yourself (maybe you want to remove or modify some of them prior to merging).
Compiling:
g++ -Wall -ggdb -std=c++1y -lstdc++fs oneheader.cpp -o oneheader[.exe]
Usage:
./oneheader[.exe] project_folder/ > file_sequence.txt
(Adapting an answer to my dupe question:)
There are several other libraries which aim for a single-header form of distribution, but are developed using multiple files; and they too need such a mechanism. For some (most?) it is opaque and not part of the distributed code. Luckily, there is at least one exception: Lyra, a command-line argument parsing library; it uses a Python-based include file fuser/joiner script, which you can find here.
The script is not well-documented, but the way you use it is with 3 command-line arguments:
--src-include - The include file to convert, i.e. to merge its include directives into its body. In your case it's libfoo.h which includes the other files.
--dst-include - The output file to write - the result of the merging.
--src-include-dir - The directory relative to which include files are specified (i.e. an "include search path" of one directory; the script doesn't support the complex mechanism of multiple include paths and search priorities which the C++ compiler offers)
The script acts recursively, so if file1.h includes another file under the --src-include-dir, that should be merged in as well.
Now, I could nitpick at the code of that script, but - hey, it works and it's FOSS - distributed with the Boost license.
If your library is so big that you cannot build and maintain a single wrapping header file as Kiril suggested, this may mean that it is not well architected.
So if your library is really huge (above a million lines of source code), you might consider automating that with tools like:
GCC's dependency-generating preprocessor options such as -M, -MD, and -MF, with a hand-made script sorting the output
expensive commercial static analysis tools like Coverity
customizing a compiler through plugins or (for GCC 4.6) MELT extensions
But I don't understand why you want an automated way of doing this. If the library is of reasonable size, you should understand it and be able to write and maintain a wrapping header by hand. Automating that task will take some effort (probably weeks, not minutes), so it is worthwhile only for very large libraries.
If you have a master include file that includes all others available, you could simply hack a C preprocessor re-implementation in Perl. Process only ""-style includes and recursively paste the contents of these files. Should be a twenty-liner.
If not, you have to write one up yourself or try at random. Automatic dependency tracking in C++ is hard. Like "let's see if this template instantiation causes an implicit instantiation of the argument class" hard. The only automated way I see is to shuffle your include files into a random order, see if the whole bunch compiles, and re-shuffle them until it does. That can take n! attempts; you might be better off writing that include file by hand.
While the first variant is easy enough to hack, I doubt the wisdom of this hack, because you want to distribute on a package level (source tarball, deb package, Windows installer) rather than a file level.
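For illustration, the first variant (recursive pasting of ""-style includes) might be sketched in plain shell rather than Perl (ignoring include guards, <>-includes, and duplicate inclusion; include paths are resolved relative to the current directory):
#!/bin/sh
# Recursively paste ""-style includes into one output stream.
paste_includes() {
    while IFS= read -r line; do
        case "$line" in
        '#include "'*'"')
            f=${line#'#include "'}
            f=${f%'"'}
            paste_includes "$f" # recurse into the included file
            ;;
        *)
            printf '%s\n' "$line"
            ;;
        esac
    done < "$1"
}
paste_includes master.h > combined.h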
You really need a build script to generate this as you work, and a preprocessor flag to disable use of the amalgamation (which could be just for your own use).
To simplify this script/program, it helps to have your header structures and include hygiene in top form.
Your program/script will need to know your discovery paths (hint: minimise the number of search paths - to one, if possible).
Run the script or program (which you create) to replace include directives with header file contents.
Assuming your headers are all guarded as is typical, you can keep track of what files you have already physically included and perform no action if there is another request to include them. If a header is not found, leave it as-is (as an include directive) -- this is required for system/third party headers -- unless you use a separate header for external includes (which is not at all a bad idea).
It's good to have a build phase/translation unit that includes the header alone and produces zero warnings or errors (with warnings treated as errors).
Alternatively, you can create a special distribution repository so they never need to do more than pull from it occasionally.
What you want to do sounds "javascriptish" to me :-) . But if you insist, there is always "cat" (or the equivalent in Windows):
$ cat file1.h file2.h file3.h > my_big_file.h
Or if you are using gcc, create a file my_decent_lib_header.h with the following contents:
#include "file1.h"
#include "file2.h"
#include "file3.h"
and then use
$ gcc -C -E my_decent_lib_header.h -o my_big_file.h
and this way you even get file/line directives that will refer to the original files (although that can be disabled, if you wish).
As for how automatic this is with respect to file order: not at all, you have to decide the order yourself. In fact, I would be surprised to hear of a tool that orders header dependencies correctly in all cases for C/C++.
Usually you don't want to include every bit of information from all your headers in the special header that enables the potential user to actually use your library. The non-trivial removal of type definitions, further includes, or defines that the user of your interface does not need to know about cannot be done automatically, as far as I know.
Short answer to your main question:
No.
My suggestions:
manually make a new header that contains all relevant information (nothing more, nothing less) for the user of your library interface. Add nice documentation comments for each component it contains.
use forward declarations where possible, instead of full-fledged included definitions. Put the actual includes in your implementation files. The less include statements you have in your headers, the better.
don't build a deeply nested hierarchy of includes. This makes it extremely hard to keep an overview of the contents of every bit you include. The user of your library will look into the header to learn how to use it, and he will probably not be able to distinguish relevant code from irrelevant at first sight. You want to maximize the ratio of relevant code to total code in the main header of your library.
EDIT
If you really do have a toolkit library, and the order of inclusion really does not matter, and you have a bunch of independent headers, that you want to enumerate just for convenience into a single header, then you can use a simple script. Like the following Python (untested):
import glob
with open("convenience_header.h", 'w') as f:
    for header in glob.glob("*.h"):
        f.write("#include \"%s\"\n" % header)

C++ header-implementation-header-implementation dependency chain

I'm trying to create a simple C++ incremental-build tool with a dependency resolver.
I've been confused by one problem with the C++ build process.
Imagine we have a library consisting of several files:
// h1.h
void H1();
// s1.cpp
#include "h1.h"
#include "h2.h"
void H1(){ H2(); }
// h2.h
void H2();
// s2.cpp
#include "h2.h"
#include "h3.h"
void H2(){ /*some implementation*/ }
void H3(){ /*some implementation*/ }
// h3.h
void H3();
When client code includes h1.h
// app1.cpp
#include "h1.h"
int main()
{
H1();
return 0;
}
there is an implicit dependency on the s2.cpp implementation:
our_src -> h1 -> s1 -> h2 -> s2. So we need to link with two object files:
g++ -o app1 app1.o s1.o s2.o
In contrast, when h3.h is included
// app2.cpp
#include "h3.h"
int main()
{
H3();
return 0;
}
there is only one source dependency:
our_src -> h3 -> s2
So when we include h3.h we need only s2.cpp compiled (despite s1.cpp's inclusion of h2.h):
g++ -o app2 app2.o s2.o
This is a very simple example of the problem; in real projects we may have several hundred files, and chains of inefficient includes may involve many more of them.
So my question is: is there a way, or are there tools, to find out which header inclusions could be omitted when we check dependencies (without parsing the .cpp files)?
I would appreciate any response.
In the case you stated, to see the implicit dependence on s2.cpp you need to parse the implementation module s1.cpp, because only there will you find that the s1 module uses s2. So to the question "can I solve this problem without parsing .cpp files?" the answer is clearly no.
By the way, as far as the language is concerned, there is no difference between what you can put in a header file and what you can put in an implementation file. The #include directive doesn't work at the C++ level; it's just textual substitution, with no understanding of the language.
Moreover, even parsing "just" C++ declarations is a true nightmare (the difficult part of C++ syntax is the declarations, not the statements/expressions).
Maybe you can use the output of gccxml, which parses C++ files and returns an XML data structure that can be inspected.
This is not an easy problem. Just a couple of many things that make this difficult:
What if one header file is implemented in N>1 source files? For example, suppose class Foo is defined in foo.h but implemented in foo_ctor_dtor.cpp, foo_this_function.cpp, and foo_that_function.cpp.
What if the same capability is implemented in multiple source files? For example, suppose Foo::bar() has implementations in foo_bar_linux.cpp, foo_bar_osx.cpp, and foo_bar_sunos.cpp. The implementation to be used depends on the target platform.
One easy solution is to build a shared or dynamic library and link against that library. Let the toolchain resolve those dependencies. Problem #1 disappears entirely, and problem #2 does too if you have a smart enough makefile.
If you insist on bucking this easy solution, you are going to need to resolve those dependencies yourself. You can eliminate the above problems (not an exhaustive list) with a project rule that one header file == one source file. I have seen such a rule, but not nearly as often as I've seen a project rule that says one function == one source file.
You may have a look at how I implemented Wand. It uses a directive to add dependencies for individual source files. The documentation is not fully completed yet, but there are examples of Wand directives in the source code of Gabi.
Examples
Thread class include file
Thread.h needs thread.o at link time
#ifdef __WAND__
dependency[thread.o]
target[name[thread.h] type[include]]
#endif
Thread class implementation on windows (thread-win32.cpp)
This file should only be compiled when Windows is the target platform
#ifdef __WAND__
target[name[thread.o] type[object] platform[;Windows]]
#endif
Thread class implementation on GNU/Linux (thread-linux.cpp)
This file should only be compiled when GNU/Linux is the target platform. On GNU/Linux, the external library pthread is needed when linking.
#ifdef __WAND__
target
[
name[thread.o] type[object] platform[;GNU/Linux]
dependency[pthread;external]
]
#endif
Pros and cons
Pros
Wand can be extended to work for other programming languages
Wand will save all the data needed to successfully link a new program when you just give the command wand
The project file does not need to mention any dependencies since these are stored in the source files
Cons
Wand requires extra directives in each source file
The tool is not yet widely used by library writers