How to extract all the required definitions from header files in C++

I would like to know how one can produce (either using gcc or an external tool) a single .h file that contains only the definitions of the types and functions used in a .cpp file.
For example: let's say I have multiple header files that contain a lot of functions (including their implementations), and I have a .cpp file that includes a single header file which in turn includes all the other header files.
Inside the single .cpp file I call a function abc(some_argument); and I want to somehow grab only the parts of the code that abc(some_argument) needs to work, essentially producing a single header file that contains only the definitions required for abc(some_argument).
I know this idea might sound crazy, probably ugly, but I really need something like that.
Thank you!

Related

Why use a "tpp" file when implementing templated functions and classes defined in a header?

Please refer to the first answer to this question about implementing templates.
Specifically, take note of this quote:
A common solution to this is to write the template declaration in a header file, then implement the class in an implementation file (for example .tpp), and include this implementation file at the end of the header.
I bolded the part that I'm most interested in.
What's the significance of a .tpp file? I tried doing exactly what was suggested on that page and it worked. But then I changed the file extension to random gibberish (like .zz or .ypp) and it still worked! Is it supposed to work? Does it matter whether it's .tpp or any other extension? And why not use a .cpp?
Here's another thing I'm confused about.
If my implementations were written in a .cpp file and the header declared non-templated functions, then I'd only need to compile the .cpp file once, right? At least until I changed something in the .cpp file.
But if I have a header defining templated functions and my implementations are in a file with a random funky extension, how is it compiled? And are the implementations compiled every time I compile whatever source code #includes said header?
Does it matter if it's .tpp or any other extension? And why not use a .cpp?
It does not matter what the extension is, but don't use .cpp because that goes against convention (it will still work, but don't do it; .cpp files are generally source files). Other than that, it's a matter of what your codebase uses. For example, I (and the Boost codebase) use .ipp for this purpose.
What's the significance of a .tpp file?
It's used when you don't want the file that contains the interface of a module to also contain all the gory implementation details. But you cannot write the implementation in a .cpp file, because it's a template. So you do the best you can (leaving aside explicit instantiations and the like). For example:
Something.hpp
#pragma once

namespace space {

template <typename Type>
class Something {
public:
    void some_interface();
};

} // namespace space

#include "Something.ipp"
Something.ipp
#pragma once

namespace space {

template <typename Type>
void Something<Type>::some_interface() {
    // the implementation
}

} // namespace space
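For completeness, a minimal usage sketch (main.cpp is hypothetical): the consumer includes only Something.hpp, and Something.ipp is pulled in automatically by the #include at the end of the header.
main.cpp
#include "Something.hpp"   // this also brings in Something.ipp

int main() {
    space::Something<int> s;
    s.some_interface();    // the template is instantiated here, from code made visible by the .ipp
}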
I thought the whole point of writing definitions in headers and the implementations in a separate file is to save compilation time, so that you compile the implementations only once until you make some changes
You can't split up general template code into an implementation file. You need the full code visible in order to use the template, that's why you need to put everything in the header file. For more see Why can templates only be implemented in the header file?
But if the implementation file has some funky looking file extension, how does that work in terms of compiling? Is it as efficient as if the implementations were in a cpp?
You don't compile the .tpp, .ipp, -inl.h, etc. files. They are just like header files, except that they are only included by other header files. You only compile source (.cpp, .cc) files.
File extensions are meaningless to the preprocessor; there's nothing sacred about .h either. It's just a convention, so other programmers know and understand what the file contains.
The preprocessor will allow you to include any file into any translation unit (it's a very blunt tool). Extensions like that just help clarify what should be included where.
Does it matter if it's .tpp or any other extension? And why not use a .cpp?
It doesn't matter much which extension is actually used, as long as it is different from any of the standard extensions used for C++ translation units.
The reason to use a different extension is that the standard extensions (.cpp, .cc, ...) are usually picked up by C++ build systems as translation units, and compiling these implementation files on their own would fail. They have to be #included by the corresponding header file containing the template declarations.
But if the implementation file has some funky looking file extension, how does that work in terms of compiling?
It needs to be #included to be compiled as mentioned.
Is it as efficient as if the implementations were in a cpp?
Well, not 100% as efficient in terms of compile time as a pure object file generated from a translation unit: it will be compiled again as soon as the header containing the #include statement changes.
And are the implementations compiled every time I compile whatever source code #includes said header?
Yes, they are.
The file extensions of header files do not matter in C++, although standard source-file extensions such as .cpp should be avoided.
However, there are established conventions, and these help human programmers navigate the code. Calling template implementation files .tpp is one such convention.
Something nobody has mentioned yet is that some external tools might rely on such conventions.
For instance, I routinely employ a popular grep substitute which allows searching only in files of a given type. This program will recognize .tpp files as C++, but not, say, .zz files.

Splitting .cpp files without code changes

I have a .cpp file that's getting rather large, and for easier management I'd like to split it into a few files. However, there are numerous globals, and I'd like to avoid the upkeep of managing a bunch of extern declarations across different files. Is there a way to have multiple .cpp files act as a single file? In essence, I'd like a way to divide the code without the division being recognized by the compiler.
Is there a way to have multiple .cpp files act as a single file?
Yes. That is the definition of #include. When you #include a file, you make a textual substitution of the included file in place of the #include directive. Thus, multiple included files act together to form one translation unit.
In your case, chop the file into several pieces. Do this exactly: do not add or remove any lines of text. Do not add header guards or anything else. You may break your file at almost any convenient location. The limitations are: the break must not occur inside a comment, nor inside a string, and it must occur at the end of a logical line.
Name the newly-created partial files according to some convention. They are not fully-formed translation units, so don't name them *.cpp. They are not proper header files, so don't name them *.h. Rather, they are partially-complete translation units. Perhaps you could name them *.pcpp.
As for the basename, choose the original file name, with a sequentially-numbered suffix: MyProg01.pcpp, MyProg02.pcpp, etc.
Finally, replace your original file with a series of #include statements:
#include "MyProg01.pcpp"
#include "MyProg02.pcpp"
#include "MyProg03.pcpp"
Of course, you can always just #include the various .cpp files into one master file, which is the one the compiler sees. It's a very bad idea, though, and you will eventually run into headaches far worse than refactoring the file properly.
Whilst you can declare the same set of globals in many .cpp files, you will get a separate instance of each as the compiler compiles each file, and these will then fail to link when they are combined.
The only answer is to put all your globals in their own file, then copy and paste them into a header file that contains extern declarations (this can easily be automated, but I find that using the arrow keys to paste 'extern' in front of each one is quick and simple).
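A minimal sketch of that layout, with hypothetical file and variable names:
globals.h
#pragma once
// Declarations only: every .cpp file that needs the globals includes this.
extern int g_counter;
extern const char* g_app_name;

globals.cpp
#include "globals.h"
// The single definition of each global lives here.
int g_counter = 0;
const char* g_app_name = "MyProg";

Every split-out .cpp file then includes globals.h, and the linker resolves all uses to the one set of definitions in globals.cpp.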
You could refactor everything, but often it's not worth the effort (except when you need to change something for other reasons).
You could try splitting the files, then using the compiler to tell you which globals are needed by each new file, and re-introducing just those directly into each file while keeping the true globals separate.
If you don't want to do this, just #include the cpp files.

C/C++ Forward declaration vs. Include

What happens when you include a file, and what happens when you forward declare a function or class? If two files include the same file, will the first one succeed in reading all the functions while the second one fails but is still able to use the functions?
What happens when I forward declare a function? Is this function now "saved" so I can use it anywhere, or is it known only within the same file? And why does it work when two files include the same file (one with guards)?
Can I just include everything in main and not bother any longer?
EDIT:
And why should the .cpp files include their own headers? What if I don't include them?
What happens when you include a file, and what happens when you forward declare a function or class?
When you include a file, its contents get "copied and pasted" into the including file by the preprocessor. When you forward declare a function or class, you are introducing a declaration without a definition (for a class, an incomplete type), letting the rest of the translation unit know that a function/class with that name exists and making it usable in contexts where such a declaration is enough.
If two files include the same file, will the first one succeed in reading all the functions while the second one fails but is still able to use the functions?
If the included file has proper include guards, the second inclusion within the same translation unit is effectively a no-op. If two different source files include that same header file, its full content will be included in both files.
What happens when I forward declare a function? Is this function now "saved" so I can use it anywhere, or is it known only within the same file? And why does it work when two files include the same file (one with guards)?
The function can only be used within the translation unit that contains the forward declaration. Generally, each source file (.cpp) is a different translation unit; macro definitions (such as those of the header guards) as well as declarations/definitions are valid within that translation unit. Header guards prevent the same header file from being included more than once within the same translation unit, which prevents redefinition errors.
What happens when you include a file, and what happens when you forward declare a function or class?
When you include a file, the preprocessor effectively copies and pastes the entire included file into the file doing the including. When you forward declare a function or class, you're telling the compiler that it exists without pulling in its entire header file. This is required when you have circular dependencies, and it can greatly reduce compile times elsewhere.
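For instance (these classes are hypothetical), a forward declaration breaks a circular dependency that mutual includes could not:
a.h
#pragma once

class B;                 // forward declaration: B's full definition is not needed here

class A {
public:
    void use(B& b);      // fine: a reference to an incomplete type
};

b.h
#pragma once
#include "a.h"

class B {
public:
    A a;                 // needs the full definition of A, so a.h is included
};

a.cpp
#include "a.h"
#include "b.h"           // the full definition of B is needed only in the implementation

void A::use(B& b) {
    // ...
}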
If two files include the same file, will the first one succeed in reading all the functions while the second one fails but is still able to use the functions?
If the same file gets included twice in one translation unit (.cpp file), then both inclusions will "succeed", but if the header has include guards of any sort, nothing will be added the second time: the preprocessor has already "copied" it into the translation unit, and doing so again would duplicate everything, which would be a bug. So all files involved can use all the functions in all the headers included up to that point.
What happens when I forward declare a function? Is this function now "saved" so I can use it anywhere, or is it known only within the same file? And why does it work when two files include the same file (one with guards)?
Yes, if you forward declare a function/class in a header, it can be used by any other file that includes that header.
Can I just include everything in main and not bother any longer?
Probably. Once you get to more complicated examples, you'll end up with circular dependencies, which require certain things to be declared and/or defined in a certain order. Other than that, yes, you can include everything in main and keep it simple. However, your code will take FOREVER to compile.
And why should the .cpp files include their own headers? What if I don't include them?
Then that .cpp file won't know that anything else exists outside of itself. Not very useful.
Short answer:
Forward declaring a class/function allows the compiler to avoid compiling the entire class/function unless it is actually needed.
Long answer:
Forward declaring a class/function is just like declaring a class/function without defining it. You're promising to define it later, but for now you just want to inform the compiler that it exists. You usually put these forward declarations in header files. This generally results in faster compile times because, among the .cpp files that include your header, only those that actually need the class you've forward declared (and include its appropriate header file) need to compile that class's code.

where should "include" be put in C++

I'm reading some C++ code and notice that there are "#include" directives both in the header files and in the .cpp files. I guess that if I move all the "#include" lines in a file, say foo.cpp, to its header file foo.hh, and let foo.cpp include only foo.hh, the code should still work, leaving aside issues like drawbacks, efficiency and so on.
I know my sudden idea must be a bad idea in some way, but what exactly are its drawbacks? I'm new to C++ and I don't want to read lots of C++ books before I can answer this question by myself, so I'm just dropping the question here for your help. Thanks in advance.
As a rule, put your includes in the .cpp files when you can, and only in the .h files when that is not possible.
You can use forward declarations to remove the need to include headers from other headers in many cases: this helps reduce compilation time, which can become a big issue as your project grows. This is a good habit to get into early on, because trying to sort it out at a later date (when it's already a problem) can be a complete nightmare.
The exception to this rule is templated classes (or functions): in order to use them you need to see the full definition, which usually means putting them in a header file.
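A small sketch of the forward-declaration habit from the answer above (all class names are hypothetical): the header only needs to know that Engine is a class, so the heavier engine.h include can live in the .cpp file.
engine.h
#pragma once
class Engine { /* imagine a heavy header */ };

car.h
#pragma once

class Engine;              // forward declaration instead of #include "engine.h"

class Car {
public:
    void start();
private:
    Engine* engine_;       // a pointer to an incomplete type is fine in a header
};

car.cpp
#include "car.h"
#include "engine.h"        // the full definition is needed only by the implementation

void Car::start() {
    // use the Engine object here (whatever its hypothetical interface provides)
}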
The include directives in a header should be only those necessary to support that header. For example, if your header declares a vector, you should include <vector>, but there's no reason to include <string>. You should be able to have an empty program that includes only that single header file and still compiles.
Within the source file, you need includes for everything you call, of course. If none of your headers required <iostream> but you needed it for the actual source, it should be included separately.
Include file pollution is, in my opinion, one of the worst forms of code rot.
You would make all other files that include your header file transitively include all of its #includes too.
In C++ (as in C), #include is handled by the preprocessor by simply inserting all the text of the #included file in place of the #include directive. So with lots of #includes you can literally bloat the size of the file to be compiled to hundreds of kilobytes, and the compiler needs to parse all of this for every single file. Note that the same file included in different places must be reparsed in every single place where it is #included! This can slow compilation to a crawl.
If you need to declare (but not define) things in your header, use forward declaration instead of #includes.
While a header file should include only what it needs, "what it needs" is more fluid than you might think, and is dependent on the purpose to which you put the header. What I mean by this is that some headers are actually interface documents for libraries or other code. In those cases, the headers must include (and probably #include) everything another developer will need in order to correctly use your library.
Including header files from within header files is fine, as is including them in .cpp files. However, to minimize build times it is generally preferable to avoid including a header file from within another header unless absolutely necessary, especially if many .cpp files include the same header.
.hh (or .h) files are supposed to be for declarations.
.cpp (or .cc) files are supposed to be for definitions and implementations.
Realize first that an #include statement is literal. #include "foo.h" literally copies the contents of foo.h and pastes it where the include directive is in the other file.
The idea is that some other files, bar.cpp and baz.cpp, might want to make use of code that exists in foo.cpp. The way to do that, normally, would be for bar.cpp and baz.cpp to #include "foo.h" to get the declarations of the functions or classes they want to use; then, at link time, the linker hooks up these uses in bar.cpp and baz.cpp to the implementations in foo.cpp (that's the whole point of the linker).
If you put everything in foo.h and tried to do this, you would have a problem. Say that foo.h declares a function called doFoo(). If the definition (the code) of this function is in foo.cpp, that's fine. But if the code for doFoo() is moved into foo.h, and you then include foo.h in foo.cpp, bar.cpp and baz.cpp, there are now three definitions of a function named doFoo(), and your linker will complain because you are not allowed to define the same function more than once across the program.
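A minimal reproduction of that linker error (file names as in the paragraph above, contents hypothetical):
foo.h
#pragma once
int doFoo() { return 42; }   // a definition in the header, not just a declaration

bar.cpp
#include "foo.h"             // one definition of doFoo() lands in this translation unit

baz.cpp
#include "foo.h"             // a second definition of doFoo(): "multiple definition" at link time

Declaring int doFoo(); in foo.h and defining it once in foo.cpp (or marking the definition inline) avoids the error.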
If you #include the .cpp files, you will probably end up with loads of "multiple definition" errors from the linker. You can in theory #include everything into a single translation unit, but that also means that everything must be re-built every time you make a change to a single file. For real-world projects, that is unacceptable, which is why we have linkers and tools like make.
There's nothing wrong with using #include in a header file. It is a very common practice; you don't want to burden the user of a library with also having to remember what other obscure headers are needed.
A standard example is #include <vector>. It gets you the vector class, plus a raft of internal CRT header files that are needed to compile the vector class properly: stuff you really don't need or want to know about.
You can avoid multiple definition errors if you use "include guards".
myheader.h
#ifndef MYHEADER_H
#define MYHEADER_H

struct blah {};
extern int whatsit;

#endif // MYHEADER_H

Now if you #include "myheader.h" in other header files, its contents will only be pulled in once (because MYHEADER_H is already defined on the second inclusion). I believe MSVC has "#pragma once" with equivalent functionality.

Should every C or C++ file have an associated header file?

Should every .C or .cpp file have a header (.h) file associated with it?
Suppose there are the following C files:
Main.C
Func1.C
Func2.C
Func3.C
where main() is in Main.C. Should there be four header files:
Main.h
Func1.h
Func2.h
Func3.h
Or should there be only one header file for all the .C files?
What is a better approach?
For a start, it would be unusual to have a Main.h, since there's usually nothing in it that needs to be exposed to the other compilation units at compile time. The main() function itself needs to be exposed for the linker or the start-up code, but they don't use header files.
You can have either one header file per C file or, more likely in my opinion, a header file for a related group of C files.
One example of that is if you have a BTree implementation and you've put add, delete, search and so on in their own C files to minimise recompilation when the code changes.
It doesn't really make sense in that case to have separate header files for each C file, as the header is the API. In other words, it's the view of the library as seen by the user. People who use your code generally care very little about how you've structured your source code, they just want to be able to write as little code as possible to use it.
Forcing them to include multiple distinct header files just so they can create, insert into, delete from, and search, a tree, is likely to have them questioning your sanity :-)
You would be better off with one btree.h file and a single btree.lib file containing all of the BTree object files that were built from the individual C files.
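A sketch of that arrangement (names and signatures are illustrative, not a real library): one public header fronting several implementation files.
btree.h
#pragma once

struct BTree;                               // opaque to users; the full definition lives in an
                                            // internal header shared by the implementation files

BTree* btree_create();
void   btree_add(BTree* tree, int key);
bool   btree_search(const BTree* tree, int key);
void   btree_destroy(BTree* tree);

btree_add.cpp, btree_search.cpp and so on each include btree.h (plus the internal header) and implement their part; the resulting object files are bundled into btree.lib.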
Another example can be found in the standard C headers.
We don't know for certain whether there are multiple C files for all the stdio.h functions (that's how I'd do it but it's not the only way) but, even if there were, they're treated as a unit in terms of the API.
You don't have to include stdio_printf.h, stdio_fgets.h and so on - there's a single stdio.h for the standard I/O part of the C runtime library.
Header files are not mandatory.
#include simply copies and pastes whatever file is included (including .c source files).
Commonly used in real-life projects are global header files like config.h and constants.h that contain commonly used information such as compile-time flags and project-wide constants.
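For instance, such a project-wide header might look roughly like this (the contents are purely illustrative):
constants.h
#pragma once

// Compile-time flags and constants shared by many source files.
#define ENABLE_LOGGING 1

constexpr int    kMaxConnections = 64;
constexpr double kTimeoutSeconds = 2.5;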
A good design for a library API is to expose an official interface with one set of header files and keep an internal set of header files for the implementation details. This adds a nice extra layer of abstraction to a C library without adding unnecessary bloat.
Use common sense. C/C++ is not really for the ones without it.
I used to follow the "it depends" trend until I realized that consistency, uniformity and simplicity are more important than saving the effort of creating a file, and that "standards are good even when they are bad".
What I mean is the following: a .cpp/.h pair of files is pretty much what all "modules" end up as anyway. Making the existence of both a requirement saves a lot of confusion and bad engineering.
For instance, when I see the interface of something in a header file, I know exactly where to search for (or place) its implementation. Conversely, if I need to expose the interface of something that was previously hidden in a .cpp file (e.g. a static function becoming global), I know exactly where to put it.
I've seen too many bad consequences of not following this simple rule: unnecessary inline functions, broken encapsulation, (non)separation of interface and implementation, misplaced code, to name a few, all due to the fact that the appropriate sibling header or cpp file was never added.
So: always define both .h and .c files. Make it a standard, follow it, and safely rely on it. Life is much simpler this way, and simplicity is the most important thing in software.
Generally it's best to have a header file for each .c file, containing the declarations for the functions etc. in the .c file that you want to expose. That way, another .c file can include the .h file for the functions it needs, and won't need to be recompiled if a header file it didn't include gets changed.
Generally there will be one .h file for each .c/.cpp file.
Bjarne Stroustrup explains it beautifully in his book "The C++ Programming Language":
The single header style of physical partitioning is most useful when the program is small and its parts are not intended for separate use. When namespaces are used, the logical structure of the program can still be explained in a single header file.
For larger programs, the single-header-file approach is unworkable in a conventional file-based development environment. A change to the common header forces recompilation of the whole program, and updates of that single header by several programmers are error-prone. Unless strong emphasis is placed on programming styles relying heavily on namespaces and classes, the logical structure deteriorates as the program grows.
An alternative physical organization lets each logical module have its own header defining the facilities it provides. Each .c file then has a corresponding .h file specifying what it provides (its interface). Each .c module includes its own .h file and usually also other .h files that specify what it needs from other modules in order to implement the services advertised in its interface. This physical organization corresponds to the logical organization of a module. The multiple-header approach makes it easy to determine dependencies. The single-header approach forces us to look at every declaration used by any module and decide whether it is relevant. The simple fact is that maintenance of code is invariably done with incomplete information and from a local perspective.
The better localization leads to less information being needed to compile a module, and thus faster compilation.
It depends. Usually your reason for having separate .c files will dictate whether you need separate .h files.
Generally .cpp/.c files are for implementations and .h/.hpp files (.hpp is not used as often) are for headers (prototypes and declarations only). A .cpp file doesn't always have to have a header file associated with it, but it usually does, since the header file acts like a bridge between .cpp files, letting each .cpp file use code from another .cpp file.
One thing that should be strongly enforced is keeping code (definitions) out of header files! There have been too many times when header files broke the build in projects of any size because of redefinitions, simply from including the header file in two different .cpp files. Header files should always be designed so that they can be included multiple times; .cpp files should never be included.
It's all about what code needs to be aware of what other code. You want to reduce the amount other files are aware of to the bare minimum for them to do their jobs.
They need to know that a function exists, what types they need to pass into it, and what type it will return, but not what it does internally. Note that it's also important from the programmer's point of view to know what those types actually mean (e.g. which int is the row and which is the column), but the code itself doesn't care. This is why naming the function and its parameters sensibly is worthwhile.
As others have said, if there's nothing in a cpp file worth exposing to other parts of the code, as is normally the case with main.c, then there's no need for a header file.
It's occasionally worth putting everything you want to expose into a single header file (e.g. Func1and2and3.h), so that anything that knows about Func1 knows about Func2 as well, but I'm personally not keen on this, as it means you tend to pull in a hell of a lot of junk along with the stuff you actually want.
Summary:
Imagine that you trust that someone can write code and that their algorithms, design, etc. are all good, and you want to use code they've written. All you need to know is what to call to make something happen, what you should pass to it, and what you'll get back. That's what needs to go in the header files.
I like putting interfaces into header files and implementations into .cpp files. I don't like writing C++ where I need to add member variables and prototypes to the header and then write the method again in the .cpp. I prefer something like:
module.h
struct IModuleInterface : public IUnknown
{
    virtual void SomeMethod () = 0;
};
module.cpp
class ModuleImpl : public IModuleInterface,
                   public CObject // a common object to do the reference
                                  // counting stuff for IUnknown (so we
                                  // can stick this object in a smart
                                  // pointer).
{
public:
    ModuleImpl () : m_MemberVariable (0)
    {
    }

private:
    int m_MemberVariable;

    void SomeInternalMethod ()
    {
        // some internal code that doesn't need to be in the interface
    }

    void SomeMethod ()
    {
        // implementation for the method in the interface
    }

    // whatever else we need
};
I find this is a really clean way of separating implementation and interface.
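Since ModuleImpl is invisible outside module.cpp, a design like this usually also exposes a small factory function in the header; a hypothetical sketch (the function name is mine, not part of the answer above):
added to module.h
IModuleInterface* CreateModule ();

added to module.cpp
IModuleInterface* CreateModule ()
{
    // callers only ever see the interface, never the concrete class
    return new ModuleImpl ();
}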
There is no better approach, only common and less common cases.
The more common case is when you have a class/function interface to declare/define. It's better to have only one .cpp/.c with the definitions, and one header for the declarations.
Giving them the same name makes it easy to understand that they are directly related.
But that's not a "rule", that's the common way and the most efficient in almost all cases.
Now in some cases (like template classes or some tiny struct definitions) you won't need any .c/.cpp file, just the header. For example, we often have a virtual class interface defined in a header file only, with nothing but pure virtual functions or trivial functions.
And in other rare cases (like a hypothetical main.c/.cpp file) it isn't required to let code from an external compilation unit call the functions of a given compilation unit. The main function is an example (no header/declaration needed), but there are others, mostly code that "connects all the other parts together" and is not called by other parts of the application. That's very rare, but in such cases a header makes no sense.
If your file exposes an interface - that is, if it has functions which will be called from other files - then it should have a header file. Otherwise, it shouldn't.
As already noted, generally, there will be one header (.h) file for each source (.c or .cpp) file.
However, you should look at the cohesiveness of the files. If the various source files provide separate, individually reusable sets of functions - an ideal organization - then you should certainly have one header per file. If, however, the three source files provide a composite set of functions (that is too big to fit into one file), then you would use a more complex organization. There would be one header for the external services used by the main program - and that would be used by other programs needing the same services. There would also be a second header used by the cooperating source files that provides 'internal' definitions shared by those files.
(Also noted by Pax): The main program does not normally need its own header - no other source code should be using the services it provides; it uses the services provided by other files.
If you want your compiled code to be usable from another compilation unit, you will need header files. There are some situations in which you do not need or want headers.
The first such case is the main.c/.cpp file. This file is not meant to be included anywhere, so there is no need for a header.
In some cases you can have a single header file that defines the behaviour of a set of different implementations which are loaded through a dll at runtime. Different sets of .c/.cpp files then implement variations of the same header. This is common in plugin systems.
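A rough sketch of that plugin pattern (all names are hypothetical): one shared header, several independently built implementations.
plugin.h
#pragma once

// Shared by the host application and by every plugin.
struct Plugin {
    virtual ~Plugin() = default;
    virtual const char* name() const = 0;
    virtual void run() = 0;
};

// Each plugin library exports a factory with this signature.
extern "C" Plugin* create_plugin();

plugin_a.cpp (built into its own shared library)
#include "plugin.h"

struct PluginA : Plugin {
    const char* name() const override { return "A"; }
    void run() override { /* plugin-specific work */ }
};

extern "C" Plugin* create_plugin() { return new PluginA; }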
In general, I don't think there is any explicit relationship between .h and .c files. In many cases (probably most), a unit of code is a library of functionality with a public interface (.h) and an opaque implementation (.c). Sometimes a number of symbols are needed, like enums or macros, and you get a .h with no corresponding .c; and in a few circumstances you will have a lump of code with no public interface and no corresponding .h.
In particular, there are a number of times when the headers or implementations (seldom both) are so big and hairy that, for the sake of readability and the programmer's sanity, they end up being broken into many smaller files.