Splitting .cpp files without code changes - c++

I have a .cpp that's getting rather large, and for easy management I'd like to split it into a few files. However, there are numerous globals, and I'd like to avoid the upkeep of managing a bunch of extern declarations across different files. Is there a way to have multiple .cpp files act as a single file? In essence, I'd like a way to divide the code without the division being recognized by the compiler.

Is there a way to have multiple .cpp files act as a single file?
Yes. That is the definition of #include. When you #include a file, you make a textual substitution of the included file in place of the #include directive. Thus, multiple included files act together to form one translation unit.
In your case, chop the file into several pieces. Do this exactly: do not add or remove any lines of text. Do not add header guards or anything else. You may break the file at almost any convenient location. The limitations are: the break must not occur inside a comment or a string, and it must occur at the end of a logical line.
Name the newly-created partial files according to some convention. They are not fully-formed translation units, so don't name them *.cpp. They are not proper header files, so don't name them *.h. Rather, they are partially-complete translation units. Perhaps you could name them *.pcpp.
As for the basename, choose the original file name, with a sequentially-numbered suffix: MyProg01.pcpp, MyProg02.pcpp, etc.
Finally, replace your original file with a series of #include statements:
#include "MyProg01.pcpp"
#include "MyProg02.pcpp"
#include "MyProg03.pcpp"
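To make the effect concrete, here is a sketch of what the compiler might see once the three pieces are pasted back together. The contents (g_counter, next_id, ids_issued) are hypothetical and invented for illustration; the only point is that a global defined in one piece is visible in the later pieces without any extern declarations.

```cpp
// Hypothetical view of MyProg.cpp after the preprocessor has pasted in the
// three partial files. All names below are invented for illustration.

// ---- text that came from MyProg01.pcpp ----
int g_counter = 0;              // a global, defined exactly once

// ---- text that came from MyProg02.pcpp ----
int next_id() {                 // uses the global with no extern declaration
    return ++g_counter;
}

// ---- text that came from MyProg03.pcpp ----
int ids_issued() {
    return g_counter;
}
```

Because everything ends up in one translation unit, the order of the #include lines matters: a piece can only use names that an earlier piece has already declared.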

Of course, you can always just #include the various CPP-files into one master file which is the one that the compiler sees. It's a very bad idea though, and you will eventually get into headaches far worse than refactoring the file properly.

Whilst you can define the same set of globals in many .cpp files, you will get a separate instance of each as the compiler compiles each file, and the link will then fail when the object files are combined.
The only answer is to put all your globals in their own file, then cut&paste them into a header file that contains extern declarations (this can easily be automated, but I find using the arrow keys to just paste 'extern' in front of them is quick and simple).
You could refactor everything, but often it's not worth the effort (except when you need to change something for other reasons).
You could try splitting the files, and then using the compiler to tell you which globals are needed by each new file, and re-introducing just those directly into each file, keeping the true globals separately.
If you don't want to do this, just #include the cpp files.
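The extern approach described above might look roughly like this. The file names (globals.h, globals.cpp, widgets.cpp) and the variables are assumptions made for the sketch; the three files are shown in one listing, separated by comments, so the example can be compiled as it stands.

```cpp
// ---- globals.h (hypothetical): declarations only, included everywhere ----
extern int g_widget_count;
extern const char* g_app_name;

// ---- globals.cpp (hypothetical): the single definition of each global ----
int g_widget_count = 0;
const char* g_app_name = "demo";

// ---- widgets.cpp (hypothetical): includes globals.h and uses the globals ----
void add_widget() {
    ++g_widget_count;           // refers to the one instance in globals.cpp
}
```

Each global is declared in many places but defined in exactly one, which is what keeps the linker happy.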

Related

Making one common header file for parse tree implementation

I'm making a parse tree in Bison. Currently I have one class for each non-terminal and one subclass for each production. The problem is that I have one header for each class, so there are a lot of them. The solution I thought of is to make a common header that includes all the headers.
Example of current project structure:
-ast
--program.hh
--decl.hh
--..
--..
--..
--constants.hh
The common header (say common_header.hh) looks like:
#ifndef COMMON_HEADER_HH
#define COMMON_HEADER_HH
#include "program.hh"
#include "decl.hh"
// a lot of includes here
#include "constants.hh"
#endif //COMMON_HEADER_HH
So in Bison I just write #include "common_header.hh". The problem is that I read that this is considered bad practice because it can add overhead and increase compilation times. Is it justified in this case? The parser will always use all the headers.
In C++ (and C), it is good practice to minimize the size of each translation unit to a reasonable extent. Creating a single header file which includes many others is usually a poor practice.
However, you seem to be describing a case where any translation unit which includes any of this group of headers will need to include the entire group of headers. In such a case, it doesn't matter whether you include them all directly vs indirectly via a single monster header.
Still, it only makes sense to create the monster header if it will be used in many translation units. If it's only going to be included in one translation unit, there's no advantage vs explicitly including all the headers there.
One other potential advantage of the monster header is that you might be able to generate it at build time given that you already have a list of the Bison grammar files somewhere in your build system. But this is quite a minor convenience, because adding a new grammar file is not useful until you add code which uses it.

When is it necessary to separately declare a class in a ".h" file and provide the function implementations in a ".cpp" file in C++?

When is it necessary to separately declare a class in a ".h" file and provide the function implementations in a ".cpp" file?
It is not strictly necessary, as far as the C++ language is concerned. You can put all class methods inline in the .h file.
However, putting the implementations into a separate .cpp offers many benefits, such as:
C++ is very complex. As the code grows, it will take longer and longer to compile it. Every .cpp file that includes the same header file will end up compiling the same code, over and over again.
Related to the first point: if all the class methods are in a separate .cpp file and a change is made to one of them, only that .cpp needs recompilation. If all class methods are placed inline in the .h file, every .cpp that includes it must be recompiled.
Very often, the class's methods will use other classes as part of doing whatever they need to do. So, if they're all placed inline in the .h file, the .h file that defines those other classes will need to be included also, also slowing down the compilation of every .cpp file that includes the header file. If the class methods are in a separate .cpp file, only that .cpp file needs to include the other headers, and most of the time it's only necessary to add some forward declarations to the .h.
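A minimal sketch of the forward-declaration point, assuming two hypothetical classes, Report and Database. The header for Report only names Database, so it does not need to include database.h; only report.cpp does. The files are shown in one listing, separated by comments.

```cpp
#include <string>

// ---- report.h (hypothetical) ----
class Database;                         // forward declaration, no #include needed

class Report {
public:
    explicit Report(const Database& db);
    std::string title() const;
private:
    const Database* db_;                // a pointer to an incomplete type is fine
};

// ---- database.h (hypothetical), included only by report.cpp ----
class Database {
public:
    std::string name() const { return "inventory"; }
};

// ---- report.cpp: needs the full definition of Database ----
Report::Report(const Database& db) : db_(&db) {}

std::string Report::title() const {
    return "Report on " + db_->name();
}
```

Files that include report.h never see database.h, so a change to Database does not force them to be recompiled.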
It's done that way so that you only build the class' code one time.
If you put the class' code in the .h file, then every file that picks up the .h (to access the public functions of the class) will also duplicate the class' code.
The compiler will happily do this for you.
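The linker, however, will complain mightily about duplicate definitions of the same symbols, unless the functions are inline (member functions defined inside the class body are implicitly inline).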
Along the same lines, yet conversely: inline functions need to be in the .h so that their code will get picked up in the other code files, which is exactly the intent of inline functions.
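As a rough illustration, assuming a hypothetical mathutil.h and mathutil.cpp: the inline function lives in the header and may be seen by every file that includes it, while the ordinary function is only declared there and defined once in the source file.

```cpp
// ---- mathutil.h (hypothetical) ----
// Marked inline, so the same definition may appear in many translation units.
inline int square(int x) {
    return x * x;
}

// Declared here, defined exactly once in mathutil.cpp.
int cube(int x);

// ---- mathutil.cpp (hypothetical) ----
int cube(int x) {
    return x * square(x);
}
```

If cube were defined in the header without inline and the header were included from two .cpp files, the link step would report a duplicate definition.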
If you want to use declarations to implement/define the function, declarations that you don't want to make visible in the *.h file, then it would be necessary to move the definition of the function to a separate file.
Usually that's a good separation between class definition (.h) and class implementation (.cpp). People can just read the .h files to learn how to use the class without having to read the implementation details.
It's, however, not mandatory to always separate .h and .cpp, you can have the class definition and implementation in a single file (eg., for some simple classes, or some quick prototypes).
From a technical perspective (in terms of what a compiler needs or will accept) it is almost never necessary - it is possible to copy/paste the content of every (non-standard) header file into the source files that include them, and compile that. After all, that is effectively what the preprocessor does with #include directives - copy the included file in place, after which the resultant source is fed to later phases of the compiler.
It is possible for a compiler to run out of memory when compiling source - in which case breaking the program into smaller pieces, including header files, can help - but such circumstances (on machines with very limited hardware resources, such as memory) are very rare in modern development.
However, humans are less consistent and more error prone than compilers when dealing with source files, so humans benefit from use of header files. For example, instead of typing (or copying in) needed declarations into every source file that needs them (an activity which people find boring, and tend to make mistakes when doing) simply place the declarations in a header file and #include it when needed.
So then it comes down to when placing declarations in a header file makes life easier for a human, allowing them to avoid making errors, and to focus their effort on the creative parts of software development (implementing new things) rather than the mechanical (copying function declarations into source files that need them).
In practice, it normally works out that a class which will be used within more than one compilation unit (aka source file) is better off being defined in a header file. A class which is local to a single compilation unit (e.g. to contain implementation details for that compilation unit that do not need to be directly accessed by others) does not need to be in a header file, since it can be defined directly without use of a header. The problems come in if such "local" classes later need to be used in other compilation units - in that case, it is usually advisable to migrate the necessary declarations to a header file, to aid reuse.
Header files also tend to become necessary for authors of libraries - who write a set of functions for use by other programmers, but don't wish to ship the source. This is a non-technical constraint (i.e. policy based), rather than a technical one. In that case, they can distribute the header files and the compiled object (or library) files, and keep their source code private. Of course, technically, they could provide a set of text files with instructions of the form "copy these declarations to your program when you need to use them" instead of header files ..... but that would make the library unpopular with developers, since it forces them back into the mundane and error-prone activity of copying text around rather than doing useful development.
Considerations like reducing compile times are also non-technical reasons (a compiler doesn't care how long it takes to build a program, but people do). Separating class definitions into header (class definition, any inline functions) and separate source (definition of non-inline member functions) does tend to reduce build times, and aid with incremental builds.

Ways not to write function headers twice?

I've got a C/C++ question, can I reuse functions across different object files or projects without writing the function headers twice? (one for defining the function and one for declaring it)
I don't know much about C/C++, Delphi and D. I assume that in Delphi or D, you would just write once what arguments a function takes and then you can use the function across different projects.
And in C you need the function declaration in header files again, right? Is there a good tool that will create header files from C sources? I've got one, but it's not preprocessor-aware and not very strict. I've also tried a macro technique, but it worked rather badly.
I'm looking for ways to program in C/C++ like described here http://www.digitalmars.com/d/1.0/pretod.html
Imho, generating the headers from the source is a bad idea and is impractical.
Headers can contain more information than just function names and parameters.
Here are some examples:
a C++ header can define an abstract class for which a source file may be unneeded
A template usually has to be defined in a header file, since its full definition must be visible wherever it is instantiated
Default parameters are only specified in the class definition (thus in the header file)
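The three cases in this list can be sketched in one hypothetical header. The names (Shape, clamp_to, greet) are invented for illustration; none of this information could be recovered from a .cpp file that only contains the definition of greet.

```cpp
#include <string>

// ---- shapes.h (hypothetical) ----

// An abstract class: nothing here needs a source file.
class Shape {
public:
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

// A template: defined in the header so every user can instantiate it.
template <typename T>
T clamp_to(T value, T low, T high) {
    return value < low ? low : (high < value ? high : value);
}

// A default argument: written on the declaration, not on the definition.
std::string greet(const std::string& name = "world");

// ---- shapes.cpp (hypothetical) ----
std::string greet(const std::string& name) {
    return "Hello, " + name;
}
```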
You usually write your header, then write the implementation in a corresponding source file.
I think doing the other way around is counter-intuitive and doesn't fit with the spirit of C or C++.
The only exception I can see to that is static functions. A static function only appears in its source file (.c or .cpp) and can't (obviously) be used elsewhere.
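For instance, in a hypothetical parser.cpp, a helper marked static has internal linkage and therefore needs no entry in any header, while the function that other files call would still be declared in parser.h. Both names are assumptions made for this sketch.

```cpp
// ---- parser.cpp (hypothetical) ----

// Internal helper: static gives it internal linkage, so it is invisible to
// other source files and does not belong in a header.
static int digit_value(char c) {
    return c - '0';
}

// Public function: its declaration would go in parser.h.
int parse_two_digits(const char* s) {
    return digit_value(s[0]) * 10 + digit_value(s[1]);
}
```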
While I agree it is often annoying to copy the header definition of a method/function to the source file, you can probably configure your code editor to ease this. I use Vim and a quick script helped me with this a lot. I guess a similar solution exists for most other editors.
Anyway, while this can seem annoying, keep in mind it also gives a greater flexibility. You can distribute your header files (.h, .hpp or whatever) and then transparently change the implementation in source files afterward.
Also, just to mention it, there is no such thing as C/C++: there is C and there is C++; those are different languages (which indeed share much, but still).
It seems to me that you don't really need/want to auto-generate headers from source; you want to be able to write a single file and have a tool that can intelligently split that into a header file and a source file.
Unfortunately, I'm not aware of any such tool. It's certainly possible to write one, but you'd need a C++ front end. You could try writing something using clang, but it would be a significant amount of work.
Considering you have declared some functions and wrote their implementation you will have a .c/cpp file and a header .h file.
What you must do in order to use those functions:
Create a library (DLL/so or static library .a/.lib - for now I recommend a static library for ease of use) from the files where the implementation resides
Use the header file (#include it) in your programs to obtain the function declarations (you don't need to rewrite the header file) and link with your library from step 1.
Though this is an example for Visual Studio, it makes perfect sense for other development environments as well.
This seems like a rudimentary question, so assuming I have not mis-read,
Here is a basic example of re-use, to answer your first question:
#include <stdio.h>
int main( int argc, char ** argv ){
    puts( "Hello world" );
    return 0;
}
Explanation:
1. stdio.h is a C header file containing (among others) the declaration of a function called puts().
2. in main, puts() is called, using the included declaration; its definition lives in the C library that the program is linked against.
Some compilers (including gcc I think ) have an option to generate headers.
There is always very much confusion about headers and source-files in C++. The links I provided should help to clear that up a little.
If you are in the situation that you want to extract headers from a source file, then you probably went about it the wrong way. Usually you first declare your function in a header file, and then provide an implementation (definition) for it in a source file. If your function is actually a method of a class, you can also provide the definition in the header file.
Technically, a header file is just a bunch of text that is actually inserted into the source file by the preprocessor:
#include <vector>
tells the preprocessor to insert the contents of the file vector at the exact place where the #include appears. This is really just text replacement. So, header files are not some kind of special language construct. They contain normal code. But by putting that code into a separate file, you can easily include it in other files using the preprocessor.
I think it's a good question which is what led me to ask this: Visual studio: automatically update C++ cpp/header file when the other is changed?
There are some refactoring tools mentioned but unfortunately I don't think there's a perfect solution; you simply have to write your function signatures twice. The exception is when you are writing your implementations inline, but there are reasons why you can't or shouldn't always do this.
You might be interested in Lazy C++. However, you should do a few projects the old-fashioned way (with separate header and source files) before attempting to use this tool. I considered using it myself, but then figured I would always be accidentally editing the generated files instead of the lzz file.
You could just put all the definitions in the header file...
This goes against common practice, but is not unheard of.

where should "include" be put in C++

I'm reading some C++ code and notice that there are #include directives both in the header files and in the .cpp files. I guess that if I move all the #includes from a file, say foo.cpp, to its header file foo.hh, and let foo.cpp include only foo.hh, the code should still work, leaving aside issues like drawbacks, efficiency and so on.
I know this sudden idea of mine must be a bad one in some way, but what exactly are its drawbacks? I'm new to C++, so I don't want to have to read lots of C++ books before I can answer this question myself. So I'm just dropping the question here for your help. Thanks in advance.
As a rule, put your includes in the .cpp files when you can, and only in the .h files when that is not possible.
You can use forward declarations to remove the need to include headers from other headers in many cases: this can help reduce compilation time, which can become a big issue as your project grows. This is a good habit to get into early on, because trying to sort it out at a later date (when it's already a problem) can be a complete nightmare.
The exception to this rule is templated classes (or functions): in order to use them you need to see the full definition, which usually means putting them in a header file.
The include files in a header should only be those necessary to support that header. For example, if your header declares a vector, you should include <vector>, but there's no reason to include <string>. You should be able to have an empty program that only includes that single header file and will compile.
Within the source code, you need includes for everything you call, of course. If none of your headers required <iostream> but you needed it for the actual source, it should be included separately.
Include file pollution is, in my opinion, one of the worst forms of code rot.
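A small sketch of that rule, assuming a hypothetical scores.h and scores.cpp. The header needs the vector header because std::vector appears in its declaration; the numeric header is only needed by the implementation, so it is included in the source file instead.

```cpp
// ---- scores.h (hypothetical): includes only what its declarations need ----
#include <vector>

double average(const std::vector<int>& scores);

// ---- scores.cpp (hypothetical): includes what the implementation needs ----
#include <numeric>      // std::accumulate is an implementation detail

double average(const std::vector<int>& scores) {
    if (scores.empty()) {
        return 0.0;
    }
    return std::accumulate(scores.begin(), scores.end(), 0.0) / scores.size();
}
```

A file that includes scores.h pays for the vector header only, and the header still compiles on its own.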
You would make all other files including your header file transitively include all the #includes in your header too.
In C++ (as in C) #include is handled by the preprocessor by simply inserting all the text in the #included file in place of the #include statement. So with lots of #includes you can literally boost the size of your compilable file to hundreds of kilobytes, and the compiler needs to parse all this for every single file. Note that the same file included in different places must be reparsed in every single place where it is #included! This can slow down the compilation to a crawl.
If you need to declare (but not define) things in your header, use forward declaration instead of #includes.
While a header file should include only what it needs, "what it needs" is more fluid than you might think, and is dependent on the purpose to which you put the header. What I mean by this is that some headers are actually interface documents for libraries or other code. In those cases, the headers must include (and probably #include) everything another developer will need in order to correctly use your library.
Including header files from within header files is fine, and so is including them in C++ source files. However, to minimize build times it is generally preferable to avoid including a header from within another header unless absolutely necessary, especially if many C++ files include the same header.
.hh (or .h) files are supposed to be for declarations.
.cpp (or .cc) files are supposed to be for definitions and implementations.
Realize first that an #include statement is literal. #include "foo.h" literally copies the contents of foo.h and pastes it where the include directive is in the other file.
The idea is that some other files bar.cpp and baz.cpp might want to make use of some code that exists in foo.cpp. The way to do that, normally, would be for bar.cpp and baz.cpp to #include "foo.h" to get the declarations of the functions or classes that they wanted to use, and then at link time, the linker would hook up these uses in bar.cpp and baz.cpp to the implementations in foo.cpp (that's the whole point of the linker).
If you put everything in foo.h and tried to do this, you would have a problem. Say that foo.h declares a function called doFoo(). If the definition (code for) this function is in foo.cpp, that's fine. But if the code for doFoo() is moved into foo.h, and then you include foo.h inside foo.cpp, bar.cpp and baz.cpp, there are now three definitions for a function named doFoo(), and your linker will complain because a program is not allowed to contain more than one definition of the same non-inline function.
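The working arrangement described above could be sketched like this. The bodies of doFoo, barTwice and bazOnce are invented for illustration; the four files are shown in one listing, separated by comments.

```cpp
// ---- foo.h: declaration only, safe to include from any number of files ----
int doFoo(int x);

// ---- foo.cpp: the one and only definition ----
int doFoo(int x) {
    return x + 1;
}

// ---- bar.cpp: includes foo.h, calls doFoo (hypothetical caller) ----
int barTwice(int x) {
    return doFoo(doFoo(x));
}

// ---- baz.cpp: includes foo.h, calls doFoo (hypothetical caller) ----
int bazOnce(int x) {
    return doFoo(x) * 10;
}
```

bar.cpp and baz.cpp each compile against the declaration alone, and the linker later connects both calls to the single definition in foo.cpp.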
If you #include the .cpp files, you will probably end up with loads of "multiple definition" errors from the linker. You can in theory #include everything into a single translation unit, but that also means that everything must be re-built every time you make a change to a single file. For real-world projects, that is unacceptable, which is why we have linkers and tools like make.
There's nothing wrong with using #include in a header file. It is a very common practice: you don't want to burden a user of a library with also remembering what other obscure headers are needed.
A standard example is #include <vector>. Gets you the vector class. And a raft of internal CRT header files that are needed to compile the vector class properly, stuff you really don't need nor want to know about.
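You can avoid redefinition errors caused by including the same header more than once in a translation unit if you use "include guards".
(begin myheader.h)
#ifndef MYHEADER_H
#define MYHEADER_H
struct blah {};
extern int whatsit;
#endif // MYHEADER_H
Now if you #include "myheader.h" in other header files, it'll only get included once per translation unit (due to MYHEADER_H being defined). Names that begin with an underscore, such as _myheader_h_, are reserved for the implementation at global scope, so a plain name is safer for the guard macro. I believe MSVC has a "#pragma once" with the equivalent functionality.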
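To see the guard at work, the sketch below writes out by hand what the preprocessor would produce if a guarded header were included twice in the same file. The guard name MYHEADER_H and the value member are assumptions made for the example.

```cpp
// ---- first #include of the guarded header: the macro is not yet defined ----
#ifndef MYHEADER_H
#define MYHEADER_H
struct blah { int value; };
#endif // MYHEADER_H

// ---- second #include of the same header: the macro is defined, so the ----
// ---- body is skipped and struct blah is not defined a second time      ----
#ifndef MYHEADER_H
#define MYHEADER_H
struct blah { int value; };
#endif // MYHEADER_H

// Code after both inclusions sees exactly one definition of blah.
int read_value(const blah& b) {
    return b.value;
}
```

Without the guard, the second copy would be a redefinition of struct blah and the file would not compile. Note that a guard only works within one translation unit; it does nothing about the same function being defined in two different .cpp files.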

Should every C or C++ file have an associated header file?

Should every .C or .cpp file have a header (.h) file for it?
Suppose there are following C files :
Main.C
Func1.C
Func2.C
Func3.C
where main() is in Main.C file. Should there be four header files
Main.h
Func1.h
Func2.h
Func3.h
Or there should be only one header file for all .C files?
What is a better approach?
For a start, it would be unusual to have a main.h since there's usually nothing that needs to be exposed to the other compilation units at compile time. The main() function itself needs to be exposed for the linker or start-up code but they don't use header files.
You can have either one header file per C file or, more likely in my opinion, a header file for a related group of C files.
One example of that is if you have a BTree implementation and you've put add, delete, search and so on in their own C files to minimise recompilation when the code changes.
It doesn't really make sense in that case to have separate header files for each C file, as the header is the API. In other words, it's the view of the library as seen by the user. People who use your code generally care very little about how you've structured your source code, they just want to be able to write as little code as possible to use it.
Forcing them to include multiple distinct header files just so they can create, insert into, delete from, and search, a tree, is likely to have them questioning your sanity :-)
You would be better off with one btree.h file and a single btree.lib file containing all of the BTree object files that were built from the individual C files.
Another example can be found in the standard C headers.
We don't know for certain whether there are multiple C files for all the stdio.h functions (that's how I'd do it but it's not the only way) but, even if there were, they're treated as a unit in terms of the API.
You don't have to include stdio_printf.h, stdio_fgets.h and so on - there's a single stdio.h for the standard I/O part of the C runtime library.
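A rough sketch of the one-header, several-source-files layout. A plain binary search tree stands in for the BTree here to keep the example short, and the file names and function names are assumptions; the files are shown in one listing, separated by comments.

```cpp
// ---- btree.h (hypothetical): the whole public API in a single header ----
struct Node {
    int key;
    Node* left;
    Node* right;
};
Node* tree_insert(Node* root, int key);
bool tree_contains(const Node* root, int key);
void tree_free(Node* root);

// ---- btree_insert.cpp (hypothetical) ----
Node* tree_insert(Node* root, int key) {
    if (!root) {
        return new Node{key, nullptr, nullptr};
    }
    if (key < root->key) {
        root->left = tree_insert(root->left, key);
    } else if (key > root->key) {
        root->right = tree_insert(root->right, key);
    }
    return root;                        // duplicates are ignored
}

// ---- btree_search.cpp (hypothetical) ----
bool tree_contains(const Node* root, int key) {
    while (root) {
        if (key == root->key) {
            return true;
        }
        root = key < root->key ? root->left : root->right;
    }
    return false;
}

// ---- btree_free.cpp (hypothetical) ----
void tree_free(Node* root) {
    if (!root) {
        return;
    }
    tree_free(root->left);
    tree_free(root->right);
    delete root;
}
```

A user of the library includes btree.h and nothing else, however many source files sit behind it.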
Header files are not mandatory.
#include simply copies and pastes whatever file is included (including .c source files).
Commonly used in real-life projects are global header files like config.h and constants.h that contain commonly used information such as compile-time flags and project-wide constants.
A good design of a library API would be to expose an official interface with one set of header files and use an internal set of header files for implementation with all the details. This adds a nice extra layer of abstraction to a C library without adding unnecessary bloat.
Use common sense. C/C++ is not really for the ones without it.
I used to follow the "it depends" trend until I realized that consistency, uniformity and simplicity are more important than saving the effort to create a file, and that "standards are good even when they are bad".
What I mean is the following: a .cpp/.h pair of files is pretty much what all "modules" end up as anyway. Making the existence of both a requirement saves a lot of confusion and bad engineering.
For instance, when I see some interface of something in a header file, I know exactly where to search for / place its implementation. Conversely, if I need to expose the interface of something that was previously hidden in .cpp file (e.g. static function becoming global), I know exactly where to put it.
I've seen too many bad consequences of not following this simple rule. Unnecessary inline functions, breaking any kind of rules about encapsulation, (non)separation of the interface and implementation, misplaced code, to name a few -- all due to the fact that the appropriate sibling header or cpp file was never added.
So: always define both .h and .c files. Make it a standard, follow it, and safely rely on it. Life is much simpler this way, and simplicity is the most important thing in software.
Generally it's best to have a header file for each .c file, containing the declarations for functions etc in the .c file that you want to expose. That way, another .c file can include the .h file for the functions it needs, and won't need to be recompiled if a header file it didn't include got changed.
Generally there will be one .h file for each .c/.cpp file.
Bjarne Stroustrup explains it beautifully in his book "The C++ Programming Language":
The single header style of physical partitioning is most useful when the program is small and its parts are not intended for separate use. When namespaces are used, the logical structure of the program can still be explained in a single header file.
For larger Programs, the single header file approach is unworkable in a conventional file-based development environment. A change to the common header forces recompilation of the whole program, and updates of that single header by several programmers are error prone. Unless strong emphasis is placed on programming styles relying heavily on namespaces and classes, the logical structure deteriorates as program grows.
An alternative physical organization lets each logical module have its own header defining the facilities it provides. Each .c file then has a corresponding .h file specifying what it provides (its interface). Each .c module includes its own .h file and usually also other .h files that specify what it needs from other modules in order to implement the services advertised in its interface. This physical organization corresponds to the logical organization of a module. The multiple header approach makes it easy to determine the dependencies. The single header approach forces us to look at every declaration used by any module and decide if it's relevant. The simple fact is that maintenance of a code is invariably done with incomplete information and from a local perspective.
The better localization leads to less information to compile a module and thus faster compilation..
It depends. Usually your reason for having separate .c files will dictate whether you need separate .h files.
Generally .cpp/.c files are for implementation and .h/.hpp files (.hpp is used less often) are header files (prototypes and declarations only). A .cpp file doesn't always have to have a header file associated with it, but it usually does, as the header file acts like a bridge between .cpp files so that each .cpp file can use code from another.
One thing that should be strongly enforced is keeping code (definitions) out of header files! There have been too many times where header files break compiles, in projects of any size, because of redefinitions, and that happens simply when you include the header file in two different .cpp files. Header files should also always be designed so that they can be included multiple times. .cpp files should never be included.
It's all about what code needs to be aware of what other code. You want to reduce the amount other files are aware of to the bare minimum for them to do their jobs.
They need to know that a function exists, what types they need to pass into it, and what types it will return, but not what it's doing internally. Note that it's also important from the programmers point of view to know what those types actually mean. (e.g which int is the row, and which is the column) but the code itself doesn't care. This is why naming the function and parameters sensibly is worthwhile.
As others have said, if there's nothing in a cpp file worth exposing to other parts of the code, as is normally the case with main.c, then there's no need for a header file.
It's occasionally worth putting everything you want to expose in a single header file (e.g, Func1and2and3.h), so that anything that knows about Func1 knows about Func2 as well, but I'm personally not keen on this, as it means that you tend to load a hell of a lot of junk along with the stuff you actually want.
Summary:
Imagine that you trust that someone can write code and that their algorithms, design, etc. are all good. You want to use code they've written. All you need to know is what to give them to get something to happen, what you should give it to, and what you'll get back. That's what needs to go in the header files.
I like putting interfaces into header files and implementation in cpp files. I don't like writing C++ where I need to add member variables and prototypes to the header and then the method again in the C++. I prefer something like:
module.h
struct IModuleInterface : public IUnknown
{
    virtual void SomeMethod () = 0;
};
module.cpp
class ModuleImpl : public IModuleInterface,
                   public CObject // a common object to do the reference
                                  // counting stuff for IUnknown (so we
                                  // can stick this object in a smart
                                  // pointer).
{
public:
    ModuleImpl () : m_MemberVariable (0)
    {
    }
private:
    int m_MemberVariable;
    void SomeInternalMethod ()
    {
        // some internal code that doesn't need to be in the interface
    }
public:
    void SomeMethod ()
    {
        // implementation for the method in the interface
    }
    // whatever else we need
};
I find this is a really clean way of separating implementation and interface.
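The same idea can be sketched without COM, assuming a hypothetical IModule interface and a make_module factory function (neither is from the code above). The header exposes only the pure interface and the factory; the implementation class, with its member variables and internal methods, never leaves the .cpp file.

```cpp
#include <memory>

// ---- module.h (hypothetical): interface and factory only ----
struct IModule {
    virtual ~IModule() = default;
    virtual int compute(int x) = 0;
};
std::unique_ptr<IModule> make_module();

// ---- module.cpp (hypothetical): everything else stays private to this file ----
namespace {

class ModuleImpl : public IModule {
public:
    int compute(int x) override {
        return scale(x) + offset_;
    }
private:
    // Internal helper and state that never appear in the header.
    int scale(int x) const { return x * 2; }
    int offset_ = 1;
};

} // unnamed namespace

std::unique_ptr<IModule> make_module() {
    return std::unique_ptr<IModule>(new ModuleImpl());
}
```

Adding a member variable or an internal method to ModuleImpl changes only module.cpp, so nothing that includes module.h needs to be recompiled. The cost is a virtual call and a heap allocation per object.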
There is no better approach, only common and less common cases.
The more common case is when you have a class/function interface to declare/define. It's better to have only one .cpp/.c with the definitions, and one header for the declarations.
Giving them the same name makes it easy to understand that they are directly related.
But that's not a "rule", that's the common way and the most efficient in almost all cases.
Now in some cases (like template classes or some tiny struct definition) you won't need any .c/.cpp file, just the header. For example, we often have a class interface definition in only a header file, with only pure virtual functions or trivial functions.
And in other rare cases (like a hypothetical main.c/.cpp file) it isn't necessary to let code from other compilation units call the functions of a given compilation unit. The main function is an example (no header/declaration needed), but there are others, mostly code that "connects all the other parts together" and is not called by other parts of the application. That's very rare, but in this case a header makes no sense.
If your file exposes an interface - that is, if it has functions which will be called from other files - then it should have a header file. Otherwise, it shouldn't.
As already noted, generally, there will be one header (.h) file for each source (.c or .cpp) file.
However, you should look at the cohesiveness of the files. If the various source files provide separate, individually reusable sets of functions - an ideal organization - then you should certainly have one header per file. If, however, the three source files provide a composite set of functions (that is too big to fit into one file), then you would use a more complex organization. There would be one header for the external services used by the main program - and that would be used by other programs needing the same services. There would also be a second header used by the cooperating source files that provides 'internal' definitions shared by those files.
(Also noted by Pax): The main program does not normally need its own header - no other source code should be using the services it provides; it uses the services provided by other files.
If you want your compiled code to be used from another compilation unit you will need the header files. There are some situations in which you do not need or want to have a header.
The first such case is main.c/cpp files. Such a file is not meant to be used by other files, and as such there is no need for a header file.
In some cases you can have a header file that defines behavior of a set of different implementations that are loaded through a dll that is loaded at runtime. There will be different set of .c/.cpp files that implement variations of the same header. This can be common in plugin systems.
In general, I don't think there is any explicit relationship between .h and .c files. In many cases (probably most), a unit of code is a library of functionality with a public interface (.h) and an opaque implementation (.c). Sometimes a number of symbols are needed, like enums or macros, and you get a .h with no corresponding .c, and in a few circumstances you will have a lump of code with no public interface and no corresponding .h.
In particular, there are a number of times when the headers or implementations (seldom both) are so big and hairy that they end up being broken into many smaller files, for the sake of readability and the programmer's sanity.