Why is it that during the preprocessing step, the #includes in a main file are only replaced with the contents of the relevant header files (and not the function definitions as well (.cpp files))?
I would think that during this step it should first go into the header files and replace the #includes there with the contents of their associated .cpp files, and only then go back to replace the #includes in the main file with everything, thus negating the need for any linking (one giant file with everything). Why does it not happen this way?
Why is it that during the preprocessing step, the #includes in a main file are only replaced with the contents of the relevant header files (and not the function definitions as well (.cpp files))?
Simply put, the header files are the only files you've told the preprocessor about. It can't assume the names of the source files, because there could be many source files for any given header. You may be thinking "Hey, why don't I just include the source files?" and I'm here to tell you No! Bad! Besides, who's to say that you have access to the source files in the first place?
The only way for the compiler to know about and compile all of your source files is for you to pass the compiler each source file, have it compile them into objects, and link together those objects into a library or executable.
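As a sketch of that pipeline with a GCC-style toolchain (the file names and commands here are hypothetical, just for illustration):

// foo.h
int foo();

// foo.cpp
#include "foo.h"
int foo() { return 42; }

// main.cpp
#include "foo.h"
int main() { return foo(); }

// Compile each source file into an object, then link the objects:
//   g++ -c foo.cpp -o foo.o
//   g++ -c main.cpp -o main.o
//   g++ foo.o main.o -o app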
There's a great difference between compiling and linking:
Pre-processor, Pre-compile-time and Compile-time:
The preprocessor checks for lines beginning with the # symbol and replaces them with the relevant content, e.g.:
#include <iostream> // the content will be replaced here and this line will be removed
So the content of iostream will be added above.
Another example:
#define PI 3.14 // wherever PI is used in your source file the macro will be expanded replacing each PI with the constant value 3.14
The compiler checks for syntax errors, matches calls against function prototypes, and so on; it does not need the bodies of functions defined elsewhere. The result is an .obj file.
Link-time:
The linker links these .obj files with the relevant libraries, and at this point every function that is called must have a definition; a call without a definition results in a link-time error.
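A minimal sketch of that failure mode (the function name is invented): the compiler happily accepts a call to a declared-but-undefined function, and the error only appears at link time:

// main.cpp
void helper();              // declaration only; no definition anywhere
int main() { helper(); }    // compiles fine: the prototype is all the compiler needs
// Linking fails with something like: undefined reference to `helper()'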
Why does it not happen this way?
History, the expectations of experienced programmers, and their experience with bigger programs (see the last statements below).
I would think that during this step it should first go into the header
files and replace the #includes there with the contents of their
associated .cpp files ...
If you accept a job coding in C++, your company will provide you with a coding standard detailing guidelines or rules which you will want to follow, or else face defending your choices when you deviate from them.
You might take some time now to look at available coding standards. For example, try studying the Google C++ Style Guide (I don't particularly like or dislike this one, it's just easy to remember). A simple Google search will also turn up several coding standards, and adding 'why conform to a coding standard?' to your search might provide some background.
negating the need for any linking (one giant file with everything).
Note: this approach cannot eliminate linking against compiler-provided or third-party libraries. I often use -lrt and -pthread, and sometimes -lncurses, -lgmp, -lgmpxx, etc.
For now, as an experiment, you can manually achieve the giant-file approach (which I often do for smaller trials and the development of private tools).
Consider:
if main.cc has:
#include "./Foo.hh" // << note .hh file
int main(int argc, char* argv[])
{
Foo foo(argc, argv);
foo.show();
...
and Foo.cc has
#include "./Foo.hh" // << note .hh file
// Foo implementation
This is the common pattern (no, not the pattern-book pattern), and it will require you to link together Foo.o and main.o, which is trivial enough for small builds, but still something more to do.
The 'small'-ness allows you to use #include to create your 'one giant file with everything' easily:
change main to
#include "./Foo.cc" // << note .cc also pulls in .hh
// (prepare for blow back on this idea)
int main(int argc, char* argv[])
{
Foo foo(argc, argv);
foo.show();
...
The compiler sees all the code in one compilation unit. No linking of local .o's needed (but still library linking).
Note, I do not recommend this. Why?
Probably the primary reason is that many of my tools have hundreds of objects (i.e. hundreds of .cc files). That single 'giant' file can be quite giant.
For most development churn (i.e. early bug fixes), ONLY one or TWO of the .cc files get changed. Recompiling all of the source code can be a big waste of your time, and your compiler's time.
The alternative is what experienced developers have already learned:
A) Compiling the much smaller number of .cc's that have changed (perhaps one or two?),
B) then linking them with the hundreds of other .o's that have not changed, which makes for a much quicker build.
A major key to your productivity is to minimize your edit-compile-debug duration. A and B and a good editor are important to this development iteration.
Related
Ok, so I don't have a problem, but a question:
When using C++, you can move a class to another file and include it without creating a header, like this:
foo.cpp :
#include <iostream>
#include <string> // for std::string used below
using namespace std;
class foo
{
public:
string str;
foo(string inStr)
{
str = inStr;
}
void print()
{
cout<<str<<endl;
}
};
main.cpp :
#include "foo.cpp"
using namespace std;
int main()
{
foo Foo("That's a string");
Foo.print();
return 0;
}
So the question is: is this method any worse than using header files? It's much easier and much more clean, but is it any slower, any more bug-inducing etc?
I've searched for this topic for a long time now but I haven't seen a single topic on the internet considering this even an option...
So the question is: is this method any worse than using header files?
You might consider reviewing the central idea of what the "C++ translation unit" is.
In your example, what the preprocessor does is as if it inserts a copy of foo.cpp into an internal copy of main.cpp. The preprocessor does this, not the compiler.
So ... the compiler never sees your code as separate files. It is this single, concatenated 'translation unit' that is submitted to the compiler. There is no magic in .hh or .cc, except that they fulfill your peers' (or boss's) expectations.
Now think about your question ... the translation unit is neither of your source files, nor any of your system include files, but it is one stream of text, one thing, put together by the preprocessor. So how would it be better or worse?
It's much easier and much more clean,
It can be. I often take this 'different' approach in my 'private' coding efforts.
When I did a quick eval of using gmpxx.h (mpz_class) in factorial, I did indeed take just these kinds of shortcuts, and did not need a .hpp file to properly create my compilation unit. FYI, the factorial of 12345 is more than 45,000 digits long; it is pointless to read all the characters anyway.
For a 'more formal' effort (a job, cooperation, etc.), I always use headers, separate compilation, and the building of a library of functions useful to the app, as part of how things should be done, especially if I might share the code or contribute to a company's archives. There are too many good reasons to describe here for why I recommend you learn these practices.
but is it any slower, any more bug-inducing etc?
I think not. There is one compilation unit either way; concatenating the parts has to be done right, but I think it is no more difficult.
I've searched for this topic for a long time now but I haven't seen a single
topic on the internet considering this even an option...
I'm not sure I've ever seen it discussed either; I acquired this information over time. Separate compilation and library development are generally perceived to save development time. (Time is money, right?)
Also, a library, and header files, are how you package your success for others to use, how you can improve your value to a team.
There's no semantic difference between naming your files .cpp or .hpp (or .c / .h).
People will be surprised by the #include "foo.cpp", but the compiler doesn't care.
You've still created a "header file", but you've given it the ".cpp" extension. File extensions are for the programmer, the compiler doesn't care.
From the compiler's point of view, there is no difference between your example and
foo.h :
#include <iostream>
using namespace std;
class foo
{
//...
};
main.cpp :
#include "foo.h"
using namespace std;
int main()
{
// ...
}
A "header file" is just a file that you include at the beginning i.e. the head of another file (technically, headers don't need to be at the beginning and sometimes are not but typically they are, hence the name).
You've simply created a header file named foo.cpp.
Naming header files with the extension that is conventionally used for source files is not a good idea. Some IDEs and other tools may erroneously assume that your header is a source file and therefore attempt to compile it as such, wasting resources if nothing else.
Not to mention the confusion it may cause for your colleagues. Source files may contain definitions that the C++ standard allows to appear exactly once in a program (see the one-definition rule, ODR), precisely because source files are not included in other files. If you name your header as if it were a source file, someone might assume that they can put ODR definitions there when they can't.
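As a minimal sketch of that hazard (file and function names hypothetical), a non-inline definition in a file that gets included twice breaks the ODR at link time:

// util.cpp -- really a header in disguise
int helper() { return 42; }    // non-inline definition

// a.cpp contains: #include "util.cpp"
// b.cpp contains: #include "util.cpp"
// Linking a.o and b.o then fails: multiple definition of `helper()'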
If you ever build some larger project, the two main differences will become clear to you:
If you deliver your code as a library to others, you have to give them all your code - all your IP - instead of only the headers of the exposed classes plus a compiled library.
If you change one letter in any file, you will need to recompile everything. Once compile times for a larger project hit minutes, you will lose a lot of productivity.
Otherwise, of course it works, and the result is the same.
I have seen many explanations on when to use forward declarations over including header files, but few of them go into why it is important to do so. Some of the reasons I have seen include the following:
compilation speed
reducing complexity of header file management
removing cyclic dependencies
Coming from a .NET background I find header management frustrating. I have this feeling I need to master forward declarations, but I have been scraping by on includes so far.
Why cannot the compiler work for me and figure out my dependencies using one mechanism (includes)?
How do forward declarations speed up compilations since at some point the object referenced will need to be compiled?
I can buy the argument for reduced complexity, but what would a practical example of this be?
"to master forward declarations" is not a requirement, it's a useful guideline where possible.
When a header is included, and it pulls in more headers, and yet more, the compiler has to do a lot of work processing a single translation module.
You can see how much, for example, with gcc -E:
A single #include <iostream> gives my g++ 4.5.2 an additional 18,560 lines of code to process.
A #include <boost/asio.hpp> adds another 74,906 lines.
A #include <boost/spirit/include/qi.hpp> adds 154,024 lines, that's over 5 MB of code.
This adds up, especially if carelessly included in some file that's included in every file of your project.
Sometimes going over old code and pruning unnecessary includes improves compilation times dramatically just because of that. Replacing includes with forward declarations in the translation modules where only references or pointers to some class are used improves this even further.
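If you want to measure this yourself, a quick check is to run only the preprocessor and count the lines (assuming a GCC-style driver; the file name is hypothetical):

// probe.cpp -- nearly empty on purpose
#include <iostream>
int main() { return 0; }
//   g++ -E probe.cpp | wc -l    // prints the line count after preprocessing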
Why cannot the compiler work for me and figure out my dependencies using one mechanism (includes)?
It cannot because, unlike some other languages, C++ has an ambiguous grammar:
int f(X);
Is it a function declaration or a variable definition? To answer this question the compiler must know what X means, so X must be declared before that line.
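A minimal sketch of the two readings (names invented); the same token shape means different things depending on what the name refers to:

// If X names a type, this declares a function:
struct X {};
int f(X);      // function declaration: a function taking an X

// If x names a variable, the same shape defines an int:
int x = 5;
int g(x);      // variable definition: an int named g, initialized to 5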
Because when you're doing something like this:
bar.h :
class Foo;   // a forward declaration of Foo is enough here
class Bar {
    int foo(Foo &);
};
Then the compiler does not need to know how the Foo struct/class is defined, so including the header that defines Foo is useless. Moreover, including the header that defines Foo might also require including the header that defines some other class that Foo uses, and that might mean including the header that defines yet another class, etc.... turtles all the way down.
In the end, the file that the compiler works on is almost the result of copy-pasting all the headers together; it gets big for no good reason, compiling your class starts to take waaay too much time, and when someone makes a typo in a header file that you don't even need, your build can fail for no obvious reason.
So it's a good thing to give as little info as needed to the compiler.
How do forward declarations speed up compilations since at some point the object referenced will need to be compiled?
1) Reduced disk I/O (fewer files to open, fewer times)
2) Reduced memory/CPU usage
Most translation units need only a name. If you use or allocate the object, you'll need its full definition.
This is probably where it will click for you: each file you compile compiles everything that is visible in its translation unit.
A poorly maintained system will end up including a ton of stuff it does not need, and that then gets compiled for every file that sees it. By using forward declarations where possible, you can bypass that and significantly reduce the number of times a public interface (and all of its included dependencies) must be compiled.
That is to say: the content of the header won't be compiled once; it will be compiled over and over. Everything in that translation unit must be parsed, checked to be a valid program, checked for warnings, optimized, etc., many, many times.
Including lazily only adds significant disk/CPU/memory costs, which turn into intolerable build times for you, while introducing significant dependencies (in non-trivial projects).
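To make that concrete, here is a minimal sketch of what an incomplete (forward-declared) type does and does not allow:

class Widget;                 // forward declaration: Widget is an incomplete type here

Widget* find(int id);         // OK: pointer to an incomplete type
void inspect(const Widget&);  // OK: reference to an incomplete type

// Widget w;                  // error: creating an object needs the full definition
// int n = sizeof(Widget);    // error: the size is unknown without the definition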
I can buy the argument for reduced complexity, but what would a practical example of this be?
Unnecessary includes introduce dependencies as a side effect. When you edit a header (necessary or not), every file that includes it must be recompiled (not trivial when hundreds of thousands of files must be unnecessarily opened and compiled).
Lakos wrote a good book which covers this in detail:
http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620/ref=sr_1_1?ie=UTF8&s=books&qid=1304529571&sr=8-1
Header file inclusion rules specified in this article will help reduce the effort in managing header files.
I use forward declarations simply to reduce the amount of navigation between source files. E.g., if module X calls some glue or interface function F in module Y, then using a forward declaration means that writing the function and the call can be done by visiting only two places, X.c and Y.c. This is not so much of an issue when a good IDE helps you navigate, but I tend to prefer coding bottom-up, creating working code and then figuring out how to wrap it, rather than top-down interface specification. As the interfaces themselves evolve, it's handy not to have to write them out in full.
In C (or C++ minus classes) it's possible to truly keep structure details private by defining them only in the source files that use them, and exposing only forward declarations to the outside world: a level of black-boxing that requires performance-destroying virtual functions in the C++/classes way of doing things. It's also possible to avoid needing to prototype things (visiting the header) by listing code 'bottom-up' within the source files (good old static keyword).
The pain of managing headers can sometimes expose how modular your program is or isn't: if it's truly modular, the number of headers you have to visit and the amount of code and data structures declared within them should be minimized.
Working on a big project with 'everything included everywhere' through precompiled headers won't encourage this real modularity.
Module dependencies can correlate with data flow relating to performance issues, i.e. both i-cache and d-cache issues. If a program involves many modules that call each other and modify data at many random places, it's likely to have poor cache coherency; the process of optimizing such a program will often involve breaking up passes and adding intermediate data, often playing havoc with many 'class diagrams'/'frameworks' (or at least requiring the creation of many intermediate data structures). Heavy template use often means complex pointer-chasing, cache-destroying data structures. In its optimized state, dependencies and pointer chasing will be reduced.
I believe forward declarations speed up compilation because the header file is ONLY included where it is actually used. This reduces the number of times the file has to be opened and closed. You are correct that at some point the object referenced will need to be compiled, but if I am only using a pointer to that object in my other .h file, why actually include the header? If I tell the compiler I am using a pointer to a class, that's all it needs (as long as I am not calling any methods of that class).
This is not the end of it. Those .h files include other .h files... So, for a large project, opening, reading, and closing all the .h files which are included repetitively can become significant overhead. Even with #ifndef include guards, the preprocessor still has to open and close the files a lot.
We practice this at my place of employment. My boss explained it in a similar way, but I'm sure his explanation was clearer.
How do forward declarations speed up compilations since at some point the object referenced will need to be compiled?
Because #include is a preprocessor mechanism, which means it is done via brute force when parsing the file. Your object will be compiled once (by the compiler) and then linked (by the linker) as appropriate later.
In C/C++, when you compile, you've got to remember there is a whole chain of tools involved (preprocessor, compiler, linker, plus build management tools like make or Visual Studio, etc.).
Good and evil. The battle continues, but now on the battlefield of header files. Header files are a necessity and a feature of the language, but they can create a lot of unnecessary overhead if used in a non-optimal way, e.g. by not using forward declarations.
How do forward declarations speed up
compilations since at some point the
object referenced will need to be
compiled?
I can buy the argument for reduced
complexity, but what would a practical
example of this be?
Forward declarations are badass. My experience is that a lot of C++ programmers are not aware of the fact that you don't have to include a header file unless you actually use some type, i.e. you need to have the type defined so the compiler understands what you want to do. It's important to try to refrain from including header files in other header files.
Just passing around a pointer from one function to another, only requires a forward declaration:
// someFile.h
class CSomeClass;
void SomeFunctionUsingSomeClass(CSomeClass* foo);
Including someFile.h does not require you to include the header file of CSomeClass, since you are merely passing a pointer to it, not using the class. This means that the compiler only needs to parse one line (class CSomeClass;) instead of an entire header file (that might be chained to other header files etc etc).
This reduces both compile time and link time, and we are talking big optimizations here if you have many headers and many classes.
I've got a C/C++ question: can I reuse functions across different object files or projects without writing the function headers twice (once for defining the function and once for declaring it)?
I don't know much about C/C++, Delphi, or D. I assume that in Delphi or D, you would just write once what arguments a function takes, and then you can use the function across different projects.
And in C you need the function declaration in a header file again, right? Is there a good tool that will create header files from C sources? I've got one, but it's not preprocessor-aware and not very strict. And I've had a macro technique that worked rather badly.
I'm looking for ways to program in C/C++ like described here http://www.digitalmars.com/d/1.0/pretod.html
Imho, generating the headers from the source is a bad idea and is impractical.
Headers can contain more information than just function names and parameters.
Here are some examples:
a C++ header can define an abstract class for which a source file may be unneeded
A template can only be defined in a header file
Default parameters are only specified in the class definition (thus in the header file)
You usually write your header, then write the implementation in a corresponding source file.
I think doing the other way around is counter-intuitive and doesn't fit with the spirit of C or C++.
The only exception I can see to that is static functions. A static function appears only in its source file (.c or .cpp) and (obviously) can't be used elsewhere.
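For example, a minimal sketch (names invented): the static function has internal linkage and never appears in any header, while the other one does:

// math_utils.cpp
static int square(int x) { return x * x; }  // internal linkage: visible only in this file

int sum_of_squares(int a, int b)            // this one is declared in the header
{
    return square(a) + square(b);
}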
While I agree it is often annoying to copy the header declaration of a method/function to the source file, you can probably configure your code editor to ease this. I use Vim, and a quick script helped me with this a lot. I guess a similar solution exists for most other editors.
Anyway, while this can seem annoying, keep in mind it also gives a greater flexibility. You can distribute your header files (.h, .hpp or whatever) and then transparently change the implementation in source files afterward.
Also, just to mention it, there is no such thing as C/C++: there is C and there is C++; those are different languages (which indeed share much, but still).
It seems to me that you don't really need/want to auto-generate headers from source; you want to be able to write a single file and have a tool that can intelligently split that into a header file and a source file.
Unfortunately, I'm not aware of any such tool. It's certainly possible to write one, but you'd need a full C++ front end. You could try writing something using Clang, but it would be a significant amount of work.
Assuming you have declared some functions and written their implementations, you will have a .c/.cpp file and a .h header file.
What you must do in order to use those functions:
1. Create a library (a DLL/.so or a static library .a/.lib; for now I recommend a static library for ease of use) from the files where the implementation resides.
2. Use the header file (#include it) in your programs to obtain the function declarations (you don't need to rewrite the header file again), and link with your library from step 1.
Though this is an example for Visual Studio, it makes perfect sense for other development environments also.
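Here is a minimal sketch of those two steps, assuming a GCC-style toolchain (all file and function names are hypothetical):

// mylib.h -- written once, shared by the library and every client
int add(int a, int b);

// mylib.cpp
#include "mylib.h"
int add(int a, int b) { return a + b; }

// Step 1: build the static library
//   g++ -c mylib.cpp -o mylib.o
//   ar rcs libmylib.a mylib.o

// Step 2: any program includes the header and links against the library
//   g++ app.cpp -L. -lmylib -o app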
This seems like a rudimentary question, so assuming I have not misread,
Here is a basic example of re-use, to answer your first question:
#include "stdio.h"
int main( int c, char ** argv ){
puts( "Hello world" );
}
Explanation:
1. stdio.h is a C header file containing (among others) the declaration of a function called puts().
2. in main, puts() is called using the included declaration; the definition is supplied by the C standard library at link time.
Some compilers (including gcc, I think) have an option to generate headers.
There is always very much confusion about headers and source-files in C++. The links I provided should help to clear that up a little.
If you are in the situation that you want to extract headers from source-file, then you probably went about it the wrong way. Usually you first declare your function in a header-file, and then provide an implementation (definition) for it in a source-file. If your function is actually a method of a class, you can also provide the definition in header file.
Technically, a header file is just a bunch of text that is actually inserted into the source file by the preprocessor:
#include <vector>
tells the preprocessor to insert the contents of the file vector at the exact place where the #include appears. This is really just text replacement. So, header files are not some special language construct; they contain normal code. But by putting that code into a separate file, you can easily include it in other files using the preprocessor.
I think it's a good question which is what led me to ask this: Visual studio: automatically update C++ cpp/header file when the other is changed?
There are some refactoring tools mentioned but unfortunately I don't think there's a perfect solution; you simply have to write your function signatures twice. The exception is when you are writing your implementations inline, but there are reasons why you can't or shouldn't always do this.
You might be interested in Lazy C++. However, you should do a few projects the old-fashioned way (with separate header and source files) before attempting to use this tool. I considered using it myself, but then figured I would always be accidentally editing the generated files instead of the lzz file.
You could just put all the definitions in the header file...
This goes against common practice, but is not unheard of.
I have a static library that I am building in C++. I have separated it into many header and source files. I am wondering if it's better to include all of the headers that a client of the library might need in one header file that they in turn can include in their source code, or to just have them include only the headers they need? Will that cause the code to be unnecessarily bloated? I wasn't sure if the classes or functions that don't get used will still be compiled into their products.
Thanks for any help.
Keep in mind that each source file that you compile involves an independent invocation of the compiler. With each invocation, the compiler has to read in every included header file, parse through it, and build up a symbol table.
When you use one of these "include the world" header files in lots of your source files, it can significantly impact your build time.
There are ways to mitigate this; for example, Microsoft has a precompiled header feature that essentially saves out the symbol table for subsequent compiles to use.
There is another consideration though. If I'm going to use your WhizzoString class, I shouldn't have to have headers installed for SOAP, OpenGL, and what have you. In fact, I'd rather that WhizzoString.h only include headers for the types and symbols that are part of the public interface (i.e., the stuff that I'm going to need as a user of your class).
As much as possible, you should try to shift includes from WhizzoString.h to WhizzoString.cpp:
OK:
// Only include the stuff needed for this class
#include "foo.h" // Foo class
#include "bar.h" // Bar class
class WhizzoString
{
private:
    Foo m_Foo;
    Bar* m_pBar;
    // ...
};
BETTER:
// Only include the stuff needed by the users of this class
#include "foo.h" // Foo class
class Bar; // Forward declaration
class WhizzoString
{
private:
    Foo m_Foo;
    Bar* m_pBar;
    // ...
};
If users of your class never have to create or use a Bar type, and the class doesn't contain any instances of Bar, then it may be sufficient to provide only a forward declaration of Bar in the header file (WhizzoString.cpp will have #include "bar.h"). This means that anyone including WhizzoString.h could avoid including Bar.h and everything that it includes.
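A sketch of the matching implementation file (the member function and Bar::reset() are invented for illustration):

// WhizzoString.cpp
#include "WhizzoString.h"
#include "bar.h"              // the complete Bar definition is needed only here

void WhizzoString::resetBar() // hypothetical member function
{
    m_pBar->reset();          // calling through the pointer requires Bar's full definition
}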
In general, when linking the final executable, only the symbols and functions that are actually used by the program will be incorporated. You pay only for what you use. At least that's how the GCC toolchain appears to work for me. I can't speak for all toolchains.
If the client will always have to include the same set of header files, then it's okay to provide a "convenience" header file that includes the others. This is common practice in open-source libraries. If you decide to provide a convenience header, make it so that the client can also choose to include specifically what is needed.
To reduce compile times in large projects, it's common practice to include the least amount of headers as possible to make a unit compile.
What about giving both choices:
#include <library.hpp> // include everything
#include <library/module.hpp> // only single module
This way you do not force one huge include file on everyone, and the separate module headers are stacked neatly in one directory.
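A sketch of what could back those two lines (the layout, guard names, and functions are hypothetical):

// library.hpp -- the "include everything" convenience header
#include <library/module.hpp>
#include <library/other_module.hpp>

// library/module.hpp -- one focused module, usable on its own
#ifndef LIBRARY_MODULE_HPP
#define LIBRARY_MODULE_HPP
void module_function();
#endif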
It depends on the library, and how you've structured it. Remember that header files for a library, and which pieces are in which header file, are essentially part of the API of the library. So, if you lead your clients to carefully pick and choose among your headers, then you will need to support that layout for a long time. It is fairly common for libraries to export their whole interface via one file, or just a few files, if some part of the API is truly optional and large.
A consideration should be compilation time: If the client has to include two dozen files to use your library, and those includes have internal includes, it can significantly increase compilation time in a big project, if used often. If you go this route, be sure all your includes have proper include guards around not only the file contents, but the including line as well. Though note: Modern GCC does a very good job of this particular issue and only requires the guards around the header's contents.
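To illustrate the two kinds of guards mentioned, a minimal sketch (names hypothetical):

// whizzo.h -- internal guard, around the file's contents
#ifndef WHIZZO_H
#define WHIZZO_H
// ... declarations ...
#endif

// client.cpp -- external guard, around the including line itself,
// letting the preprocessor skip even opening the file a second time
#ifndef WHIZZO_H
#include "whizzo.h"
#endif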
As to bloating the final compiled program, it depends on your tool chain, and how you compiled the library, not how the client of the library included header files. (With the caveat that if you declare static data objects in the headers, some systems will end up linking in the objects that define that data, even if the client doesn't use it.)
In summary, unless it is a very big library, or a very old and cranky tool chain, I'd tend to go with the single include. To me, freezing your current implementation's division into headers into the library's API is bigger worry than the others.
The problem with single-file headers is explained in detail in Dr. Dobb's by an expert compiler writer: NEVER USE A SINGLE FILE HEADER! Each time a header is included in a .cc/.cpp file it has to be recompiled, because you can feed the file macros that alter how the header compiles. For this reason, a single header file will dramatically increase compile time without providing any benefit. With C++ you should optimize for human time first, and compile time is human time. You should never include more than you need to compile in any header, because it dramatically increases compile time; each translation unit (TU) should have its own implementation (.cc/.cpp) file, and each TU should be named with a unique filename.
In my decade of C++ SDK development experience, I religiously ALWAYS have three files in EVERY module. I have a config.h that gets included into almost every header file and contains prereqs for the entire module, such as platform config and stdint.h stuff. I also have a global.h file that includes all of the header files in the module; this one is mostly for debugging (hint: enumerate your seams in the global.h file for better-tested and easier-to-debug code). The key missing piece here is that you should really have a public.h file that includes ONLY your public API.
In libraries that are poorly programmed, such as boost with its hideous lower_snake_case class names, they use the half-baked worst practice of a 'detail' (sometimes named 'impl') folder design pattern to "conceal" their private interface. There is a long background behind why this is a worst practice, but the short story is that it creates an INSANE amount of redundant typing that turns one-liners into multi-liners, it's not UML compliant, and it messes up the UML dependency diagram, resulting in overly complicated code and inconsistent design patterns such as children actually being parents and vice versa. You don't want or need a detail folder; you need a public.h header with a bunch of sibling modules WITHOUT ADDITIONAL NAMESPACES, where your detail is a sibling and not a child that is in reality a parent. Namespaces are supposed to be for one thing and one thing only: to interface your code with other people's code. If it's your code, you control it, and you should use unique class and function names, because it's bad practice to use a namespace when you don't need to; it may cause hash-table collisions that slow down the compilation process. UML is the best practice, so if you can organize your headers so they are UML compliant, then your code is by definition more robust and portable. A public.h file is all you need to expose only the public API; thanks.
I love to organize my code, so ideally I want one class per file or, when I have non-member functions, one function per file.
The reasons are:
1. When I read the code I will always know in what file I should find a certain function or class.
2. If it's one class or one non-member function per header file, then I won't include a whole mess when I include a header file.
3. If I make a small change in a function then only that function will have to be recompiled.
However, splitting everything up into many header and many implementation files can considerably slow down compilation. In my project, most functions access a certain number of templated library functions, so that code will be compiled over and over, once for each implementation file. Compiling my whole project currently takes 45 minutes or so on one machine. There are about 50 object files, and each one uses the same expensive-to-compile headers.
Maybe, is it acceptable to have one class (or non-member function) per header file, but putting the implementations of many or all of these functions into one implementation file, like in the following example?
// foo.h
void foo(int n);
// bar.h
void bar(double d);
// foobar.cpp
#include <vector>
void foo(int n) { std::vector<int> v; ... }
void bar(double d) { std::vector<int> w; ... }
Again, the advantage would be that I can include just the foo function or just the bar function, and compilation of the whole project will be faster because foobar.cpp is one file, so the std::vector<int> (which is just an example here for some other expensive-to-compile templated construct) has to be compiled only once, as opposed to twice if I compiled foo.cpp and bar.cpp separately. Of course, my reason (3) above is not valid for this scenario: after just changing foo(){...} I have to recompile the whole, potentially big, file foobar.cpp.
I'm curious what your opinions are!
IMHO, you should combine items into logical groupings and create your files based on that.
When I'm writing functions, there are often half a dozen or so that are tightly related to each other. I tend to put them together in a single header and implementation file.
When I write classes, I usually limit myself to one heavyweight class per header and implementation file. I might add in some convenience functions or tiny helper classes.
If I find that an implementation file is thousands of lines long, that's usually a sign that there's too much there and I need to break it up.
One function per file could get messy in my opinion. Imagine if POSIX and ANSI C headers were made the same way.
#include <strlen.h>
#include <strcpy.h>
#include <strncpy.h>
#include <strchr.h>
#include <strstr.h>
#include <malloc.h>
#include <calloc.h>
#include <free.h>
#include <printf.h>
#include <fprintf.h>
#include <vprintf.h>
#include <snprintf.h>
One class per file is a good idea though.
We use the principle of one external function per file. However, within this file there may be several other "helper" functions in unnamed namespaces that are used to implement that function.
In our experience, contrary to some other comments, this has had two main benefits. The first is build times are faster as modules only need to be rebuilt when their specific APIs are modified. The second advantage is that by using a common naming scheme, it is never necessary to spend time searching for the header that contains the function you wish to call:
// getShapeColor.h
Color getShapeColor(Shape);
// getTextColor.h
Color getTextColor(Text);
I disagree that the standard library is a good example for not using one (external) function per file. Standard libraries never change and have well defined interfaces and so neither of the points above apply to them.
That being said, even in the case of the standard library there are some potential benefits in splitting out the individual functions. The first is that compilers could generate a helpful warning when unsafe versions of functions are used, e.g. strcpy vs. strncpy, in a similar way to how g++ used to warn for inclusion of <iostream.h> vs. <iostream>.
Another advantage is that I would no longer be caught out by including <memory> when I want to use memmove!
One function per file has a technical advantage if you're making a static library (which I guess is one of the reasons why projects like musl libc follow this pattern).
Static libraries are linked with object-file granularity, so if you have a static library libfoobar.a composed of*:
foo.o
    foo1
    foo2
bar.o
    bar
then if you link the lib for the bar function, the bar.o archive member will get linked but not the foo.o member. If you link for foo1, then the foo.o member will get linked, bringing in the possibly unnecessary foo2 function.
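Reproducing that layout as a sketch (assuming GCC-style commands; names hypothetical):

// foo.cpp -> foo.o: foo1 and foo2 always travel together in one member
void foo1() {}
void foo2() {}

// bar.cpp -> bar.o
void bar() {}

//   g++ -c foo.cpp bar.cpp
//   ar rcs libfoobar.a foo.o bar.o
// A program that calls only bar() pulls in bar.o alone; a program that
// calls foo1() pulls in foo.o, and foo2 comes along for the ride.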
There are possibly other ways of preventing unneeded functions from being linked in (-ffunction-sections -fdata-sections and --gc-sections) but one function per file is probably most reliable.
There's also the middle ground of putting small number of related functions/data-objects in a file. That way the compiler can better optimize intersymbol references compared to -ffunction-sections/-fdata-sections and you still get at least some granularity for static libs.
* I'm ignoring C++ name mangling here for the sake of simplicity.
I can see some advantages to your approach, but there are several disadvantages:
Including a package is a nightmare. You can end up with 10-20 includes to get the functions you need. For example, imagine if stdio or stdlib were implemented this way.
Browsing the code will be a bit of a pain, since in general it is easier to scroll through a file than to switch between files. Obviously too big a file is hard to browse, but even there modern IDEs make it pretty easy to collapse the file down to what you need, and a lot of them have function shortcut lists.
Makefile maintenance is a pain.
I am a huge fan of small functions and refactoring. When you add overhead (making a new file, adding it to source control,...) it encourages people to write longer functions where instead of breaking one function into three parts, you just make one big one.
You can redeclare some of your functions as being static methods of one or more classes: this gives you an opportunity (and a good excuse) to group several of them into a single source file.
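For instance, a minimal sketch (names invented) of a few related free functions grouped as static members of one utility class:

// string_utils.h
#include <string>

struct StringUtils
{
    static bool isBlank(const std::string& s);
    static std::string trim(const std::string& s);
};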
One good reason for having or not having several functions in one source file: if source files map one-to-one to object files, and the linker links entire object files, then if an executable might want one function but not another, put them in separate source files (so that the linker can link one without the other).
An old programming professor of mine suggested breaking up modules every several hundred lines of code for maintainability. I don't develop in C++ anymore, but in C# I restrict myself to one class per file, and size of the file doesn't matter as long as there's nothing unrelated to my object. You can make use of #pragma regions to gracefully reduce editor space, not sure if the C++ compiler has them, but if it does then definitely make use of them.
If I were still programming in C++ I would group functions by usage using multiple functions per file. So I may have a file called 'Service.cpp' with a few functions that define that "service". Having one function per file will in turn cause regret to find its way back into your project somehow, someway.
Having several thousand lines of code per file isn't necessary most of the time, though. Functions themselves should never be much more than a few hundred lines of code at most. Always remember that a function should only do one thing and be kept minimal. If a function does more than one thing, it should be refactored into helper methods.
It never hurts to have multiple source files that define a single entity either, e.g. 'ServiceConnection.cpp', 'ServiceSettings.cpp', and so on and so forth.
Sometimes if I make a single object, and it owns other objects I will combine multiple classes into a single file. For example a button control that contains 'ButtonLink' objects, I might combine that into the Button class. Sometimes I don't, but that's a "preference of the moment" decision.
Do what works best for you. Experimenting a little with different styles on smaller projects can help. Hope this helps you out a bit.
I also tried to split files in a function per file, but it had some drawbacks. Sometimes functions tend to get larger than they need to (you don't want to add a new .c file every time) unless you are diligent about refactoring your code (I am not).
Currently I put one to three functions in a .c file and group all the .c files for a given piece of functionality in a directory. For header files I have Functionality.h and Subfunctionality.h, so that I can include all the functions at once when needed, or just a small utility function if the whole package is not needed.
For the header part, you should combine items into logical groupings and create your header files based on that. This seems and is very logical IMHO.
For the source part, you should put each function implementation in a separate source file (static functions are exceptions in this case). This may not seem logical at first, but remember, a compiler knows about the functions, but a linker knows only about the .o and .obj files and its exported symbols. This may change the size of the output file considerably, and this is a very important issue for embedded systems.
Check out the glibc or Visual C++ CRT source trees...