Suppose you have the following definition of a C++ class:
class A {
// Methods
#ifdef X
// Hidden methods in some translation units
#endif
};
Is this a violation of the One Definition Rule for the class? What are the associated hazards?
I suspect that if member function pointers or virtual functions are involved, this will most likely break. Otherwise, is it safe to use?
I am considering it in the context of Objective-C++. The header file is included in both pure C++ and Objective-C++ translation units. My idea is to guard the methods that use Objective-C types with an OBJC macro. Otherwise, I have to use void pointers for all Objective-C types in the header, but that way I lose strong typing, and ugly static casts must be added all over the code.
Yes, it MAY create an ODR violation if separate translation units are allowed to see different states of the macro X. For the program to be compliant, X must be defined (or left undefined) consistently across the whole program (and shared objects) before every inclusion of that class definition. As far as the C++ compiler (not the preprocessor) is concerned, the two versions are different, incompatible, unrelated class types.
Imagine a situation where in translation unit A.cpp X was defined before class A, and in unit B.cpp it was not. You would not get any compiler errors if nothing within B.cpp uses the members that were "removed"; both units may be considered well-formed on their own. Now, if B.cpp contains a new expression, it creates an object of an incompatible type, smaller than the one defined in A.cpp. Any member function of class A, including the constructor, may then cause UB by accessing memory outside the object's storage when called on an object created in B.cpp, because those functions use the larger definition.
There is a variation of this folly: a copy of a header file placed in two or more different folders of the build tree, with the same file name and a POD struct type inside, where one of those folders is reachable via #include <filename>. Units using #include "filename" are designed to pick up the alternatives, but they may not, because the order of header lookup for #include "filename" is implementation-defined; the programmer is not entirely in control of which header gets included in which unit on every platform. As soon as one definition is changed, even just by reordering members, the ODR is breached.
To be particularly safe, such variations should be kept within the compiler's domain by using templates, PIMPL, and so on. For inter-language communication, some middle ground should be arranged using wrappers or adapters; C++ and Objective-C++ may use incompatible memory layouts for non-POD objects.
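For instance, one way to keep Objective-C types out of the shared header entirely, instead of hiding them behind a macro, is a small PIMPL: only the Objective-C++ source file ever sees the Objective-C types, while every translation unit sees the same class layout. A minimal sketch, assuming made-up names Wrapper, Impl, and doWork:
// wrapper.h -- included by both C++ and Objective-C++ translation units
class Wrapper {
public:
    Wrapper();
    ~Wrapper();
    void doWork();
private:
    struct Impl;   // defined only in the Objective-C++ source file (.mm)
    Impl* impl;    // identical layout in every translation unit
};
The definition of Impl (holding, say, an NSString* member) and the member function bodies then live in a single .mm file, so the question of differing class definitions never arises.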
This blows up horribly. Do not do this. Example with gcc:
Header file:
// a.h
class Foo
{
public:
Foo() { ; }
#ifdef A
virtual void IsCalled();
#endif
virtual void NotCalled();
};
First C++ File:
// a1.cpp
#include <iostream>
#include "a.h"
void Foo::NotCalled()
{
std::cout << "This function is never called" << std::endl;
}
extern Foo* getFoo();
extern void IsCalled(Foo *f);
int main()
{
Foo* f = getFoo();
IsCalled(f);
}
Second C++ file:
// a2.cpp
#define A
#include "a.h"
#include <iostream>
void Foo::IsCalled(void)
{
std::cout << "We call this function, but ...?!" << std::endl;
}
void IsCalled(Foo *f)
{
f->IsCalled();
}
Foo* getFoo()
{
return new Foo();
}
Result:
This function is never called
Oops! The code called the virtual function IsCalled, but we dispatched to NotCalled, because the two translation units disagreed on which entry was where in the class's virtual function table.
What went wrong here? We violated the ODR, so the two translation units disagree about the layout of the virtual function table. If we create an object in one translation unit and call a virtual function on it from another translation unit, we may call the wrong virtual function. Oopsie whoopsie!
Please do not deliberately do things that the relevant standards say are not allowed and will not work. You will never be able to think of every possible way it can go wrong. This kind of reasoning has caused many disasters over my decades of programming, and I really wish people would stop deliberately and intentionally creating potential disasters.
In practice (look at the generated assembler code using GCC with g++ -O2 -fverbose-asm -S), what you propose to do is safe. In theory, it is not.
However, there is another practical approach (used in Qt and FLTK). Use some naming convention for your "hidden" methods (e.g. document that all of them should have dontuse in their name, like int dontuseme(void)), and write a GCC plugin to warn against them at compile time. Or just use some clever grep(1) in your build process (e.g. in your Makefile).
Alternatively, your GCC plugin may implement new #pragma-s or function attributes, and could warn against misuse of such functions.
Of course, you can also make clever use of private:, and, most importantly, generate C++ code (with a generator like SWIG) in your build procedure.
So practically speaking, your #ifdef guards may be useless. And I am not sure they make the C++ code more readable.
If performance matters (with GCC), use the -flto -O2 flags at both compile and link time.
See also GNU autoconf -which uses similar preprocessor based approaches.
Or use some other preprocessor or C++ code generator (GNU m4, GPP, your own one made with ANTLR or GNU bison) to generate some C++ code. Like Qt does with its moc.
So my opinion is that what you want to do is useless. Your unstated goals can be achieved in many other ways, for example by generating "random"-looking C++ identifiers (or C identifiers, or Objective-C++ names, etc.) like _5yQcFbU0s (this is done in RefPerSys); the accidental collision of names is then very improbable.
In a comment you state:
Otherwise, I have to use void* for all Objective-C types in the header, but this way I am losing strong typing
No, you can generate some inline C++ functions (that would use reinterpret_cast) to regain that strong typing. Qt does so! FLTK, FOX, and GTKmm also generate C++ code (since GUI code is easy to generate).
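For illustration, a rough sketch of such a wrapper, assuming made-up names Holder and asNSString (and note that under ARC you would need a __bridge cast rather than reinterpret_cast):
// holder.h -- shared by C++ and Objective-C++ translation units
class Holder {
public:
    void* raw() const { return ptr; }   // opaque pointer; no Objective-C types here
private:
    void* ptr = nullptr;
};
// holder_objc.h -- included only from Objective-C++ sources
inline NSString* asNSString(const Holder& h)
{
    return reinterpret_cast<NSString*>(h.raw());   // regains strong typing on the Objective-C++ side
}
This keeps the shared header free of Objective-C types while confining the casts to one generated place instead of scattering them through user code.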
My idea was to guard methods with Objective-C types with OBJC macro
This makes perfect sense if you generate some C++, C, or Objective-C code with these macros.
I suspect if member function pointers or virtual functions are used this will most likely break.
In practice, it won't break if you generate random-looking C++ identifiers, or simply if you document naming conventions (as GNU bison or ANTLR do) in the generated C++ (or generated Objective-C++, or generated C, ...) code.
Please notice that compilers like GCC today (in 2021) internally use several C++ code generators, so generating C++ code is common practice. In practice, the risk of name collisions is small if you take care to generate "random" identifiers (you could store them in some sqlite database at build time).
also ugly static casts must be added all over the code
These casts don't matter if the ugly code is generated.
As examples, RPCGEN and SWIG (or Bisoncpp) generate ugly C and C++ code which works very well (and perhaps the same is true of some proprietary ASN.1, JSON, HTTP, SMTP, or XML related in-house code generators).
The header file is included in both pure C++ and Objective C++ translation units.
An alternative approach is to generate two different header files...
one for C++, and another for Objective-C++. The SWIG tool could be inspirational. Of course your (C, C++, or Objective-C) code generators would emit random-looking identifiers, as I do in both Bismon (generating random-looking C names like moduleinit_9oXtCgAbkqv_4y1xhhF5Nhz_BM) and RefPerSys (generating random-looking C++ names like rpsapply_61pgHb5KRq600RLnKD ...); in both systems accidental name collisions are very improbable.
Of course, in principle, using #ifdef guards is not safe, as explained in this answer.
PS. A few years ago I worked on GCC MELT, which generated millions of lines of C++ code for some old versions of the GCC compiler. Today (in 2021) you could practically use asmjit or libgccjit to generate machine code more directly. Partial evaluation is then a good conceptual framework.
Related
I've recently picked up C++ as part of my course, and I'm trying to understand in more depth the partnership between headers and classes. Every example or tutorial I've looked up on header files uses a class with a constructor and then follows up with methods if they are included. However, I'm wondering if it's fine to use header files just to hold a group of related functions, without the need to make an object of a class every time you want to use them.
//main file
#include <iostream>
#include "Example.h"
#include "Example2.h"
int main()
{
//Example 1
Example a; //I have to create an object of the class first
a.square(4); //Then I can call the function
//Example 2
square(4); //I can call the function without the need of a constructor
std::cin.get();
}
In the first example I create an object and then call the function; I use the two files 'Example.h' and 'Example.cpp'.
//Example1 cpp
#include <iostream>
#include "Example.h"
void Example::square(int i)
{
i *= i;
std::cout << i << std::endl;
}
//Example1 header
class Example
{
public:
void square(int i);
};
In example2 I call the function directly from file 'Example2.h' below
//Example2 header
void square(int i)
{
i *= i;
std::cout << i;
}
Ultimately, I guess what I'm asking is whether it's practical to use just the header file to hold a group of related functions without creating a related class file. And if the answer is no, what's the reason behind that? Either way I'm sure I've overlooked something, but as ever I appreciate any kind of insight from you guys on this!
Of course, it's just fine to have only headers (as long as you consider the One Definition Rule as already mentioned).
You can as well write C++ sources without any header files.
Strictly speaking, headers are nothing other than pieces of source code stored in files that can be #included (i.e. pasted) into multiple C++ source files (i.e. translation units). Remembering this basic fact has sometimes been quite helpful for me.
I made the following contrived counter-example:
main.cc:
#include <iostream>
// define float
float aFloat = 123.0;
// make it extern
extern float aFloat;
/* This should be included from a header,
 * but instead I avoid the preprocessor
 * and simply do it myself.
 */
extern void printADouble();
int main()
{
std::cout << "printADouble(): ";
printADouble();
std::cout << "\n"
"Surprised? :-)\n";
return 0;
}
printADouble.cc:
/* This should be included from a header,
 * but instead I avoid the preprocessor
 * and simply do it myself.
 *
 * This is intentionally of the wrong type
 * (to show how it can be done wrong).
 */
// use extern aFloat
extern double aFloat;
// make it extern
extern void printADouble();
void printADouble()
{
std::cout << aFloat;
}
Hopefully, you have noticed that I declared
extern float aFloat in main.cc
extern double aFloat in printADouble.cc
which is a disaster.
Problem when compiling main.cc? No. The translation unit is consistent syntactically and semantically (for the compiler).
Problem when compiling printADouble.cc? No. The translation unit is consistent syntactically and semantically (for the compiler).
Problem when linking this mess together? No. The linker can resolve every needed symbol.
Output:
printADouble(): 5.55042e-315
Surprised? :-)
as expected (assuming you, like me, expected nothing sensible).
printADouble() accessed the defined float variable (4 bytes) as a double variable (8 bytes). This is undefined behavior and goes wrong on multiple levels.
So, using headers doesn't enforce but merely enables (some kind of) modular programming in C++. (I didn't recognize the difference until I once had to use a C compiler which did not (yet) have a pre-processor. The issue sketched above hit me very hard then, but it was really enlightening for me, too.)
IMHO, header files are a pragmatic replacement for an essential feature of modular programming (i.e. the explicit definition of interfaces and the separation of interfaces and implementations as a language feature). This seems to have annoyed other people as well. Have a look at A Few Words on C++ Modules to see what I mean.
C++ has a One Definition Rule (ODR). This rule states that functions and objects should be defined only once. Here's the problem: headers are often included more than once. Your square(int) function might therefore be defined twice.
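To make that concrete, here is roughly what the failure looks like (the file names user1.cpp and user2.cpp are made up; Example2.h is the header from the question, which itself relies on <iostream> being included first):
// user1.cpp
#include <iostream>
#include "Example2.h"   // brings in a definition of square(int)
void f1() { square(2); }
// user2.cpp
#include <iostream>
#include "Example2.h"   // brings in a second definition of square(int)
void f2() { square(3); }
// Linking the two object files together typically fails with an error like
// "multiple definition of `square(int)'".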
The ODR is not an absolute rule. If you declare square as
//Example2 header
inline void square(int i)
// ^^^
{
i *= i;
std::cout << i;
}
then the compiler will inform the linker that there are multiple definitions possible. It's your job then to make sure all inline definitions are identical, so don't redefine square(int) elsewhere.
Templates and class definitions are exempt; they can appear in headers.
C++ is a multi paradigm programming language, it can be (at least):
procedural (driven by condition and loops)
functional (driven by recursion and specialization)
object oriented
declarative (providing compile-time arithmetic)
See a few more details in this quora answer.
Object oriented paradigm (classes) is only one of the many that you can leverage programming in C++.
You can mix them all, or just stick to one or a few, depending on what's the best approach for the problem you have to solve with your software.
So, to answer your question:
yes, you can group a bunch of (preferably inter-related) functions in the same header file. This is more common in the "old" C programming language, or in more strictly procedural languages.
That said, as in MSalters' answer, just be conscious of the C++ One Definition Rule (ODR). Use the inline keyword if you put the definition of the function (its body) in the header and not only its declaration (templates exempted).
See this SO answer for description of what "declaration" and "definition" are.
Additional note
To enforce the answer, and extend it to also other programming paradigms in C++,
in the latest few years there is a trend of putting a whole library (functions and/or classes) in a single header file.
This can be commonly and openly seen in open-source projects; just go to GitHub or GitLab and search for "header-only".
The common way is and always has been to put code in .cpp files (or whatever extension you like) and declarations in headers.
There is occasionally some merit to putting code in the header, this can allow more clever inlining by the compiler. But at the same time, it can destroy your compile times since all code has to be processed every time it is included by the compiler.
Finally, it is often annoying to have circular object relationships (sometimes desired) when all the code is in the headers.
One exception is templates. Many newer "modern" libraries such as Boost make heavy use of templates and are often "header only". However, this should only be done when dealing with templates, as it is essentially the only way to do it for them.
Some downsides of writing header only code
If you search around, you will see quite a lot of people trying to find a way to reduce compile times when dealing with Boost. For example: How to reduce compilation times with Boost Asio, which reports a 14 s compile of a single 1K file with Boost included. 14 s may not seem to be "exploding", but it is certainly a lot longer than typical and can add up quite quickly when dealing with a large project. Header-only libraries do affect compile times in a quite measurable way; we just tolerate it because Boost is so useful.
Additionally, there are many things which cannot be done in headers only (even Boost has libraries you need to link to for certain parts, such as threads, filesystem, etc.). A primary example is that you cannot have simple global objects in header-only libs (unless you resort to the abomination that is a singleton), as you will run into multiple-definition errors. NOTE: C++17's inline variables will make this particular example doable in the future.
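For reference, a minimal sketch of the C++17 inline-variable feature mentioned above (config.h and appName are made-up names): the variable can be defined directly in the header and included from any number of translation units without multiple-definition errors.
// config.h
#ifndef CONFIG_H
#define CONFIG_H
#include <string>
inline std::string appName = "demo";   // C++17: one shared definition across all TUs
#endif // CONFIG_H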
To be more specific about Boost: Boost is a library, not user-level code, so it doesn't change that often. In user code, if you put everything in headers, every little change will cause you to recompile the entire project. That's a monumental waste of time (and is not the case for libraries that don't change from compile to compile). When you split things between header/source and, better yet, use forward declarations to reduce includes, you can save hours of recompiling when added up across a day.
When I was first learning C++ I noticed that functions have to appear before the point where they are used, unlike in languages such as Java where the order of the function "declarations" in the code does not matter.
C++ example:
#include <iostream>
using namespace std;
int main() {
test();
return 0;
}
void test() {
}
But when you swap the order of the functions the program works normally.
Was that intentional when C++ was designed?
Neither C nor C++ mandates much about the order of function definitions relative to their usage.
C allows a function to be declared before usage, but (in C89/90) didn't actually require it. If a call was made to a function that hadn't been declared, the compiler was required to make certain assumptions about the function's type (and the code was invalid if the definition of the function didn't fit those assumptions).
C++, however, does mandate that free functions at least be declared before use [1]. A function definition also declares that function, so tiny programs are often written with definitions preceding usage to avoid having to write declarations separate from their definitions.
For class members, C++ loosens the restrictions somewhat. For example, this is perfectly acceptable:
class Foo {
void bar() { baz(); }
void baz() {}
};
Java differs primarily in simply prohibiting all free functions, so it has only member functions, which follow roughly the same rules as C++ member functions.
[1] Without this, it would be essentially impossible to support some C++ features such as function overloading.
The problem originates from a series of constraints:
value semantics requires that, to compile a "call", you need to know the sizes of the parameter and return types;
that knowledge must be available to the compiler during compilation of the same translation unit, but...
knowledge coming from definitions is available only at link time, which is a separate step that happens after compilation.
To complicate things more, the C++ grammar changes meaning depending on whether symbols are variables or types (so, without this knowledge, it is impossible to even know what a<b>c means: an expression involving three variables, or the declaration of a variable whose type is a template instance).
A function definition can reside in another source file, and the compiler, when compiling the first one, has no way to access it and hence no way to know how much space to leave on the stack for parameter and return passing.
Your sample, in a larger project, would be coded as
//test.h
double test();
__
//test.cpp
#include "test.h" //required to check decl & def consistence
double test()
{ /*...*/ }
__
//main.cpp
#include "test.h" // required to know about test() parameters and return
int main()
{
double z = test(); //how does "=" translate? we need to know what test returns
}
This "project" will be compiled using independent steps:
g++ -c main.cpp //on a program developer machine
g++ -c test.cpp //on a library developer machine
g++ main.o test.o -o yourprogram //on a "package distributor" machine
None of these steps can collect "global knowledge" about all "global symbols" and their "translation" at the same time. That's why we need headers: to broadcast declarations to everyone who has to use them.
(Note how member functions don't have this problem: they are all required to stay within the same class braces and are consequently guaranteed to fit in the same translation unit.)
Languages like Java or Python don't have this problem, since the modules (no matter how they are written and loaded) are all loaded up by the same language-machine instance, which, being unique, can in fact collect all symbols and related types.
Languages like D (which, like C++, use separate compilation) allow order independence between things residing in the same module, but require a module "import" for things coming from other modules, and in fact do a two-step translation by first collecting symbols and types and then translating the calls.
You can see another description of this problem here
The following compiles, links and runs just fine (on Xcode 5.1 / clang):
#include <iostream>
class C { int foo(); };
int main(int argc, const char * argv[])
{
C c;
cout << "Hello world!";
}
However, C::foo() is not defined anywhere, only declared. I don't get any compiler or linker warnings / errors, apparently because C::foo() is never referenced anywhere.
Is there any way I can emit a warning that in the whole program no definition for C::foo() exists even though it is declared? An error would actually be better.
Thanks!
There are good reasons why it is not easily feasible. A set of header files could declare many functions, some of which are provided by additional libraries. You may want to #include such headers without using all of these functions (for instance, if you only want to use some #define-d constant).
Alternatively, it is legitimate to have some header and to implement (in your library) only a subset of the API defined by the header files.
And a C++ or C header file could also define the interface of code defined by potential plugins, for programs which usually run without plugins. Many programs accepting plugins are declaring the plugin interface in their header file.
If you really wanted to have such a check, you might perhaps consider customizing GCC with MELT; however, such a check is non trivial to implement currently (and you'll need link time optimization too).
Perhaps try calling all functions in your implementation map and adding a try catch that spits out some warning if they segfault.
You don't. Or rather, this is not the job of the compiler.
I guess I'm just repeating what other said in the comments, but:
It's actually a feature (see unimplemented private constructor)
It wouldn't really help, as the "bug" here is that no proper code cleanup was done prior to committing the code, and an unimplemented and unused function is really your smallest problem then. What about all the other stuff that hasn't been cleaned up? The likelihood of having an implemented but unused function seems just as high to me, and it's more or less the same mess.
Rather than worry about this specific case, I would check if this was just a one time glitch, or if your development team could improve some procedures that would prevent such things in the future.
As far as languages such as C++ are concerned, detecting and reporting undefined functions would defeat one of the important features they provide: virtual and pure virtual functions are one of the important mechanisms by which C++ implements run-time polymorphism.
If you are developing a library to be used by its clients, you might have declared-but-not-defined virtual functions, leaving the definitions to the clients. In such cases, if the compiler were to report such undefined functions, it would not be helpful.
When reading through some answers to this question, I started wondering why the compiler actually does need to know about a function when it first encounters it. Wouldn't it be simple to just add an extra pass when parsing a compilation unit that collects all symbols declared within, so that the order in which they are declared and used does not matter anymore?
One could argue that declaring functions before they are used is certainly good style, but I am wondering: is there any other reason why this is mandatory in C++?
Edit - An example to illustrate: Suppose you have two functions that are defined inline in a header file. These two functions call each other (maybe a recursive tree traversal, where odd and even layers of the tree are handled differently). The only way to resolve this would be to make a forward declaration of one of the functions before the other.
A more common example (though with classes, not functions) is the case of classes with private constructors and factories. The factory needs to know the class in order to create instances of it, and the class needs to know the factory for the friend declaration.
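For concreteness, a sketch of that factory case (Widget and WidgetFactory are made-up names, and this is only one possible arrangement):
class Widget {
    friend class WidgetFactory;   // grants the factory access to the private constructor
private:
    Widget() {}
};

class WidgetFactory {
public:
    Widget* create() const { return new Widget(); }  // needs Widget's full definition above
};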
If this requirement is from the olden days, why was it not removed at some point? It would not break existing code, would it?
How do you propose to resolve undeclared identifiers that are defined in a different translation unit?
C++ has no module concept, but has separate translation as an inheritance from C. A C++ compiler will compile each translation unit by itself, not knowing anything about other translation units at all. (Except that export broke this, which is probably why it, sadly, never took off.)
Header files, which is where you usually put declarations of identifiers which are defined in other translation units, actually are just a very clumsy way of slipping the same declarations into different translation units. They will not make the compiler aware of there being other translation units with identifiers being defined in them.
Edit re your additional examples:
With all the textual inclusion instead of a proper module concept, compilation already takes agonizingly long for C++, so requiring another compilation pass (where compilation already is split into several passes, not all of which can be optimized and merged, IIRC) would worsen an already bad problem. And changing this would probably alter overload resolution in some scenarios and thus break existing code.
Note that C++ does require an additional pass for parsing class definitions, since member functions defined inline in the class definition are parsed as if they were defined right behind the class definition. However, this was decided when C with Classes was thought up, so there was no existing code base to break.
Historically, C89 let you do this. The first time the compiler saw a use of a function and it didn't have a predefined prototype, it "created" a prototype that matched the use of the function.
When C++ decided to add strict type checking to the compiler, it was decided that prototypes were now required. Also, C++ inherited single-pass compilation from C, so it couldn't add a second pass to resolve all symbols.
Because C and C++ are old languages. Early compilers didn't have a lot of memory, so these languages were designed so a compiler can just read the file from top to bottom, without having to consider the file as a whole.
I think of two reasons:
It makes the parsing easy. No extra pass needed.
It also defines scope: symbols/names are available only after their declaration. That means, if I declare a global variable int g_count;, the code after that line can use it, but not the code before the line! The same argument applies to global functions.
As an example, consider this code:
#include <iostream>
using namespace std;

void g(double)
{
cout << "void g(double)" << endl;
}
void f()
{
g(int());//this calls g(double) - because that is what is visible here
}
void g(int)
{
cout << "void g(int)" << endl;
}
int main()
{
f();
g(int());//calls g(int) - because that is what is the best match!
}
Output:
void g(double)
void g(int)
See the output at ideone : http://www.ideone.com/EsK4A
The main reason will be to make the compilation process as efficient as possible. If you add an extra pass you're adding both time and storage. Remember that C++ was developed back before the time of Quad Core Processors :)
The C programming language was designed so that the compiler could be implemented as a one-pass compiler. In such a compiler, each compilation phase is executed only once, and you cannot refer to an entity that is defined later in the source file.
Moreover, in C, the compiler only interprets a single compilation unit (generally a .c file and all the included .h files) at a time. So you needed a mechanism to refer to a function defined in another compilation unit.
The decision to allow a one-pass compiler and to be able to split a project into small compilation units was taken because at the time the memory and processing power available were really limited. And allowing forward declarations could easily solve the issue with a single feature.
The C++ language was derived from C and inherited the feature from it (as it wanted to be as compatible with C as possible to ease the transition).
I guess because C is quite old, and at the time C was designed efficient compilation was a problem because CPUs were much slower.
Since C++ is a statically typed language, the compiler needs to check whether the types of the values are compatible with the types expected by the function's parameters. Of course, if you don't know the function signature, you can't do this kind of check, thus defeating the purpose of a static compiler. But, since you have a silver badge in C++, I think you already know this.
The C++ language specs were made this way because the designers didn't want to force a multi-pass compiler, when hardware was not as fast as what is available today. In the end, I think that if C++ were designed today, this imposition would go away, but then we would have another language :-).
One of the biggest reasons why this was made mandatory even in C99 (compared to C89, where you could have implicitly-declared functions) is that implicit declarations are very error-prone. Consider the following code:
First file:
#include <stdio.h>
void doSomething(double x, double y)
{
printf("%g %g\n",x,y);
}
Second file:
int main()
{
doSomething(12345,67890);
return 0;
}
This program is a syntactically valid* C89 program. You can compile it with GCC using this command (assuming the source files are named test.c and test0.c):
gcc -std=c89 -pedantic-errors test.c test0.c -o test
Why does it print something strange (at least on linux-x86 and linux-amd64)? Can you spot the problem in the code at a glance? Now try replacing c89 with c99 in the command line — and you'll be immediately notified about your mistake by the compiler.
Same with C++. But in C++ there are actually other important reasons why function declarations are needed; they are discussed in other answers.
* But has undefined behavior
Still, you can sometimes use a function before it is declared (to be strict in the wording: "before" refers to the order in which the program source is read): inside a class!
class A {
public:
static void foo(void) {
bar();
}
private:
static void bar(void) {
return;
}
};
int main() {
A::foo();
return 0;
}
(Changing the class to a namespace doesn't work, per my tests.)
That's probably because the compiler effectively places the member-function definitions from inside the class right after the class definition, as someone has pointed out here in the answers.
The same approach could be applied to the whole source file: first, collect all the declarations, then handle everything that was postponed. (Either a two-pass compiler, or enough memory to hold the postponed source code.)
Haha! So, they thought a whole source file would be too large to hold in memory, but a single class with function definitions wouldn't: they can let a whole class sit in memory and wait until the declarations have been gathered (or do a second pass over the source code of classes)!
I remember that with Unix and Linux, you have Global and Local. Within your own environment, local works for functions, but it does not work for Global (system). You must declare the function Global.
In C++, declaration and definition of functions, variables and constants can be separated like so:
void someFunc();

void someFunc()
{
//Implementation.
}
In fact, in the definition of classes, this is often the case. A class is usually declared with its members in a .h file, and these are then defined in a corresponding .C file.
What are the advantages & disadvantages of this approach?
Historically, this was to help the compiler. You had to give it the list of names before it used them, whether through actual usage or a forward declaration (C's default function prototypes aside).
Modern compilers for modern languages show that this is no longer a necessity, so the syntax of C and C++ here (as well as Objective-C's, and probably others') is historical baggage. In fact, this is one of the big problems with C++ that even the addition of a proper module system will not solve.
Disadvantages are: lots of heavily nested include files (I've traced include trees before; they are surprisingly huge) and redundancy between declaration and definition, all leading to longer coding times and longer compile times (ever compared the compile times between comparable C++ and C# projects? This is one of the reasons for the difference). Header files must be provided for users of any components you provide. Chances of ODR violations. Reliance on the pre-processor (many modern languages do not need a pre-processor step), which makes your code more fragile and harder for tools to parse.
Advantages: not much. You could argue that you get a list of function names grouped together in one place for documentation purposes, but most IDEs have some sort of code-folding ability these days, and projects of any size should be using doc generators (such as doxygen) anyway. With a cleaner, pre-processor-less, module-based syntax it is easier for tools to follow your code and provide this and more, so I think this "advantage" is just about moot.
It's an artefact of how C/C++ compilers work.
As a source file gets compiled, the preprocessor substitutes each #include-statement with the contents of the included file. Only afterwards does the compiler try to interpret the result of this concatenation.
The compiler then goes over that result from beginning to end, trying to validate each statement. If a line of code invokes a function that hasn't been declared previously, it'll give up.
There's a problem with that, though, when it comes to mutually recursive function calls:
void foo()
{
bar();
}
void bar()
{
foo();
}
Here, foo won't compile as bar is unknown. If you switch the two functions around, bar won't compile as foo is unknown.
If you separate declaration and definition, though, you can order the functions as you wish:
void foo();
void bar();
void foo()
{
bar();
}
void bar()
{
foo();
}
Here, when the compiler processes foo it already knows the signature of a function called bar, and is happy.
Of course compilers could work in a different way, but that's how they work in C, C++ and to some degree Objective-C.
Disadvantages:
None directly. If you're using C/C++ anyway, it's the best way to do things. If you've got a choice of language/compiler, then maybe you can pick one where this is not an issue. The only thing to consider with splitting declarations into header files is to avoid mutually recursive #include-statements - but that's what include guards are for.
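For completeness, a minimal sketch of such an include guard (FOO_H is an arbitrary macro name): the guard ensures the declarations are seen at most once per translation unit, even if the header ends up included several times.
// foo.h
#ifndef FOO_H
#define FOO_H

void foo();   // declarations only; the definition lives in foo.cpp

#endif // FOO_H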
Advantages:
Compilation speed: As all included files are concatenated and then parsed, reducing the amount and complexity of code in included files will improve compilation time.
Avoid code duplication/inlining: If you fully define a function in a header file, each object file that includes this header and references this function will contain its own version of that function. As a side note, if you want inlining, you need to put the full definition into the header file (on most compilers).
Encapsulation/clarity: A well-defined class or set of functions plus some documentation should be enough for other developers to use your code. There is (ideally) no need for them to understand how the code works, so why require them to sift through it? (The counter-argument that it may be useful for them to access the implementation when required still stands, of course.)
And of course, if you're not interested in exposing a function at all, you can usually still choose to define it fully in the implementation file rather than the header.
The standard requires that when a function is used, a declaration must be in scope. This means the compiler should be able to verify, against a prototype (the declaration in a header file), what you are passing to it. Except, of course, for functions that are variadic; such functions do not validate their arguments.
Think of C, when this was not required. At that time, compilers treated a missing return type specification as defaulting to int. Now, assume you had a function foo() which returned a pointer to void. However, since you did not have a declaration, the compiler would think that it had to return an integer. On some Motorola systems, for example, integers and pointers were returned in different registers. The compiler would then look in the wrong register, treating your pointer as an integer returned elsewhere. The moment you try to work with this pointer, all hell breaks loose.
Declaring functions within the header is fine. But remember, if you both declare and define them in the header, make sure they are inline. One way to achieve this is to put the definition inside the class definition; otherwise, prepend the inline keyword. You will otherwise run into ODR violations when the header is included in multiple implementation files.
There are two main advantages to separating declaration and definition into C++ header and source files. The first is that you avoid problems with the One Definition Rule when your class/functions/whatever are #included in more than one place. Secondly, by doing things this way, you separate interface and implementation. Users of your class or library need only to see your header file in order to write code that uses it. You can also take this one step farther with the Pimpl Idiom and make it so that user code doesn't have to recompile every time the library implementation changes.
You've already mentioned the disadvantage of code repetition between the .h and .cpp files. Maybe I've written C++ code for too long, but I don't think it's that bad. You have to change all user code every time you change a function signature anyway, so what's one more file? It's only annoying when you're first writing a class and you have to copy-and-paste from the header to the new source file.
The other disadvantage in practice is that in order to write (and debug!) good code that uses a third-party library, you usually have to see inside it. That means access to the source code even if you can't change it. If all you have is a header file and a compiled object file, it can be very difficult to decide if the bug is your fault or theirs. Also, looking at the source gives you insight into how to properly use and extend a library that the documentation might not cover. Not everyone ships an MSDN with their library. And great software engineers have a nasty habit of doing things with your code that you never dreamed possible. ;-)
Advantage
Classes can be referenced from other files by just including the declaration. Definitions can then be linked later on in the compilation process.
You basically have 2 views on the class/function/whatever:
The declaration, where you declare the name, the parameters and the members (in the case of a struct/class), and the definition, where you define what the function does.
Amongst the disadvantages is repetition, yet one big advantage is that you can declare your function as int foo(float f) and leave the details in the implementation (the definition). Anyone who wants to use your function foo just includes your header file and links to your library/object file, so library users as well as compilers only have to care about the defined interface, which helps in understanding the interfaces and speeds up compile times.
One advantage that I haven't seen yet: API
Any library or 3rd-party code that is NOT open source (i.e. proprietary) will not ship its implementation with the distribution. Most companies are just plain not comfortable with giving away source code. The easy solution: just distribute the class declarations and function signatures that allow use of the DLL.
Disclaimer: I'm not saying whether it's right, wrong, or justified, I'm just saying I've seen it a lot.
One big advantage of forward declarations is that when used carefully you can cut down the compile time dependencies between modules.
If ClassA.h needs to refer to a data element in ClassB.h, you can often use just a forward references in ClassA.h and include ClassB.h in ClassA.cc rather than in ClassA.h, thus cutting down a compile time dependency.
For big systems this can be a huge time saver on a build.
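A rough sketch of that technique (ClassA/ClassB are the placeholder names from above): the header only needs to know that ClassB exists, because it stores a pointer, so ClassB.h is included only by the implementation file.
// ClassA.h
class ClassB;                  // forward declaration; no #include "ClassB.h" needed here

class ClassA {
public:
    void attach(ClassB* other);
private:
    ClassB* b = nullptr;       // pointers and references only need the forward declaration
};

// ClassA.cc
#include "ClassA.h"
#include "ClassB.h"            // include the full definition only where ClassB's members are actually used
void ClassA::attach(ClassB* other) { b = other; }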
Disadvantage
This leads to a lot of repetition. Most of a function's signature needs to be put in two or more places (as Paulious noted).
Separation gives a clean, uncluttered view of program elements.
Possibility to create and link to binary modules/libraries without disclosing sources.
Link binaries without recompiling sources.
When done correctly, this separation reduces compile times when only the implementation has changed.