I have trouble understanding the sentence with respect to inline and customers binary compatibility. Can someone please explain?
C++ FAQ Cline, Lomow:
When the compiler synthesizes the copy constructor, it makes it inline. If your classes are exposed to your customers (for example, if your customers #include your header files rather than merely using an executable built from your classes), your inline code is copied into your customers' executables. If your customers want to maintain binary compatibility between releases of your header files, you must not change any inline functions that are visible to the customers. Because of this, you will want an explicit, non-inline version of the copy constructor that will be used directly by the customer.
Binary compatibility for dynamic libraries (.dll, .so) is often an important thing.
e.g. you don't want to have to recompile half the software on the OS because you updated, in an incompatible way, some low-level library everything uses (and consider how frequent security updates can be). Often you may not even have all the source code required to do so, even if you wanted to.
For updates to your dynamic library to be compatible, and actually have an effect, you essentially cannot change anything in a public header file, because everything there was compiled directly into those other binaries (even in C code, this often includes struct sizes and member layouts, and obviously you can't remove or change any function declarations either).
In addition to the C issues, C++ introduces many more (the order of virtual functions, how inheritance works, etc.), so it is conceivable that you might do something that changes the auto-generated C++ constructor, copy constructor, destructor, etc. while otherwise maintaining compatibility. If they are defined "inline" along with the class/struct, rather than explicitly in your source, then they will have been compiled directly into other applications/libraries that linked against your dynamic library and used those auto-generated functions, and those won't get your changed version (which you maybe didn't even realise had changed!).
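As a minimal sketch of that trap (the class and member names here are hypothetical, not from the FAQ), consider a header where the special member functions are left to the compiler:

// toolkit.hpp: a hypothetical header your customers #include
#include <string>

class Widget {
public:
    Widget();   // defined out of line, in the library's .cpp file
    // No copy constructor or destructor is declared, so the compiler
    // synthesizes them as inline functions; their code is generated
    // inside every customer binary that copies or destroys a Widget.
private:
    std::string label_;
    // If a later release adds a member here (say, an int flags_), the
    // synthesized copy constructor and destructor change too, but a
    // customer who only relinks keeps running the old inlined versions
    // against the new layout.
};

Declaring the copy constructor and destructor yourself and defining them in the library's .cpp file keeps that code inside your binary, which is exactly the explicit, non-inline version the FAQ asks for.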
It is referring to problems that can occur between binary releases of a library and header changes in that library. There are certain changes which are binary compatible and certain changes which are not. Changes to inline functions, such as an inlined copy-constructor, are not binary compatible and require that the consumer code be recompiled.
You see this within a single project all the time. If you change a.cpp then you don't have to recompile all of the files which include a.hpp. But if you change the interface in the header, then any consumer of that header typically needs to be recompiled. This is similar to the case when using shared libraries.
Maintaining binary compatibility is useful for when one wants to change the implementation of a binary library without changing its interface. This is useful for things like bug fixes.
For example, say a program uses liba as a shared library. If liba contains a bug in a method of a class it exposes, it can change the internal implementation and recompile the shared library, and the program can use the new binary release of liba without itself being recompiled. If, however, liba changes the public contract, such as the implementation of an inlined method, or moves an inlined method to being externally declared, then it breaks the application binary interface (ABI) and the consuming program must be recompiled to use the new binary version of liba.
Consider the following code compiled into a static library:
// lib.hpp
#include <string>

class t_Something
{
    private: ::std::string foo;
    public: void Do_SomethingUseful(void);
};

// lib.cpp
#include "lib.hpp"

void t_Something::Do_SomethingUseful(void)
{
    // ...
}
// user_project.cpp
#include "lib.hpp"

int main()
{
    t_Something something;
    something.Do_SomethingUseful();
    t_Something something_else = something;
}
Now, when the fields of the t_Something class change somehow, for example when a new one is added, we end up in a situation where all the user code has to be recompiled. Basically, the constructors implicitly generated by the compiler "leaked" from our static library into the user code.
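One way to stop that leak, sketched here with C++11 defaulted out-of-line definitions (an illustration, not part of the original answer), is to declare the special member functions in the header and define them in lib.cpp so their code stays inside the library:

// lib.hpp
#include <string>

class t_Something
{
    private: ::std::string foo;
    public: t_Something();                      // declared here,
    public: t_Something(const t_Something &);   // defined in lib.cpp,
    public: ~t_Something();                     // so nothing is inlined into user code
    public: void Do_SomethingUseful(void);
};

// lib.cpp
#include "lib.hpp"

t_Something::t_Something() = default;
t_Something::t_Something(const t_Something &) = default;
t_Something::~t_Something() = default;

The user still has to recompile if the class layout changes (their code allocates t_Something on the stack), but the copy and destruction logic no longer leaks into their object files.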
I think I understand what this passage means. By no means am I endorsing it, though.
I believe they describe the scenario where you are developing a library and providing it to your customers in the form of header files plus a pre-compiled binary part. After the customer has done the initial build, they are expected to be able to substitute the binary part with a newer one without recompiling their application - only a relink would be required. The only way to achieve that is to guarantee that the header files are immutable, i.e. not changed between versions.
I guess the notion comes from the fact that in '98 build systems were not smart enough to detect a change in a header file and trigger recompilation of the affected source files.
Any and all of that is completely moot nowadays and, in fact, goes against the grain - a significant number of libraries actually try hard to be header-only, for multiple reasons.
Related
I am trying to search for ways to control the 'exposure' of functions/classes/variables to third-party users while I still have full access within a C++ project/library.
In JavaScript you have modules, which do exactly this.
In Java/C# you can get pretty far with access modifiers.
But in C/C++ there doesn't seem to be any control beyond the file itself (i.e. once it's in the .h/.hpp file, it's accessible from anywhere).
So the question, is there a way to access functions/classes/variables from files within the project without exposing them to third-party users?
Well, don't put them in the API, and if these symbols aren't needed by your users, keep them in header files that are only used for building your project, not installed as development headers for your consumers.
Say, you have a class declaration
class Whatpeopleuse {
private:
    int _base;
public:
    Whatpeopleuse(int base);
    int nth_power(unsigned int exponent);
};
in your n247s/toolbox.hpp, which you install / ship to customers.
And the implementation in your mycode.cc file
#include "n247s/toolbox.hpp"
#include "math_functions.hpp" // contains declaration of power_function
Whatpeopleuse::Whatpeopleuse(int base) : _base(base)
{
}

int Whatpeopleuse::nth_power(unsigned int exponent)
{
    return power_function(_base, exponent);
}
with power_function defined in another file, math_functions.cc:
#include "math_functions.hpp"
int power_function(int base, unsigned int exponent)
{
    int result = 1;
    for (; exponent; --exponent)
        result *= base;
    return result;
}
Then you compile your mycode.cc and your math_functions.cc, link them together into n247s.so (or .dll, or whatever your shared library extension is), and ship that to the customer together with toolbox.hpp.
The customer never sees the function definitions in math_functions.hpp, or the code in math_functions.cc or mycode.cc. That's internal to the binary you produced.
What the customer sees/gets is
the header toolbox.hpp, which tells them what symbols / types in your library they are able to access (otherwise their compiler wouldn't know what there is to call in your library)
the binary n247s library, containing the symbols as declared in toolbox.hpp.
Of these symbols, only those that have visibility are actually given a name associated with an address within the shared library file. It's common to tell the compiler that none of the functions in a header should be visible by default, and to explicitly mark the classes and functions you do want to be seen, using __attribute__((visibility("default"))) (at least that's what's in my macros to do that; for MSVC, the attribute specification looks different).
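A rough sketch of such an export macro (the macro name N247S_API and the N247S_BUILDING flag are made up for this example; combine it with building the library with -fvisibility=hidden on GCC/Clang so everything unmarked stays internal):

// export.hpp: one common way to spell the visibility attribute
#if defined(_WIN32)
  #if defined(N247S_BUILDING)
    #define N247S_API __declspec(dllexport)
  #else
    #define N247S_API __declspec(dllimport)
  #endif
#else
  #define N247S_API __attribute__((visibility("default")))
#endif

// n247s/toolbox.hpp: only the class you want customers to call is marked
class N247S_API Whatpeopleuse {
private:
    int _base;
public:
    Whatpeopleuse(int base);
    int nth_power(unsigned int exponent);
};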
The user of the class Whatpeopleuse can only access its public: members (there are ways around that, though, within limits), but they can see that you have private members (like _base). If that's too much disclosure, you can make your customer-facing classes contain only a single member pointer to something called a detail, plus the customer-facing member functions (whose implementations just call detail->name_of_member).
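A rough sketch of that detail-pointer (pimpl) layout, reusing the hypothetical names from above:

// n247s/toolbox.hpp: customers see one opaque pointer, nothing else
#include <memory>

class Whatpeopleuse {
public:
    Whatpeopleuse(int base);
    ~Whatpeopleuse();                 // defined where detail is complete
    int nth_power(unsigned int exponent);
private:
    struct detail;                    // only declared here
    std::unique_ptr<detail> d_;
};

// mycode.cc
#include "n247s/toolbox.hpp"
#include "math_functions.hpp"         // declares power_function

struct Whatpeopleuse::detail {
    int base;
};

Whatpeopleuse::Whatpeopleuse(int base) : d_(new detail{base}) {}
Whatpeopleuse::~Whatpeopleuse() = default;

int Whatpeopleuse::nth_power(unsigned int exponent)
{
    return power_function(d_->base, exponent);
}

The header now changes only when the public interface changes; the detail struct can grow or shrink freely inside mycode.cc.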
I'd like to add that you don't want to make it hard for your customers to know what your class is doing. If they're motivated enough, they can reverse engineer quite a lot. Making something that's just harder to read and understand because its headers go to great lengths to obfuscate what's happening behind the scenes is frustrating, and people won't like it.
On the other hand, the above methodology is something you typically find in large code bases – not to "hide" things from the user, but to keep the API clean – only the things that are in the public-facing header are part of the API, the rest is never seen by the user's compiler. That's great, because it means
you make clear what is a good idea to use, and what is internal "plumbing". This is mostly important because often, it's easy to forget what the functionality is that you actually want to offer, and then start writing confusing / hard to use libraries.
To minimize the ABI of your library: as long as the ABI of the symbols in your public-facing headers doesn't change, your user can just drop-in replace your library v1.1.2 with v1.1.3 by swapping the library binary, without recompilation.
It makes clear what needs user-friendly documentation, and what doesn't. If you ship someone a library without documentation, they will go through hell to avoid using your library. I've been in that position myself. I know companies that rewrote whole driver suites from scratch because the documentation they got from the hardware vendor did not explain behaviour.
As I read the problem statement of Item 31, "Minimize compilation dependencies between files", of Effective C++, the following statement puzzles me:
class Person {
public:
Person(const std::string& name, const Date& birthday,
const Address& addr);
std::string name() const;
std::string birthDate() const;
std::string address() const;
...
private:
std::string theName; // implementation detail
Date theBirthDate; // implementation detail
Address theAddress; // implementation detail
};
in the file defining the Person class, you are likely to find something like this:
#include <string>
#include "date.h"
#include "address.h"
Unfortunately, this sets up a compilation dependency between the file defining Person and these header files. If any of these header files (comment mine: the headers listed above, namely < string>, "date.h", "address.h") is changed, or if any of the header files they depend on changes, the file containing the Person class must be recompiled, as must any files that use Person.
What I don't quite understand is the last part highlighted. Why do clients that use Person need recompilation? They just need to relink to the newly compiled Person object code, right (I am assuming the Person interface remains the same to its clients)?
If what clients really need - assuming the Person interface doesn't change - is just a relink, does it still warrant the Pimpl idiom? The Pimpl class still needs recompilation if any of the headers changes. The idiom only saves the client a relink.
EDIT: It seems that there is a lot of confusion about which headers have changed. In this case, Scott Meyers was talking about the header files included by Person.h being changed. But Person.h itself does not change, so clients using (#including) Person.h don't see a change (no timestamp change on Person.h). The makefile dependency would list Person.o as a prerequisite, so the client would simply link with the new Person.o. I am learning the Pimpl idiom; maybe I missed some obvious points in everyone's arguments. Please elucidate.
EDIT2: When a client needs to use Person, it includes Person.h, which also includes all the other included files such as date.h and address.h. I missed this part and thought only Person.cpp needed to deal with these headers.
There is an intermediate step in compiling: if you compile foo.cpp and it includes a.h and b.h, then an intermediate source file
a.h content
b.h content
foo.cpp content
which is the input for compilation, is created. Note that if other headers are included in those headers, their contents are also inserted, recursively.
Since a change in a header changes your compilation input, that intermediate file, foo.cpp should be recompiled.
Yes, but that re-linkage will fail if the datatype sizes are wrong, or the old code is trying to link to code that no longer exists. It's not magic: the code, at link-time, has still been compiled.
There is a subset of interface changes you can make without breaking binary compatibility; adding members to a type is not part of that subset.
(I am assuming the Person interface remains the same to its clients)
This is the key. Your assumption has removed the constraints, so the answer to "why do the other files need to be recompiled" becomes "they don't".
Obviously, the quote in its original context does not mention that assumption, which is why it is giving broader guidelines. Though, personally, I'd have liked to have seen a more in-depth explanation from Meyers of binary compatibility.
In a very practical sense: suppose that person.h includes other files, or defines some preprocessor symbols. If you change its includes or change its preprocessor symbols, then any file that also includes person.h potentially has its meaning changed.
In practice the compiler will fully recompile any compilation units affected by the change, if I understand correctly. And even if there are some optimizations to avoid doing lots of work when only "minor" or "insignificant" changes take place, like adding whitespace or something, the compiler needs to at least look at any compilation unit whose text was potentially changed in order to be sure.
Generally speaking, most tool-chains don't cache the intermediate results of each compilation unit after preprocessor expansion, and even if you are using something like ccache it's not going to try to do anything intelligent with the cached stuff to avoid doing work when only small changes happen, it's only going to try to check if it's stale or not.
So, changing things in a header file that may seem even smaller than changing the layout or interface of a class, still needs to trigger recompilation generally. What if some of the compilation units contain queries like sizeof your class? Or use SFINAE tricks to detect if it has certain methods?
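For instance (a small illustration, not from Meyers' book), even a trivial consumer of the quoted Person class bakes layout information into its own object file:

// consumer.cpp
#include "person.h"

void remember(const Person& someone)
{
    // This one statement bakes Person's size and member layout into
    // consumer.o: the compiler reserves sizeof(Person) bytes on the stack
    // and emits copy code for theName, theBirthDate and theAddress.
    // If date.h changes Date's layout, that baked-in code is stale until
    // consumer.cpp is recompiled.
    Person copy = someone;
    (void)copy;   // silence the unused-variable warning
}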
The core information in a header file describes interfaces.
An interface to a function describes its arguments (how many, what type, etc) and return type. The actual function implementation (definition) requires the function be called in an expected way - and the interface describes that. If code that is calling the function provides a different set of arguments, or if it acts as if the function returns something different than it actually does, then there will be a malfunction somewhere (either in the function, since it is not given information it expects, or in the caller, since the function doesn't give information the caller expects).
This means, if the interface to a function changes, then both code for the called function and for the callers need to be recompiled, in order to ensure consistency.
The same goes for type definitions. struct and class types may include member functions, and the compiler needs to ensure consistency between behaviour of those functions and their callers (or the programmer has to deal with inconsistency, which may manifest in tricky ways). Also, when creating an instance of a type (i.e. an object or a variable) the compiler needs to know the size of the type (how much memory it needs, how far the second element of an array is from the first, etc) in order to work with objects correctly.
All of this information is specified in the interface, which is typically placed in headers. Yes, the compiler might get away with making assumptions if it is not given information (e.g. in C, a function is assumed to return int and accept an arbitrary set of arguments if it is called without being previously declared) but there is still the problem of mismatches (e.g. if the function is assumed to return int, but actually returns a pointer of some type, what happens?).
More prosaically, build management processes (makefiles, build scripts, etc) typically check creation dates of files. For example, a source file may be recompiled if the corresponding object is older than that source file, or older than any of the header files that source file #includes. The logic of doing that is that the content of source file and its included headers affect how code in the compiled object behaves and, if the object file is older than one of those files, then there may well have been a change. The only way to make things line up is to recompile.
It would be possible to only recompile if a "substantive" change of file content has occurred (e.g. not recompile if only a comment has been changed in a header). However, doing that would mean it is necessary to reliably detect that the change in a file actually doesn't matter to the working of the program. The analysis to do that is certainly possible, but will often be more complicated - and time consuming, which is a problem as programmers tend to whine about long build times - than simply checking file dates.
Is there ever such a pattern of dependencies that it is impossible to keep everything in header files only? What if we enforced a rule of one class per header only?
For the purposes of this question, let's ignore static things :)
I am aware of no features in standard C++, excepting statics, which you have already mentioned, that require a library to define a full translation unit (instead of only headers). However, it's not recommended to do that, because when you do, you force all your clients to recompile their entire codebase whenever your library changes. If you're using source files or a static library or a dynamic library form of distribution, your library can be changed/updated/modified without forcing everyone to recompile.
It is possible, I would say, on the express condition of not using a number of language features: as you noticed, a few uses of the static keyword.
It may require a few tricks, but they can be reviewed:
You'll need to keep the header / source distinction whenever you need to break a dependency cycle, even though the two files will be header files in practice.
Free functions (non-template) have to be declared inline; the compiler may not actually inline them, but if they are declared so, it won't complain that they have been redefined when the client builds its library / executable.
Globally shared data (global variables and class static attributes) should be emulated using a local static variable in functions / class methods. In practice it matters little as far as the caller is concerned (it just adds ()); a small sketch follows after this list. Note that in C++0x this becomes the favoured way because it's guaranteed to be thread-safe while still protecting from the initialization order fiasco; until then... it's not thread-safe ;)
Respecting those 3 points, I believe you would be able to write a fully fledged header-only library (does anyone see something else I missed?)
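A tiny sketch of points 2 and 3 (the header and function names are hypothetical):

// counters.hpp: header-only sketch
#ifndef COUNTERS_HPP
#define COUNTERS_HPP

// Point 2: a non-template free function must be marked inline, otherwise
// every translation unit including this header would emit its own
// definition and the linker would complain about duplicates.
inline int next_id()
{
    // Point 3: shared state lives in a function-local static instead of a
    // global variable that would need a .cpp file to hold its definition.
    static int counter = 0;
    return ++counter;
}

#endif // COUNTERS_HPP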
A number of Boost libraries have used similar tricks to be header-only even though their code was not entirely templates. For example, Asio does this very consciously and proposes the alternative using flags (see the release notes for Asio 1.4.6):
clients who only need a couple features need not worry about building / linking, they just grab what they need
clients who rely on it a bit more or want to cut down on compilation time are offered the ability to build their own Asio library (with their own sets of flags) and then include "lightweight" headers
This way (at the price of some more effort on the part of the library devs) the clients get their cake and eat it too. It's a pretty nice solution I think.
Note: I am wondering whether static functions could be inlined; I prefer to use anonymous namespaces myself, so I never really looked into it...
The one class per header rule is meaningless. If this doesn't work:
#include <header1>
#include <header2>
then some variation of this will:
#include <header1a>
#include <header2>
#include <header1b>
This might result in less than one class per header, but you can always use (void*) and casts and inline functions (in which case the 'inline' will likely be duly ignored by the compiler). So the question, it seems to me, can be reduced to:
class A
{
    // ...
    void *pimpl;
};
Is it possible that the private implementation, pimpl, depends on the declaration of A? If so, then pimpl.cpp (as a header) must both precede and follow A.h. But since you can always, once again, use (void*) and casts and inline functions in preceding headers, it can be done.
Of course, I could be wrong. In either case: Ick.
In my long career, I haven't come across a dependency pattern that would disallow a header-only implementation.
Mind you, if you have circular dependencies between classes, you may need to resort either to the abstract interface / concrete implementation paradigm, or to templates (using templates allows you to forward-reference properties/methods of template parameters, which are resolved later, during instantiation); a sketch follows below.
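As a rough, hypothetical sketch of the template variant: Logger below uses members of Engine even though Engine is defined later in the same header, because the lookup only happens when the template is instantiated.

#ifndef ENGINE_LOGGER_HPP
#define ENGINE_LOGGER_HPP

template <typename TOwner>
class Logger {
public:
    explicit Logger(TOwner& owner) : owner_(owner) {}
    int report() { return owner_.state(); }   // resolved at instantiation time
private:
    TOwner& owner_;
};

class Engine {
public:
    int state() const { return state_; }
    int log() { Logger<Engine> l(*this); return l.report(); }
private:
    int state_ = 0;
};

#endif // ENGINE_LOGGER_HPP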
This does not mean that you SHOULD always aim for header-only libraries. Good as they are, they should be reserved for template and inline code. They SHOULD NOT include substantial complex calculations.
I have a static library that I am building in C++. I have separated it into many header and source files. I am wondering whether it's better to include all of the headers that a client of the library might need in one header file that they, in turn, can include in their source code, or to just have them include only the headers they need. Will that cause the code to be unnecessarily bloated? I wasn't sure whether the classes or functions that don't get used will still be compiled into their products.
Thanks for any help.
Keep in mind that each source file that you compile involves an independent invocation of the compiler. With each invocation, the compiler has to read in every included header file, parse through it, and build up a symbol table.
When you use one of these "include the world" header files in lots of your source files, it can significantly impact your build time.
There are ways to mitigate this; for example, Microsoft has a precompiled header feature that essentially saves out the symbol table for subsequent compiles to use.
There is another consideration though. If I'm going to use your WhizzoString class, I shouldn't have to have headers installed for SOAP, OpenGL, and what have you. In fact, I'd rather that WhizzoString.h only include headers for the types and symbols that are part of the public interface (i.e., the stuff that I'm going to need as a user of your class).
As much as possible, you should try to shift includes from WhizzoString.h to WhizzoString.cpp:
OK:
// Only include the stuff needed for this class
#include "foo.h"   // Foo class
#include "bar.h"   // Bar class

class WhizzoString
{
private:
    Foo m_Foo;
    Bar* m_pBar;
    // ...
};
BETTER:
// Only include the stuff needed by the users of this class
#include "foo.h"   // Foo class
class Bar;         // Forward declaration

class WhizzoString
{
private:
    Foo m_Foo;
    Bar* m_pBar;
    // ...
};
If users of your class never have to create or use a Bar type, and the class doesn't contain any instances of Bar, then it may be sufficient to provide only a forward declaration of Bar in the header file (WhizzoString.cpp will have #include "bar.h"). This means that anyone including WhizzoString.h could avoid including bar.h and everything that it includes.
In general, when linking the final executable, only the symbols and functions that are actually used by the program will be incorporated. You pay only for what you use. At least that's how the GCC toolchain appears to work for me. I can't speak for all toolchains.
If the client will always have to include the same set of header files, then it's okay to provide a "convenience" header file that includes the others. This is common practice in open-source libraries. If you decide to provide a convenience header, make it so that the client can also choose to include specifically what is needed.
To reduce compile times in large projects, it's common practice to include the fewest headers possible to make a unit compile.
What about giving both choices:
#include <library.hpp> // include everything
#include <library/module.hpp> // only single module
This way you do not have one huge include file, and your separate files are stacked neatly in one directory.
It depends on the library, and how you've structured it. Remember that header files for a library, and which pieces are in which header file, are essentially part of the API of the library. So, if you lead your clients to carefully pick and choose among your headers, then you will need to support that layout for a long time. It is fairly common for libraries to export their whole interface via one file, or just a few files, if some part of the API is truly optional and large.
A consideration should be compilation time: If the client has to include two dozen files to use your library, and those includes have internal includes, it can significantly increase compilation time in a big project, if used often. If you go this route, be sure all your includes have proper include guards around not only the file contents, but the including line as well. Though note: Modern GCC does a very good job of this particular issue and only requires the guards around the header's contents.
As to bloating the final compiled program, it depends on your tool chain, and how you compiled the library, not how the client of the library included header files. (With the caveat that if you declare static data objects in the headers, some systems will end up linking in the objects that define that data, even if the client doesn't use it.)
In summary, unless it is a very big library, or a very old and cranky tool chain, I'd tend to go with the single include. To me, freezing your current implementation's division into headers into the library's API is a bigger worry than the others.
The problem with single-file headers is explained in detail in Dr. Dobb's by an expert compiler writer. NEVER USE A SINGLE FILE HEADER!!! Each time a header is included in a .cc/.cpp file it has to be recompiled, because you can feed the file macros that alter how the header is compiled. For this reason, a single header file will dramatically increase compile time without providing any benefit. With C++ you should optimize for human time first, and compile time is human time. You should never include more than you need to compile in any header, because it dramatically increases compile time; each translation unit (TU) should have its own implementation (.cc/.cpp) file, and each TU should be named with a unique filename.
In my decade of C++ SDK development experience, I religiously ALWAYS have three files in EVERY module. I have a config.h that gets included into almost every header file and contains prereqs for the entire module, such as platform-config and stdint.h stuff. I also have a global.h file that includes all of the header files in the module; this one is mostly for debugging (hint: enumerate your seams in the global.h file for better-tested and easier-to-debug code). The key missing piece here is that you should really have a public.h file that includes ONLY your public API.
In libraries that are poorly programmed, such as boost with its hideous lower_snake_case class names, they use the half-baked worst practice of a detail (sometimes named 'impl') folder design pattern to "conceal" their private interface. There is a long background behind why this is a worst practice, but the short story is that it creates an INSANE amount of redundant typing that turns one-liners into multi-liners, it's not UML compliant, and it messes up the UML dependency diagram, resulting in overly complicated code and inconsistent design patterns such as children actually being parents and vice versa. You don't want or need a detail folder; you need to use a public.h header with a bunch of sibling modules WITHOUT ADDITIONAL NAMESPACES, where your detail is a sibling and not a child that is in reality a parent. Namespaces are supposed to be for one thing and one thing only: to interface your code with other people's code. If it's your code, you control it, and you should use unique class and function names, because it's bad practice to use a namespace when you don't need to; it may cause hash table collisions that slow down the compilation process. UML is the best practice, so if you can organize your headers so they are UML compliant, then your code is by definition more robust and portable. A public.h file is all you need to expose only the public API; thanks.
In C++, declaration and definition of functions, variables and constants can be separated like so:
void someFunc();

void someFunc()
{
    // Implementation.
}
In fact, in the definition of classes, this is often the case. A class is usually declared with its members in a .h file, and these are then defined in a corresponding .cpp file.
What are the advantages & disadvantages of this approach?
Historically this was to help the compiler. You had to give it the list of names before it used them - whether this was the actual usage, or a forward declaration (C's default function prototypes aside).
Modern compilers for modern languages show that this is no longer a necessity, so the syntax of C and C++ (as well as Objective-C, and probably others) here is historical baggage. In fact, this is one of the big problems with C++ that even the addition of a proper module system will not solve.
Disadvantages are: lots of heavily nested include files (I've traced include trees before, they are surprisingly huge) and redundancy between declaration and definition - all leading to longer coding times and longer compile times (ever compared the compile times between comparable C++ and C# projects? This is one of the reasons for the difference). Header files must be provided for users of any components you provide. Chances of ODR violations. Reliance on the pre-processor (many modern languages do not need a pre-processor step), which makes your code more fragile and harder for tools to parse.
Advantages: not much. You could argue that you get a list of function names grouped together in one place for documentation purposes - but most IDEs have some sort of code folding ability these days, and projects of any size should be using doc generators (such as doxygen) anyway. With a cleaner, pre-processor-less, module-based syntax it is easier for tools to follow your code and provide this and more, so I think this "advantage" is just about moot.
It's an artefact of how C/C++ compilers work.
As a source file gets compiled, the preprocessor substitutes each #include-statement with the contents of the included file. Only afterwards does the compiler try to interpret the result of this concatenation.
The compiler then goes over that result from beginning to end, trying to validate each statement. If a line of code invokes a function that hasn't been declared previously, it'll give up.
There's a problem with that, though, when it comes to mutually recursive function calls:
void foo()
{
bar();
}
void bar()
{
foo();
}
Here, foo won't compile as bar is unknown. If you switch the two functions around, bar won't compile as foo is unknown.
If you separate declaration and definition, though, you can order the functions as you wish:
void foo();
void bar();
void foo()
{
bar();
}
void bar()
{
foo();
}
Here, when the compiler processes foo it already knows the signature of a function called bar, and is happy.
Of course compilers could work in a different way, but that's how they work in C, C++ and to some degree Objective-C.
Disadvantages:
None directly. If you're using C/C++ anyway, it's the best way to do things. If you've got a choice of language/compiler, then maybe you can pick one where this is not an issue. The only thing to consider with splitting declarations into header files is to avoid mutually recursive #include-statements - but that's what include guards are for.
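As a minimal illustration (the file and struct names are hypothetical), guards plus a forward declaration keep mutually including headers from recursing:

// a.h
#ifndef A_H
#define A_H
#include "b.h"       // b.h includes a.h right back...
struct B;            // ...so rely on a forward declaration here
struct A { B* partner; };
#endif

// b.h
#ifndef B_H
#define B_H
#include "a.h"       // the A_H guard stops this from expanding a second time
struct A;
struct B { A* partner; };
#endif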
Advantages:
Compilation speed: As all included files are concatenated and then parsed, reducing the amount and complexity of code in included files will improve compilation time.
Avoid code duplication/inlining: If you fully define a function in a header file, each object file that includes this header and references this function will contain its own version of that function. As a side note, if you want inlining, you need to put the full definition into the header file (on most compilers).
Encapsulation/clarity: A well-defined class/set of functions plus some documentation should be enough for other developers to use your code. There is (ideally) no need for them to understand how the code works - so why require them to sift through it? (The counter-argument that it may be useful for them to access the implementation when required still stands, of course.)
And of course, if you're not interested in exposing a function at all, you can usually still choose to define it fully in the implementation file rather than the header.
The standard requires that when using a function, a declaration must be in scope. This means that the compiler should be able to verify, against a prototype (the declaration in a header file), what you are passing to it. Except, of course, for functions that are variadic - such functions do not have their extra arguments validated.
Think of C, when this was not required. At that time, compilers treated a missing return type specification as defaulting to int. Now assume you had a function foo() which returned a pointer to void. However, since you did not have a declaration, the compiler would think that it has to return an integer. On some Motorola systems, for example, integers and pointers would be returned in different registers. The caller would then read from the wrong register and treat whatever integer it found there as your pointer. The moment you try to work with this pointer - all hell breaks loose.
Declaring functions within the header is fine. But remember, if you declare and define them in the header, make sure they are inline. One way to achieve this is to put the definition inside the class definition; otherwise, prepend the inline keyword. You will otherwise run into ODR violations when the header is included in multiple implementation files.
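A short sketch of both options (hypothetical names):

// util.hpp
#ifndef UTIL_HPP
#define UTIL_HPP

// Marked inline: several .cpp files may include this header and the linker
// will merge the copies instead of reporting a duplicate symbol.
inline int square(int x) { return x * x; }

class Accumulator {
public:
    // Defined inside the class definition, so it is implicitly inline and
    // is also safe to keep in the header.
    void add(int x) { total_ += x; }
    int total() const { return total_; }
private:
    int total_ = 0;
};

#endif // UTIL_HPP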
There are two main advantages to separating declaration and definition into C++ header and source files. The first is that you avoid problems with the One Definition Rule when your class/functions/whatever are #included in more than one place. Secondly, by doing things this way, you separate interface and implementation. Users of your class or library need only to see your header file in order to write code that uses it. You can also take this one step farther with the Pimpl Idiom and make it so that user code doesn't have to recompile every time the library implementation changes.
You've already mentioned the disadvantage of code repetition between the .h and .cpp files. Maybe I've written C++ code for too long, but I don't think it's that bad. You have to change all user code every time you change a function signature anyway, so what's one more file? It's only annoying when you're first writing a class and you have to copy-and-paste from the header to the new source file.
The other disadvantage in practice is that in order to write (and debug!) good code that uses a third-party library, you usually have to see inside it. That means access to the source code even if you can't change it. If all you have is a header file and a compiled object file, it can be very difficult to decide if the bug is your fault or theirs. Also, looking at the source gives you insight into how to properly use and extend a library that the documentation might not cover. Not everyone ships an MSDN with their library. And great software engineers have a nasty habit of doing things with your code that you never dreamed possible. ;-)
Advantage
Classes can be referenced from other files by just including the declaration. Definitions can then be linked later on in the compilation process.
You basically have 2 views on the class/function/whatever:
The declaration, where you declare the name, the parameters and the members (in the case of a struct/class), and the definition, where you define what the function does.
Amongst the disadvantages is repetition, yet one big advantage is that you can declare your function as int foo(float f) and leave the details in the implementation (= definition). Anyone who wants to use your function foo just includes your header file and links to your library/object file, so library users as well as compilers only have to care about the defined interface, which helps in understanding the interface and speeds up compile times.
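A minimal sketch of that split, with made-up names:

// foo.hpp: the interface users include
#ifndef FOO_HPP
#define FOO_HPP
int foo(float f);
#endif

// foo.cpp: the implementation, compiled once into your library/object file
#include "foo.hpp"
int foo(float f)
{
    return static_cast<int>(f * 2.0f);
}

// user.cpp: compiles against foo.hpp alone and links against your library
#include "foo.hpp"
int main() { return foo(1.5f); }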
One advantage that I haven't seen yet: API
Any library or 3rd party code that is NOT open source (i.e. proprietary) will not have its implementation along with the distribution. Most companies are just plain not comfortable with giving away source code. The easy solution: just distribute the class declarations and function signatures that allow use of the DLL.
Disclaimer: I'm not saying whether it's right, wrong, or justified, I'm just saying I've seen it a lot.
One big advantage of forward declarations is that when used carefully you can cut down the compile time dependencies between modules.
If ClassA.h needs to refer to a data element in ClassB.h, you can often use just a forward reference in ClassA.h and include ClassB.h in ClassA.cc rather than in ClassA.h, thus cutting down a compile-time dependency.
For big systems this can be a huge time saver on a build.
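A small sketch of that technique (the member and function names here are made up):

// ClassA.h: only needs to know that ClassB exists
class ClassB;                       // forward reference, no #include "ClassB.h"

class ClassA {
public:
    void inspect(const ClassB& b);  // references and pointers are fine here
private:
    ClassB* peer_ = nullptr;        // so is a pointer member
};

// ClassA.cc: the complete definition of ClassB is needed only here
#include "ClassA.h"
#include "ClassB.h"

void ClassA::inspect(const ClassB& b)
{
    (void)b;   // ... use b's members here ...
}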
Disadvantage
This leads to a lot of repetition. Most of the function signature needs to be written in two or more places (as Paulious noted).
Separation gives a clean, uncluttered view of program elements.
Possibility to create and link to binary modules/libraries without disclosing sources.
Link binaries without recompiling sources.
When done correctly, this separation reduces compile times when only the implementation has changed.