Why is including a header sufficient for definitions? - C++

As far as I understand, header files declare things. Including a header file, e.g. #include <iostream>, pulls in the standard header <iostream>. This tells the compiler, for example, "there is something called cout".
QUESTION: How does the compiler get to the definition of cout (or of all the other functions)? In my understanding, the compiler only gets to know names and types, but no definitions.
Thanks in advance.

Actually, it doesn't. The compiler needs to know what the objects look like, what interfaces they offer (for std::cout, that it is a std::ostream object, or an object of some class derived from std::ostream), and that such an object exists somewhere. That's it. What the compiler then does is add a placeholder for that object – just as it does for function calls.
After compilation there is a second tool: the linker. As its name suggests, it links all those compilation units together. Wherever it sees such a placeholder, it replaces it with the address of the actual object or function – which must exist, of course (for std::cout, there's an extern declaration in the header, and some other source file must implement it without extern – possibly pre-compiled into a library) – otherwise a linker error is reported.
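To make that concrete, here is a minimal sketch of the extern-declaration pattern described above (log.h, log.cpp, and logStream are made-up names used only for illustration; the standard library sets up std::cout in essentially this fashion):
// log.h – what every user includes
#ifndef LOG_H_
#define LOG_H_
#include <ostream>

extern std::ostream& logStream;   // declaration only: "an object with this name exists somewhere"

#endif

// log.cpp – exactly one translation unit provides the definition
#include "log.h"
#include <iostream>

std::ostream& logStream = std::cout;

// main.cpp – the compiler only needs log.h to compile the call below
#include "log.h"

int main() {
    logStream << "hello\n";   // unresolved reference here, filled in by the linker
    return 0;
}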

Related

Double header inclusion in C++?

I have what seems a relatively simple question, but one that keeps defying my efforts to understand it.
I apologise if it is a simple question, but like many simple questions, I can't seem to find a solid explanation anywhere.
With the below code:
/* foo.c */
#include "bar.h"

int main() {
    return my_function(1, 2);
}

/* bar.h */
int my_function(int, int);

/* bar.c */
#include "bar.h" /* is this necessary!? */

int my_function(int x, int y) {
    return x + y;
}
Simply, is the second inclusion necessary? I don't understand why I keep seeing headers included in both source files. Surely if the function is declared in "foo.c" by including "bar.h," it does not need to be declared a second time in another linked source file (especially the one which actually defines it)??? A friend tried to explain to me that it didn't really matter for functions, but it did for structs, something which still eludes me! Help!
Is it simply for clarity, so that programmers can see which functions are being used externally?
I just don't get it!
Thanks!
In this particular case, it's unnecessary for the reason you described. It might be useful in situations where you have a more complex set of functions that might all depend on each other. If you include the header at the top of the .cpp file, you have effectively forward-declared every single function and so you don't have to worry about making sure your function definitions are in a certain order.
I also find that it clearly shows that these function definitions correspond to those declarations. This makes it easier for the reader to find how translation units depend on each other. Of course, the names of the files might be sufficient, but some more complex projects do not have one-to-one relationship between .cpp files and .h files. Sometimes headers are broken up into multiple parts, or many implementation files will have their external functions declared in a single header (common for large modules).
Really, all inclusions are unnecessary. You can always, after all, just duplicate the declarations (or definitions, in the case of classes) across all of the files that require them. We use the preprocessor to simplify this task and reduce the amount of redundant code. It's easier to stick to a pattern of always including the corresponding header because it will always work, rather than have to check each file every time you edit them and determine if the inclusion is necessary or not.
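As a small sketch of that ordering point (made-up functions; both are declared in bar.h):
/* bar.h */
int is_even(int n);
int is_odd(int n);

/* bar.c – because bar.h forward-declares both functions, the definitions
   below may appear in either order even though they call each other */
#include "bar.h"

int is_odd(int n)  { return n == 0 ? 0 : is_even(n - 1); }
int is_even(int n) { return n == 0 ? 1 : is_odd(n - 1); }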
The way the C language (and C++) is designed is that the compiler processes each .c file in isolation.
You typically launch your compiler (cl.exe or gcc, for example) for one of your c files, and this produces one object file (.o or .obj).
Once all your object files have been generated, you run the linker, passing it all the object files, and it will tie them together into an executable.
That's why every .c file needs to include the headers it depends on. When the compiler is processing it, it knows nothing about which other .c files you may have. All it knows is the contents of the .c file you point it to, as well as the headers it includes.
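For example, the two-step build for the files in the question might look like this (assuming gcc; cl.exe has equivalent options):
gcc -c foo.c              # compile only: produces foo.o, with my_function left as an unresolved reference
gcc -c bar.c              # compile only: produces bar.o, which defines my_function
gcc foo.o bar.o -o prog   # link step: matches the reference to the definition and builds the executable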
In your simplified example, the inclusion of "bar.h" in "bar.c" is not necessary. But in the real world, in most cases it would be. If you have a class declaration in "bar.h" and "bar.c" defines member functions of that class, the inclusion is needed. If any other declaration from the header is used in "bar.c" - be it a constant, an enum, etc. - again the include is needed. Because in the real world it is nearly always needed, the easy rule is: always include the header file in the corresponding source file.
If the header only declares global functions, and the source file only implements them (without calling any of them) then it's not strictly necessary. But that's not usually the case; in a large program, you rarely want global functions.
If the header defines a class, then you'll need to include it in the source file in order to define member functions:
void Thing::function(int x) {
//   ^^^^^ needs class definition
}
If the header declares functions in a namespace, then it's a good idea to put the definitions outside the namespace:
void ns::function(int x) {
//   ^^^^ needs previous declaration
}
This will give a nice compile-time error if the parameter types don't match a previous declaration - for which you'd need to include the header. Defining the function inside its namespace
namespace ns {
    void function(int x) {
        // ...
    }
}
will silently declare a new overload if you get the parameter types wrong.
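A short sketch of that contrast, using the names from the snippet above (the long parameter is a deliberate mistake):
// ns.h
namespace ns {
    void function(int x);
}

// good.cpp – qualified definition: the mismatch is caught at compile time
#include "ns.h"
void ns::function(long x) { }   // error: no declaration of ns::function(long) to match

// bad.cpp – definition inside the namespace: silently declares a new overload
// ns::function(long); callers of ns::function(int) only fail later, at link time
#include "ns.h"
namespace ns {
    void function(long x) { }
}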
The simple rule is this (assuming foo is a member function of some class):
So, if some header file declares a function, say:
//foo.h
void foo (int x);
The compiler needs to see this declaration wherever you define the function (to make sure your definition is in line with the declaration) and wherever you call the function (to make sure you call it with the correct number and types of arguments).
That means you have to include foo.h everywhere you call that function and everywhere you provide the definition for that function.
Also, if foo is a global function (not a member of any class, and not inside any namespace), then strictly speaking there is no need to include foo.h in the implementation file.

Different C++ Class Declarations

I'm trying to use a third party C++ library that isn't using namespaces and is causing symbol conflicts. The conflicting symbols are for classes my code isn't utilizing, so I was considering creating custom header files for the third party library where the class declarations only include the public members my code is using, leaving out any members that use the conflicting classes. Basically creating an interface.
I have three questions:
If the compilation to .obj files works, will this technique still cause symbol conflicts when I get to linking?
If that isn't a problem, will the varying class declarations cause problems when linking? For example, does the linker verify that the declaration of a class used by each .obj file has the same number of members?
If neither of those are a problem and I'm able to link the .obj files, will it cause problems when invoking methods? I don't know exactly how C++ works under the hood, but if it uses indexes to point to class methods, and those indexes were different from one .obj file to another, I'm guessing this approach would blow up at runtime.
In theory, you need identical declarations for this to work.
In practice, you will definitely need to make sure your declarations contain:
All the methods you use
All the virtual methods, used or not.
All the data members
You need all these in the right order of declaration too.
You might get away with faking the data members, but would need to make sure you put in stubs that had the same size.
If you do not do all this, you will not get the same object layout and even if a link works it will fail badly and quickly at run-time.
If you do this, it still seems risky to me and as a worst case may appear to work but have odd run time failures.
"if it uses indexes ": To some extent exactly how virtual functions work is implementation defined, but typically it does use an index into a virtual function table.
What you might be able to do is to:
Take the original headers
Keep the full declarations for the classes you use
Stub out the classes and declarations you do not use but are referenced by the ones you do.
Remove all the types not referenced at all.
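As a rough sketch only (class names and the stub size are invented; you would have to derive the real layout from the vendor headers):
// my_thirdparty_trimmed.h – hand-written subset of the vendor header
class UnusedDetail;   // referenced by the class but never used by us: forward-declare only

class VendorWidget {
public:
    // every virtual function, used or not, in the original declaration order
    virtual ~VendorWidget();
    virtual void draw();
    virtual void resize(int w, int h);   // kept even though we never call it

    // the non-virtual members we actually call
    void show();

private:
    // data members stubbed out, preserving the original total size (and, ideally, alignment)
    unsigned char opaque_state_[48];   // placeholder size – must match the real members
};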
For explanatory purposes, a simplified explanation follows.
C++ allows you to use functions that you have only declared. What happens in your situation is that multiple definitions end up behind a single declared name, spread across multiple translation units. If you expose a class declaration in a header file, the compiler sees it in every translation unit that includes that header.
Therefore your own class's functions have to be defined exactly as they have been declared (same function names, same arguments).
If a function is never called, you are allowed to leave it undefined, because the compiler does not know whether it might be defined in another translation unit.
Compilation creates a label in the object code for each defined function (symbol). Conversely, an unresolved label is created for each symbol that is merely referenced (a call site, a use of a variable).
So if you follow these rules, you can reach the point where your code compiles but fails to link. The linker is the tool that maps the defined symbols from each translation unit onto the symbol references.
If the object files being linked together contain multiple definitions of the same function, the linker cannot establish an exact match and therefore fails to link.
In practice you most likely want to ship a library and use your own classes without worrying about what your users might define. Even when the programmer takes extra care to put things into a namespace, two authors might still choose the same name for that namespace. This will lead to link failures, because the compiler exposed the symbols and the linker is expected to resolve them.
gcc has an attribute to explicitly mark symbols that should not be exposed to the linker (the hidden visibility attribute; see this SO question).
This makes it possible to have multiple definitions of a class with the same name.
In order for this to work across compilation units, you have to make sure such class declarations are not exposed in an interface header, as that could cause multiple, non-matching declarations.
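For reference, a minimal sketch of that gcc/clang visibility attribute (Impl is a made-up internal class; this assumes an ELF platform):
// impl.cpp – internal helper that is never mentioned in a public header
class __attribute__((visibility("hidden"))) Impl {
public:
    int compute(int x) { return x * 2; }
};

// Alternatively, build the whole library with -fvisibility=hidden and mark
// only the public API with __attribute__((visibility("default"))).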
I recommend using a wrapper to encapsulate the third party library.
Wrapper.h
#ifndef WRAPPER_H_
#define WRAPPER_H_

#include <memory>

class third_party;

class Wrapper
{
public:
    void wrappedFunction();
    Wrapper();

private:
    // A better choice would be a unique_ptr but g++ and clang++ failed to
    // compile due to "incomplete type" which is the whole point
    std::shared_ptr<third_party> wrapped;
};

#endif
Wrapper.cpp
#include "Wrapper.h"
#include <third_party.h>
void Wrapper::wrappedFunction()
{
wrapped->command();
}
Wrapper::Wrapper():wrapped{std::make_shared<third_party>()}
{
}
The reason why a unique_ptr doesn't work is explained here: std::unique_ptr with an incomplete type won't compile
You can move the entire library into a namespace by using a clever trick with the include directive. All that #include does is copy the contents of the included file into the current translation unit (roughly, the current piece of code being compiled). You can take advantage of this as follows.
I've borrowed heavily from another answer by user JohnB which was later deleted by him.
// my_thirdparty.h
namespace ThirdParty {
    #include "thirdparty.h"
    //... Include all the headers here that you need to use for thirdparty.
}

// my_thirdparty.cpp / .cc
namespace ThirdParty {
    #include "thirdparty.cpp"
    //... Put all .cpp files in here that are currently in your project
}
Finally, remove all the .cpp files in the third party library from your project. Only compile my_thirdparty.cpp.
Warning: if you include many library files from the single my_thirdparty.cpp, this may introduce compiler issues due to interactions between the individual .cpp files. Things such as using-directives or problematic define/include directives can cause this. Either resolve those or create multiple my_thirdparty.cpp files, splitting the library between them.

How are function definitions determined with header files?

When using separate files in C++, I know that functions can be declared using header files like this:
// MyHeader.h
int add(int num, int num2);

// MySource.cpp
int add(int num, int num2) {
    return num + num2;
}

// Main.cpp
#include "MyHeader.h"
#include <iostream>

int main() {
    std::cout << add(4, 5) << std::endl;
    return 0;
}
My question is, in this situation, how does the compiler determine the function definition of add(int,int) when MyHeader.h and Main.cpp have no references at all to MySource.cpp?
Also, if there were multiple add functions (with the same arguments) in a program, how could I make sure the correct one is being used in a certain situation?
The function declaration gives the compiler enough information to generate a call to that function.
The compiler then generates an object file that specifies the names (which, in the case of C++ are mangled to specify the arguments, namespace, cv-qualifiers, etc.) of external functions to which that object file refers (along with another list of names it defines).
The linker then takes all those object files, and tries to match up every name that something refers to but doesn't define with some other object file that defines the same name. Then it assigns and fills in addresses, so where one object file refers to nameX, it fills in the address it's assigning to nameX from the other file.
At least in a typical case, the object files it looks at will include a number of libraries (the standard library plus any others you specify). A library is basically just a collection of object files shoved together into a single file, with enough data to index which piece belongs to which object file. In a few cases, it also includes some extra meta-data to (for example) quickly find the object file that defines a specific name (obviously handy for faster linking, but not an absolute necessity).
If there are two or more functions with exactly the same mangled name, then your code has undefined behavior (you're violating the one definition rule). The linker will usually give an error message telling you that nameZ was defined in both object file A and object file B (but the C++ standard doesn't really require that).
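As a rough illustration, the mangled names for a couple of overloads might look like this (the exact spelling is an implementation detail; these assume the Itanium C++ ABI used by g++ and clang++):
int add(int a, int b);               // mangled roughly as _Z3addii
double add(double a, double b);      // mangled roughly as _Z3adddd
extern "C" int add_c(int a, int b);  // extern "C": no mangling, the symbol is just add_c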
The compiler does not "determine" (you mean "know") the function definition. The linker does. You have just discovered why the build process consists of compiling and linking.
So, basically, the compiler produces two object files here. One which contains the definition of add and one which just refers to the "unknown" function add. The linker then takes the two object files and puts the reference and definition together. Of course, that's just a very simple explanation, but for a beginner, that's all you need to know.
The compiler doesn't compile header files; it compiles source files. It will include the code in the header when the header is #included in a source file being compiled, but on its own, the header file doesn't "do" anything.
Also, the compiler doesn't worry about whether a function is defined or not. It just compiles against function declarations. It's the linker that resolves the definitions of functions.
You don't need to include a definition of a function at all, unless it's being called by some other code you need to link.
As to your question, "If there were multiple add functions (with the same arguments) in a program, how can I make sure the correct one is being used in a certain situation?": It depends on the linker and the settings, but generally, if you have more than one definition of a function with the same signature, the linker will issue an error stating that the function is multiply defined.

Clarification about the header-guards and header-file inclusion used in C/C++

I know people recommend including header guards in header files, to prevent header files contents from being inserted by the pre-processor into the source-code files more than once.
But consider the following scenario:
Let's say I have the files main.cpp , stuff.cpp, and commonheader.h, with the .h file having its header guards.
If either .cpp file tries to include commonheader.h more than once, then the preprocessor
will stop that from happening, and after compiling to object code we get:
main.o containing the contents of commonheader.h exactly once.
stuff.o containing the contents of commonheader.h exactly once.
Note that the contents of commonheader, have been repeated across the files, but not within the same .o file.
So what happens during the linking step? Since the .o files are being fused into an executable,
we will have to ensure for a second time that the contents of commonheader are not being repeated. Does the compiler take care of that? If not, wouldn't that be a problem when we are dealing with huge header files, giving rise to code repetition across files and leading to large executable sizes?
If I am making some conceptual mistake anywhere in the question, please correct me.
Typically your header file should not actually define any symbols, it should just declare them. So commonheader.h would look like this (omitting the include guards):
void commonFunc1(void);
void commonFunc2(void);
In that case, there is no problem. If you call commonFunc1 in main.cpp and stuff.cpp, both main.o and stuff.o will know they want to link against a symbol called commonFunc1 and the linker will try to find that symbol. If the linker doesn't find the symbol, you get an undefined reference error. The actual definition of commonFunc1 needs to be in some cpp file.
If you really want to define functions in your header file, use static so that the linker does not see them. So your commonheader.h could look like:
static void commonFunc1()
{
    /* ... do stuff ... */
}
In this case, the linker does not know about commonFunc1 and no errors will occur. This could increase the executable size though; you'll probably end up with two copies of the code for commonFunc1.
To expand Grayson's answer to cover variables: if you want to declare a variable in a header file, you should use the extern keyword. This is one way to handle global variables.
In the header file global.h you write this:
extern Globals globals;
then you can use globals in any file that includes global.h, while in global.cpp you write
#include "globalstype.h"
Globals globals;
Note that global.cpp doesn't need to include global.h; however, you will need to make sure global.cpp is compiled and linked into every program that uses the variable, otherwise the linker will complain.
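Put together, a minimal sketch of that pattern (file names follow the answer; the Globals member is invented for illustration):
// globalstype.h – defines the type
#ifndef GLOBALSTYPE_H_
#define GLOBALSTYPE_H_
struct Globals {
    int verbosity;
};
#endif

// global.h – declares the one shared instance
#ifndef GLOBAL_H_
#define GLOBAL_H_
#include "globalstype.h"
extern Globals globals;   // declaration only: no storage allocated here
#endif

// global.cpp – the single definition
#include "globalstype.h"
Globals globals;          // storage allocated exactly once

// main.cpp – any user of the variable
#include "global.h"
int main() {
    globals.verbosity = 2;
    return 0;
}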
Header files normally contain declarative code, not definitive code. That is, they declare the existence of something that must exist exactly once. Macros and inline functions are allowed and are necessarily duplicated wherever they are used.
The declarations are used by the compiler to insert unresolved links (or references) into the object code. The job of the linker is to resolve these links by matching the reference with the one single definition.
If you omit the include guards and a header is included more than once in a single translation unit, you will get a compiler error about redeclaring an existing symbol. If, however, a header erroneously contains a definition and that header is included in more than one translation unit, more than one object file will contain the definition - this instead causes a linker error for multiple definition.
So while:
extern int b; // declaration, may occur in multiple translation units
is fine in a header file,
int b; // definition, must occur in only one object file
is not.
Note that the declarations themselves are not included in the object code; rather, the compiler uses them to create references that the linker will later resolve (unless the compiler has already seen the definition and resolved the reference itself).
Yes, it can be a problem. You could end up with multiple definitions, or redundant copies.
C is quite simple in this regard. You have static, extern, and inline -- and compilers also define several ways to alter visibility. I think a lot of this has been covered by other answers.
C++ is quite different, however. There is a lot of information and there are also implicit definitions (e.g. the compiler may emit a copy constructor or RTTI).
With C++, the likelihood that a definition appears in a header is much higher -- consider templates, methods defined inside a class declaration, and so on. C++ relies on the One Definition Rule here. You will want to read about it in more detail, but it basically states that some categories of symbols may be multiply defined; depending on the decoration and the location/scope of the declaration, in many cases the linker is allowed to assume that each body (definition) is identical and is free to discard any copies it encounters (leaving one definition in your binary). So this really cuts down on the size of the resulting binary, unless you specify that a copy shall be produced.
However, having those definitions in your headers can surely increase compilation times, memory and files required to compile each file, visible dependencies, and will increase the number of files which must be recompiled when a definition is edited.
Of course, the language still allows bad forms, and will not complain if you repeat, across multiple translation units, definitions that must be copied into each translation unit. Then you can certainly end up with a lot of bloat.
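As one concrete illustration of that ODR allowance (a minimal sketch; util.h is a made-up header):
// util.h – safe to include from many .cpp files
#ifndef UTIL_H_
#define UTIL_H_

// 'inline' permits this definition to appear in every translation unit that
// includes the header; the linker keeps a single copy.
inline int square(int x) { return x * x; }

// Templates behave similarly: an instantiation may be emitted into several
// object files, and the duplicates are folded away at link time.
template <typename T>
T clamp_min(T value, T lo) { return value < lo ? lo : value; }

#endif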
This may be a good intro:
http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=386

C++ including a ".h" file, function duplication confusion

I'm currently writing a program, and couldn't figure out why I got an error (note: I already fixed it, I'm curious about WHY the error was there and what this implies about including .h files).
Basically, my program was structured as follows:
The current file I'm working with, I'll call Current.cc (which is an implementation of Current.h).
Current.cc included a header file, named CalledByCurrent.h (which has an associated implementation called CalledByCurrent.cc). CalledByCurrent.h contains a class definition.
There was a non-class function defined in CalledByCurrent.cc called thisFunction(). thisFunction() was not declared in CalledByCurrent.h, since it was not actually a member function of the class (just a little helper function). In Current.cc, I needed to use this function, so I just redefined thisFunction() at the top of Current.cc. However, when I did this, I got an error saying that the function was duplicated. Why is this, when thisFunction() wasn't even declared in CalledByCurrent.h?
Thus, I just removed the function from Current.cc, now assuming that Current.cc had access to thisFunction() from CalledByCurrent.cc. However, when I did this, I found that Current.cc did not know what function I was talking about. What the heck? I then copied the function definition for thisFunction() to the top of my CalledByCurrent.h file and this resolved the problem. Could you help me understand this behavior? Particularly, why would it think there was a duplicate, yet it didn't know how to use the original?
p.s - I apologize for how confusing this post is. Please let me know if there's anything I can clear up.
You are getting multiple definitions from the linker - it sees two functions with the same name and complains. For example:
// a.cpp
void f() {}
// b.cpp
void f() {}
then
g++ a.cpp b.cpp
gives:
C:\Users\neilb\Temp\ccZU9pkv.o:b.cpp:(.text+0x0): multiple definition of `f()'
The way round this is to either put the definition in only one .cpp file, or to declare one or both of the functions as static:
// b.cpp
static void f() {}
You can't have two global functions with the same name (even in 2 different translation units). To avoid getting the linker error define the function as static so that it is not visible outside the translation unit.
EDIT
You can use the function in the other .cpp file by using extern keyword. See this example:
//Test.cpp
void myfunc()
{
}

//Main.cpp
extern void myfunc();

int main()
{
    myfunc();
}
It will call myfunc() defined in Test.cpp.
The header file inclusion mechanism should be tolerant to duplicate header file inclusions.
That's because when you define a function without static, it has external (global) linkage, whether you declare it in a header file or not. The linker then sees multiple implementations for the same function signature.
If those functions are truly helper functions, then declare them as:
static void thisFunction();
The other way: if you use the same function as a helper in both files, simply declare it in a common header file, say:
//CalledByCurrent.h (is included in both .cc files)
void thisFunction();
And implement thisFunction() in either of the .cc files. This should solve the problem properly.
Here are some ideas:
You didn't put a header include guard in your header file. If it's being included twice, you might get this sort of error.
The function's prototype (at the top) doesn't match its signature 100%.
You put the body of the function in the header file.
You have two functions of the same signature in two different source files, but they aren't marked static.
If you are using gcc (you didn't say what compiler you're using), you can use the -E switch to view the preprocessor output. This includes expanding all #defines and including all #includes.
Each time something is expanded, it tells you what file and line it was in. Using this you can see where thisFunction() is defined.
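For example (gcc/g++; the flag and the line markers are standard, though the exact formatting varies by version):
g++ -E Current.cc -o Current.ii   # stop after preprocessing
# Current.ii now contains every included header expanded inline, with
# '# <line> "<file>"' markers, so you can search it for thisFunction()
# and see exactly which file each definition came from.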
There are 2 distinct errors coming from 2 different phases of the build.
In the first case where you have a duplicate, the COMPILER is happy, but the LINKER is complaining because when it picks up all the function definitions across the different source files it notices 2 are named the same. As the other answers state, you can use the static keyword or use a common definition.
In the second case where you see your function not declared in this scope, its because the COMPILER is complaining because each file needs to know about what functions it can use.
Compiling happens before linking, so the COMPILER cannot know ahead of time whether or not the LINKER will find a matching function; that's why you use declarations to notify the COMPILER that a definition will be found by the LINKER later on.
As you can see, your 2 errors are not contradictory, they are the result of 2 separate processes in the build that have a particular order.