c++ Header Files and Implementation - c++

I m trying to understand how an IDE (I'm using Visual Studio) knows where to find the implementation for a header file's declarations.
Let's say that I have 2 cpp files and 1 header file:
main.cpp, contains main() function and includes person.h
person.h, contains some class declarations that are implemented in person.cpp
person.cpp, contains the implementation of the person.h declarations, and also defines person.h
So my understanding is that main.cpp and person.cpp know where to find the function declarations, but person.h has to search in some .cpp file for these implementations.
Now, how does Visual Studio keeps track of this? Each time a new header file is created, does VS needs to parse every .cpp file in the project to find where the declarations are implemented? Header files don't have any declaration of which .cpp files to search for, yet it finds them! Is Visual Studio scanning each .cpp file in the project?
So, what happens in a large project, with hundreds, or thousands of .cpp files? This does not appear to be very efficient, so I figure VS must do it in a different way. Do you know how?
PS: Not about compiler/linker process, but rather how VS IDE works (even before compiling or link the final exe)

Intellisense is the name of the Visual Studio feature which links definitions to declarations; which lists completions when typing identifiers; and which draws red squiggly lines under errors.
Intellisense works by continually scanning every source file in a solution, including new definitions as you type them. Records about all of the constructs in the source code are stored in a database, which is located on disk in a "solution_name.VC.db" file. When a new definition or declaration is added, whether to a header or source file, Intellisense doesn't have to take very long to look up any related items in the database.
If you delete the .VC.db file, and then open the solution, it will take a noticeably long time for Intellisense to rescan everything before it starts working very well. The IDE will also tell you that it is "parsing" files in the status bar.

Related

What's the REAL difference between .h and .cpp files?

This question was posted several times on StackOverflow, but most of the answers stated something similar to ".h files are supposed to contain declarations whereas .cpp files are supposed to contain their definitions/implementation". I've noticed that simply defining functions in .h files works just fine. What's the purpose of declaring functions in .h files but defining and implementing them in .cpp files? Does it really reduce compile time? What else?
Practically: the conventions around .h files are in place so that you can safely include that file in multiple other files in your project. Header files are designed to be shared, while code files are not.
Let's take your example of defining functions or variables. Suppose your header file contains the following line:
header.h:
int x = 10;
code.cpp:
#include "header.h"
Now, if you only have one code file and one header file this probably works just fine:
g++ code.cpp -o outputFile
However, if you have two code files this breaks:
header.h:
int x = 10;
code1.cpp:
#include "header.h"
code2.cpp:
#include "header.h"
And then:
g++ code1.cpp -c (produces code1.o)
g++ code2.cpp -c (produces code2.o)
g++ code1.o code2.o -o outputFile
This breaks, specifically at the linker step, because now you have two symbols in the same executable that have the same symbol, and the linker doesn't know what's it's supposed to do with that. When you include your header in code1 you get a symbol "x" and when you include your header in code2 you get another symbol "x". The linker doesn't know your intention here, so it throws an error:
code2.o:(.data+0x0): multiple definition of `x'
code1.o:(.data+0x0): first defined here
collect2: error: ld returned 1 exit status
Which again is just the linker saying that it can't resolve the fact that you now have two symbols with the same name in the same executable.
What's the REAL difference between .h and .cpp files?
They are both fundamentally just text files. From certain perspective, their only difference is the filename.
However, many programming related tools treat the files differently depending on their name. For example, some tools will detect programming language: .c is compiled as C language, .cpp is compiled as C++ and .h is not compiled at all.
For header files, the name does not matter at all to the compiler. The name could be .h or .header or anything else, it doesn't affect how the pre processor includes it. It is however good practice to conform to a common convention in order to avoid confusion.
I've noticed that simply defining functions in .h files works just fine.
Are the functions declared non-inline? Have you ever included the header file into more than one translation unit? If you answered yes to both, then your program has been ill formed. If you didn't, then that would explain why you didn't encounter any problems.
Does it really reduce compile time?
Yes. Dividing function definitions into smaller translation units can indeed reduce the time to compile said translation units compared to compiling larger translation units.
This is because doing less work takes less time. What is important to realise is that other translation units do not need to be recompiled when only one is modified. If you only have one translation unit, then you have to compile it i.e. the program in its entirety.
Multiple translation units are also better because they can be compiled in parallel, which allows taking advantage of modern multi core hardware.
What else?
Does there need to be anything else? Having to wait a few minutes to compile your program instead of a day improves development speed drastically.
There are some other advantages too regarding organisation of files. In particular, it is quite convenient to be able to define different implementations for same function for different target systems on order to be able to support multiple platforms. With header files, you must do tricks with macros while with source files, you simply choose which files to compile.
Another use case where implementing functions in header is not an option is distributing a library without source, as some middleware providers do. You must give the headers or else your functions cannot be called, but if all your source is in the headers, then you've given up your trade secrets. Compiled sources have to be at least reverse engineered.
Keep in mind that the C++ compiler is a fairly simple beast as far as file-handling goes. All it's allowed to do is a read in a single source-code file (and, via the pre-processor, logically insert into that incoming text-stream the contents of any files that the file #includes, recursively), parse the contents, and spit out the resulting .o file.
For small programs, keeping the entire codebase in a single .cpp file (or even a single .h file) works fine, because number of lines of code that the compiler needs to load into memory are small (relative to the computer's RAM).
But imagine you are working on a monster program, with tens of millions of lines of code -- yes, such things do exist. Loading that much code into RAM at once would likely stress the capabilities of all but the most powerful computers, leading to exceedingly long compile times or even outright failure.
And even worse than that, touching any of the code in a .h file requires the recompilation of any other files that #include that .h file, either directly or indirectly -- so if all your code is in .h files, then your compiler is likely to spend a lot of time unnecessarily recompiling a lot of code that didn't actually change.
To avoid those problems, C++ lets you place your code into multiple .cpp files. Since .cpp files are (at least traditionally) never #include'd by anything, the only time your Makefile or IDE will need to recompile any given .cpp file is after you've actually modified that exact file, or a .h file it #include's.
So when you've modified a function in the 375th .cpp file out of 700 .cpp files in your program, and now you want to test your modification, the compiler only has to recompile that one .cpp file and then re-link the .o files into an executable. If OTOH you've modified a .h file, compilation might be much longer, because now the build system will have to recompile every other file that includes that .h file, directly or indirectly, just in case you changed the meaning of something those files depend on.
.cpp files also make link-time issues much easier to deal with. For example, if you want to have a global variable, defining that global variable in a .cpp file (and maybe declaring an extern for it in a .h file) is straightforward; if OTOH you want to do that in a .h file, you'll have to be very careful or you'll end up with duplicate-symbol errors from your linker, and/or subtle violations of the One Definition Rule that will come back to bite you later on.
The REAL difference is that your programming environment lists .h and .cpp files separately. And/or populates file-browser-dialogs appropriately. And/or tries to compile .cpp files into object form (but doesn't do that to .h files). And whatever, depending on which IDE / environment you use.
The second difference is that people assume that your .h files are header files, and that your .cpp files are code source files.
If you don't care about people or development environments, you can put any damn thing you want in a .h or .cpp file, and call them any thing you want. You can put your declarations in a .cpp file and call it an "include file", and your definitions in a .pas file and call it a "source file".
I have to do this kind of thing when working in a constrained environment.
Header files weren't part of the original definition of c. The world got on perfectly well without them. Opening and closing lots of header files did slow down the compilation of c, which is why we got pre-compiled header files. Pre-compiled header files do speed up the compilation and linking of source code, but not any faster than just writing assembler, or machine code, or any other thing that didn't take advantage of the co-operation of other people or a design environment.
It is useful to put declarations in a header file, and definitions in a code source file. That's why you should do that. There isn't a requirement.
Whenever you see an #include <header.h> directive, pretend that the contents of header.h is being copied and pasted right where the #include directive appears.
.cpp files get compiled to become .obj files. They have no knowledge of the existence of any other .cpp file, and are compiled individually. That's why we need to declare things before we use them - otherwise the compiler won't know whether the function we're trying to invoke exists within a different .cpp file.
We use header files to share declarations amongst multiple .cpp files to avoid having to write the same code over and over for every single .cpp file.

What's the difference between including a header and a C++ file?

What's the difference between including a header(.h files) and a C++ file(.cpp files)? When I create a class, I create a .h file and .cpp file. If I want to use an object of this class should I include both of these files or not? In which cases should I include the .cpp file?
What files are called, and what their contents is, are entirely convention. If you like to confuse people, you could call your header files something.b and your source files something.r - this will of course mean nothing useful to most people, and some people may think your files contain the language R rather than C++ sources. And your editor will probably not understand that it's C or C++ in files called .b - and build tools such as Make, scons, CMake, etc will probably not understand how to compile the your files without being "told". [Compilers also look a the filename extension to determine if it's supposed to compile as C++ or C, which of course will not work with "unconventional names"]
What is important is not what the files are called, but what they actually contain. A header (what most people call something.h) file should be such that it can be included anywhere, and any number of times in your project [exceptions do exist, where header files are not really meant to be included more than a single time in the entire project - for example a version.h which declares a string that describes the current version number].
A source file (what is conventionally called something.cpp, typically, should be passed to the compiler directly to be compiled, and not used as #include "something.cpp". However, it is the CONTENT that determines this, not the name of the file. It's just badly named files if you use them that way.
In summary: The compiler just reads the source file passed in, then "inserts" the #include into the stream of code that it compiles, as if it was pasted into the original source file. The compiler doesn't really care what your file names are, where they came from, or what their content is, as long as the compiler is "ok" with the compilation as a whole.
There is no difference to include .cpp and .h files from point of view of compiler. But The content of .cpp and .h is different in common case. The .cpp files is for implementation of class, functions, static objects, and the .h files is for class definition. If you include the .cpp file into another .cpp file the content is duplicated and will fail at link stage becouse the naming collisions.
What's the difference between including a header(.h files) and C++ file(.cpp files)?
Supposed the .cpp file contains some function definitions, the latter option usually doesn't work well with commonly used build systems, and ends up with multiple definition errors, when the .cpp is included by more of one translation unit.
The exception might be having inlined all of your function definitions in the .cpp file.
In the very principle it's that the C/C++ preprocessor just expands the text found in either file type into the current translation unit. The file extension doesn't play any role here.
There is no difference. Both are handled by the preprocessor as plain text for concatenation to a single file. However, including a source might have a undesirable result (multiple definitions of variables/functions). Header files are usually protected by inclusion guards (#ifndef HEADER_H or a #pragma once) to prevent duplication of their content.
Note: The compiler does it's work after preprocessing (or invokes the preprocessor before compiling).

What is the point of a .cpp file containing only a single #include?

I am starting some work using a third party library and when building it in Visual Studio 2010, I noticed I was receiving this linker warning many times (LNK4221). I looked at the sources used in creating the object files that were being linked and found that all of the implementation for these is located in the header files. Interestingly, I also noticed the project included corresponding .cpp files containing only a #include for the header with the implementation.
I am curious - what is the point of this and why would I want to use this technique? If the .cpp files aren't adding any value to the project, why shouldn't I just remove them to get rid of the linker warnings?
I tried searching for similar questions, but didn't find anything of interest. If you know of any, please link them.
Was the single #included file stdafx.h? I. That case, you're dealing with precompiled headers. The normal setup is for one .cpp file having "generate precompiled headers" compiler option, and the rest of the .cpp files in your project having "use pch".
I'm using this to make sure, that the header is at least in one file included at the first position. By doing so, I make sure that the header is compilable on it's own.
To convence the linker to not issue a warning, one could use an external variable with a very large variable:
int variable_with_a_name_that_includes_the_file_name_somehow = 42;

Compiling a program with multiple files

I just started learning C++ with Dev C++ as my IDE. One of the tutorials I'm using has a page in it about compiling a program made up of multiple files. It's simple stuff at this point, I have one file with a function in it, and the other file has all the other required code to call the function and output the results. The problem is that the tutorial doesn't tell me how to join these files so I can compile the program and have it work. There's seems to be multiple ways of doing this and I'd like them all but I'm mainly look for the simplest one right now.
I should also mention that I'm new at this so please try and keep your explanations simple and understandable.
In general, you would add both .cpp files to your project under the same target. It IDE will automatically add both files to the build and link them together.
That said, Dev-C++ is very, very old and unmaintained. It has not seen updates in several years. I strongly urge you to use a different IDE. There are many to choose from, including a fork of Dev-C++ called wxDev-C++. I'd actually recommend Code::Blocks or Visual Studio Express, which are both much more modern and have better support for debugging and many other features.
I am not sure of Dev-C++, but the concepts remain the same. So, here is how you can try to get both the files to work together
Each C++ file is a compilation unit - meaning, the compiler will convert one .cpp / .cxx file to one .obj / .o file (on Windows and Linux (or any Unix)) respectively
The obj files, called the object files contain the machine code (am skipping few internal details here) for the classes and functions present in that particular file
If you want to access the functions present in a different compilation unit, you need to link those two object files
Linking is a term that is used to, well, link two object files
There is a separate process (other than the compiler) which does the linking of the object files
So,in your case, you need to use the dev-c++ compiler and create separate object files
Then using the linker you link both the object files to create the final executable
If there are functions that exist in the .cpp files that you want to reference, you use the header files. The header files contain the function/class declarations. The .cpp files will have the implementations. So, in one of your .cpp file, (say) A.cpp, you include the header B.hpp and use the functions in the B.hpp file. The inclusion of headers will tell the compiler that the function declarations exist elsewhere and that the linker will take care of stringing all these references together to create the final executable.
Hope this helps, else, please don't hesitate to mention the files you are using and I can suggest how to link both the .cpp files together.
You must include the other files by using the #include preprocessor directive
in the top of the file where you have the main() function
For example:
#include "filename.h"
...
/* rest of code containing main function goes here */
...
#include "path/filename.c"
main
{
...
...
...
}

Is including C++ source files an approved method?

I have a large C++ file (SS.cpp) which I decided to split in smaller files so that I can navigate it without the need of aspirins. So I created
SS_main.cpp
SS_screen.cpp
SS_disk.cpp
SS_web.cpp
SS_functions.cpp
and cut-pasted all the functions from the initial SS.cpp file to them.
And finally I included them in the original file :
#include "SS_main.cpp"
#include "SS_screen.cpp"
#include "SS_disk.cpp"
#include "SS_web.cpp"
#include "SS_functions.cpp"
This situation remains for some months now , and these are the problems I've had :
The Entire Solution search (Shift-Ctrl-F in VS) does not search in the included files, because they are not listed as source files.
I had to manually indicate them for Subversion inclusion.
Do you believe that including source files in other sources is an accepted workaround when files go really big ? I should say that splitting the implemented class in smaller classes is not an option here.
There are times when it's okay to include an implementation file, but this doesn't sound like one of them. Usually this is only useful when dealing with certain auto-generated files, such as the output of the MIDL compiler. As a workaround for large files, no.
You should just add all of those source files to your project instead of #including them. There's nothing wrong with splitting a large class into multiple implementation files, but just add them to your project, including them like that doesn't make much sense.
--
Also, as an FYI, you can add files to your projects, and then instruct the compiler to ignore them. This way they're still searchable. To do this, add the file to the project, then right-click it, and go to Properties, and under "General" set "Exclude from Build" to Yes.
Don't include cpp files in other files. You don't have to define every class function in one file, you can spread them across multiple files. Just add them individually to the project and have it compile all of them separately.
You don't include implementation (.cpp) files. Create header files for these implementation files containing the function/class declarations and include these as required.
There are actually times you will want to include CPP files. There are several questions here about Unity Builds which discuss this very topic.
You need to learn about Separate compilation, linking, and what header files are for.
You need to create a header file for each of those modules (except possibly main.cpp). The header file will contain the declarative parts of each .cpp source file, and the .cpp files themselves will contain the instantive parts. Each unit can then be separately compiled and linked. For example:
main.cpp
#include "function.h"
int main()
{
func1() ;
}
function.h
#if !defined FUNCTION_H
#define FUNCTION_H
extern void func1() ;
#endif
function.cpp
void func1()
{
// do stuff
}
Then function.cpp and main.cpp are separately compiled (by adding them to the sources for the project), and then linked. The header file is necessary so that the compiler is made aware of the interface to func1() without seeing the complete definition. The header should be added to the project headers, then you will find that the source browser and auto-completion etc. work correctly.
What bothers me with this question is the context of it.
A large cpp file has been created, large enough to warrant thinking about splitting it into smaller more manageable files. The proposed split is:
SS_main.cpp
SS_screen.cpp
SS_disk.cpp
SS_web.cpp
SS_functions.cpp
This seems to indicate that there are separate units of functionality from a specification and design perspective. We can only guess at the coupling between these units of code.
However, it would be a start to define these code units such that each new cpp file has its own header file thus defining the interfaces of these units and the (low) coupling between them to achieve (high) cohesion for each unit.
We are refactoring here.
It is not acceptable to use included cpp files in this context it as does not provide any advantages. The only time I've come across included cpp files is when a one is included to provide code for debug code, and example being to compile non-inline versions of functions. It helps in stepping through code in the debugger.