How do C++ header files work?


When I include some function from a header file in a C++ program, does the entire header file's code get copied into the final executable, or is machine code generated only for the specific function? For example, if I call std::sort from the <algorithm> header, is machine code generated only for the sort() function or for the entire <algorithm> header?
I think that a similar question exists somewhere on Stack Overflow, but despite my best efforts I couldn't find it (I glanced over it once but lost the link). If you can point me to it, that would be wonderful.

You're mixing two distinct issues here:
Header files, handled by the preprocessor
Selective linking of code by the C++ linker
Header files
These are simply copied verbatim by the preprocessor into the place that includes them. All the code of <algorithm> is copied into the .cpp file when you #include <algorithm>.
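A toy sketch of what "copied verbatim" means, assuming only the quoted form #include "name" and an in-memory map standing in for the file system. The function expand_includes and the file names are hypothetical; this is not how any real preprocessor is implemented:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Toy model of textual inclusion: every line of the form #include "name"
// is replaced by the (recursively expanded) text stored under that name.
// Real preprocessors also handle <...>, search paths, macros and #if.
std::string expand_includes(const std::string& source,
                            const std::map<std::string, std::string>& files) {
    std::istringstream in(source);
    std::ostringstream out;
    std::string line;
    const std::string directive = "#include \"";
    while (std::getline(in, line)) {
        if (line.rfind(directive, 0) == 0) {
            // Extract the file name between the quotes.
            std::size_t start = directive.size();
            std::size_t end = line.find('"', start);
            std::string name = line.substr(start, end - start);
            out << expand_includes(files.at(name), files);  // paste the text verbatim
        } else {
            out << line << '\n';
        }
    }
    return out.str();
}
```

Note that a declaration the program never uses is pasted in along with everything else: inclusion is purely textual, and deciding what reaches the executable happens later.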
Selective linking
Most modern linkers can leave out functions that are never called in your application (GNU ld, for example, does this with --gc-sections when functions are compiled into separate sections). I.e. write a function foo and never call it, and its code won't get into the executable. So if you #include <algorithm> and only use sort, here's what happens:
The preprocessor shoves the whole algorithm file into your source file
You call only sort
The linker analyzes this and only adds the code for sort (and the functions it calls, if any) to the executable. The other algorithms' code isn't added.
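A minimal sketch of the selective-linking case, assuming GCC with GNU ld; the function names are hypothetical. Whether unused_helper is actually discarded depends on the toolchain and on options such as -ffunction-sections and --gc-sections, so treat this as something to check (for example with nm on the resulting executable) rather than a guarantee:

```cpp
#include <cassert>

// Compile this translation unit with, for example:
//   g++ -O2 -ffunction-sections -fdata-sections main.cpp -Wl,--gc-sections
// -ffunction-sections puts each function in its own section, and
// --gc-sections lets GNU ld discard sections that nothing references.
// Whether unused_helper survives depends on these options and the toolchain.

int used_helper(int x) { return x * 2; }

int unused_helper(int x) { return x + 100; }  // never called: a candidate for removal

int run() { return used_helper(21); }
```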
That said, C++ templates complicate the matter a bit further. It's a complex issue to explain here, but in a nutshell: templates get expanded by the compiler for all the types that you're actually using. So if you have a vector of int and a vector of string, the compiler will generate two copies of the vector code (strictly, of each member function you actually use), one per type. Since you are using it (otherwise the compiler wouldn't generate it), the linker also places it into the executable.
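A small sketch of "expanded for the types you actually use", with a hypothetical sum template: only sum<int> and sum<std::string> get instantiated here. Running nm -C on the object file is one way to see which instantiations were emitted, though with optimization on they may be inlined away entirely:

```cpp
#include <cassert>
#include <string>
#include <vector>

// A function template: no machine code exists for it until it is
// instantiated with concrete types.
template <typename T>
T sum(const std::vector<T>& values) {
    T total{};
    for (const T& v : values) total += v;
    return total;
}

// Only these two instantiations are generated: sum<int> and sum<std::string>.
// sum<double>, for example, is never produced because nothing uses it.
int sum_ints() { return sum(std::vector<int>{1, 2, 3}); }
std::string sum_strings() { return sum(std::vector<std::string>{"a", "b"}); }
```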

In fact, the entire file is copied into the .cpp file, and whether only the 'needed' functions or all of them are picked up depends on the compiler/linker.
In general, simplified summary:
debug configuration means compiling in all non-template functions,
release configuration strips all unneeded functions.
Plus, it depends on attributes: a function declared for export will never be stripped.
On the other hand, template function variants are 'generated' when used, so only the ones you explicitly use are compiled in.
EDIT: header file code isn't generated; in most cases it's hand-written.

If you #include a header file in your source code, it acts as if the text in that header was written in place of the #include preprocessor directive.
Generally headers contain declarations, i.e. information about what's inside a library. This way the compiler allows you to call things for which the code exists outside the current compilation unit (e.g. the .cpp file you are including the header from). When the program is linked into an executable that you can run, the linker decides what to include, usually based on what your program actually uses. Libraries may also be linked dynamically, meaning that the executable file does not actually include the library code but the library is linked at runtime.
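A minimal sketch of the declaration/definition split, squeezed into one file for illustration; the names add, add_three, mathutil.h and mathutil.cpp are hypothetical. In a real project the declaration would sit in the header and the definition in a separate .cpp file that the linker brings in:

```cpp
#include <cassert>

// --- what a header such as "mathutil.h" (hypothetical) would contain ---
// A declaration: tells the compiler the name, parameter types and return
// type, so calls can be type-checked and compiled without the body.
int add(int a, int b);

// --- code in the including .cpp file ---
int add_three(int x) { return add(x, 3); }  // compiles against the declaration alone

// --- what "mathutil.cpp" would contain (normally a separate translation unit) ---
// The definition: the linker later connects the call above to this body.
int add(int a, int b) { return a + b; }
```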

It depends on the compiler. Most compilers today do flow analysis to prune out uncalled functions. http://en.wikipedia.org/wiki/Data-flow_analysis

Related

Does the compiler compile header files in C++?

Take these two header files, for example: does the compiler compile them?
#include<iostream.h>
#include<conio.h>

When any file is included with an #include directive, the compiler processes its contents as if it were part of the source file being compiled.

Actually, it depends.
#include means: "Hey compiler, would you please copy the content of that file into this file" (in the preprocessing stage). After the content has been copied in, the result goes to the compiler, and from here it depends. By default most compilers compile only the C++ files you give them; other external functions and classes (for example the STL and other third-party libraries that aren't yours) are found at run time, which is what a dynamic library is. The alternative to a dynamic library is a static library: instead of leaving your dependencies to be found at run time, their code is put together with your code in the program, so the program does not need to find libraries at run time. That's why I said it depends.
More information about compiler stages here.
More information about the STL and dynamic libraries here.

How does a function caller use a header file to determine what to do with a compiled binary?

My understanding is that C++ (and C, I guess) header files are never compiled, and simply act as an explanation of the interface of the C++ file they describe.
So if my header file describes a hello() function, some program that includes the header will know about hello() and how to call it and what arguments to give it, etc.
However, after compilation (and before linking, I guess? I'm not sure), when the hello.c file is binary machine code, and hello.h is still C++, how does the compiler/linker know how to call a function in the binary blob based on the presence of its declaration in the header file?
I understand concepts such as symbol tables, abstract syntax trees, etc. (i.e., I have taken a compiler class in the past), but this is a gap in my knowledge.

The implementation of hello() assumes a certain calling convention (where the parameters are on the stack, who cleans up the stack: the caller or the callee, etc.).
The compiler generates code with the correct calling convention. It may use information from the header file to do this (e.g. the function is marked __stdcall in a Windows program) or it may use its default calling convention. The compiler will also use the header file to make sure you are calling the routine with the right number and types of parameters. Once the code is generated by the compiler, the header file is not used again.
The linker is not concerned with calling conventions; its primary responsibility is to patch together the binaries you've compiled by fixing up references among your modules and any libraries they call.

A C/C++ compilation unit (cpp file / c file) includes all the header files (as text) and the code.
The header file helps explain how to produce the call instruction
push arg1
push arg2
call _some_function
If the compilation unit defines _some_function, then this will be resolved at compile time.
Otherwise it becomes an undefined symbol. If so, when the linker comes along, it looks through all the object files and libraries to resolve all the undefined symbols.
So the header file helps the compiler generate the assembly correctly.
Object and library files provide implementations.
The library files are optional. When the linker looks in a library file, a piece of it (an object file stored in the library) only gets added if it satisfies some undefined symbol; otherwise it is not added to the binary.
Object files (ignoring optimization) will get added to the binary completely.
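To make the object-file versus library-file distinction concrete, here is a toy model of the symbol resolution described above. The ObjectFile type and toy_link function are hypothetical simplifications: real linkers work on sections and relocations, and GNU ld scans an archive only once in command-line order unless asked to rescan (e.g. with --start-group/--end-group), whereas this sketch rescans until nothing changes:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model (hypothetical types) of how a traditional static linker treats
// object files versus archive (library) members.
struct ObjectFile {
    std::string name;
    std::set<std::string> defines;    // symbols this object provides
    std::set<std::string> undefined;  // symbols it references but lacks
};

// Returns the names of everything that ends up in the output binary.
std::vector<std::string> toy_link(const std::vector<ObjectFile>& objects,
                                  const std::vector<ObjectFile>& library) {
    std::vector<std::string> output;
    std::set<std::string> defined, needed;
    auto add = [&](const ObjectFile& obj) {
        output.push_back(obj.name);
        defined.insert(obj.defines.begin(), obj.defines.end());
        needed.insert(obj.undefined.begin(), obj.undefined.end());
    };
    for (const auto& obj : objects) add(obj);  // object files: always added whole

    // Library members: added only if they satisfy a still-undefined symbol.
    // Repeat until nothing changes, since a pulled-in member may need others.
    std::set<std::string> used;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& member : library) {
            if (used.count(member.name)) continue;
            for (const auto& s : member.defines) {
                if (needed.count(s) && !defined.count(s)) {
                    add(member);
                    used.insert(member.name);
                    changed = true;
                    break;
                }
            }
        }
    }
    return output;
}
```

With a main.o that needs sort_it, a library member defining sort_it (which itself needs swap_it) and another defining swap_it are pulled in, while a member defining only an unreferenced symbol is left out.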

Building a C++ program is a two-step process: compile and link.
The header is for compiling the module you are writing. The binary is for linking: it contains the compiled code for the methods declared in the corresponding header. The header has to match what has already been compiled. At link time you will learn whether your header has a method signature that matches what was compiled into the binary.

How is the iostream file located by C++ code during execution?

I want to know how the iostream file is found while C++ code is executing. We write #include <iostream> in a C++ program, and I know that #include is a preprocessor directive
to load files and that <iostream> is a file name, but I don't know how that file is located.
I have some questions in my mind...
Is the Standard Library present in the compiler we are using?
Is that file present in the standard library or on our computer?
Can we give a directory path to locate the file through C++ code? If yes, then how?

You seem to be confused about the compilation and execution model of C++. C++ is generally not interpreted (though it could be); instead, a native binary is produced during the compilation phase, which is then executed... So let's take a detour.
In order to go from a handful of text files to a program being executed, there are several steps:
compilation
link
load
execution proper
I will only describe what traditional compilers do (such as gcc or clang), potential variations will be indicated later on.
Compiling
During compilation, each source file (generally .cpp or .cxx, though the compiler couldn't care less) is processed to produce an object file (generally .o on Linux):
The source file is preprocessed: this means resolving the #include (copy/pasting the included file in the current file), navigating the #if and #else to remove unneeded sources and expanding macros.
The preprocessed file is fed to the compiler, which will produce native code for each function, static or global variable, etc.; the format generally depends on the target system.
The compiler outputs the object file (it's a binary format, in general)
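A small sketch of what the preprocessing step does before the compiler proper runs; SQUARE and build_kind are made-up names. With GCC or Clang, g++ -E prints the preprocessed text if you want to see the result:

```cpp
#include <cassert>

// Macro expansion: the preprocessor substitutes text before compilation.
#define SQUARE(x) ((x) * (x))

// Conditional compilation: only one branch survives preprocessing; the other
// is removed from the source that the compiler proper ever sees.
#ifdef NDEBUG
const char* build_kind() { return "release"; }
#else
const char* build_kind() { return "debug"; }
#endif

int area(int side) { return SQUARE(side); }  // becomes ((side) * (side))
```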
Linking
In this phase, multiple object files are assembled together into a library or an executable. In the case of a static library or a statically-linked executable, the libraries it depends on are also assembled into the produced file.
Traditionally, the linker's job is relatively simple: it just concatenates all object files, which already contain the binary format the target machine can execute. However, it often actually does more: in C and C++, inline functions are duplicated across object files, so the linker needs to keep only one of the definitions, for example.
At this point, the program is "compiled", and we leave the realm of the compiler.
Loading
When you ask to execute a program, the OS will load its code into memory (thanks to a loader) and execute it.
In the case of a statically linked executable, this is easy: it's just a single big blob of code that needs to be loaded. In the case of a dynamically linked executable, it implies finding the dependencies and "resolving the symbols", which I'll describe below:
First of all, your dynamically linked executable and libraries have a section describing which other libraries they depend on. They only have the name of the library, not its exact location, so the loader will search among a list of paths (LD_LIBRARY_PATH for example on Linux) for the libraries and actually load them.
When loading a library, the loader will perform replacements. Your executable had placeholders saying "Here should be the address of the printf function", and the loader will replace that placeholder with the actual address.
Once everything is loaded properly, all symbols should be resolved. If some symbols are missing, i.e. the code for them is not found in either the present library or any of its dependencies, then you generally get an error (either immediately, or only when the symbol is actually needed if you use lazy loading).
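A toy model of the placeholder patching described above, with ordinary function pointers standing in for a shared library's exported symbols; Slot, resolve and the lib_* functions are hypothetical. Real loaders work with relocation tables (on POSIX systems a similar lookup is exposed to programs through dlopen/dlsym), so this only mirrors the idea:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model (hypothetical names) of load-time symbol resolution: the
// executable holds empty slots, and the loader fills each slot with the
// address of the matching function exported by a loaded library.
using Fn = int (*)(int);

int lib_double(int x) { return 2 * x; }  // stands in for code in a shared library
int lib_negate(int x) { return -x; }

struct Slot {
    std::string symbol;    // name recorded in the executable
    Fn address = nullptr;  // placeholder until the loader patches it
};

void resolve(std::vector<Slot>& slots, const std::map<std::string, Fn>& exports) {
    for (auto& slot : slots) {
        auto it = exports.find(slot.symbol);
        if (it == exports.end())
            throw std::runtime_error("undefined symbol: " + slot.symbol);
        slot.address = it->second;  // patch the placeholder
    }
}
```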
Executing
The code (machine instructions in binary format) is now executed. In C++, this starts with building the global and static objects (at file scope, not function-static), and then goes on to calling your main function.
Caveats: this is a simplified view. Nowadays Link-Time Optimization means that the linker does more and more, the loader could actually perform optimizations too, and, as mentioned, with lazy loading the loader might be invoked after execution has started... Still, you've got to start somewhere, don't you?
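A small sketch of global objects being built before main; StartupLog and g_log are hypothetical names. One caveat: the standard allows an implementation to defer the dynamic initialization of such an object until just before something in its translation unit is first used, though mainstream compilers such as GCC and Clang run it during startup:

```cpp
#include <cassert>
#include <string>

// A global object with a constructor: its initialization runs during
// program startup, before main() is entered (on mainstream implementations).
struct StartupLog {
    std::string message;
    StartupLog() : message("constructed before main") {}
};

StartupLog g_log;  // namespace-scope object, dynamically initialized at startup

int counter = 10;  // constant-initialized: no code needs to run at all
```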
So, what does it mean about your question ?
The #include <iostream> in your source code is a pre-processor directive. It is thus fully resolved early in the compilation phase, and only depends on finding the appropriate header (no library code is actually needed yet). Note: a compiler would be allowed not to have a header file sitting around, and just magically inject the necessary code as if the header file existed, because this is a Standard Library header (thus special). For regular headers (yours) the pre-processor is invoked.
Then, at link-time:
if you use static linking, the linker will search for the Standard Library and include it in the executable it produces
if you use dynamic linking, the linker will note that it depends on the Standard Library file (libc++.so for example) and that the produced code is missing an implementation of printf (for example)
Then, at load time:
if you used dynamic linking, the loader will search for the Standard Library and load its code
Then, at execution time, the code (yours and its dependencies) is finally executed.
Several Standard Library implementations exist, off the top of my head:
MSVC ships with a modified version of Dinkumware
gcc ships with libstdc++ (which depends on libc)
clang ships with libc++ (which depends on libc), but may use libstdc++ instead (with compiler flags)
And of course, you could provide others... probably... though setting it up might not be easy.
Which one is ultimately used depends on the compiler options you use. By default the most common compilers ship with their own implementation and use it without any intervention on your part.
And finally yes, you can indeed specify paths in a #include directive. For example, using boost:
#include <boost/optional.hpp>
#include <boost/algorithm/string/trim.hpp>
you can even use relative paths:
#include <../myotherproject/x.hpp>
though this is considered poor form by some (since it breaks as soon as you reorganize your files).
What matters is that the pre-processor will look through a list of directories, and for each directory append / and the path you specified. If this creates the path of an existing file, it picks it; otherwise it continues to the next directory... until it runs out (and complains).
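A sketch of that search loop, assuming a hypothetical find_header function and an injected exists predicate (a real tool would check the file system, for example with std::filesystem::exists); the directory names in the usage are made up. It requires C++17 for std::optional:

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Try each include directory in order, append "/" and the requested path,
// and return the first candidate that exists. `exists` is injected so the
// example can run without real files on disk.
std::optional<std::string> find_header(
        const std::vector<std::string>& include_dirs,
        const std::string& requested,
        const std::function<bool(const std::string&)>& exists) {
    for (const auto& dir : include_dirs) {
        std::string candidate = dir + "/" + requested;
        if (exists(candidate)) return candidate;
    }
    return std::nullopt;  // a real preprocessor reports "file not found" here
}
```

For example, with the directories {"/usr/local/include", "/usr/include"} and a request for boost/optional.hpp, the first directory is tried before the second, and the first hit wins.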

The <iostream> file is just not needed during execution. But that's just the header. You do need the standard library, but that's generally named differently if not included outright into your executable.
The C++ Standard Library doesn't ship with your OS, although on many Linux systems the line between OS and common libraries such as the C++ Standard Library is a bit thin.
How you locate that library is very much OS dependent.

There are 2 ways to load a header file (like iostream.h) in C++.
If you write the code as:
# include <iostream>
it will look up the header file in the include directory of your C++ compiler and load it.
The other way is to give the full path of the header file, as in:
# include "path_of_file.h"
And loading up the file is OS dependent, as answered by MSalters.

You definitely need the standard library header files so that the pre-processor directive can locate them.
Yes, those files are present in the library and, on #include, are copied into our code.
If we have defined our own header file, then we have to give the path of that file. In the same way we can also include *.c or *.cpp files along with the header files in which we have defined various methods, and those would be included at pre-processing time.

Difference between C++ files

I just started a graphical C++ course and I have trouble getting an overview of how it all works.
We got some starting code, two files: one of type "C++ Source" and another of type "C/C++ Header".
It's supposed to be a graphical program which fills the screen with color.
Also, we are using some custom libraries such as SDL and GLM; in the same folder as those two files there is a folder named glm with loads of subfolders, which I won't get into.
I have downloaded MinGW, CMake and Visual Studio 11 beta for C++.
I've tried making a normal Win32 program and also a Forms application for the graphical part, but there's always something wrong when compiling.
My question: how are you supposed to handle C++ files? I just got used to Java, and there it's so easy to just open the .java file and paste it into your IDE; dealing with C++ makes me really confused.

Hmm... Where to begin...
Some things that happen behind the scenes in other languages are much more visible in C++. The process of obtaining a binary (say, an executable) from C++ involves first compiling the source code (there are sub-steps to this, but the compiler handles them) to obtain object files; then the object files are linked by the linker to generate a binary.
In theory, you could simply #include all the cpp files in a project, and compile them all together and "link" (although there's nothing to link) but that would take a very long time, and more importantly, in complex projects that could deplete the memory available to your compiler.
So, we split our projects into compilation units, and by convention a .cpp file represents a single compilation unit. A compilation unit is the part of your project that gets compiled to generate one object file. Even though compilation units are compiled separately, some code has to be common among them, so that the piece of code in each of them can use the functionalities implemented by the others. .h files conventionally serve this purpose. Things are basically declared (sort of announced) in them, so that each compilation unit knows what to expect when it's a part of a linking process to generate a binary.
There's also the issue with libraries. You can find mainly two kinds of things in libraries;
Already implemented functionality, shipped to you in the form of binary files including CPU instructions that can almost be run (but they have to be inserted in the right place). This form is accompanied by .h files to let your .cpp files know what to expect in the library.
The second type is functionality implemented directly in the .h files. Yes, this is possible in special cases: there are cases where the implementation has to (a weak "has to") accompany the declaration (inline functions, templated types, etc.).
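A minimal sketch of the second kind of library, a header-only one; TINYLIB_H, clamp_to_byte and max_of are hypothetical names. Both functions can live entirely in the header because an inline function may be defined in every translation unit that includes it, and a template must be visible wherever it is instantiated:

```cpp
#include <cassert>

// What a header-only library header might contain (names are hypothetical).
#ifndef TINYLIB_H
#define TINYLIB_H

// An inline function: the definition may appear in every translation unit
// that includes this header; the linker keeps a single copy.
inline int clamp_to_byte(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

// A template: the compiler needs the full definition wherever it is used,
// so it must live in the header rather than in a .cpp file.
template <typename T>
T max_of(T a, T b) { return a < b ? b : a; }

#endif  // TINYLIB_H
```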
The first type comes in two flavors: a "static library" (.lib on Windows, .a on Linux), which enters your executable and becomes a part of it during linking, and a "dynamic library", which is exposed to your binary (so it knows about it) but doesn't become a part of it. So your executable will be looking for that dynamic library (for example, .dll files on Windows and .so files on Linux) while it runs.
So, in order for your .cpp files to be able to receive services from libraries, they have to #include their .h files, to know about what there is in them. Later on, during linking, you have to show the linker where (what path in the file system) to find the binary components of those libraries. Finally, if the library is dynamic, the .dll's (or .so's etc.) must be accessible during run time (keep them in the same folder for instance).
While compiling your compilation units you have to tell the compiler where to find the .h files. Otherwise, all it will see is #include <something.h> and it won't know where to find that file. With gcc, you tell the compiler with the -I option. Note that you only give the folder. Also of importance: if the include directive looks like #include<somefolder/somefile.h>, you shouldn't include somefolder in the path. So the invocation looks like:
g++ mycompilationunit.cpp -IPATH/TO/THE/INCLUDED/FILES -IPATH/TO/OTHER/INCLUDED/FILES -c
The -c option tells the compiler that it shouldn't attempt to make an executable just from this compilation unit, so it creates a .o file, to be linked with others later. Since we don't tell it the output file name, it spits out mycompilationunit.o.
Now we want to generate our binary (you probably want an executable, but you could also want to create a library of yours). So we have to tell the linker everything that goes into the binary. All the object files and all the static and dynamic libraries. So, we say: (Note g++ here also acts as the linker)
g++ objectfile1.o objectfile2.o objectfile3.o -LPATH/TO/LIBRARY/BINARIES -llibrary1 -llibrary2 -o myexecutable
Here, the -L option is self-explanatory from the example. The -l option tells which binaries to look for. The linker will accept both static and dynamic libraries if it finds them on the path, and if it finds both, it'll choose one. Note that what goes after -l is not the full binary name. For instance, on Linux library names take the form liblibrary.so.0, but they're referred to as -llibrary in the linker command. Finally, -o tells the compiler what name to give your executable. You need some other options to, for example, create a dynamic library, but you probably don't need to know about them now.

What is the difference between a .cpp file and a .h file?
Look at this answer. Also a quick google search explains a bit too.
Pretty much, .h (header) files are declarations and .cpp (source) files are definitions. It is possible to combine both into one .cpp file, but as projects get bigger and bigger it becomes annoying and almost unreasonable.
Hope that helps.

In C++ there is a notion of a function declaration (the function signature) and a function definition (the actual code).
A header file (*.h) contains the declarations of functions and classes. A source file (*.cpp, *.c++, *.C) contains the definitions.
A header file can be included in a source file using #include directive.
When you define a class in C++, you typically only include the declarations of the member functions (methods in Java lingo), and you put the class definition into a header file. The member function definitions containing the body of each function are typically put outside the class definition and into the source file.
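A sketch of that layout, using a hypothetical Counter class. The two parts are shown in one block here, but normally the class definition would go in counter.h and the member function bodies in counter.cpp, which would #include "counter.h":

```cpp
#include <cassert>
#include <string>
#include <utility>

// --- counter.h (hypothetical file name): the class definition, with
//     member functions only declared ---
class Counter {
public:
    explicit Counter(std::string name);
    void increment();
    int value() const;
    const std::string& name() const;

private:
    std::string name_;
    int value_ = 0;
};

// --- counter.cpp: the member function bodies, defined outside the class ---
Counter::Counter(std::string name) : name_(std::move(name)) {}
void Counter::increment() { ++value_; }
int Counter::value() const { return value_; }
const std::string& Counter::name() const { return name_; }
```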
Generally the best thing to do here is to get a book on C++ or C, and to look at some sample code.

Header files (.h) are supposed to contain definitions of classes and declarations of methods and variables. Source files (.cpp) contain the code. So in your .cpp file you need to include the header file as #include "header-file-name.h".
Then use g++ to compile the .cpp file. Make sure that the path to .h file is correct.
If you are using CodeBlocks or Visual Studio, then just compiling the project and running will do everything for you. You can also add .h or .cpp file from there. You need not worry about anything.
Hope this helps.

Linker error with Duplicated Symbols, SWIG and C++ Vectors

I came across this error trying to compile a shared object from 2 sets of objects. The first set contains one .os object compiled from one cpp file generated by SWIG. The second set contains all of the .so files from the individual files that make up the interface to be wrapped.
$g++ -shared *.os -o Mathlibmodule.so
ld: duplicate symbol std::vector<int, std::allocator<int> >::size() const in Mathlib_wrap.o and Capsule.o
The SWIG C++ wrapper (Mathlib_wrap.o's source file) is machine generated and nasty to look at, with lots of #defines to make it extra hard to trace. It looks like the redefinition is present in all of the object files in the second set. I've traced through the headers included in all those files, and they seem to be #pragma once'd.
What advice do people have for tracking down what/where the problem is?

I'm going to assume that you've properly #ifndef/#define-guarded all of the header files in your C++ library; after that I'd check your .i file to make sure you aren't actually duplicating some declaration there somehow. Maybe try importing a small piece of the library first or something.
I have run into issues like this before, but it's always turned out to be something silly I'd done. Nothing specific, I'm afraid.
Post the .i file maybe, I don't know.

When in doubt, assume that the error means what it says: Actual code was generated for vector<T>::size within each of those object files. This of course seems very unusual because you would expect the function to be expanded inline in each file it was being used in.
If it weren't std::vector the first thing I would say is that a function defined in a header wasn't marked inline correctly. The compiler would generate the code in each source file that included that header. What version of g++ are you using, and are you using a custom standard library/vector implementation?
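A minimal illustration of the two mechanisms involved, assuming a hypothetical header whose text is pasted twice into one file: the include guard stops a repeated inclusion within one translation unit, while inline is what lets several object files each carry the same definition without a duplicate-symbol error at link time. This does not diagnose the SWIG case itself; it only shows what a correctly written header function looks like:

```cpp
#include <cassert>

// Suppose this guarded block is the text of a header (hypothetical
// "capsule_util.h"). Writing it twice below mimics the preprocessor pasting
// the same header twice into one translation unit.
#ifndef CAPSULE_UTIL_H
#define CAPSULE_UTIL_H
// `inline` is what prevents "duplicate symbol" errors when several .cpp
// files include this header and are linked together; without it, every
// object file would carry its own strong definition of triple.
inline int triple(int x) { return 3 * x; }
#endif

#ifndef CAPSULE_UTIL_H  // second inclusion: the guard skips the body,
#define CAPSULE_UTIL_H  // so there is no redefinition within this file
inline int triple(int x) { return 3 * x; }
#endif
```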
One thing to check is to compile with optimization on (-O2) and see if that causes it to inline the calls without creating an actual function.
Another possibility is that you're including two different versions of the vector include, and violating the one definition rule. At that point I wouldn't rule out a linker error such as you're seeing.
