Why does GDB (or any debugger) step into headers? - c++

I am self learning C++ for hardly 2 weeks so although this is a very basic question be kind. Now, as I understand headers play completely no role during a program's runtime. Headers are used solely by compilers during compilation. Still when debugging (I am using GDB) the program the debugger steps into headers. And when also using a disassembly while debugging I noticed those header steps actually represent assembly instructions (like 'mov' etc.). But headers should not even exist in binaries. So exactly what is happening here?

"source files" are used exclusively by compilers during compilation too, although if you ask a compiler nicely, it will place debug symbols in a compiled binary that will contain program source file and header file data.
There's nothing magical about "headers". They exist merely by convention and are #included, via a preprocessor directive, by one or more "source files". The "or more" is the important bit: a program is typically arranged so that a "header" can be included by more than one source file.
Your debugger is being helpful in pinpointing the location of the original code.
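As a rough illustration of why a header can "contain" instructions, here is a single-file sketch. The #line directives stand in for two separate files; the names math_utils.h and main.cpp are made up for the example. The functions are constexpr only so the results can be checked at compile time; called with runtime values they behave like ordinary inline functions.

```cpp
// Single-file sketch: the #line directives below pretend that the code
// comes from two files. Both file names are hypothetical.

#line 1 "math_utils.h"
constexpr int square(int x)
{
    return x * x;  // the compiler attributes this line to math_utils.h, line 3
}

#line 1 "main.cpp"
constexpr int sum_of_squares(int a, int b)
{
    // Stepping into square() from here would be reported as a step
    // into math_utils.h, because that is where its code was written.
    return square(a) + square(b);
}
```

The multiplication inside square() becomes real machine code, and the debug information simply remembers which file and line it was written on.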

Now, as I understand headers play completely no role during a
program's runtime. Headers are used solely by compilers during
compilation.
This is a misunderstanding: headers are not a special kind of file. Headers (usually .h or .hpp) are technically no different from other source files. It is merely a convention that header files are meant to be included, and many headers contain actual code (just look at any standard library header, which is thought of as a header file but is full of template and inline function definitions).

Related

C/C++ - precompiled headers - encapsulation, how to, and why is config required?

I understand the idea that precompiling headers can speed up build times, but there are a handful of questions that have thus far prevented me from grokking them.
Why does using precompiled headers require the developer to configure anything?
Why can't the compiler (or linker/IDE?) just have individual precompiled header object files, in the same way it does for the source files (i.e. .obj files)? Dependencies are indicated by which source/header files include which other files, and it can already detect when source files change, so a regular build is normally not a full rebuild. Instead of requiring me to specify which headers get precompiled, etc., why isn't this all just always automatically on, and transparent to the developer?
As I understand the precompiled headers methodology in Visual Studio, the idea is that you get this one big header file (stdafx.h) that includes all the other header files that you want to be precompiled, and then you include that in all your source files that use any of those headers.
a. Am I understanding correctly?
b. Doesn't this break encapsulation? You often end up effectively including various (likely) unrelated items that you don't need, which makes it harder to tell what libraries you're actually using, and what comes from where.
It seems to me that this implementation forces bad practices. What am I missing?
How do I utilize precompiled headers in Visual Studio (2013)?
Is there a cross-platform way to use or facilitate precompiled headers?
Thanks.
Why can't the compiler (or linker/IDE?) just have individual precompiled header object files, in the same way it does for the source files (i.e. .obj files)?
The answer to 1 and 2 lies in how precompiled headers work. Assume you have a file my_source_file.c:
#include "header1.h"
#include "header2.h"
int func(int x) { return x+1; }
my_other_source_file.c:
#include "header1.h"
int func2(int x) { return x-1; }
When you call compiler.exe my_source_file.c the compiler starts parsing your file. All the internal variables of the compiler (things like which types have been defined, what variables declared, etc) are called the compiler state.
After it has parsed header1.h it can save the state to the disk. Then, when compiling my_other_source_file.c, instead of parsing header1.h again, it can just load the state and continue.
That state is a precompiled header. It is literally just a dump of all the compiler variables in the moment after it has parsed the entire header.
Now, the question is why you can't have two state dumps, one for header1.h and one for header2.h, and just load them both. Well, the states are not independent. The second dump would be the state after header1.h + header2.h. So what is usually done is to have one state, saved after all the common header files have been compiled, and use that.
In theory, you could have one for every combination and use the appropriate one, but that is much more hassle than it's worth.
Some things that are side effects of how this is done:
Different compilers (including even minor versions) have different variables, so you can't reuse the precomps.
Since the dumped state started from the top of the file, your precomp must be the first include. There must be nothing that could influence the state (i.e. no #defines, typedefs, or declarations) before the precomp is included.
Any defines passed on the command line (-DMY_DEFINE=0) when compiling a source file will not be picked up by the precompiled header, whose state was fixed when it was built.
Any defines passed on the command line while precompiling will be in effect for all source files that use the precomp.
For 3), refer to the Microsoft documentation.
For 4), most compilers support precompiled headers, and they generally work in the same way. You could configure your makefiles/build scripts to always precompile a certain header (e.g. stdafx.h) which would include all the other headers. As far as your source code goes, you'd always just #include "stdafx.h", regardless of the platform.
Why can't the compiler (or linker/IDE?) just have individual
precompiled header object files
C, and C++ before C++20, have no concept of modules. The traditional compiler has a preprocessor phase (which may be invoked as a separate program) that will include the files, and the whole thing will get compiled to intermediate code. The compiler per se does not see includes (or comments, or trigraphs, etc.).
Add to this that the behaviour of a header file can change depending on the context in which it is included (think macros, for example) and you end up with either many precompiled versions of the same header, or an intermediate form that is basically the language itself.
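To make the macro point concrete, here is a small sketch. CONFIG_H_BODY is a made-up macro standing in for the text of a hypothetical header config.h, and the two namespaces play the role of two source files that include it. A macro's body is rescanned where it is expanded, so BUFFER_SIZE is looked up at each use, much as it would be at each #include.

```cpp
// Stand-in for the text of a hypothetical header, config.h.
// BUFFER_SIZE is not defined here; it is resolved at each expansion.
#define CONFIG_H_BODY \
    constexpr int capacity() { return BUFFER_SIZE; }

// "Source file A" defines BUFFER_SIZE as 16 before including the header.
#define BUFFER_SIZE 16
namespace unit_a { CONFIG_H_BODY }
#undef BUFFER_SIZE

// "Source file B" defines it as 64 instead.
#define BUFFER_SIZE 64
namespace unit_b { CONFIG_H_BODY }
#undef BUFFER_SIZE
```

The same header text yields two different functions, so a single precompiled form of it could not serve both contexts.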
Am I understanding correctly?
Mostly. The actual name is irrelevant, as it can be specified in the project options. stdafx.h is a relic of the early development of MFC, which was originally named AFX (Application Framework eXtensions). The preprocessor also treats includes of the precompiled header differently, as they are not looked up in the include paths. If the name matches what is in the project settings, the .pch is used automatically.
Doesn't this break encapsulation
Not really. Encapsulation is an object-oriented feature and has nothing to do with include files. It might increase coupling and dependencies by making some names available across all files, but in general, this is not a problem. Most includes in a precompiled header are standard headers or third-party libraries, that is, headers that may be large and fairly static.
As an example, a project I'm currently working on includes GTK, standard headers, boost and various internal libraries. It can be assumed that these headers never change. Even if they changed once a day, I probably compile every minute or so on average, so it is more than worth it.
The fact that all these names are available project-wide makes no difference. What would I gain by including boost/tokenizer.hpp in only one .cpp file? Perhaps some intellectual satisfaction of knowing that I can only use boost::char_separator in that particular file. But it certainly creates no problem. All these headers are part of a collection of utilities that my program can use. I am completely dependent on them, because I made a design decision early on to integrate them. I am tightly coupled with them by choice.
However, this program needs to access system-specific graphical facilities, and it needs to be portable on (at least) Debian and Windows. Therefore, I centralized all these operations in two files: windows.cpp and x11.cpp. They both include their own X11/Xlib.h and windows.h. This makes sure I don't use non-portable stuff elsewhere (which would however quickly be caught as I keep switching back and forth) and it satisfies my obsession with design. In reality, they could have been in the precompiled header. It doesn't make much of a difference.
Finally, none of the headers that are part of this specific program are in the precompiled header. This is where coupling and dependencies come into play. Reducing the number of available names forces you to think about design and architecture. If you try to use something and get an error saying that that name isn't declared, you don't blindly include the file. You stop and think: does it make sense for this name to be available here, or am I mixing up my user interface and data acquisition? It helps you separate the various parts of your program.
It also serves as a "compilation firewall", where modifying a header won't require you to rebuild the whole thing. This is more of a language issue than anything else, but in practice, it's still damn useful.
Trying to localize the GTK includes, for example, would not be helpful: all of my user interface uses it. I have no intention of supporting a different kind of toolkit. Indeed, I chose GTK because it was portable and I wouldn't have to port the interface myself.
What would be the point of only including the GTK headers in the user interface files? Obviously, it will prevent me from using GTK in files where I don't need to. But this is not solving any problem. I'm not inadvertently using GTK in places I shouldn't. It only slows down my build time.
How do I utilize precompiled headers in Visual Studio
This has been answered elsewhere. If you need more help, I suggest you ask a new question, as this one is already pretty big.
Is there a cross-platform way to use or facilitate precompiled headers?
A precompiled header is a feature provided by your compiler or build system. It is not inherently tied to a platform. If you are asking whether there is a portable way of using precompiled headers across compilers, then no. They are highly compiler-dependent.

C++ Modules and the C++ ABI

I've been reading about the C++ modules proposal (latest draft) but I don't fully understand what problem(s) it aims to solve.
Is its purpose to allow a module built by one compiler to be used by any other compiler (on the same OS/architecture, of course)? That is, does the proposal amount to standardizing the C++ ABI?
If not, is there another proposal being considered that would standardize the C++ ABI and allow compilers to interoperate?
Pre-compiled headers (PCH) are special files that certain compilers can generate for a .cpp file. What they are is exactly that: pre-compiled source code. They are source code that has been fed through the compiler and built into a compiler-dependent format.
PCHs are commonly used to speed up compilation. You put commonly used headers in the PCH, then just include the PCH. When you do a #include on the PCH, your compiler does not actually do the usual #include work. It instead loads these pre-compiled symbols directly into the compiler. No running a C++ preprocessor. No running a C++ compiler. No #including a million different files. One file is loaded and symbols appear fully formed directly in your compiler's workspace.
I mention all that because modules are PCHs in their perfect form. PCHs are basically a giant hack built on top of a system that doesn't allow for actual modules. The purpose of modules is ultimately to be able to take a file, generate a compiler-specific module file that contains symbols, and then some other file loads that module as needed. The symbols are pre-compiled, so again, there is no need to #include a bunch of stuff, run a compiler, etc. Your code says, import thing.foo, and it appears.
Look at any of the STL-derived standard library headers. Take <map> for example. Odds are good that this file is either gigantic or has a lot of #inclusions of other files that make the resulting file gigantic. That's a lot of C++ parsing that has to happen. It must happen for every .cpp file that has #include <map> in it. Every time you compile a source file, the compiler has to recompile the same thing. Over. And over. And over again.
Does <map> change between compilations? Nope, but your compiler can't know that. So it has to keep recompiling it. Every time you touch a .cpp file, it must compile every header that this .cpp file includes. Even though you didn't touch those headers or source files that affect those headers.
PCH files were a way to get around this problem. But they are limited, because they're just a hack. You can only include one per .cpp file, because it must be the first thing included by .cpp files. And since there is only one PCH, if you do something that changes the PCH (like add a new header to it), you have to recompile everything in that PCH.
Modules have essentially nothing to do with cross-compiler ABI (though having one of those would be nice, and modules would make it a bit easier to define one). Their fundamental purpose is to speed up compile times.
Modules are what Java, C#, and a lot of other modern languages offer. They reduce compile time immensely, simply because the code in today's headers doesn't have to be parsed over and over again every time it is included. When you say #include <vector>, the content of <vector> is copied into the current file. #include really is nothing more than copy and paste.
In the module world, you simply say import std.vector; for example and the compiler loads the query/symbol table of that module. The module file has a format that makes it easy for the compiler to parse and use it. It's also only parsed once, when the module is compiled. After that, the compiler-generated module file is just queried for the information that is needed.
Because module files are compiler-generated, they'll be pretty closely tied to the compiler's internal representation of the C++ code (AST) and will as such most likely not be portable (just like today's .o/.so/.a files, because of name mangling etc.).
Modules in C++ primarily have to be better than today's solution, in which a library consists of a *.so file plus *.h files describing its API. They have to solve the problems that #includes have today, namely that headers:
require macro guards (macros that prevent definitions from being provided multiple times)
are strictly text-based (so they can be tricked, and because they are normally reinterpreted on every inclusion, the same header can look different in different compilation units that are later linked together)
do not distinguish between dependent libraries that are used only instrumentally and those that are derived from (especially if the header provides inline functions or templates)
Contrary to what Xeo says, modules do not exist in Java or C#. In fact, in these languages "loading modules" amounts to "ok, here is the CLASSPATH; search through it to find whatever modules may provide the symbols that the source file actually uses". The "import" declaration in Java is not a "module request" at all; it is the same as "using" in C++ ("import ns.ns2.*" in Java is the same as "using namespace ns::ns2" in C++). I don't think such a solution can be used in C++. The closest approximations I can imagine are packages in Vala or modules in Tcl (from version 8.5 onward).
I imagine that C++ modules can hardly be cross-platform or dynamically loaded (that would require a dedicated C++ dynamic module loader, which is not impossible but is hard to define today). They will definitely be platform-dependent and should also be configurable when requested. But a stable C++ ABI is in practice only required within a single system, just as it is with the C++ ABI now.

What difference will it make if we have a uniform extension (.c/.cpp) for all C/C++ files?

In a C/C++ project, most files are of either type .h or type .c/.cpp. Apart from the naming difference (header versus implementation files), is there any functional difference?
In other words: in a working C/C++ project, what difference does it make if we give all files a .c or .cpp extension?
[Note: We can also have #include guards for .c/.cpp files. We can skip their compilation if they are observed as headers.]
Edit:
This is not intended as a debate, as I don't have any solid use case. Rather, I wanted to know whether being allowed to use the .h, .hxx, and .i extensions is just a convenience or a rule. For example, one functional difference I see is that .cxx files can have their own linkable object files.
What difference does it make? The compiler is perfectly happy about it. To it, it's just files.
But to you? It makes a lot of difference:
you're no longer able to immediately figure out which one is the header and which one is the implementation;
you can no longer give header and implementation the same name;
If you are using gcc and you try to compile a bunch of C++ files labeled with a .c extension, it's going to compile each file as if it were a C-language file, which is going to create a ton of errors.
There's also project sanity. This is why you'll often see projects label C++ headers as .hpp rather than just .h: it makes it easier to distinguish C source code and headers from C++ source code and headers.
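If you want a file to complain loudly when it ends up being compiled as C, something along these lines is one option. It is only a sketch: the namespace and function names are invented, and the error text is just a suggestion. It relies on __cplusplus being predefined only in C++ mode, and on gcc's -x c++ option to force the language.

```cpp
// __cplusplus is predefined only when the file is compiled as C++.
#ifndef __cplusplus
#error "Compiled as C: rename the file to .cpp, or use g++ / gcc -x c++"
#endif

namespace demo {
// Namespaces and constexpr are C++-only, so a C compiler would reject
// everything from here on even without the check above.
constexpr int doubled(int x) { return 2 * x; }
}
```

With the check in place, the first diagnostic names the real problem instead of a wall of unrelated syntax errors.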
Header files generally must not be compiled directly but instead #included in a file that is directly compiled. By giving these two groups of files their own extension it makes it a lot easier to determine which to compile.
Make, IDEs, and other tools normally expect the conventions of .c/.cpp for source files and .h/.hpp for headers. The compiler normally goes a step further and defaults to C compilation for .c and C++ compilation for .cpp.
Hence it is a bad idea to give your headers the same extension as the source files.

What Should be the Structure of a C++ Project?

I have recently started learning C++, and coming from a Ruby environment I have found it very hard to structure a project so that it still compiles correctly. I have been using Code::Blocks, which is brilliant, but one downside is that when I add a new header file or C++ source file it generates some code, and even though it is only a mere 3 or 4 lines, I do not know what these lines do. First of all I would like to ask this question:
What do these lines do?
#ifndef TEXTGAME_H_INCLUDED
#define TEXTGAME_H_INCLUDED
#endif // TEXTGAME_H_INCLUDED
My second question is: do I need to #include both the .h file and the .cpp file, and in which order?
My third question is: where can I find the GNU GCC compiler that, I believe, was packaged with Code::Blocks, and how do I use it without Code::Blocks? I would rather develop in a notepad++ sort of way because that is what I'm used to in Ruby but since C++ is compiled, you may think differently (please give advice and views on that as well)
Thanks in advance, ell.
EDIT: I'm on Windows XP & thanks for the lightning-fast replies!
To answer your questions:
The lines are include guards. They prevent the header file being included more than once in any given translation unit. If it was included multiple times, you would probably get multiple definition errors.
Header files are #included in .cpp files and in other headers. .cpp files are not normally #included.
The C++ compiler that comes with Code::Blocks is called MinGW GCC, and can be found in the bin directory of the MinGW installation. To find it, do a Windows search via explorer for 'g++'. To use it, you will need to put the directory it is in on your search path. Note the version of the compiler that ships with Code::Blocks is quite old - you can get a much more recent version from here.
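To make the first point concrete, here is a minimal single-file sketch of what the compiler sees when a guarded header gets included twice. The TextGame struct and starting_score() are invented for the example; the guard lines are the ones from the question.

```cpp
// First inclusion of the (hypothetical) textgame.h: the guard macro is not
// yet defined, so the body is compiled and the macro gets defined.
#ifndef TEXTGAME_H_INCLUDED
#define TEXTGAME_H_INCLUDED
struct TextGame { int score; };
#endif // TEXTGAME_H_INCLUDED

// Second inclusion: TEXTGAME_H_INCLUDED is now defined, so everything up
// to #endif is skipped. Without the guard, the struct would be defined
// twice in this translation unit and compilation would fail.
#ifndef TEXTGAME_H_INCLUDED
#define TEXTGAME_H_INCLUDED
struct TextGame { int score; };
#endif // TEXTGAME_H_INCLUDED

constexpr int starting_score() { return TextGame{0}.score; }
```

Deleting the three guard lines from both copies should turn this into a redefinition error for TextGame.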
That's an inclusion guard, to prevent a .h file from being included twice. Besides saving time, this is often in fact required to avoid defining things twice.
You should include only the .h. The .c file will be linked to your program in some form. For small programs, you can just pass all the .c files to gcc, but larger programs will involve intermediate .o files or even libraries (static or dynamic).
You can definitely work without an IDE. There are many ways to install the gcc compiler on Windows, including Cygwin and MinGW. I think you are correct that Code::Blocks comes with a gcc executable, but I don't know where it is or what version.
Those lines make it so that if a file is #included twice, everything will continue to work. That in turn lets you treat header-file dependencies as a simple directed graph, which is definitely easiest.
You don't #include .cpp files. (Well, not unless you're an evil programmer. Don't do it!)
I'll let others (or google!) tell you about gcc, but it might help if you were to describe what platform you're using.
All of your questions have been answered by others, except this:
I would rather develop in a notepad++
sort of way because that is what I'm
used to in Ruby but since C++ is
compiled, you may think differently
(please give advice and views on that
as well)
I think this is a very bad idea. A fully fledged IDE with an integrated debugger, jump to symbol definitions, refactoring capabilities, a profiler, intellisense and more is practically a must for any real world project.
And the absolute best is Visual Studio* with Visual Assist X**. Code::Blocks pales in comparison ;)
* If you study at a university you can usually get it for free through MSDNAA; otherwise there is the Visual Studio Express edition, which is free
** 30 days evaluation period

How to view source code of header file in C++?

For example, headers such as iostream.h, conio.h, ...
The standard library is generally all templates. You can just open up the desired header and see how it's implemented†. Note it's not <iostream.h>, it's <iostream>; the C++ standard library does not have .h extensions. C libraries like <string.h> can be included as <cstring> (though that generally just includes string.h)
That said, the run-time library (stuff like the C library, not-template-stuff) is compiled. You can search around your compiler install directory to find the source-code to the run-time library.
Why? If just to look, there you go. But it's a terrible way to try to learn, as the code may have non-standard extensions specific to the compiler, and most implementations are just generally ugly to read.
If you have a specific question about the inner-workings of a function, feel free to start a new question and ask how it works.
† I should mention that you may, on the off chance, have a compiler that supports export. This would mean it's entirely possible they have templated code also compiled; this is highly unlikely though. Just should be mentioned for completeness.
From a comment you added, it looks like you're looking for the source to the implementations of functions that aren't templates (or aren't in the header file for whatever reason). The more traditional runtime library support is typically separately compiled and in a library file that gets linked in to your program.
The majority of compilers provide the source code for the library (though it's not guaranteed to be available), but the source files might be installed anywhere on your system.
For the Microsoft compilers I have installed, I can find the source for the runtime in a directory under the Visual Studio installed location named something like:
vc\crt\src // VS2008
vc7\crt\src // VS2003
vc98\crt\src // VC6
If you're using some other compiler, poke around the installation directory (and make sure that you asked for the runtime sources to be installed when you installed your compiler tools).
As mentioned, it is implementation specific but there is an easy way to view contents of header files.
Compile your code with only preprocessing enabled; for gcc and g++ that is the -E option.
This replaces each #include directive with the actual contents of the header file, so you can see them.
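A tiny file to try this on could look like the sketch below. The file name limits_demo.cpp and the function are made up; <climits> and INT_MAX are standard. What exactly the preprocessed output contains depends on your compiler and library, so treat the comments as a rough guide.

```cpp
// Hypothetical file name: limits_demo.cpp
// Preprocess only, without compiling:  g++ -E limits_demo.cpp
#include <climits>

// In the -E output the #include line above is replaced by the text of the
// header (and whatever it includes), and INT_MAX below is replaced by
// whatever the implementation defines it as.
constexpr int largest_int() { return INT_MAX; }
```

The output goes to standard output by default, so redirecting it to a file makes it easier to read.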
On linux, you can find some of them in /usr/include
These files merely contain declarations and macro definitions. The actual implementation source files can be obtained from the library provider, e.g. the source code of the standard C++ library (libstdc++) is obtainable here.
According to the C++ language specification, implementors do not have to put standard headers into physical files. Implementors are allowed to have the headers hard coded in the translator's executable.
Thus, you may not be able to view the contents of standard header files.