The book Accelerated C++: Practical Programming by Example says the following:
... system header files need not be implemented as files. Even though the #include
directive is used to access both our own header files and system headers, there
is no requirement that they be implemented in the same way
What exactly does this mean? If not as a file how else can a system header file be implemented?
Imagine you write your own compiler and C++ standard library. You could make it so that #include <vector> does not open any file, but instead simply loads some state into the compiler which makes it understand std::vector. You could then implement your vector class in some language other than C++, so long as your compiler understands enough to make it work "as if" you had written an actual C++ source file called vector.
The compiler could have hardcoded that when it sees:
#include <iostream>
then it makes available all definitions of things that are specified as being declared by this directive, etc.
Or it could store the definitions in a database, or some other encoded file, or the cloud, or whatever. The point is that the standard does not restrict the compiler in any way, so long as the end goal is achieved that the specified things get declared.
The way in which headers are included into your "source file stream" is left mostly up to the implementation.
C++11 section 16.2, "Source file inclusion", states the following (this has been the case for a long time in both C and C++):
A #include directive shall identify a header or source file that can be processed by the implementation.
A preprocessing directive of the form # include < h-char-sequence> new-line searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents of the header. How the places are specified or the header identified is implementation-defined.
(and then further description of the " and naked variants of #include).
So the header may be in a file.
It may also be injected by the compiler from hard-coded values.
Or read from a server located on one of the planets orbiting Betelgeuse (though, without FTL transmissions, such a compiler wouldn't last long in the marketplace).
The possibilities are many and varied, most of them bordering on lunacy but none of them actually forbidden by the standard itself.
T.C. left an interesting comment to my answer on this question:
Why aren't include guards in c++ the default?
T.C. states:
There's "header" and there's "source file". "header"s don't need to be
actual files.
What does this mean?
Perusing the standard, I see plenty of references to both "header files" and "headers". However, regarding #include, I noticed that the standard seems to make reference to "headers" and "source files". (C++11, § 16.2)
A preprocessing directive of the form
# include < h-char-sequence> new-line
searches a sequence of implementation-defined places for a header identified uniquely
by the specified sequence between the < and > delimiters, and causes the replacement
of that directive by the entire contents of the header. How the places are specified
or the header identified is implementation-defined.
and
A preprocessing directive of the form
# include " q-char-sequence" new-line
causes the replacement of that directive by the entire contents of the source *file*
identified by the specified sequence between the " delimiters. The named source *file*
is searched for in an implementation-defined manner.
I don't know if this is significant. It could be that "headers" in a C++ context unambiguously means "header files" but the word "sources" would be ambiguous so "headers" is a shorthand but "sources" is not. Or it could be that a C++ compiler is allowed leeway for bracket includes and only needs to act as if textual replacement takes place.
So when are header (files) not files?
The footnote mentioned by T.C. in the comments below is quite direct:
174) A header is not necessarily a source file, nor are the sequences
delimited by < and > in header names necessarily valid source file
names (16.2).
For the standard header "files", the C++ standard does not mandate that the compiler use a file, or that the file, if there is one, actually look like a C++ file. Instead, the standard headers are specified to make a certain set of declarations and definitions available to the C++ program.
An alternative to a file could be a prepackaged set of declarations, represented in the compiler as a data structure that is made available when the corresponding #include directive is used. I'm not aware of any compiler that does exactly that, but clang has started to implement a module system which makes the headers available from an already processed format.
They do not have to be files. Since the C and C++ preprocessors are nearly identical, it is reasonable to look to the C99 rationale for some clarity on this. The Rationale for International Standard—Programming Languages—C says in section 7.1.2, Standard headers (emphasis mine):
In many implementations the names of headers are the names of files in
special directories. This implementation technique is not required,
however: the Standard makes no assumptions about the form that a file
name may take on any system. Headers may thus have a special status if
an implementation so chooses. Standard headers may even be built into
a translator, provided that their contents do not become “known” until
after they are explicitly included. One purpose of permitting these
header “files” to be “built in” to the translator is to allow an
implementation of the C language as an interpreter in a free-standing
environment where the only “file” support may be a network interface.
It really depends on the definition of files.
If you consider any database which maps filenames to contents to be a filesystem, then yes, headers are files. If you only consider files to be that which is recognized by the OS kernel open system call, then no, headers don't have to be files.
They could be stored in a relational database. Or a compressed archive. Or downloaded over the network. Or stored in alternate streams or embedded resources of the compiler executable itself.
In the end, though, textual replacement is done, and the text comes from some sort of indexed-by-name database.
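To make the "indexed-by-name database" idea concrete, here is a minimal sketch of a toy include expander whose headers live in an in-memory map rather than on disk. It is only an illustration, not how any real compiler works: the names HeaderDb and expand_includes are made up, and it handles nothing but a bare #include <name> line (no macros, no nesting, no quote form).

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical "header database": header names map to text held in memory,
// so no file is ever opened.
using HeaderDb = std::map<std::string, std::string>;

// Replace every line of the form  #include <name>  with the stored text.
// All other lines are copied through unchanged.
std::string expand_includes(const std::string& source, const HeaderDb& db) {
    std::istringstream in(source);
    std::ostringstream out;
    std::string line;
    const std::string prefix = "#include <";
    while (std::getline(in, line)) {
        // compare() is checked first, so back() is never called on an empty line.
        if (line.compare(0, prefix.size(), prefix) == 0 && line.back() == '>') {
            const std::string name =
                line.substr(prefix.size(), line.size() - prefix.size() - 1);
            // Sketch only: a real implementation must diagnose an unknown header.
            assert(db.count(name) == 1 && "unknown header name");
            out << db.at(name) << '\n';  // at() throws if the name is missing
        } else {
            out << line << '\n';
        }
    }
    return out.str();
}
```

Calling expand_includes("#include <greeting>\nint x;\n", db) with a db that maps "greeting" to some text returns that text followed by "int x;". The caller cannot tell whether the text came from a file, a database row, or a string compiled into the tool, which is the point being made above.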
Dietmar mentioned modules and loading already processed content... but this is generally NOT allowable behavior for #include according to the C++ standard (modules will have to use a different syntax, or perhaps #include with a completely new quotation scheme other than <> or ""). The only processing that could be done in advance is tokenization. But contents of headers and included source files are subject to stateful preprocessing.
Some compilers implement "precompiled headers" which have done more processing than mere tokenization, but eventually you find some behavior that violates the Standard. For example, in Visual C++:
The compiler ... skips to just beyond the #include directive associated with the .h file, uses the code contained in the .pch file, and then compiles all code after filename.
Ignoring the actual source code prior to #include definitely does not conform to the Standard. (That doesn't prevent it from being useful, but you need to be aware that edits may not produce the expected behavior changes)
Our C++ library contains a file with a name that is (considered) equal to that of one of the standard library's headers. In our case this is "String.h", which Windows considers to be the same as "string.h", but for the sake of this question it could be any other file name used in the standard library.
Normally, this file name ambiguity is not a problem since a user is supposed to set up the include paths to only include the parent of the library folder (therefore requiring to include "LibraryFolder/String.h") and not the folder containing the header.
However, sometimes users get this wrong and directly set the include path to the containing folder. This means that "String.h" will be included in place of "string.h" in both the user code and in the standard library headers, resulting in a lot of compile errors that may not be easy to resolve or understand for beginners.
Is it possible to detect such wrongly set up include paths at compile time in our library's header, and to raise a #warning or #error right away, based on some sort of check of the path by which the header was included?
There's no failsafe way. If the compiler finds another file, it won't complain.
However, you could make it so you can detect it. In your own LibraryName/string.h, you could define a unique symbol, like
#define MY_STRING_H412a55af_7643_4bd6_be5c_4315d3a1e6b7
Then later in dependent code you could check
#ifndef MY_STRING_H412a55af_7643_4bd6_be5c_4315d3a1e6b7
#error "Custom standard library path not configured correctly"
#endif
Likewise you could use this to detect when the wrong version of the library was included.
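Put together, the idea looks roughly like the sketch below. Both halves are shown in one translation unit so that it compiles on its own; in a real library the first part would sit in the library's own String.h and the second in a header that depends on it. The version macro MYLIB_STRING_H_VERSION and the function name are assumptions added for illustration, not something from the question.

```cpp
// --- Part 1: what the library's own String.h would contain ---
#define MY_STRING_H412a55af_7643_4bd6_be5c_4315d3a1e6b7
// Hypothetical extra: a version number, so dependents can reject stale copies.
#define MYLIB_STRING_H_VERSION 2

// --- Part 2: what a dependent header of the library would contain ---
#ifndef MY_STRING_H412a55af_7643_4bd6_be5c_4315d3a1e6b7
#error "Custom standard library path not configured correctly"
#endif
#if MYLIB_STRING_H_VERSION < 2
#error "The library's String.h is older than this header expects"
#endif

// Only reachable when both checks above passed.
constexpr int mylib_string_header_version() { return MYLIB_STRING_H_VERSION; }
```

Note what this does and does not catch: it fires when some other file was picked up in place of the library's header, but it cannot tell by which spelling or path the right file was reached.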
[edit - as per comments]
Header inclusion can be summarized as:
1. Parse the #include line to determine the header name to look up.
2. Depending on the <Foo.h> or "Foo.h" form, determine the set of locations (usually directories) to search.
3. Interpret the header name in an implementation-dependent way (usually as a relative path). Note that this is not necessarily as a string, e.g. MSVC doesn't treat \ as a string escape character.
4. If the header is found (usually, if a file is found), replace the #include line with the content of that file. If not, fail the compilation.
(The parenthesized "usually" applies to MSVC, GCC, clang, etc., but theoretically a compiler could compile directly from a git repository instead of disk files.)
The problem here is that the test imagined (spelling of header name) must be located in the included header file. This test would necessarily be part of the replaced #include line, which therefore no longer exists and cannot be tested.
C++17 introduces __has_include but this does not affect the analysis: It would still have to occur in the included header file, and would not have the character sequence from the #include "Foo.h" available.
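For reference, a small sketch of what __has_include can answer. It tells you whether a given header name can be found from the current point, and nothing about the spelling that was used to reach the file currently being processed. The SKETCH_ macros, the function names, and the deliberately nonexistent path are made up for the example; the outer guard is there because pre-C++17 compilers may not provide __has_include at all.

```cpp
// __has_include is C++17; guard it so older compilers skip the probes.
#if defined(__has_include)
#  if __has_include(<string>)
#    define SKETCH_HAVE_STRING_HEADER 1
#  endif
#  if __has_include("no/such/LibraryFolder/Header.h")  // assumed not to exist
#    define SKETCH_HAVE_MISSING_HEADER 1
#  endif
#endif

// Default both probes to "not found" if they were not set above.
#ifndef SKETCH_HAVE_STRING_HEADER
#  define SKETCH_HAVE_STRING_HEADER 0
#endif
#ifndef SKETCH_HAVE_MISSING_HEADER
#  define SKETCH_HAVE_MISSING_HEADER 0
#endif

constexpr bool have_string_header() { return SKETCH_HAVE_STRING_HEADER != 0; }
constexpr bool have_missing_header() { return SKETCH_HAVE_MISSING_HEADER != 0; }
```

On a case-insensitive file system a probe for "String.h" and one for "string.h" would be expected to give the same answer, so this does not distinguish the two spellings either.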
[old]
Probably the easiest way, especially for beginners, is to have a LibraryName/LibraryName.h. Hopefully that name is unique.
The benefit is that once that works, users can replace #include "LibraryName.h" with just #include "String.h" as you know the path is right.
That said, "String.h" is asking for problems. Windows isn't case sensitive.
Use namespaces. In your case this would translate into something like this:
MyString/String.h
namespace my_namespace {
class string {
...
};
}
Now, to make sure that std::string or any other class named string is not accidentally used instead of my_namespace::string (by any means, including but not limited to setting up your include paths incorrectly), you need to refer to your type by its fully qualified name, namely my_namespace::string. By doing this you avoid naming clashes and are guaranteed to get a compile error if you don't include the correct header file (unless there actually exists another class called my_namespace::string that is not yours). There are other ways to avoid these clashes (such as a using-declaration, using my_namespace::string) but I'd rather be explicit about the types I'm using. This solution is costly, however, because it probably requires changes all over your code base (changing all strings to my_namespace::string).
A somewhat less cumbersome alternative would be to change the name of the header String.h to something like MyString.h. This would quickly introduce compile errors, but requires changing all your includes from #include "String.h" to #include "MyString.h" (which should be much less effort than the first option).
I cannot think of any other way that requires less effort as of now. Since you were looking for a solution that would work in all similar scenarios, I'd go with namespaces if I were you and solve the problem once and for all. This would prevent any other existing or future naming clashes in your code.
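A minimal sketch of the namespace approach, with everything in one file so it compiles on its own. The class body here is invented purely for illustration (the real String.h contents are not shown in the question), and cstr_len and name_length are hypothetical helpers.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Stand-in for the library's MyString/String.h (contents are hypothetical).
namespace my_namespace {

// Recursive so that it is usable in a C++11 constant expression.
constexpr std::size_t cstr_len(const char* s) {
    return *s ? 1 + cstr_len(s + 1) : 0;
}

class string {
public:
    constexpr explicit string(const char* text) : text_(text) {}
    constexpr std::size_t length() const { return cstr_len(text_); }
private:
    const char* text_;  // not owned; good enough for this sketch
};

}  // namespace my_namespace

// User code names the type in full, so std::string cannot be picked up by
// accident; if the library header is missing, this fails to compile.
constexpr std::size_t name_length() {
    return my_namespace::string("header").length();
}

static_assert(!std::is_same<my_namespace::string, std::string>::value,
              "the library type and std::string are distinct types");
```

If the include path is wrong and only the standard header is found, the qualified name my_namespace::string is simply undeclared, and the error message names the missing type directly.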
I am aware that questions about the difference between <header> and <header.h> have been asked before. After reading those answers, I have listed the following differences:
Of course, iostream.h is deprecated; it is not supported by newer standard-compliant compilers.
iostream.h doesn't contain everything inside the std namespace and doesn't make use of templates.
Okay.
But, after reading a few books and a few answers (like this), I have inferred that #include<iostream.h> includes a specific file called iostream.h in our program whereas, #include<iostream> is NOT even required to map to a file at all. It simply guarantees that everything belonging to the iostream library is included in our program. Am I correct?
No "system" header is required to be a file. Inclusion using <> is specified thusly:
C++11 16.2 [cpp.include]/2: searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents of the header. How the places are specified or the header identified is implementation-defined.
So the declarations from headers known to the implementation (which might or might not include current and/or obsolete standard library headers) can be made available without loading and preprocessing a text file, if the implementor deems that to be a good idea.
Including with "" will first search for a file (in implementation-defined places), and fall back to <> if that fails.
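The fallback can be seen with a small sketch like the following, which uses the quote form on standard header names. It assumes that no file called string or type_traits sits anywhere the quote-form search looks first (for example next to the source file); the function name is made up.

```cpp
// Quote form on standard header names. Assuming the quote-form search finds
// no file with these names, each directive is reprocessed as if it had been
// written with < and >.
#include "string"
#include "type_traits"

// Compiles only if the directives above really made std::string available.
constexpr bool std_string_is_declared() {
    return std::is_same<std::string::value_type, char>::value;
}
```

If someone did drop a file named string into the project directory, the quote form would pick that up instead, which is one reason to prefer <> for standard headers.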
For the purposes of this question, I am interested only in Standard-Compliant C++, not C or C++0x, and not any implementation-specific details.
Questions arise from time to time regarding the difference between #include "" and #include <>. The argument typically boils down to two differences:
Specific implementations often search different paths for the two forms. This is platform-specific, and not in the scope of this question.
The Standard says #include <> is for "headers" whereas #include "" is for a "source file." Here is the relevant reference:
ISO/IEC 14882:2003(E)
16.2 Source file inclusion [cpp.include]
1 A #include directive shall identify a header or source file that can be processed by the implementation.
2 A preprocessing directive of the form
# include < h-char-sequence > new-line
searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents of the header. How the places are specified or the header identified is implementation-defined.
3 A preprocessing directive of the form
# include "q-char-sequence" new-line
causes the replacement of that directive by the entire contents of the source file identified by the specified sequence between the " delimiters. The named source file is searched for in an implementation-defined manner. If this search is not supported, or if the search fails, the directive is reprocessed as if it read
# include < h-char-sequence > new-line
with the identical contained sequence (including > characters, if any) from the original directive.
(Emphasis in quote above is mine.) The implication of this difference seems to be that the Standard intends to differentiate between a 'header' and a 'source file', but nowhere does the document define either of these terms or the difference between them.
There are few other places where headers or source files are even mentioned. A few:
158) A header is not necessarily a source file, nor are the sequences delimited by < and > in header names necessarily valid source file names (16.2).
Seems to imply a header may not reside in the filesystem, but it doesn't say that source files do, either.
2 Lexical conventions [lex]
1 The text of the program is kept in units called source files in this International Standard. A source file together with all the headers (17.4.1.2) and source files included (16.2) via the preprocessing directive #include, less any source lines skipped by any of the conditional inclusion (16.1) preprocessing directives, is called a translation unit. [Note: a C++ program need not all be translated at the same time. ]
This is the closest I could find to a definition, and it seems to imply that headers are not the "text of the program." But if you #include a header, doesn't it become part of the text of the program? This is a bit misleading.
So what is a header? What is a source file?
My reading is that the standard headers, included by use of <> angle brackets, need not be actual files on the filesystem; e.g. an implementation would be free to enable a set of "built-in" operations providing the functionality of iostream when it sees #include <iostream>.
On the other hand, "source files" included with #include "xxx.h" are intended to be literal files residing on the filesystem, searched in some implementation-dependent manner.
Edit: to answer your specific question, I believe that "headers" are limited only to those #includeable facilities specified in the standard: iostream, vector and friends---or by the implementation as extensions to the standard. "Source files" would be any non-standard facilities (as .h files, etc.) the programmer may write or use.
Isn't this saying that a header may be implemented as a source file, but then again may not be? As for "what is a source file", it seems very sensible for the standard not to spell this out, given the many ways that "files" are implemented.
The standard headers (string, iostream) don't necessarily have to be files with those names, or even files at all. As long as when you say
#include <iostream>
a certain list of declarations come into scope, the Standard is satisfied. Exactly how that comes about is an implementation detail. (when the Standard was being written, DOS could only handle 8.3 filenames, but some of the standard header names were longer than that)
As your quotes say: a header is something included using <>, and a source file is the file being compiled, or something included using "". Exactly where the contents of these come from, and what non-standard headers are available, is up to the implementation. All the Standard specifies is what is defined if you include the standard headers.
By convention, headers are generally system-wide things, and source files are generally local to a project (for some definition of project), but the standard wisely doesn't get bogged down in anything to do with project organisation; it just gives very general definitions that are compatible with such conventions, leaving the details to the implementation and/or the user.
Nearly all of the standard deals with the program after it's been preprocessed, at which time there are no such things as source files or headers, just the translation units that your last quote defines.
Hmmm...
My casual understanding has been that the distinction between <> includes and "" includes was inherited from C and that (though not defined by the standards) the de facto meaning was that <> searched paths for system and compiler-provided headers, while "" also searched local and user-specified paths.
The definitions above seem to agree in some sense with that usage, but restrict the use of "header" to things provided by the compiler or system, exclusive of code provided by the user, even if it has the traditional "interface goes in the header" form.
Anyway, very interesting.