Is a C++ preprocessor identical to a C preprocessor? - c++

I am wondering how different the preprocessors for C++ and C are.
The reason for asking is a preprocessor-specific question where the paragraph of the standard that addresses it has different wording (and a different paragraph number) in the two languages, and there are also differences concerning the true and false keywords in C++.
So, are there more differences, or is this the only one?
An extension of the question would be: when is a source file emitted differently by a C++ preprocessor and a C preprocessor?

The C++03 preprocessor is (at least intended to be) similar to the C preprocessor before C99. Although the wording and paragraph numbers are slightly different, the only technical differences I'm aware of between the two are that the C++ preprocessor handles digraphs (two-letter alternative tokens) and universal character names, which are not present in C.
As of C99, the C preprocessor added some new capabilities (e.g., variadic macros) that do not exist in the current version of C++. I don't remember for sure, but don't believe that digraphs were added.
I believe C++0x will bring the two in line again (at least that's the intent). Again, the paragraph numbers and wording won't be identical, but I believe the intent is that they should work the same (other than retaining the differences mentioned above).
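As a small illustration of one such capability, here is a variadic macro, a C99 preprocessor feature that a C++03 preprocessor lacks but that C++11 adopted (a minimal sketch; the LOG name is just for illustration):

#include <stdio.h>

/* A variadic macro: accepted by a C99 or C++11 preprocessor,
   rejected by a strict C++03 preprocessor. */
#define LOG(fmt, ...) fprintf(stderr, fmt, __VA_ARGS__)

int main(void)
{
    LOG("%s = %d\n", "answer", 42);
    return 0;
}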

They are supposed to be the same: C++98 and C++03 should match C90, and C++0x should match C99. There may be bugs in the wording, though.

Predefined macros differ between the preprocessors, mostly for obvious language feature differences. E.g. compare:
C99 N1256 draft 6.10.8 "Predefined macro names"
C++11 N3337 draft 16.8 "Predefined macro names"
In particular:
C forbids the implementation from predefining __cplusplus; C++ uses it to represent the language version
C defines __STDC__ to 1 to indicate a conforming implementation (with __STDC_VERSION__ carrying the version); C++ leaves __STDC__ implementation-defined and uses __cplusplus instead
C has __STDC_IEC_559__ and __STDC_IEC_559_COMPLEX__ to indicate floating-point characteristics; C++ does not, and seems to replace them with per-type constants such as std::numeric_limits<float>::is_iec559
C does not have the macros prefixed with __STDCPP_: __STDCPP_STRICT_POINTER_SAFETY__ and __STDCPP_THREADS__
As mentioned by DevSolar, C11 added many more defines which are not part of C++11.
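As a rough sketch of how these predefined macros are typically used to tell the language and version apart (the values in the comments are the ones the respective standards mandate):

#include <stdio.h>

int main(void)
{
#ifdef __cplusplus
    /* In C++, __cplusplus carries the version, e.g. 201103L for C++11. */
    printf("C++, __cplusplus = %ld\n", (long)__cplusplus);
#elif defined(__STDC_VERSION__)
    /* In C95 and later, __STDC_VERSION__ carries the version, e.g. 199901L for C99. */
    printf("C, __STDC_VERSION__ = %ld\n", (long)__STDC_VERSION__);
#else
    /* C90 defines only __STDC__. */
    printf("C90 or earlier\n");
#endif
    return 0;
}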

Related

What's the meaning of "reserved for any use"?

NOTE: This is a C question, though I added C++ in case some C++ expert can provide a rationale or historical reason why C++ uses different wording than C.
In the C standard library specification, we have this normative text, C17 7.1.3 Reserved identifiers (emphasis mine):
All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use.
All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.
Now I keep reading answers on SO by various esteemed C experts, where they claim it is fine for a compiler or standard library to use identifiers with underscore + uppercase, or double underscore.
Doesn't "reserved for any use" mean reserved for anyone except future extensions to the C language itself? Meaning that the implementation is not allowed to use them.
While the second phrase above, regarding single leading underscore seems to be directed to the implementation?
In general, the C standard is written in a way that expects compiler vendors/library implementers to be the typical reader - not so much the application programmers.
Notably, C++ has a very different wording:
Each name that contains a double underscore (__) or begins with an underscore followed by an uppercase letter (2.11) is reserved to the implementation for any use.
(See What are the rules about using an underscore in a C++ identifier?)
Is this perhaps a mix-up between C and C++ and the languages are different here?
In the C standard, the meaning of the term "reserved" is defined by 7.1.3p2, immediately below the bullet list you are quoting:
No other identifiers are reserved. If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved identifier as a macro name, the behavior is undefined.
Emphasis mine: reserved identifiers place a restriction on the program, not the implementation. Thus, the common interpretation (reserved identifiers may be used by the implementation for any purpose) is correct for C.
I have not kept up with the C++ standard and no longer feel qualified to interpret it.
While the Standard is primarily written to guide implementers, it is written as a description of what makes a program well-formed, and what its effect is. That's because the basic definition of a standards-conforming compiler is one that does the correct thing for any standards-conforming program:
A strictly conforming program shall use only those features of the language and library specified in this International Standard. [...] A conforming hosted implementation shall accept any strictly conforming program.
Read separately, this is hugely restrictive of extensions to a compiler. For instance, based solely on that clause, a compiler shouldn't get to define any of its own reserved words. After all, any given word a particular compiler might want to reserve, could nevertheless show up in a strictly conforming program, forcing the compiler's hand.
The standard goes on, however:
A conforming implementation may have extensions (including additional
library functions), provided they do not alter the behavior of any strictly conforming
program.
That's the key piece. Compiler extensions need to be written in such a way that they only affect nonconforming programs (ones which contain undefined behavior, or which shouldn't even compile at all), allowing those programs to compile and do fun extra things.
So the purpose of defining "reserved identifiers", when the language doesn't actually need those identifiers for anything, is to give implementations some extra wiggle room by providing them with some things which make a program nonconforming. The reason a compiler can recognize, say, __declspec as part of a declaration is because putting __declspec into a declaration is otherwise illegal, so the compiler is allowed to do whatever it wants!
The importance of "reserved for any use", therefore, is that it leaves no question about a compiler's power to treat such identifiers as having any meaning it cares to. Future compatibility is a comparatively distant concern.
The C++ standard works in a similar way, though it's a bit more explicit about the gambit:
A conforming implementation may have extensions (including additional library functions), provided they do
not alter the behavior of any well-formed program. Implementations are required to diagnose programs that
use such extensions that are ill-formed according to this International Standard. Having done so, however,
they can compile and execute such programs.
I suspect the difference in wording is down to the C++ standard just being clearer about how extensions are meant to work. Nevertheless, nothing in the C standard precludes an implementation from doing the same thing. (And we all basically ignore the requirement that the compiler warn you every time you use __declspec.)
Regarding the difference in wording in C versus C++, I'm posting my own little research here as reference:
The early K&R C 1st edition has this text:
...names which are intended for use only by functions of the library begin with an underscore so they are less likely to collide with names in a user's program.
K&R 2nd edition added an Appendix B which addresses the standard library, where we can read
External identifiers that begin with an underscore are reserved for use by the library, as are all
other identifiers that begin with an underscore and an upper-case letter or another underscore.
Early ANSI C drafts, as well as "C90" ISO 9899:1990, has the same text as in the current ISO standard.
The earliest C++ drafts, however, have different text, as noted by @hvd, possibly a clarification of the C standard. From DRAFT: 20 September 1994:
17.3.3.1.2 Global names
...
Each name that begins with an underscore and either an uppercase letter or another underscore (2.8) is
reserved to the implementation for any use
So apparently the wording "reserved for any use" was invented by the ANSI/ISO C90 committee, whereas the C++ committee some years later used a clearer wording, similar to the wording in the pre-standard K&R book.
The C99 rationale V5.10 says this below 7.1.3:
Also reserved for the implementor are all external identifiers beginning with an underscore, and
all other identifiers beginning with an underscore followed by a capital letter or an underscore.
This gives a name space for writing the numerous behind-the-scenes non-external macros and
functions a library needs to do its job properly.
This makes the committee's intention quite clear: "reserved for any use" means "reserved for the implementor".
Also of note, the current C standard has the following normative text elsewhere, in 6.2.5:
There may also be implementation-defined extended signed integer types. 38)
where the informative footnote 38 says:
Implementation-defined keywords shall have the form of an identifier reserved for any use as
described in 7.1.3.
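As a concrete illustration of that footnote, GCC and Clang spell their 128-bit extended integer type with an identifier of exactly this reserved form; the guard below is only a sketch and assumes a GCC-compatible compiler (the wide_int name is made up):

/* GCC/Clang's extended signed integer type is spelled with a
   reserved-form keyword, as footnote 38 requires. */
#if defined(__GNUC__) && defined(__SIZEOF_INT128__)
typedef __int128 wide_int;        /* implementation-defined extended type */
#else
typedef long long wide_int;       /* portable fallback */
#endif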
C has multiple contexts in which a symbol can have a definition:
The space of macro names,
The space of formal names of arguments to a macro (this space is specific to each function-like macro),
The space of ordinary identifiers,
The space of tag names,
The space of labels (this space is specific to each function), and
The space of structure/union members (this space is specific to each struct/union).
What "reserved for any use" means that the user code in a compliant program cannot use1 symbols that start with an underscore that is followed by an uppercase letter or another underscore in any of the above contexts. Compare with identifiers that start with a single underscore but are followed by a lowercase number or a digit. This falls into the second class of identifiers that start with an underscore. User code can can be use these identifiers as the names of macro arguments, as labels, or as the names of structure/union members.
"Reserved for any use" does not mean that the implementation cannot use such symbols. The intent of the reservation is to provide a name space that implementations can freely use without concern that the names defined by the implementation will conflict with the names defined by the user code in a compliant program.
1The standard does not quite mean "cannot use". The standard encourages the programmatic use of a small number of names that start with a double underscore. For example, a compliant implementation is required to define __STDC_VERSION__, __FILE__, __LINE__, and __func__. The 2011 version of the standard even gives an example of a presumably compliant program that references __func__.
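To make the two classes concrete, here is a small sketch of what user code may and may not do under 7.1.3 (the identifiers themselves are made up for illustration):

/* Reserved for any use: user code may not use these in any context.  */
/*   int _Count;     (underscore followed by an uppercase letter)     */
/*   int __count;    (double underscore)                              */

/* Reserved only as file-scope identifiers in the ordinary and tag
   name spaces, so these uses in user code are allowed:               */
struct point { int _x, _y; };          /* structure members           */
#define SQUARE(_v) ((_v) * (_v))       /* macro parameter             */

void bump(int *p)
{
_retry:                                /* label                       */
    if (*p < 3) { ++*p; goto _retry; }
}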
The C Standard allows implementations to attach any meaning they see fit to reserved identifiers. Most implementations will treat unrecognized identifiers of reserved forms the same as any other recognized identifiers when there is no reason to do otherwise, thus allowing something like:
#ifdef __ACME_COMPILER
#define near __near
#else
#define near
#endif
int near foo;
to declare an identifier foo using a __near qualifier if the code is being processed in an Acme compiler (which would presumably support such a thing), but also be compatible with other compilers that would not require or benefit from the use of such a directive. Nothing would forbid a conforming implementation from defining __ACME_COMPILER and interpreting __near to mean "launch nuclear missiles", but a quality implementation shouldn't go out of its way to break code like the above. If an implementation doesn't know what __ACME_COMPILER is supposed to mean, treating it like any other unknown identifier would allow it to support useful constructs like the above.
It is months late, but one point remains that the others have not addressed.
Your question can be viewed from the opposite direction. The standard allows the implementation (as you have observed) to use a symbol like _Foo but, more importantly, thereby forbids the implementation from using foo. The latter is reserved for your use.
To understand, for discussion's sake, suppose that a future C standard introduced the new keyword _Foo. The hypothetical implementation was already using this symbol, so what happens?
Answer:
At first, the implementation will not yet have implemented the new standard. Until implemented, the new standard lacks practical effect.
Later, as part of implementing the new standard, the implementation quietly changes each _Foo to _Bar.
No problem.
In fact, if you think about it in this manner, you can say that the way the standard reserves such words is almost the only way it could reserve them.

Is it undefined behavior to #define/#undef an identifier with special meaning?

An answer to the question Disable check for override in gcc suggested using -Doverride= on the command line to disable errors for erroneous use of override, which is effectively the same as adding:
#define override
to the source file.
My initial reaction was that this seems like undefined behavior, since we are redefining a keyword, but looking at the draft C++11 standard section 2.12 Keywords [lex.key] I was surprised to find that neither override nor final is a keyword. They are covered in the previous section 2.11 [lex.name], which says they are identifiers with special meaning:
The identifiers in Table 3 have a special meaning when appearing in a
certain context[...]
and Table 3 is labelled Identifiers with special meaning and includes both override and final.
The question is: is it undefined behavior to redefine (using #define) identifiers with special meaning? Are they treated any differently than keywords in this respect?
If you are using the C++ standard library, it is undefined behavior to redefine identifiers with special meaning; the same applies to keywords. From the draft C++11 standard under section 17.6.4 [constraints] we have section 17.6.4.1 [constraints.overview], which says:
This section describes restrictions on C++ programs that use the
facilities of the C++ standard library [...]
and under 17.6.4 we have section 17.6.4.3.1 [macro.names] which says:
A translation unit shall not #define or #undef names lexically
identical to keywords, to the identifiers listed in Table 3, or to the
attribute-tokens described in 7.6.
Table 3 lists the identifiers with special meaning. We can see this paragraph also covers keywords, and they are treated in the same manner.
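So a translation unit along the following lines (a sketch; passing -Doverride= on the command line is equivalent) violates [macro.names] as soon as it uses the standard library. It will often still compile and run, but the behavior is formally undefined, and the override check silently disappears:

#define override                    // violates [macro.names]: lexically identical
                                    // to an identifier with special meaning (Table 3)
#include <vector>                   // this translation unit uses the standard
                                    // library, so the behavior is formally undefined

struct Base    { virtual void resize(int) {} };
struct Derived : Base { void resize(int) override {} };   // 'override' expands to nothing

int main()
{
    std::vector<int> v;
    v.resize(3);
    Derived().resize(3);
    return 0;
}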
Implementations' standard header files are allowed to "implement" standard functions using macros in cases where a macro could meet the requirements for the function (including ensuring that arguments are evaluated exactly once). Further, such macros are allowed to make use of keywords or identifiers whose behavior is specified in the standard or "reserved to the implementation"; use of such macros in contexts where the keywords or identifiers have been redefined could have arbitrary effects.
That having been said, the historical interpretation of this form of UB would be to say that compilers shouldn't go out of their way to cause wacky behavior, and outside of "pedantic modes" should allow user code to assign meanings to reserved identifiers the compiler would otherwise not use. This can be helpful in cases where code should be usable both on compilers which require a keyword like __packed and on compilers which neither recognize nor require such a keyword. Redefining keywords in the fashion you're doing is a bit dodgier; it will probably work, but there's a significant likelihood that it will disrupt the behavior of a standard-library macro.

Why does the C++ standard not mention __STDC_IEC_559__?

According to the C++11 standard [c.math], the <cmath> header is the same as the Standard C library header <math.h>.
(Of course, there are several differences, such as namespaces and overloads, but these can be ignored here.)
And according to C99 standard annex F, "An implementation that defines __STDC_IEC_559__ shall conform to the specifications in" that annex.
For example, atan2 may cause a domain error if both arguments are zero, but it must not if __STDC_IEC_559__ is defined.
In C99, much other behavior also depends on whether __STDC_IEC_559__ is defined or not.
However, it seems that __STDC_IEC_559__ is not mentioned anywhere in the C++11 standard.
If so, must a C++ implementation conform to the specifications in annex F?
I think that std::numeric_limits<T>::is_iec559 is a substitute, but it seems to describe only the type itself.
The C++ standard (n3797) includes the C standard library by reference, see s1.2/2.
The library described in Clause 7 of ISO/IEC 9899:1999 and Clause 7 of ISO/IEC 9899:1999/Cor.1:2001
and Clause 7 of ISO/IEC 9899:1999/Cor.2:2003 is hereinafter called the C standard library.
With the qualifications noted in Clauses 18 through 30 and in C.4, the C standard library is a subset of the C++ standard
library.
The standard contains no mention of that symbol, and I would not expect it to be defined, since it appears to be specific to Standard C. By not defining that symbol, C++ is not bound by the contents of Annex F.
Instead the C++ standard contains multiple mentions of IEC 559 in a rather more C++-like form. For example,
Shall be true for all specializations in which is_iec559 != false
There is a specific mention in 18.3.2.4/56.
static constexpr bool is_iec559;
True if and only if the type adheres to IEC 559 standard.218
Meaningful for all floating point types.
I think it would be fair to say that C++ includes all the same capabilities (or lack of them), but adapted to the C++ world.
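In practice, then, the C++-flavoured way to ask the same question looks roughly like this (a small sketch; the static_assert simply documents an IEEE 754 assumption):

#include <limits>
#include <iostream>

int main()
{
    // C++ exposes IEC 559 (IEEE 754) conformance per type rather than
    // through a __STDC_IEC_559__-style macro.
    std::cout << std::boolalpha
              << "float:  " << std::numeric_limits<float>::is_iec559  << '\n'
              << "double: " << std::numeric_limits<double>::is_iec559 << '\n';

    static_assert(std::numeric_limits<double>::is_iec559,
                  "this translation unit assumes IEEE 754 doubles");
    return 0;
}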

putc implemented as a macro in C++?

I know putc() may be implemented as a macro in C, but is it the same in C++?
It will depend on your implementation of cstdio. In most cases this is really just a wrapper around stdio.h, with wrappers declared inside the std namespace, and the C and C++ compilers share the same standard library for C functions. For example, VS2010 uses stdio.h for C++, in which putc is implemented as both a macro and a function, depending on environment and other compile-time definitions.
Which version of C++? C++83 (1983)? C++98 (1998)? C++11 (2011)?
The C++98 and C++11 specifications rely on the ISO C specifications for the C library functions, and do not put additional implementation constraints on them, other than trivial ones like providing <stdio.h> as <cstdio>, without the .h suffix.
See: C++98 Specification
See: C++11 Specification
Look in your implementation's <cstdio> if you are interested in your particular compiler.
However, if we dig deeper and take a look at the ISO C standard: "ISO/IEC 9899:1990" (C89/C90), well, we find that it is unavailable for free viewing on the web (not even the final draft standard), so moving on to C99 (NOT ISO C), you find...
...that C99 (Not "ISO C") says putc() MAY be implemented as a macro,
See: C99 Specification
So if you are really developing in Obj-C++ (which uses C99), then C99 is the relevant specification to consider, not ISO C (C90). Also, since C99 lets the compiler writer decide whether to make putc() a macro or not, you should consider it an open possibility, and decide whether you really care to know about the C90 (ISO C) spec which is becoming obsolete (now that even C11 (2011) is out.)
Yes, it is. Both C and C++ use <stdio.h>, which follows the same scheme in all implementations that I know of.
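Either way, both languages let you bypass a function-like macro definition of putc when you need the actual function; a minimal sketch:

#include <stdio.h>

int main(void)
{
    putc('a', stdout);        /* may expand a function-like macro            */
    (putc)('b', stdout);      /* parentheses suppress any macro (C 7.1.4),
                                 so this calls the real function              */

    /* A function-like macro only expands when the name is followed by '(',
       so this always refers to the real function as well.                    */
    int (*out)(int, FILE *) = putc;
    out('\n', stdout);
    return 0;
}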

Is the C part of the C++ library automatically C99?

Are all the functions in a conformant C++98/03/0x implementation completely C99 conformant?
I thought C++0x added some C99 (language) features, but never heard or read anything definitive about the C library functions.
Just to avoid any confusion, I'm talking about a C++ program using functions declared in the <c*> header set.
Thanks.
Most of the C99 standard library has been imported into C++0x, but not all of it. From memory, among what wasn't imported as-is there are:
<ctgmath> simply includes <ccomplex> and <cmath>,
<ccomplex> behaves as if it included <complex>,
<cmath> has quite a few adjustments (providing overloads and template functions completing the C99-provided ones).
Some other headers (<cstdbool>, <iso646.h>, ...) have adjustments to take differences between the languages into account (for instance, bool is primitive in C++ but a macro provided by <stdbool.h> in C), but nothing of the scope of the math part.
The <xxx.h> headers whose <cxxx> form doesn't behave as the C99 version simply declare the content of <cxxx> in the global namespace; they aren't any closer to the C99 <xxx.h> content.
A related point: C++0x provides some headers in both <cxxx> and <xxx.h> forms which aren't defined in C99 (<cstdalign> and <cuchar>; the second is defined in a C TR).
(I thought that a bunch of mathematical functions from C99 had been put in TR1 but not kept in C++0x; I was mistaken, those mathematical functions weren't part of C99 in the first place.)
No. C++03 is aligned with ANSI C89/ISO C90, not C99.
The upcoming C++0x standard is expected to be aligned to some degree with C99. See paragraph 17.6.1.2 in the current draft which lists ccomplex, cinttypes, cstdint etc. Note that, as AProgrammer mentions, some headers aren't exactly the same; further, that the header cuchar is aligned with the C Technical Report 19769 rather than C99.
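For instance, the C99-derived fixed-width integer and format-macro facilities are available through the <cxxx> forms listed there; a small sketch (note that some older library implementations additionally wanted __STDC_FORMAT_MACROS defined before including <cinttypes>, although C++11 itself does not require that):

#include <cstdint>
#include <cinttypes>
#include <cstdio>

int main()
{
    std::int64_t big = INT64_C(1) << 40;        // fixed-width type from C99's <stdint.h>
    std::printf("big = %" PRId64 "\n", big);    // format macro from C99's <inttypes.h>
    return 0;
}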