Is it legal to redefine a C++ keyword? - c++

In this article from Guru of the week, it is said: It is illegal to #define a reserved word. Is this true? I can’t find anything in the norm, and I have already seen programmers redefining new, for instance.

17.4.3.1.1 Macro names [lib.macro.names]
1 Each name defined as a macro in a header is reserved to the implementation for any use if the translation unit includes the header.164)
2 A translation unit that includes a header shall not contain any macros that define names declared or defined in that header. Nor shall such a translation unit define macros for names lexically identical to keywords.
By the way, new is an operator and it can be overloaded (replaced) by the user by providing its own version.

The corresponding section from C++11:
17.6.4.3.1 Macro names [macro.names]
1 A translation unit that includes a standard library header shall not #define or #undef names declared in any standard library header.
2 A translation unit shall not #define or #undef names lexically identical to keywords.
Paragraph 1 from C++03 has been removed. The second paragraph has been split in two. The first half has now been changed to specifically state that it only applies to standard headers. The second point has been broadened to include any translation unit, not just those that include headers.
However, the Overview for this section of the standard (17.6.4.1 [constraints.overview]) states:
This section describes restrictions on C++ programs that use the facilities of the C++ standard library.
Therefore, if you are not using the C++ standard library, then you're okay to do what you will.
So to answer your question in the context of C++11: you cannot define (or undefine) any names identical to keywords in any translation unit if you are using the C++ standard library.

They're actually wrong there, or at least doesn't tell the whole story about it. The real reason it's disallowed is that it violates the one-definition-rule (which by the way is also mentioned as the second reason why it's illegal).
To see that it's actually allowed (to redefine keywords), at least if you don't use the standard libraries, you have to look at an entirely different part of the standard, namely the translation phases. It says that the input is only decomposed into preprocessor tokens before preprocessing takes place and looking at those there's no distinction between private and fubar, they are both identifiers to the preprocessor. Later when the input is decomposed into token the replacement has already taken place.
It has been pointed out that there's a restriction on programs that are to use the standard libraries, but it's not evident that the example redefining private is doing that (as opposed to the "Person #4: The Language Lawyer" snippet which uses it for output to cout).
It's mentioned in the last example that the trick doesn't get trampled on by other translation units or tramples on other. With this in mind you should probably consider the possibility that the standard library is being used somewhere else which will put this restriction in effect.

Here's a little thing you can do if you don't want someone to use goto's. Just drop the following somewhere in his code where he won't notice it.
#define goto { int x = *(int *)0; } goto
Now every time he tries to use a goto statement, his program will crash.

It's not as far as I'm aware illegal - no compiler I've come across yet will generate an error if you do
#define true false
#defining certain keywords are likely to generate errors in compilation for other reasons. But a lot of them will just result in very strange program behaviour.

Related

Is there a way to have the same #define statement in different files that are included into the same file

So, I have a file structure like this:
FileA
FileB
FileC
FileA includes FileB and FileC
FileB has:
#define image(i, j, w) (image[ ((i)*(w)) + (j) ])
and FileC has:
#define image(i, j, h) (image[ ((j)*(h)) + (i) ])
on compilation i get:
warning: "image" redefined
note: this is the location of the previous definition ...
Does this warning mean it changes the definition of the other file where it found it initially when compiling ?
Is there any way to avoid this warning while maintaining these two defines, and them applying their different definitions on their respective files?
Thankyou in advance :)
Does this warning mean it changes the definition of the other file where it found it initially when compiling ?
The program is ill-formed. The language doesn't specify what happens in this case. If the compiler accepts an ill-formed program, then you must read the documentation of the compiler to find out what they do in such case.
Note that the program might not even compile with other compilers.
Is there any way to avoid this warning while maintaining these two defines, and them applying their different definitions on their respective files?
Technically, you could use hack like this without touching either header:
#include "FileB"
#undef image
#include "FileC"
But a good solution - if you can modify the headers - is to not use macros. Edit the headers to get rid of them. Use functions instead, and declare them in distinct namespaces so that their names don't conflict.
Some rules of thumb:
Don't use unnecessary macros. Functions and variables are superior to macros.
Follow the common convention of using only upper case for macro names, if you absolutely need to use macros. It is important to make sure that macro names don't mix with non-macros because macros don't respect namespaces nor scopes.
If you need a macro within a single header, then undefine it immediately when it's no longer needed instead of leaking it into other headers.
Don't use names without namespaces. That will lead to name conflicts. Macros don't respect C++ namespaces, but you can instead prefix their names. For example, you could have FILE_B_IMAGE and FILE_C_IMAGE (or something more descriptive based on the concrete context).
They are not functionally equivalent, one can be seen as a row-wise iteration and the other a column-wise
This seems like a good argument for renaming the functions (or the macros, if you for some reason cannot replace them). Call one row_wise and the other column_wise or something along those lines. Use descriptive names!
Does this warning mean it changes the definition of the other file where it found it initially when compiling ?
For GCC (tagged) it means that the definition processed second is used from the point of the redefinition onward, including not only in the same file but at any places later in the translation unit where the macro identifier appears followed by a (. Previous appearances will have used the previous definition.
Neither the C language specification nor the C++ language specification provides a more general answer: the redefinition other than with an identical token sequence violates language constraints, therefore both the translation behavior and the execution behavior of a program containing such a non-matching redefinition are undefined.
Is there any way to avoid this warning while maintaining these two
defines, and them applying their different definitions on their
respective files?
If these definitions are meant to be used only within their respective files, then the easiest solution would be for each file to #undef image at the end. This would work in both C and C++.
If both are intended to be exposed for use by other files then you have a name collision that you will have to resolve one way or another. You might, for instance, add a distinguishing prefix to the definition and all uses of each one. In C++ only, you also have the option of resolving the name collision by changing the macros to [inline] functions and putting them in different namespaces. That would probably make it easier to adapt each one's users to the new names than prefixing the names would do.

Can I define a macro in a header file?

I have a macro definition in MyClass.h, stated as such:
#define _BufferSize_ 64
I placed the include directive for MyClass.h inside of main.cpp:
#include "MyClass.h"
Does this mean I can use _BufferSize_ in both main.cpp and MyClass.h? Also, is this good practice?
Yes, it would work. (Disregarding the problem with underscores that others have pointed out.)
Directive #include "MyClass.h" just copies the whole content of file MyClass.h and pastes it in the place of the #include. From the point of view of the compiler there is only one source file composed of the file specified by the user and all included files.
Having said that, it would be much better if you use in-language construction instead of preprocessor directive.
For example replace:
#define _BufferSize_ 64
with
constexpr size_t BufferSize = 64;
The only thing it does differently than the #define is that it specifies the type of the value (size_t in this case). Beside that, the second code will behave the same way and it avoids disadvantages of preprocessor.
In general, try to avoid using preprocessor directives. This is an old mechanism that was used when c++ coudn't do that things in-language yet.
Yes, that is the purpose of header files: creating declarations and constants in one file that you can "include" into translation units whenever you like.
However, your macro name is illegal, and a nice constexpr size_t BufferSize = 64 would be more idiomatic nowadays; even before recent versions of C++, a typed constant would be preferable to a macro in many cases.
First, regarding the identifier _BufferSize_, the standard states that:
3. ...some identifiers are reserved for use by C++ implementations and shall not be used otherwise; no diagnostic is required.
(3.1) Each identifier that contains a double underscore __ or begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.
So having such an identifier in your code would lead to undefined behavior.
And as already suggested in the comments, using macro variables is not good practice in C++. You can use a const int instead.
Replying 3 years later because the answers are wrong and this is first google search result in certain keywords.
https://google.github.io/styleguide/cppguide.html#Preprocessor_Macros
Avoid defining macros, especially in headers; prefer inline functions, enums, and const variables. Name macros with a project-specific prefix. Do not use macros to define pieces of a C++ API.
Highlight by me, not in original text.

Does using `__DATE__` or `__TIME__` violate the one-definition-rule?

When using __DATE__ or __TIME__ in a header file, the results of the preprocessor for that header inclusion can be somewhat different.
Under which circumstances does using __DATE__ or __TIME__ in a header file violate the one-definition-rule?
As a follow-up: Does the assert header violate the one-definition-rule?
If __TIME__ gives different results for different translation units, then it must not be used in a context where the same result is required across translation units. This means e.g. initialising an object (e.g. a class member) to __TIME__, where that initialiser is part of a header that gets included in multiple translation units, is going to be problematic.
__DATE__ is less likely to give different results for different translation units if you start a fresh build, but incremental builds, ones that only recompile the files that changed, do make it likely to become a problem as well.
assert is a macro that expands differently depending on how NDEBUG was defined when its header was included, so either the whole project must agree on whether NDEBUG should be defined, or functions defined in headers should avoid using assert.
The one definition rule is only applicable to variables, functions, class types, enumerations, or templates (e.g. Section 3.2, ISO/IEC 14882, 1998 C++ Standard). __DATE__ or __TIME__ are both predefined macros that expand to a string literal - which is not one of the things that the one-definition rule is applicable to.
assert() is also a preprocessor macro. If its expansion defines a variable, function, class type, enumeration, or template, then its use could potentially violate the one-definition rule, if that definition differed between translation units. Pragmatically, it is difficult to envisage a situation in which an implementation would have an assert() macro that expanded to such a definition.

Is it definitely illegal to refer to a reserved name?

On the std-proposals list, the following code was given:
#include <vector>
#include <algorithm>
void foo(const std::vector<int> &v) {
#ifndef _ALGORITHM
std::for_each(v.begin(), v.end(), [](int i){std::cout << i; }
#endif
}
Let's ignore, for the purposes of this question, why that code was given and why it was written that way (as there was a good reason but it's irrelevant here). It supposes that _ALGORITHM is a header guard inside the standard header <algorithm> as shipped with some known standard library implementation. There is no inherent intention of portability here.
Now, _ALGORITHM would of course be a reserved name, per:
[C++11: 2.11/3]: In addition, some identifiers are reserved for use by C++ implementations and standard libraries (17.6.4.3.2) and shall not be used otherwise; no diagnostic is required.
[C++11: 17.6.4.3.2/1]: Certain sets of names and function signatures are always reserved to the implementation:
Each name that contains a double underscore _ _ or begins with an underscore followed by an uppercase letter (2.12) is reserved to the implementation for any use.
Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
I was always under the impression that the intent of this passage was to prevent programmers from defining/mutating/undefining names that fall under the above criteria, so that the standard library implementors may use such names without any fear of conflicts with client code.
But, on the std-proposals list, it was claimed that this code is itself ill-formed for merely referring to such a reserved name. I can now see how the use of the phrase "shall not be used otherwise" from [C++11: 2.11/3]: may indeed suggest that.
One practical rationale given was that the macro _ALGORITHM could expand to some code that wipes your hard drive, for example. However, taking into account the likely intention of the rule, I'd say that such an eventuality has more to do with the obvious implementation-defined* nature of the _ALGORITHM name, and less to do with it being outright illegal to refer to it.
* "implementation-defined" in its English language sense, not the C++ standard sense of the phrase
I'd say that, as long as we're happy that we are going to have implementation-defined results and that we should investigate what that macro means on our implementation (if it exists at all!), it should not be inherently illegal to refer to such a macro provided we do not attempt to modify it.
For example, code such as the following is used all over the place to distinguish between code compiled as C and code compiled as C++:
#ifdef __cplusplus
extern "C" {
#endif
and I've never heard a complaint about that.
So, what do you think? Does "shall not be used otherwise" include simply writing such a name? Or is it probably not intended to be so strict (which may point to an opportunity to adjust the standard wording)?
Whether it's legal or not is implementation-specific (and identifier-specific).
When the Standard gives the implementation the sole right to use these names, that includes the right to make the names available in user code. If an implementation does so, great.
But if an implementation doesn't expressly give you the right, it is clear from "shall not be used otherwise" that the Standard does not, and you have undefined behavior.
The important part is "reserved to the implementation". It means that the compiler vendor may use those names and even document them. Your code may then use those names as documented. This is often used for extensions like __builtin_expect, where the compiler vendor avoids any clash with your identifiers (that are declared by your code) by using those reserved names. Even the standard uses them for things like __attribute__ to make sure it doesn't break existing (legal) code when adding new features.
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1882
Each identifier that contains a double understore __ or begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.
any use. (similar text occurs both before and after that defect fix is applied)
__cplusplus is defined by the standard. _ALGORITHM is reserved by the standard to be used by implementations. These seem quite different? (The two sections of the standard do conflict, in that one states that __cplusplus is reserved for any use, and another uses it specifically, but I think that the winner of that conflict is clear).
The _ALGORITHM identifier could, under the standard, be used as part of a pre-processing step to say "replace this source code with hard drive deleting code". Its existence (prior to pre-processing, or after) could be sufficient to completely change your program behavior.
Now this is unlikely, but I do not think it results in an non-conforming implementation. It is a matter of quality of implementation only.
An implementation is free to document and define what _ALGORITHM means. For example, it could document that it is a header guard for <algorithm>, and indicates if that header file has been included. Treating your current <algorithm> implementation as documentation is probably going to far.
I'd guess using __cplusplus in C mode is technically "just as bad" as using _ALGORITHM, but this question is a c++ question, not a c question. I haven't delved into the c standard to look for quotes about it.
The names in [cpp.predefined] are different. Those have a specified meaning, so an implementation can't reserve them for any use, and using them in a program has a well-defined portable meaning. Using an implementation-specific identifier like the example of _ALGORITHM is ill-formed because it violates a shall-rule.
Yes, I'm fully aware of multiple examples where the library specification uses "shall" to mean "this is a requirement on user code, and violations are UB, not ill-formed".
Regarding whether it's UB or implementation-defined, running an ill-formed program results in UB. The standard wording clearly says the program is ill-formed, UB occurs if the implementation still chooses to accept the program and run it.
So, if a program uses the identifier _ALGORITHM, that program is ill-formed, and running such a program is UB, but that does not mean it doesn't work fine on an implementation that uses _ALGORITHM as an include guard, nor does it mean that it doesn't work fine on an implementation that doesn't.
If users are concerned about such ill-formedness and potential UB, and said users want to write portable C++, they shouldn't use reserved identifiers in portable C++ programs. If users accept that regardless of the standard prohibiting such a use, no practical implementation will wipe your hard drive, they can freely use such reserved identifiers, but by the letter of the standard, such uses are still ill-formed.
Historically, the purpose for making the use of such tokens "undefined behavior" is that compilers are free to attach any meaning they want to any such token that are not defined within the C standard. For example, on some embedded processors, using __xdata as a storage class for a variable will ask that it be stored in an area of RAM which is slower to access than the normal variable-storage area, but is much larger. On typical processors of that family, storage for "normal" variables would be limited to about 100 bytes, but storage for xdata variables may be much larger--up to 64K. The standard says basically nothing about what compilers are allowed to do with such directives, although typically (I'm not sure if the standard mandates this behavior, though I'm unaware of compilers violating it) such tokens are generally ignored within code that is disabled using a #if or similar directives.
Some libraries' header files will start their own internal identifiers with something that starts with two underscores but includes a pattern that's unlikely to be used by a compiler for any purpose (e.g. version 23 of the Foozle library might precede its identifiers with use __FZ23). It would be perfectly legitimate for a future compilers to use identifiers starting with __FZ23 for other purposes, and if that were to happen the Foozle library would need to be changed to use something else. If, however, it is likely that a major compiler upgrade would likely necessitate rewrites of the Foozle library for other reasons anyway, that risk may be acceptable compared to the risk of identifiers conflicting with outside code.
Note also that some project header files which are targeted toward a processor that requires __ directives may conditionally define macros with those names when compiled for other processors, for example:
#ifndef USE_XDATA
#define __XDATA
#endif
though a somewhat better pattern would generally be:
#ifdef USE_XDATA
#define XDATA __XDATA
#else
#define XDATA
#endif
When writing new code, the latter pattern is often better, but the former pattern may sometimes be useful when adapting existing code written on a platform that requires __XDATA so that it may be used both on platforms that use/require that directive and on platforms that do not.
Whether or not it is legal is a matter of local law. Whether it means anything, and if so, what, is a matter for the language definition. When you use a name that's reserved to the implementation the behavior of your program is undefined. That means that the language definition does not tell you what the program does. Nothing more, nothing less. If the compiler you're using documents what a particular reserved identifier does, then you can use that identifier with that compiler. If you hunt through headers and guess what various un-documented identifiers mean you might be able to use them, but don't be surprised if your code breaks when a subsequent update changes something.
Don't get hung up on __cplusplus. It's core language, and the stuff about double underscores, etc. is library. If that's not convincing, just consider it a glitch. You can use __cplusplus in C++ programs; its meaning is well defined.

C++ preprocessor #define-ing a keyword. Is it standards conforming?

Help settle the debate that's going on in the comments at this question about bool and 1:
Can a standards-conforming C++ preprocessor allow one to use #define to redefine a language keyword? If so, must a standards-conforming C++ preprocessor allow this?
If a C++ program redefines a language keyword, can that program itself be standards conforming?
In C++, the closest thing to forbidding #defineing a keyword is §17.4.3.1.1/2, which only disallows it in a translation unit that includes a standard library header:
A translation unit that includes a header shall not contain any macros that define names declared or defined in that header. Nor shall such a translation unit define macros for names lexically identical to keywords.
The second sentence of that paragraph has been changed in C++0x to outright forbid #defineing a keyword (C++0x FCD §17.6.3.3.1):
A translation unit shall not #define or #undef names lexically identical to keywords.
Edit: As pointed out by Ken Bloom in comments to his answer, the rules have not changed in C++0x; the text has just been rearranged to confuse people like me. :-)
It is not permitted, according to C++11 [macro.names]:
A translation unit shall not #define or #undef names lexically identical to keywords, to the identifiers listed in Table 3, or to the attribute-tokens described in 7.6.
The "identifiers listed in Table 3" are final and override; and the attribute-tokens are the identifiers in [[fallthrough]] and so on.
This clause is still in the latest standard too.
Working from the 2005-10-19 C++ working draft (since I don't have a standard handy):
Section 16.3 defines the grammar for #define to be #define identifier replacement-list-newline (object-like macros) or one of several constructions beginning with #define identifier lparen (function-like macros). identifiers are defined in section 2.10 to be identifier-nondigit | identifier identifier-nondigit | identifier digit. Section 2.11 indicates that a certain list of identifiers are unconditionally treated as keywords in phase 7 of compilation (section 2.1), and I conclude that they are therefore not treated specially in phase 4, which is preprocessor expansion. Thus, it appears that the standard requires the preprocessor to allow you to redefine language keywords (listed in Section 2.11).
However, the preprocessor has a keyword of its own, namely defined, as well as a list of predefined macros (Section 16.8). Section 16.8 states that the behavior is undefined if you redefine these, but does not prohibit the preprocessor from recognizing these as macro names.