In C++ (and C) we have the #pragma directive which basically has implementation defined effects. However, are there any limits of what the directive may do? (Note that I'm asking about what the standard allows, not about what real compilers actually do.)
What I'm certain #pragma may do:
Allow to select one of several compilation options which all result in valid C++- For example, select one of several available ABIs, or switch certain implementation defined options.
What I would guess is allowed, but am not sure:
Allow the compiler to accept otherwise illegal code without issuing a diagnostic (for example, a compiler might decide to support a new built-in type long long long, but any code using that would have to issue a diagnostic; this diagnostic could then be suppressed with e.g. #pragma long long long.
Allow the compiler to reject otherwise legal code, for example there could be a #pragma strict which causes the compiler to flag as error the use of certain library functions and/or language constructs which are considered unsafe.
What I actually doubt is allowed, but am not sure either:
Allow the compiler to change the semantics of legal code to something different (for example, assume that a compiler vendor considered it a good idea if the for condition were a postcondition (as in do … while), and defined #pragma for postcondion to switch the meaning of for accordingly.
The reason why I doubt the latter is that the compiler is allowed to ignore any pragma it doesn't recognize, and therefore a change in semantics by a pragma would cause the same program to have different semantics on different compilers.
However, what does the standard actually allow? And are there things which are allowed, but which are not covered by my list above?
The standard is pretty clear on that:
[cpp.pragma] A preprocessing directive of the form
#pragma pp-tokensopt new-line
causes the implementation to behave in an implementation-defined manner. The behavior might cause translation to fail or cause the translator or the resulting program to behave in a non-conforming manner. Any pragma that is not recognized by the implementation is ignored.
The compiler can thus do pretty much whatever it wants on seeing a #pragma.
The n3337 version of C++ standard says
A preprocessing directive of the form
#pragma pp-tokensopt new-line
causes the implementation to behave in an implementation-defined manner.
The behavior might cause translation to fail or cause the translator or
the resulting program to behave in a non-conforming manner. Any pragma that is not recognized by the implementation is ignored.
I think that allows the compiler to do almost anything it would like to do in case of a #pragma. Both the "translation to fail" and "translator or resulting program to behave in non-conforming manner" covers a great range of options.
That, of course, doesn't mean that it's a wise thing to allow the compiler to do "crazy" things - it may "upset" people, but it's perfectly valid from the standard's perspective.
The compiler is allowed to do absolutely everything, as long as it is documented. A really old version of gcc, upon seeing any pragma, used to stop compilation and attempted to locate and launch one of the text-mode games that existed back then. Which was perfectly standard conforming because there was a section about it in the user guide.
Related
On the std-proposals list, the following code was given:
#include <vector>
#include <algorithm>
void foo(const std::vector<int> &v) {
#ifndef _ALGORITHM
std::for_each(v.begin(), v.end(), [](int i){std::cout << i; }
#endif
}
Let's ignore, for the purposes of this question, why that code was given and why it was written that way (as there was a good reason but it's irrelevant here). It supposes that _ALGORITHM is a header guard inside the standard header <algorithm> as shipped with some known standard library implementation. There is no inherent intention of portability here.
Now, _ALGORITHM would of course be a reserved name, per:
[C++11: 2.11/3]: In addition, some identifiers are reserved for use by C++ implementations and standard libraries (17.6.4.3.2) and shall not be used otherwise; no diagnostic is required.
[C++11: 17.6.4.3.2/1]: Certain sets of names and function signatures are always reserved to the implementation:
Each name that contains a double underscore _ _ or begins with an underscore followed by an uppercase letter (2.12) is reserved to the implementation for any use.
Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
I was always under the impression that the intent of this passage was to prevent programmers from defining/mutating/undefining names that fall under the above criteria, so that the standard library implementors may use such names without any fear of conflicts with client code.
But, on the std-proposals list, it was claimed that this code is itself ill-formed for merely referring to such a reserved name. I can now see how the use of the phrase "shall not be used otherwise" from [C++11: 2.11/3]: may indeed suggest that.
One practical rationale given was that the macro _ALGORITHM could expand to some code that wipes your hard drive, for example. However, taking into account the likely intention of the rule, I'd say that such an eventuality has more to do with the obvious implementation-defined* nature of the _ALGORITHM name, and less to do with it being outright illegal to refer to it.
* "implementation-defined" in its English language sense, not the C++ standard sense of the phrase
I'd say that, as long as we're happy that we are going to have implementation-defined results and that we should investigate what that macro means on our implementation (if it exists at all!), it should not be inherently illegal to refer to such a macro provided we do not attempt to modify it.
For example, code such as the following is used all over the place to distinguish between code compiled as C and code compiled as C++:
#ifdef __cplusplus
extern "C" {
#endif
and I've never heard a complaint about that.
So, what do you think? Does "shall not be used otherwise" include simply writing such a name? Or is it probably not intended to be so strict (which may point to an opportunity to adjust the standard wording)?
Whether it's legal or not is implementation-specific (and identifier-specific).
When the Standard gives the implementation the sole right to use these names, that includes the right to make the names available in user code. If an implementation does so, great.
But if an implementation doesn't expressly give you the right, it is clear from "shall not be used otherwise" that the Standard does not, and you have undefined behavior.
The important part is "reserved to the implementation". It means that the compiler vendor may use those names and even document them. Your code may then use those names as documented. This is often used for extensions like __builtin_expect, where the compiler vendor avoids any clash with your identifiers (that are declared by your code) by using those reserved names. Even the standard uses them for things like __attribute__ to make sure it doesn't break existing (legal) code when adding new features.
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1882
Each identifier that contains a double understore __ or begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.
any use. (similar text occurs both before and after that defect fix is applied)
__cplusplus is defined by the standard. _ALGORITHM is reserved by the standard to be used by implementations. These seem quite different? (The two sections of the standard do conflict, in that one states that __cplusplus is reserved for any use, and another uses it specifically, but I think that the winner of that conflict is clear).
The _ALGORITHM identifier could, under the standard, be used as part of a pre-processing step to say "replace this source code with hard drive deleting code". Its existence (prior to pre-processing, or after) could be sufficient to completely change your program behavior.
Now this is unlikely, but I do not think it results in an non-conforming implementation. It is a matter of quality of implementation only.
An implementation is free to document and define what _ALGORITHM means. For example, it could document that it is a header guard for <algorithm>, and indicates if that header file has been included. Treating your current <algorithm> implementation as documentation is probably going to far.
I'd guess using __cplusplus in C mode is technically "just as bad" as using _ALGORITHM, but this question is a c++ question, not a c question. I haven't delved into the c standard to look for quotes about it.
The names in [cpp.predefined] are different. Those have a specified meaning, so an implementation can't reserve them for any use, and using them in a program has a well-defined portable meaning. Using an implementation-specific identifier like the example of _ALGORITHM is ill-formed because it violates a shall-rule.
Yes, I'm fully aware of multiple examples where the library specification uses "shall" to mean "this is a requirement on user code, and violations are UB, not ill-formed".
Regarding whether it's UB or implementation-defined, running an ill-formed program results in UB. The standard wording clearly says the program is ill-formed, UB occurs if the implementation still chooses to accept the program and run it.
So, if a program uses the identifier _ALGORITHM, that program is ill-formed, and running such a program is UB, but that does not mean it doesn't work fine on an implementation that uses _ALGORITHM as an include guard, nor does it mean that it doesn't work fine on an implementation that doesn't.
If users are concerned about such ill-formedness and potential UB, and said users want to write portable C++, they shouldn't use reserved identifiers in portable C++ programs. If users accept that regardless of the standard prohibiting such a use, no practical implementation will wipe your hard drive, they can freely use such reserved identifiers, but by the letter of the standard, such uses are still ill-formed.
Historically, the purpose for making the use of such tokens "undefined behavior" is that compilers are free to attach any meaning they want to any such token that are not defined within the C standard. For example, on some embedded processors, using __xdata as a storage class for a variable will ask that it be stored in an area of RAM which is slower to access than the normal variable-storage area, but is much larger. On typical processors of that family, storage for "normal" variables would be limited to about 100 bytes, but storage for xdata variables may be much larger--up to 64K. The standard says basically nothing about what compilers are allowed to do with such directives, although typically (I'm not sure if the standard mandates this behavior, though I'm unaware of compilers violating it) such tokens are generally ignored within code that is disabled using a #if or similar directives.
Some libraries' header files will start their own internal identifiers with something that starts with two underscores but includes a pattern that's unlikely to be used by a compiler for any purpose (e.g. version 23 of the Foozle library might precede its identifiers with use __FZ23). It would be perfectly legitimate for a future compilers to use identifiers starting with __FZ23 for other purposes, and if that were to happen the Foozle library would need to be changed to use something else. If, however, it is likely that a major compiler upgrade would likely necessitate rewrites of the Foozle library for other reasons anyway, that risk may be acceptable compared to the risk of identifiers conflicting with outside code.
Note also that some project header files which are targeted toward a processor that requires __ directives may conditionally define macros with those names when compiled for other processors, for example:
#ifndef USE_XDATA
#define __XDATA
#endif
though a somewhat better pattern would generally be:
#ifdef USE_XDATA
#define XDATA __XDATA
#else
#define XDATA
#endif
When writing new code, the latter pattern is often better, but the former pattern may sometimes be useful when adapting existing code written on a platform that requires __XDATA so that it may be used both on platforms that use/require that directive and on platforms that do not.
Whether or not it is legal is a matter of local law. Whether it means anything, and if so, what, is a matter for the language definition. When you use a name that's reserved to the implementation the behavior of your program is undefined. That means that the language definition does not tell you what the program does. Nothing more, nothing less. If the compiler you're using documents what a particular reserved identifier does, then you can use that identifier with that compiler. If you hunt through headers and guess what various un-documented identifiers mean you might be able to use them, but don't be surprised if your code breaks when a subsequent update changes something.
Don't get hung up on __cplusplus. It's core language, and the stuff about double underscores, etc. is library. If that's not convincing, just consider it a glitch. You can use __cplusplus in C++ programs; its meaning is well defined.
This question already has answers here:
C++ preprocessor #define-ing a keyword. Is it standards conforming?
(3 answers)
Closed 8 years ago.
I want to define private and protected to public.
#define private public
#define protected public
Is this safe in C++?
No, this almost certainly results in undefined behaviour.
From n4296 17.6.4.3.1 [macro.names] /2: (via #james below)
A translation unit shall not #define or #undef names lexically identical to keywords, to the identifiers listed in Table 2, or to the attribute-tokens described in 7.6.
private and public are keywords. Simply doing a #define on one of them is undefined behavior if you use anything in the C++ standard library:
17.6.4.1/1 [constraints.overview]
This section describes restrictions on C ++ programs that use the facilities of the C++ standard library.
If you do not, the restriction in 17.6.4.3.1 does not seem to apply.
The other way it could lead to violation is if you use the same structure with two different definitions. While most implementations may not care that the two structures are identical other than public vs private, the standard does not make that guarantee.
Despite that, the most common kind of UB is 'it works', as few compilers care.
But that does not mean it is 'safe'. It may be safe in a particular compiler (examine the docs of said compiler: this would be a strange guarantee to give, however!). Few compilers (if any) will provide the layout guarantees (and mangling guarantees) required for the above to work explicitly if you access the same structure through two different definitions, for example, even if the other possibilities of error are more remote.
Many compilers will 'just work'. That does not make it safe: the next compiler version could make one of a myriad of changes and break your code in difficult (or easy) to detect ways.
Only do something like this if the payoff is large.
I cannot find evidence that #defineing a keyword is undefined behavior if you never include any standard library headers and do not use it to make a definition different in two compilation units. So in a highly restricted program, it may be legal. In practice, even if it legal, it still isn't 'safe', both because that legality is extremely fragile, and because compilers are unlikely to test against that kind of language abuse, or care if it leads to a bug.
The undefined behavior caused by #define private foo does not seem to be restricted to doing it before the #include of the std header, as an example of how fragile it is.
It is allowed only in a translation unit that doesn't in any way (even indirectly) include a standard header, but if you do there are restrictions such as this:
(17.6.4.3.1) A translation unit shall not #define or #undef names lexically identical to keywords [...]
Regardless of that, it's usually a bad idea - mucking around with access modifiers isn't "safe" by any common meaning of the word, even if it won't cause any immediate problems in itself.
If you're using library code, there are usually good reasons for things being protected.
If you want to make things public temporarily, e.g. for testing purposes, you could use a special conditional macro for that part of the class:
#if TESTING
#define PRIVATE_TESTABLE public
#else
#define PRIVATE_TESTABLE private
#endif
class Foo
{
public:
Foo();
void operation();
PRIVATE_TESTABLE:
int some_internal_operation();
private:
int some_internal_data;
};
There is nothing illegal about doing this, it's just crazy.
You have to use this definition for all the compilation units (otherwise you might get the linker failing because of the name mangling.
If you are asking out of curiosity then this is perfectly legal (if confusing) c++; if you are asking because you think this is a good idea so you don't have all those pesky access permissions then you are on a very bad path. There a specific semantic reasons for using protected and private which serve to reduce he complexity of code and explain the 'contract' that modules have with each other. Using this define makes you code nearly unreadable for practiced c++ programmers.
Syntax is correct but semantic is wrong.
Access modifiers are for humans only so that you by accident don't use a method or field etc. that you are not supposed to access/change.
You could as well always make everything public if you write new code but in old code it can definitely break something. It would work but it would of course also be more error prone. Imagine what IntelliSense would suggest if you had access to everything everytime. Access modifiers don't only protect code that could break something if you used it in a wrong way but it helps IntelliSense to show you only members that are relevant in a particular context.
How portable is code that uses #pragma optimize? Do most compilers support it and how complete is the support for this #pragma?
#pragma is the sanctioned and portable way for compilers to add non-sanctioned and non-portable language extensions *.
Basically, you never know for sure, and at least one major C++ compiler (g++) does not support this pragma as is.
*:
From the C++ standard (N3242):
16.6 Pragma directive [cpp.pragma]
A preprocessing directive of the form
# pragma pp-tokensopt new-line
causes the implementation to behave in an implementation-defined manner. The behavior might cause translation to fail or cause the translator or the resulting program to behave in a non-conforming manner. Any pragma that is not recognized by the implementation is ignored.
From the C standard (Committee Draft — April 12, 2011):
6.10.6 Pragma directive
Semantics
A preprocessing directive of the form
# pragma pp-tokensopt new-line
where the preprocessing token STDC does not immediately follow pragma in the
directive (prior to any macro replacement)174) causes the implementation to behave in an
implementation-defined manner. The behavior might cause translation to fail or cause the
translator or the resulting program to behave in a non-conforming manner. Any such
pragma that is not recognized by the implementation is ignored.
And here's an example:
int main () {
#pragma omp parallel for
for (int i=0; i<16; ++i) {}
}
A big part of the C and C++ OpenMP API is implemented as #pragmas.
Often this is not a good idea to rely on compiler flags, since each compiler has its own behaviour.
This flag should not be used as it is a compiling level spec you inject into your code.
Normally and theoretically this flag should be ignored by compilers if not used.
The #pragma keyword is portable in the sense that it should always compile despite on the compiler. However, the pragmas are compiler-specific so it's probable that when changing compiler it will complain with some warnings. Some pragmas are wide used, such as these from OpenMP. In order to make the code the most portable possible, you might surround your pragmas with #ifdef/#endif that depend on the compiler you're using. For example:
#ifdef __ICC
#pragma optimize
#endif
Compilers usually define some macros such as __ICC that make the code know which compiler is being used.
Any use of #pragma is compiler specific.
For example :
GNU, Intel and IBM :
#warning "Do not use ABC, which is deprecated. Use XYZ instead."
Microsoft :
#pragma message("Do not use ABC, which is deprecated. Use XYZ instead.")
Regarding your specific question about the #pragma optimize, it is supported by gcc and microsoft, but it doesn't mean it will be in the future.
#pragma is not portable, full stop. There was a version of gcc that used to start of a game whenever it came across that
Of the compilers we use at work, two definitely don't support #pragma optimise, and I can't answer for the others.
And even if they did, as the command line switches for optimisation are different, the chances are that the options for the pragma would be different.
In this article from Guru of the week, it is said: It is illegal to #define a reserved word. Is this true? I can’t find anything in the norm, and I have already seen programmers redefining new, for instance.
17.4.3.1.1 Macro names [lib.macro.names]
1 Each name defined as a macro in a header is reserved to the implementation for any use if the translation unit includes the header.164)
2 A translation unit that includes a header shall not contain any macros that define names declared or defined in that header. Nor shall such a translation unit define macros for names lexically identical to keywords.
By the way, new is an operator and it can be overloaded (replaced) by the user by providing its own version.
The corresponding section from C++11:
17.6.4.3.1 Macro names [macro.names]
1 A translation unit that includes a standard library header shall not #define or #undef names declared in any standard library header.
2 A translation unit shall not #define or #undef names lexically identical to keywords.
Paragraph 1 from C++03 has been removed. The second paragraph has been split in two. The first half has now been changed to specifically state that it only applies to standard headers. The second point has been broadened to include any translation unit, not just those that include headers.
However, the Overview for this section of the standard (17.6.4.1 [constraints.overview]) states:
This section describes restrictions on C++ programs that use the facilities of the C++ standard library.
Therefore, if you are not using the C++ standard library, then you're okay to do what you will.
So to answer your question in the context of C++11: you cannot define (or undefine) any names identical to keywords in any translation unit if you are using the C++ standard library.
They're actually wrong there, or at least doesn't tell the whole story about it. The real reason it's disallowed is that it violates the one-definition-rule (which by the way is also mentioned as the second reason why it's illegal).
To see that it's actually allowed (to redefine keywords), at least if you don't use the standard libraries, you have to look at an entirely different part of the standard, namely the translation phases. It says that the input is only decomposed into preprocessor tokens before preprocessing takes place and looking at those there's no distinction between private and fubar, they are both identifiers to the preprocessor. Later when the input is decomposed into token the replacement has already taken place.
It has been pointed out that there's a restriction on programs that are to use the standard libraries, but it's not evident that the example redefining private is doing that (as opposed to the "Person #4: The Language Lawyer" snippet which uses it for output to cout).
It's mentioned in the last example that the trick doesn't get trampled on by other translation units or tramples on other. With this in mind you should probably consider the possibility that the standard library is being used somewhere else which will put this restriction in effect.
Here's a little thing you can do if you don't want someone to use goto's. Just drop the following somewhere in his code where he won't notice it.
#define goto { int x = *(int *)0; } goto
Now every time he tries to use a goto statement, his program will crash.
It's not as far as I'm aware illegal - no compiler I've come across yet will generate an error if you do
#define true false
#defining certain keywords are likely to generate errors in compilation for other reasons. But a lot of them will just result in very strange program behaviour.
which compiling a multithreaded program we use gcc like below:
gcc -lpthread -D_REENTRANT -o someprogram someprogram.c
what exactly is the flag -D_REENTRANT doing over here?
Defining _REENTRANT causes the compiler to use thread safe (i.e. re-entrant) versions of several functions in the C library.
You can search your header files to see what happens when it's defined.
Excerpt from the libc 8.2 manual:
Macro: _REENTRANT
Macro: _THREAD_SAFE
These macros are obsolete. They have the same effect as defining
_POSIX_C_SOURCE with the value 199506L.
Some very old C libraries required one of these macros to be defined
for basic functionality (e.g. getchar) to be thread-safe.
We recommend you use _GNU_SOURCE in new programs. If you don’t specify
the ‘-ansi’ option to GCC, or other conformance options such as
-std=c99, and don’t define any of these macros explicitly, the effect is the same as defining _DEFAULT_SOURCE to 1.
When you define a feature test macro to request a larger class of
features, it is harmless to define in addition a feature test macro
for a subset of those features. For example, if you define
_POSIX_C_SOURCE, then defining _POSIX_SOURCE as well has no effect. Likewise, if you define _GNU_SOURCE, then defining either
_POSIX_SOURCE or _POSIX_C_SOURCE as well has no effect.
JayM replied:
Defining _REENTRANT causes the compiler to use thread safe (i.e. re-entrant) versions of several functions in the C library.
You can search your header files to see what happens when it's defined.
Since OP and I were both interested in the question, I decided to actually post the answer. :) The following things happen with _REENTRANT on Mac OS X 10.11.6:
<math.h> gains declarations for lgammaf_r, lgamma_r, and lgammal_r.
On Linux (Red Hat Enterprise Server 5.10), I see the following changes:
<unistd.h> gains a declaration for the POSIX 1995 function getlogin_r.
So it seems like _REENTRANT is mostly a no-op, these days. It might once have declared a lot of new functions, such as strtok_r; but these days those functions are mostly mandated by various decades-old standards (C99, POSIX 95, POSIX.1-2001, etc.) and so they're just always enabled.
I have no idea why the two systems I checked avoid declaring lgamma_r resp. getlogin_r when _REENTRANT is not #defined. My wild guess is that this is just historical cruft that nobody ever bothered to go through and clean up.
Of course my observations on these two systems might not generalize to all systems your code might ever encounter. You should definitely still pass -pthread to the compiler (or, less good but okay, -lpthread -D_REENTRANT) whenever your program requires pthreads.
In multithreaded programs, you tell the compiler that you need this feature by defining the _REENTRANT macro before any #include lines in your program. This does three things, and does them so elegantly that usually you don’t even need to know what was done:
Some functions get prototypes for a re-entrant safe equivalent.
These are normally the same function name, but with _r appended so
that, for example, gethostbyname is changed to gethostbyname_r.
Some stdio.h functions that are normally implemented as macros
become proper re-entrant safe functions.
The variable errno, from errno.h, is changed to call a function,
which can determine the real errno value in a multithread safe way.
Taken from Beginning Linux Programming
It simply defined _REENTRANT for the preprocessor. Somewhere in the associated code, you'll probably find #ifdef _REENTRANT or #if defined(_REENTRANT) in at least a few places.
Also note that the name "_REENTRANT: is in the implementer's name space (any name starting with an underscore followed by another underscore or a capital letter is), so defining it means you've stepped outside what the standard defines (at least the C or C++ standards).