Can't use "not", "or", or "plus" as identifier? - c++

I tried to compile this:
enum class conditional_operator { plus, or, not };
But apparently GCC (4.6) thinks these are special, while I can't find a standard that says they are (neither C++0x n3290 nor C99 n2794). I'm compiling with g++ -pedantic -std=c++0x. Is this a compiler convenience? How do I turn it off? Shouldn't -std=c++0x turn this "feature" off?
PS: Hmmm, apparently, MarkDown code formatting thinks so too...

Look at 2.5. They are alternative tokens for || and !.
There is a bunch of other alternative tokens BTW.
Edit: The rationale for their inclusion is the same as for trigraphs: to allow the use of non-ASCII character sets. The committee has tried to get rid of them (at least of trigraphs; I don't remember about alternative tokens) and has met opposition from people (mostly IBM mainframe users) who are using them.
Edit for completeness: as others have remarked, plus isn't in that class and should not be a problem unless you are using namespace std.

These are actually defined as alternative tokens (and reserved) oddly enough, as alternative representations for operators. I believe this was originally to aid people who were using keyboards which made the relevant symbols hard to produce, although this seems a pretty poor reason to add extra keywords to the language :(
There may be a GCC compiler option to disable them, but I'm not sure.
(As mentioned in comments, plus should be okay unless you're using the std namespace.)

or and not are alternative representations of || and ! respectively. You can't turn them off and you can't use these tokens for anything else, they are part of the language (current C++, not even just C++0x). ( See ISO/IEC 14882:2003 2.5 [lex.digraph] and 2.11 [lex.key] / 2. )
You should be safe with plus unless you use using namespace std; or using std::plus;.
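A minimal sketch of what the answers above describe, assuming a conforming C++11 compiler: or and not behave exactly like || and !, so they cannot name enumerators, while plus is an ordinary identifier. The enumerator names logical_or and logical_not are hypothetical replacements, not something from the question.

```cpp
#include <cassert>
#include <functional>  // declares std::plus; harmless without a using-directive

// `or` and `not` are alternative tokens, so the enumerators are renamed here.
// `plus` is an ordinary identifier and is scoped inside the enum class.
enum class conditional_operator { plus, logical_or, logical_not };

// The alternative tokens are interchangeable with the symbols they stand for.
inline bool either_and_not(bool a, bool b, bool c) {
    return (a or b) and not c;  // same as (a || b) && !c
}

// std::plus and conditional_operator::plus coexist because both are qualified.
inline int add(int x, int y) {
    return std::plus<int>()(x, y);
}

// Small self-check that exercises both functions.
inline void self_check() {
    assert(either_and_not(true, false, false));
    assert(add(2, 3) == 5);
}
```

Because the enumerators live inside an enum class, conditional_operator::plus stays unambiguous even in a file that also says using namespace std.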

The Standard lists keywords in 2.11. There's also a separate list of alternative representations that are reserved and can't be used otherwise, but aren't keywords. and and or are on that list. Section 17.4.3 describes restrictions on programs that use libraries, and 17.4.3.1.3 describes that names declared with external linkage in a header are reserved both in std:: and the global namespace.
In other words, you don't have to go to C++0x to have those problems. and and or are already reserved, and header <functional> contains plus as a templated struct type, and plus is therefore off-limits if <functional> is directly or indirectly #included.
I'm not sure dumping that much stuff into the global namespace was really wise, but that's what the standard says.

It is a 1995 amendment to the C90 standard. A compiler can probably choose how to behave here. GCC probably includes the header as part of the standard library. Microsoft's compiler doesn't, and you have to include iso646.h yourself.
Here is a link to wikipedia regarding this.

Related

Usage of macros in std::string source

I'm writing some c++ code that makes use of std::string.
I wanted to see how the code is written, so I went into the source code (Ctrl + left click).
I noticed, that there are macros everywhere.
The code even ends with:
_STD_END
// Corresponds to: #define _STD_END }
I get why macros are useful, and I use them for my own Log.hpp file, but I don't understand why anyone would use macros such as _STD_END instead of just writing }.
Just to be clear, my question is why the author of std::string, P.J. Plauger, decided to use macros in this way, and whether I should too.
That’s the Dinkumware library, which Microsoft licenses (although they've recently taken over full maintenance of their version). The _STD_BEGIN and _STD_END macros are used for customizing the std namespace. Some compilers don't (didn't?) support namespaces; for those compilers, the macro expansions are empty. Some compilers need some indirection, and those macros expand into directives that put the code into an implementor-specific namespace (i.e., a namespace whose name begins with an underscore followed by a capital letter), which may or may not be complemented by a using-directive to pull the contents of that namespace into std. And in many cases they expand into the obvious, ordinary namespace std { and }, respectively.
In short, they're about configurability for a multi-platform library implementation.
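The indirection variant described above can be sketched roughly as follows. This is not Dinkumware's actual code: the macros MYLIB_BEGIN, MYLIB_END, and MYLIB_INDIRECT and the namespaces mylib and mylib_impl are hypothetical, and they deliberately avoid the underscore-plus-capital form that only an implementation may use.

```cpp
#include <cassert>

// Hypothetical configuration switch: 1 selects the "indirection" variant,
// where code lives in an internal namespace and is then pulled into the
// public one with a using-directive; 0 selects the plain variant.
#define MYLIB_INDIRECT 1

#if MYLIB_INDIRECT
  #define MYLIB_BEGIN namespace mylib_impl {
  #define MYLIB_END   } namespace mylib { using namespace mylib_impl; }
#else
  #define MYLIB_BEGIN namespace mylib {
  #define MYLIB_END   }
#endif

MYLIB_BEGIN
inline int twice(int x) { return 2 * x; }
MYLIB_END

// Client code names mylib::twice regardless of the configuration.
inline void check_twice() { assert(mylib::twice(21) == 42); }
```

Flipping MYLIB_INDIRECT to 0 changes where twice is declared without touching any of the code between the two macros, which is the kind of per-platform configurability the answer refers to.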
I worked for Dinkumware for quite a few years, so I have first-hand knowledge.

Meaning of "reserved for the implementation"

Reading the answer from What are the rules about using an underscore in a c identifier I stumbled across the following quotation:
From the 2003 C++ Standard:
17.4.3.2.1 Global names [lib.global.names]
Certain sets of names and function signatures are always reserved to the implementation:
Each name that contains a double underscore (_ _) or begins with an underscore followed by an uppercase letter (2.11) is reserved to the implementation for any use.
Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace. (165)
165) Such names are also reserved in namespace ::std (17.4.3.1).
What exactly is meant with reserved for the implementation?
It means exactly what it says: you are only allowed to create such names if you are providing a compiler or standard library implementation.
The "implementation" refers to the "implementation of the C++ language". It consists of everything needed to execute a C++ program: A compiler, a standard library, hardware on which to execute, an operating system, a visualization system, input, etc.
The restriction in question means that your compiler may predefine names of the reserved form without telling you, or your standard library implementation may do so. For example, your standard library may define a macro __Foo, so if you tried to use __Foo as an identifier in your source code, you'd actually end up with the macro replacement.
The purpose of reserved names is to give your compiler and standard library freedom to express functionality in plain C++ without worrying about introducing name clashes with user code.
For a vivid example of how this is used in practice, just look at any header file of your standard library implementation.
Some reserved names have actually been made into well-defined, publicly available facilities: __FILE__, __cplusplus, __VA_ARGS__, to name a few. The C language (which has the same rules for reserved identifiers) has been using reserved names exclusively to introduce new keywords (e.g. _Bool).
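As an illustration of reserved-form names that user code may rely on because the standard itself specifies them, here is a small sketch assuming a C++11 compiler. The function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// __FILE__ and __LINE__ are predefined macros specified by the standard,
// so using them is well defined even though they contain double underscores.
inline std::string where_am_i() {
    return std::string(__FILE__) + ":" + std::to_string(__LINE__);
}

// __func__ is a predefined local variable (C++11); the exact string it
// holds is implementation-defined.
inline const char* current_function() {
    return __func__;
}

// __cplusplus expands to a number identifying the language version.
inline long language_version() {
    return __cplusplus;
}

// Small self-check: the location string always contains the separator.
inline void self_check_predefined() {
    assert(where_am_i().find(':') != std::string::npos);
}
```

None of these names is declared by the program; the compiler provides them, which is exactly what the reservation makes possible.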
Implementation here means the combination of the compiler (say gcc, msvc and so on), the standard library (says what features are included in the language), the operating system (Windows, Mac etc.) and the hardware (Intel, ARM and so on).
Depending upon the implementation, certain values are defined which the compiler uses to produce the object code that is specific to the implementation. For example
__TARGET_ARCH_ARM is defined by RealView #Matches first case
_M_ARM is defined by Visual Studio #Matches second case
to identify the CPU manufacturer.
In short, these clauses are meant to discourage you from using macros of the mentioned format.
In fact, n3797 17.6.5.3 (Restrictions on macro definitions) says that if you wish to define macros of the aforementioned formats, they are:
suitable for use in #if preprocessing directives, unless explicitly
stated otherwise.
Example :
#ifndef _M_ARM
#define _M_ARM // Say you're compiling for another platform
#endif
Note
Macros, reserved for implementation, are not restricted to the format mentioned in question. For instance __arm__ is defined by gcc to identify the manufacturer.
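One cautious way to consume such vendor macros is to only read them and to publish the result under a name the project owns. The sketch below assumes the three macros named in this answer (_M_ARM, __arm__, __TARGET_ARCH_ARM) behave as their vendors document; APP_ARCH_ARM and target_architecture are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Read (never define) the vendor macros, and expose the result under a
// project-owned name that does not use a reserved form.
#if defined(_M_ARM) || defined(__arm__) || defined(__TARGET_ARCH_ARM)
  #define APP_ARCH_ARM 1
#else
  #define APP_ARCH_ARM 0
#endif

// Reports which branch of the detection above was taken.
inline std::string target_architecture() {
#if APP_ARCH_ARM
    return "arm";
#else
    return "other";
#endif
}

// Small self-check: exactly one of the two branches must have been chosen.
inline void self_check_arch() {
    assert(target_architecture() == "arm" || target_architecture() == "other");
}
```

The rest of the code base then tests APP_ARCH_ARM only, so the reserved names appear in a single place.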

Is it definitely illegal to refer to a reserved name?

On the std-proposals list, the following code was given:
#include <vector>
#include <algorithm>
#include <iostream>
void foo(const std::vector<int> &v) {
#ifndef _ALGORITHM
    std::for_each(v.begin(), v.end(), [](int i){ std::cout << i; });
#endif
}
Let's ignore, for the purposes of this question, why that code was given and why it was written that way (as there was a good reason but it's irrelevant here). It supposes that _ALGORITHM is a header guard inside the standard header <algorithm> as shipped with some known standard library implementation. There is no inherent intention of portability here.
Now, _ALGORITHM would of course be a reserved name, per:
[C++11: 2.11/3]: In addition, some identifiers are reserved for use by C++ implementations and standard libraries (17.6.4.3.2) and shall not be used otherwise; no diagnostic is required.
[C++11: 17.6.4.3.2/1]: Certain sets of names and function signatures are always reserved to the implementation:
Each name that contains a double underscore _ _ or begins with an underscore followed by an uppercase letter (2.12) is reserved to the implementation for any use.
Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
I was always under the impression that the intent of this passage was to prevent programmers from defining/mutating/undefining names that fall under the above criteria, so that the standard library implementors may use such names without any fear of conflicts with client code.
But, on the std-proposals list, it was claimed that this code is itself ill-formed for merely referring to such a reserved name. I can now see how the use of the phrase "shall not be used otherwise" from [C++11: 2.11/3] may indeed suggest that.
One practical rationale given was that the macro _ALGORITHM could expand to some code that wipes your hard drive, for example. However, taking into account the likely intention of the rule, I'd say that such an eventuality has more to do with the obvious implementation-defined* nature of the _ALGORITHM name, and less to do with it being outright illegal to refer to it.
* "implementation-defined" in its English language sense, not the C++ standard sense of the phrase
I'd say that, as long as we're happy that we are going to have implementation-defined results and that we should investigate what that macro means on our implementation (if it exists at all!), it should not be inherently illegal to refer to such a macro provided we do not attempt to modify it.
For example, code such as the following is used all over the place to distinguish between code compiled as C and code compiled as C++:
#ifdef __cplusplus
extern "C" {
#endif
and I've never heard a complaint about that.
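For reference, a complete version of that idiom might look like the sketch below. The function mylib_add and the file names in the comments are hypothetical; the three parts would normally live in separate files and are shown together only so the sketch compiles on its own.

```cpp
#include <cassert>

// --- would live in mylib.h (hypothetical), readable by both C and C++ ---
#ifdef __cplusplus
extern "C" {
#endif

int mylib_add(int a, int b);

#ifdef __cplusplus
}  /* closes extern "C" */
#endif

// --- would live in mylib.c or mylib.cpp ---
// The definition keeps the C linkage given by the declaration above.
int mylib_add(int a, int b) { return a + b; }

// --- a C++ caller ---
inline void check_mylib() { assert(mylib_add(2, 3) == 5); }
```

The only reserved-form name involved is __cplusplus, and it is only tested, never defined or undefined.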
So, what do you think? Does "shall not be used otherwise" include simply writing such a name? Or is it probably not intended to be so strict (which may point to an opportunity to adjust the standard wording)?
Whether it's legal or not is implementation-specific (and identifier-specific).
When the Standard gives the implementation the sole right to use these names, that includes the right to make the names available in user code. If an implementation does so, great.
But if an implementation doesn't expressly give you the right, it is clear from "shall not be used otherwise" that the Standard does not, and you have undefined behavior.
The important part is "reserved to the implementation". It means that the compiler vendor may use those names and even document them. Your code may then use those names as documented. This is often used for extensions like __builtin_expect, where the compiler vendor avoids any clash with your identifiers (that are declared by your code) by using those reserved names. Even the standard uses such names for things like __func__ to make sure it doesn't break existing (legal) code when adding new features.
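A hedged sketch of using a documented reserved name: __builtin_expect is a GCC and Clang extension, so it is touched only when the compiler identifies itself accordingly. APP_LIKELY and safe_divide are hypothetical names.

```cpp
#include <cassert>

// Use the documented extension only where the vendor provides it; fall back
// to the plain condition elsewhere.
#if defined(__GNUC__) || defined(__clang__)
  #define APP_LIKELY(cond) (__builtin_expect(!!(cond), 1))
#else
  #define APP_LIKELY(cond) (cond)
#endif

// Divides, treating a zero denominator as the unlikely case.
inline int safe_divide(int numerator, int denominator) {
    if (APP_LIKELY(denominator != 0)) {
        return numerator / denominator;
    }
    return 0;  // fallback for the unlikely case
}

// Small self-check covering both branches.
inline void self_check_divide() {
    assert(safe_divide(10, 2) == 5);
    assert(safe_divide(1, 0) == 0);
}
```

The hint affects only optimization; the result of safe_divide is the same with or without it.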
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1882
Each identifier that contains a double underscore __ or begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.
any use. (similar text occurs both before and after that defect fix is applied)
__cplusplus is defined by the standard. _ALGORITHM is reserved by the standard to be used by implementations. These seem quite different? (The two sections of the standard do conflict, in that one states that __cplusplus is reserved for any use, and another uses it specifically, but I think that the winner of that conflict is clear).
The _ALGORITHM identifier could, under the standard, be used as part of a pre-processing step to say "replace this source code with hard drive deleting code". Its existence (prior to pre-processing, or after) could be sufficient to completely change your program behavior.
Now this is unlikely, but I do not think it results in a non-conforming implementation. It is a matter of quality of implementation only.
An implementation is free to document and define what _ALGORITHM means. For example, it could document that it is a header guard for <algorithm>, and indicates if that header file has been included. Treating your current <algorithm> implementation as documentation is probably going too far.
I'd guess using __cplusplus in C mode is technically "just as bad" as using _ALGORITHM, but this question is a c++ question, not a c question. I haven't delved into the c standard to look for quotes about it.
The names in [cpp.predefined] are different. Those have a specified meaning, so an implementation can't reserve them for any use, and using them in a program has a well-defined portable meaning. Using an implementation-specific identifier like the example of _ALGORITHM is ill-formed because it violates a shall-rule.
Yes, I'm fully aware of multiple examples where the library specification uses "shall" to mean "this is a requirement on user code, and violations are UB, not ill-formed".
Regarding whether it's UB or implementation-defined: running an ill-formed program results in UB. The standard wording clearly says the program is ill-formed; UB occurs if the implementation still chooses to accept the program and run it.
So, if a program uses the identifier _ALGORITHM, that program is ill-formed, and running such a program is UB, but that does not mean it doesn't work fine on an implementation that uses _ALGORITHM as an include guard, nor does it mean that it doesn't work fine on an implementation that doesn't.
If users are concerned about such ill-formedness and potential UB, and said users want to write portable C++, they shouldn't use reserved identifiers in portable C++ programs. If users accept that regardless of the standard prohibiting such a use, no practical implementation will wipe your hard drive, they can freely use such reserved identifiers, but by the letter of the standard, such uses are still ill-formed.
Historically, the purpose of making the use of such tokens "undefined behavior" is that compilers are free to attach any meaning they want to any such token that is not defined within the C standard. For example, on some embedded processors, using __xdata as a storage class for a variable will ask that it be stored in an area of RAM which is slower to access than the normal variable-storage area, but is much larger. On typical processors of that family, storage for "normal" variables would be limited to about 100 bytes, but storage for xdata variables may be much larger, up to 64K. The standard says basically nothing about what compilers are allowed to do with such directives, although (I'm not sure if the standard mandates this behavior, though I'm unaware of compilers violating it) such tokens are generally ignored within code that is disabled using #if or similar directives.
Some libraries' header files will start their own internal identifiers with two underscores followed by a pattern that's unlikely to be used by a compiler for any purpose (e.g. version 23 of the Foozle library might prefix its identifiers with __FZ23). It would be perfectly legitimate for a future compiler to use identifiers starting with __FZ23 for other purposes, and if that were to happen the Foozle library would need to be changed to use something else. If, however, a major compiler upgrade would likely necessitate rewrites of the Foozle library for other reasons anyway, that risk may be acceptable compared to the risk of identifiers conflicting with outside code.
Note also that some project header files which are targeted toward a processor that requires __ directives may conditionally define macros with those names when compiled for other processors, for example:
#ifndef USE_XDATA
#define __XDATA
#endif
though a somewhat better pattern would generally be:
#ifdef USE_XDATA
#define XDATA __XDATA
#else
#define XDATA
#endif
When writing new code, the latter pattern is often better, but the former pattern may sometimes be useful when adapting existing code written on a platform that requires __XDATA so that it may be used both on platforms that use/require that directive and on platforms that do not.
Whether or not it is legal is a matter of local law. Whether it means anything, and if so, what, is a matter for the language definition. When you use a name that's reserved to the implementation the behavior of your program is undefined. That means that the language definition does not tell you what the program does. Nothing more, nothing less. If the compiler you're using documents what a particular reserved identifier does, then you can use that identifier with that compiler. If you hunt through headers and guess what various un-documented identifiers mean you might be able to use them, but don't be surprised if your code breaks when a subsequent update changes something.
Don't get hung up on __cplusplus. It's core language, and the stuff about double underscores, etc. is library. If that's not convincing, just consider it a glitch. You can use __cplusplus in C++ programs; its meaning is well defined.

C++ using C code using double underscores in defines and identifiers

I understand that in C++ double underscores in identifiers are reserved for the compiler. I have some C code which has characteristics similar to this in the corresponding header files:
extern "C" {
#define HELLO__THERE 1
int hello__out__there( int );
}
I will be using this header in a C++ project, and plan to be doing things in C++ like:
if (HELLO__THERE == abc)
hello__out__there(foo);
Is this acceptable behavior in C++, covered by the standard?
In the C++03 standard 17.4.3.1.2 Global names, that use of underscores is defined as reserved:
Each name that contains a double underscore (_ _) or begins with an underscore followed by an uppercase letter (2.11) is reserved to the implementation for any use.
Being reserved means that it might be used in any conforming implementation and therefore it is not advisable to use it.
You should be fine, unless by some fluke one of the defines clashes with one of your compiler's. If that is the case, you'll likely get a warning or error (depending on your compiler's configuration) about a duplicate symbol.
Hope it helps. Cheers!
The function call would be OK, but why compare HELLO__THERE to some value abc? If you were testing to see whether a function was there, I would wrap it in #ifdef ... #endif instead, because if hello__out__there is not defined for some reason that would be a compile error.
double underlines in identifiers are reserved for the compiler
First, I guess you mean underscores. Second, such identifiers are reserved. That doesn't prevent you from using them. You can use them (as long as there is no naming conflict).
Is this acceptable behavior in C++, covered by the standard?
Yes, it's acceptable. However, there is a difference between acceptable and good code. If you follow proper coding guidelines, your code will be good as well as acceptable. IMHO, you should refer to some good coding standards on the internet; it will help you a lot.

difference between cstdint and tr1/cstdint

What is the difference between <cstdint> and <tr1/cstdint>? (apart from that one puts things in namespace std:: and the other in std::tr1::)
Since this stuff isn't standard yet, I guess it's compiler specific, so I'm talking about gcc. To compile with the non-tr1 one I must compile with -std=c++0x, but there is no such restriction when using tr1.
Is the answer perhaps that there is none, but you can't go around adding things to std:: unless they're, well, standard? So until C++0x is standardised an error must be issued when using <cstdint>, but you don't need to worry when adding to the tr1:: namespace, which makes no claim to things in it being standard? Or is there more to this?
Thanks.
p.s - If you read "std" as standard, as I do, I do apologise for the overuse of the word in this Q.
At least as far as I know, there was no intent to change <cstdint> between TR1 and C++0x. There's no requirement for #including <cstdint> to result in an error though -- officially, it's nothing more or less than undefined behavior. An implementation is allowed to specify exact behavior, and in this case it does.
I think you've got it. On my system, they're very similar, but with different macro logic. For instance, /usr/include/c++/4.4/tr1/cstdint has:
# define _GLIBCXX_BEGIN_NAMESPACE_TR1 namespace tr1 {
# define _GLIBCXX_END_NAMESPACE_TR1 }
# define _GLIBCXX_TR1 tr1::
but /usr/include/c++/4.4/cstdint has:
# define _GLIBCXX_BEGIN_NAMESPACE_TR1
# define _GLIBCXX_END_NAMESPACE_TR1
# define _GLIBCXX_TR1
So if it's being included as <cstdint> the TR1 namespace is simply defined into oblivion.
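The effect of those macro definitions can be imitated outside the library. The sketch below is only an analogy: user code may not add declarations to std, so it uses a hypothetical namespace demo, a hypothetical switch DEMO_USE_TR1, and macros named DEMO_* that mirror the _GLIBCXX_* ones quoted above.

```cpp
#include <cassert>

// Hypothetical switch: define DEMO_USE_TR1 to get the <tr1/...> flavour.
#ifdef DEMO_USE_TR1
  #define DEMO_BEGIN_NAMESPACE_TR1 namespace tr1 {
  #define DEMO_END_NAMESPACE_TR1   }
  #define DEMO_TR1 tr1::
#else
  #define DEMO_BEGIN_NAMESPACE_TR1  /* expands to nothing */
  #define DEMO_END_NAMESPACE_TR1    /* expands to nothing */
  #define DEMO_TR1                  /* expands to nothing */
#endif

namespace demo {
DEMO_BEGIN_NAMESPACE_TR1
typedef unsigned int demo_uint;  // stands in for a <cstdint> typedef
inline demo_uint make_value() { return 7u; }
DEMO_END_NAMESPACE_TR1
}  // namespace demo

// Resolves to demo::make_value() or demo::tr1::make_value().
inline unsigned int read_value() {
    return demo::DEMO_TR1 make_value();
}

// Small self-check: the value is the same in either configuration.
inline void self_check_tr1() { assert(read_value() == 7u); }
```

With DEMO_USE_TR1 undefined, all three macros vanish and the declarations land directly in demo, which corresponds to the tr1 namespace being "defined into oblivion".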
<tr1/cstdint> is defined, as the name suggests, in TR1, while <cstdint> is defined in C++0x.
From the gcc manual, -std=c++0x is needed to enable experimental features that are likely to be included in C++0x. However, <tr1/cstdint> is defined in TR1, not C++0x, so -std=c++0x is not needed.
The following is the gcc manual entry for -std=c++0x, for your reference.
The working draft of the upcoming ISO C++0x standard. This
option enables experimental features that are likely to be
included in C++0x. The working draft is constantly changing,
and any feature that is enabled by this flag may be removed
from future versions of GCC if it is not part of the C++0x
standard.