Definition of an "expression" in the C and C++ standards

Definition of an "expression" in the C and C++ standards - c++

I'm asking this question because I'm updating my C and C++ course materials and I've had past students ask about it...
From ISO/IEC 9899:2017 section 6.5 Expressions ¶1 (and similar in the C++ standard):
"An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof. …"
Because the standards writers obviously choose their words carefully, the use of the phrase "sequence of operators and operands" seems potentially misleading to me. It seems to indicate that to be considered an expression there must be more than one operator and also more than one operand. Thus, literals like 123 or variables like XYZ would not be considered expressions because there is no operator, and they certainly can't be considered operands if there is no operator.
However, if 123 and XYZ actually are expressions, wouldn't replacing the phrase "sequence of operators and operands" with "sequence of one or more characters" or something similar be more accurate?
Please tell me what I am misinterpreting about what the standard is stating.

and similar in the C++ standard
I don't know about the C standard, but the C++ standard puts this statement in a non-normative notation. It has no normative value to C++, so it should be read as colloquial.

You forgot of Primary expressions that have a separate definition in (6.5.1).
You just confused different entities; the definition you provided describes exactly what it should describe.
6.5.1 Primary expressions
Syntax:
primary-expression:
identifier
constant
string-literal
(expression)

Yes, the definition of "expression" in the C standard is incomplete -- but not in a way that causes any actual problems (other than to picky people like me).
The word "expression" in the text you quoted is in italics, which means that that is the official definition of the term. It's clear from other parts of the standard that 123, for example, is an expression: it's a decimal-constant, which is an integer-constant, which is a constant, which is a primary-expression`, which is a postfix-expression, which (skipping multiple steps) is an expression.
It is not "a sequence of operators and operands". There is no operator, which implies that 123 is not an operand (this can be demonstrated by referring to the definitions of operator and operand elsewhere in the standard).
In practice, I've never heard of anyone, either a compiler implementer or a C programmer, having any real difficulty because of this incomplete definition. Compiler implementers refer to the language grammar. C programmers probably get a pretty good idea of what an "expression" is before reading the standard.
I'd like to see the definition of expression updated in a new edition of the standard. A definition that refers to the grammar rather than attempting an English description would IMHO be an improvement.
But if it isn't updated, we'll all keep using expressions without any problems.
As for C++, Nicol Bolas's answer correctly points out that the C++ standard doesn't have a formal definition of "expression" like the C standard does. It does have similar wording at the top of Clause 8: "An expression is a
sequence of operators and operands that specifies a computation." -- but the word "expression" is not in italics and that sentence is part of a "Note", and is therefore non-normative. In C++, the standard defines expressions syntactically.

Related

Why is the semicolon at the end of the init-statement within the for statement mandatory?

This is how the C++17 standard defines the for statement:
for ( init-statement conditionₒₚₜ ; expressionₒₚₜ ) statement
I've also looked in https://en.cppreference.com/w/cpp/language/for:
attr(optional) for ( init-statement condition(optional) ; iteration_expression(optional) ) statement
Therefore, I can only understand that init-statement isn't optional when using a for loop. In other words, one has to initialize some variable inside the header in order to work with this flow of control mechanism. Well, this doesn't seem to be the case, because when I type code such as
for (; a != b; ++a)
Presuming that I have already defined these variables, it runs just fine. Not only did I not declare a variable in the header, but also referenced other variables previously defined outside the loop. If I were to provide an init-statement, its object would only be usable inside the for loop, but it seems I can use variables declared elsewhere just fine.
Having come to this conclusion, I thought I didn't need the first part: tried removing the semicolon to make it more readable (and well, just for the heck of it). It won't compile now. Compiler says it expected a ;, calculates a != b as if it weren't inside a for loop: "C4552: '!=': result of expression not used" and finally concludes: "C2143: syntax error: missing ';' before ')'".
You don't need an init-statement, but you do need the semicolon. Why didn't these resources I cited initially made this clear, or is it implied in some way I am blind to?

The semicolon is mandatory because init-statement includes the semicolon.
Quote from N3337 6.5 Iteration statements:
for (for-init-statement condition_{opt}; expression_{opt}) statement
for-init-statement:
expression-statement
simple-declaration
6.2 Expression statement:
expression-statement:
expression_{opt} ;
7 Declarations:
simple-declaration:
decl-specifier-seq_{opt} init-declaratior-list_{opt} ;
attribute-specifier-seq decl-specifier-seq_{opt} init-declarator-list ;

If you look up init-statement and trace through the possibilities you'll find that they all require a trailing semi-colon.
iteration-statement:
    for ( for-init-statement conditionₒₚₜ ; expressionₒₚₜ ) statement
for-init-statement:
    expression-statement
    simple-declaration
expression-statement:
    expressionₒₚₜ ;
simple-declaration:
    attribute-specifier-seqₒₚₜ decl-specifier-seqₒₚₜ init-declarator-listₒₚₜ ;
A for-init-statement is either an expression-statement or a simple-declaration. An expression-statement has an optional expression but a mandatory semi-colon. Similarly, a simple-declaration's components are all optional except for the final semi-colon.

Why didn't these resources I cited initially made this clear, or is it implied in some way I am blind to?
Language Standard
The first resource you cited is the C++17 standard. The language standard is written to be precise and consistent. Often (as in your case) details are deferred to other sections so that each definition appears only once, no matter how often it is used. Ease of reading by the casual coder is not a priority. The first sentence of the standard sets the scope of the document.
This document specifies requirements for implementations of the C++ programming language.
Note who the requirements are for. They are for implementations, which are more commonly, yet imprecisely, called "compilers". The standard is not written for those writing C++ programs; it is written for those writing C++ compilers (and the other parts of the implementation). That is why this resource is not concerned about making points clear to a programmer's perspective.
cppreference.com
Coders often have to do a lot of interpretation to extract what they need from the standard. That is why there are books teaching the language and resources like your second citation, cppreference.com. Contrast the standard's stated purpose with that of cppreference.com:
Our goal is to provide programmers with a complete online reference for the C and C++ languages and standard libraries, i.e. a more convenient version of the C and C++ standards.
That website's target audience is programmers, rather than implementors. Hence the harsh precision of the standard has been diluted with a bit more explanation. It tries to strike a balance between rigorous correctness and readability, with a touch of empathy for the coder who needs to check a detail of some aspect of the language.
In this particular case, I think they did a reasonable job. I guess you just overlooked the explanations?
You copied the "formal syntax" line. Right after that is an "informal syntax" line that replaces the confusing init-statement with declaration-or-expression(optional) ; making it clear that the semicolon is required, but nothing need appear before it.
Right after the the syntax lines is a list of relevant definitions. The definition for init-statement calls out that it "may be a null statement ;". Also, the definition ends with
Note that any init-statement must end with a semicolon ;, which is why it is often described informally as an expression or a declaration followed by a semicolon.
Note that the definition is directly in the relevant page. This is one way the website differs in presentation from the standard. Since the website is not the ultimate authority, there is less fallout if it is inconsistent. That gives it the freedom to repeat itself and risk definitions not being identical. Case in point: the definition of init-statement is repeated in the if statement article (where the init-statement is optional).
Perhaps the important takeaway is to read those definitions instead of assuming you already know what the meanings are?

In the comma operator, is the left operand guaranteed not to be actually executed if it hasn't side effects?

To show the topic I'm going to use C, but the same macro can be used also in C++ (with or without struct), raising the same question.
I came up with this macro
#define STR_MEMBER(S,X) (((struct S*)NULL)->X, #X)
Its purpose is to have strings (const char*) of an existing member of a struct, so that if the member doesn't exist, the compilation fails. A minimal usage example:
#include <stdio.h>
struct a
{
int value;
};
int main(void)
{
printf("a.%s member really exists\n", STR_MEMBER(a, value));
return 0;
}
If value weren't a member of struct a, the code wouldn't compile, and this is what I wanted.
The comma operator should evaluate the left operand and then discard the result of the expression (if there is one), so that my understanding is that usually this operator is used when the evaluation of the left operand has side effects.
In this case, however, there aren't (intended) side effects, but of course it works iff the compiler doesn't actually produce the code which evaluates the expression, for otherwise it would access to a struct located at NULL and a segmentation fault would occur.
Gcc/g++ 6.3 and 4.9.2 never produced that dangerous code, even with -O0, as if they were always able to “see” that the evaluation hasn't side effects and so it can be skipped.
Adding volatile in the macro (e.g. because accessing that memory address is the desired side effect) was so far the only way to trigger the segmentation fault.
So the question: is there anything in the C and C++ languages standard which guarantees that compilers will always avoid actual evaluation of the left operand of the comma operator when the compiler can be sure that the evaluation hasn't side effects?
Notes and fixing
I am not asking for a judgment about the macro as it is and the opportunity to use it or make it better. For the purpose of this question, the macro is bad if and only if it evokes undefined behaviour — i.e., if and only if it is risky because compilers are allowed to generate the “evaluation code” even when this hasn't side effects.
I have already two obvious fixes in mind: “reifying” the struct and using offsetof. The former needs an accessible memory area as big as the biggest struct we use as first argument of STR_MEMBER (e.g. maybe a static union could do…). The latter should work flawlessly: it gives an offset we aren't interested in and avoids the access problem — indeed I'm assuming gcc, because it's the compiler I use (hence the tag), and that its offsetof built-in behaves.
With the offsetof fix the macro becomes
#define STR_MEMBER(S,X) (offsetof(struct S,X), #X)
Writing volatile struct S instead of struct S doesn't cause the segfault.
Suggestions about other possible “fixes” are welcome, too.
Added note
Actually, the real usage case was in C++ in a static storage struct. This seems to be fine in C++, but as soon as I tried C with a code closer to the original instead of the one boiled for this question, I realized that C isn't happy at all with that:
error: initializer element is not constant
C wants the struct to be initializable at compile time, instead C++ it's fine with that.

Is there anything in the C and C++ languages standard which guarantees that compilers will always avoid actual evaluation of the left operand of the comma operator ?
It's the opposite. The standard guarantees that the left operand IS evaluated (really it does, there aren't any exceptions). The result is discarded.
Note: for lvalue expressions, "evaluate" does not mean "access the stored value". Instead, it means to work out where the designated memory location is. The other code encompassing the lvalue expression may or may not then go on to access the memory location. The process of reading from the memory location is known as "lvalue conversion" in C, or "lvalue to rvalue conversion" in C++.
In C++ a discarded-value expression (such as the left operand of the comma operator) only has lvalue to rvalue conversion performed on it if it is volatile and also meets some other criteria (see C++14 [expr]/11 for detail). In C lvalue conversion does occur for expressions whose result is not used (C11 6.3.2.1/2).
In your example, it is moot whether or not lvalue conversion happens. In both languages X->Y, where X is a pointer, is defined as (*X).Y; in C the act of applying * to a null pointer already causes undefined behaviour (C11 6.5.3/3), and in C++ the . operator is only defined for the case when the left operand actually designates an object (C++14 [expr.ref]/4.2).

The comma operator (C documentation, says something very similar) has no such guarantees.
In a comma expression E1, E2, the expression E1 is evaluated, its result is discarded ..., and its side effects are completed before evaluation of the expression E2 begins
irrelevant information omitted
To put it simply, E1 will be evaluated, although the compiler might optimize it away by the as-if rule if it is able to determine that there are no side-effects.

Gcc/g++ 6.3 and 4.9.2 never produced that dangerous code, even with -O0, as if they were always able to “see” that the evaluation hasn't side effects and so it can be skipped.
clang will produce code which raises an error if you pass it the -fsanitize=undefined option. Which should answer your question: at least one major implementation's developers clearly consider the code as having undefined behaviour. And they are correct.
Suggestions about other possible “fixes” are welcome, too.
I would look for something which is guaranteed not to evaluate the expression. Your suggestion of offsetof does the job, but may occasionally cause code to be rejected that would otherwise be accepted, such as when X is a.b. If you want that to be accepted, my thought would be to use sizeof to force an expression to remain unevaluated.

You ask,
is there anything in the C and C++ languages standard which guarantees
that compilers will always avoid actual evaluation of the left operand
of the comma operator when the compiler can be sure that the
evaluation hasn't side effects?
As others have remarked, the answer is "no". On the contrary, the standards both unconditionally state that the left-hand operand of the comma operator is evaluated, and that the result is discarded.
This is of course a description of the execution model of an abstract machine; implementations are permitted to work differently, so long as the observable behavior is the same as the abstract machine behavior would produce. If indeed evaluation of the left-hand expression produces no side effects, then that would permit skipping it altogether, but there is nothing in either standard that provides for requiring that it be skipped.
As for fixing it, you have various options, some of which apply only to one or the other of the two languages you have named. I tend to like your offsetof() alternative, but others have noted that in C++, there are types to which offsetof cannot be applied. In C, on the other hand, the standard specifically describes its application to structure types, but says nothing about union types. Its behavior on union types, though very likely to be consistent and natural, as technically undefined.
In C only, you could use a compound literal to avoid the undefined behavior in your approach:
#define HAS_MEMBER(T,X) (((T){0}).X, #X)
That works equally well on structure and union types (though you need to provide a full type name for this version, not just a tag). Its behavior is well defined when the given type does have such a member. The expansion violates a language constraint -- thus requiring a diagnostic to be emitted -- when the type does not have such a member, including when it is neither a structure type nor a union type.
You might also use sizeof, as #alain suggested, because although the sizeof expression will be evaluated, its operand will not be evaluated (except, in C, when its operand has variably-modified type, which will not apply to your use). I think this variation will work in both C and C++ without introducing any undefined behavior:
#define HAS_MEMBER(T,X) (sizeof(((T *)NULL)->X), #X)
I have again written it so that it works for both structs and unions.

The left operand of the comma operator is a discarded-value expression
5 Expressions
11 In some contexts, an expression only appears for its side effects. Such an expression is called a discarded-value
expression. The expression is evaluated and its value is discarded.
[...]
There are also unevaluated operands which, as the name implies, are not evaluated.
8 In some contexts, unevaluated operands appear (5.2.8, 5.3.3, 5.3.7,
7.1.6.2). An unevaluated operand is not evaluated. An unevaluated operand is considered a full-expression. [...]
Using a discarded-value expression in your use case is undefined behavior, but using an unevaluated operand is not.
Using sizeof for example would not cause UB because it takes an unevaluated operand.
#define STR_MEMBER(S,X) (sizeof(S::X), #X)
sizeof is preferable to offsetof, because offsetof can't be used for static members and classes that are not standard-layout:
18 Language support library
4 The macro offsetof(type, member-designator) accepts a restricted
set of type arguments in this International Standard. If type is not a
standard-layout class (Clause 9), the results are undefined. [...] The result of applying the offsetof macro to a field that
is a static data member or a function member is undefined. [...]

The language doesn't need to say anything about "actual execution" because of the as-if rule. After all, with no side effects how could you tell whether the expression is evaluated? (Looking at the assembly or setting breakpoints doesn't count; that's not part of execution of the program, which is all the language describes.)
On the other hand, dereferencing a null pointer is undefined behavior, so the language says nothing at all about what happens. You can't expect as-if to save you: as-if is a relaxation of otherwise-plausible restrictions on the implementation, and undefined behavior is a relaxation of all restrictions on the implementation. There is therefore no "conflict" between "this doesn't have side effects, so we can ignore it" and "this is undefined behavior, so nasal demons"; they're on the same side!

Differences in C and C++ with sequence points and UB

I used this post Undefined Behavior and Sequence Points to document undefined behavior(UB) in a C program and it was pointed to me that C and C++ have their own divergent rules for this [sequence points]. So what are the differences between C and C++ when it comes to sequence points and related UB? Can’t I use a post about C++ sequences to analyze what is happening in C code?
* Of Course I am not talking about features of C++ not applicable to C.

There are two parts to this question, we can tackle a comparison of sequence points rules without much trouble. This does not get us too far though, C and C++ are different languages which have different standards(the latest C++ standard is almost twice as large as the the latest C standard) and even though C++ uses C as a normative reference it would be incorrect to quote the C++ standard for C and vice versa, regardless how similar certain sections may be. The C++ standard does explicitly reference the C standard but that is for small sections.
The second part is a comparison of undefined behavior between C and C++, there can be some big differences and enumerating all the differences in undefined behavior may not be possible but we can give some indicative examples.
Sequence Points
Since we are talking about sequence points then this is covering pre C++11 and pre C11. The sequence point rules do not differ greatly as far as I can tell between C99 and Pre C++11 draft standards. As we will see in some of the example I give of differing undefined behavior the sequence point rules do not play a part in them.
The sequence points rules are covered in the closest draft C++ standard to C++03 section 1.9 Program execution which says:
There is a sequence point at the completion of evaluation of each full-expression12).
When calling a function (whether or not the function is inline), there is a sequence point after the evaluation of all
function arguments (if any) which takes place before execution of any expressions or statements in the function body.
There is also a sequence point after the copying of a returned value and before the execution of any expressions outside
the function13). Several contexts in C++ cause evaluation of a function call, even though no corresponding function call
syntax appears in the translation unit. [ Example: evaluation of a new expression invokes one or more allocation and
constructor functions; see 5.3.4. For another example, invocation of a conversion function (12.3.2) can arise in contexts
in which no function call syntax appears. —end example ] The sequence points at function-entry and function-exit
(as described above) are features of the function calls as evaluated, whatever the syntax of the expression that calls the
function might be.
In the evaluation of each of the expressions
a && b
a || b
a ? b : c
a , b
using the built-in meaning of the operators in these expressions (5.14, 5.15, 5.16, 5.18), there is a sequence point after
the evaluation of the first expression14).
I will use the sequence point list from the draft C99 standard Annex C which although it is not normative I can find no disagreement with the normative sections it references. It says:
The following are the sequence points described in 5.1.2.3:
The call to a function, after the arguments have been evaluated (6.5.2.2).
The end of the first operand of the following operators: logical AND && (6.5.13);
logical OR || (6.5.14); conditional ? (6.5.15); comma , (6.5.17).
The end of a full declarator: declarators (6.7.5);
The end of a full expression: an initializer (6.7.8); the expression in an expression
statement (6.8.3); the controlling expression of a selection statement (if or switch)
(6.8.4); the controlling expression of a while or do statement (6.8.5); each of the
expressions of a for statement (6.8.5.3); the expression in a return statement
(6.8.6.4).
The following entries do not seem to have equivalents in the draft C++ standard but these come from the C standard library which C++ incorporates by reference:
Immediately before a library function returns (7.1.4).
After the actions associated with each formatted input/output function conversion
specifier (7.19.6, 7.24.2).
Immediately before and immediately after each call to a comparison function, and
also between any call to a comparison function and any movement of the objects
passed as arguments to that call (7.20.5).
So there is not much of a difference between C and C++ here.
Undefined Behavior
When it comes to the typical examples of sequence points and undefined behavior, for example those covered in Section 5 Expression dealing with modifying a variable more than once within a sequence points I can not come up with an example that is undefined in one but not the other. In C99 it says:
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an
expression.72) Furthermore, the prior value shall be read only to
determine the value to be stored.73)
and it provides these examples:
i = ++i + 1;
a[i++] = i;
and in C++ it says:
Except where noted, the order of evaluation of operands of individual
operators and subexpressions of individual expressions, and the order
in which side effects take place, is unspecified.57) Between the
previous and next sequence point a scalar object shall have its stored
value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be accessed only to determine the
value to be stored. The requirements of this paragraph shall be met
for each allowable ordering of the subexpressions of a full
expression; otherwise the behavior is undefined
and provides these examples:
i = v[i ++]; / / the behavior is undefined
i = ++ i + 1; / / the behavior is undefined
In C++11 and C11 we do have one major difference which is covered in Assignment operator sequencing in C11 expressions which is the following:
i = ++i + 1;
This is due to the result of pre-increment being an lvalue in C++11 but not in C11 even though the sequencing rules are the same.
We do have major difference in areas that have nothing to do with sequence points:
In C what uses of an indeterminate value is undefined has always been well specified while in C++ it was not until the recent draft C++1y standard that it has been well specified. This is covered in my answer to Has C++ standard changed with respect to the use of indeterminate values and undefined behavior in C++1y?
Type punning through a union has always been well defined in C but not in C++ or at least it is hotly debatable whether it is undefined behavior or not. I have several references to this in my answer to Why does optimisation kill this function?
In C++ simply falling off the end of value returning function is undefined behavior while in C it is only undefined behavior if you use the value.
There are probably plenty more examples but these are ones I have written about before.

Is there a valid C++11 program with the expression 'C++11'?

The name of the programming language C++ derives from the parent language C and the ++ operator (it should arguably be ++C) and, hence, the expression C++ may naturally occur in C++ programs. I was wondering whether you can write a valid C++ program using the 2011 standard (without extensions) and containing the expression C++11 not within quotes and after pre-processing (note: edited the requirement, see also answer).
Obviously, if you could write a C++ program prior to the 2011 standard with the expressions C++98 or C++03, then the answer is a trivial yes. But I don't think that was possible (though I don't really know). So, can it be done with the new armory of C++11?

NO if we require the characters C++11 to be outside any literal, after preprocessing -- because at translation phase 7 the three tokens will be identifier, ++ and integer-literal
The first two tokens are a postfix-expression, the later is a primary.
There is no reduction in the grammar that can contain these two nonterminals in sequence, so any program containing C++11 will fail syntax analysis.
However, if you do not consider character literals to be strings, then the answer is YES as you can contain it in a wide character literal:
int main()
{
wchar_t x = L'C++11';
}
which does not use the preprocessor or a string literal, and the construct is required to be supported by the standard:
The value of a wide-character literal containing multiple c-chars is implementation-defined.

So, can it be done with the new armory of C++11?
No.

Define “valid C++ program”.
The C++ standard defines a “well-formed C++ program” as “a C++ program constructed according to the syntax rules, diagnosable semantic rules, and the One Definition Rule”. This leaves open the possibility of C++ programs that are not well-formed. (C explicitly has the notion of programs that are conforming but not strictly conforming, e.g., that use extensions of a particular compiler.)
If you consider it valid to use extensions, then you can implement a C++ compiler that permits C++11 in some context.

What is the meaning of "qualifier"?

What is the meaning of "qualifier" and the difference between "qualifier" and "keyword"?
For the volatile qualifier in C and we can say that volatile is a keyword, so what is the meaning of "qualifier"?

A qualifier adds an extra "quality", such as specifying volatility or constness of a variable. They're similar to adjectives: "a fickle man", "a volatile int", "an incorruptible lady", "a const double". With or without a qualifier, the variable itself still occupies the same amount of memory, and each bit has the same interpretation or contribution to the state/value. Qualifiers just specify something about how it may be accessed or where it is stored.
keywords are predefined reserved identifiers (arguably, see below) that the language itself assigns some meaning to, rather than leaving free for you to use for your own purposes (i.e. naming your variables, types, namespaces, functions...).
Examples
volatile and const are both qualifiers and keywords
if, class, namespace are keywords but not qualifiers
std, main, iostream, x, my_counter are all identifiers but neither keywords nor qualifiers
There's a full list of keywords at http://www.cppreference.com/wiki/keywords/start. C++ doesn't currently have any qualifiers that aren't keywords (i.e. they're all "words" rather than some punctuation symbols).
Where do qualifiers appear relative to other type information?
A quick aside from "what does qualifier mean" into the syntax of using a qualifier - as Zaibis comments below:
...[qualifiers] only qualify what follows [when] there is nothing preceding. so if you want a const pointer to non-const object you had to write char * const var...
A bit (lot?) about identifiers
identifiers themselves are lexical tokens (distinct parts of the C++ source code) that:
begin with a alpha/letter character or underscore
continue with 0 or more alphanumerics or underscores
If it helps, you can think of identifiers as specified by the regexp "[A-Za-z_][A-Za-z_0-9]*". Examples are "egg", "string", "__f", "x0" but not "4e4" (a double literal), "0x0a" (that's a hex literal), "(f)" (that's three lexical tokens, the middle being the identifier "f").
But are keywords identifiers?
For C++, the terminology isn't used consistently. In general computing usage, keywords are a subset of identifiers, and some places/uses in the C++11 Standard clearly reflect that:
"The identifiers shown in Table 4 are reserved for use as keywords" (first sentence in 2.12 Keywords)
"Identifiers that are keywords or operators in C++..." (from 17.6.1.2 footnote 7)
(There are alternative forms of some operators - not, and, xor, or - though annoyingly Visual C++ disables them by default to avoid breaking old code that used them but not as operators.)
As Potatoswatter points out in a comment, in many other places the Standard defines lexical tokens identifier and keyword as mutually exclusive tokens in the Grammar:
"There are five kinds of tokens: identifiers, keywords, ..." (2.7 Tokens)
There's also an edge case where the determination's context sensitive:
If a keyword (2.12) or an alternative token (2.6) that satisfies the syntactic requirements of an identifier (2.11) is contained in an attribute-token, it is considered an identifier. (7.6.1. Attribute Syntax and Semantics 2)
Non-keyword identifiers you still shouldn't use
Some identifiers, like "std" or "string", have a specific usage specified in the C++ Standard - they are not keywords though. Generally, the compiler itself doesn't treat them any differently to your own code, and if you don't include any Standard-specified headers then the compiler probably won't even know about the Standard-mandated use of "std". You might be able to create your own function, variable or type called "std". Not a good idea though... while it's nice to understand the general division between keywords and the Standard library, implementations have freedom to blur the boundaries so you should just assume C++ features work when relevant headers are included and your usage matches documentation, and not do anything that might conflict.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js