What is the meaning of "qualifier"? - c++

What is the meaning of "qualifier" and the difference between "qualifier" and "keyword"?
Take the volatile qualifier in C, for example: we can say that volatile is a keyword, so what is the meaning of "qualifier"?

A qualifier adds an extra "quality", such as specifying volatility or constness of a variable. They're similar to adjectives: "a fickle man", "a volatile int", "an incorruptible lady", "a const double". With or without a qualifier, the variable itself still occupies the same amount of memory, and each bit has the same interpretation or contribution to the state/value. Qualifiers just specify something about how it may be accessed or where it is stored.
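For instance (an illustrative sketch of my own, not from the original answer), both objects below are ordinary ints/doubles; the qualifier only constrains how they may be accessed:

volatile int sensor_value = 0;       // "volatile int": may change outside the program's control,
                                     // so the compiler must not optimize away reads of it
const double pi = 3.14159265358979;  // "const double": may not be modified after initialization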
keywords are predefined reserved identifiers (arguably, see below) that the language itself assigns some meaning to, rather than leaving free for you to use for your own purposes (i.e. naming your variables, types, namespaces, functions...).
Examples
volatile and const are both qualifiers and keywords
if, class, namespace are keywords but not qualifiers
std, main, iostream, x, my_counter are all identifiers but neither keywords nor qualifiers
There's a full list of keywords at http://www.cppreference.com/wiki/keywords/start. C++ doesn't currently have any qualifiers that aren't keywords (i.e. they're all "words" rather than some punctuation symbols).
Where do qualifiers appear relative to other type information?
A quick aside from "what does qualifier mean" into the syntax of using a qualifier - as Zaibis comments below:
...[qualifiers] only qualify what follows [when] there is nothing preceding. so if you want a const pointer to non-const object you had to write char * const var...
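An illustrative sketch of that placement rule (my own, not part of the quoted comment):

char buffer[] = "hi";
const char *p1 = buffer;        // pointer to const char: *p1 cannot be modified through p1
char *const p2 = buffer;        // const pointer to char: p2 cannot be reseated
const char *const p3 = buffer;  // const pointer to const char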
A bit (lot?) about identifiers
identifiers themselves are lexical tokens (distinct parts of the C++ source code) that:
begin with an alpha/letter character or an underscore
continue with 0 or more alphanumerics or underscores
If it helps, you can think of identifiers as specified by the regexp "[A-Za-z_][A-Za-z_0-9]*". Examples are "egg", "string", "__f", "x0" but not "4e4" (a double literal), "0x0a" (that's a hex literal), "(f)" (that's three lexical tokens, the middle being the identifier "f").
But are keywords identifiers?
For C++, the terminology isn't used consistently. In general computing usage, keywords are a subset of identifiers, and some places/uses in the C++11 Standard clearly reflect that:
"The identifiers shown in Table 4 are reserved for use as keywords" (first sentence in 2.12 Keywords)
"Identifiers that are keywords or operators in C++..." (from 17.6.1.2 footnote 7)
(There are alternative forms of some operators - not, and, xor, or - though annoyingly Visual C++ disables them by default to avoid breaking old code that used them but not as operators.)
As Potatoswatter points out in a comment, in many other places the Standard defines lexical tokens identifier and keyword as mutually exclusive tokens in the Grammar:
"There are five kinds of tokens: identifiers, keywords, ..." (2.7 Tokens)
There's also an edge case where the determination's context sensitive:
If a keyword (2.12) or an alternative token (2.6) that satisfies the syntactic requirements of an identifier (2.11) is contained in an attribute-token, it is considered an identifier. (7.6.1. Attribute Syntax and Semantics 2)
Non-keyword identifiers you still shouldn't use
Some identifiers, like "std" or "string", have a specific usage specified in the C++ Standard - they are not keywords though. Generally, the compiler itself doesn't treat them any differently to your own code, and if you don't include any Standard-specified headers then the compiler probably won't even know about the Standard-mandated use of "std". You might be able to create your own function, variable or type called "std". Not a good idea though... while it's nice to understand the general division between keywords and the Standard library, implementations have freedom to blur the boundaries so you should just assume C++ features work when relevant headers are included and your usage matches documentation, and not do anything that might conflict.

Related

Why is the semicolon at the end of the init-statement within the for statement mandatory?

This is how the C++17 standard defines the for statement:
for ( init-statement conditionₒₚₜ ; expressionₒₚₜ ) statement
I've also looked in https://en.cppreference.com/w/cpp/language/for:
attr(optional) for ( init-statement condition(optional) ; iteration_expression(optional) ) statement
Therefore, I can only understand that init-statement isn't optional when using a for loop. In other words, one has to initialize some variable inside the header in order to work with this flow of control mechanism. Well, this doesn't seem to be the case, because when I type code such as
for (; a != b; ++a)
Presuming that I have already defined these variables, it runs just fine. Not only did I not declare a variable in the header, but also referenced other variables previously defined outside the loop. If I were to provide an init-statement, its object would only be usable inside the for loop, but it seems I can use variables declared elsewhere just fine.
Having come to this conclusion, I thought I didn't need the first part: I tried removing the semicolon to make it more readable (and well, just for the heck of it). It won't compile now. The compiler says it expected a ;, evaluates a != b as if it weren't inside a for loop ("C4552: '!=': result of expression not used"), and finally concludes: "C2143: syntax error: missing ';' before ')'".
You don't need an init-statement, but you do need the semicolon. Why didn't these resources I cited initially make this clear, or is it implied in some way I am blind to?
The semicolon is mandatory because init-statement includes the semicolon.
Quote from N3337 6.5 Iteration statements:
for ( for-init-statement conditionₒₚₜ ; expressionₒₚₜ ) statement
for-init-statement:
    expression-statement
    simple-declaration
6.2 Expression statement:
expression-statement:
    expressionₒₚₜ ;
7 Declarations:
simple-declaration:
    decl-specifier-seqₒₚₜ init-declarator-listₒₚₜ ;
    attribute-specifier-seq decl-specifier-seqₒₚₜ init-declarator-list ;
If you look up init-statement and trace through the possibilities you'll find that they all require a trailing semi-colon.
iteration-statement:
    for ( for-init-statement conditionₒₚₜ ; expressionₒₚₜ ) statement
for-init-statement:
    expression-statement
    simple-declaration
expression-statement:
    expressionₒₚₜ ;
simple-declaration:
    attribute-specifier-seqₒₚₜ decl-specifier-seqₒₚₜ init-declarator-listₒₚₜ ;
A for-init-statement is either an expression-statement or a simple-declaration. An expression-statement has an optional expression but a mandatory semi-colon. Similarly, a simple-declaration's components are all optional except for the final semi-colon.
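A short illustration of that grammar (my own sketch, not from the original answer): in each loop below the init-statement is present, even when it is just a bare semicolon:

#include <vector>

int sum(const std::vector<int>& v) {
    int total = 0;
    auto a = v.begin(), b = v.end();
    for (; a != b; ++a)              // init-statement is the null expression-statement ";"
        total += *a;
    for (int i = 0; i < 3; ++i)      // init-statement is the simple-declaration "int i = 0;"
        total += i;
    return total;
}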
Why didn't these resources I cited initially make this clear, or is it implied in some way I am blind to?
Language Standard
The first resource you cited is the C++17 standard. The language standard is written to be precise and consistent. Often (as in your case) details are deferred to other sections so that each definition appears only once, no matter how often it is used. Ease of reading by the casual coder is not a priority. The first sentence of the standard sets the scope of the document.
This document specifies requirements for implementations of the C++ programming language.
Note who the requirements are for. They are for implementations, which are more commonly, yet imprecisely, called "compilers". The standard is not written for those writing C++ programs; it is written for those writing C++ compilers (and the other parts of the implementation). That is why this resource is not concerned about making points clear to a programmer's perspective.
cppreference.com
Coders often have to do a lot of interpretation to extract what they need from the standard. That is why there are books teaching the language and resources like your second citation, cppreference.com. Contrast the standard's stated purpose with that of cppreference.com:
Our goal is to provide programmers with a complete online reference for the C and C++ languages and standard libraries, i.e. a more convenient version of the C and C++ standards.
That website's target audience is programmers, rather than implementors. Hence the harsh precision of the standard has been diluted with a bit more explanation. It tries to strike a balance between rigorous correctness and readability, with a touch of empathy for the coder who needs to check a detail of some aspect of the language.
In this particular case, I think they did a reasonable job. I guess you just overlooked the explanations?
You copied the "formal syntax" line. Right after that is an "informal syntax" line that replaces the confusing init-statement with declaration-or-expression(optional) ; making it clear that the semicolon is required, but nothing need appear before it.
Right after the syntax lines is a list of relevant definitions. The definition for init-statement calls out that it "may be a null statement ;". Also, the definition ends with
Note that any init-statement must end with a semicolon ;, which is why it is often described informally as an expression or a declaration followed by a semicolon.
Note that the definition is directly in the relevant page. This is one way the website differs in presentation from the standard. Since the website is not the ultimate authority, there is less fallout if it is inconsistent. That gives it the freedom to repeat itself and risk definitions not being identical. Case in point: the definition of init-statement is repeated in the if statement article (where the init-statement is optional).
Perhaps the important takeaway is to read those definitions instead of assuming you already know what the meanings are?

Definition of an "expression" in the C and C++ standards

I'm asking this question because I'm updating my C and C++ course materials and I've had past students ask about it...
From ISO/IEC 9899:2017 section 6.5 Expressions ¶1 (and similar in the C++ standard):
"An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof. …"
Because the standards writers obviously choose their words carefully, the use of the phrase "sequence of operators and operands" seems potentially misleading to me. It seems to indicate that to be considered an expression there must be more than one operator and also more than one operand. Thus, literals like 123 or variables like XYZ would not be considered expressions because there is no operator, and they certainly can't be considered operands if there is no operator.
However, if 123 and XYZ actually are expressions, wouldn't replacing the phrase "sequence of operators and operands" with "sequence of one or more characters" or something similar be more accurate?
Please tell me what I am misinterpreting about what the standard is stating.
and similar in the C++ standard
I don't know about the C standard, but the C++ standard puts this statement in a non-normative note. It has no normative value in C++, so it should be read as colloquial.
You forgot about primary expressions, which have a separate definition in 6.5.1.
You just confused different entities; the definition you provided describes exactly what it should describe.
6.5.1 Primary expressions
Syntax:
primary-expression:
    identifier
    constant
    string-literal
    ( expression )
Yes, the definition of "expression" in the C standard is incomplete -- but not in a way that causes any actual problems (other than to picky people like me).
The word "expression" in the text you quoted is in italics, which means that that is the official definition of the term. It's clear from other parts of the standard that 123, for example, is an expression: it's a decimal-constant, which is an integer-constant, which is a constant, which is a primary-expression`, which is a postfix-expression, which (skipping multiple steps) is an expression.
It is not "a sequence of operators and operands". There is no operator, which implies that 123 is not an operand (this can be demonstrated by referring to the definitions of operator and operand elsewhere in the standard).
In practice, I've never heard of anyone, either a compiler implementer or a C programmer, having any real difficulty because of this incomplete definition. Compiler implementers refer to the language grammar. C programmers probably get a pretty good idea of what an "expression" is before reading the standard.
I'd like to see the definition of expression updated in a new edition of the standard. A definition that refers to the grammar rather than attempting an English description would IMHO be an improvement.
But if it isn't updated, we'll all keep using expressions without any problems.
As for C++, Nicol Bolas's answer correctly points out that the C++ standard doesn't have a formal definition of "expression" like the C standard does. It does have similar wording at the top of Clause 8: "An expression is a sequence of operators and operands that specifies a computation." -- but the word "expression" is not in italics and that sentence is part of a "Note", and is therefore non-normative. In C++, the standard defines expressions syntactically.

Is it safe to use "yes","no","i","out" as name for variables/enum?

I have read the document about naming rule of C++, they seems to be usable names.
However, in practice, when I try to create a variable/enum with a name like iter, yes, no, out, i, Error, etc., Visual Studio strangely uses an italic font for them.
I can only guess that they are reserved for something special, and that the IDE (e.g. the refactoring/rename process) might act strangely if I use such names.
Is it safe to use those names in practice, or am I just being too paranoid?
Sorry if this is too much of a newbie question or an inappropriate one.
I have had doubts about this for a few weeks but was too afraid to ask.
These names are valid and will not cause any "harm"; the standard only says:
Each name that contains a double underscore (__) or begins with an underscore followed by an uppercase letter (2.11) is reserved to the implementation for any use.
Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
Which means that all your names are fine to use in user-code. Visual Studio might just have a thing for these names as i and iter are usually used in looping.
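As a quick sanity check (a sketch of my own, not from the answer), all of these names compile as ordinary identifiers:

#include <iostream>

enum class Answer { yes, no };

int main() {
    Answer a = Answer::yes;
    int i = 0, out = 42;
    const char* iter = "just an identifier";
    std::cout << (a == Answer::yes) << ' ' << i << ' ' << out << ' ' << iter << '\n';
}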
These names are not reserved in standard C++, as explained by Rick Astley. An implementation may choose to accept additional reserved words to provide language extensions, such as ref class in C++/CLI. In some cases, such as with ref class, where ref is a contextual keyword, these extensions only make otherwise ill-formed programs well-formed in the scope of the extended language. In other cases, an otherwise well-formed program may change its meaning or become ill-formed. In the former case, the implementation is still conforming to the C++ standard, as long as it issues all mandatory diagnostics; in the latter case, it is certainly not conforming.
It is considered good practice to make the latter kind of extensions optional e.g. using a command line option, so that the implementation still has a mode in which it is fully standards compliant. My immediate guess is that VC++ in fact does allow you to write well-formed programs containing yes, no, i, iter which will behave as required by the standard (implementation bugs notwithstanding).
The IDE is a different beast, though. It is considered to be outside of the scope of the C++ standard, and might discourage or even stop you from writing perfectly well-formed code. That would still be a quality of implementation issue, or an issue of customer satisfaction, if you will.

Problems with matches containing space for gtksourceview?

I'm working on improving syntax highlighting for Ada in gtksourceview (currently, it is very outdated and very incomplete). An issue I'm having is that Ada is very positional, so matching many constructs requires matching those positions. I was able to do this in nano fairly easily.
So, let's consider a type declaration such as:
type Trit is range 0..2;
Keywords like "type", "is" and "range" are recognized (and were originally). However, type names were treated as keywords (a bad design decision, as Ada regularly defines new types, even for simple types like integers). What the use gets, is the types in Standard being colored, and all other types looking like normal text, defeating the purpose of highlighting. In some languages this might be a notable problem. However, the majority of type names occur after two regex patterns:
type\s+(\w|\.|_)+
:\s+(\w|\.|_)+
It might just be a matter of implementation (nano and gtksourceview seem to use different regex implementations). I thought the problem was recognizing spaces. As it turns out, putting the type context above the keyword context results in types now being highlighted, but then the "type" keyword and the ":" operator are not highlighted properly (they are highlighted as "type"). I was able to override this in nano, resulting in correct highlighting, but I cannot seem to find out how gtksourceview does this.
Here you can see the old gtksourceview definition in action, which doesn't work for a file with many custom types. My nano definition in action is side by side for comparison; matching by position is definitely possible and works.
Here is what happens when I put the type context below the keyword context.
Here is what happens when I put the type context above the keyword context.
In both cases the context is the same, just a simple pattern to get started.
<context id="type" style-ref="type">
<match>(type)\s+\w+</match>
</context>
You may want to consider generating the parser from the formal description of the syntax of Ada in annex P of the Language Reference Manual.
Unfortunately this doesn't answer your question of how to formulate the syntax for a GtkSourceView.

Compiler: limitation of lexical analysis

In classic Compiler theory, the first 2 phases are Lexical Analysis and Parsing. They're in a pipeline. Lexical Analysis recognizes tokens as the input of Parsing.
But I came across some cases which are hard to recognize correctly during Lexical Analysis. For example, the following code involving C++ templates:
map<int, vector<int>>
the >> would be recognized as a bitwise right shift by a "regular" Lexical Analysis, but that's not correct. My feeling is that it's hard to divide the handling of this kind of grammar into 2 phases; the lexing work has to be done in the parsing phase, because correctly parsing the >> relies on the grammar, not just a simple lexical rule.
I'd like to know the theory and practice behind this problem. Also, I'd like to know how C++ compilers handle this case.
The C++ standard requires that an implementation perform lexical analysis to produce a stream of tokens, before the parsing stage. According to the lexical analysis rules, two consecutive > characters (not followed by =) will always be interpreted as one >> token. The grammar provided with the C++ standard is defined in terms of these tokens.
The requirement that in certain contexts (such as when expecting a > within a template-id) the implementation should interpret >> as two > is not specified within the grammar. Instead the rule is specified as a special case:
14.2 Names of template specializations [temp.names]
After name lookup (3.4) finds that a name is a template-name or that an operator-function-id or a literal-operator-id refers to a set of overloaded functions any member of which is a function template, if this is followed by a <, the < is always taken as the delimiter of a template-argument-list and never as the less-than operator. When parsing a template-argument-list, the first non-nested > is taken as the ending delimiter rather than a greater-than operator. Similarly, the first non-nested >> is treated as two consecutive but distinct > tokens, the first of which is taken as the end of the template-argument-list and completes the template-id. [ Note: The second > token produced by this replacement rule may terminate an enclosing template-id construct or it may be part of a different construct (e.g. a cast). — end note ]
Note the earlier rule, that in certain contexts < should be interpreted as the < in a template-argument-list. This is another example of a construct that requires context in order to disambiguate the parse.
The C++ grammar contains many such ambiguities which cannot be resolved during parsing without information about the context. The most well known of these is known as the Most Vexing Parse, in which an identifier may be interpreted as a type-name depending on context.
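For readers unfamiliar with it, a classic illustration of the most vexing parse (my own example, not from the answer):

struct Timer {};
struct TimeKeeper { explicit TimeKeeper(Timer) {} };

TimeKeeper tk(Timer());    // declares a function tk taking an (unnamed) pointer to a
                           // function returning Timer -- not a TimeKeeper object
TimeKeeper tk2{Timer()};   // C++11 brace-initialization: unambiguously an object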
Keeping track of the aforementioned context in C++ requires an implementation to perform some semantic analysis in parallel with the parsing stage. This is commonly implemented in the form of semantic actions that are invoked when a particular grammatical construct is recognised in a given context. These semantic actions then build a data structure that represents the context and permits efficient queries. This is often referred to as a symbol table, but the structure required for C++ is pretty much the entire AST.
These kind of context-sensitive semantic actions can also be used to resolve ambiguities. For example, on recognising an identifier in the context of a namespace-body, a semantic action will check whether the name was previously defined as a template. The result of this will then be fed back to the parser. This can be done by marking the identifier token with the result, or replacing it with a special token that will match a different grammar rule.
The same technique can be used to mark a < as the beginning of a template-argument-list, or a > as the end. The rule for context-sensitive replacement of >> with two > poses essentially the same problem and can be resolved using the same method.
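A small example of that >> rule in practice (my own sketch, not part of the answer):

#include <map>
#include <vector>

std::map<int, std::vector<int>> m;   // since C++11, ">>" here is split into two ">" tokens;
                                     // in C++03 a space was required: std::vector<int> >

template <int N> struct Fixed { static const int value = N; };

Fixed<(16 >> 2)> f;                  // OK: parenthesized, so ">>" is a right shift
// Fixed<16 >> 2> g;                 // ill-formed: the first ">" would end the template-argument-list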
You are right, the theoretical clean distinction between lexer and parser is not always possible. I remember a project I worked on as a student. We were to implement a C compiler, and the grammar we used as a basis would treat typedef'd names as types in some cases and as identifiers in others. So the lexer had to switch between these two modes. The way I implemented this back then was using special empty rules, which reconfigured the lexer depending on context. To accomplish this, it was vital to know that the parser would always use exactly one token of look-ahead. So any change to lexer behaviour would have to occur at least one lexical token before the affected location. In the end, this worked quite well.
In the C++ case of >> you mention, I don't know what compilers actually do. willj quoted how the specification phrases this, but implementations are allowed to do things differently internally, as long as the visible result is the same. So here is how I'd try to tackle it: upon reading a >, the lexer would emit the token GREATER, but also switch to a state where each subsequent > without a space in between would be lexed as GREATER_REPEATED. Any other symbol would switch the state back to normal. Instead of state switches, you could also do this by lexing the regular expression >+ and emitting multiple tokens from this rule. In the parser, you could then use rules like the following:
rightAngleBracket: GREATER | GREATER_REPEATED;
rightShift: GREATER GREATER_REPEATED;
With a bit of luck, you could make template argument rules use rightAngleBracket, while expressions would use rightShift. Depending on how much look-ahead your parser has, it might be necessary to introduce additional non-terminals to hold longer sequences of ambiguous content, until you encounter some context which allows you to eventually make the decision between these cases.
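A rough sketch of that lexer state switch in C++ (my own interpretation of the description above, not the answerer's code): after a >, immediately following > characters are emitted as GREATER_REPEATED, and any other character switches the state back:

#include <cctype>
#include <string>
#include <vector>

enum class Tok { GREATER, GREATER_REPEATED, OTHER };

std::vector<Tok> lex_angles(const std::string& src) {
    std::vector<Tok> out;
    bool after_greater = false;                   // the lexer "state" described above
    for (char c : src) {
        if (c == '>') {
            out.push_back(after_greater ? Tok::GREATER_REPEATED : Tok::GREATER);
            after_greater = true;
        } else {
            if (!std::isspace(static_cast<unsigned char>(c)))
                out.push_back(Tok::OTHER);        // stand-in for every other kind of token
            after_greater = false;                // any other symbol resets the state
        }
    }
    return out;
}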