What does an `auto and T` parameter mean in C++20? [duplicate] - c++

I'm used to the and and or keywords in C++. I've always used them, and typing them is fast and comfortable for me. I once heard that these aliases are non-standard and may not work on all compilers, but I'm not sure whether that's true.
Let's assume that I give someone my code: will they have problems compiling it?
Is it all right to use and, or instead of &&, ||? Or are these keywords really non-standard?
P.S. I use the MinGW compiler.

They are in fact standard in C++, as defined by the ISO 14882:2003 C++ standard 2.5/2 (and, indeed, as defined by the 1998 edition of the standard). Note that they are built into the language itself and don't require that you include a header file of some sort.
However, they are very rarely used, and I have yet to see production code that actually uses the alternative tokens. The only reason the alternative tokens exist in the first place is that the corresponding characters were either nonexistent or clumsy to type on some keyboards (especially non-QWERTY ones). They're still in the standard for backwards compatibility.
Even though they are standard, I highly recommend that you don't use them. The alternative tokens require more characters to type, and the QWERTY keyboard layout already has all the characters needed to type out C++ code without having to use the alternative tokens. Also, they would most likely bewilder readers of your code.
2.5/2 Alternative tokens
In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling. The set of alternative tokens is defined in Table 2.
Table 2 - alternative tokens
+--------------+-----------+
| Alternative | Primary |
+--------------+-----------+
| <% | { |
| %> | } |
| <: | [ |
| :> | ] |
| %: | # |
| %:%: | ## |
| and | && |
| bitor | | |
| or | || |
| xor | ^ |
| compl | ~ |
| bitand | & |
| and_eq | &= |
| or_eq | |= |
| xor_eq | ^= |
| not | ! |
| not_eq | != |
+--------------+-----------+

These keywords ARE standard and are described in section 2.5 of the standard. Table 2 is a table of these "alternative tokens". You can use them all you want, even though everyone will hate you if you do.
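For instance, here is a minimal, self-contained sketch (nothing in it is specific to any project) showing that the keyword forms compile as standard C++ with no extra header:

#include <iostream>

int main() {
    bool a = true, b = false;
    if (a and not b)             // same as: if (a && !b)
        std::cout << "and/not work\n";
    if (a or b)                  // same as: if (a || b)
        std::cout << "or works\n";
    int mask = 0b1010;
    mask and_eq 0b0110;          // same as: mask &= 0b0110;  mask is now 0b0010
    std::cout << mask << '\n';   // prints 2
    return 0;
}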

They are standard in the new C++0x standard as well. Up-to-date compilers should recognise them, although I don't believe they are obliged to yet. Whatever floats your boat, I assume.

They're standard C++, but with older compilers, and possibly also with MSVC 10.0 (I haven't checked), you may have to include a special header, [isosomethingsomething.h].
Cheers & hth.,

I have always mixed up the ^ (xor) and ~ (bitwise complement) operators. With the alternative tokens (which I believe should be the primary ones) there is no question about what they do, so yes, I agree with the earlier posters that the textual ones are much more descriptive.
There is another possible mistake with the symbolic operators: it is possible to forget one of the characters in || or &&, which will cause subtle bugs and strange behaviour.
With the textual operators it is much harder to make such a mistake.
I believe these are real, valid arguments for improving code safety and clarity. Most C++ programmers SHOULD, in my opinion, try to get used to the textual operators in favour of the old cryptic ones.
I am surprised that so few programmers know about them. These operators should have taken over a long time ago, as I see it.

Wow, I've been using and looking at many C++ code examples for years and never, until now, knew about these, so I guess that means most people don't use them. So, for the sake of consistency (if you plan on working on group projects, etc.) it's probably best to make a habit of using && and ||.

It's syntactically valid, given those are required alternative tokens.
In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling.
However, the primary tokens they stand for are used for more than just logical or bitwise operators. So one can see the idiosyncratic:
auto& foo(auto and T) { // C++20 forwarding reference
    return T;
}
Or even
auto& foo(auto bitand T) { // lvalue reference
    return T;
}
That's gonna make one scratch their head for a while.
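Spelled with the primary tokens, the first example above is just a C++20 abbreviated function template taking a forwarding reference (auto and value means auto&& value). A minimal sketch along those lines; the function and variable names here are made up purely for illustration:

#include <type_traits>

// "auto and value" is read by the compiler as "auto&& value", i.e. a
// forwarding-reference parameter spelled with the alternative token.
bool is_lvalue_argument(auto and value) {
    return std::is_lvalue_reference_v<decltype(value)>;
}

int main() {
    int x = 0;
    bool a = is_lvalue_argument(x);   // true:  value deduced as int&
    bool b = is_lvalue_argument(1);   // false: value deduced as int&&
    return (a && !b) ? 0 : 1;
}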

Section 2.5 of the ISO/IEC 14882:1998 standard (the original C++ standard) says:
§2.5 Alternative tokens [lex.digraph]
1 Alternative token representations are provided for some operators and punctuators. 16)
2 In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling. 17) The set of alternative tokens is defined in Table 2.
16) These include “digraphs” and additional reserved words. The term “digraph” (token consisting of two characters) is not perfectly descriptive, since one of the alternative preprocessing tokens is %:%: and of course several primary tokens contain two characters. Nonetheless, those alternative tokens that aren’t lexical keywords are colloquially known as “digraphs”.
17) Thus the “stringized” values (16.3.2) of [ and <: will be different, maintaining the source spelling, but the tokens can otherwise be freely interchanged.
Table 2—alternative tokens
_______________________________________________________________________________
alternative primary | alternative primary | alternative primary
<% { | and && | and_eq &=
%> } | bitor | | or_eq |=
<: [ | or || | xor_eq ^=
:> ] | xor ^ | not !
%: # | compl ~ | not_eq !=
%:%: ## | bitand & |
_______________________________________________________________________________
There is no discussion of 'if you include some header' (though in C, you need #include <iso646.h>). Any implementation that does not support the keywords or digraphs is not compliant with the 1998 edition, let alone later editions, of the C++ standard.

Obviously, in regards to backward compatibility, the "and/or" keywords are not the issue. I would believe them to be the newer standard. It is just old programmers not understanding that some noob might have to be able to read the code and not want to look up what && means. Then again, if any IT department is worth its salt it will make the programmers conform to the standards of the company! That is my belief, so (and/or) are futuristic and a real possible standard going towards the future. && is backward compatible, not (pun intended) (and/or).

Related

Perl 6 Grammar doesn't match like I think it should

I'm doing Advent of Code day 9:
You sit for a while and record part of the stream (your puzzle input). The characters represent groups - sequences that begin with { and end with }. Within a group, there are zero or more other things, separated by commas: either another group or garbage. Since groups can contain other groups, a } only closes the most-recently-opened unclosed group - that is, they are nestable. Your puzzle input represents a single, large group which itself contains many smaller ones.
Sometimes, instead of a group, you will find garbage. Garbage begins with < and ends with >. Between those angle brackets, almost any character can appear, including { and }. Within garbage, < has no special meaning.
In a futile attempt to clean up the garbage, some program has canceled some of the characters within it using !: inside garbage, any character that comes after ! should be ignored, including <, >, and even another !.
Of course, this screams out for a Perl 6 Grammar...
grammar Stream
{
    rule TOP { ^ <group> $ }
    rule group { '{' [ <group> || <garbage> ]* % ',' '}' }
    rule garbage { '<' [ <garbchar> | <garbignore> ]* '>' }
    token garbignore { '!' . }
    token garbchar { <-[ !> ]> }
}
This seems to work fine on simple examples, but it goes wrong with two garbchars in a row:
say Stream.parse('{<aa>}');
gives Nil.
Grammar::Tracer is no help:
TOP
| group
| | group
| | * FAIL
| | garbage
| | | garbchar
| | | * MATCH "a"
| | * FAIL
| * FAIL
* FAIL
Nil
Multiple garbignores are no problem:
say Stream.parse('{<!!a!a>}');
gives:
「{<!!a!a>}」
group => 「{<!!a!a>}」
garbage => 「<!!a!a>」
garbignore => 「!!」
garbchar => 「a」
garbignore => 「!a」
Any ideas?
UPD: Given that the Advent of Code problem doesn't mention whitespace, you shouldn't be using the rule construct at all. Just switch all the rules to tokens and you should be set. In general, follow Brad's advice: use token unless you know you need a rule (discussed below) or a regex (if you need backtracking).
My original answer below explored why the rules didn't work. I'll leave it in for now.
TL;DR <garbchar> | contains a space. Whitespace that directly follows any atom in a rule indicates a tokenizing break. You can simply remove this inappropriate space, i.e. write <garbchar>| instead (or better still, <.garbchar>| if you don't need to capture the garbage) to get the result you seek.
As your original question allowed, this isn't a bug, it's just that your mental model is off.
Your answer correctly identifies the issue: tokenization.
So what we're left with is your follow up question, which is about your mental model of tokenization, or at least how Perl 6 tokenizes by default:
why ... my second example ... goes wrong with two garbchars in a row:
'{<aa>}'
Simplifying, the issue is how to tokenize this:
aa
The simple high level answer is that, in parsing vernacular, aa will ordinarily be treated as one token, not two, and, by default, Perl 6 assumes this ordinary definition. This is the issue you're encountering.
You can overrule this ordinary definition to get any tokenizing result you care to achieve. But it's seldom necessary to do so and it certainly isn't in simple cases like this.
I'll provide two redundant paths that I hope might lead folk to the correct mental model:
For those who prefer diving straight into nitty gritty detail, there's a reddit comment I wrote recently about tokenization in Perl 6.
The rest of this SO answer provides a high level discussion that complements the low level explanation in my reddit comment.
Excerpting from the "Obstacles" section of the wikipedia page on tokenization, and interleaving the excerpts with P6 specific discussion:
Typically, tokenization occurs at the word level. However, it is sometimes difficult to define what is meant by a "word". Often a tokenizer relies on simple heuristics, for example:
Punctuation and whitespace may or may not be included in the resulting list of tokens.
In Perl 6 you control what gets included or not in the parse tree using capturing features that are orthogonal to tokenizing.
All contiguous strings of alphabetic characters are part of one token; likewise with numbers.
Tokens are separated by whitespace characters, such as a space or line break, or by punctuation characters.
By default, the Perl 6 design embodies an equivalent of these two heuristics.
The key thing to get is that it's the rule construct that handles a string of tokens, plural. The token construct is used to define a single token per call.
I think I'll end my answer here because it's already getting pretty long. Please use the comments to help us improve this answer. I hope what I've written so far helps.
A partial answer to my own question: Change all the rules to tokens and it works.
It makes sense, because the difference is :sigspace, which we don't need or want here. What I don't understand, though, is why it did work for some input, like my second example.
The resulting code is here, if you're interested.

Extracting information using BNF grammars

I would like to extract information from a body of text and be able to query it.
The structure of this body of text would be specified by a BNF grammar (or variant), and the information to extract would be specified at runtime (the syntax of the query does not matter at the moment).
So the requirements are simple, really:
Receive some structured body of text
Load it in an exploitable form using a grammar to parse it
Run a query to select some portions of it
To illustrate with an example, suppose that we have such grammar (in a customized BNF format):
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<id> ::= 15 * <digit>
<hex> ::= 10 * (<digit> | a | b | c | d | e | f)
<anything> ::= <digit> | .... (all characters)
<match> ::= <id> (" " <hex>)*
<nomatch> ::= "." <anything>*
<line> ::= (<match> | <nomatch> | "") [<CR>] <LF>
<text> ::= <line>+
For which such text would be conforming:
012345678901234
012345678901234 abcdef0123
Nor the previous line nor this one would match
And then I would want to list all tags that appear in the rule, so for example using an XPath-like syntax:
match//id
which would return a list.
This sounds relatively easy, except that I have two big constraints:
the BNF grammar should be read at runtime (from a string/vector like structure)
the queries will be read at runtime too
Some precisions:
the grammar is not expected to change often so a "compilation" step to produce an in-memory structure is acceptable (and perhaps necessary to achieve good speed)
speed is of the essence, bonus points for on-the-fly collection of the wanted portions
bonus points for the possibility to have callbacks to disambiguate (sometimes the necessary disambiguation information might require DB access for example)
bonus points for multipart grammars (favoring modularity and reuse of grammar elements)
I know of lex/yacc and flex/bison, for example; however, they appear to only create C/C++ code to be compiled, which is not what I am looking for.
Do you know of a robust library (preferably free and open-source) that can transform a BNF grammar into a parser "on-the-fly" and produce a structured in-memory output from a body of text using this parser ?
EDIT: I am open to alternatives. At the moment, the idea was that perhaps regexes could allow this extraction, however given the complexity of the grammars involved, this could get ugly quickly and thus maintaining the regexes would be quite a horrendous task. Furthermore, by separating grammars and extraction I hope to be able to reuse the same grammar for different extractions needs rather than having slightly different regexes each time.
I have a proprietary solution that can convert grammar source into an in-memory representation. The result is a pure data structure. Any code can use it. I also have a C++ class that actually implements the parser. Rule handlers are implemented as virtual methods.
The primary difference between our solution and YACC/Bison is that no C/C++ code is generated. This means that the grammar can be reloaded without recompiling the app. The grammar can be annotated with application IDs that are used in the code of the rule handlers.
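Purely as an illustration of what such a "pure data structure" might look like, here is a hypothetical C++ sketch; the product described is proprietary, so none of these names come from a real API:

#include <string>
#include <vector>

struct Symbol {
    bool terminal;      // literal text, or a reference to another rule
    std::string text;   // the literal, or the referenced rule's name
};

struct Alternative {
    std::vector<Symbol> symbols;       // one right-hand side of a rule
};

struct Rule {
    std::string name;                  // e.g. "match", "line"
    std::vector<Alternative> alternatives;
    int applicationId = 0;             // annotation consumed by rule handlers
};

struct Grammar {
    std::vector<Rule> rules;           // can be reloaded at runtime, no codegen
};

// Rule handlers as virtual methods, as the answer suggests.
class RuleHandler {
public:
    virtual ~RuleHandler() = default;
    virtual void onReduce(const Rule& rule,
                          const std::vector<std::string>& matchedText) = 0;
};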
The GOLD parser system produces an LALR parse table that is, as far as I know, loaded at runtime. I believe it has a C++ "parsing" engine, so it should be easy to integrate.
You'd read your grammar, fork a subprocess to get the GOLD parser generator to produce the table, and then call your wired-in GOLD parser to load-and-parse.
I don't know how you attach actions to the reductions, which you'd probably like to do.
I have no specific experience with GOLD. "Gold" luck to you.

Why are Clojure's multimethods better than 'if' or 'case' statements

I've spent some time trying to understand Clojure multimethods. The main "pro" multimethod argument, as far as I understand, is their flexibility; however, I'm confused by the argumentation of why multimethods are better than a simple if or case statement. Could someone please explain where the line between polymorphism and an overglorified case statement is drawn?
EDIT: I should have been clearer in the question, that I'm more interested in comparison with the 'if' statement. Thanks a lot for the answers!
Say we have types A, B, C, D and E, and methods m1, m2, m3 taking single argument of the previous types. You can put them in a table like this:
   | A | B | C | D | E |
m1 |   |   |   |   |   |
m2 |   |   |   |   |   |
m3 |   |   |   |   |   |
The "switch"-statement strategy is to implement one row of this table at a time. Suppose you add a new type F: you'll have to modify all the existing implementations to support it.
The class-based polymorphism (C++, Java, etc.) allows you to implement a whole column instead. Adding a new type is thus easy, as you don't have to change the already defined classes. But adding a new method is hard, as you'll have to add it to all other types.
Multimethods allow you to implement single cells of the table independently of each other.
This flexibility is even greater if you have to dispatch on multiple arguments. Each new argument adds another dimension to this table, and both switch-based and class-based dispatch become very complex pretty quickly (cf. the Visitor pattern).
Note, that multimethods are actually even more generic than depicted, as you can dispatch on pretty much anything, not just on the types of the arguments.
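To make the row/column picture concrete in a mainstream class-based language, here is a rough, purely illustrative C++ sketch. It shows the trade-off described above, not Clojure multimethods themselves, and all the names are made up:

// Class-based dispatch: one class implements one whole "column".
// Adding a new type means one new class; adding a new method m3
// means editing every existing class.
struct Shape {
    virtual ~Shape() = default;
    virtual double m1() const = 0;   // one virtual per row
    virtual double m2() const = 0;
};

struct A : Shape {
    double m1() const override { return 1.0; }
    double m2() const override { return 2.0; }
};

// Switch-based dispatch: one function owns a whole "row" and must be
// edited every time a new type is added.
enum class Kind { A, B, C };
double m1_switch(Kind k) {
    switch (k) {
        case Kind::A: return 1.0;
        case Kind::B: return 2.0;
        case Kind::C: return 3.0;
    }
    return 0.0;
}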
The difference between multimethods and a big if/case statement is that, to add new cases to the if statement, you need to modify the function that contains it. With multimethods, you can add a new method without touching the previously existing methods.
So if you define a multimethod inside your library and you want your users to be able to extend it for their own data types, that's no problem. If you had used an if-statement instead, it would be a big problem.
ivant's answer above can be expanded by taking a look at this article. It does a good job of explaining the power of protocols. Think of multimethods as protocols with many dimensions.

When did "and" become an operator in C++

I have some code that looks like:
static const std::string and(" AND ");
This causes an error in g++ like so:
Row.cpp:140: error: expected unqualified-id before '&&' token
so after cursing the fool that defined "and" as &&, I added
#ifdef and
#undef and
#endif
and now I get
Row.cpp:9:8: error: "and" cannot be used as a macro name as it is an operator in C++
Which leads to my question: WHEN did "and" become an operator in C++? I can't find anything that indicates it is one, except of course this message from g++.
From the C++03 standard, section 2.5:
2.5 Alternative tokens
Alternative token representations are provided for some operators and punctuators. In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling. The set of alternative tokens is defined in Table 2.
Table 2—alternative tokens
alternative primary
<% {
%> }
<: [
:> ]
%: #
%:%: ##
and &&
bitor |
or ||
xor ^
compl ~
bitand &
and_eq &=
or_eq |=
xor_eq ^=
not !
not_eq !=
They've been there since C++98. They're listed in §2.5/2 of the standard (either the 1998 or the 2003 edition). The alternative tokens include: and, or, xor, not, bitand, bitor, compl, and_eq, or_eq, xor_eq, not_eq.
You can use -fno-operator-names to disable this. Alternatively, you can name your std::string object something else!
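For example, a trivial sketch of the rename fix (and_sep is just an arbitrary replacement name, not from the original code):

#include <string>

// "and" is a reserved alternative token, so the object simply needs a
// different identifier. Alternatively, keep the original name and build
// with g++ -fno-operator-names, at the cost of portability.
static const std::string and_sep(" AND ");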
There are several such alternatives defined in C++. You can probably use switches to turn these on/off.
According to C++ Standard 2.12 there are predefined preprocessor tokens "which are used in the syntax of the preprocessor or are converted into tokens for operators and punctuators", and and is one of them. In the new C++ standard there is a new 2.12/2:
Furthermore, the alternative representations shown in Table 4 for certain operators and punctuators (2.6) are reserved and shall not be used otherwise:
and and_eq bitand bitor compl not
not_eq or or_eq xor xor_eq
They were added because some of those characters are difficult to type on some keyboards.
I don't know when it was introduced; it may well have been there from the beginning. But I believe the reason it's there is as an alternative to && for people with restricted character sets, i.e. where they don't actually have the ampersand character.
There are many others too, e.g. and_eq, or, compl and not, to name just a selection.

When were the 'and' and 'or' alternative tokens introduced in C++?

I've just read this nice piece from Reddit.
They mention and and or being "Alternative Tokens" to && and ||
I was really unaware of these until now. Of course, everybody knows about the digraphs and trigraphs, but and and or? Since when? Is this a recent addition to the standard?
I've just checked it with Visual C++ 2008 and it doesn't seem to recognize these as anything other than a syntax error. What's going on?
From the first ISO C++ standard C++98, this is described in 2.5/ Alternative tokens [lex.digraph]:
Alternative token representations are provided for some operators and punctuators.
In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling. The set of alternative tokens is defined in Table 2.
Table 2 - Alternative tokens
alternative primary | alternative primary | alternative primary
--------------------+---------------------+--------------------
<% { | and && | and_eq &=
%> } | bitor | | or_eq |=
<: [ | or || | xor_eq ^=
:> ] | xor ^ | not !
%: # | compl ~ | not_eq !=
%:%: ## | bitand & |
So it's been around since the earliest days of the C++ standardisation process. The reason so few people are aware of it is likely because the main use case was for people operating in environments where the full character set wasn't necessarily available. For example (and this is stretching my memory), the baseline EBCDIC character set on the IBM mainframes did not have the square bracket characters [ and ].
MSVC supports them as keywords only if you use the /Za option to disable extensions; this is true from at least VC7.1 (VS2003).
You can get them supported as macros by including iso646.h.
My guess is they believe that making them keywords by default would break too much existing code (and I wouldn't be surprised if they are right).
To actually answer the question:
They were defined in the first C++ standard.
See the C++ standard. The committee draft #2 is freely available at ftp://ftp.research.att.com/dist/c++std/WP/CD2/body.pdf, although it's non-authoritative, out-of-date, and partially incorrect in a few places. Specifically, in section 2.5, Alternative Tokens, the following are defined:
Alternative Primary
<% {
%> }
<: [
:> ]
%: #
%:%: ##
and &&
bitor |
or ||
xor ^
compl ~
bitand &
and_eq &=
or_eq |=
xor_eq ^=
not !
not_eq !=
Though honestly, I've never seen any of them used except for and, or, and not, and even then, those are rare. Note that these are NOT allowed by default in plain C code, only in C++. If you want to use them in C, you'll have to either #define them yourself as macros, or #include the header <iso646.h>, which defines all of the above except for <% %> <: :> %: %:%: as macros (see section 7.9 of the C99 standard).
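For what it's worth, here is a small sketch showing the digraph spellings from the table in action. GCC and Clang accept this as-is; MSVC's default mode may not, as other answers in this thread note:

%:include <iostream>                    // %: is the digraph for #

int main() <%                           // <% and %> stand for { and }
    int values<:3:> = <% 1, 2, 3 %>;    // <: and :> stand for [ and ]
    std::cout << values<:0:> << '\n';   // prints 1
    return 0;
%>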
Although the question is old, I'd like to provide it with a more or less full answer:
Alternative tokens were already part of the now-withdrawn C++98 (ISO/IEC 14882:1998, which, I believe, was the first ISO standard for C++).
While not proof in itself (and I don't own a copy of the ISO C++98 standard), here's a link; see the C++ section.
As mentioned in the other answers, the MSVC compiler violates the [lex.digraph] section of the standard when the /Za flag is not specified.
You may be surprised to learn about the rest of them:
and and_eq bitand bitor compl not not_eq or or_eq xor xor_eq
List from C++ Keywords.
I believe recent versions of GCC support these keywords.
The GNU compiler g++ has them, but I don't know about MS VC++.
You can get the same functionality by putting this at the top of your code file.
#define and &&
#define bitor |
#define or ||
#define xor ^
#define compl ~
#define bitand &
#define and_eq &=
#define or_eq |=
#define xor_eq ^=
#define not !
#define not_eq !=
Though this is kinda hackish, it should work.
They are in the working paper for the new C++ standard, on page 14:
C++ Standard