What does the C++ standard say about using dollar signs in identifiers, such as Hello$World? Are they legal?
A C++ identifier can be composed of any of the following: _ (underscore), the digits 0-9, and the letters a-z (both upper and lower case). It cannot start with a digit.
There are a number of exceptions, as implementations are allowed to provide extensions to the standard (Visual Studio does, for example).
They are illegal. The only legal characters in identifiers are letters, numbers, and _. Identifiers also cannot start with numbers.
In C++03, the answers given earlier are correct: they are illegal. In C++11 the situation changed however:
The answer here is "Maybe":
According to §2.11, identifiers may consist of digits and identifier-nondigits, starting with one of the latter. identifier-nondigits are the usual a-z, A-Z and underscore, in addition since C++11 they include universal-character-names (e.g. \uBEAF, \UC0FFEE32), and other implementation-defined characters. So it is implementation defined if using $ in an identifier is allowed. VC10 and up supports that, maybe earlier versions, too. It even supports identifiers like こんばんは.
But: I wouldn't use them. Make identifiers as readable and portable as possible. $ is implementation defined and thus not portable.
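The portable subset described above can be sketched as a small check. This is only an illustration: the function names are hypothetical, the code assumes an ASCII-compatible character set, and it does not look at keywords, reserved names, or universal-character-names.

```cpp
#include <cassert>
#include <string>

// Letters of the basic character set plus underscore (assumes ASCII ordering).
bool is_basic_nondigit(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

bool is_basic_digit(char c) { return c >= '0' && c <= '9'; }

// Hypothetical helper: true if `name` uses only characters that every
// conforming compiler must accept, and does not start with a digit.
bool is_portable_identifier(const std::string& name) {
    if (name.empty() || !is_basic_nondigit(name[0])) return false;
    for (char c : name) {
        if (!is_basic_nondigit(c) && !is_basic_digit(c)) return false;
    }
    return true;
}

// Small usage example; call it from main() to run the checks.
void check_examples() {
    assert(is_portable_identifier("Hello_World"));
    assert(!is_portable_identifier("Hello$World"));  // '$' is an extension
    assert(!is_portable_identifier("9lives"));       // leading digit
}
```

A name that fails this check may still compile on a given compiler; it just relies on something beyond the guaranteed character set.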
Not legal, but many if not most compilers support them. Note that this may depend on the platform: GCC on ARM, for example, does not support them because of assembler restrictions.
The relevant section is "2.8 Identifiers [lex.name]". From the basic character set, the only valid characters are A-Z a-z 0-9 and _. However, characters like é (U+00E9) are also allowed. Depending on your compiler, you might need to enter é as \u00e9, though.
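As a small illustration of the \u00e9 spelling mentioned above, the sketch below declares a variable whose name is café while keeping the source file pure ASCII. The function names are made up for the example, and it assumes a compiler that accepts universal-character-names in identifiers (C++11 or later).

```cpp
#include <cassert>

// The local variable below is named "café"; \u00e9 is the
// universal-character-name for é (U+00E9).
int count_cafes() {
    int caf\u00e9 = 3;
    return caf\u00e9;
}

// Small usage example; call it from main() to run the check.
void check_examples() {
    assert(count_cafes() == 3);
}
```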
They are not legal in C++. However some C/C++ derived languages (such as Java and JavaScript) do allow them.
Illegal. I think the dollar sign, the at sign and the backtick are the only punctuation marks on my keyboard that aren't used in C++ somewhere (the "%" sign is the remainder operator, and it also appears in format strings, which are in C++ by reference to the C standard).
Related
I was experimenting with extern and extern "C" for a little while, and accidentally had a typo in one of the identifiers: a $ had snuck in. When I compiled the code, I got an undefined symbol error, and once I saw what had caused it, I became curious whether an identifier like that would actually compile. And guess what: Clang did compile it.
According to documentation I had read previously, the rules for identifiers were basically:
No double underscore at the beginning - because those are reserved.
No single underscore and upper case letter - reserved too.
Must start with a letter, a non-digit.
Must not exceed 31 characters.
May contain a-z, A-Z or 0-9 and _.
But this compiled just fine, and no warning was shown either:
void __this$is$a$mess() {}
int main() { __this$is$a$mess(); }
When looking at it:
Ingwie#Ingwies-Macbook-Pro.local /tmp $ clang y.c
Ingwie#Ingwies-Macbook-Pro.local /tmp $ nm a.out
0000000100000f90 T ___this$is$a$mess
0000000100000000 T __mh_execute_header
0000000100000fa0 T _main
U dyld_stub_binder
I can see the symbol name very clearly.
So why is it that Clang will let me do this, although by ANSI standards, it should not? Even the GCC 6 I have installed did not warn or error about this.
Which compilers will allow what kinds of identifiers - and, why actually?
The rules in the 2018 C standard for identifiers include:
Per 6.4.2.1 1, an identifier is a sequence of identifier-nondigit and digit characters, starting with an identifier-nondigit.
An identifier-nondigit is _, a to z, A to Z, a universal-character-name, or “other implementation-defined characters”.
A digit is 0 to 9.
A universal-character-name is \u followed by four hexadecimal digits or \U followed by eight hexadecimal digits, which specify Unicode characters.
So, if an implementation allows $, that is a valid character for that implementation. You may use it, but it may not be portable to other implementations. The C standard requires implementations to accept the specific characters listed, but it allows them to accept more. Generally, the C standard should be viewed as an open field rather than a walled garden: The behavior is defined within the field, but you are not stopped at the barrier; you may go beyond it, at your own risk.
The rules you were taught were rules for what is portable, not rules for what the C standard requires implementations to restrict you to.
The C standard defines strictly conforming code, which is, roughly speaking, code that should work in any C implementation, and conforming code, which is code that works in at least one C implementation. Conforming code is still C code. So the rules you were taught were for strictly conforming code.
Generally, you should prefer to write strictly conforming code and only use additional features when benefit (speed, ease of development on a particular platform, whatever) is worth the cost (loss of portability).
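To make the grammar quoted from 6.4.2.1 concrete, here is a rough sketch of a checker. All names are hypothetical, the `extra` parameter stands in for whatever "other implementation-defined characters" a given compiler accepts, and the code does not range-check the code point named by a universal-character-name, which a real compiler would also do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

bool is_hex_digit(char c) {
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}
bool is_digit_char(char c) { return c >= '0' && c <= '9'; }
bool is_nondigit_char(char c) {
    return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// If \uXXXX or \UXXXXXXXX starts at s[pos], returns its length (6 or 10);
// otherwise returns 0.
std::size_t ucn_length(const std::string& s, std::size_t pos) {
    if (pos + 1 >= s.size() || s[pos] != '\\') return 0;
    std::size_t hex_count = 0;
    if (s[pos + 1] == 'u') hex_count = 4;
    else if (s[pos + 1] == 'U') hex_count = 8;
    else return 0;
    if (pos + 2 + hex_count > s.size()) return 0;
    for (std::size_t i = 0; i < hex_count; ++i) {
        if (!is_hex_digit(s[pos + 2 + i])) return 0;
    }
    return 2 + hex_count;
}

// `extra` models the implementation-defined characters, e.g. "$".
bool is_identifier(const std::string& s, const std::string& extra) {
    if (s.empty()) return false;
    std::size_t pos = 0;
    while (pos < s.size()) {
        char c = s[pos];
        if (is_nondigit_char(c) || extra.find(c) != std::string::npos) {
            ++pos;                      // identifier-nondigit
        } else if (is_digit_char(c)) {
            if (pos == 0) return false; // must not start with a digit
            ++pos;
        } else if (std::size_t n = ucn_length(s, pos)) {
            pos += n;                   // universal-character-name
        } else {
            return false;
        }
    }
    return true;
}

// Small usage example; call it from main() to run the checks.
void check_examples() {
    assert(is_identifier("__this$is$a$mess", "$"));  // accepted with the extension
    assert(!is_identifier("__this$is$a$mess", ""));  // rejected without it
}
```

The same name is an identifier on one implementation and not on another, which is the difference between conforming and strictly conforming code described above.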
According to documentation I had read previously, the rules for
identifiers were basically:
No double underscore at the beginning - because those are reserved.
No single underscore and upper case letter - reserved too.
Such identifiers are indeed reserved, but that means that you must not declare or define them, not that they fail to be identifiers, or that they necessarily are not meaningful.
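The two reservation rules quoted above can be sketched as a predicate. The function name is hypothetical, and the sketch covers only these two rules, not the other reserved categories (such as names beginning with a single underscore at file scope).

```cpp
#include <cassert>
#include <string>

// True if `name` begins with two underscores, or with an underscore
// followed by an uppercase letter (assumes ASCII ordering of letters).
bool is_always_reserved(const std::string& name) {
    if (name.size() < 2 || name[0] != '_') return false;
    return name[1] == '_' || (name[1] >= 'A' && name[1] <= 'Z');
}

// Small usage example; call it from main() to run the checks.
void check_examples() {
    assert(is_always_reserved("__this$is$a$mess"));  // still an identifier, but reserved
    assert(is_always_reserved("_Bool"));
    assert(!is_always_reserved("main"));
}
```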
Must start with a letter, a non-digit.
Letters are indeed non-digits, but not all non-digits are letters. The _ character is a prime example.
Must not exceed 31 characters.
This is not a formal limit of the language. C requires that implementations support at least 31 significant characters in external identifiers. Two external identifiers that differ only at the 32nd character or later are not guaranteed to be recognized as distinct, but they do not fail to be identifiers. Furthermore, implementations must recognize at least 63 significant characters in internal identifiers, which, again, can be longer.
Some implementations recognize more significant characters, some even an unbounded number.
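The idea of significant characters can be illustrated with a short sketch. The names below are hypothetical; the two constants are the minimums mentioned above, and the comparison treats every character as one unit, ignoring how universal-character-names are counted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Minimum numbers of significant initial characters an implementation must honour.
const std::size_t kExternalSignificant = 31;
const std::size_t kInternalSignificant = 63;

// True if `a` and `b` differ within their first `significant` characters,
// so an implementation honouring only that minimum must tell them apart.
bool guaranteed_distinct(const std::string& a, const std::string& b,
                         std::size_t significant) {
    return a.compare(0, significant, b, 0, significant) != 0;
}

// Small usage example; call it from main() to run the checks.
void check_examples() {
    const std::string prefix(31, 'x');
    // The names differ only at the 32nd character.
    assert(!guaranteed_distinct(prefix + "A", prefix + "B", kExternalSignificant));
    assert(guaranteed_distinct(prefix + "A", prefix + "B", kInternalSignificant));
}
```

Both long names remain valid identifiers; the only question is whether a minimal implementation is required to treat them as different.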
May contain a-z, A-Z or 0-9 and _.
Yes, but explicitly may also contain other implementation-defined characters. The $ character in particular is one that is fairly commonly allowed.
So why is it that Clang will let me do this, although by ANSI
standards, it should not? Even the GCC 6 I have installed did not warn
or error about this.
The standard does not by any means say that identifiers containing the $ character are disallowed. It explicitly permits implementations to accept that character and substantially any other in identifiers, though there are some that cannot pragmatically be allowed because allowing them would introduce ambiguity. Programs that use identifiers containing such characters do not for that reason fail to conform, and implementations that accept them do not for that reason fail to conform. Such programs do fail to strictly conform, however, as that term is defined by the standard.
The question "Are dollar-signs allowed in identifiers in C++03?" covers that dollar signs in identifiers are not allowed in C++03. GCC provides them as a C extension and properly gives a diagnostic in C++03 mode. However, in C++11, int $ = 0 will compile without warning.
This answer reasons that $ may be allowed because no diagnostic is required for implementation defined identifiers:
The answer here is "Maybe": According to §2.11, identifiers
may consist of digits and identifier-nondigits, starting with one
of the latter. identifier-nondigits are the usual a-z, A-Z and
underscore, in addition since C++11 they include
universal-character-names (e.g. \uBEAF, \UC0FFEE32), and other implementation-defined characters. So it is implementation defined
if using $ in an identifier is allowed. VC10 and up supports that,
maybe earlier versions, too. It even supports identifiers like
こんばんわ.
But: I wouldn't use them. Make identifiers as readable and portable as possible. $ is implementation defined and thus not
portable.
This language is present in the C++03 standard as well, so I don't find this to be a very convincing argument.
§2.10/2
In addition, some identifiers are reserved for use by C ++
implementations and standard libraries (17.6.4.3.2) and shall not be
used otherwise; no diagnostic is required.
What change in the standard allows $ to be used as an identifier name?
This is implementation-defined behavior; $ is not included in the grammar for identifiers. The rules for identifier names in C++11 are:
1. It cannot start with a digit
2. It can be composed of letters, digits, underscores, universal character names and implementation-defined characters
3. It cannot be a keyword
Implementation-defined characters are allowed, and many compilers support $ as an extension, including gcc, clang, Visual Studio and, as noted in a comment, apparently DEC C++ compilers.
The grammar is covered in section 2.11 Identifiers of the draft C++ standard; I added notes starting with <-:
identifier:
    identifier-nondigit                <- Can only start with a non-digit
    identifier identifier-nondigit     <- Next two rules allow for subsequent
    identifier digit                   <- characters to be those outlined in 2 above
identifier-nondigit:
    nondigit                           <- a-z, A-Z and _
    universal-character-name
    other implementation-defined characters
[...]
If we compile this code using clang with the -pedantic-errors flag it will not compile:
int $ = 0;
and generates the following error:
error: '$' in identifier [-Werror,-Wdollar-in-identifier-extension]
int $ = 0;
^
I don't think so. The dollar sign is ASCII 0x24, which is not inside any of the ranges defined in appendix E.1 (charname.allowed) of the standard. And since it is neither a digit nor a nondigit, it must be an implementation-defined character. I thus agree that this is not portable C++11. Also note that an identifier shall not start with a universal-character, while it may start with a character allowed by the implementation.
I stumbled on some C++ code like this:
int $T$S;
First I thought that it was some sort of PHP code or something wrongly pasted in there but it compiles and runs nicely (on MSVC 2008).
What kind of characters are valid for variables in C++ and are there any other weird characters you can use?
The only legal characters according to the standard are alphanumerics
and the underscore. The standard does require that just about anything
Unicode considers alphabetic is acceptable (but only as single
code-point characters). In practice, implementations offer extensions
(i.e. some do accept a $) and restrictions (most don't accept all of the
required Unicode characters). If you want your code to be portable,
restrict symbols to the 26 unaccented letters, upper or lower case, the
ten digits, and the '_'.
It's an extension offered by some compilers and is not in the C standard.
MSVC:
Microsoft Specific
Only the first 2048 characters of Microsoft C++ identifiers are significant. Names for user-defined types are "decorated" by the compiler to preserve type information. The resultant name, including the type information, cannot be longer than 2048 characters. (See Decorated Names for more information.) Factors that can influence the length of a decorated identifier are:
Whether the identifier denotes an object of user-defined type or a type derived from a user-defined type.
Whether the identifier denotes a function or a type derived from a function.
The number of arguments to a function.
The dollar sign is also a valid identifier in Visual C++.
// dollar_sign_identifier.cpp
struct $Y1$ {
void $Test$() {}
};
int main() {
$Y1$ $x$;
$x$.$Test$();
}
https://web.archive.org/web/20100216114436/http://msdn.microsoft.com/en-us/library/565w213d.aspx
Newest version: https://learn.microsoft.com/en-us/cpp/cpp/identifiers-cpp?redirectedfrom=MSDN&view=vs-2019
GCC:
6.42 Dollar Signs in Identifier Names
In GNU C, you may normally use dollar signs in identifier names. This is because many traditional C implementations allow such identifiers. However, dollar signs in identifiers are not supported on a few target machines, typically because the target assembler does not allow them.
http://gcc.gnu.org/onlinedocs/gcc/Dollar-Signs.html#Dollar-Signs
To my knowledge, only letters (capital and small), digits (0 to 9) and _ are valid in variable names according to the standard (note: a variable name should not start with a digit, though).
All other characters should be compiler extensions.
This is not good practice. Generally, you should only use alphanumeric characters and underscores in identifiers ([A-Za-z0-9_]).
Surface Level
Unlike some other languages (bash, Perl), C does not use $ to mark a variable reference, so the character has no special meaning that would rule it out. In C it most likely falls under the "other implementation-defined characters" permitted by C11 6.4.2.1, which would explain why modern compilers tend to accept it.
As for your C++ question, let's test it!
int main(void) {
int $ = 0;
return $;
}
On GCC/G++/Clang/Clang++, this indeed compiles, and runs just fine.
Deeper Level
Compilers take source code, lex it into a token stream, put that into an abstract syntax tree (AST), and then use that to generate code (e.g. assembly/LLVM IR). Your question really only revolves around the first part (i.e. lexing).
The grammar (and thus the lexer implementation) of C/C++ does not treat $ as special, unlike commas, periods, skinny arrows, etc. As such, you may get output like the following from the lexer for this C code:
int i_love_$ = 0;
After the lexer, this becomes a token stream like this:
["int", "i_love_$", "=", "0"]
If you were to take this code:
int i_love_$,_and_.s = 0;
The lexer would output a token stream like:
["int", "i_love_$", ",", "_and_", ".", "s", "=", "0"]
As you can see, because C/C++ doesn't treat characters like $ as special, it is processed differently than other characters like periods.
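The behaviour described above can be imitated with a toy tokenizer. This is only a sketch under simplifying assumptions, not how GCC or Clang actually lex: the function names are hypothetical, '$' is simply treated like a letter, numbers are not distinguished from identifiers, and every other non-space character becomes a one-character token.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Characters that may appear inside a "word" in this toy lexer.
bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '$';
}

std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        if (std::isspace(static_cast<unsigned char>(src[i]))) { ++i; continue; }
        const std::size_t start = i;
        if (is_word_char(src[i])) {
            while (i < src.size() && is_word_char(src[i])) ++i;  // greedy word
        } else {
            ++i;  // punctuation: one character per token
        }
        tokens.push_back(src.substr(start, i - start));
    }
    return tokens;
}

// Small usage example; call it from main() to run the check.
void check_examples() {
    const std::vector<std::string> expected =
        {"int", "i_love_$", ",", "_and_", ".", "s", "=", "0"};
    assert(tokenize("int i_love_$,_and_.s = 0") == expected);
}
```

Removing '$' from is_word_char would split i_love_$ into two tokens, which is roughly what a compiler without the extension sees.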
Is it safe to use $ character as part of identifier in C/C++?
Like this,
int $a = 10;
struct $b;
class $c;
void $d();
No; it is a non-standard extension in some compilers.
No. The C standard only guarantees the use of uppercase and lowercase English letters, digits, _, and Unicode codepoints specified using \u (hex-quad) or \U (hex-quad) (hex-quad) (with a few exceptions). Specific compilers may allow other characters as an extension; however this is highly nonportable. (ISO/IEC 9899:1999 (E) 6.4.2.1, 6.4.3) Further note that the Unicode codepoint method is basically useless in identifiers, even though it is, strictly speaking, permitted (in C99), as it still shows up as a literal \uXXXX in your editor.
This is not standard and only Microsoft Visual Studio (that I know of) even allows the '$' character in identifiers.
So if you ever want your code to be portable (or readable to others), I'd say no.