Can C++ variables in a .cpp file be defined using special symbols like β? - c++

Can we define variables in C++/C using special characters, such as:
double ε,µ,β,ϰ;
If yes, how can this be achieved?

According to the working draft of the C++ standard (N4713),
5.10 Identifiers [lex.name]
...
An identifier is an arbitrarily long sequence of letters and digits. Each universal-character-name in an identifier shall designate a character whose encoding in ISO 10646 falls into one of the ranges specified in Table 2. The initial element shall not be a universal-character-name designating a character whose encoding falls into one of the ranges specified in Table 3.
And when we look at Table 3:
Table 3 — Ranges of characters disallowed initially (combining characters)
0300-036F 1DC0-1DFF 20D0-20FF FE20-FE2F
The symbols you have mentioned come from the Greek alphabet, which occupies U+0370 to U+03FF; according to Wikipedia, the extended Greek block occupies U+1F00 to U+1FFF. Neither range falls into Table 3, so both are allowed as the initial element of an identifier.
Note that not all compilers support this.
GCC 8.2 with the -std=c++17 option fails to compile such code.
However, Clang 7.0 with the -std=c++17 option compiles it.
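A minimal sketch of the declarations from the question, assuming a compiler that accepts UTF-8 identifiers directly (Clang does, as noted above; GCC does from version 10). The helper names approx_equal and scaled are hypothetical, chosen only to show that the Greek names behave like any other identifiers.

```cpp
// Sketch: Greek letters written directly (UTF-8 source) as identifiers.
// Assumes a compiler that accepts UTF-8 identifiers (e.g. Clang, or GCC 10+).
constexpr double ε = 1e-9;  // U+03B5 GREEK SMALL LETTER EPSILON
constexpr double β = 0.5;   // U+03B2 GREEK SMALL LETTER BETA
constexpr double ϰ = 4.0;   // U+03F0 GREEK KAPPA SYMBOL

// Hypothetical helpers: the Greek names are used like ordinary identifiers.
constexpr bool approx_equal(double a, double b) {
    return (a > b ? a - b : b - a) < ε;
}
constexpr double scaled(double x) { return x * β * ϰ; }

static_assert(approx_equal(scaled(1.0), 2.0), "1.0 * 0.5 * 4.0 == 2.0");
```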

Since the question is tagged Visual Studio: Just write the code as you'd expect it.
double β = 0.1;
When you save the file, Visual Studio will warn you that it needs to save the file as Unicode. Accept it, and it works. AFAICT, this also works in C mode, even though most other C99 extensions are unsupported in Visual Studio.
However, as of g++ 8.2, g++ still does not support non-ASCII characters used directly in identifiers (support arrived in GCC 10), so such code was effectively not portable.

Yes, you can use special characters, but not all of them. You can find the allowed ones in the link below.
A detailed explanation of how identifiers are built (with the list of authorized Unicode characters) is on the page Identifiers - cppreference.com.
An identifier is, quoting,
an arbitrarily long sequence of digits, underscores, lowercase and uppercase Latin letters, and most Unicode characters (see below for details). A valid identifier must begin with a non-digit character (Latin letter, underscore, or Unicode non-digit character). Identifiers are case-sensitive (lowercase and uppercase letters are distinct), and every character is significant.
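Furthermore, some compilers require Unicode characters in identifiers to be escaped as universal-character-names (for example \u03B2 for β); GCC, for instance, accepted only the escaped form before version 10.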
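A minimal sketch of the escaped form, assuming a compiler that also accepts the character directly (Clang, or GCC 10 and later; older GCC versions accept only the escape). \u03B2 is the universal-character-name for U+03B2 GREEK SMALL LETTER BETA, so both spellings below should name the same variable; twice_beta is a hypothetical name for illustration.

```cpp
// Declared with the universal-character-name for U+03B2 (GREEK SMALL LETTER BETA)...
constexpr double \u03B2 = 0.1;

// ...and referred to with the literal character: it is the same identifier.
// Assumes a compiler that accepts UTF-8 identifiers (Clang, or GCC 10+).
constexpr double twice_beta = 2 * β;

static_assert(&\u03B2 == &β, "both spellings name the same variable");
```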

Related

C identifier names: What goes with which compiler?

I was experimenting with extern and extern "C" for a little while, and accidentally made a typo in one of the identifiers: a $ had snuck in. When I compiled the code, got an undefined-symbol error and eventually saw what caused it, I became curious whether it would actually compile. And guess what: Clang actually did compile it.
According to documentation I had read previously, the rules for identifiers were basically:
No double underscore at the beginning - because those are reserved.
No single underscore and upper case letter - reserved too.
Must start with a letter, a non-digit.
Must not exceed 31 characters.
May contain a-z, A-Z or 0-9 and _.
But this compiled just fine, without even a warning:
void __this$is$a$mess() {}
int main() { __this$is$a$mess(); }
When looking at it:
Ingwie#Ingwies-Macbook-Pro.local /tmp $ clang y.c
Ingwie#Ingwies-Macbook-Pro.local /tmp $ nm a.out
0000000100000f90 T ___this$is$a$mess
0000000100000000 T __mh_execute_header
0000000100000fa0 T _main
U dyld_stub_binder
I can see the symbol name very clearly.
So why is it that Clang will let me do this, although by ANSI standards, it should not? Even the GCC 6 I have installed did not warn or error about this.
Which compilers will allow what kinds of identifiers - and, why actually?
The rules in the 2018 C standard for identifiers include:
Per 6.4.2.1 1, an identifier is a sequence of identifier-nondigit and digit characters, starting with an identifier-nondigit.
An identifier-nondigit is _, a to z, A to Z, a universal-character-name, or “other implementation-defined characters”.
A digit is 0 to 9.
A universal-character-name is \u followed by four hexadecimal digits or \U followed by eight hexadecimal digits, which specify Unicode characters.
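To make the universal-character-name syntax concrete, here is a hedged C++17 sketch that decodes one UCN into its code point. The names hex_value and parse_ucn are hypothetical, and the sketch checks only the shape described above; whether the resulting character is actually permitted in an identifier is a separate rule it does not check.

```cpp
#include <cstddef>
#include <string_view>

// Value of one hexadecimal digit, or -1 if `c` is not a hex digit.
constexpr long long hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Sketch: decode a universal-character-name, i.e. \u followed by exactly four
// hex digits or \U followed by exactly eight. Returns the code point, or -1 if
// `s` is not well formed. Does NOT check whether the character is allowed in
// an identifier; that is a separate rule.
constexpr long long parse_ucn(std::string_view s) {
    if (s.size() < 2 || s[0] != '\\') return -1;
    std::size_t digits = 0;
    if (s[1] == 'u') {
        digits = 4;
    } else if (s[1] == 'U') {
        digits = 8;
    } else {
        return -1;
    }
    if (s.size() != 2 + digits) return -1;
    long long value = 0;
    for (std::size_t i = 2; i < s.size(); ++i) {
        const long long d = hex_value(s[i]);
        if (d < 0) return -1;
        value = value * 16 + d;
    }
    return value;
}

static_assert(parse_ucn("\\u03B2") == 0x03B2, "GREEK SMALL LETTER BETA");
static_assert(parse_ucn("\\U000003B2") == 0x03B2, "same character, long form");
static_assert(parse_ucn("\\u3B2") == -1, "too few hex digits");
```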
So, if an implementation allows $, that is a valid character for that implementation. You may use it, but it may not be portable to other implementations. The C standard requires implementations to accept the specific characters listed, but it allows them to accept more. Generally, the C standard should be viewed as an open field rather than a walled garden: The behavior is defined within the field, but you are not stopped at the barrier; you may go beyond it, at your own risk.
The rules you were taught were rules for what is portable, not rules for what the C standard requires implementations to restrict you to.
The C standard defines strictly conforming code, which is, roughly speaking, code that should work in any C implementation, and conforming code, which is code that works in at least one C implementation. Conforming code is still C code. So the rules you were taught were for strictly conforming code.
Generally, you should prefer to write strictly conforming code and only use additional features when the benefit (speed, ease of development on a particular platform, whatever) is worth the cost (loss of portability).
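The portable rules can be expressed as a small check. This is a sketch under stated assumptions: the function names are hypothetical, the character tests assume an ASCII-compatible character set, and keywords are not rejected.

```cpp
#include <string_view>

// Character classes for the portable identifier subset.
// Assumes an ASCII-compatible character set (letters are contiguous).
constexpr bool is_nondigit(char c) {
    return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Sketch: true if `name` uses only A-Z, a-z, 0-9 and _ and does not start
// with a digit. Implementations may accept more (for example $ or
// universal-character-names); keywords and reserved names are not rejected.
constexpr bool is_portable_identifier(std::string_view name) {
    if (name.empty() || !is_nondigit(name[0])) return false;
    for (char c : name) {
        if (!is_nondigit(c) && !is_digit(c)) return false;
    }
    return true;
}

static_assert(is_portable_identifier("this_is_a_mess"), "portable");
static_assert(!is_portable_identifier("__this$is$a$mess"), "$ is an extension");
```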
According to documentation I had read previously, the rules for identifiers were basically:
No double underscore at the beginning - because those are reserved.
No single underscore and upper case letter - reserved too.
Such identifiers are indeed reserved, but that means that you must not declare or define them, not that they fail to be identifiers, or that they necessarily are not meaningful.
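A hedged sketch of the two reserved patterns, plus the extra C++ rule that any identifier containing a double underscore is reserved. The function name and the cplusplus flag are hypothetical, and the sketch ignores scope, so it does not model the further reservation of names beginning with a single underscore at file or global scope.

```cpp
#include <string_view>

constexpr bool is_upper(char c) { return c >= 'A' && c <= 'Z'; }  // assumes ASCII

// Sketch: does `name` match a reserved pattern? In both C and C++, names
// beginning with __ or with _ and an uppercase letter are reserved; C++ also
// reserves any name containing __. Names beginning with a single _ are
// additionally reserved at file/global scope, which this sketch ignores.
constexpr bool is_reserved(std::string_view name, bool cplusplus) {
    if (name.size() >= 2 && name[0] == '_' &&
        (name[1] == '_' || is_upper(name[1]))) {
        return true;
    }
    return cplusplus && name.find("__") != std::string_view::npos;
}

static_assert(is_reserved("__this$is$a$mess", false), "leading __");
static_assert(is_reserved("_Foo", false), "_ + uppercase");
static_assert(!is_reserved("my__name", false), "not reserved by this rule in C");
static_assert(is_reserved("my__name", true), "reserved in C++");
```

Being reserved does not make a name stop being an identifier; the check only flags names a program should not declare.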
Must start with a letter, a non-digit.
Letters are indeed non-digits, but not all non-digits are letters. The _ character is a prime example.
Must not exceed 31 characters.
This is not a formal limit of the language. C requires that implementations support at least 31 significant characters in external identifiers. Two external identifiers that differ only at the 32nd character or later are not guaranteed to be recognized as distinct, but they do not fail to be identifiers. Furthermore, implementations must recognize at least 63 significant characters in internal identifiers, which, again, can be longer.
Some implementations recognize more significant characters, some even an unbounded number.
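The minimum-significance rule can be illustrated by comparing only the guaranteed-significant prefix of two names. This is a simplification under stated assumptions: names are plain ASCII (universal-character-names are counted by special rules that this ignores), and guaranteed_distinct is a hypothetical helper, not a standard facility.

```cpp
#include <cstddef>
#include <string_view>

// Minimum numbers of significant initial characters that C guarantees.
constexpr std::size_t kMinExternal = 31;  // external identifiers
constexpr std::size_t kMinInternal = 63;  // internal identifiers

// Sketch: true if `a` and `b` differ within their first `significant`
// characters, i.e. even an implementation that honours only that many
// characters must treat them as distinct. Plain ASCII names only.
constexpr bool guaranteed_distinct(std::string_view a, std::string_view b,
                                   std::size_t significant) {
    // substr clamps the length, so shorter names are compared whole.
    return a.substr(0, significant) != b.substr(0, significant);
}

static_assert(!guaranteed_distinct("counter_a", "counter_b", 8), "differ at 9th char");
static_assert(guaranteed_distinct("counter_a", "counter_b", kMinExternal), "short names");
```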
May contain a-z, A-Z or 0-9 and _.
Yes, but explicitly may also contain other implementation-defined characters. The $ character in particular is one that is fairly commonly allowed.
So why is it that Clang will let me do this, although by ANSI standards, it should not? Even the GCC 6 I have installed did not warn or error about this.
The standard does not by any means say that identifiers containing the $ character are disallowed. It explicitly permits implementations to accept that character and substantially any other in identifiers, though there are some that cannot pragmatically be allowed because allowing them would introduce ambiguity. Programs that use identifiers containing such characters do not for that reason fail to conform, and implementations that accept them do not for that reason fail to conform. Such programs do fail to strictly conform, however, as that term is defined by the standard.

$ identifier works in cpp how is this possible? [duplicate]

Are dollar-signs allowed in identifiers in C++03? covers that dollar signs in identifiers are not allowed in C++03. GCC provides it as a C extension and properly gives a diagnostic in C++03 mode. However, in C++11, int $ = 0 will compile without warning.
This answer reasons that $ may be allowed because no diagnostic is required for implementation defined identifiers:
The answer here is "Maybe": According to §2.11, identifiers may consist of digits and identifier-nondigits, starting with one of the latter. identifier-nondigits are the usual a-z, A-Z and underscore, in addition since C++11 they include universal-character-names (e.g. \uBEAF, \UC0FFEE32), and other implementation-defined characters. So it is implementation defined if using $ in an identifier is allowed. VC10 and up supports that, maybe earlier versions, too. It even supports identifiers like こんばんわ.
But: I wouldn't use them. Make identifiers as readable and portable as possible. $ is implementation defined and thus not portable.
This language is present in the C++03 standard as well, so I don't find this to be a very convincing argument.
§2.10/2
In addition, some identifiers are reserved for use by C++ implementations and standard libraries (17.6.4.3.2) and shall not be used otherwise; no diagnostic is required.
What change in the standard allows $ to be used as an identifier name?
This is implementation-defined behavior; $ is not included in the grammar for identifiers. The rules for identifier names in C++11 are:
It cannot start with a digit
It can be composed of letters, digits, underscores, universal-character-names and implementation-defined characters
It cannot be a keyword
Implementation-defined characters are allowed, and many compilers support $ as an extension, including gcc, clang, Visual Studio and, as noted in a comment, apparently DEC C++ compilers.
The grammar is covered in section 2.11 Identifiers of the draft C++ standard; I added notes starting with <-:
identifier:
identifier-nondigit <- Can only start with a non-digit
identifier identifier-nondigit <- Next two rules allow subsequent
identifier digit <- characters to be those outlined in 2 above
identifier-nondigit:
nondigit <- a-z, A-Z and _
universal-character-name
other implementation-defined characters
[...]
If we compile this code using clang with the -pedantic-errors flag it will not compile:
int $ = 0;
and generates the following error:
error: '$' in identifier [-Werror,-Wdollar-in-identifier-extension]
int $ = 0;
^
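For contrast, a hedged sketch of what GCC and Clang accept when $ is allowed as an extension, which is their default on common targets when -pedantic-errors is not given. The names are hypothetical, and none of this is portable C++.

```cpp
// Sketch: $ in identifiers as a compiler extension. Compiles with GCC and
// Clang in their default modes; clang -pedantic-errors rejects it with the
// error shown above. Not portable C++.
constexpr int $ = 0;          // a variable named just "$"
constexpr int counter$ = 41;  // $ inside a name

// Hypothetical function whose name also contains $.
constexpr int next$value() { return counter$ + 1 + $; }

static_assert(next$value() == 42, "41 + 1 + 0");
```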
I don't think so. The dollar sign is ASCII 0x24, which is not inside any of the ranges defined in Annex E.1 [charname.allowed] of the standard. And since it is neither a digit nor a nondigit, it must be an implementation-defined character. I thus agree that this is not portable C++11. Also note that an identifier shall not start with a universal-character-name designating a combining character (the ranges in E.2), while it may start with a character allowed by the implementation.

dollar sign in variable name?

I stumbled on some C++ code like this:
int $T$S;
At first I thought it was some sort of PHP code or something wrongly pasted in there, but it compiles and runs fine (on MSVC 2008).
What kind of characters are valid for variables in C++ and are there any other weird characters you can use?
The only legal characters according to the standard are alphanumerics and the underscore. The standard does require that just about anything Unicode considers alphabetic is acceptable (but only as single code-point characters). In practice, implementations offer extensions (e.g. some accept $) and restrictions (most don't accept all of the required Unicode characters). If you want your code to be portable, restrict symbols to the 26 unaccented letters, upper or lower case, the ten digits, and '_'.
It's an extension in some compilers and is not part of the C standard.
MSVC:
Microsoft Specific
Only the first 2048 characters of Microsoft C++ identifiers are significant. Names for user-defined types are "decorated" by the compiler to preserve type information. The resultant name, including the type information, cannot be longer than 2048 characters. (See Decorated Names for more information.) Factors that can influence the length of a decorated identifier are:
Whether the identifier denotes an object of user-defined type or a type derived from a user-defined type.
Whether the identifier denotes a function or a type derived from a function.
The number of arguments to a function.
The dollar sign is also a valid identifier in Visual C++.
// dollar_sign_identifier.cpp
struct $Y1$ {
    void $Test$() {}
};
int main() {
    $Y1$ $x$;
    $x$.$Test$();
}
https://web.archive.org/web/20100216114436/http://msdn.microsoft.com/en-us/library/565w213d.aspx
Newest version: https://learn.microsoft.com/en-us/cpp/cpp/identifiers-cpp?redirectedfrom=MSDN&view=vs-2019
GCC:
6.42 Dollar Signs in Identifier Names
In GNU C, you may normally use dollar signs in identifier names. This is because many traditional C implementations allow such identifiers. However, dollar signs in identifiers are not supported on a few target machines, typically because the target assembler does not allow them.
http://gcc.gnu.org/onlinedocs/gcc/Dollar-Signs.html#Dollar-Signs
To my knowledge, only letters (uppercase and lowercase), digits (0 to 9) and _ are valid in variable names according to the standard (note: a variable name must not start with a digit, though).
All other characters should be compiler extensions.
This is not good practice. Generally, you should only use alphanumeric characters and underscores in identifiers ([a-z][A-Z][0-9]_).
Surface Level
Unlike in some other languages (bash, Perl), C does not use $ to mark a variable, so nothing stops a compiler from accepting it in identifiers. In C this most likely falls under C11 6.4.2 ("other implementation-defined characters"), which is why modern compilers can support it without violating the standard.
As for your C++ question, let's test it!
int main(void) {
int $ = 0;
return $;
}
On GCC/G++/Clang/Clang++, this indeed compiles and runs just fine.
Deeper Level
Compilers take source code, lex it into a token stream, parse that into an abstract syntax tree (AST), and then use that to generate code (e.g. assembly or LLVM IR). Your question really only concerns the first part (i.e. lexing).
The grammar (and thus the lexer implementation) of C/C++ does not treat $ as a punctuator, unlike commas, periods, skinny arrows, etc. As such, you may get output like this from the lexer for the C code below:
int i_love_$ = 0;
After lexing, this becomes a token stream like this:
["int", "i_love_$", "=", "0", ";"]
If you were to take this code:
int i_love_$,_and_.s = 0;
The lexer would output a token stream like:
["int", "i_love_$", ",", "_and_", ".", "s", "=", "0", ";"]
As you can see, because C/C++ doesn't treat $ as a punctuator, it is lexed as part of the identifier, whereas commas and periods end an identifier and become tokens of their own.
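A toy sketch of that lexing step, reproducing the two token streams above. It is not how GCC or Clang are implemented: it only knows identifiers (treating $ as an identifier character, as those compilers do by default), decimal integers and single-character punctuation, so a multi-character operator such as -> would come out as two tokens. The name tokenize is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Identifier characters for this toy lexer: letters, _, and $ (the extension).
bool is_ident_start(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) || c == '_' || c == '$';
}
bool is_ident_char(char c) {
    return is_ident_start(c) || std::isdigit(static_cast<unsigned char>(c));
}

// Split `src` into identifier, integer and one-character punctuation tokens.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }  // whitespace separates tokens
        const std::size_t start = i;
        if (is_ident_start(src[i])) {
            while (i < src.size() && is_ident_char(src[i])) ++i;
        } else if (std::isdigit(c)) {
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
        } else {
            ++i;  // single-character punctuation: , . = ; etc.
        }
        assert(i > start && "every branch consumes at least one character");
        tokens.push_back(src.substr(start, i - start));
    }
    return tokens;
}
```

For example, tokenize("int i_love_$,_and_.s = 0;") yields the nine tokens listed above, with $ kept inside i_love_$ while the comma and period split the input.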

Are dollar-signs allowed in identifiers in C++03?

What does the C++ standard say about using dollar signs in identifiers, such as Hello$World? Are they legal?
A C++ identifier can be composed of any of the following: _ (underscore), the digits 0-9 and the letters a-z (both upper and lower case), and it cannot start with a digit.
There are a number of exceptions, as C99 allows implementation-defined extensions to the standard (e.g. Visual Studio).
They are illegal. The only legal characters in identifiers are letters, numbers, and _. Identifiers also cannot start with numbers.
In C++03, the answers given earlier are correct: they are illegal. In C++11 the situation changed however:
The answer here is "Maybe":
According to §2.11, identifiers may consist of digits and identifier-nondigits, starting with one of the latter. identifier-nondigits are the usual a-z, A-Z and underscore, in addition since C++11 they include universal-character-names (e.g. \uBEAF, \UC0FFEE32), and other implementation-defined characters. So it is implementation defined if using $ in an identifier is allowed. VC10 and up supports that, maybe earlier versions, too. It even supports identifiers like こんばんは.
But: I wouldn't use them. Make identifiers as readable and portable as possible. $ is implementation defined and thus not portable.
Not legal, but many if not most compilers support them. Note that this may depend on the platform: for example, gcc on ARM does not support them due to assembler restrictions.
The relevant section is "2.8 Identifiers [lex.name]". From the basic character set, the only valid characters are A-Z a-z 0-9 and _. However, characters like é (U+00E9) are also allowed. Depending on your compiler, you might need to enter é as \u00e9, though.
They are not legal in C++. However, some languages with C-like syntax (such as Java and JavaScript) do allow them.
Illegal. I think the dollar sign and backtick are the only punctuation marks on my keyboard that aren't used in C++ somewhere (even the "%" sign is used, as the modulo operator and in the format strings that C++ inherits from the C standard library).