Look at the following code:
int main(int argc, char* argv[])
{
// This works: (Disable Lang Ext = *Yes* (/Za))
wchar_t wc0 = L'\0';
wchar_t wc_ = L'';
assert(wc0 == wc_);
// This doesn't compile (VC++ 2010):
char c0 = '\0';
char c_ = ''; // error C2137: empty character constant
assert(c0 == c_);
return 0;
}
Why does the compiler allow an empty character literal for wide characters? An empty literal makes no more sense for wchar_t than it does for char, where the compiler flags an error.
Is this allowed by the Standard?
This is a bug in VC++.
It is not allowed per the ISO standard. This is a bug in Microsoft's product. Even their page describing that particular feature makes no mention of this aberrant (or abhorrent, depending on your viewpoint) behaviour.
The definition for a character literal (as taken from 2.14.3 of C++0x but the relevant bit is unchanged from C++03) contains:
character-literal:
L’ c-char-sequence ’
c-char-sequence:
c-char
c-char-sequence c-char
c-char:
any member of the source character set except
the single-quote ’, backslash \, or new-line character
escape-sequence
universal-character-name
escape-sequence:
simple-escape-sequence
octal-escape-sequence
hexadecimal-escape-sequence
simple-escape-sequence: one of
\’ \" \? \\ \a \b \f \n \r \t \v
octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
hexadecimal-escape-sequence:
\x hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit
As you can see, there is no way to end up with nothing between the ' characters in L'x'. There has to be at least one c-char. In fact, this is made explicit in the following paragraph (my emphasis):
A character literal is one or more characters enclosed in single quotes, as in ’x’, optionally preceded by one of the letters u, U, or L, as in u’y’, U’z’, or L’x’, respectively.
I would argue that the first example is not allowed, per 2.13.2/1 of the C++ standard:
A character literal is one or more characters enclosed in single quotes, as in ’x’, optionally preceded by the letter L, as in L’x’.
(Emphasis mine.)
Is there a difference between:
int main(){
return 0;
}
and
int main(){return 0;}
and
int main(){
return
0;
}
All three will most likely compile to the same executable. How does a C/C++ compiler treat extra spaces and newlines, and are newlines treated any differently from spaces in C code?
Also, how about tabs? What's the significance of using tabs instead of spaces in code, if there is any?
Any sequence of one or more whitespace characters (space, line break, tab, ...) is equivalent to a single space.
Exceptions:
Whitespace is preserved in string literals. They can't contain line-breaks, except C++ raw literals (R"(...)"). The same applies to file names in #include.
Single-line comments (//) are terminated with line-breaks only.
Preprocessor directives (starting with #) are terminated with line-breaks only.
\ followed by a line-break removes both, allowing multi-line // comments, preprocessor directives, and string literals.
Also, whitespace symbols are ignored if there is punctuation (anything except letters, numbers, and _) to the left and/or to the right of it. E.g. 1 + 2 and 1+2 are the same, but return a; and returna; are not.
Exceptions:
Whitespace is not ignored inside string literals, obviously. Nor in #include file names.
Operators consisting of more than one punctuation symbol can't be split, e.g. cout < < 1 is illegal. The same applies to things like // and /* */.
A space between punctuation might be necessary to prevent it from coalescing into a single operator. Examples:
+ +a is different from ++a.
a+++b is equivalent to a++ +b, but not to a+ ++b.
Pre-C++11, closing two template argument lists in a row required a space: std::vector<std::vector<int> >.
When defining a function-like macro, the space is not allowed before the opening parenthesis (adding it turns it into an object-like macro). E.g. #define A() replaces A() with nothing, but #define A () replaces A with ().
int main()
{
char* str1 = "Tom's cat";
char* str2 = "Tom\'s cat";
}
The code can be compiled with VS 2015.
I just wonder:
Do both forms comply with the C and/or the C++ standard?
From the C++11 ISO Standard
§ 2.14.5 String Literals [lex.string]
...
15 Escape sequences and universal-character-names in non-raw string literals have the same meaning as in character literals (2.14.3), except that the single quote ’ is representable either by itself or by the escape sequence \’
Yes, within a string literal, both are the same.
The escaped version is required for a character literal:
char x = '\'';
Two parts of the standard are relevant. The first is the translation phases. From C11 §5.1.1.2 (C++11 [lex.phases] has similar language):
Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set; if there is no corresponding member, it is converted to an implementation defined member other than the null (wide) character.
The second is the grammar for character constants and string literals, both of which allow escape sequences. A simple-escape-sequence is one kind of escape sequence in that grammar. C11 §6.4.4.4 defines it (C++11 [lex.ccon] has the same definition):
simple-escape-sequence: one of
\' \" \? \\
\a \b \f \n \r \t \v
Finally, for string literals, the standard specifies that each character in the literal is interpreted as if it were in a character constant, with an exception for the single quote. From C11 §6.4.5 (C++11 [lex.string] has similar language):
The same considerations apply to each element of the sequence in a string literal as if it
were in an integer character constant (for a character or UTF−8 string literal) or a wide
character constant (for a wide string literal), except that the single-quote ' is
representable either by itself or by the escape sequence \', but the double-quote " shall be represented by the escape sequence \".
\' is a valid character escape sequence in both C and C++. Hence, the lines
char* str1 = "Tom's cat";
char* str2 = "Tom\'s cat";
produce equivalent string literals, both in C and C++.
Yes, they're identical.
From the C++ standard, §2.13.3/7 Character literals [lex.ccon]
Table 6 — Escape sequences
new-line         NL(LF)   \n
horizontal tab   HT       \t
vertical tab     VT       \v
backspace        BS       \b
carriage return  CR       \r
form feed        FF       \f
alert            BEL      \a
backslash        \        \\
question mark    ?        \?
single quote     ’        \’
double quote     "        \"
octal number     ooo      \ooo
hex number       hhh      \xhhh
In C and C++ the rules are the same. In C,
[§6.4.4.4]/2 An integer character constant is a sequence of one or
more multibyte characters enclosed in single-quotes, as in 'x'.
In C++,
[§2.14.3]/1 A character literal is one or more characters enclosed
in single quotes, as in 'x', optionally preceded by one of the
letters u, U, or L, as in u'y', U'z', or L'x',
respectively.
The key phrase is "one or more". In contrast, a string literal can be empty, "", presumably because it still contains the terminating null character. In C, this leads to awkward initialization of a char: either you leave it uninitialized, or you use a placeholder value such as 0 or '\0'.
char garbage;
char useless = 0;
char useless2 = '\0';
In C++, you have to use a string literal instead of a character literal if you want it to be empty.
(somecondition ? ' ' : '') // error
(somecondition ? " " : "") // necessary
What is the reason it is this way? I'm assuming C++'s reason is inherited from C.
The reason is that a character literal is defined as a character. There may be extensions that allow it to be more than one character, but it needs to be at least one character or it just doesn't make any sense. It would be the same as trying to do:
int i = ;
If you don't specify a value, what do you put there?
This is because an empty string still contains the null character '\0' at the end, so there is still a value to bind to the variable name, whereas an empty character literal has no value.
A string is a sequence of characters terminated by a null character ('\0').
So an empty string always has a null character at the end.
An empty character literal, however, would have no value at all.
It needs at least one character.
Is this legal under C++11?
string s = R"(This is the first line
And this is the second line)";
... being equivalent to:
string s = "This is the first line\nAnd this is the second line";
Yes, that is perfectly valid.
Also, from the (draft) standard 2.14.5/4:
A source-file new-line in a raw string literal results in a new-line
in the resulting execution string-literal. Assuming no whitespace at the beginning of lines in the
following example, the assert will succeed:
const char *p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
What is the difference (if any) between this
_T("a string")
and
_T('a string')
?
First, _T isn't a standard part of C++. I've added the "windows" tag to your question.
Now, the difference between these is that the first is correct and the second is not. In C++, ' is for quoting single characters, and " is for quoting strings.
The second is wrong. You are placing a string literal in between single quotes.
'a string' is a so-called "multicharacter literal". It has type int, and an implementation-defined value. This is [lex.ccon] in the standard.
I don't know what values MSVC gives to multicharacter literals, and I don't know for sure what the MS-specific _T macro ends up doing with it, but I expect you get a narrow multicharacter literal on narrow builds, and a wide multicharacter literal on wide builds. The prefix L is the same for strings and character literals.
It's wrong, anyway: multicharacter literals are pretty much useless and certainly are no substitute for strings. "a string" is a string literal, which is what you want.
You use '' for single character and "" for strings. _T('a string') is wrong and its behaviour is compiler-specific.
In the case of MSVC, it uses the first character only. Example:
#include <iostream>
#include <tchar.h>
int main()
{
if (_T('a string') == _T('a'))
std::cout << (int)'a' << " = " << _T('a');
}
output: 97 = 97
Single quotes are primarily used to denote a single character:
char c = 'e' ;
Double quotes are used for strings, including text sent to output:
string s = "This is a string";
cout << "Output where double quotations are used.";