Is "'" identical to "\'" as per the C/C++ standard? - c++

int main()
{
char* str1 = "Tom's cat";
char* str2 = "Tom\'s cat";
}
The code can be compiled with VS 2015.
I just wonder:
Are both of the two ways compliant to the C and/or the C++ standard?

From the C++11 ISO Standard
§ 2.14.5 String Literals [lex.string]
...
15 Escape sequences and universal-character-names in non-raw string literals have the same meaning as in character literals (2.14.3), except that the single quote ’ is representable either by itself or by the escape
sequence \’

Yes, within a string literal, both are the same.
The escaped version is required for a character literal:
char x = '\'';
The standard references are two sources. First, translation phases. From C.11 §5.1.1.2 (C++.11 [lex.phases] has similar language):
Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set; if there is no corresponding member, it is converted to an implementation defined member other than the null (wide) character.
Next is in the grammar definition for a character constant and for string literals, which allow for escape sequences. And simple-escape-sequence is an escape sequence in the grammar. C.11 §6.4.4.4 defines it (C++.11 [lex.ccon] has the same definition):
simple-escape-sequence: one of
\' \" \? \\
\a \b \f \n \r \t \v
Finally, for string literals, the standard specifies the interpretation of characters in the literal is the same as if each were a character constant, and then makes an exception of '. From C.11 §6.4.5 (C++.11 [lex.string] has similar language):
The same considerations apply to each element of the sequence in a string literal as if it
were in an integer character constant (for a character or UTF−8 string literal) or a wide
character constant (for a wide string literal), except that the single-quote ' is
representable either by itself or by the escape sequence \', but the double-quote " shall be represented by the escape sequence \".

\' is a valid character escape sequence in both C and C++. Hence, the lines
char* str1 = "Tom's cat";
char* str2 = "Tom\'s cat";
produce equivalent string literals, both in C and C++.

Yes, they're identical.
From the c++ standard, $2.13.3/7 Character literals [lex.ccon]
Table 6 — Escape sequences
new-line NL(LF) \n
horizontal tab HT \t
vertical tab VT \v
backspace BS \b
carriage return CR \r
form feed FF \f
alert BEL \a
backslash \ \\
question mark ? \?
single quote ’ \’
double quote " \"
octal number ooo \ooo
hex number hhh \xhhh

Related

Quotation mark in return C++

I get a code like this
virtual Qstring getEnergies() const {
return "estrain"
",eslip"
",edashpot";
}
Could you guys please explain the meaning of quotation and comma marks in that code? I am really thankful
Adjacent string literals are concatenated in C++. So
"foo" "bar"
Becomes
"foobar"
In your case the function will return a Qstring with the value of
"estrain,eslip,edashpot"
This behavior is defined in section 2.2.6 [lex.phases] of the C++ standard
Adjacent string literal tokens are concatenated.
The commas are a red herring; this is just concatenation of string literals.
In source code, "abc" "def" means the same thing as "abcdef".
The quotation mark here is because your function return type is string. So "" is for the string type return.
Now the second part as to why you have comma, then as others have answered it is for getting the adjacent string literal or different string literals. If you remove the comma then you will get a single string. If you don't have the comma then the return string will be equivalent to estraineslipedashpot

Why can't character constants/literals be empty?

In C and C++ the rules are the same. In C,
[§6.4.4.4]/2 An integer character constant is a sequence of one or
more multibyte characters enclosed in single-quotes, as in 'x'.
In C++,
[§2.14.3]/1 A character literal is one or more characters enclosed
in single quotes, as in 'x', optionally preceded by one of the
letters u, U, or L, as in u'y', U'z', or L'x',
respectively.
The key phrase is "one or more". In contrast, a string literal can be empty, "", presumably because it consists of the null terminating character. In C, this leads to awkward initialization of a char. Either you leave it uninitialized, or use a useless value like 0 or '\0'.
char garbage;
char useless = 0;
char useless2 = '\0';
In C++, you have to use a string literal instead of a character literal if you want it to be empty.
(somecondition ? ' ' : '') // error
(somecondition ? " " : "") // necessary
What is the reason it is this way? I'm assuming C++'s reason is inherited from C.
The reason is that a character literal is defined as a character. There may be extensions that allow it to be more than one character, but it needs to be at least one character or it just doesn't make any sense. It would be the same as trying to do:
int i = ;
If you don't specify a value, what do you put there?
This is because an empty string still contains the the null character '\0' at the end, so there is still a value to bind to the variable name, whereas an empty character literal has no value.
String is a set of character terminated by a NULL character ( '\0' ).
So a Empty string will always have a NULL character in it at the end .
But in case of a character literal no value is there.
it needs at least one character.

const char* initialization

This is one usage I found in a open source software.And I don't understant how it works.
when I ouput it to the stdout,it was "version 0.8.0".
const char version[] = " version " "0" "." "8" "." "0";
It's called string concatenation -- when you put two (or more) quoted strings next to each other in the source code with nothing between them, the compiler puts them together into a single string. This is most often used for long strings -- anything more than one line long:
char whatever[] = "this is the first line of the string\n"
"this is the second line of the string\n"
"This is the third line of the string";
Before string concatenation was invented, you had to do that with a rather clumsy line continuation, putting a backslash at the end of each line (and making sure it was the end, because most compilers wouldn't treat it as line continuation if there was any whitespace after the backslash). There was also ugliness with it throwing off indentation, because any whitespace at the beginning of subsequent lines might be included in the string.
This can cause a minor problem if you intended to put a comma between the strings, such as when initializing an array of pointers to char. If you miss a comma, the compiler won't warn you about it -- you'll just get one string that includes what was intended to be two separate ones.
This is a basic feature of both C89 and C++98 called 'adjacent string concatenation' or thereabouts.
Basically, if two string literals are adjacent to each other with no punctuation in between, they are merged into a single string, as your output shows.
In the C++98 standard, section §2.1 'Phases of translation [lex.phases]' says:
6 Adjacent ordinary string literal tokens are concatenated. Adjacent wide string literal tokens are concatenated.
This is after the preprocessor has completed.
In the C99 standard, the corresponding section is §5.1.2.1 'Translation Phases' and it says:
6 Adjacent string literal tokens are concatenated.
The wording would be very similar in any other C or C++ standard you can lay hands on (and I do recognize that both C++98 and C99 are superseded by C++11 and C11; I just don't have electronic copies of the final standards, yet).
Part of the C++ standard implementation states that string literals that are beside each other will be concatenated together.
Quotes from C and C++ Standard:
For C (quoting C99, but C11 has something similar in 6.4.5p5):
(C99, 6.4.5p5) "In translation phase 6, the multibyte character
sequences specified by any sequence of adjacent character and
identically-prefixed string literal tokens are concatenated into a
single multibyte character sequence."
For C++:
(C++11, 2.14.5p13) "In translation phase 6 (2.2), adjacent string
literals are concatenated."
const char version[] = " version " "0" "." "8" "." "0";
is same as:
const char version[] = " version 0.8.0";
Compiler concatenates the adjacent pieces of string-literals, making one bigger piece of string-literal.
As a sidenote, const char* (which is in your title) is not same as char char[] (which is in your posted code).
The compiler automatically concatenates string literals written after each other (separated by white-space only).. It is as if you have written
const char version[] = "version 0.8.0";
EDIT: corrected pre-processor to compiler
Adjacent string literals are concatenated:
When specifying string literals, adjacent strings are concatenated.
Therefore, this declaration:
char szStr[] = "12" "34"; is identical to this declaration:
char szStr[] = "1234"; This concatenation of adjacent strings makes it
easy to specify long strings across multiple lines:
cout << "Four score and seven years "
"ago, our forefathers brought forth "
"upon this continent a new nation.";
Simply putting strings one after the other concatenates them at compile time, so:
"Hello" ", " "World!" => "Hello, World!"
This is a strange usage of the feature, usually it is to allow #define strings to be used:
#define FOO "World!"
puts("Hello, " FOO);
Will compile to the same as:
puts("Hello, World!");

String literals in C++ with _T macro

What is the difference (if any) between this
_T("a string")
and
_T('a string')
?
First, _T isn't a standard part of C++. I've added the "windows" tag to your question.
Now, the difference between these is that the first is correct and the second is not. In C++, ' is for quoting single characters, and " is for quoting strings.
The second is wrong. You are placing a string literal in between single quotes.
'a string' is a so-called "multicharacter literal". It has type int, and an implementation-defined value. This is [lex.ccon] in the standard.
I don't know what values MSVC gives to multicharacter literals, and I don't know for sure what the MS-specific _T macro ends up doing with it, but I expect you get a narrow multicharacter literal on narrow builds, and a wide multicharacter literal on wide builds. The prefix L is the same for strings and character literals.
It's wrong, anyway: multicharacter literals are pretty much useless and certainly are no substitute for strings. "a string" is a string literal, which is what you want.
You use '' for single character and "" for strings. _T('a string') is wrong and its behaviour is compiler-specific.
In case of MSVC it uses first character only. Example:
#include <iostream>
#include <tchar.h>
int main()
{
if (_T('a string') == _T('a'))
std::cout << (int)'a' << " = " << _T('a');
}
output: 97 = 97
Single quotations are primarily used when denoting a single character:
char c = 'e' ;
Double quotations are used with strings and output statements:
string s = "This is a string";
cout << "Output where double quotations are used.";

Why is an empty wchar_t literal allowed?

Look at the following code:
int main(int argc, char* argv[])
{
// This works: (Disable Lang Ext = *Yes* (/Za))
wchar_t wc0 = L'\0';
wchar_t wc_ = L'';
assert(wc0 == wc_);
// This doesn't compile (VC++ 2010):
char c0 = '\0';
char c_ = ''; // error C2137: empty character constant
assert(c0 == c_);
return 0;
}
Why does the compiler allow defining an empty character literal for wide characters? This doesn't make sense for wide, just as it doesn't make sense for char where the compiler flags an error.
Is this allowed by the Standard?
This is a bug in VC++.
It is not allowed per the ISO standard. This is a bug in Microsoft's product. Even their page describing that particular feature makes no mention of this aberrant (or abhorrent, depending on your viewpoint) behaviour.
The definition for a character literal (as taken from 2.14.3 of C++0x but the relevant bit is unchanged from C++03) contains:
character-literal:
L’ c-char-sequence ’
c-char-sequence:
c-char
c-char-sequence c-char
c-char:
any member of the source character set except
the single-quote ’, backslash \, or new-line character
escape-sequence
universal-character-name
escape-sequence:
simple-escape-sequence
octal-escape-sequence
hexadecimal-escape-sequence
simple-escape-sequence: one of
\’ \" \? \\ \a \b \f \n \r \t \v
octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
hexadecimal-escape-sequence:
\x hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit
As you can see, there is no way that you can end up with nothing between the ' characters in L'x'. It has to be one or more of the c_char characters. In fact, this is made explicit in the following paragraph (my emphasis):
A character literal is one or more characters enclosed in single quotes, as in ’x’, optionally preceded by one of the letters u, U, or L, as in u’y’, U’z’, or L’x’, respectively.
I would argue that the first example is not allowed, per 2.23.2.1 of the C++ standard:
A character literal is one or more
characters enclosed in single quotes,
as in ’x’, optionally preceded by the
letter L, as in L’x’.
(Emphasis mine.)