Is this legal under C++11?
string s = R"(This is the first line
And this is the second line)";
... being equivalent to:
string s = "This is the first line\nAnd this is the second line";
Yes, that is perfectly valid.
Also, from the (draft) standard 2.14.5/4:
A source-file new-line in a raw string literal results in a new-line
in the resulting execution string-literal. Assuming no whitespace at the beginning of lines in the
following example, the assert will succeed:
const char *p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
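Here is a minimal sketch that checks the equivalence asked about in the question. The names `raw` and `escaped` are illustrative. It assumes the second line of the raw literal starts in column one, because leading whitespace inside a raw literal would be kept as part of the string.

```cpp
#include <cassert>
#include <string>

// The raw literal spans two source lines; the source-file newline between
// them becomes a single '\n' in the resulting string.
const std::string raw = R"(This is the first line
And this is the second line)";

// The same text written as an ordinary literal with an explicit escape.
const std::string escaped = "This is the first line\nAnd this is the second line";

// True when both spellings produce exactly the same characters.
bool same_text() { return raw == escaped; }
```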
I came across this code snippet in C++17 draft n4713:
#define R "x"
const char* s = R"y"; // ill-formed raw string, not "x" "y"
What is a "raw string"? What does it do?
Raw string literals are string literals that are designed to make it easier to include nested characters like quotation marks and backslashes that normally have meanings as delimiters and escape sequence starts. They’re useful for, say, encoding text like HTML. For example, contrast
"C:\\Program Files\\"
which is a regular string literal, with
R"(C:\Program Files\)"
which is a raw string literal. Here, the use of parentheses in addition to quotes allows C++ to distinguish a nested quotation mark from the quotation marks delimiting the string itself.
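A small sketch confirming that the two spellings above produce the same characters (the variable names are illustrative):

```cpp
#include <cassert>
#include <cstring>

// Ordinary literal: every backslash has to be doubled.
const char* const escaped_path = "C:\\Program Files\\";

// Raw literal: backslashes are taken as-is; only )" ends the string.
const char* const raw_path = R"(C:\Program Files\)";

// True when both literals contain identical text.
bool paths_match() { return std::strcmp(escaped_path, raw_path) == 0; }
```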
Basically, a raw string literal is a string in which C++ escape sequences (like \n, \t or \") are not processed. Raw string literals were introduced in C++11; in their simplest form they start with R"( and end with )". The general syntax is:
prefix(optional) R "delimiter( raw_characters )delimiter"
prefix - One of L, u8, u, U
Thanks to @Remy Lebeau: the delimiter is optional and is typically omitted, but there are corner cases where it is actually needed, in particular if the string content contains the character sequence )". For example, R"(...)"...)" ends at the first )", so you would need a delimiter to avoid an error, e.g. R"x(...)"...)x".
See an example:
#include <iostream>
#include <string>
int main()
{
std::string normal_str = "First line.\nSecond line.\nEnd of message.\n";
std::string raw_str = R"(First line.\nSecond line.\nEnd of message.\n)";
std::string raw_str_delim = R"x("(First line.\nSecond line...)")x";
std::cout << normal_str << std::endl;
std::cout << raw_str << std::endl;
std::cout << raw_str_delim << std::endl;
return 0;
}
Output:
First line.
Second line.
End of message.

First line.\nSecond line.\nEnd of message.\n
"(First line.\nSecond line...)"
I'll add a note about a concern raised in one of the comments:
But here in the code the R is defined as "x" and after
expansion of the #define the code is const char* s = "x""y";
and there isn't any R"(.
The code fragment in the question is meant to show invalid uses of raw strings. Here are the actual three lines from the draft:
#define R "x"
const char* s = R"y"; // ill-formed raw string literal, not "x" "y"
const char* s2 = R"(a)" "b)"; // a raw string literal followed by a normal string literal
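The first line is there so that you don't get confused by a macro. Macros are expanded by the preprocessor, which replaces parts of the source. A raw string, on the other hand, is recognized earlier, when the source is split into preprocessing tokens: as soon as R is immediately followed by ", it is lexed as the start of a raw string literal, so the macro R is never expanded there.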
The second line shows the wrong use: R"y" has no parentheses, so it is an ill-formed raw string literal rather than "x" "y". The correct form would be R"(y)"; the parentheses are required.
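And the last line shows how raw strings can be a pain if not written carefully. It is actually well-formed, but it is not one raw string: the first )" ends the raw literal, so s2 is R"(a)" followed by the ordinary literal "b)", and the two are concatenated into "ab)". In other words, the content inside the parentheses cannot include the closing sequence. If the intent was a single string containing a)" "b, a correction might be R"_(a)" "b)_". The _ can be replaced by any delimiter of up to 16 characters (no parentheses, backslash, or whitespace) as long as the closing sequence does not appear in the content: R"___(a)" "b)___" or R"anything(a)" "b)anything".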
So if we wrap these corrections in a simple C++ program:
#include <iostream>
using namespace std;
#define R "x" // This is just a macro, not Raw String nor definition of it
const char* s = R"(y)"; // R is part of language, not a macro
const char* s2 = R"_(a)" "b)_"; // Raw String shall not include closing sequence of characters; )_"
int main(){ cout << s <<endl << s2 <<endl << R <<endl; }
then the output will be
y
a)" "b
x
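The same points can be checked without printing. This is a sketch with no main(); the names s3 and plain are illustrative additions, and the block ends with #undef R so the macro does not leak into any code that follows.

```cpp
#include <cassert>
#include <cstring>

#define R "x"  // a macro named R, as in the draft's example

// R immediately followed by " starts a raw string literal during
// tokenization, before macros are expanded, so the macro is not applied.
const char* const s = R"(y)";

// The first )" ends the raw literal; "b)" is an ordinary literal that is
// concatenated with it, giving "ab)".
const char* const s2 = R"(a)" "b)";

// With the _ delimiter, a)" "b stays inside one raw literal.
const char* const s3 = R"_(a)" "b)_";

// R not followed by a quote is an ordinary identifier, so the macro expands.
const char* const plain = R;

#undef R
```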
Raw string literal. Used to avoid escaping of any character. Anything between the delimiters becomes part of the string. prefix, if present, has the same meaning as described above.
(Source: cppreference.com, "String literal")
A raw string is defined like this:
string raw_str=R"(First line.\nSecond line.\nEnd of message.\n)";
The difference is that a raw string does not process escape sequences like \n and \t; it treats them as normal text.
So printing the string above gives a single line containing three literal \n sequences, instead of three separate lines.
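Add parentheses around your string so that it is a valid raw string. Removing the #define line avoids confusion, but it is not strictly required: R immediately followed by " is always lexed as the start of a raw string literal.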
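A quick way to confirm the point about \n in a raw string: the sketch below counts real newline characters. The helper count_newlines is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const std::string raw_str = R"(First line.\nSecond line.\nEnd of message.\n)";

// Counts actual '\n' characters (not the two-character text "\n").
std::size_t count_newlines(const std::string& text) {
    std::size_t n = 0;
    for (char c : text) {
        if (c == '\n') ++n;
    }
    return n;
}
```

In raw_str, each \n is a backslash followed by the letter n, so count_newlines(raw_str) is zero.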
For example, in the command line this works (the 1st argument has quotes but the 2nd argument doesn't):
"test.bat" "a" b
i.e. it knows that "a" is the first argument and b is the second,
but using system() it doesn't work:
system("test.bat" "a" b)
this also doesn't work:
system("test.bat" \"a\" b)
This is going to be simplest if we use a raw string literal. A raw string literal is a way of writing a string in C++ where escape sequences are not processed. Let's look at an example:
char const* myCommand = R"(test.bat "a" b)";
The R at the beginning indicates that it's a raw string literal, and if you call system(myCommand), it will be exactly equivalent to typing
$ test.bat "a" b
into the command line. Now, suppose you want to escape the quotes on the command line:
$ test.bat \"a\" b
With a raw string literal, this is simple:
char const* myCommand = R"(test.bat \"a\" b)";
system(myCommand);
Or, alternatively:
system(R"(test.bat \"a\" b)");
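To see exactly what system() would receive without actually running a batch file, the sketch below compares the raw literal with the equivalent ordinary literal (the variable names are illustrative):

```cpp
#include <cassert>
#include <string>

// The raw literal from above: backslashes and quotes are kept as typed.
const std::string raw_cmd = R"(test.bat \"a\" b)";

// The same command as an ordinary literal: each backslash and each
// quote needs its own escape.
const std::string escaped_cmd = "test.bat \\\"a\\\" b";
```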
Hope this helps!
A bit more info on raw string literals: Raw string literals are a great feature, and they basically allow you to copy and paste any text directly into your program. They begin with R, followed by a quote, an optional delimiter, and an opening parenthesis. Only the text inside the parentheses gets included. Examples:
using std::string;
string a = R"(Hello)"; // a == "Hello"
Begin and end with "raw":
string b = R"raw(Hello)raw"; // b == "Hello"
Begin and end with "foo":
string c = R"foo(Hello)foo"; // c == "Hello"
Begin and end with "x":
string d = R"x(Hello)x"; // d == "Hello"
The important thing is that the literal begins and ends with the same sequence of characters (called the delimiter): it goes between the opening quote and the (, and between the ) and the closing quote. This ensures we never have a reason to escape something inside the raw string literal, because we can always change the delimiter so that the closing sequence )delimiter" does not appear inside the string.
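A compilable sketch of the examples above, plus one case (the hypothetical tricky) where the content contains )" and a delimiter is therefore required:

```cpp
#include <cassert>
#include <string>

using std::string;

// Different delimiters, same content.
const string a = R"(Hello)";
const string b = R"raw(Hello)raw";
const string c = R"foo(Hello)foo";
const string d = R"x(Hello)x";

// The content contains )" so the default form R"(...)" would end early;
// with the x delimiter the terminator becomes )x" instead.
const string tricky = R"x(say "hi")" please)x";
```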
I got it to work now:
system(R"(C:\"to erase\test.bat" "a")");
I found the answer: system("test.bat" ""a"" b);
or more precisely: system("\"test.bat\" ""a"" b");
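So the answer is to double the quotes. Note, though, that in C++ source each "" simply ends one string literal and starts the next, so with the second form the command passed to system() is "test.bat" a b, without quotes around a.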
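A way to check what the compiler actually builds from those literals, next to a raw literal that keeps the quotes (names are illustrative; system() is not called):

```cpp
#include <cassert>
#include <string>

// Tokenized as "\"test.bat\" " + "a" + " b", then concatenated.
const std::string cmd = "\"test.bat\" ""a"" b";

// A raw literal keeps every quote exactly as typed.
const std::string cmd_with_quotes = R"("test.bat" "a" b)";
```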
In C and C++ the rules are the same. In C,
[§6.4.4.4]/2 An integer character constant is a sequence of one or
more multibyte characters enclosed in single-quotes, as in 'x'.
In C++,
[§2.14.3]/1 A character literal is one or more characters enclosed
in single quotes, as in 'x', optionally preceded by one of the
letters u, U, or L, as in u'y', U'z', or L'x',
respectively.
The key phrase is "one or more". In contrast, a string literal can be empty, "", presumably because it consists of the null terminating character. In C, this leads to awkward initialization of a char. Either you leave it uninitialized, or use a useless value like 0 or '\0'.
char garbage;
char useless = 0;
char useless2 = '\0';
In C++, you have to use a string literal instead of a character literal if you want it to be empty.
(somecondition ? ' ' : '') // error
(somecondition ? " " : "") // necessary
What is the reason it is this way? I'm assuming C++'s reason is inherited from C.
The reason is that a character literal is defined as a character. Multicharacter literals such as 'ab' are allowed (their value is implementation-defined), but a character literal needs at least one character, or it just doesn't make any sense. It would be the same as trying to do:
int i = ;
If you don't specify a value, what do you put there?
This is because an empty string still contains the null character '\0' at the end, so there is still a value to bind to the variable name, whereas an empty character literal has no value.
A string is a sequence of characters terminated by a null character ('\0').
So an empty string always contains the terminating null character.
A character literal has no such terminator, so an empty one would have no value: it needs at least one character.
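The terminator argument can be checked at compile time. This sketch is C++ specifically: in C, a character constant like 'a' has type int, so the last static_assert would not carry over.

```cpp
#include <cassert>

// "" is an array of one const char: only the terminating '\0'.
static_assert(sizeof("") == 1, "empty literal holds only the terminator");

// A one-character string literal holds the character plus the terminator.
static_assert(sizeof("a") == 2, "character plus terminator");

// In C++ a character literal is a single char, with no terminator.
static_assert(sizeof('a') == 1, "a char literal is one char");

// The only element of the empty string literal.
char first_of_empty() { return ""[0]; }
```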
This is a usage I found in some open-source software, and I don't understand how it works.
When I output it to stdout, it was "version 0.8.0".
const char version[] = " version " "0" "." "8" "." "0";
It's called string concatenation -- when you put two (or more) quoted strings next to each other in the source code with nothing between them, the compiler puts them together into a single string. This is most often used for long strings -- anything more than one line long:
char whatever[] = "this is the first line of the string\n"
"this is the second line of the string\n"
"This is the third line of the string";
Before string concatenation was invented, you had to do that with a rather clumsy line continuation, putting a backslash at the end of each line (and making sure it was the end, because most compilers wouldn't treat it as line continuation if there was any whitespace after the backslash). There was also ugliness with it throwing off indentation, because any whitespace at the beginning of subsequent lines might be included in the string.
This can cause a minor problem if you intended to put a comma between the strings, such as when initializing an array of pointers to char. If you miss a comma, the compiler won't warn you about it -- you'll just get one string that includes what was intended to be two separate ones.
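A minimal sketch of that missing-comma pitfall (the array contents are made up for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Intended as three names, but the missing comma after "beta" turns
// "beta" "gamma" into one literal, so the array has only two elements.
const char* const names[] = {"alpha", "beta" "gamma"};

constexpr std::size_t name_count = sizeof(names) / sizeof(names[0]);
```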
This is a basic feature of both C89 and C++98 called 'adjacent string concatenation' or thereabouts.
Basically, if two string literals are adjacent to each other with no punctuation in between, they are merged into a single string, as your output shows.
In the C++98 standard, section §2.1 'Phases of translation [lex.phases]' says:
6 Adjacent ordinary string literal tokens are concatenated. Adjacent wide string literal tokens are concatenated.
This is after the preprocessor has completed.
In the C99 standard, the corresponding section is §5.1.2.1 'Translation Phases' and it says:
6 Adjacent string literal tokens are concatenated.
The wording would be very similar in any other C or C++ standard you can lay hands on (and I do recognize that both C++98 and C99 are superseded by C++11 and C11; I just don't have electronic copies of the final standards, yet).
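Because the pieces are joined into one literal before the array is initialized, there is only one terminating '\0' for the whole thing. A sketch using the declaration from the question:

```cpp
#include <cassert>
#include <cstring>

// " version " (9 chars) + "0.8.0" (5 chars) = 14 chars, plus one '\0'.
const char version[] = " version " "0" "." "8" "." "0";
static_assert(sizeof(version) == 15, "one terminator for the joined literal");
```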
The C++ standard specifies that adjacent string literals are concatenated.
Quotes from C and C++ Standard:
For C (quoting C99, but C11 has something similar in 6.4.5p5):
(C99, 6.4.5p5) "In translation phase 6, the multibyte character
sequences specified by any sequence of adjacent character and
identically-prefixed string literal tokens are concatenated into a
single multibyte character sequence."
For C++:
(C++11, 2.14.5p13) "In translation phase 6 (2.2), adjacent string
literals are concatenated."
const char version[] = " version " "0" "." "8" "." "0";
is same as:
const char version[] = " version 0.8.0";
Compiler concatenates the adjacent pieces of string-literals, making one bigger piece of string-literal.
As a side note, const char* (which is in your title) is not the same as const char[] (which is in your posted code).
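One visible difference is sizeof: for the array it covers the whole string including '\0', while for the pointer it is just the size of a pointer. A sketch with illustrative names:

```cpp
#include <cassert>
#include <cstring>

// An array initialized from the concatenated literal (14 chars + '\0').
const char version_array[] = " version " "0" "." "8" "." "0";

// A pointer to a literal with the same contents.
const char* const version_ptr = " version " "0" "." "8" "." "0";
```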
The compiler automatically concatenates string literals written one after another (separated only by whitespace). It is as if you had written
const char version[] = " version 0.8.0";
Adjacent string literals are concatenated:
When specifying string literals, adjacent strings are concatenated. Therefore, this declaration:
char szStr[] = "12" "34";
is identical to this declaration:
char szStr[] = "1234";
This concatenation of adjacent strings makes it easy to specify long strings across multiple lines:
cout << "Four score and seven years "
"ago, our forefathers brought forth "
"upon this continent a new nation.";
Simply putting strings one after the other concatenates them at compile time, so:
"Hello" ", " "World!" => "Hello, World!"
This is an unusual use of the feature; usually it is used to combine #define'd strings with literals:
#define FOO "World!"
puts("Hello, " FOO);
Will compile to the same as:
puts("Hello, World!");
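A sketch that checks the combined string instead of printing it (greeting is an illustrative name):

```cpp
#include <cassert>
#include <cstring>

#define FOO "World!"

// Preprocessing turns this into "Hello, " "World!", and the two adjacent
// literals are then concatenated into "Hello, World!".
const char* const greeting = "Hello, " FOO;
```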