C++: Quote escapes for an entire line [duplicate] - c++

I came across this code snippet in C++17 draft n4713:
#define R "x"
const char* s = R"y"; // ill-formed raw string, not "x" "y"
What is a "raw string"? What does it do?

Raw string literals are string literals that are designed to make it easier to include nested characters like quotation marks and backslashes that normally have meanings as delimiters and escape sequence starts. They’re useful for, say, encoding text like HTML. For example, contrast
"C:\\Program Files\\"
which is a regular string literal, with
R"(C:\Program Files\)"
which is a raw string literal. Here, the use of parentheses in addition to quotes allows C++ to distinguish a nested quotation mark from the quotation marks delimiting the string itself.

Basically, a raw string literal is a string in which C++ escape sequences (like \n, \t or \") are not processed. A raw string literal starts with R"( and ends with )". It was introduced in C++11.
prefix(optional) R "delimiter( raw_characters )delimiter"
prefix - One of L, u8, u, U
Thanks to @Remy Lebeau:
delimiter is optional and is typically omitted, but there are corner cases where it is actually needed, in particular if the string content contains the character sequence )" in it, eg: R"(...)"...)", so you would need a delimiter to avoid an error, eg: R"x(...)"...)x".
See an example:
#include <iostream>
#include <string>
int main()
{
std::string normal_str = "First line.\nSecond line.\nEnd of message.\n";
std::string raw_str = R"(First line.\nSecond line.\nEnd of message.\n)";
std::string raw_str_delim = R"x("(First line.\nSecond line...)")x";
std::cout << normal_str << std::endl;
std::cout << raw_str << std::endl;
std::cout << raw_str_delim << std::endl;
return 0;
}
output:
First line.
Second line.
End of message.
First line.\nSecond line.\nEnd of message.\n
"(First line.\nSecond line...)"

I will make an addition about a concern in one of the comments:
But here in the code the R is defined as "x" and after
expansion of the #define the code is const char* s = "x""y";
and there isn't any R"(.
The code fragment in the question is there to show invalid uses of raw strings. Here are the actual three lines of code:
#define R "x"
const char* s = R"y"; // ill-formed raw string literal, not "x" "y"
const char* s2 = R"(a)" "b)"; // a raw string literal followed by a normal string literal
The first line is there so you don't get confused by a macro. Macros are preprocessed code fragments that replace parts of the source. A raw string, on the other hand, is a feature of the language that is "parsed" according to language rules.
The second line shows a wrong use of it. The correct way would be R"(y)", with parentheses inside the quotes.
And the last line shows how it can be a pain if not written carefully. The string inside the parentheses CANNOT include the closing sequence of the raw string. A correction might be R"_(a)" "b)_". The _ can be replaced by any delimiter of up to 16 characters (excluding parentheses, backslashes and whitespace), as long as the closing sequence does not appear inside the string: R"___(a)" "b)___" or R"anything(a)" "b)anything"
So if we wrap these corrections in a simple C++ program:
#include <iostream>
using namespace std;
#define R "x" // This is just a macro, not Raw String nor definition of it
const char* s = R"(y)"; // R is part of language, not a macro
const char* s2 = R"_(a)" "b)_"; // Raw String shall not include closing sequence of characters; )_"
int main(){ cout << s <<endl << s2 <<endl << R <<endl; }
then the output will be
y
a)" "b
x

Raw string literal. Used to avoid escaping of any character. Anything between the delimiters becomes part of the string. prefix, if present, has the same meaning as described above.
C++Reference: string literal
a Raw string is defined like this:
string raw_str=R"(First line.\nSecond line.\nEnd of message.\n)";
and the difference is that a raw string does not process escape sequences like \n and \t and treats them as normal text.
So the above line would be just one line with 3 literal \n sequences in it, instead of 3 separate lines.
You need to remove the define line and add parentheses around your string for it to be considered a raw string.

Related

Convert string to raw string

char str[] = "C:\Windows\system32";
auto raw_string = convert_to_raw(str);
std::cout << raw_string;
Desired output:
C:\Windows\system32
Is it possible? I am not a big fan of cluttering my path strings with extra backslash. Nor do I like an explicit R"()" notation.
Any other work-around of reading a backslash in a string literally?
That's not possible, \ has special meaning inside a non-raw string literal, and raw string literals exist precisely to give you a chance to avoid having to escape stuff. Give up, what you need is R"(...)".
Indeed, when you write something like
char const * str{"a\nb"};
you can verify for yourself that strlen(str) is 3, not 4, which means that once you compile that line, the binary/object file contains only one single character, the newline character, corresponding to \n; there's no \ nor n anywhere in it, so there's no way you can retrieve them.
As a matter of personal taste, I find raw string literals great! You can even put a real Enter in there, often just for the price of 3 characters (R, (, and )) in addition to those you would write anyway. In fact, you would have to write more characters to escape anything that needs escaping.
Look at
std::string s{R"(Hello
world!
This
is
Me!)"};
That's 29 keystrokes from R to the last " inclusive (counting each Enter as one), and you can see at a glance that it's 5 lines.
The equivalent non-raw string
std::string s{"Hello\nworld!\nThis\nis\nMe!"};
is 30 keystrokes from the first " to the last " inclusive, and you have to parse it carefully to count the lines.
A pretty short string, and you already see the advantage.
To answer the question, as asked, no it is not possible.
As an example of the impossibility, assume we have a path specified as "C:\a\b";
Now, that string is actually represented in memory (in your program when running) as a statically allocated array of five characters with values {'C', ':', '\007', '\010', '\000'}, where '\xyz' is an OCTAL representation (so '\010' is a char numerically equal to 8 in decimal).
The problem is that there is more than one way to produce that array of five characters using a string literal.
char str[] = "C:\a\b";
char str1[] = "C:\007\010";
char str2[] = "C:\a\010";
char str3[] = "C:\007\b";
char str4[] = "C:\x07\x08"; // \xmn uses hex coding
In the above, str1, str2, str3, and str4 are all initialised using equivalent arrays of five char.
That means convert_to_raw("C:\a\b") could quite legitimately assume it is passed ANY of the strings above AND
std::cout << convert_to_raw("C:\a\b") << '\n';
could quite legitimately produce output of
C:\007\010
(or any one of a number of other strings).
The practical problem with this, if you are working with Windows paths, is that c:\a\b, C:\007\010, C:\a\010, C:\007\b, and C:\x07\x08 are all valid filenames under Windows that (unless they are hard links or junctions) name DIFFERENT files.
In the end, if you want to have string literals in your code representing filenames or paths, then use \\ or a raw string literal when you need a single backslash. Alternatively, write your paths as string literals in your code using all forward slashes (e.g. "C:/a/b") since Windows API functions accept those too.

C++ string variables not accepting enter and tab whitespace?

Why does the following work:
string input = "a long string of text pasted from a .txt file";
But this version does not?
string input =
"
some
large
string ";
I thought C++ doesn't care about whitespace.
You can do something like this. It's called a raw string literal:
string input =
R"(
some
large
string )";
This will include the newline characters as well. The format is R"(string-literal)"
For the most part, no, it does not care about whitespace. But there are exceptions, and string literals are one of them.
The rule is string literals cannot span multiple lines. But adjacent literals are automatically concatenated so you can just do
const char string[] = "very "
"long "
"string";
and it will be equivalent to
const char string[] = "very long string";
I am not sure about the origin of the rule, I suspect it might have been done to prevent confusion whether the newline should be part of the string or not (it's not unless explicitly escaped). Or maybe just some grammar/parser thing. Compiling C/C++ is kind of complicated and happens in multiple phases, see cppreference - string literals already have plenty of special treatment.

How do you call a batch file with an argument that has quotes, using system()

For example, in the command line this works (the 1st argument has quotes but the 2nd argument doesn't):
"test.bat" "a" b
i.e it know that "a" is the 1st argument and b is the second
but using system() it doesn't work:
system("test.bat" "a" b)
this also doesn't work:
system("test.bat" \"a\" b)
This is gonna be simplest if we use a raw string literal. A raw string literal is a way of writing a string in C++ where escape sequences are not processed, so nothing needs escaping. Let's look at an example:
char const* myCommand = R"(test.bat "a" b)";
The R at the beginning indicates that it's a raw string literal, and if you call system(myCommand), it will be exactly equivalent to typing
$ test.bat "a" b
into the command line. Now, suppose you want to escape the quotes on the command line:
$ test.bat \"a\" b
With a raw string literal, this is simple:
char const* myCommand = R"(test.bat \"a\" b)";
system(myCommand);
Or, alternatively:
system(R"(test.bat \"a\" b)");
Hope this helps!
A bit more info on raw string literals: Raw string literals are a great feature, and they basically allow you to copy+paste any text directly into your program. They begin with R, followed by a quote and an opening parenthesis. Only the stuff inside the parentheses gets included. Examples:
using std::string;
string a = R"(Hello)"; // a == "Hello"
Begin and end with "raw":
string b = R"raw(Hello)raw"; // b == "Hello"
Begin and end with "foo"
string c = R"foo(Hello)foo"; // c == "Hello"
Begin and end with "x"
string d = R"x(Hello)x"; // d == "Hello"
The important thing is that we begin and end the literal with the same string of letters (called the delimiter), followed by the parenthesis. This ensures we never have a reason to escape something inside the raw string literal, because we can always change the delimiter so that it's not something found inside the string.
I got it to work now:
system(R"(C:\"to erase\test.bat" "a")");
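I found an answer that works: system("\"test.bat\" ""a"" b");
Note that "" does not escape a quote in C++. The compiler sees three adjacent string literals, "\"test.bat\" ", "a" and " b", and concatenates them, so the command that actually runs is "test.bat" a b, with no quotes around a.
To pass literal quotes around a, escape them with a backslash or use a raw string literal.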

C++ Escape occurrences of \ in a string

Is there a simple way to escape all occurrences of \ in a string? I start with the following string:
#include <string>
#include <iostream>
std::string escapeSlashes(std::string str) {
// I have no idea what to do here
return str;
}
int main () {
std::string str = "a\b\c\d";
std::cout << escapeSlashes(str) << "\n";
// Desired output:
// a\\b\\c\\d
return 0;
}
Basically, I am looking for the inverse to this question. The problem is that I cannot search for \ in the string, because C++ already treats it as an escape sequence.
NOTE: I am not able to change the string str in the first place. It is parsed from a LaTeX file. Thus, this answers to a similar question does not apply. Edit: The parsing failed due to an unrelated problem, the question here is about string literals.
Edit: There are nice solutions to find and replace known escape sequences, such as this answer. Another option is to use boost::regex("\p{cntrl}"). However, I haven't found one that works for unknown (erroneous) escape sequences.
You can use raw string literal. See http://en.cppreference.com/w/cpp/language/string_literal
#include <string>
#include <iostream>
int main() {
std::string str = R"(a\b\c\d)";
std::cout << str << "\n";
return 0;
}
Output:
a\b\c\d
It is not possible to convert the string literal a\b\c\d to a\\b\\c\\d, i.e. escaping the backslashes.
Why? Because the compiler converts \c and \d directly to c and d, respectively, giving you a warning about Unknown escape sequence \c and Unknown escape sequence \d (\b is fine as it is a valid escape sequence). This happens directly to the string literal before you have any chance to work with it.
To see this, you can compile to assembler
gcc -S main.cpp
and you will find the following line somewhere in your assembler code:
.string "a\bcd"
Thus, your problem is either in your parsing function or you use string literals for experimenting and you should use raw strings R"(a\b\c\d)" instead.