C++ Escape occurrences of \ in a string - c++

Is there a simple way to escape all occurrences of \ in a string? I start with the following string:
#include <string>
#include <iostream>
std::string escapeSlashes(std::string str) {
// I have no idea what to do here
return str;
}
int main () {
std::string str = "a\b\c\d";
std::cout << escapeSlashes(str) << "\n";
// Desired output:
// a\\b\\c\\d
return 0;
}
Basically, I am looking for the inverse to this question. The problem is that I cannot search for \ in the string, because C++ already treats it as an escape sequence.
NOTE: I am not able to change the string str in the first place. It is parsed from a LaTeX file. Thus, this answers to a similar question does not apply. Edit: The parsing failed due to an unrelated problem, the question here is about string literals.
Edit: There are nice solutions to find and replace known escape sequences, such as this answer. Another option is to use boost::regex("\p{cntrl}"). However, I haven't found one that works for unknown (erroneous) escape sequences.

You can use raw string literal. See http://en.cppreference.com/w/cpp/language/string_literal
#include <string>
#include <iostream>
int main() {
std::string str = R"(a\b\c\d)";
std::cout << str << "\n";
return 0;
}
Output:
a\b\c\d

It is not possible to convert the string literal a\b\c\d to a\\b\\c\\d, i.e. escaping the backslashes.
Why? Because the compiler converts \c and \d directly to c and d, respectively, giving you a warning about Unknown escape sequence \c and Unknown escape sequence \d (\b is fine as it is a valid escape sequence). This happens directly to the string literal before you have any chance to work with it.
To see this, you can compile to assembler
gcc -S main.cpp
and you will find the following line somewhere in your assembler code:
.string "a\bcd"
Thus, your problem is either in your parsing function or you use string literals for experimenting and you should use raw strings R"(a\b\c\d)" instead.

Related

C++: Quote escapes for an entire line [duplicate]

I came across this code snippet in C++17 draft n4713:
#define R "x"
const char* s = R"y"; // ill-formed raw string, not "x" "y"
What is a "raw string"? What does it do?
Raw string literals are string literals that are designed to make it easier to include nested characters like quotation marks and backslashes that normally have meanings as delimiters and escape sequence starts. They’re useful for, say, encoding text like HTML. For example, contrast
"C:\\Program Files\\"
which is a regular string literal, with
R"(C:\Program Files\)"
which is a raw string literal. Here, the use of parentheses in addition to quotes allows C++ to distinguish a nested quotation mark from the quotation marks delimiting the string itself.
Basically a raw string literal is a string in which the escape characters (like \n \t or \" ) of C++ are not processed. A raw string literal which starts with R"( and ends in )" ,introduced in C++11
prefix(optional) R "delimiter( raw_characters )delimiter"
prefix - One of L, u8, u, U
Thanks to #Remy Lebeau,
delimiter is optional and is typically omitted, but there are corner cases where it is actually needed, in particular if the string content contains the character sequence )" in it, eg: R"(...)"...)", so you would need a delimiter to avoid an error, eg: R"x(...)"...)x".
See an example:
#include <iostream>
#include <string>
int main()
{
std::string normal_str = "First line.\nSecond line.\nEnd of message.\n";
std::string raw_str = R"(First line.\nSecond line.\nEnd of message.\n)";
std::string raw_str_delim = R"x("(First line.\nSecond line...)")x";
std::cout << normal_str << std::endl;
std::cout << raw_str << std::endl;
std::cout << raw_str_delim << std::endl;
return 0;
}
output:
First line.
Second line.
End of message.
First line.\nSecond line.\nEnd of message.\n
"(First line.\nSecond line...)"
Live on Godbolt
I will make an addition about a concern in one of the comments:
But here in the code the R is defined as "x" and after
expansion of the #define the code is const char* s = "x""y";
and there isn't any R"(.
The code fragment in the question is to show invalid uses of the Raw Strings. Let me get the actual 3-lines of code here:
#define R "x"
const char* s = R"y"; // ill-formed raw string literal, not "x" "y"
const char* s2 = R"(a)" "b)"; // a raw string literal followed by a normal string literal
The first line is there to not get confused by a macro. macros are preprocessed code fragments that replace parts in the source. Raw String, on the other hand, is a feature of the language that is "parsed" according to language rules.
The second line is to show the wrong use of it. Correct way would be R"(x)" where you need parenthesis in it.
And the last is to show how it can be a pain if not written carefully. The string inside parenthesis CANNOT include closing sequence of raw string. A correction might be R"_(a)" "b)_". _ can be replaced by any character (but not parentheses, backslash and spaces) and any number of them as long as closing sequence is not included inside: R"___(a)" "b)___" or R"anything(a)" "b)anything"
So if we wrap these correction within a simple C++ code:
#include <iostream>
using namespace std;
#define R "x" // This is just a macro, not Raw String nor definition of it
const char* s = R"(y)"; // R is part of language, not a macro
const char* s2 = R"_(a)" "b)_"; // Raw String shall not include closing sequence of characters; )_"
int main(){ cout << s <<endl << s2 <<endl << R <<endl; }
then the output will be
y
a)" "b
x
Raw string literal. Used to avoid escaping of any character. Anything between the delimiters becomes part of the string. prefix, if present, has the same meaning as described above.
C++Reference: string literal
a Raw string is defined like this:
string raw_str=R"(First line.\nSecond line.\nEnd of message.\n)";
and the difference is that a raw string ignores (escapes) all the special characters like \n ant \t and threats them like normal text.
So the above line would be just one line with 3 actual \n in it, instead of 3 separate lines.
You need to remove the define line and add parentheses around your string to be considered as a raw string.

Replace single backslash with double in a string c++

I am trying to replace one backslash with two. To do that I tried using the following code
str = "d:\test\text.txt"
str.replace("\\","\\\\");
The code does not work. Whole idea is to pass str to deletefile function, which requires double blackslash.
since c++11, you may try using regex
#include <regex>
#include <iostream>
int main() {
auto s = std::string(R"(\tmp\)");
s = std::regex_replace(s, std::regex(R"(\\)"), R"(\\)");
std::cout << s << std::endl;
}
A bit overkill, but does the trick is you want a "quick" sollution
There are two errors in your code.
First line: you forgot to double the \ in the literal string.
It happens that \t is a valid escape representing the tab character, so you get no compiler error, but your string doesn't contain what you expect.
Second line: according to the reference of string::replace,
you can replace a substring by another substring based on the substring position.
However, there is no version that makes a substitution, i.e. replace all occurences of a given substring by another one.
This doesn't exist in the standard library. It exists for example in the boost library, see boost string algorithms. The algorithm you are looking for is called replace_all.

What is a raw string?

I came across this code snippet in C++17 draft n4713:
#define R "x"
const char* s = R"y"; // ill-formed raw string, not "x" "y"
What is a "raw string"? What does it do?
Raw string literals are string literals that are designed to make it easier to include nested characters like quotation marks and backslashes that normally have meanings as delimiters and escape sequence starts. They’re useful for, say, encoding text like HTML. For example, contrast
"C:\\Program Files\\"
which is a regular string literal, with
R"(C:\Program Files\)"
which is a raw string literal. Here, the use of parentheses in addition to quotes allows C++ to distinguish a nested quotation mark from the quotation marks delimiting the string itself.
Basically a raw string literal is a string in which the escape characters (like \n \t or \" ) of C++ are not processed. A raw string literal which starts with R"( and ends in )" ,introduced in C++11
prefix(optional) R "delimiter( raw_characters )delimiter"
prefix - One of L, u8, u, U
Thanks to #Remy Lebeau,
delimiter is optional and is typically omitted, but there are corner cases where it is actually needed, in particular if the string content contains the character sequence )" in it, eg: R"(...)"...)", so you would need a delimiter to avoid an error, eg: R"x(...)"...)x".
See an example:
#include <iostream>
#include <string>
int main()
{
std::string normal_str = "First line.\nSecond line.\nEnd of message.\n";
std::string raw_str = R"(First line.\nSecond line.\nEnd of message.\n)";
std::string raw_str_delim = R"x("(First line.\nSecond line...)")x";
std::cout << normal_str << std::endl;
std::cout << raw_str << std::endl;
std::cout << raw_str_delim << std::endl;
return 0;
}
output:
First line.
Second line.
End of message.
First line.\nSecond line.\nEnd of message.\n
"(First line.\nSecond line...)"
Live on Godbolt
I will make an addition about a concern in one of the comments:
But here in the code the R is defined as "x" and after
expansion of the #define the code is const char* s = "x""y";
and there isn't any R"(.
The code fragment in the question is to show invalid uses of the Raw Strings. Let me get the actual 3-lines of code here:
#define R "x"
const char* s = R"y"; // ill-formed raw string literal, not "x" "y"
const char* s2 = R"(a)" "b)"; // a raw string literal followed by a normal string literal
The first line is there to not get confused by a macro. macros are preprocessed code fragments that replace parts in the source. Raw String, on the other hand, is a feature of the language that is "parsed" according to language rules.
The second line is to show the wrong use of it. Correct way would be R"(x)" where you need parenthesis in it.
And the last is to show how it can be a pain if not written carefully. The string inside parenthesis CANNOT include closing sequence of raw string. A correction might be R"_(a)" "b)_". _ can be replaced by any character (but not parentheses, backslash and spaces) and any number of them as long as closing sequence is not included inside: R"___(a)" "b)___" or R"anything(a)" "b)anything"
So if we wrap these correction within a simple C++ code:
#include <iostream>
using namespace std;
#define R "x" // This is just a macro, not Raw String nor definition of it
const char* s = R"(y)"; // R is part of language, not a macro
const char* s2 = R"_(a)" "b)_"; // Raw String shall not include closing sequence of characters; )_"
int main(){ cout << s <<endl << s2 <<endl << R <<endl; }
then the output will be
y
a)" "b
x
Raw string literal. Used to avoid escaping of any character. Anything between the delimiters becomes part of the string. prefix, if present, has the same meaning as described above.
C++Reference: string literal
a Raw string is defined like this:
string raw_str=R"(First line.\nSecond line.\nEnd of message.\n)";
and the difference is that a raw string ignores (escapes) all the special characters like \n ant \t and threats them like normal text.
So the above line would be just one line with 3 actual \n in it, instead of 3 separate lines.
You need to remove the define line and add parentheses around your string to be considered as a raw string.

regular expression Can't find sequence of numbers

Let
exp = ^[0-9!##$%^&*()_+-=[]{};':"\|,.<>/?\s]*$
be a regular expression that allows me to find all sequences of numbers with or without special characters.
by using exp I manage to extract all sequences of numbers that are greater than 5. But the number 98200 cannot be extracted. I am not using any limits to how long should the sequence of numbers be.
Source code:
#include <boost/regex.hpp>
#include iostream;
using namespace std;
int main()
{
string s = "16000";
string exp = ^[0-9!##$%^&*()_+-=[]{};':"\\|,.<>\\/?\\s]*$
const boost::regex e(exp);
bool isSequence = boost::regex_match(s,e);
//isSequence is boolean and should be equal to 1
cout << isSequence << endl;
return 0;
}
In C#, you need to escape the ]. You don't need to escape [ {} () when they are inside a character class. Also, if you want to include the dash as an included character in the character class, it should be at the beginning or end of the list. The sequence that you have of +-= translates to [+,-./0123456789:;<=] which makes your regex redundant. Finally, because of the terminal quantifier, you are allowing matching of zero length strings. This may be what you want, but if not, consider the '+' quantifier.
What about simply
[^A-Za-z]+
with or without the ^ $ anchors at the beginning/end
Indiscriminately escaping everything works for me.. :)
string exp = "^[0-9\\!##\\$\\%\\^&*\\(\\)_\\+\\-=\\[\\]\\{\\};\\\':\\\"\\\\|,\\.<>\\/?\\s]*$";
Note the double backslash... I'm sure you can workout which of the characters in your list means anything special, and only escape those, as I don't have the time to lookup what has special meaning in this context, I escaped everything, and this works fine for a few of the cases I tested
16000 => returns 1 16A000 => returns 0 16#000 => returns 1
Which I'm guessing is what you want...
I have shifted the brackets to the front of the character class and therewith I get the output 1 for 98200 using the following code:
#include <string>
#include <boost/regex.hpp>
#include <iostream>
using namespace std;
int main()
{
std::cout << "main()\n";
string s = "98200";
string exp = "^[][0-9!##$%^&*()_+-={};':\"\\|,.<>\\/?\\s]*$";
const boost::regex e(exp);
bool isSequence = boost::regex_match(s,e);
//isSequence is boolean and should be equal to 1
cout << isSequence << endl;
return 0;
}
/**
Local Variables:
compile-command: "g++ -g test.cc -o test.exe -lboost_regex-mt; ./test.exe"
End:
*/
EDIT: Note, that I used my experience with emacs regular
expressions. The info pages of emacs explain: "To include a ] in a
character set, you must make it the first character." I tried this
with boost::regexp and it worked. Later on when I had more time I read
in the boost manual
http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.character_sets
that this is not specified for the perl regular expression syntax.
The perl syntax is the standard setting for boost::regex. According to the
specification the comment by
https://stackoverflow.com/users/2872922/ron-rosenfeld is the best
answer.
In the following program I eliminate the character range which was incidentally encoded into your regular expression.
Testing shows that the bracket at the beginning of the character set is included into the character set. So it turns out that my statement was right even if it is not specified in the official manual of boost::regex.
Nevertheless, I suggest that https://stackoverflow.com/users/2872922/ron-rosenfeld inserts his comment as an answer and you mark it as the solution. This will help others reading this thread.
#include <string>
#include <boost/regex.hpp>
#include <iostream>
using namespace std;
int main()
{
std::cout << "main()\n";
string s = "98-[2]00";
string exp = "^[][0-9!##$%^&*()_+={};':\"|,.<>/?\\s-]*$";
const boost::regex e(exp);
bool isSequence = boost::regex_match(s,e);
//isSequence is boolean and should be equal to 1
cout << isSequence << endl;
return 0;
}
/**
Local Variables:
compile-command: "g++ -g test.cc -o test.exe -lboost_regex-mt; ./test.exe"
End:
*/
I asked at http://lists.boost.org/boost-users/2013/12/80707.php
The answer of John Maddock (the author of the boost::regex library) is:
>I discovered that if one uses an closing bracket as the first character of
>a
>character class the character class includes this bracket.
>This works with the standard setting of boost::regex (i.e., perl-regular
>expressions) but it is not documented in the
>manual page
>
>http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/
>perl_syntax.html#boost_regex.syntax.perl_syntax.character_sets
>
>Is this an undocumented feature, a bug or did I misinterpret something in
>the manual?
It's a feature, both Perl and POSIX extended regular expression behave the
same way.
John.

How can I match the \0 character in a regex in C++?

I need to match the text '\0' with the same regex that I would match 'a' or 'b'. (a regex for a character constant in C++). I've tried a bunch of different regexes, but haven't gotten a successful one yet. My latest attempt:
^['].|\\0[']
Most of the other things I've tried have given seg faults, so this is really the closest I've gotten.
This works pretty nicely with what I've tested ('a','b','\0').
If you don't have std::regex or boost::regex I guess what you can get out of it is the fact that the regex I used is ('.'|'\\0').
#include <boost/regex.hpp>
#include <string>
#include <iostream>
#include <vector>
int main() {
std::vector<std::string> strings;
strings.push_back(R"('a')");
strings.push_back(R"('b')");
strings.push_back(R"('\0')");
boost::regex rgx(R"(('.'|'\\0'))");
boost::smatch match;
for(auto& i : strings) {
if(boost::regex_match(i,match, rgx)) {
boost::ssub_match submatch = match[1];
std::cout << submatch.str() << '\n';
}
}
}
Example
There's nothing magic about '\0'; it's just a character, like any other character, and there's nothing (almost) special you have to do to use it in a regular expression. The only problem you might run into is if you use it in the middle of a character literal that you pass to a function that treats it as the end of a string. To avoid that, force it into a std::string:
const char s[] = "a\0b";
std::string not_my_str(s); // not_my_str holds "a"
std::string str(s, 3); // str holds "a\0b"
Once you've constructed the string object, the embedded '\0' gets no special treatment. Except, of course, if you copy the contents with a function that treats it specially.
The regex that works (in this instance, using the C header ) is:
^('(.|([\\]0))')
Thanks to #WhozCraig for the help!