How to see backslashes in string - c++

For a function I am writing, I take a string as a parameter and process it. I want to treat a character in the string specially if there is a backslash before it. However, I am having problems even seeing the backslash!
std::string s = "01234\6";
std::cout << s << std::endl;
std::cout << s.at(5) << std::endl;
if(s.at(5)== '\\')
std::cout << "It's a backslash" << std::endl;
else
std::cout << "It's not a backslash" << std::endl;
outputs
01234
It's not a backslash
How am I supposed to check if mystring.at(i) == '\\' if it isn't showing up at all?
The input will be coming from another file (which I can't modify) like
myfunc("% \% %");
If I read the string I count 3 '%' characters (so it's not ignored because of the backslash), and 0 '\' characters.
Edit: here is how I count:
char percent = '%';
int current_index = 0;
int percent_count = 0;
int ret = str.find(percent, current_index);
while(ret != std::string::npos)
{
percent_count++;
current_index = ret +1;
ret = str.find(percent, current_index);
}
return percent_count;

C++ supports three kinds of escape sequences:
simple-escape-sequence. It is one of:
\' \" \? \\
\a \b \f \n \r \t \v
octal-escape-sequence. It is one of:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
\0 is the most well known octal escape sequence that represents the null character.
hexadecimal-escape-sequence. It is one of:
\x hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit
When you use:
std::string s = "01234\6";
the \6 part represents an octal escape sequence. It does not represent two characters.
It is the same as
std::string s = "01234?";
where ? is the character represented by the octal number 6.
In order to have \ as an element of the string, you'll need to use:
std::string s = "01234\\6";
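The difference between the two literals can be checked directly. This is only a sketch; the function names are made up for illustration, and the sizes follow from the escape rules listed above.

```cpp
#include <cstddef>
#include <string>

// "01234\6" holds six characters: '0'..'4' plus the single character with value 6.
inline std::string with_octal_escape() { return "01234\6"; }

// "01234\\6" holds seven characters: '0'..'4', one backslash, and the digit '6'.
inline std::string with_escaped_backslash() { return "01234\\6"; }

// True when index i is in range and the character there is a literal backslash.
inline bool is_backslash_at(const std::string& s, std::size_t i) {
    return i < s.size() && s[i] == '\\';
}
```

With these, is_backslash_at(with_escaped_backslash(), 5) is true, while the same call on with_octal_escape() is false because no backslash ever reaches the string.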

The checking method is right, but the backslash escapes the 6, so \6 counts as a single character. You can verify this with sizeof("12345\6"), which is 7 (it includes the terminating null), or strlen("12345\6"), which is 6.
Change "12345\6" to "12345\\6".

The C++ compiler has already treated the backslash specially by the time your code sees the string:
std::string s = "01234\6"; // \6 is already one character (the octal escape for value 6), not a backslash followed by 6
If you want the text to contain a real backslash (as it would when it comes from I/O), write \\ so that the compiler treats it as a literal backslash rather than the start of an escape sequence:
std::string s = "01234\\6"; //double backslash here
Then you can test your program.

No C++ compiler will interpret a single \ in a string literal as a backslash, since it's the escape character. You will have to use \\ to denote a backslash in a string.
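For the underlying goal (treating a character specially when a backslash precedes it), the backslash has to actually be in the string at run time, for example because the text was read from a file or the literal was written with \\. A minimal sketch, assuming a backslash escapes exactly the one character after it; count_unescaped is a hypothetical helper name, not something from the question:

```cpp
#include <cstddef>
#include <string>

// Counts occurrences of `target` that are not escaped by a preceding backslash.
// Assumption: a backslash escapes exactly the next character, including another backslash.
inline std::size_t count_unescaped(const std::string& text, char target) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\\') {
            ++i;  // skip the escaped character
        } else if (text[i] == target) {
            ++count;
        }
    }
    return count;
}
```

Called as count_unescaped("% \\% %", '%'), this returns 2, because the middle % is escaped.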

Related

How can I print "\" in C++?

I have a homework assignment where part of the menu has to have "R\C" printed, but when I run the program the console just prints "RC". Does anyone know why this is happening and how I can fix it?
This is what I have in Visual Studio:
cout << "R\C" << endl;
The \C is being interpreted as an (invalid) escape sequence. You need to escape the \ character as \\ in order to print it as a single \, eg:
cout << "R\\C" << endl;
Alternatively, in C++11 and later, you can use a raw string literal instead, so you do not need to escape the \ character:
cout << R"(R\C)" << endl;
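To confirm that the two spellings are interchangeable, here is a small sketch (the function names are illustrative only). Both literals yield the same three characters.

```cpp
#include <string>

// Escaped form: \\ in the source becomes one backslash in the string.
inline std::string escaped_form() { return "R\\C"; }

// Raw string literal (C++11): nothing between R"( and )" is treated as an escape.
inline std::string raw_form() { return R"(R\C)"; }
```

Either function's result can be passed to cout and prints R\C.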
Escape \ with another \:
cout << "R\\C" << endl;
C++ reserves some characters, so you can't type them directly in a literal; usually you have to put a \ in front of them. To use "\" itself in a string, you write \\.
You have to use escape sequences for certain characters. For the character that you specified, you would write "\\" and the output would be \. Other escape sequences are:
\' For single quotes
\t For Tab
\n For newline
\? For question marks
See this for more information.
You can use escape sequences such as \t, \n, and \a.
If you want to print '\', you have to write it like this:
cout<<"\\";

Regex for matching C++ string constant

I'm currently working on a C++ preprocessor and I need to match string constants with more than 0 characters, like this: "hey I'm a string".
I'm currently working with this one here \"([^\\\"]+|\\.)+\" but it fails on one of my test cases.
Test cases:
std::cout << "hello" << " world";
std::cout << "He said: \"bananas\"" << "...";
std::cout << "";
std::cout << "\x12\23\x34";
Expected output:
std::cout << String("hello") << String(" world");
std::cout << String("He said: \"bananas\"") << String("...");
std::cout << "";
std::cout << String("\x12\23\x34");
On the second one I instead get
std::cout << String("He said: \")bananas\"String(" << ")...";
Short repro code (using the regex by AR.3):
std::string in_line = "std::cout << \"He said: \\\"bananas\\\"\" << \"...\";";
std::regex r("\"([^\"]+|\\.|(?<=\\\\)\")+\"");
in_line = std::regex_replace(in_line, r, "String($&)");
Lexing a source file is a good job for regexes. But for such a task, let's use a better regex engine than std::regex. Let's use PCRE (or boost::regex) at first. At the end of this post, I'll show what you can do with a less feature-packed engine.
We only need to do partial lexing, ignoring all unrecognized tokens that won't affect string literals. What we need to handle is:
Singleline comments
Multiline comments
Character literals
String literals
We'll be using the extended (x) option, which ignores whitespace in the pattern.
Comments
Here's what [lex.comment] says:
The characters /* start a comment, which terminates with the characters */. These comments do not nest.
The characters // start a comment, which terminates immediately before the next new-line character. If
there is a form-feed or a vertical-tab character in such a comment, only white-space characters shall appear
between it and the new-line that terminates the comment; no diagnostic is required. [ Note: The comment
characters //, /*, and */ have no special meaning within a // comment and are treated just like other
characters. Similarly, the comment characters // and /* have no special meaning within a /* comment.
— end note ]
# singleline comment
// .* (*SKIP)(*FAIL)
# multiline comment
| /\* (?s: .*? ) \*/ (*SKIP)(*FAIL)
Easy peasy. If you match anything there, just (*SKIP)(*FAIL) - meaning that you throw away the match. The (?s: .*? ) applies the s (singleline) modifier to the . metacharacter, meaning it's allowed to match newlines.
Character literals
Here's the grammar from [lex.ccon]:
character-literal:
encoding-prefix(opt) ' c-char-sequence '
encoding-prefix:
one of u8 u U L
c-char-sequence:
c-char
c-char-sequence c-char
c-char:
any member of the source character set except the single-quote ', backslash \, or new-line character
escape-sequence
universal-character-name
escape-sequence:
simple-escape-sequence
octal-escape-sequence
hexadecimal-escape-sequence
simple-escape-sequence: one of \' \" \? \\ \a \b \f \n \r \t \v
octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
hexadecimal-escape-sequence:
\x hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit
Let's define a few things first, which we'll need later on:
(?(DEFINE)
(?<prefix> (?:u8?|U|L)? )
(?<escape> \\ (?:
['"?\\abfnrtv] # simple escape
| [0-7]{1,3} # octal escape
| x [0-9a-fA-F]{1,2} # hex escape
| u [0-9a-fA-F]{4} # universal character name
| U [0-9a-fA-F]{8} # universal character name
))
)
prefix is defined as an optional u8, u, U or L
escape is defined as per the standard, except that I've merged universal-character-name into it for the sake of simplicity
Once we have these, a character literal is pretty simple:
(?&prefix) ' (?> (?&escape) | [^'\\\r\n]+ )+ ' (*SKIP)(*FAIL)
We throw it away with (*SKIP)(*FAIL)
Simple strings
They're defined in almost the same way as character literals. Here's a part of [lex.string]:
string-literal:
encoding-prefix(opt) " s-char-sequence(opt) "
encoding-prefix(opt) R raw-string
s-char-sequence:
s-char
s-char-sequence s-char
s-char:
any member of the source character set except the double-quote ", backslash \, or new-line character
escape-sequence
universal-character-name
This will mirror the character literals:
(?&prefix) " (?> (?&escape) | [^"\\\r\n]+ )* "
The differences are:
The character sequence is optional this time (* instead of +)
The double quote is disallowed when unescaped instead of the single quote
We actually don't throw it away :)
Raw strings
Here's the raw string part:
raw-string:
" d-char-sequence(opt) ( r-char-sequence(opt) ) d-char-sequence(opt) "
r-char-sequence:
r-char
r-char-sequence r-char
r-char:
any member of the source character set, except a right parenthesis )
followed by the initial d-char-sequence (which may be empty) followed by a double quote ".
d-char-sequence:
d-char
d-char-sequence d-char
d-char:
any member of the basic source character set except:
space, the left parenthesis (, the right parenthesis ), the backslash \,
and the control characters representing horizontal tab,
vertical tab, form feed, and newline.
The regex for this is:
(?&prefix) R " (?<delimiter>[^ ()\\\t\x0B\r\n]*) \( (?s:.*?) \) \k<delimiter> "
[^ ()\\\t\x0B\r\n]* is the set of characters that are allowed in delimiters (d-char)
\k<delimiter> refers to the previously matched delimiter
The full pattern
The full pattern is:
(?(DEFINE)
(?<prefix> (?:u8?|U|L)? )
(?<escape> \\ (?:
['"?\\abfnrtv] # simple escape
| [0-7]{1,3} # octal escape
| x [0-9a-fA-F]{1,2} # hex escape
| u [0-9a-fA-F]{4} # universal character name
| U [0-9a-fA-F]{8} # universal character name
))
)
# singleline comment
// .* (*SKIP)(*FAIL)
# multiline comment
| /\* (?s: .*? ) \*/ (*SKIP)(*FAIL)
# character literal
| (?&prefix) ' (?> (?&escape) | [^'\\\r\n]+ )+ ' (*SKIP)(*FAIL)
# standard string
| (?&prefix) " (?> (?&escape) | [^"\\\r\n]+ )* "
# raw string
| (?&prefix) R " (?<delimiter>[^ ()\\\t\x0B\r\n]*) \( (?s:.*?) \) \k<delimiter> "
See the demo here.
boost::regex
Here's a simple demo program using boost::regex:
#include <string>
#include <iostream>
#include <boost/regex.hpp>
static void test()
{
boost::regex re(R"regex(
(?(DEFINE)
(?<prefix> (?:u8?|U|L) )
(?<escape> \\ (?:
['"?\\abfnrtv] # simple escape
| [0-7]{1,3} # octal escape
| x [0-9a-fA-F]{1,2} # hex escape
| u [0-9a-fA-F]{4} # universal character name
| U [0-9a-fA-F]{8} # universal character name
))
)
# singleline comment
// .* (*SKIP)(*FAIL)
# multiline comment
| /\* (?s: .*? ) \*/ (*SKIP)(*FAIL)
# character literal
| (?&prefix)? ' (?> (?&escape) | [^'\\\r\n]+ )+ ' (*SKIP)(*FAIL)
# standard string
| (?&prefix)? " (?> (?&escape) | [^"\\\r\n]+ )* "
# raw string
| (?&prefix)? R " (?<delimiter>[^ ()\\\t\x0B\r\n]*) \( (?s:.*?) \) \k<delimiter> "
)regex", boost::regex::perl | boost::regex::no_mod_s | boost::regex::mod_x | boost::regex::optimize);
std::string subject(R"subject(
std::cout << L"hello" << " world";
std::cout << "He said: \"bananas\"" << "...";
std::cout << "";
std::cout << "\x12\23\x34";
std::cout << u8R"hello(this"is\a\""""single\\(valid)"
raw string literal)hello";
"" // empty string
'"' // character literal
// this is "a string literal" in a comment
/* this is
"also inside"
//a comment */
// and this /*
"is not in a comment"
// */
"this is a /* string */ with nested // comments"
)subject");
std::cout << boost::regex_replace(subject, re, "String\\($&\\)", boost::format_all) << std::endl;
}
int main(int argc, char **argv)
{
try
{
test();
}
catch(const std::exception& ex) // catch by reference so the derived exception is not sliced
{
std::cerr << ex.what() << std::endl;
}
return 0;
}
(I left syntax highlighting disabled because it goes nuts on this code)
For some reason, I had to take the ? quantifier out of prefix (change (?<prefix> (?:u8?|U|L)? ) to (?<prefix> (?:u8?|U|L) ) and (?&prefix) to (?&prefix)?) to make the pattern work. I believe it's a bug in boost::regex, as both PCRE and Perl work just fine with the original pattern.
What if we don't have a fancy regex engine at hand?
Note that while this pattern technically uses recursion, it never nests recursive calls. Recursion could be avoided by inlining the relevant reusable parts into the main pattern.
A couple of other constructs can be avoided at the price of reduced performance. We can safely replace the atomic groups (?>...) with normal groups (?:...) if we don't nest quantifiers in order to avoid catastrophic backtracking.
We can also avoid (*SKIP)(*FAIL) if we add one line of logic into the replacement function: All the alternatives to skip are grouped in a capturing group. If the capturing group matched, just ignore the match. If not, then it's a string literal.
All of this means we can implement this in JavaScript, which has one of the simplest regex engines you can find, at the price of breaking the DRY rule and making the pattern illegible. The regex becomes this monstrosity once converted:
(\/\/.*|\/\*[\s\S]*?\*\/|(?:u8?|U|L)?'(?:\\(?:['"?\\abfnrtv]|[0-7]{1,3}|x[0-9a-fA-F]{1,2}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})|[^'\\\r\n])+')|(?:u8?|U|L)?"(?:\\(?:['"?\\abfnrtv]|[0-7]{1,3}|x[0-9a-fA-F]{1,2}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})|[^"\\\r\n])*"|(?:u8?|U|L)?R"([^ ()\\\t\x0B\r\n]*)\([\s\S]*?\)\2"
And here's an interactive demo you can play with:
function run() {
var re = /(\/\/.*|\/\*[\s\S]*?\*\/|(?:u8?|U|L)?'(?:\\(?:['"?\\abfnrtv]|[0-7]{1,3}|x[0-9a-fA-F]{1,2}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})|[^'\\\r\n])+')|(?:u8?|U|L)?"(?:\\(?:['"?\\abfnrtv]|[0-7]{1,3}|x[0-9a-fA-F]{1,2}|u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})|[^"\\\r\n])*"|(?:u8?|U|L)?R"([^ ()\\\t\x0B\r\n]*)\([\s\S]*?\)\2"/g;
var input = document.getElementById("input").value;
var output = input.replace(re, function(m, ignore) {
return ignore ? m : "String(" + m + ")";
});
document.getElementById("output").innerText = output;
}
document.getElementById("input").addEventListener("input", run);
run();
<h2>Input:</h2>
<textarea id="input" style="width: 100%; height: 50px;">
std::cout << L"hello" << " world";
std::cout << "He said: \"bananas\"" << "...";
std::cout << "";
std::cout << "\x12\23\x34";
std::cout << u8R"hello(this"is\a\""""single\\(valid)"
raw string literal)hello";
"" // empty string
'"' // character literal
// this is "a string literal" in a comment
/* this is
"also inside"
//a comment */
// and this /*
"is not in a comment"
// */
"this is a /* string */ with nested // comments"
</textarea>
<h2>Output:</h2>
<pre id="output"></pre>
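The same capture-group trick can also be sketched with std::regex, which has neither (*SKIP)(*FAIL) nor atomic groups. The version below is deliberately simplified and is not the full pattern from this answer: it assumes no encoding prefixes and no raw string literals, and, following the expected output in the question, it leaves an empty "" untouched. The name wrap_string_literals is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>

inline std::string wrap_string_literals(const std::string& src) {
    // Group 1: things to skip (// comment, /* comment */, character literal).
    // Otherwise: an ordinary "..." string literal with backslash escapes.
    static const std::regex re(
        R"re((//.*|/\*[\s\S]*?\*/|'(?:\\.|[^'\\\r\n])+')|"(?:\\.|[^"\\\r\n])*")re");

    std::string out;
    std::size_t last = 0;
    for (std::sregex_iterator it(src.begin(), src.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        const auto pos = static_cast<std::size_t>(m.position());
        const auto len = static_cast<std::size_t>(m.length());
        out.append(src, last, pos - last);      // text between matches
        if (m[1].matched || len == 2) {
            out += m.str();                     // comment, char literal, or empty ""
        } else {
            out += "String(" + m.str() + ")";   // non-empty string literal
        }
        last = pos + len;
    }
    out.append(src, last, std::string::npos);   // trailing text after the last match
    return out;
}
```

Iterating over matches by hand takes the place of the replacement callback used in the JavaScript demo, since std::regex_replace only accepts a format string.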
Regular expressions can be tricky for beginners, but once you understand their basics and a well-tested divide-and-conquer strategy, they will be your go-to tool.
What you need is to search for a quote (") that is not preceded by a backslash (\) and read all characters up to the next such quote.
The regex I came up with is (".*?[^\\]"). See the code snippet below.
std::string in_line = "std::cout << \"He said: \\\"bananas\\\"\" << \"...\";";
std::regex re(R"((".*?[^\\]"))");
in_line = std::regex_replace(in_line, re, "String($1)");
std::cout << in_line << std::endl;
Output:
std::cout << String("He said: \"bananas\"") << String("...");
Regex Explanation:
(".*?[^\\]")
Options: Case sensitive; Numbered capture; Allow zero-length matches; Regex syntax only
Match the regex below and capture its match into backreference number 1 (".*?[^\\]")
Match the character “"” literally "
Match any single character that is NOT a line break character (line feed, carriage return) .*?
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) *?
Match any character that is NOT the backslash character [^\\]
Match the character “"” literally "
String($1)
Insert the character string “String” literally String
Insert an opening parenthesis (
Insert the text that was last matched by capturing group number 1 $1
Insert a closing parenthesis )
Read the relevant sections of the C++ standard; they are called [lex.ccon] and [lex.string].
Then convert each rule you find there into a regular expression (if you really want to use regular expressions; it might turn out that they are not capable of doing this job).
Then, build more complicated regular expressions out of them. Be sure to name your regular expressions exactly as the rules from the C++ standard, so that you can recheck them later.
If, instead of using regular expressions, you want to use an existing tool, here is one: http://clang.llvm.org/doxygen/Lexer_8cpp_source.html. Have a look at the LexStringLiteral function.

Check if character equals \ in C++

I'm trying to see if a character c equals \
if (c == '\')
//do something
I don't know exactly what this is called, but everything after the \ turns into a character string.
Backslash is used as the escape character in C++, as it is in many other languages. If you want a literal backslash, you need to use \\:
if (c == '\\') {
}
\ backslash is an escape character.
Escape sequences are used to represent certain special characters
within string literals and character literals.
Read here
So you should do:
if (c == '\\'){
}
You need escape sequences:
\\ backslash byte 0x5c in ASCII encoding
Change the code to
if (c == '\\')
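As a compile-time check of the point made in these answers, the sketch below shows that '\\' is a single char. The helper name is_backslash is an assumption for illustration.

```cpp
// True when c is a backslash. '\\' is one character, spelled with an escape sequence.
constexpr bool is_backslash(char c) { return c == '\\'; }

static_assert(sizeof('\\') == 1, "an escaped backslash is one char");
static_assert(is_backslash('\\'), "matches a backslash");
static_assert(!is_backslash('/'), "does not match a forward slash");
```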

Is it possible to return "weird" characters in a char?

I would like to know whether it is possible to return "weird" characters, or rather ones that are important to the language.
For example: \ ; '
I would like to know that because I need to return them from a function that checks the Unicode value of the text key and returns the character by its number, so I need these too.
I get a 356|error: missing terminating ' character
Line 356 looks as follows:
return '\';
Ideas?
The backslash is an escape for special characters. If you want a literal backslash you have to escape it with another backslash. Try:
return '\\';
The only problem here is that a backslash is used to escape characters in a literal. For example \n is a new line, \t is a horizontal tab. In your case, the compiler is seeing \' and thinking you mean a ' character (this is so you could have the ' character like so: '\''). You just need to escape your backslash:
return '\\';
Despite this looking like a character literal with two characters in it, it's not. \\ is an escape sequence which represents a single backslash.
Similarly, to return a ', you would do:
return '\'';
The list of available escape sequences is given by Table 7:
You can have a character literal containing any character from the execution character set and the resulting char will have the value of that character. However, if the value does not fit in a char, it will have implementation-defined value.
Any character can be returned.
Yet for some of them, you have to escape it using backslash: \.
So for returning backslash, you have to return:
return '\\';
To get a plain backslash use '\\'.
In C the following characters are represented using a backslash:
\a : A bell
\b : A backspace
\f : A formfeed
\n : A new line
\r : A carriage return
\t : A horizontal tab
\v : A vertical tab
\xhh : A hexadecimal bit pattern
\ooo : An octal bit pattern
\0 : A null character
\" : The " character
\' : The ' character
\\ : A backslash (\)
A plain backslash confuses the system because it expects a character to follow it. Thus, you need to "escape" it. The octal/hexadecimal bit patterns may not seem too useful at first, but they let you use ANSI escape codes.
If the character following the backslash does not specify a legal escape sequence, as shown above, the result is implementation defined, but often the character following the backslash is taken literally, as though the escape were not present.
If you have to return such characters (", ', \, {, ] etc.) more than once, you should write a function that escapes those characters. I wrote such a function once, and here it is:
function EscapeSpecialChars (_data) {
try {
if (!GUI_HELPER.NOU(_data)) {
return _data;
}
if (typeof (_data) != typeof (Array)) {
return _data;
}
while (_data.indexOf("
") > 0) {
_data = _data.replace("
", "");
}
while (_data.indexOf("\n") > 0) {
_data = _data.replace("\n", "\\n");
}
while (_data.indexOf("\r") > 0) {
_data = _data.replace("\r", "\\r");
}
while (_data.indexOf("\t") > 0) {
_data = _data.replace("\t", "\\t");
}
while (_data.indexOf("\b") > 0) {
_data = _data.replace("\b", "\\b");
}
while (_data.indexOf("\f") > 0) {
_data = _data.replace("\f", "\\f");
}
return _data;
} catch (err) {
alert(err);
}
},
then use it like this:
return EscapeSpecialChars("\'"{[}]");
You should improve the function. It was working for me, but it is not escaping all special characters.
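Since the question is about C++, here is a rough C++ counterpart of the same idea. It is a sketch and not a translation of the function above: the name escape_special_chars and the exact set of characters handled are assumptions.

```cpp
#include <string>

// Replaces selected control characters, quotes, and the backslash with
// their two-character escape spellings. All other characters pass through.
inline std::string escape_special_chars(const std::string& input) {
    std::string out;
    out.reserve(input.size());
    for (char c : input) {
        switch (c) {
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            case '\b': out += "\\b";  break;
            case '\f': out += "\\f";  break;
            case '\\': out += "\\\\"; break;
            case '\'': out += "\\'";  break;
            case '"':  out += "\\\""; break;
            default:   out += c;      break;
        }
    }
    return out;
}
```

Building the result in a single pass avoids rescanning the string once per replaced character.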

How do I add a backslash after every character in a string?

I need to transform a literal filepath (C:/example.txt) to one that is compatible with the various WinAPI Registry functions (C://example.txt) and I have no idea how to go about it.
I've broken it down to having to add a backslash after a certain character (/ in this case) but I'm completely stuck after that.
Guidance and Code Examples will be greatly appreciated.
I'm using C++ and VS2012.
In C++, strings are made up of individual characters, like "foo". Strings can be composed of printable characters, such as the letters of the alphabet, or non-printable characters, such as the enter key or other control characters.
You cannot type one of these non-printable characters in the normal way when populating a string. For example, if you want a string that contains "foo" then a tab, and then "bar", you can't create this by typing:
fooTABbar
because this will simply insert that many spaces -- it won't actually insert the TAB character.
You can specify these non-printable characters by "escaping" them out. This is done by inserting a back slash character (\) followed by the character's code. In the case of the string above TAB is represented by the escape sequence \t, so you would write: "foo\tbar".
The character \ is not itself a non-printable character, but C++ (and C) recognize it to be special -- it always denotes the beginning of an escape sequence. To include the character "\" in a string, it has to itself be escaped, with \\.
So in C++ if you want a string that contains:
c:\windows\foo\bar
You code this using escape sequences:
string s = "c:\\windows\\foo\\bar";
\\ is not two chars, it is one char:
for(size_t i = 0, sz = sPath.size() ; i < sz ; i++)
if(sPath[i]=='/') sPath[i] = '\\';
But be aware that some APIs work with \ and some with /, so you need to check in which cases to use this replacement.
If replacing every occurrence of a forward slash with two backslashes is really what you want, then this should do the job:
size_t i = str.find('/');
while (i != string::npos)
{
string part1 = str.substr(0, i);
string part2 = str.substr(i + 1);
str = part1 + R"(\\)" + part2; // Use "\\\\" instead of R"(\\)" if your compiler doesn't support C++11's raw string literals
i = str.find('/', i + 1);
}
EDIT:
P.S. If I misunderstood the question and your intention is actually to replace every occurrence of a forward slash with just one backslash, then there is a simpler and more efficient solution (as @RemyLebeau points out in a comment):
size_t i = str.find('/');
while (i != string::npos)
{
str[i] = '\\';
i = str.find('/', i + 1);
}
Or, even better:
std::replace_if(str.begin(), str.end(), [] (char c) { return (c == '/'); }, '\\');
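Both interpretations can be wrapped as small helpers. This is a sketch with made-up function names; the two-backslash version builds the result in one pass instead of re-creating the string with substr on every match.

```cpp
#include <algorithm>
#include <string>

// Replaces every '/' with a single backslash, working on a copy of the path.
inline std::string to_backslashes(std::string path) {
    std::replace(path.begin(), path.end(), '/', '\\');
    return path;
}

// Replaces every '/' with two backslash characters.
inline std::string to_double_backslashes(const std::string& path) {
    std::string out;
    out.reserve(path.size());
    for (char c : path) {
        if (c == '/') {
            out += "\\\\";  // this literal is two backslash characters
        } else {
            out += c;
        }
    }
    return out;
}
```

For the input "C:/example.txt", the first helper yields a 14-character string and the second a 15-character one.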