Detecting Quotes in a String C++ [duplicate] - c++

I am having trouble detecting double quotes (") or quotation marks in general in a string.
I tried using str.find(""") and str.find(""""). The first one doesn't compile, and the second does not find the location of the quote: it returns 0. I have to read a string from a file, for example:
testFile.txt
This is the test "string" written in the file.
I read the string and search it like this:
string str;
size_t x;
getline(inFile, str);
x = str.find("""");
However, the value returned is 0. Is there another way to find the quotation marks that enclose 'string'?

The string """" doesn't contain any quotes. It is actually an empty string: when two string literals are next to each other, they get merged into just one string literal. For example "hello " "world" is equivalent to "hello world".
To embed quotes into a string you can do one of the following:
Escape the quotes you want to have inside your string, e.g. "\"\"".
Use a raw character string, e.g. R"("")".
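As a minimal sketch of the two options above (the helper names first_quote, first_quote_raw and find_four_quotes are hypothetical, chosen only for illustration), the following compares an escaped quote, a raw string literal, and the original """" search:

```cpp
#include <cassert>   // only needed for assert-based checks like those in the note below
#include <cstddef>
#include <string>

// Index of the first double quote in s, or std::string::npos if there is none.
// The quote is written as an escape inside an ordinary string literal.
std::size_t first_quote(const std::string& s) {
    return s.find("\"");
}

// The same search, with the quote written as a raw string literal.
std::size_t first_quote_raw(const std::string& s) {
    return s.find(R"(")");
}

// What the question's code does: """" is two adjacent empty literals,
// so this searches for the empty string, which matches at position 0.
std::size_t find_four_quotes(const std::string& s) {
    return s.find("""");
}
```

For the sample line from the question, first_quote and first_quote_raw both return 17 (the quote just before string), while find_four_quotes returns 0, which matches what the asker observed.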

You should use a backslash to escape your quotes.
size_t pos = str.find("\"");
will find the position of the first ".

The " character is a special one, used to mark the beginning and end of a string literal.
String literals have the unusual property that consecutive ones are concatenated in translation phase 6, so for example the sequence "Hello" "World" is identical to "HelloWorld". This also means that """" is identical to "" "" is identical to "" - it's just a long way to write the empty string.
The documentation on string literals does say they can contain both unescaped and escaped characters. An escape is a special character that suppresses the special meaning of the next character. For example
\"
means "really just a double quote, not with the special meaning that it begins or ends a string literal".
So, you can write
"\""
for a string consisting of a single double quote.
You can also use a character literal, '"', since you only want one character anyway, with the overload of std::string::find that takes a single char.
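A short sketch of the character-literal approach, assuming the goal is to pull out the text between the first pair of quotes (between_quotes is a hypothetical helper name, not something from the question):

```cpp
#include <cassert>   // only needed for assert-based checks
#include <cstddef>
#include <string>

// Returns the text between the first pair of double quotes in line,
// or an empty string if there is no complete pair.
std::string between_quotes(const std::string& line) {
    // '"' is a character literal, so the quote needs no escaping here.
    const std::size_t open = line.find('"');
    if (open == std::string::npos) return "";

    // Continue the search just after the opening quote.
    const std::size_t close = line.find('"', open + 1);
    if (close == std::string::npos) return "";

    return line.substr(open + 1, close - open - 1);
}
```

Applied to the line from testFile.txt, this returns string. Returning an empty string for "no pair found" is a simplification of this sketch; it cannot be told apart from an empty quoted string such as "".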

Related

Replace every " with \" in Lua

X-Problem: I want to dump an entire lua-script to a single string-line, which can be compiled into a C-Program afterwards.
Y-Problem: How can you replace every " with \" ?
I think it makes sense to try something like this
data = string.gsub(line, "c", "\c")
where c is the "-character. But this does not work of course.
You need to escape both quotes and backslashes, if I understand your Y problem:
data = string.gsub(line, "\"", "\\\"")
or use the other single quotes (still escape the backslash):
data = string.gsub(line, '"', '\\"')
A solution to your X-Problem is to safely escape any sequence that could interfere with the interpreter.
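Lua has the %q option for string.format, which formats and escapes the provided string so that it can be safely read back by Lua. The result is often also acceptable to a C compiler, but this is not guaranteed, because Lua's and C's escape rules differ in places (for example, \ddd is decimal in Lua but octal in C).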
Example string: This \string's truly"tricky
If you just enclosed it in either single or double-quotes, there'd still be a quote that ended the string early. Also there's the invalid escape sequence \s.
Imagine this string was already properly handled in Lua, so we'll just pass it as a parameter:
string.format("%q", 'This \\string\'s truly"tricky')
returns (notice, I used single-quotes in code input):
"This \\string's truly\"tricky"
Now that's a completely valid Lua string that can be written and read from a file. No need to manually escape every special character and risk implementation mistakes.
To correctly implement your Y approach, to escape (invalid) characters with \, use proper pattern matching to replace the captured string with a prefix+captured string:
string.gsub('he"ll"o', "[\"']", "\\%1") -- will prepend backslash to any quote
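If the escaping is done on the C or C++ side instead, the same idea can be sketched as below. This is only an illustration under stated assumptions: escape_for_c_literal is a hypothetical name, and the function handles just double quotes, backslashes and newlines (the latter so that the script fits on a single line); other control characters would need additional cases.

```cpp
#include <cassert>   // only needed for assert-based checks
#include <string>

// Escapes text so that it can be placed between double quotes
// in a C or C++ string literal. Only ", \ and newline are handled.
std::string escape_for_c_literal(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '"':  out += "\\\""; break;  // "  becomes \"
            case '\\': out += "\\\\"; break;  // \  becomes two backslashes
            case '\n': out += "\\n";  break;  // newline becomes \n
            default:   out += c;
        }
    }
    return out;
}
```

Note that the backslash must be handled in the same pass as the quote. Escaping quotes first and backslashes afterwards in a second pass would double the backslashes that were just inserted.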

Matching a string containing special characters with regex in perl

I have a line in my file which contains the following string
$print = "SM_sdo_debugss_cxct6_CSCTM_4 \csctm_gen[4]_ctm_i_nctm_I_csctm (4+5)";
$my_meta = '\csctm_gen[4]_ctm_i_nctm_I_csctm';
print "I got this\n" if($print =~ /\Q$my_meta\E/);
But it's not able to find the $my_meta string in $print. Why?
Your first string is in double quotes, so backslash escape sequences are processed.
\cs stands for Ctrl-S, which can also be written chr(19) or "\x13".
Your second string is in single quotes, which ignores backslash escapes (apart from \\ and \').
So your regex ends up looking for a 3-character sequence \ c s, but your target string contains a single byte 0x13.
To fix this, either write "... \\cs ..." in your first string (the first backslash escapes the second one), or use single quotes for your first string ('... \cs ...').

Processing a string with the null character

I have a text file full of strings (computer paths) which I want to process by replacing every backslash with an underscore, and also replacing every number (integer or float) with an underscore. The original string looks like this:
string = "\Software\Microsoft\0\Windows\CurrentVersion\Internet Settings\5.0\Cache"
Usually, I could replace easily the backslash with the following command:
string=string.replace('\\','_')
and apply some regular expressions such as: '(\d(?:\.\d)?)' to replace the numbers.
However, in my case I can do neither, because Python always interprets '\0' as a null character and '\5' as ENQ (U+0005); in fact, a backslash followed by octal digits is always treated as an octal escape.
Is there a suggested way to replace them?
For example, is there a way to convert my string to a raw string as a start?
Always remember: the backslash (\) escapes special characters. If you want to use the backslash itself, you need to escape it too. Your string should look like this:
string = "\\Software\\Microsoft\\0\\Windows\\CurrentVersion\\Internet Settings\\5.0\\Cache"

Include )" in raw string literal without terminating said literal

The two characters )" terminate the raw string literal in the example below.
The sequence )" could appear in my text at some point, and I want the string to continue even if this sequence is found within it.
R"(
Some Text)"
)"; // ^^
How can I include the sequence )" within the string literal without terminating it?
Raw string literals let you specify an almost arbitrary* delimiter:
//choose ### as the delimiter so only )###" ends the string
R"###(
Some Text)"
)###";
*The exact rules are: "any member of the basic source character set except:
space, the left parenthesis (, the right parenthesis ), the backslash \,
and the control characters representing horizontal tab,
vertical tab, form feed, and newline" (N3936 §2.14.5 [lex.string] grammar) and "at most 16 characters" (§2.14.5/2)
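A compact way to convince yourself that the delimiter works, as a sketch (sample_text is a hypothetical helper; the delimiter ### is the one chosen above):

```cpp
#include <cassert>   // only needed for assert-based checks
#include <string>

// The body contains the two characters )" but the literal only ends at )###".
std::string sample_text() {
    return R"###(Some Text)" more text)###";
}
```

The returned string is Some Text)" more text, with the )" sequence at index 9 kept as ordinary content.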
Escaping won't help you since this is a raw literal, but the syntax is designed to allow clear demarcation of start and end, by introducing a little arbitrary phrase like aha.
R"aha(
Some Text)"
)aha";
By the way note the order of ) and " at the end, opposite of your example.
On the formal side: at first sight (studying the standard), it might seem as if escaping works the same in raw string literals as in ordinary literals. It doesn't, so how is that possible when no exception is noted in the rules? When raw string literals were introduced in C++11, an extra reverting step was added that undoes the effect of the early translation phases, to wit:
C++11 §2.5/3:
“Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified.”
This takes care of Unicode character specifications (the universal-character-names like \u0042), which although they look and act like escapes are formally, in C++, not escape sequences.
The true formal escapes are handled (or rather, deliberately not handled) by a custom grammar rule for the content of a raw string literal. Namely, in C++ §2.14.5 the raw-string grammar entity is defined as
" d-char-sequence(opt) ( r-char-sequence(opt) ) d-char-sequence(opt) "
where an r-char-sequence is defined as a sequence of r-char, each of which is
“any member of the source character set, except a right parenthesis ) followed by the initial d-char-sequence [like aha above] (which may be empty) followed by a double quote ".”
Essentially the above means that not only can you not use escapes directly in raw strings (which is much of the point, it's positive, not negative), you can't use Unicode character specifications directly either.
Here's how to do it indirectly:
#include <iostream>
using namespace std;
auto main() -> int
{
    cout << "Ordinary string with a '\u0042' character.\n";
    cout << R"(Raw string without a '\u0042' character, and no \n either.)" "\n";
    cout << R"(Raw string without a '\u0042' character, i.e. no ')" "\u0042" R"(' character.)" "\n";
}
Output:
Ordinary string with a 'B' character.
Raw string without a '\u0042' character, and no \n either.
Raw string without a '\u0042' character, i.e. no 'B' character.
You can use a custom delimiter:
R"aaa(
Some Text)"
)aaa";
Here aaa will be your string delimiter.

C++ Unrecognized escape sequence

I want to create a string that contains all possible special chars.
However, the compiler gives me the warning "Unrecognized escape sequence" in this line:
wstring s=L".,;*:-_⁊#‘⁂‡…–«»¤¤¡=„+-¶~´:№\¯/?‽!¡-¢–”¥—†¿»¤{}«[-]()·^°$§%&«|⸗<´>²³£­€™℗#©®~µ´`'" + wstring(1,34);
Can anybody please tell me which one of the characters I may not add to this string the way I did?
You have to escape \ as \\, otherwise \¯ will be interpreted as an (invalid) escape sequence:
wstring s=L".,;*:-_⁊#‘⁂‡…–«»¤¤¡=„+-¶~´:№\\¯/?‽!¡-¢–”¥—†¿»¤{}«[-]()·^°$§%&«|⸗<´>²³£­€™℗#©®~µ´`'" + wstring(1,34);
An escape sequence is a character sequence whose meaning differs from that of its literal characters. In C and C++ it begins with \, so a double quote or backslash inside a string must be escaped as \" and \\.
In long copy-pasted strings it can be hard to spot those characters, and the escapes make the string harder to maintain, so it's recommended to use a raw string literal (prefix R), which needs no escapes at all:
wstring s = LR"(.,;*:-_⁊#‘⁂‡…–«»¤¤¡=„+-¶~´:№\¯/?‽!¡-¢–”¥—†¿»¤{}«[-]()·^°$§%&«|⸗<´>²³£­€™℗#©®~µ´`')"
+ wstring(1,34);
A special delimiter string may be inserted outside the parentheses, like this: LR"delim(special string)delim", in case your raw string contains a )" sequence.
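A small sketch of the delimiter form with a wide raw literal follows. The function name special_chars and the short ASCII body are assumptions made to keep the example easy to check; the real string from the question would go in the same place.

```cpp
#include <cassert>   // only needed for assert-based checks
#include <string>

// The body contains both a backslash and the sequence )" unescaped.
// Only )delim" ends the literal; the trailing " is appended separately,
// as in the question's wstring(1,34).
std::wstring special_chars() {
    return LR"delim(a\b )" c)delim" + std::wstring(1, L'"');
}
```

The result is the nine wide characters a\b )" c" with no escape processing applied to the backslash.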