Parsing Quoted Strings Having Nested Escape Sequences in jsoncpp - c++

I'm using the jsoncpp library here. I am confused by the parsing of single quotation marks (') and double quotation marks (").
Json::Value root;
Json::Reader reader;
const std::string json_str1 = "{\"name\":\"Say \\\"Hello\\\"!\"}";
const std::string json_str2 = "{\"name\":\"Say \"Hello\"!\"}";
const std::string json_str3 = "{\"name\":\"Say \\\'hi\\\'!\"}";
const std::string json_str4 = "{\"name\":\"Say \'hi\'!\"}";
const std::string json_str5 = "{\"name\":\"Say 'hi'!\"}";
reader.parse(json_str1, root, false); // success
reader.parse(json_str2, root, false); // fail
reader.parse(json_str3, root, false); // fail
reader.parse(json_str4, root, false); // success
reader.parse(json_str5, root, false); // success
Why must double quotation marks be written as \\\" while single quotation marks must be \' or just ', and cannot be \\\'?

Escaping Delimiters
The reason for escaping quotation marks with \ is to allow the parser(s) to distinguish between a quotation mark that is intended to be a character within the quoted string, and a delimiting quotation mark that is intended to close the string.
As you know, in the C++ language, double-quotes " are used to delimit character strings. But if you want to create a string that contains a double quotation mark ", the \ is used as an escape so the C++ parser knows to interpret the following character as a character, not as the closing delimiter:
const std::string double_quote = """; // WRONG!
const std::string double_quote = "\""; // good
With Two Parsers
In your code, there are two parsers that are involved: the C++ parser that is part of the C++ compiler that will be compiling this code, and the JSON parser that is part of the jsoncpp library. The C++ parser interprets this code at compile time, while the jsoncpp parser interprets the strings at run time.
Like C++, JSON also uses double quotes " to delimit strings. A simple JSON document as seen by the jsoncpp parser looks something like:
{"name":"Xiaoying"}
To enclose this JSON document into a C++ string, the double quotation marks " within the JSON document need to be escaped with \ as follows:
const std::string json_name = "{\"name\":\"Xiaoying\"}"; // good
This tells C++ to create a string having the contents {"name":"Xiaoying"}.
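One way to check what the compiler actually stores is to compare the escaped literal with a raw string literal (C++11), in which no escape processing takes place. A minimal sketch; the two function names are made up for illustration:

```cpp
#include <string>

// Hypothetical helper: the JSON document written with C++ escape sequences.
std::string escaped_name_doc() {
    return "{\"name\":\"Xiaoying\"}";
}

// Hypothetical helper: the same document as a raw string literal,
// where the characters between R"( and )" are taken verbatim.
std::string raw_name_doc() {
    return R"({"name":"Xiaoying"})";
}
```

Both functions return the same 19-character string, which is exactly the text the jsoncpp parser receives at run time.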
Nested delimiters
Things start to get complicated when the JSON document itself contains delimiters that must also be escaped. Like C++, JSON also uses the backslash \ as an escape. Now the question becomes, how to distinguish a backslash \ intended as an escape for the jsoncpp parser from a backslash \ intended as an escape for the C++ parser? The way to do this is to use a double backslash \\ sequence, which is translated by the C++ parser into a single backslash '\' character within the string. That single backslash, when passed to the jsoncpp parser at runtime, will at that time be interpreted as an escape character.
Things are further complicated by the fact that the rules for use of the backslash in JSON are different than the rules for C++. In particular, in C++ single quotes ' may be escaped with a backslash (as in \'), but this is not a legal pattern in JSON.
Here is an explanation for each of the five cases you presented:
1. json_str1
The C++ statement
const std::string json_str1 = "{\"name\":\"Say \\\"Hello\\\"!\"}";
produces a JSON document that looks like
{"name":"Say \"Hello\"!"}
When the jsoncpp parser sees this, the backslashes tell it that "Say \"Hello\"!" is a single string containing Say "Hello"!, so the parse succeeds.
2. json_str2
The C++ statement
const std::string json_str2 = "{\"name\":\"Say \"Hello\"!\"}";
produces a JSON document that looks like
{"name":"Say "Hello"!"}
Since the quotation marks around "Hello" are not escaped, the jsoncpp parser will fail.
3. json_str3
The C++ statement
const std::string json_str3 = "{\"name\":\"Say \\\'hi\\\'!\"}";
produces a JSON document that looks like
{"name":"Say \'hi\'!"}
Since the \' pattern is not recognized in JSON, this will fail in the jsoncpp parser.
4. json_str4
The C++ statement
const std::string json_str4 = "{\"name\":\"Say \'hi\'!\"}";
produces a JSON document that looks like
{"name":"Say 'hi'!"}
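This is because the C++ parser interpreted the \' sequence as a single ' character. The jsoncpp parser never sees a backslash, and a bare ' is legal inside a JSON string, so the parse succeeds.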
5. json_str5
The C++ statement
const std::string json_str5 = "{\"name\":\"Say 'hi'!\"}";
produces a JSON document that looks like
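{"name":"Say 'hi'!"}
A single quote needs no escape in a C++ double-quoted string literal or in a JSON string, so this is the same valid document as in case 4 and it parses successfully.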
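The five cases can be checked without jsoncpp by looking only at the first parsing stage, that is, at what the C++ compiler stores in each string. A minimal sketch; json_documents is a made-up name:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the five literals from the question,
// exactly as the compiler stores them after C++ escape processing.
std::vector<std::string> json_documents() {
    return {
        "{\"name\":\"Say \\\"Hello\\\"!\"}",  // json_str1
        "{\"name\":\"Say \"Hello\"!\"}",      // json_str2
        "{\"name\":\"Say \\\'hi\\\'!\"}",     // json_str3
        "{\"name\":\"Say \'hi\'!\"}",         // json_str4
        "{\"name\":\"Say 'hi'!\"}",           // json_str5
    };
}
```

Printing each element with std::cout shows the documents listed in the five cases. The last two elements compare equal, which is why json_str4 and json_str5 behave the same way in the JSON parser.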
See also
For the C++ escape sequence rules: http://en.cppreference.com/w/cpp/language/escape
For the JSON escape sequence rules: http://www.json.org/

Related

Encoding XPath string in SelectChildNode having both single and double quotes in C++

I have been trying to pass a string to fetch the node, and it works for strings with only single quotes (') or only double quotes ("). But I am unable to parse it when the string contains both single and double quotes. I have my string in a CString as:
CString str=L("H'el"lo");
and all other combinations of these. Can you please tell me how to do this in C++? I have seen examples in C#, but they are not helping me out.
Here's the link for C#: Encoding XPath Expressions with both single and double quotes
XmlNode n = doc.SelectSingleNode("/root/emp[lname=" + str + "]");
How should I make my str work for a string containing both single and double quotes in any order?
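XPath 1.0 string literals have no escape character, so a value containing both kinds of quote is usually assembled with the XPath concat() function. A minimal sketch of that approach, assuming std::string rather than CString; xpath_literal is a made-up name and is not part of any XML library:

```cpp
#include <string>

// Hypothetical helper: builds an XPath 1.0 string literal for `value`.
std::string xpath_literal(const std::string& value) {
    const std::string::size_type npos = std::string::npos;
    // No single quote inside: wrap in single quotes.
    if (value.find('\'') == npos) return "'" + value + "'";
    // No double quote inside: wrap in double quotes.
    if (value.find('"') == npos) return "\"" + value + "\"";

    // Both present: split on single quotes and join the pieces with concat().
    std::string out = "concat(";
    bool first = true;
    auto add_arg = [&](const std::string& arg) {
        if (!first) out += ",";
        out += arg;
        first = false;
    };
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type pos = value.find('\'', start);
        const std::string piece =
            value.substr(start, pos == npos ? npos : pos - start);
        if (!piece.empty()) add_arg("'" + piece + "'");  // piece has no '
        if (pos == npos) break;
        add_arg("\"'\"");  // the single quote itself, wrapped in double quotes
        start = pos + 1;
    }
    out += ")";
    return out;
}
```

The result would be spliced into the expression in place of the quoted value, for example "/root/emp[lname=" + xpath_literal(str) + "]".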

JSONCPP is adding extra double quotes to string

I have a JSONcpp root holding a string value like this:
Json::Value root;
std::string val = "{\"stringval\": \"mystring\"}";
Json::Reader reader;
bool parsingpassed = reader.parse(val, root, false);
Now I try to retrieve this value with this piece of code:
Json::StreamWriterBuilder builder;
builder.settings_["indentation"] = "";
std::string out = Json::writeString(builder, root["stringval"]);
Ideally, out should contain:
"mystring"
whereas it actually contains:
"\"mystring\"" // what you see in debug mode when you inspect the string's contents
If you print this value to stdout, it appears as:
"mystring" // because \" is an escape sequence and prints as "
whereas it should print as:
mystring // expected output
Any idea how to avoid these extra quotes when converting JSON output to std::string?
Please don't suggest FastWriter: it also adds a newline character, and it is a deprecated API as well.
Constraint: I don't want to strip the extra \" with string manipulation; I want to know how to do this with JSONcpp directly.
This is the StreamWriterBuilder reference code I used.
I also found a solution that removes the extra quotes from the resulting string, but I don't want them to be there in the first place.
I had this problem also until I realized you have to use the Json::Value class accessor functions, e.g. root["stringval"] will be "mystring", but root["stringval"].asString() will be mystring.
Okay, so this question did not get an answer even after a thorough explanation, and I had to go through the JSONCPP APIs and documentation for a while.
I have not found any API so far that handles this scenario of extra double quotes being added.
From their wikibook I could figure out that some escape sequences may appear in a string. This is by design, though they don't mention this exact scenario.
\" - quote
\\ - backslash
\/ - slash
\n - newline
\t - tabulation
\r - carriage return
\b - backspace
\f - form feed
\uxxxx , where x is a hexadecimal digit - any 2-byte symbol
Link Explaining what all extra Escape Sequence might come in String
If anyone coming across this finds a better explanation for the same issue, please feel free to post your answer. Until then, I guess string manipulation is the only option for removing those extra escape sequences.
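The escape list above is what a JSON writer applies when it serializes a string value, which is also why the serialized form carries surrounding quotes while an accessor such as asString() returns the bare text. A simplified sketch of that serialization step using only the standard library; quote_json_string is a made-up name, this is not JSONcpp's implementation, and \/ and \uxxxx are left out:

```cpp
#include <string>

// Hypothetical helper: serializes `value` as a JSON string token,
// adding the delimiting quotes and escaping the characters listed above.
std::string quote_json_string(const std::string& value) {
    std::string out = "\"";  // opening delimiter
    for (char c : value) {
        switch (c) {
            case '"':  out += "\\\""; break;  // quote
            case '\\': out += "\\\\"; break;  // backslash
            case '\n': out += "\\n";  break;  // newline
            case '\t': out += "\\t";  break;  // tabulation
            case '\r': out += "\\r";  break;  // carriage return
            case '\b': out += "\\b";  break;  // backspace
            case '\f': out += "\\f";  break;  // form feed
            default:   out += c;      break;  // everything else unchanged
        }
    }
    out += '"';  // closing delimiter
    return out;
}
```

With this sketch, the 8-character value mystring becomes a 10-character token whose first and last characters are double quotes, matching what the question observes in the writer output.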

Extra backslash added to \n when parsing a string from XML

I read XML data into a C++ application. Some of the data consists of multiline strings. Each new line is marked by the '\n' escape sequence. But when it is loaded into the program, the backslash-n gets an extra backslash on the left. For example:
In XML:
<node attrStr = "Hello!\nWhat's your name?" />
In the program:
"Hello!\\nWhat's your name?"
So it causes '\' and 'n' to become separate characters.
It doesn't happen if the string is hardcoded into the program source code.
How can this issue be solved?
It is important to note that the XML string is read into a std::wstring to handle Unicode characters.
Found the answer here.
Replacing '\n' with &#10; inside the XML solves the issue.
If you want to escape a newline character in XML, you have to use the entity &#10;. So the correct XML would look like:
<node attrStr = "Hello!&#10;What's your name?" />
Since XML does not allow character escaping with a backslash, the string "\n" is read as two normal characters, '\' and 'n'.
If you want to load the XML content with correct line breaks, you must replace the "\n" parts with "&#10;" as suggested in the answer proposed by @Angew.
Alternatively, you could also modify or pre-process the XML file before reading it.
The two characters \ and n after each other do not inherently have any special meaning. In some contexts, these two characters are used to encode a newline. String literals in C++ source files are such a context. XML files are not such a context.
This means that when parsing an XML file containing the substring \n, you will get a string containing the substring \n in the memory of your C++ program. Anything else would be wrong. If you want \n in your data to represent a newline, you have to use string substitution once the data is in memory.
After parsing the string, simply replace each \n occurrence with the ASCII character LF and you're set. This is how you could do it (inefficiently) with the standard library:
std::string s = getTheStringFromXml();
for (size_t idx = 0;;)
{
    idx = s.find("\\n", idx); // look for a backslash followed by 'n'
    if (idx == s.npos)
        break;
    s[idx] = '\n';            // overwrite the backslash with a real newline
    s.erase(idx + 1, 1);      // remove only the leftover 'n', not the rest of the string
}
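Since the question reads the data into a std::wstring, the same substitution can be sketched as a small reusable function on wide strings. The name unescape_newlines is made up for illustration, and the function assumes that every backslash followed by n in the data is meant as a line break:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: turns each two-character sequence backslash + 'n'
// into a single line feed. Takes the string by value and returns the result.
std::wstring unescape_newlines(std::wstring s) {
    const std::wstring from = L"\\n";  // two characters: '\' and 'n'
    const std::wstring to = L"\n";     // one character: LF
    std::size_t idx = 0;
    while ((idx = s.find(from, idx)) != std::wstring::npos) {
        s.replace(idx, from.size(), to);
        idx += to.size();  // continue searching after the inserted newline
    }
    return s;
}
```

It would be called on the attribute value after the XML parser has returned it, for example text = unescape_newlines(text).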
This issue also occurs in JavaScript, and the &#10; fix works well there too.

Using boost::regex to replace a backslash with double backslash and double quote with a slash quote

I'm going batty trying to get this to work. Here's what I have so far, but it doesn't work.
const std::string singleslash("\\\\\\\\");
const std::string doublequote("\\\"\"\\");
const std::string doubleslash("\\\\\\\\\\\\");
const std::string slashquote("\\\\\\\\\"\\");
std::string temp(Variables);
temp.assign(boost::regex_replace(temp,boost::regex(singleslash),doubleslash,boost::match_default));
temp.assign(boost::regex_replace(temp,boost::regex(doublequote),slashquote,boost::match_default));
Someone please save me.
Update It seems that I'm not using regex_replace properly. Here's a simpler example that doesn't work either...
std::string w("Watermelon");
temp.assign(boost::regex_replace(w,boost::regex("W"),"x",boost::match_all | boost::format_all));
MessageBox((HWND)Window, temp.c_str(), "temp", MB_OK);
This gives me "Watermelon" instead of "xatermelon"
Update 2 Using boost::regex wrong... this one works
boost::regex pattern("W");
temp.assign(boost::regex_replace(w,pattern,std::string("x")));
Update 3 Here's what ultimately worked
std::string w("Watermelon wishes backslash \\ and another backslash \\ and \"\"fatness\"\"");
temp.assign(w);
MessageBox((HWND)Window, temp.c_str(), "original", MB_OK);
const boost::regex singlebackslashpat("\\\\");
const std::string doublebackslash("\\\\\\\\");
temp.assign(boost::regex_replace(w,singlebackslashpat,doublebackslash));
MessageBox((HWND)Window, temp.c_str(), "double-backslash", MB_OK);
const boost::regex doublequotepat("\"\"");
const std::string backslashquote("\\\\\\\"");
temp.assign(boost::regex_replace(temp,doublequotepat,backslashquote));
MessageBox((HWND)Window, temp.c_str(), "temp", MB_OK);
So, I'm not a boost::regex expert and don't have Boost conveniently installed where I am right now, but let's try to work this through step by step.
The patterns to match against
To match a double-quote in the input, you just need a double-quote in the regex (double-quotes aren't magical in regexes), which means all you need is a string containing a double-quote. "\"" should be fine.
To match a backslash in the input, you need an escaped backslash in the regex, which means two consecutive backslashes; each of those needs to be doubled again in a string literal. So "\\\\". [EDITED: I typed eight instead of four before, which was a mistake.]
The output formats
Again, double-quotes aren't magical in match replacement formats (or whatever the right terminology is) but backslashes are. So to get two backslashes in the output you need four in the string, which means you need 8 in the string literal. So: "\\\\\\\\".
To get a backslash followed by a double-quote, your string needs to be two backslashes and a double-quote, and all of those need to be preceded with backslashes in the string literal. So: "\\\\\"".
[EDITED to add the actual code for easier copy-and-pasting:]
const std::string singleslash("\\\\");
const std::string doublequote("\"");
const std::string doubleslash("\\\\\\\\");
const std::string slashquote("\\\\\"");
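A quick way to keep the backslash counts straight is to write the same four strings as raw string literals (C++11), which show exactly the characters the regex engine receives. A minimal sketch; the function names are made up for illustration:

```cpp
#include <array>
#include <string>

// The four strings from the answer, written with ordinary escaped literals.
// Order: backslash pattern, quote pattern, double-backslash format,
// backslash-quote format.
std::array<std::string, 4> escaped_literals() {
    return {{ "\\\\", "\"", "\\\\\\\\", "\\\\\"" }};
}

// The same four strings as raw string literals: no C++ escape processing,
// so what is typed between R"( and )" is what the regex library sees.
std::array<std::string, 4> raw_literals() {
    return {{ R"(\\)", R"(")", R"(\\\\)", R"(\\")" }};
}
```

The two arrays compare equal, and the strings hold 2, 1, 4 and 3 characters respectively: half as many backslashes as in the escaped source literals.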
Matching flags
After reading tofutim's update, I tried to look up match_all and found no documentation for it. It does, however, appear to be a possible match flag value, and the header file in which it's defined has the following cryptic comment next to it: "must find the whole of input even if match_any is set". The similarly-cryptic comment attached to match_any is "don't care what we match". I'm not sure what any of that means and it seems like these flags are deprecated or something, but in any case you probably don't want to be using them.
(After a very quick look at the source, I think what match_all does is to accept a match only if it ends at the end of the input. So you might try replacing n instead of W in your revised test case and see whether that works. Alternatively, perhaps I missed something and it has to match the entire input, which you could check by replacing Watermelon instead of W or n. Or you could not bother, if you happen not to be curious about this.)
Give that a try and report back...
I have no Boost here, but a single backslash must be written as \\ in a regex, and thus as a C++ string literal it is four backslashes. The replacement string has to be escaped as well, and then again for C++, so it is eight backslashes.
A double quote in a regex does not need to be escaped, so it is "" and in C++ \"\". The replacement again has to be escaped, so it is \\", and of course again for C++, so it is \\\\\".
According to your update 3, the patterns and replacement strings must be initialized like this:
const std::string singleslashpat("\\\\");
const std::string doublequotepat("\"\"");
const std::string doubleslash("\\\\\\\\");
const std::string slashquote("\\\\\"");
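For readers without Boost installed, the same two replacements can be sketched with std::regex from the standard library. This is a swapped-in library, not boost::regex, and it differs in one relevant way: in std::regex's default (ECMAScript) format string only $ is special, so backslashes in the replacement are not escaped a second time. The function name escape_for_quoting is made up, and the sketch replaces single double quotes rather than the doubled "" of update 3:

```cpp
#include <regex>
#include <string>

// Hypothetical helper using std::regex (NOT boost::regex): doubles every
// backslash, then puts a backslash in front of every double quote.
std::string escape_for_quoting(const std::string& input) {
    // Regex text \\ (four backslashes in the literal) matches one backslash.
    static const std::regex backslash("\\\\");
    // A double quote is not special in a regex.
    static const std::regex quote("\"");

    // Format text \\ : two literal backslashes, since only '$' is special here.
    std::string out = std::regex_replace(input, backslash, "\\\\");
    // Format text \" : a backslash followed by a double quote.
    out = std::regex_replace(out, quote, "\\\"");
    return out;
}
```

The order matters: doubling backslashes first keeps the backslashes added in front of the quotes from being doubled again.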

how to use fout ()

Can someone help me? I have created this command:
fout <<"osql -Ubatascan -Pdtsbsd12345 -dpos -i""c:\\temp_pd.sql"""<<endl;
Result Output
osql -Ubatascan -Pdtsbsd12345 -dpos -ic:\temp_pd.sql
Output that I want
osql -Ubatascan -Pdtsbsd12345 -dpos -i"c:\temp_pd.sql"
Can someone help?
What you're doing is actually writing multiple string literals next to each other. The expression
"foo""bar"
gets parsed as the two string literals "foo" and "bar". The C and C++ languages say that when you have string literals next to each other, they get pasted together into one big string literal at compile time. So, the above expression is entirely equivalent to the single string literal "foobar".
Hence, your expression gets parsed as the following three string literals:
"osql -Ubatascan -Pdtsbsd12345 -dpos -i"
"c:\\temp_pd.sql"
""
Which, when pasted together, form the string "osql -Ubatascan -Pdtsbsd12345 -dpos -ic:\\temp_pd.sql" (note that the third string is the empty string "").
What you want to do is to use the escape sequence \" to include a literal quotation mark within your string literal. Write it like this:
"osql -Ubatascan -Pdtsbsd12345 -dpos -i\"c:\\temp_pd.sql\""
Normally, the quotation mark " gets interpreted as the end of a string literal, except when it's preceded by a backslash, in which case it gets interpreted as a quotation mark character within the string.
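The difference between the two spellings can be seen side by side in a small sketch. It writes to a std::ostringstream as a stand-in for fout (an assumption, since the question does not show how fout is declared), and the function names are made up:

```cpp
#include <sstream>
#include <string>

// The original expression: three adjacent literals that the compiler
// joins into one, so no quotation marks reach the output.
std::string command_without_quotes() {
    std::ostringstream out;
    out << "osql -Ubatascan -Pdtsbsd12345 -dpos -i""c:\\temp_pd.sql""";
    return out.str();
}

// The corrected expression: \" embeds a quotation mark in the literal.
std::string command_with_quotes() {
    std::ostringstream out;
    out << "osql -Ubatascan -Pdtsbsd12345 -dpos -i\"c:\\temp_pd.sql\"";
    return out.str();
}
```

The first function reproduces the "Result Output" from the question, and the second produces the wanted output with the path wrapped in quotation marks.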