JSONCPP is adding extra double quotes to string - c++

I have a JSONcpp root value that holds a string, like this:
Json::Value root;
std::string val = "{\"stringval\": \"mystring\"}";
Json::Reader reader;
bool parsingpassed = reader.parse(val, root, false);
Now I am trying to retrieve this value with this piece of code:
Json::StreamWriterBuilder builder;
builder.settings_["indentation"] = "";
std::string out = Json::writeString(builder, root["stringval"]);
Ideally, the out string should contain:
"mystring"
whereas it actually contains:
"\"mystring\"" // what you see in debug mode when you inspect the string content
By the way, if you print this value to stdout, it appears like this:
"mystring" // because \" is an escape sequence that prints as " on stdout
whereas it should print like this:
mystring // expected output
Any idea how to avoid this kind of output when converting JSON output to std::string?
Please avoid suggesting FastWriter, as it also adds a newline character and is a deprecated API.
Constraint: I do not want to modify the string by removing the extra \" with string manipulation; rather, I want to know how I can do that with JSONcpp directly.
This is the StreamWriterBuilder reference code which I have used.
I also found this solution, which removes the extra quotes from the resulting string, but I don't want them to be there in the first place.

I had this problem too, until I realized you have to use the Json::Value class accessor functions. For example, writing root["stringval"] with a writer serializes it as the JSON string "mystring" (quotes included), but root["stringval"].asString() returns the plain text mystring.

Okay, so this question did not get an answer even after a thorough explanation, and I had to go through the JSONcpp APIs and documentation for a while.
I have not found any API so far that takes care of this scenario of extra double quotes being added.
From their wikibook I could figure out that some escape sequences might appear in a string. It is as designed, and they haven't mentioned the exact scenario.
\" - quote
\\ - backslash
\/ - slash
\n - newline
\t - tabulation
\r - carriage return
\b - backspace
\f - form feed
\uxxxx , where x is a hexadecimal digit - any 2-byte symbol
Link explaining which escape sequences might appear in a string
If anyone coming across this finds a better explanation for the same issue, please feel free to post your answer. Until then, I guess string manipulation is the only option for removing those extra escape sequences.

Related

How to remove backslashes from QString?

Using the QNetworkAccessManager get method, I am receiving JSON from a URL.
Doing: qDebug()<<(QString)reply->readAll(); the result is:
"\r\n[{\"id\":\"1\",\"name\":\"Jhon\",\"surname\":\"Snow\",\"phone\":\"358358358\"}]"
So I am doing strReply = strReply.simplified(); , and the result is:
"[{\"id\":\"1\",\"name\":\"Jhon\",\"surname\":\"Snow\",\"phone\":\"358358358\"}]"
But I can't parse that as JSON to use it in my Qt program.
So I think I need to remove every backslash \ and obtain:
"[{"id":"1","name":"Jhon","surname":"Snow","phone":"358358358"}]"
I tried strReply.remove(QRegExp( "\\\" ) ); but any odd number of consecutive \ characters makes the compiler treat everything after the last \ as part of the string.
You're probably running into qDebug's feature that escapes quotes and newlines. Your string most probably doesn't actually have any backslashes in it.
When you're trying to print a string using qDebug(), you need to use qDebug().noquote() if you don't want qDebug() to artificially insert backslashes in the output.
So your string should be fine. It doesn't have any backslashes in it at all.
As described in the documentation, you can remove a character with the remove function:
QString t = "Ali Baba";
t.remove(QChar('a'), Qt::CaseInsensitive);
// t == "li Bb"
You can put '\\' instead of 'a' to remove the backslashes from your QString.

Strategy to replace spaces in string

I need to store a string with its spaces replaced by some other character. When I retrieve it, I need to turn that character back into spaces. The strategy I have thought of: while storing, replace (space with _a) and (_a with _aa); while retrieving, replace (_a with space) and (_aa with _a). That way, even if the user enters _a in the string, it will be handled. But I don't think this is a good strategy. Does anyone have a better one?
Replacing spaces with something is a problem when something is already in the string. Why don't you simply encode the string - there are many ways to do that, one is to convert all characters to hexadecimal.
For instance
Hello world!
is encoded as
48656c6c6f20776f726c6421
The space is 0x20. Then you simply decode the string back (hex to ASCII).
This way there are no spaces in the encoded string.
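The hex round trip described above could look roughly like the following C++ sketch. The function names hex_encode, hex_decode and hex_value are made up for illustration, and rejecting malformed input with an exception is an assumption, not something the answer specifies.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Encode every byte of the input as two lowercase hex digits.
std::string hex_encode(const std::string& in) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        out.push_back(digits[c >> 4]);    // high nibble
        out.push_back(digits[c & 0x0F]);  // low nibble
    }
    return out;
}

// Value of a single hex digit; throws on any other character.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Turn pairs of hex digits back into the original bytes.
std::string hex_decode(const std::string& in) {
    if (in.size() % 2 != 0) throw std::invalid_argument("odd length");
    std::string out;
    out.reserve(in.size() / 2);
    for (std::size_t i = 0; i < in.size(); i += 2) {
        out.push_back(static_cast<char>(hex_value(in[i]) * 16 + hex_value(in[i + 1])));
    }
    return out;
}
```

With these helpers, hex_encode("Hello world!") yields the encoded string shown above, at the cost of doubling the stored length.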
-- Edit - optimization --
You replace all % and all spaces in the string with %xx where xx is the hex code of the character.
For instance
Wine having 12% alcohol
becomes
Wine%20having%2012%25%20alcohol
%20 is space
%25 is the % character
This way, neither % nor (space) are a problem anymore - Decoding is easy.
Encoding algorithm
- replace all `%` with `%25`
- replace all ` ` with `%20`
Decoding algorithm
- replace all `%xx` with the character having `xx` as hex code
(You could optimize even further, since you only need to encode two characters: use %1 for % and %2 for space. But I recommend the %xx solution, as it is more portable and can be reused later if you need to encode more characters.)
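As a rough illustration of the encoding and decoding steps listed above, here is a minimal C++ sketch. The names encode_spaces and decode_spaces are hypothetical, and leaving a malformed % sequence untouched while decoding is an assumption of this sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Replace '%' with "%25" and ' ' with "%20"; everything else is copied as is.
std::string encode_spaces(const std::string& in) {
    std::string out;
    for (char c : in) {
        if (c == '%') out += "%25";
        else if (c == ' ') out += "%20";
        else out.push_back(c);
    }
    return out;
}

// Replace every "%xx" (xx = two hex digits) with the character having that code.
std::string decode_spaces(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size() &&
            std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            out.push_back(static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16)));
            i += 2;  // skip the two hex digits just consumed
        } else {
            out.push_back(in[i]);  // assumption: malformed input is passed through
        }
    }
    return out;
}
```

Because % itself is encoded first, a user-supplied "%20" is stored as "%2520" and comes back unchanged after decoding.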
I'm not sure your solution will work. When reading, how would you
distinguish between strings that were originally " a" and strings that
were originally "_a": if I understand correctly, both will end up
"_aa".
In general, given a situation where a specific set of characters cannot
appear as such, but must be encoded, the solution is to choose one of the
allowed characters as an "escape" character, remove it from the set of
allowed characters, and encode all of the forbidden characters
(including the escape character) as a two (or more) character sequence
starting with the escape character. In C++, for example, a new line is
not allowed in a string or character literal. The escape character is
\; because of that, it must be encoded as an escape sequence as well.
So we have "\n" for a new line (the choice of n is arbitrary), and
"\\" for a \. (The choice of \ for the second character is also
arbitrary, but it is fairly usual to use the escape character, escaped,
to represent itself.) In your case, if you want to use _ as the
escape character, and "_a" to represent a space, the logical choice
would be "__" to represent a _ (but I'd suggest something a little
more visually suggestive—maybe ^ as the escape, with "^_" for
a space and "^^" for a ^). When reading, anytime you see the escape
character, the following character must be mapped (and if it isn't one
of the predefined mappings, the input text is in error). This is simple
to implement, and very reliable; about the only disadvantage is that in
an extreme case, it can double the size of your string.
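A minimal C++ sketch of the escape-character scheme described above, using the suggested ^ as the escape, "^_" for a space and "^^" for a literal ^. The function names escape_spaces and unescape_spaces are hypothetical, and reporting bad input with std::invalid_argument is an assumption of this sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Encode: space -> "^_", escape character '^' -> "^^", all else unchanged.
std::string escape_spaces(const std::string& in) {
    std::string out;
    for (char c : in) {
        if (c == ' ') out += "^_";
        else if (c == '^') out += "^^";
        else out.push_back(c);
    }
    return out;
}

// Decode: whenever the escape character is seen, map the character after it.
std::string unescape_spaces(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '^') {
            out.push_back(in[i]);
            continue;
        }
        if (++i == in.size()) throw std::invalid_argument("dangling escape character");
        if (in[i] == '_') out.push_back(' ');
        else if (in[i] == '^') out.push_back('^');
        else throw std::invalid_argument("unknown escape sequence");
    }
    return out;
}
```

Since the escape character is itself escaped, " a" and "_a" encode to different strings ("^_a" and "_a"), which is exactly the ambiguity the original _a/_aa idea could not resolve.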
Do you want to implement this in C/C++? I think you should split your string into multiple parts, separated by spaces.
If your string is like this: "a__b" (multiple consecutive spaces), it will be split into:
sub[0] = "a";
sub[1] = "";
sub[2] = "b";
Hope this will help!
If a normal string can use X distinct characters, you cannot encode it with only X-1 distinct characters while emitting a single output character per input character.
You can use a combination of 2 chars to replace a given character (this is exactly what you are trying in your example).
To do this, loop through your string to count the spaces, allocate a new character array of the original length plus the number of spaces, and replace each space with "//" (just an example). The problem with this approach is that you cannot have "//" in your input string.
Another approach would be to use a rarely used char, for example "^" to replace the spaces.
The last approach, and a popular one, is a combination of these two. It is used in Unix and PHP to include a syntax character as a literal in a string: if you want a literal " inside a string, you simply write it as \" and so on.
Why don't you use the Replace function?
String* stringWithoutSpace= stringWithSpace->Replace(S" ", S"replacementCharOrText");
So now stringWithoutSpace contains no spaces. When you want to put those spaces back,
String* stringWithSpacesBack= stringWithoutSpace ->Replace(S"replacementCharOrText", S" ");
I think just coding to ascii hexadecimal is a neat idea, but of course doubles the amount of storage needed.
If you want to do this using less memory, then you will need two-letter sequences, and have to be careful that you can go back easily.
You could e.g. replace blank by _a, but you also need to take care of your escape character _. To do this, replace every _ by __ (two underscores). You need to scan through the string once and do both replacements simultaneously.
This way, in the resulting text all original underscores will be doubled, and the only other occurrence of an underscore will be in the combination _a. You can safely translate this back. Whenever you see an underscore, you need a lookahead of 1 to see what follows. If an a follows, then this was a blank before. If _ follows, then it was an underscore before.
Note that the point is to replace your escape character (_) in the original string, and not the character sequence to which you map the blank. Your idea of replacing _a breaks, as you do not know whether _aa was originally _a or " a" (a blank followed by a).
I'm guessing that there is more to this question than appears; for example, that the strings you are storing must not only be free of spaces, but must also look like words or some such. You should be clear about your requirements (and you might consider satisfying the curiosity of the spectators by explaining why you need to do such things.)
Edit: As JamesKanze points out in a comment, the following won't work in the case where you can have more than one consecutive space. But I'll leave it here anyway, for historical reference. (I modified it to compress consecutive spaces, so it at least produces unambiguous output.)
std::string out;
char prev = 0;
for (char ch : in) {
    if (ch == ' ') {
        if (prev != ' ') out.push_back('_');
    } else {
        if (prev == '_' && ch != '_') out.push_back('_');
        out.push_back(ch);
    }
    prev = ch;
}
if (prev == '_') out.push_back('_');

Using boost::regex to replace a backslash with double backslash and double quote with a slash quote

I'm going batty trying to get this to work. Here's what I have so far, but it doesn't work.
const std::string singleslash("\\\\\\\\");
const std::string doublequote("\\\"\"\\");
const std::string doubleslash("\\\\\\\\\\\\");
const std::string slashquote("\\\\\\\\\"\\");
std::string temp(Variables);
temp.assign(boost::regex_replace(temp,boost::regex(singleslash),doubleslash,boost::match_default));
temp.assign(boost::regex_replace(temp,boost::regex(doublequote),slashquote,boost::match_default));
Someone please save me.
Update It seems that I'm not using regex_replace properly. Here's a simpler example that doesn't work either...
std::string w("Watermelon");
temp.assign(boost::regex_replace(w,boost::regex("W"),"x",boost::match_all | boost::format_all));
MessageBox((HWND)Window, temp.c_str(), "temp", MB_OK);
This gives me "Watermelon" instead of "xatermelon"
Update 2 Using boost::regex wrong... this one works
boost::regex pattern("W");
temp.assign(boost::regex_replace(w,pattern,std::string("x")));
Update 3 Here's what ultimately worked
std::string w("Watermelon wishes backslash \\ and another backslash \\ and \"\"fatness\"\"");
temp.assign(w);
MessageBox((HWND)Window, temp.c_str(), "original", MB_OK);
const boost::regex singlebackslashpat("\\\\");
const std::string doublebackslash("\\\\\\\\");
temp.assign(boost::regex_replace(w,singlebackslashpat,doublebackslash));
MessageBox((HWND)Window, temp.c_str(), "double-backslash", MB_OK);
const boost::regex doublequotepat("\"\"");
const std::string backslashquote("\\\\\\\"");
temp.assign(boost::regex_replace(temp,doublequotepat,backslashquote));
MessageBox((HWND)Window, temp.c_str(), "temp", MB_OK);
So, I'm not a boost::regex expert and don't have Boost conveniently installed where I am right now, but let's try to work this through step by step.
The patterns to match against
To match a double-quote in the input, you just need a double-quote in the regex (double-quotes aren't magical in regexes), which means all you need is a string containing a double-quote. "\"" should be fine.
To match a backslash in the input, you need an escaped backslash in the regex, which means two consecutive backslashes; each of those needs to be doubled again in a string literal. So "\\\\". [EDITED: I typed eight instead of four before, which was a mistake.]
The output formats
Again, double-quotes aren't magical in match replacement formats (or whatever the right terminology is) but backslashes are. So to get two backslashes in the output you need four in the string, which means you need 8 in the string literal. So: "\\\\\\\\".
To get a backslash followed by a double-quote, your string needs to be two backslashes and a double-quote, and all of those need to be preceded with backslashes in the string literal. So: "\\\\\"".
[EDITED to add the actual code for easier copy-and-pasting:]
const std::string singleslash("\\\\");
const std::string doublequote("\"");
const std::string doubleslash("\\\\\\\\");
const std::string slashquote("\\\\\"");
Matching flags
After reading tofutim's update, I tried to look up match_all and found no documentation for it. It does, however, appear to be a possible match flag value, and the header file in which it's defined has the following cryptic comment next to it: "must find the whole of input even if match_any is set". The similarly-cryptic comment attached to match_any is "don't care what we match". I'm not sure what any of that means and it seems like these flags are deprecated or something, but in any case you probably don't want to be using them.
(After a very quick look at the source, I think what match_all does is to accept a match only if it ends at the end of the input. So you might try replacing n instead of W in your revised test case and see whether that works. Alternatively, perhaps I missed something and it has to match the entire input, which you could check by replacing Watermelon instead of W or n. Or you could not bother, if you happen not to be curious about this.)
Give that a try and report back...
I have no Boost here, but a single backslash must be written as \\ in a regex, and thus as a C++ string literal it is four backslashes. The replacement string (two backslashes) has to be escaped for the format and again for C++, so it is eight backslashes.
A double quote need not be escaped in a regex, so the pattern is "" and in C++ \"\". The replacement again has to be escaped, so it is \\", and in C++ that becomes \\\\\".
According to your update 3, the patterns and replacement strings must be initialized like this:
const std::string singleslashpat("\\\\");
const std::string doublequotepat("\"\"");
const std::string doubleslash("\\\\\\\\");
const std::string slashquote("\\\\\"");
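If the goal is only to double each backslash and put a backslash in front of each double quote, a plain loop sidesteps the stacked regex, format-string and C++ literal escaping altogether. The sketch below is an alternative that does not use boost::regex, and the function name is made up for illustration. Note that it escapes every single double quote, which is an assumption; the asker's final code matched pairs of double quotes instead.

```cpp
#include <string>

// Prefix every backslash and every double quote with a backslash.
std::string escape_backslashes_and_quotes(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        if (c == '\\' || c == '"') out.push_back('\\');
        out.push_back(c);
    }
    return out;
}
```

Only one level of escaping remains here, the C++ character literals '\\' and '"', so there is no need to count backslashes by powers of two.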

lua gsub special replacement producing invalid capture index

I have a piece of lua code (executing in Corona):
local loginstr = "emailAddress={email} password={password}"
print(loginstr:gsub( "{email}", "tester#test.com" ))
This code generates the error:
invalid capture index
While I now know it is because of the curly braces not being specified appropriately in the gsub pattern, I don't know how to fix it.
How should I form the gsub pattern so that I can replace the placeholder string with the email address value?
I've looked around on all the lua-oriented sites I can find but most of the documentation seems to revolve around unassociated situations.
As I've suggested in the comments above, when the e-mail is encoded as a URL parameter, the %40 that encodes the '@' of the address will be interpreted as a capture index. Since the search pattern doesn't have any captures (let alone 40 of them), this will cause a problem.
There are two possible solutions: you can either decode the encoded string, or encode your replacement string to escape the '%' character in it. Depending on what you are going to do with the end result, you may need to do both.
The following routine (which I picked up from here, not tested) can decode an encoded string:
function url_decode(str)
  str = string.gsub (str, "+", " ")
  str = string.gsub (str, "%%(%x%x)",
      function(h) return string.char(tonumber(h,16)) end)
  str = string.gsub (str, "\r\n", "\n")
  return str
end
For escaping the % character in string str, you can use:
str:gsub("%%", "%%%%")
The '%' character is escaped as '%%', and it needs to be escaped in both the search pattern and the replacement pattern (hence the number of % characters in the replacement).
Are you sure your problem isn't that you're trying to gsub on loginurl rather than loginstr?
Your code gives me this error (see http://ideone.com/wwiZk):
lua: prog.lua:2: attempt to index global 'loginurl' (a nil value)
and that sounds similar to what you're seeing. Just fixing it to use the right variable:
print(loginstr:gsub( "{email}", "tester#test.com" ))
says (see http://ideone.com/mMj0N):
emailAddress=tester#test.com password={password}
as desired.
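I had the % in the replacement value, so you need to escape the value with value:gsub("%%", "%%%%").
Example of replacing "some value" in json:
-- the extra parentheses drop gsub's second return value (the match count),
-- which would otherwise be passed on as the maximum number of replacements
local resultJSON = json:gsub("\"SOME_VALUE\"", (value:gsub("%%", "%%%%")))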

Regex - If contains '%', can only contain '%20'

I want to create a regular expression for the following scenario:
If a string contains the percent character (%), it may only appear as part of %20, and it cannot be preceded by another '%'.
So if there was for instance, %25 it would be rejected. For instance, the following string would be valid:
http://www.test.com/?&Name=My%20Name%20Is%20Vader
But these would fail:
http://www.test.com/?&Name=My%20Name%20Is%20VadersAccountant%25
%%%25
Any help would be greatly appreciated,
Kyle
EDIT:
The scenario in a nutshell is that a link is written to an encoded state and then launched via JavaScript. No decoding works. I tried .net decoding and JS decoding, each having the same result - The results stay encoded when executed.
Doesn't require a %:
/^[^%]*(%20[^%]*)*$/
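As a quick way to try this pattern, here is a minimal C++ sketch using std::regex (an assumption, since the question does not say which regex engine is in use). The function name only_percent20 is hypothetical.

```cpp
#include <regex>
#include <string>

// True when every '%' in the string is the start of "%20".
// A string without any '%' is accepted as well.
bool only_percent20(const std::string& s) {
    static const std::regex pattern("^[^%]*(%20[^%]*)*$");
    return std::regex_match(s, pattern);
}
```

The two failing examples from the question (the URL ending in %25, and %%%25) are rejected by this check, while the URL containing only %20 sequences is accepted.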
Which language are you using?
Most languages have a Uri Encoder / Decoder function or class.
I would suggest you decode the string first and then check for valid (or invalid) characters.
i.e. something like /[\w ]/ (the blank is a space)
If you use a regex in the first place, keep in mind that www.example.com/index.html?user=admin&pass=%%250 means that the pass really is "%250".
Another solution if look-arounds are not available:
^([^%]|%([013-9a-fA-F][0-9a-fA-F]|2[1-9a-fA-F]))*$
Reject the string if it matches %(?!20), i.e. a % that is not followed by 20 (the simpler %[^2][^0] would miss cases such as %25)
I think that would find what you need
/^([^%]|%%|%20)+$/
Edit: Added case where %% is valid string inside URI
Edit2: And fixed it for case where it should fail :-)
Edit3:
In case you need to use it in an editor (which would explain why you can't use a more programmatic way), you have to correctly escape all special characters; for example, in Vim that regex should look like:
/^\([^%]\|%%\|%20\)\+$/
Maybe a better approach is to deal with that validation after you decode that string:
string name = HttpUtility.UrlDecode(Request.QueryString["Name"]);
/^([^%]|%20)*$/
This requires a test against the "bad" patterns. If we're allowing %20 - we don't need to make sure it exists.
As others have said before, %% is valid too... and %%25 would be %25
The below regex matches anything that doesn't fit into the above rules
/(?<![^%]%)%(?!(20|%))/
The first group checks whether there is a % before the character (meaning that it is %%) and also checks that it is not %%%. It then matches a % and checks that what follows is neither 20 nor %.
This means that if anything is identified by the regex, then you should probably reject it.
I agree with dominic's comment on the question. Don't use Regex.
If you want to avoid scanning the string twice, you can just iteratively search for % and then check that it is being followed by 20 and nothing else. (Update: allow a % after to be interpreted as a literal %nnn sequence)
// pseudo code
pos = mystring.find('%', 0)
while (pos != not_found)
{
    if mystring[pos+1] == "%" then
        pos = pos + 2 // ok, this is a literal %, skip ahead
    else if mystring.substring(pos+1, 2) == "20" then
        pos = pos + 3 // ok, this is an encoded space, skip ahead
    else
        return false; // string is invalid
    end if
    pos = mystring.find('%', pos)
}
return true;