Why does Qt reject a valid JSON? - c++

Using Qt-5.0, I have this JSON string
{"type":"FILE"}
I expected fromBinaryData to accept the string's .toLocal8Bit() as valid input, but it doesn't.
QString j = "{\"type\":\"FILE\"}";
auto doc = QJsonDocument::fromBinaryData(j.toLocal8Bit());
doc.isNull(); // true, meaning the input was not accepted
Did I miss something?

I don't know Qt, so I googled for a second. Here's what I found:
What you have is a string, a text representation. It's not the binary format Qt uses internally. The binary data would not be readable. QJsonDocument::fromBinaryData expects such a binary blob.
What you want seems to be achieved with QJsonDocument::fromJson, which expects a UTF-8 encoded JSON string.

Instead of fromBinaryData, use fromJson with the same argument. I had this exact problem yesterday, and that was what worked for me.

Related

Parsing JSON style text using sscanf()

["STRING", FLOAT, FLOAT, FLOAT],
I need to parse four values from this string: a STRING and three FLOATS.
sscanf() returns zero, probably I got the format specifiers wrong.
sscanf(current_line.c_str(), "[\"%s[^\"]\",%f,%f,%f],",
&temp_node.id,
&temp_node.pos.x,
&temp_node.pos.y,
&temp_node.pos.z);
Do you know what's wrong?
Please read the manual page on sscanf(3). The %s format does not match using a regular expression, it just scans non-whitespace characters. Even if it worked as you assumed, your regular expression would not be able to handle all JSON strings correctly (which might not be a problem if your input data format is sufficiently restricted, but would be unclean nonetheless).
Use a proper JSON parser. It's not really complicated. I used cJSON for a moderately complex case, you should be able to integrate it within a few hours.
To fix your immediate problem, use this format specifier:
"[\"%[^\"]\",%f,%f,%f],"
The right syntax for parsing a set is %[...] rather than %s[...]; the set is a conversion of its own, so no s follows the closing bracket.
That being said, sscanf() is not the right tool for parsing JSON. Even the "fixed" code would fail to parse strings that contain escaped quotes, for instance.
Use a proper JSON parser library.

Qt 5.7 - Why do I get a malformed JSON value with QString but a perfect one with std::string?

I am trying to get a JSON response from a Ruby on Rails API.
When I call the URL directly with curl or Postman, I get a perfect JSON response.
When I use my program, built with a statically compiled 32-bit Qt 5.7 on Windows, I get a perfect response only if I use std::string.
But if I use qDebug to print a QString, I get this strange, malformed result:
"{\"success\":true,\"files\":[\"C:/Perl/lib/pods/perlcn.pod\",\"C:/Perl/lib/pods/perldata.pod\",\"C:/Perl/lib/pods/perldebguts.pod\",\"C:/Perl/lib/pods/perldelta.pod\",\"C:/Perl/lib/pods/perldiag.pod\",\"C:/Perl/lib/pods/perldoc.pod\",\"C:/Perl/lib/pods/perldos.pod\",\"C:/Perl/lib/pods/perldsc.pod\",\"C:/Perl/lib/pods/perldtrace.pod\",\"C:/Perl/lib/pods/perlebcdic.pod\",\"C:/Perl/lib/pods/perlembed.pod\",\"C:/Perl/lib/pods/perlexperiment.pod\",\"C:/Perl/lib/pods/perlfaq.pod\",\"C:/Perl/lib/pods/perlfaq1.pod\",\"C:/Perl/lib/pods/perlfaq2.pod\",\"C:/Perl/lib/pods/perlfaq3.pod\",\"C:/Perl/lib/pods/perlfaq4.pod\",\"C:/Perl/lib/pods/perlfaq5.pod\",\"C:/Perl/lib/pods/perlfaq6.pod\",\"C:/Perl/lib/pods/perlfaq7.pod\",\"C:/Perl/lib/pods/perlfaq8.pod\",\"C:/Perl/lib/pods/perlfaq9.pod\",\"C:/Perl/lib/pods/perlfilter.pod\",\"C:/Perl/lib/pods/perlfork.pod\",\"C:/Perl/lib/pods/perlform.pod\",\"C:/Perl/lib/pods/perlfreebsd.pod\",\"C:/Perl/lib/pods/perlfunc.pod\",\"C:/Perl/lib/pods/perlgit.pod\",\"C:/Perl/lib/pods/perlglossaîv
If I print the std::string instead, I get perfect JSON, exactly what I want:
{"success":true,"files":["C:/Perl/lib/pods/perlcn.pod","C:/Perl/lib/pods/perldata.pod","C:/Perl/lib/pods/perldebguts.pod","C:/Perl/lib/pods/perldelta.pod","C:/Perl/lib/pods/perldiag.pod","C:/Perl/lib/pods/perldoc.pod","C:/Perl/lib/pods/perldos.pod","C:/Perl/lib/pods/perldsc.pod","C:/Perl/lib/pods/perldtrace.pod","C:/Perl/lib/pods/perlebcdic.pod","C:/Perl/lib/pods/perlembed.pod","C:/Perl/lib/pods/perlexperiment.pod","C:/Perl/lib/pods/perlfaq.pod","C:/Perl/lib/pods/perlfaq1.pod","C:/Perl/lib/pods/perlfaq2.pod","C:/Perl/lib/pods/perlfaq3.pod","C:/Perl/lib/pods/perlfaq4.pod","C:/Perl/lib/pods/perlfaq5.pod","C:/Perl/lib/pods/perlfaq6.pod","C:/Perl/lib/pods/perlfaq7.pod","C:/Perl/lib/pods/perlfaq8.pod","C:/Perl/lib/pods/perlfaq9.pod","C:/Perl/lib/pods/perlfilter.pod","C:/Perl/lib/pods/perlfork.pod","C:/Perl/lib/pods/perlform.pod","C:/Perl/lib/pods/perlfreebsd.pod","C:/Perl/lib/pods/perlfunc.pod","C:/Perl/lib/pods/perlgit.pod","C:/Perl/lib/pods/perlglossary.pod","C:/Perl/lib/pods/perlgpl.pod","C:/Perl/lib/pods/perlguts.pod","C:/Perl/lib/pods/perlhack.pod","C:/Perl/lib/pods/perlhacktips.pod","C:/Perl/lib/pods/perlhacktut.pod","C:/Perl/lib/pods/perlhaiku.pod","C:/Perl/lib/pods/perlhist.pod","C:/Perl/lib/pods/perlhpux.pod","C:/Perl/lib/pods/perlhurd.pod","C:/Perl/lib/pods/perlintern.pod","C:/Perl/lib/pods/perlinterp.pod","C:/Perl/lib/pods/perlintro.pod","C:/Perl/lib/pods/perliol.pod","C:/Perl/lib/pods/perlipc.pod","C:/Perl/lib/pods/perlirix.pod","C:/Perl/lib/pods/perljp.pod","C:/Perl/lib/pods/perlko.pod","C:/Perl/lib/pods/perllexwarn.pod","C:/Perl/lib/pods/perllinux.pod","C:/Perl/lib/pods/perllocale.pod","C:/Perl/lib/pods/perllol.pod","C:/Perl/lib/pods/perlmacos.pod","C:/Perl/lib/pods/perlmacosx.pod","C:/Perl/lib/pods/perlmod.pod","C:/Perl/lib/pods/perlmodinstall.pod","C:/Perl/lib/pods/perlmodlib.pod","C:/Perl/lib/pods/perlmodstyle.pod","C:/Perl/lib/pods/perlmroapi.pod","C:/Perl/lib/pods/perlnetware.pod","C:/Perl/lib/pods/perlnewmod.pod","C:/Perl/lib/pods/perlnumber.pod","C:/
Perl/lib/pods/perlobj.pod","C:/Perl/lib/pods/perlootut.pod","C:/Perl/lib/pods/perlop.pod","C:/Perl/lib/pods/perlopenbsd.pod","C:/Perl/lib/pods/perlopentut.pod","C:/Perl/lib/pods/perlos2.pod","C:/Perl/lib/pods/perlos390.pod","C:/Perl/lib/pods/perlos400.pod","C:/Perl/lib/pods/perlpacktut.pod","C:/Perl/lib/pods/perlperf.pod","C:/Perl/lib/pods/perlplan9.pod","C:/Perl/lib/pods/perlpod.pod","C:/Perl/lib/pods/perlpodspec.pod","C:/Perl/lib/pods/perlpodstyle.pod","C:/Perl/lib/pods/perlpolicy.pod","C:/Perl/lib/pods/perlport.pod","C:/Perl/lib/pods/perlpragma.pod","C:/Perl/lib/pods/perlqnx.pod","C:/Perl/lib/pods/perlre.pod","C:/Perl/lib/pods/perlreapi.pod","C:/Perl/lib/pods/perlrebackslash.pod","C:/Perl/lib/pods/perlrecharclass.pod","C:/Perl/lib/pods/perlref.pod","C:/Perl/lib/pods/perlreftut.pod","C:/Perl/lib/pods/perlreguts.pod","C:/Perl/lib/pods/perlrepository.pod","C:/Perl/lib/pods/perlrequick.pod","C:/Perl/lib/pods/perlreref.pod","C:/Perl/lib/pods/perlretut.pod","C:/Perl/lib/pods/perlriscos.pod","C:/Perl/lib/pods/perlrun.pod","C:/Perl/lib/pods/perlsec.pod","C:/Perl/lib/pods/perlsolaris.pod","C:/Perl/lib/pods/perlsource.pod","C:/Perl/lib/pods/perlstyle.pod","C:/Perl/lib/pods/perlsub.pod","C:/Perl/lib/pods/perlsymbian.pod","C:/Perl/lib/pods/perlsyn.pod","C:/Perl/lib/pods/perlsynology.pod","C:/Perl/lib/pods/perlthrtut.pod"]}
I have no idea what to do, because I need to parse my JSON from a QString for QJsonDocument and QJsonObject.
I have tried many things, such as
QNetworkAccessManager
or (an ugly approach, just to understand and debug) something like:
calling curl externally
Thanks
Are you using qDebug() for stdout output? This is not what it should be used for.
It displays the current contents of many types in a debug format. For QString that means the string is shown in quotes, with certain characters (the double quote included) escaped with \. That doesn't mean the string itself contains escaped data; it is only presented to you that way by QDebug.

Get Non Indented string from Json Object

I have a JSON object in C++. I am using the json_cpp library.
I want to get the string from the Json::Value object. I am using it like below.
Json::Value obj;
....
....
....
string str = obj.toStyledString();
This returns the string in the pretty print format. But I want the string without any indentation. How can I do that as there are no other functions provided in the class?
You could use Json::FastWriter. It applies no indentation or formatting, since it outputs everything on a single line, so it is normally not suitable for 'human' consumption.
std::string toUnStyledString(const Json::Value& value)
{
    Json::FastWriter writer;
    return writer.write( value );
}
The function toStyledString also simply uses a Json::StyledWriter if you look into the definition of Json::Value::toStyledString.
Well, if this library doesn't provide appropriate methods then you could write them yourself. The JSON format is rather simple, so I don't think that it will take a lot of work.
Here you can find a good graphical representation of JSON format:
http://json.org
P.S. I've never worked with this particular library, so I propose sort of a general solution.
UPDATE: another option is to take the string returned by toStyledString() and remove the indentation. But that requires string processing and will probably be resource-consuming. Note that you can't just remove every tab, space, and newline character, because they can be part of string values inside the JSON object.
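A minimal sketch of that string-processing approach, using only the standard library, might look like this. strip_json_whitespace is a hypothetical helper, not part of the JSON library, and it assumes the input is already valid JSON. It tracks whether the current character is inside a string literal so that whitespace in values is preserved.

```cpp
#include <string>

// Removes whitespace outside string literals from already-valid JSON text.
std::string strip_json_whitespace(const std::string& json) {
    std::string out;
    out.reserve(json.size());
    bool in_string = false;  // currently between two unescaped quotes
    bool escaped = false;    // previous character was a backslash
    for (char c : json) {
        if (in_string) {
            out += c;  // keep everything inside a string literal
            if (escaped) escaped = false;
            else if (c == '\\') escaped = true;
            else if (c == '"') in_string = false;
        } else if (c == '"') {
            in_string = true;
            out += c;
        } else if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
            out += c;  // structural character or part of a number/keyword
        }
    }
    return out;
}
```

It is a single pass over the text, so the cost is linear in the length of the styled string.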
Why do you want unindented string again?

C++ Runtime string formatting

Usually I use streams for formatting, but in this case I don't know the format until runtime.
I want to be able to take something like the following format string:
Hello {0}! Your last login was on {1,date:dd/mm/yy}.
...and feed in the variables "Fire Lancer" and 1247859223, and end up with the following formatted string:
Hello Fire Lancer! Your last login was on 17/07/09.
Other languages I use have built-in support for this kind of thing, e.g. Python's string format method. In C++ there doesn't seem to be any such functionality, except the C printf functions, which are not very safe.
Also, this is for a high-performance program, so whatever solution I use needs to parse the format string once and store it (e.g. maybe a Parse method that returns a FormatString object with a Format(string) method), not reparse the string every time the format method is called...
Your format string looks very much like those used in ICU MessageFormat. Did you consider using it?
Boost Formatting does that for you:
http://www.boost.org/doc/libs/1_39_0/libs/format/doc/format.html
Check out this question and answer for examples of usage:
boost::format will do the positional arguments portion, but not the date formatting...
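The parse-once idea from the question can be sketched with the standard library alone. This is an illustration under stated assumptions, not Boost's or ICU's API: the FormatString class and its format method are hypothetical, only the numeric index of each placeholder is used, and anything after the index (such as ",date:dd/mm/yy") is ignored, so the caller has to pass the date already formatted as text.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical class: parses "{N...}" placeholders once and reuses the result.
class FormatString {
    struct Piece { std::string literal; std::size_t index; };
    std::vector<Piece> pieces_;  // literal text followed by argument N
    std::string tail_;           // literal text after the last placeholder

public:
    explicit FormatString(const std::string& pattern) {
        std::string literal;
        std::size_t i = 0;
        while (i < pattern.size()) {
            std::size_t close = std::string::npos;
            if (pattern[i] == '{') close = pattern.find('}', i);
            // Read the decimal index that follows '{', if any.
            std::size_t j = i + 1, index = 0;
            while (close != std::string::npos && j < close &&
                   pattern[j] >= '0' && pattern[j] <= '9')
                index = index * 10 + static_cast<std::size_t>(pattern[j++] - '0');
            if (close == std::string::npos || j == i + 1) {
                literal += pattern[i++];  // ordinary character
                continue;
            }
            pieces_.push_back({literal, index});
            literal.clear();
            i = close + 1;  // skip the whole placeholder, spec included
        }
        tail_ = literal;
    }

    // Substitutes arguments by index; out-of-range indexes produce nothing.
    std::string format(const std::vector<std::string>& args) const {
        std::string out;
        for (const Piece& piece : pieces_) {
            out += piece.literal;
            if (piece.index < args.size()) out += args[piece.index];
        }
        out += tail_;
        return out;
    }
};
```

Constructing the object once and calling format repeatedly avoids rescanning the pattern; each call only concatenates the stored pieces with the arguments.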

RegEx to parse or validate Base64 data

Is it possible to use a RegEx to validate, or sanitize Base64 data? That's the simple question, but the factors that drive this question are what make it difficult.
I have a Base64 decoder that can not fully rely on the input data to follow the RFC specs. So, the issues I face are issues like perhaps Base64 data that may not be broken up into 78 (I think it's 78, I'd have to double check the RFC, so don't ding me if the exact number is wrong) character lines, or that the lines may not end in CRLF; in that it may have only a CR, or LF, or maybe neither.
So, I've had a hell of a time parsing Base64 data formatted as such. Due to this, examples like the following become impossible to decode reliably. I will only display partial MIME headers for brevity.
Content-Transfer-Encoding: base64
VGhpcyBpcyBzaW1wbGUgQVNDSUkgQmFzZTY0IGZvciBTdGFja092ZXJmbG93IGV4YW1wbGUu
Ok, so parsing that is no problem, and is exactly the result we would expect. And in 99% of the cases, using any code to at least verify that each char in the buffer is a valid base64 char, works perfectly. But, the next example throws a wrench into the mix.
Content-Transfer-Encoding: base64
http://www.stackoverflow.com
VGhpcyBpcyBzaW1wbGUgQVNDSUkgQmFzZTY0IGZvciBTdGFja092ZXJmbG93IGV4YW1wbGUu
This is a version of Base64 encoding that I have seen in some viruses and other things that attempt to take advantage of some mail readers' desire to parse MIME at all costs, versus ones that go strictly by the book, or rather the RFC, if you will.
My Base64 decoder decodes the second example to the following data stream. And keep in mind here, the original stream is all ASCII data!
[0x]86DB69FFFC30C2CB5A724A2F7AB7E5A307289951A1A5CC81A5CC81CDA5B5C1B19481054D0D
2524810985CD94D8D08199BDC8814DD1858DAD3DD995C999B1BDDC8195E1B585C1B194B8
Anyone have a good way to solve both problems at once? I'm not sure it's even possible, outside of doing two transforms on the data with different rules applied, and comparing the results. However if you took that approach, which output do you trust? It seems that ASCII heuristics is about the best solution, but how much more code, execution time, and complexity would that add to something as complicated as a virus scanner, which this code is actually involved in? How would you train the heuristics engine to learn what is acceptable Base64, and what isn't?
UPDATE:
Due to the number of views this question continues to get, I've decided to post the simple RegEx that I've been using in a C# application for 3 years now, with hundreds of thousands of transactions. Honestly, I like the answer given by Gumbo the best, which is why I picked it as the selected answer. But to anyone using C#, and looking for a very quick way to at least detect whether a string, or byte[] contains valid Base64 data or not, I've found the following to work very well for me.
[^-A-Za-z0-9+/=]|=[^=]|={3,}$
And yes, this is just for a STRING of Base64 data, NOT a properly formatted RFC1341 message. So, if you are dealing with data of this type, please take that into account before attempting to use the above RegEx. If you are dealing with Base16, Base32, Radix or even Base64 for other purposes (URLs, file names, XML Encoding, etc.), then it is highly recommended that you read RFC4648 that Gumbo mentioned in his answer, as you need to be well aware of the charset and terminators used by the implementation before attempting to use the suggestions in this question/answer set.
From the RFC 4648:
Base encoding of data is used in many situations to store or transfer data in environments that, perhaps for legacy reasons, are restricted to US-ASCII data.
So it depends on the purpose of usage of the encoded data if the data should be considered as dangerous.
But if you’re just looking for a regular expression to match Base64 encoded words, you can use the following:
^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$
^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$
This one is good, but will match an empty String
This one does not match empty string :
^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{4})$
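In C++ the first expression can be tried with std::regex, whose default ECMAScript grammar accepts it unchanged. The following is a sketch only: is_base64 and its allow_empty flag are hypothetical names, and the flag stands in for the second expression by rejecting the empty string up front instead of changing the pattern.

```cpp
#include <regex>
#include <string>

// Checks length, alphabet and padding placement, but not pad bits.
bool is_base64(const std::string& s, bool allow_empty = true) {
    static const std::regex pattern(
        "^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$");
    if (s.empty()) return allow_empty;
    return std::regex_match(s, pattern);
}
```

The pattern is compiled once (function-local static) and reused across calls. Line breaks are not tolerated, so MIME-wrapped data would need its CR/LF characters removed first.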
The answers presented so far fail to check that the Base64 string has all pad bits set to 0, as required for it to be the canonical representation of Base64 (which is important in some environments, see https://www.rfc-editor.org/rfc/rfc4648#section-3.5) and therefore, they allow aliases that are different encodings for the same binary string. This could be a security problem in some applications.
Here is the regexp that verifies that the given string is not just valid base64, but also the canonical base64 string for the binary data:
^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/][AQgw]==|[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=)?$
The cited RFC considers the empty string as valid (see https://www.rfc-editor.org/rfc/rfc4648#section-10) therefore the above regex also does.
The equivalent regular expression for base64url (again, refer to the above RFC) is:
^(?:[A-Za-z0-9_-]{4})*(?:[A-Za-z0-9_-][AQgw]==|[A-Za-z0-9_-]{2}[AEIMQUYcgkosw048]=)?$
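To make the aliasing concrete: "QQ==" and "QR==" both decode to the single byte 'A', but only "QQ==" has its unused pad bits set to 0. A sketch of the canonical check for the standard alphabet with std::regex follows; is_canonical_base64 is a hypothetical name, and the pattern is the one given above, split across two string literals for readability.

```cpp
#include <regex>
#include <string>

// Accepts only Base64 whose unused pad bits are zero (standard alphabet).
bool is_canonical_base64(const std::string& s) {
    static const std::regex pattern(
        "^(?:[A-Za-z0-9+/]{4})*"
        "(?:[A-Za-z0-9+/][AQgw]==|[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=)?$");
    return std::regex_match(s, pattern);
}
```

The characters in [AQgw] are the ones whose 6-bit values are multiples of 16 (low four bits zero), and those in [AEIMQUYcgkosw048] are multiples of 4 (low two bits zero), which is exactly the pad-bit requirement for one and two trailing bytes respectively.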
Neither a ":" nor a "." will show up in valid Base64, so I think you can unambiguously throw away the http://www.stackoverflow.com line. In Perl, say, something like
my $sanitized_str = join q{}, grep {!/[^A-Za-z0-9+\/=]/} split /\n/, $str;
say decode_base64($sanitized_str);
might be what you want. It produces
This is simple ASCII Base64 for StackOverflow example.
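The same line filter might be written in C++ roughly as below. keep_base64_lines is a hypothetical helper; it performs only the sanitizing step and leaves decoding to whatever Base64 decoder is already in use.

```cpp
#include <sstream>
#include <string>

// Keeps only lines made up entirely of Base64 alphabet characters and '='.
std::string keep_base64_lines(const std::string& text) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
    std::istringstream in(text);
    std::string line, out;
    while (std::getline(in, line)) {
        // Tolerate CRLF line endings by dropping a trailing '\r'.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        // Drop the whole line if any character is outside the alphabet.
        if (line.find_first_not_of(alphabet) == std::string::npos)
            out += line;
    }
    return out;
}
```

Like the Perl one-liner, this only rejects lines that contain a foreign character. An injected line consisting purely of letters and digits would still pass, so it does not solve the general problem raised in the question.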
The best regexp I have found so far is here:
https://www.npmjs.com/package/base64-regex
In the current version it looks like this:
module.exports = function (opts) {
opts = opts || {};
var regex = '(?:[A-Za-z0-9+\/]{4}\\n?)*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)';
return opts.exact ? new RegExp('(?:^' + regex + '$)') :
new RegExp('(?:^|\\s)' + regex, 'g');
};
Here's an alternative regular expression:
^(?=(.{4})*$)[A-Za-z0-9+/]*={0,2}$
It satisfies the following conditions:
The string length must be a multiple of four - ^(?=(.{4})*$)
The content must be alphanumeric characters or + or / - [A-Za-z0-9+/]*
It can have up to two padding (=) characters on the end - ={0,2}
It accepts empty strings
To validate base64 image we can use this regex
/^data:image\/(?:gif|png|jpeg|bmp|webp)(?:;charset=utf-8)?;base64,(?:[A-Za-z0-9]|[+/])+={0,2}/
private validBase64Image(base64Image: string): boolean {
  const regex = /^data:image\/(?:gif|png|jpeg|bmp|webp|svg\+xml)(?:;charset=utf-8)?;base64,(?:[A-Za-z0-9]|[+/])+={0,2}/;
  // !! keeps the return type boolean when base64Image is an empty string
  return !!base64Image && regex.test(base64Image);
}
The shortest regex to check RFC 4648 compliance, enforcing canonical encoding (i.e. all pad bits set to 0):
^(?=(.{4})*$)[A-Za-z0-9+/]*([AQgw]==|[AEIMQUYcgkosw048]=)?$
Actually, this is a mix of two earlier answers.
I found a solution that works very well
^(?:([a-z0-9A-Z+\/]){4})*(?1)(?:(?1)==|(?1){2}=|(?1){3})$
It will match the following strings
VGhpcyBpcyBzaW1wbGUgQVNDSUkgQmFzZTY0IGZvciBTdGFja092ZXJmbG93IGV4YW1wbGUu
YW55IGNhcm5hbCBwbGVhcw==
YW55IGNhcm5hbCBwbGVhc3U=
YW55IGNhcm5hbCBwbGVhc3Vy
while it won't match any of those invalid
YW5#IGNhcm5hbCBwbGVhcw==
YW55IGNhc=5hbCBwbGVhcw==
YW55%%%%IGNhcm5hbCBwbGVhc3V
YW55IGNhcm5hbCBwbGVhc3
YW55IGNhcm5hbCBwbGVhc
YW***55IGNhcm5hbCBwbGVh=
YW55IGNhcm5hbCBwbGVhc==
YW55IGNhcm5hbCBwbGVhc===
My simplified version of Base64 regex:
^[A-Za-z0-9+/]*={0,2}$
The simplification is that it doesn't check that the length is a multiple of 4. If you need that, use one of the other answers. Mine focuses on simplicity.
To test it: https://regex101.com/r/zdtGSH/1