Fixing wrong JSON data - C++

I'm working with JSON data that I download from the web. The problem with this JSON is that its contents are incorrect. To show the problem, here is a simplified preview:
[
  {
    "id": 0,
    "name": "adsad"
  },
  {
    "id": "123",
    "name": "aawew"
  }
]
So there is an array of these items, where in some of them the value for "id" is a string and in others it is an integer. This is the data I get, and I can't make the source fix it.
The solution I came up with was to fix this data before parsing it. Here is my naive algorithm, where Defaults::intTypes() is a vector of all keys that should be integers but are sometimes strings:
void fixJSONData(QString& data) {
    qDebug() << "Fixing JSON data ( thread: " << QThread::currentThreadId() << ")";
    QElapsedTimer timer;
    timer.start();
    for (int i = 0; i < data.size(); ++i) {
        for (const auto& key : Defaults::intTypes()) {
            if (data.mid(i, key.size() + 3) == "\"" + key + "\":") {
                int newLine = i + key.size() + 3;
                while (data[newLine] != ',' && data[newLine] != '}') {
                    if (data[newLine] == '"') {
                        data.remove(newLine, 1);
                    } else {
                        ++newLine;
                    }
                }
                i = newLine;
                break;
            }
        }
    }
    qDebug() << "Fixing done in " << timer.elapsed() << " ms.";
}
Well, it does fix the problem, but the algorithm is far too slow (it went through 4.5 million characters in 390 seconds). How could this be done faster?
P.S.: for JSON serialization I use the nlohmann::json library.
Edit: After reading a bit deeper into the JSON rules, it looks like the example above is a perfectly valid JSON file. Is this then an issue of C++ being strongly typed, so that it can't deserialize an array of heterogeneous elements into C++ classes?
Edit2: What I would like to create from that JSON string is a QVector<Model>, where:
class Model {
    unsigned id;
    QString name;
};

Although there must be several ways to improve this conversion, maybe there is a much more effective solution.
Most JSON libraries allow the end user to define a custom serializer/deserializer for an object. If you create a custom deserializer, it can parse the original data as-is and you don't have to modify the stream or the files.
It's not only faster but also more elegant.
(If the given JSON library doesn't support custom deserialization, I would consider choosing another one.)
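For example, with nlohmann::json (which the question already uses), the mixed "id" type can be absorbed in a custom from_json overload instead of patching the raw text first. Below is only a minimal sketch built around the Model class from Edit2; the std::stoul call and the parseModels helper are illustrative, not part of any existing code:

// Sketch of a custom nlohmann::json deserializer that accepts "id" both as
// an integer and as a string, so the raw data never has to be modified.
#include <nlohmann/json.hpp>
#include <string>
#include <QString>
#include <QVector>

struct Model {
    unsigned id;
    QString name;
};

void from_json(const nlohmann::json& j, Model& m)
{
    const auto& id = j.at("id");
    // Accept both 123 and "123".
    m.id = id.is_string() ? static_cast<unsigned>(std::stoul(id.get<std::string>()))
                          : id.get<unsigned>();
    m.name = QString::fromStdString(j.at("name").get<std::string>());
}

QVector<Model> parseModels(const QString& data)
{
    QVector<Model> models;
    for (const auto& item : nlohmann::json::parse(data.toStdString()))
        models.append(item.get<Model>());
    return models;
}

With a deserializer like this, the type difference is handled while parsing, so the separate fixJSONData pass over the whole string can be dropped.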

Related

Accessing JSON values in C++

I am trying to write a program that navigates the local disc in Unreal Engine for a small application. I have put together a REST server using Gradle, and, long story short, I am given a JSON with a machine's directories. I want to pull out the specific directory names, to be returned as an array of strings (FText specifically, but that's not too important here).
I found a library created by nlohmann on GitHub (https://github.com/nlohmann/json), which seems to be the best way to handle JSON in C++. For the life of me, however, I can't figure out how to pull the directory names out. I've tried an iterator and a straightforward .value() call.
The code and a JSON example are below; any insight would be greatly appreciated.
char buffer[1024];
FILE *lsofFile_p = _popen("py C:\\Users\\jinx5\\CWorkspace\\sysCalls\\PullRoots.py", "r");
fgets(buffer, sizeof(buffer), lsofFile_p);
_pclose(lsofFile_p);
std::string rootsJson(buffer);
string s = rootsJson.substr(1);
s = ReplaceAll(s, "'", "");
// here my string s will contain: [{"description":"Local Disk","name":"C:\\"},{"description":"Local Disk","name":"D:\\"},{"description":"CD Drive","name":"E:\\"}]
// These are two syntax examples I found in nlohmann's docs; neither seems to work
auto j = json::parse(s);
string descr = j.value("description", "err");
I think your problem comes from the number of \ characters in your literal string. You need 5 \ for C:\\ in the literal: C:\\\\\" (four of them produce the two backslashes of the JSON text, and the fifth, together with the " that follows, puts the JSON string's closing quote into the C++ literal).
Here is a working example:
#include "json.hpp"
#include <string>
using namespace std;
using json = nlohmann::json;
int main(){
json j = json::parse("[{\"description\":\"Local Disk\",\"name\":\"C:\\\\\"},{\"description\":\"Local Disk\",\"name\":\"D:\\\\\"},{\"description\":\"CD Drive\",\"name\":\"E:\\\\\"}]");
cout << j.is_array() << endl;
for (auto& element : j) {
std::cout << "description : " << element["description"] << " | " << " name : " << element["name"] << '\n';
}
return 0;
}
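If the goal is a container of names rather than printed output, the same parsed array can be walked into a std::vector. This is just a hypothetical variation on the example above, reusing its JSON literal:

#include "json.hpp"
#include <iostream>
#include <string>
#include <vector>
using json = nlohmann::json;

int main() {
    json j = json::parse("[{\"description\":\"Local Disk\",\"name\":\"C:\\\\\"},{\"description\":\"Local Disk\",\"name\":\"D:\\\\\"},{\"description\":\"CD Drive\",\"name\":\"E:\\\\\"}]");
    // Collect only the "name" fields, which is closer to the string array the question asks for.
    std::vector<std::string> names;
    for (const auto& element : j)
        names.push_back(element.at("name").get<std::string>());
    for (const auto& n : names)
        std::cout << n << '\n';
    return 0;
}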

Passing stringstream as ostream&, no content being read

I'm coding an assignment where, basically, I have a client and server communicating. For this, I'm using a stringstream on the server side to process the requests (which come in the form of strings) and to help build a string response. The server side holds an object (FileManager) that contains several MultimediaFile objects, each with information about itself. This information can be printed through a method, one of whose parameters is an ostream&.
To get the request from the client, the stringstream works just fine, but when I pass it to the method, for some reason it doesn't receive the content (whereas if I pass std::cout it works just fine). I'm thinking it has something to do with the way stringstream handles the << operator, but I'm unsure how to correct this.
I know I could simply store the information I need about the object in a string or something like that (instead of using ostream&), but the professor wants the method to print directly to the passed ostream, and not return a string.
Here's the code:
bool processRequest(TCPServer::Cnx& cnx, const string& request, string& response)
{
    bool changeData = false;
    if (request == "delMedias" || request == "delGroups") changeData = true;
    TCPServer::Lock lock(cnx, changeData);
    cerr << "request: '" << request << "'" << endl;
    string operation = "";
    string filename = "";
    stringstream iss;
    iss.str(request); // next few lines get request info, works just fine!
    if (iss.rdbuf()->in_avail() != 0)
        iss >> operation;
    if (iss.rdbuf()->in_avail() != 0)
        iss.get();
    if (iss.rdbuf()->in_avail() != 0)
        getline(iss, filename);
    iss.str("");
    if (operation.size() == 0 || filename.size() == 0) return false;
    response = "";
    if (operation.compare("print") == 0)
    {
        filem.showFile(filename, iss); // Problem here!!!!
        if (iss.rdbuf()->in_avail() != 0) // iss is always empty
        {
            response = iss.str();
            response = "Print info: " + response;
        }
        else
            response = "No info received!";
    }
    else if (operation.compare("play") == 0)
    {
        filem.playFile(filename);
        response = "File played!";
    }
    else
        response = "Operation not valid";
    cerr << response << endl;
    return true;
}
TCPServer is a class provided by the professor to make things easier, but basically the client sends a string request and, at the end of this function, receives a string response from the server. filem is the FileManager class object, and here's the code for the showFile method:
void FileManager::showFile(string name, ostream& s)
{
    if (mfiles.find(name) != mfiles.end())
        mfiles[name]->printFile(s);
}
mfiles is a map of string to MultimediaFile* (specifically std::shared_ptr of MultimediaFile, but anyway). Basically, this code checks if there is a file named name, and if so, calls the method printFile(s) where s here would be the stringstream. Here's the method's code:
void MultimediaFile::printFile(ostream& s) const
{
    s << "File name: " << name << "\tFilepath: " << filepath << "\n";
}
name and filepath are instance variables of the MultimediaFile class. So, yeah, here I was expecting my stringstream to receive this information, which would then be used to build the response string in the main part of the code, just after the method call. That's not what happens however, and the stringstream is always empty (and since this works with std::cout, then the problem is not the data in the MultimediaFile object).
Again, I would say stringstream behaves differently than cout when getting data through the << operator, but I couldn't find any information that would help me in this case... Does anybody have an idea?
If there's any other information you need please let me know. And thanks in advance!
So, apparently I found a solution.
To test some things, I tried creating a separate bit of code just to check how the method would behave with the stringstream. And... it worked. The only difference between my separate tests and the problem itself was that, in my program, the stringstream was first used to get data from the request string before being passed to the method. So what I did was create a second stringstream to pass to the method... and it worked.
Basically, this is what I changed (the rest of the code is the same as the original post):
response = "";
stringstream iss2; //create another stringstream
if (operation == "print")
{
filem.showFile(filename, iss2); //use it instead of the first one
if (iss2.rdbuf()->in_avail() != 0)
{
response = iss2.str();
response = "Print info: " + response;
}
else
response = "No info received!";
}
I have no idea why the first stringstream doesn't work; maybe one of the methods I used to get the request information (str(), get(), getline() or the >> operator) changes the state of the stringstream so that it doesn't accept new information? I don't know, just random thoughts.
But anyway, this works, so I'm happy for now...
P.S.: I also changed the operation.compare("print") to operation == "print". I couldn't remember the reason I used compare, and I got no warnings during compilation like I thought I had, so, yeah...
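A likely explanation for the original stream staying empty (not verified against the full program): once getline has read the last part of the request, the stream's eofbit is set, and an output stream that is not in a good() state silently ignores << insertions. iss.str("") replaces the buffer but does not reset the error flags, so everything showFile writes into iss is discarded. Calling iss.clear() before reusing the stream should make the first version work as well. A minimal standalone sketch:

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::stringstream iss("print somefile.txt");
    std::string operation, filename;
    iss >> operation;
    iss.get();
    std::getline(iss, filename);       // hits end of input, sets eofbit

    iss.str("");                       // empties the buffer but keeps the error flags
    iss << "File name: somefile.txt";  // ignored: the stream is not good()
    std::cout << "without clear(): '" << iss.str() << "'\n";

    iss.clear();                       // reset eofbit/failbit
    iss << "File name: somefile.txt";  // now the insertion works
    std::cout << "with clear():    '" << iss.str() << "'\n";
    return 0;
}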

Getting raw string (or bytes) for a double value in rapidjson parsing?

Is there a way to get the underlying bytes for a double value while parsing JSON using rapidjson?
Look at the code below:
string temp_json2 = "{\"byte_size\":1000.3}";
rapidjson::Document doc;
doc.Parse<0>(temp_json2.c_str());
rapidjson::Value::ConstMemberIterator iter = doc.MemberBegin();
const rapidjson::Value& key = iter->name;
std::cout << key.GetString() << " = ";
const rapidjson::Value& val = iter->value;
std::cout << val.GetDouble();
I want to get something like
val.GetRawString(); instead of val.GetDouble();
The reason I need this is that I don't want any precision to be lost during conversion. Note that I don't have the option of modifying the JSON to put quotes around the double value.
Looks like it is possible:
{
  "hash": "00000000206d413bdd4d020a7df959176440e7b52f120f3416db11cb26aaaa8f",
  "bigint": 13671375398414879143589706241811147679151753447299444772946167816777,
  "time": "1551597576",
  "special": false
}
rapidjson::Document document;
document.Parse<rapidjson::kParseNumbersAsStringsFlag>( JSONmessage );
std::cout << document["hash"].GetString() << std::endl;
std::cout << document["bigint"].GetString() << std::endl;
Source: https://github.com/Tencent/rapidjson/issues/1458
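Put together with the question's own JSON, a minimal sketch might look like this (assuming the flag behaves as described in the linked issue):

#include <iostream>
#include <string>
#include "rapidjson/document.h"

int main()
{
    const std::string temp_json2 = "{\"byte_size\":1000.3}";
    rapidjson::Document doc;
    // kParseNumbersAsStringsFlag keeps numbers as their original text, so no
    // double conversion (and no precision loss) happens during parsing.
    doc.Parse<rapidjson::kParseNumbersAsStringsFlag>(temp_json2.c_str());
    std::cout << doc["byte_size"].GetString() << std::endl;
    return 0;
}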
Currently no.
I have been working on a "full-precision" parsing option (for quite a long time) which can precisely parse a string into a double. The double-to-string conversion is already exact using the Grisu2 algorithm. But if a string cannot be represented by a double precisely, it will still lose some precision.
To support your requirement, it may need a new parsing option and changes to the SAX and DOM interfaces. If you would like this feature to be implemented, please report it here for further discussion.

JSON parser that can handle large input (2 GB)?

So far, I've tried (without success):
QJsonDocument – "document too large" (looks like the max size is artificially capped at 1 << 27 bytes)
Boost.PropertyTree – takes up 30 GB RAM and then segfaults
libjson – takes up a few gigs of RAM and then segfaults
I'm gonna try yajl next, but Json.NET handles this without any issues so I'm not sure why it should be such a big problem in C++.
Check out https://github.com/YasserAsmi/jvar. I have tested it with a large database (SF street data or something, which was around 2GB). It was quite fast.
Well, I'm not proud of my solution, but I ended up using some regex to split my data up into top-level key-value pairs (each one being only a few MB), then just parsed each one of those pairs with Qt's JSON parser and passed them into my original code.
Yajl would have been exactly what I needed for something like this, but I went with the ugly regex hack because:
1. Fitting my logic into Yajl's callback structure would have involved rewriting enough of my code to be a pain, and this is just for a one-off MapReduce job, so the code itself doesn't matter long-term anyway.
2. The data set is controlled by me and guaranteed to always work with my regex.
3. For various reasons, adding dependencies to Elastic MapReduce deployments is a bigger hassle than it should be (and static Qt compilation is buggy), so for the sake of not doing more work than necessary I'm inclined to keep dependencies to a minimum.
This still works and performs well (both time-wise and memory-wise).
Note that the regex I used happens to work for my data specifically because the top-level keys (and only the top level keys) are integers; my code below is not a general solution, and I wouldn't ever advise a similar approach over a SAX-style parser where reasons #1 and #2 above don't apply.
Also note that this solution is extra gross (splitting and manipulating JSON strings before parsing, plus special cases for the start and end of the data) because my original expression, which captured the entire key-value pairs, broke down when one of the pairs happened to exceed PCRE's backtracking limit (it's incredibly annoying that that's even a thing here, especially since it's not configurable through either QRegularExpression or grep).
Anyway, here's the code; I am deeply ashamed:
QFile file( argv[1] );
file.open( QIODevice::ReadOnly );
QTextStream textStream( &file );
QString jsonKey;
QString jsonString;
QRegularExpression jsonRegex( "\"-?\\d+\":" );
bool atEnd = false;
while( atEnd == false )
{
    QString regexMatch = jsonRegex.match
    (
        jsonString.append( textStream.read(1000000) )
    ).captured();
    bool isRegexMatched = regexMatch.isEmpty() == false;
    if( isRegexMatched == false )
    {
        atEnd = textStream.atEnd();
    }
    if( atEnd || (jsonKey.isEmpty() == false && isRegexMatched) )
    {
        QString jsonObjectString;
        if( atEnd == false )
        {
            QStringList regexMatchSplit = jsonString.split( regexMatch );
            jsonObjectString = regexMatchSplit[0]
                .prepend( jsonKey )
                .prepend( LEFT_BRACE )
                ;
            jsonObjectString = jsonObjectString
                .left( jsonObjectString.size() - 1 )
                .append( RIGHT_BRACE )
                ;
            jsonKey = regexMatch;
            jsonString = regexMatchSplit[1];
        }
        else
        {
            jsonObjectString = jsonString
                .prepend( jsonKey )
                .prepend( LEFT_BRACE )
                ;
        }
        QJsonObject jsonObject = QJsonDocument::fromJson
        (
            jsonObjectString.toUtf8()
        ).object();
        QString key = jsonObject.keys()[0];
        // ... process data and store in boost::interprocess::map ...
    }
    else if( isRegexMatched )
    {
        jsonKey = regexMatch;
        jsonString = jsonString.split( regexMatch )[1];
    }
}
I've recently finished (probably still a bit beta) such a library:
https://github.com/matiu2/json--11
If you use the json_class, it'll load it all into memory, which is probably not what you want.
But you can parse it sequentially by writing your own 'mapper'.
The included mapper iterates through the JSON, mapping the input to JSON classes:
https://github.com/matiu2/json--11/blob/master/src/mapper.hpp
You could write your own that does whatever you want with the data, and feed a file stream into it, so as not to load the whole lot into memory.
So as an example to get you started, this just outputs the JSON data in some arbitrary format, but doesn't fill up the memory (completely untested and not compiled):
#include "parser.hpp"
#include <fstream>
#include <iterator>
#include <string>
int main(int argc, char **) {
std::ifstream file("hugeJSONFile.hpp");
std::istream_iterator<char> input(file);
auto parser = json::Parser(input);
using Parser = decltype(parser);
using std::cout;
using std::endl;
switch (parser.getNextType()) {
case Parser::null:
parser.readNull();
cout << "NULL" << endl;
return;
case Parser::boolean:
bool val = parser.readBoolean();
cout << "Bool: " << val << endl;
case Parser::array:
parser.consumeOneValue();
cout << "Array: ..." << endl;
case Parser::object:
parser.consumeOneValue();
cout << "Map: ..." << endl;
case Parser::number: {
double val = parser.readNumber<double>();
cout << "number: " << val << endl;
}
case Parser::string: {
std::string val = parser.readString();
cout << "string: " << val << endl;
}
case Parser::HIT_END:
case Parser::ERROR:
default:
// Should never get here
throw std::logic_error("Unexpected error while parsing JSON");
}
return 0;
}
Addendum
Originally I had planned for this library to never copy any data, e.g. reading a string just gave you a start and an end iterator to the string data in the input, but because we actually need to decode the strings, I found that methodology too impractical.
This library automatically converts \u0000 escape codes in JSON to UTF-8 encoding in standard strings.
When dealing with records you can, for example, format your JSON so that a newline acts as the separator between objects, and then parse each line separately (a minimal sketch follows the two layouts below), e.g.:
"records": [
{ "someprop": "value", "someobj": { ..... } ... },
.
.
.
or:
"myobj": {
"someprop": { "someobj": {}, ... },
.
.
.
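A minimal sketch of that line-by-line approach, assuming the data has already been rewritten so each record sits on its own line; the file name and the use of nlohmann::json here are purely illustrative:

#include <fstream>
#include <iostream>
#include <string>
#include <nlohmann/json.hpp>

int main()
{
    std::ifstream file("records.ndjson");  // hypothetical newline-delimited file
    std::string line;
    while (std::getline(file, line)) {
        if (line.empty())
            continue;
        if (line.back() == ',')
            line.pop_back();  // drop a trailing comma left over from the array layout
        // Only one record is ever held in memory at a time.
        auto record = nlohmann::json::parse(line);
        std::cout << record.dump() << '\n';  // ... process record ...
    }
    return 0;
}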
I just faced the same problem with Qt 5.12's JSON support. Fortunately, starting with Qt 5.15 (64-bit), reading large JSON files (I tested 1 GB files) works flawlessly.

Parsing youtube data with C++ and Jsoncpp

Here is an example feed that I would like to parse:
https://gdata.youtube.com/feeds/api/users/aniBOOM/subscriptions?v=2&alt=json
You can check it with http://json.parser.online.fr/ to see what it contains.
I have a small problem while parsing the data feed provided by YouTube. The first issue was that YouTube wraps the data inside a feed field, and because of that I couldn't parse the username straight from the original JSON file, so I had to parse the first entry field and generate new JSON data from that.
Anyway, the problem is that for some reason this doesn't include more than the first username, and I don't know why, because if you check that feed in the online parser, the entry should contain all the usernames.
data = value["feed"]["entry"];
Json::StyledWriter writer;
std::string outputConfig = writer.write( data );
// This removes [ at the beginning of entry and also the last ] so we can treat it as JSON data
size_t found;
found=outputConfig.find_first_of("[");
int sSize = outputConfig.size();
outputConfig.erase(0,1);
outputConfig.erase((sSize-1),sSize);
reader.parse(outputConfig, value2, false);
cout << value2 << endl;
Json::Value temp;
temp = value2["yt$username"]["yt$display"];
cout << temp << endl;
std::string username = writer.write( temp );
int sSize2 = username.size();
username.erase(0,1);
username.erase((sSize2-3),sSize2);
But for some reason the [] fix also cuts the data I'm generating: if I print out the data without removing the [], I can see all the users, but in that case I can't extract temp = value2["yt$username"]["yt$display"];
In JSON, the brackets denote Arrays (nice reference here). You can see this in the online parser as well: Objects (items with one or more key/value pairs {"key1": "value1", "key2": "value2"}) are denoted with blue +/- signs, and Arrays (items inside brackets separated by commas [{arrayItem1}, {arrayItem2}, {arrayItem3}]) are denoted with red +/- signs.
Since entry is an Array, you should be able to iterate through them by doing something like this:
// Assumes value is a Json::Value
Json::Value entries = value["feed"]["entry"];
size_t size = entries.size();
for (size_t index = 0; index < size; ++index) {
    Json::Value entryNode = entries[index];
    cout << entryNode["yt$username"]["yt$display"].asString() << endl;
}