RapidJSON C++: efficient and scalable way to append a JSON object to a file

I have a JSON file that contains an array of JSON objects, and I am using RapidJSON in C++.
I want to append a new object to the JSON array inside this file.
Currently I read the whole file into a document using a FileReadStream, add the new object to the document's array with AddMember, and then overwrite the file with the updated document, repeating this process for every new object.
This solution is not scalable. Can someone point out another solution using RapidJSON or a raw file stream? Help will be appreciated; I've been looking all over the internet but with no luck.
Is there something like appending to a JSON file incrementally?
The file will grow very large with time, so reading the whole file every time, appending a new object, and then rewriting the whole file is a waste of memory and CPU time.
Help me with this one, please.

This question is from some years ago, but this answer is still relevant.
The goal is to append a JSON object with RapidJSON to a possibly pre-existing file that contains a JSON array. The following properties are satisfied:
No reading or parsing of the already existing file.
The new object is appended directly to the already existing file, without merging documents.
The time taken does not depend on how much has been appended previously.
Here is the code with comments:
bool appendToFile(const std::string& filename, const rapidjson::Document& document)
{
    using namespace rapidjson;

    // Create the file with an empty array if it doesn't exist yet.
    if (FILE* fp = fopen(filename.c_str(), "r"); fp)
    {
        fclose(fp); // the file already exists
    }
    else
    {
        fp = fopen(filename.c_str(), "w");
        if (!fp)
            return false;
        fputs("[]", fp);
        fclose(fp);
    }

    // Add the document to the file.
    if (FILE* fp = fopen(filename.c_str(), "rb+"); fp)
    {
        // Check that the first character is '['.
        std::fseek(fp, 0, SEEK_SET);
        if (getc(fp) != '[')
        {
            std::fclose(fp);
            return false;
        }

        // Is the array empty? (Note: this assumes no whitespace
        // between the brackets, i.e. exactly "[]".)
        bool isEmpty = false;
        if (getc(fp) == ']')
            isEmpty = true;

        // Check that the last character is ']'.
        std::fseek(fp, -1, SEEK_END);
        if (getc(fp) != ']')
        {
            std::fclose(fp);
            return false;
        }

        // Position on the closing ']' and overwrite it with ','
        // (or, if the array is empty, overwrite it directly with
        // the new element).
        std::fseek(fp, -1, SEEK_END);
        if (!isEmpty)
            fputc(',', fp);

        // Append the document. The Writer flushes the stream once
        // the root value is complete.
        char writeBuffer[65536];
        FileWriteStream os(fp, writeBuffer, sizeof(writeBuffer));
        Writer<FileWriteStream> writer(os);
        document.Accept(writer);

        // Close the array again.
        std::fputc(']', fp);
        std::fclose(fp);
        return true;
    }
    return false;
}
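A quick usage sketch (the member name and file name are illustrative, not from the question):
rapidjson::Document d;
d.SetObject();
d.AddMember("id", 1, d.GetAllocator());
// Each call appends one more object to the array in data.json.
appendToFile("data.json", d);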

I do not know if there is a ready-made library for that, but if you decide to do it yourself it is not impossible.
In a few steps you could:
1) Load the whole JSON into RAM.
2) Take every request to append JSON and save it to a log file (a sketch of this step follows below).
3) Update the JSON in RAM after the request has been written to the log.
4) Every x seconds, block changes, write the whole JSON to disk, and clear the log file.
5) Unblock changes.
6) Go to 2.
Further optimizations could be:
1) Check for the log file on start (after a crash) and apply the logged requests.
2) When you write the JSON file, do not rewrite it completely; check whether there were only appends at the end and write only the new part.
How does this sound?
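For illustration, a minimal sketch of step 2, the append-to-log part, assuming each request arrives as one serialized JSON object written as a single line (all names here are made up):
#include <fstream>
#include <mutex>
#include <string>

std::mutex logMutex;

// Append one serialized JSON object to the write-ahead log.
// Opening in append mode means the cost does not depend on the
// log's current size.
bool appendToLog(const std::string& logPath, const std::string& jsonLine)
{
    std::lock_guard<std::mutex> lock(logMutex);
    std::ofstream log(logPath, std::ios::app);
    if (!log)
        return false;
    log << jsonLine << '\n';
    return static_cast<bool>(log.flush());
}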

Related

compare data in a JSON::Value variable and then update to file

I am trying to write data to two JSON files, with the file names provided at run time.
This is the updateToFile function, which writes the data stored in a JSON variable to two different files in two different threads.
void updateToFile()
{
    while (runInternalThread)
    {
        std::unique_lock<std::recursive_mutex> invlock(mutex_NodeInvConf);
        FILE* pFile;
        std::string conff = NodeInvConfiguration.toStyledString();
        pFile = fopen(filename.c_str(), "wb");
        std::ifstream file(filename);
        fwrite(conff.c_str(), sizeof(char), conff.length(), pFile);
        fclose(pFile);
        sync();
    }
}
thread 1:
std::thread nt(&NodeList::updateToFile,this);
thread 2:
std::thread it(&InventoryList::updateToFile,this);
Right now it updates the files even if no data has changed since the previous execution. I want to update the file only if there is a change compared to what was previously stored; if there is no change, it should print that the data is the same.
Can anyone please help with this?
Thanks.
You can check if it has changed before writing.
void updateToFile()
{
    std::string previous;
    while (runInternalThread)
    {
        std::unique_lock<std::recursive_mutex> invlock(mutex_NodeInvConf);
        std::string conf = NodeInvConfiguration.toStyledString();
        if (conf != previous)
        {
            // TODO: error handling missing like in OP
            std::ofstream file(filename);
            file.write(conf.c_str(), conf.length());
            file.close();
            previous = std::move(conf);
            sync();
        }
    }
}
However, such constant polling in a loop is likely inefficient. You may add sleeps to make it less busy (see the sketch below). Another option is to have NodeInvConfiguration itself track whether it has changed, and clear that flag when storing.
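A minimal sketch of the sleep variant, reusing the names from the code above; the interval is arbitrary:
#include <chrono>
#include <thread>

while (runInternalThread)
{
    {
        std::unique_lock<std::recursive_mutex> invlock(mutex_NodeInvConf);
        std::string conf = NodeInvConfiguration.toStyledString();
        if (conf != previous)
        {
            std::ofstream file(filename);
            file.write(conf.c_str(), conf.length());
            previous = std::move(conf);
            sync();
        }
    } // release the lock before sleeping so writers are not blocked
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
}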

What is the scheme for reading big XML data using "memory mapped files"?

I have a big XML file (an OSM map data file) to parse. The initial processing code is like this:
FILE* file = fopen(fileName.c_str(), "r");
size_t BUF_SIZE = 10 * 1024 * 1024;
char* buf = new char[BUF_SIZE];
string contents;
while (!feof(file))
{
    // fread returns the number of bytes read (with element size 1);
    // append exactly that many bytes, since fread does not add a
    // NUL terminator.
    size_t ret = fread(buf, 1, BUF_SIZE, file);
    contents.append(buf, ret);
}
delete[] buf;
fclose(file);
size_t pos = 0;
while (true)
{
    pos = contents.find('<', pos);
    if (pos == string::npos) break;
    // Case: found new node.
    if (contents.substr(pos, 5) == "<node")
    {
        // do something;
    }
    // Case: found new way.
    else if (contents.substr(pos, 4) == "<way")
    {
        // do something;
    }
    ++pos; // advance past this '<' so the loop makes progress
}
Then people here told me I should use a memory-mapped file to process such "big data files";
details are here:
how to read a huge file into a buffer.
I mean, when the file has a fixed size and is not very large, I can load it into memory in one go, append the content to a string object, and then apply find() and the other string methods to extract the node content of the XML file (the code at the beginning of my question uses this method, and I have tested that it produces the right result). But if the file is very large, how do I apply those methods (without using an XML library such as libxml)?
In one word: for a small XML file, I can load the whole content into a std::string and apply the find() and substr() operations to get the information I want from the XML file. When the XML file is very large, I need to use a memory-mapped file to cope with it, and then I cannot append the whole content to a std::string. How can I parse the file without using an existing XML library?
I hope I have expressed my question clearly.
If you're using std::string members to get the data you need, you're almost certainly not parsing the XML in the traditional sense of parsing XML. (That is, you're very probably not making any use of XML's hierarchical structure. Although you are extracting data from XML, "parsing XML" means something much more specific to most people.)
That said, the C equivalents of the std::string members you seem to be OK with, such as memcmp and the GNU extension memmem, just take pointers and lengths. Read their documentation and use them in place of their std::string-member equivalents.
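For illustration, a minimal sketch of searching a memory-mapped file with memmem, assuming Linux/glibc (mmap is POSIX; memmem is a GNU extension); error handling is kept to a bare minimum and the function name is made up:
#include <cstring>      // memmem, a GNU extension declared here with glibc
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Count "<node" occurrences in a memory-mapped file without copying
// the whole file into a std::string.
size_t countNodes(const char* path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return 0;
    struct stat st;
    fstat(fd, &st);
    void* mapped = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mapped == MAP_FAILED) { close(fd); return 0; }

    const char* p = static_cast<const char*>(mapped);
    size_t left = static_cast<size_t>(st.st_size);
    size_t count = 0;
    while (const void* hit = memmem(p, left, "<node", 5))
    {
        const char* h = static_cast<const char*>(hit);
        ++count;
        left -= static_cast<size_t>(h - p) + 1;
        p = h + 1; // continue searching after this match
    }
    munmap(mapped, st.st_size);
    close(fd);
    return count;
}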

How to write a rapidjson::Document to a file

I need to parse a file, get some data, and write it to another file using RapidJSON.
Right now I can retrieve values and put them in a document. My only problem is writing that document to a file:
FILE* pFile = fopen("read.json", "r");
FILE* wFile = fopen("Test.json", "w");
if (pFile != NULL)
{
    rapidjson::FileStream is(pFile);
    rapidjson::Document document;
    document.ParseStream<0>(is);
    string mMeshID = a.GetString();
    // how to add that document to wFile?
    fclose(pFile);
}
Is there any way to write a rapidjson::Document to a file?
EDIT: the only way I found is:
// Convert JSON document to string
GenericStringBuffer< UTF8<> > buffer;
Writer<GenericStringBuffer< UTF8<> > > writer(buffer);
doc.Accept(writer);
const char* str = buffer.GetString();
fprintf(wFile, "%s", str);
fclose(wFile);
Better documentation about FileWriteStream has been added since this question was asked.
Using FileWriteStream instead of StringBuffer can reduce memory usage: FileWriteStream uses a fixed-size buffer (which can live on the stack), while StringBuffer needs to hold the whole JSON in (heap) memory. The difference becomes significant for big JSON.
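A minimal sketch of the FileWriteStream route, reusing the wFile and doc names from the question:
#include "rapidjson/filewritestream.h"
#include "rapidjson/writer.h"

char writeBuffer[65536]; // fixed size, can live on the stack
rapidjson::FileWriteStream os(wFile, writeBuffer, sizeof(writeBuffer));
rapidjson::Writer<rapidjson::FileWriteStream> writer(os);
doc.Accept(writer); // streams the JSON out as it is generated
fclose(wFile);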
You are better off using
fwrite(buffer.GetString(), buffer.GetSize(), 1, wFile);
It's safer (in case the buffer is not null-terminated) and faster (no strlen).
Other than that and the lack of error checking in your code, it's fine and should write the JSON to the file, no problem.

Read/write operation works neither well nor badly

I am programming a face detection algorithm. In my code I parse an XML file (recursively, which is very inefficient: it takes about 4 minutes to parse the whole XML file). I'd like to save the XML content to a file using binary iostreams. I'm using a struct in C++ in order to use the raw data.
My goal is to parse the XML only if the raw data file does not exist.
The method works like this:
If the raw data file does not exist, parse the XML file and save the data to a file.
If the raw data file exists, read the raw data from the file.
My problem is: whenever I open the raw data file and read from it, I get to read only a small number of bytes from the file. I don't know how many, but at a certain point I receive only 0x00 bytes in my buffer.
My guess: I believe this has to do with the OS buffer, which has a certain amount of buffering for read and write operations. I might be wrong about this, and I'm not sure which of the two operations misbehaves, the write or the read.
I was thinking of writing/reading the raw data char by char or line by line. On the other hand, the file does not contain text, which means I can't read it line by line or char by char.
The raw data size is
size_t datasize = DataSize(); // == 196876 (bytes)
which is retrieved by this function:
/* Get the upper bound for the predefined cascade size */
size_t CCacadeInterpreter::DataSize()
{
    // this is an upper boundary for the whole hidden cascade size
    size_t datasize = sizeof(HaarClassifierCascade) * TOTAL_CASCADE +
                      sizeof(HaarStageClassifier) * TOTAL_STAGES +
                      sizeof(HaarClassifier) * TOTAL_CLASSIFIERS +
                      sizeof(void*) * (TOTAL_CASCADE + TOTAL_STAGES + TOTAL_CLASSIFIERS);
    return datasize;
}
The method works like this:
BYTE* CCacadeInterpreter::Interpreter()
{
    printf("|Phase - Load cascade from memory | CCacadeInterpreter::Interpreter | \n");
    size_t datasize = DataSize();

    // Create a memory structure
    nextFreeSpace = pStartMemoryLocation = new BYTE[datasize];
    memset(nextFreeSpace, 0x00, datasize);

    // Try to open a predefined cascade file in the current folder
    // (instead of parsing the XML file again)
    fstream stream;
    stream.open(cascadeSavePath); // ...try existing file
    if (stream.is_open())
    {
        stream.seekg(0, ios::beg);
        stream.read((char*)pStartMemoryLocation, datasize); // read from file
        stream.close();
        printf("|Load cascade from saved memory location | CCacadeInterpreter::Interpreter | \n");
        printf("Completed\n\n");
        return pStartMemoryLocation;
    }

    // Open the cascade file and parse the cascade XML file
    std::fstream cascadeFile;
    cascadeFile.open(cascadeDestanationPath, std::fstream::in); // open the file with read-only attributes
    if (!cascadeFile.is_open())
    {
        printf("Error: couldn't open cascade XML file\n");
        delete[] pStartMemoryLocation; // array form to match new BYTE[datasize]
        return NULL;
    }

    // Read the XML file, line by line
    string buffer, str;
    getline(cascadeFile, str);
    while (cascadeFile)
    {
        buffer += str;
        getline(cascadeFile, str);
    }
    cascadeFile.close();
    split(buffer, '<', m_tokens);

    // Parsing begins
    pHaarClassifierCascade = (HaarClassifierCascade*)nextFreeSpace;
    nextFreeSpace += sizeof(HaarClassifierCascade);
    pHaarClassifierCascade->count = 0;
    pHaarClassifierCascade->orig_window_size_height = 20;
    pHaarClassifierCascade->orig_window_size_width = 20;
    m_deptInTree = 0;
    m_numOfStage = 0;
    m_numOfTotalClassifiers = 0;
    while (m_tokens.size())
    {
        Parsing();
    }

    // Save the current cascade into a file
    SaveBlockToMemory(pStartMemoryLocation, datasize);
    printf("\nCompleted\n\n");
    return pStartMemoryLocation;
}

bool CCacadeInterpreter::SaveBlockToMemory(BYTE* pStartMemoryLocation, size_t dataSize)
{
    fstream stream;
    stream.open(cascadeSavePath); // ...try existing file
    if (!stream.is_open())        // ...else, create a new file...
        stream.open(cascadeSavePath, ios_base::in | ios_base::out | ios_base::trunc);
    stream.seekp(0, ios::beg);    // seekp: positioning the write pointer
    stream.write((char*)pStartMemoryLocation, dataSize);
    stream.close();
    return true;
}
Try using the Boost IOstreams library.
It has easy-to-use wrappers for file handling (see the sketch below).
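For illustration, a minimal sketch of writing a raw memory block and reading it back with Boost IOstreams; the function names are made up, and the streams are explicitly opened in binary mode:
#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/stream.hpp>
#include <cstddef>

namespace io = boost::iostreams;

// Write a raw memory block to a file.
void saveBlock(const std::string& path, const char* data, std::size_t size)
{
    io::stream<io::file_sink> out(path, std::ios::out | std::ios::binary);
    out.write(data, size);
}

// Read the block back; returns false if fewer than `size` bytes were read.
bool loadBlock(const std::string& path, char* data, std::size_t size)
{
    io::stream<io::file_source> in(path, std::ios::in | std::ios::binary);
    if (!in)
        return false;
    in.read(data, size);
    return static_cast<std::size_t>(in.gcount()) == size;
}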

Reading/writing from an offset

bool CReadWrite::write(unsigned long long offset, void* pvsrc, unsigned long long nbytes)
{
    size_t WriteResult; // fwrite returns size_t, not int
    pFile = fopen("D:\\myfile.bin", "wb");
    if (!pFile) {
        puts("Can't open file");
        return false;
    }
    //offset = fseek(pFile,offset,
    WriteResult = fwrite(pvsrc, 1, nbytes, pFile);
    if (WriteResult == nbytes) {
        puts("Wrote to file");
        fclose(pFile);
        return true;
    }
    else {
        puts("Unable to write to File.");
        fclose(pFile);
        return false;
    }
}
This is my class function so far. I'm basically opening a file, checking to see if it did indeed open, and if not, getting out. It writes to the file, checks whether the file was indeed written to, and returns true; otherwise it returns false. As you can tell from my parameters, I want to handle an offset: given a particular offset, e.g. 10, start writing from there. I know for sure I need to use fseek, but I can't assume that I'm at the beginning of the file or anywhere in particular in it. I'm pretty sure I need to use SEEK_SET, but I may be wrong. Any thoughts on implementing the above? Thanks.
If you're using fopen without the append setting (as you are: "wb" creates an empty file), you can assume you're at the beginning.
Regardless, SEEK_SET sets the position to the given offset from the beginning.
If the file doesn't yet reach the offset that you want to seek to (as in your case), the question is what you are required to do. If you just need padding, write offset padding bytes and then your content; otherwise maybe you wanted to use "a" and not "w". "w" truncates the existing content of the file, while "a" opens for append and sets the position to the end of the existing content. A sketch follows below.
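A minimal sketch of an offset write using update mode instead ("rb+" preserves existing content, with a fallback to "wb+" when the file does not exist yet); the path is taken from the question, the function name is made up:
#include <cstdio>

bool writeAt(unsigned long long offset, const void* pvsrc, size_t nbytes)
{
    // "rb+" opens an existing file for reading and writing without
    // truncating it; "wb+" creates it if it does not exist yet.
    FILE* pFile = fopen("D:\\myfile.bin", "rb+");
    if (!pFile)
        pFile = fopen("D:\\myfile.bin", "wb+");
    if (!pFile)
        return false;

    // SEEK_SET measures the offset from the beginning of the file,
    // so the current position does not matter.
    if (fseek(pFile, (long)offset, SEEK_SET) != 0) {
        fclose(pFile);
        return false;
    }
    size_t written = fwrite(pvsrc, 1, nbytes, pFile);
    fclose(pFile);
    return written == nbytes;
}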