Minify output from rapidjson - c++

I am using rapidjson to output data for statistics and plotting from a C++ program's algorithm, essentially internal runtime snapshots of the algorithm.
I output the JSON like this:
string filename="output.json";
StringBuffer sb;
PrettyWriter<StringBuffer> writer(sb);
writer.StartArray();
for (std::vector<O_Class>::const_iterator netItr = O_Class_Array.begin(); netItr != O_Class_Array.end(); ++netItr)
    netItr->Serialize(writer);
writer.EndArray();
ofstream out;
out.open(filename);
out << sb.GetString();
As the files become quite big (~100 MiB), I'd like to output minified JSON, but I didn't find a documented way of doing so.
With an external minifier I shrank the file size from 100 MB to 18 MB, and I'd like to get the same result natively in my application.
Any ideas?
Thanks for any suggestions!

Replace PrettyWriter with Writer.
You could also zip the content; that will reduce the size significantly.
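For illustration, a minimal sketch of the question's loop with Writer swapped in for PrettyWriter (O_Class and O_Class_Array are the types from the question); Writer emits no indentation or newlines, so the output is already minified:

StringBuffer sb;
Writer<StringBuffer> writer(sb);   // Writer instead of PrettyWriter
writer.StartArray();
for (std::vector<O_Class>::const_iterator netItr = O_Class_Array.begin();
     netItr != O_Class_Array.end(); ++netItr)
    netItr->Serialize(writer);
writer.EndArray();

ofstream out("output.json");
out << sb.GetString();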

Related

How do I combine hundreds of binary files to a single output file in c++?

I have a folder filled with hundreds of .aac files, and I'm trying to "pack" them into one file in the most efficient way that I can.
I have tried the following, but I only end up with a file that's a few bytes long, or audio that sounds heavily warbled and distorted.
// Now we want to get all of the files in the folder and then repack them into an aac file as fast as possible
void Repacker(string FileName)
{
    string data;
    boost::filesystem::path p = "./tmp/";
    boost::filesystem::ofstream aacwriter;
    aacwriter.open("./" + FileName + ".aac", ios::app);
    boost::filesystem::ifstream aacReader;
    boost::filesystem::directory_iterator it{ p };
    cout << "Repacking File!" << endl;
    while (it != boost::filesystem::directory_iterator{}) {
        aacReader.open(*it, std::ios::in | ios::binary);
        cout << "Writing " << *it << endl;
        aacReader >> data;
        ++it;
        aacwriter << data;
        aacReader.close();
    }
    aacwriter.close();
}
I have looked at the following questions to try and help solve this issue
Merging two files together
Merge multiple txt files into one
How do I read an entire file into a std::string in C++?
Read whole ASCII file into C++ std::string
Merging Two Files Into One
Unfortunately, none of these answers my question.
They all either have to do with text files, or functions that don't deal with hundreds of files at once.
I am trying to write Binary data, not text. The audio is either all warbled or the file is only a few bytes long.
If there's a memory-efficient method to do this, please let me know. I am using C++20 and Boost.
Thank you.
These files have an internal structure: header, blocks/frames, etc., and the mere presence of multiple headers within the concatenated file will mess up the expected result.
Take a look at the AAC file format structure, you'll see that it's not so simple.
Your best bet is to use FFMPEG, since it has a feature to concatenate media files without forcing you to re-encode the data. It's a bit involved because FFMPEG's command line is quite complex and not always intuitive, but it should work as long as all the AAC files use the same encoding and characteristics. Otherwise, you'll need to re-encode them, but that can be done automatically too.
Check this web search to get some basic information.
Otherwise, you may use the base libraries used by FFMPEG, for example libavcodec (available at ffmpeg.org), Fraunhofer FDK AAC, etc., but you'll have way, way more work to do and, in the end, you'll do exactly what FFMPEG already does, since it relies on these libraries. Other AAC libraries won't really be easier to use.
Obviously, you can also "embed" FFMPEG within your application, call tools like ffprobe to analyze files, and run the ffmpeg executable automatically as a child process.
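As a rough illustration (not from the original answer), here is a minimal sketch of launching ffmpeg as a child process to concatenate files without re-encoding; the list file, the output name, and the assumption that ffmpeg is on the PATH are all placeholders:

#include <cstdlib>
#include <string>

// listFile contains one line per input, e.g.:  file './tmp/clip001.aac'
// The concat demuxer plus "-c copy" joins the inputs without re-encoding.
int ConcatWithFfmpeg(const std::string& listFile, const std::string& output)
{
    std::string cmd = "ffmpeg -f concat -safe 0 -i " + listFile + " -c copy " + output;
    return std::system(cmd.c_str());   // ffmpeg's exit status
}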
CAUTION: Take great care about licensing if you plan to distribute your program. FFMPEG licensing is really not simple; most of the time it's distributed as source to avoid tricky cases.

Pass Binary string/file content from c++ to node js

I'm trying to pass the content of a binary file from C++ to Node using node-gyp. I have a process that creates a binary file in the .fit format, and I need to pass the content of the file to JS to process it. So my first approach was to extract the content of the file into a string and try to pass it to Node like this.
char c;
std::string content = "";
while (file.get(c)) {
    content += c;
}
I'm using the following code to pass it to Node
v8::Local<v8::ArrayBuffer> ab = v8::ArrayBuffer::New(args.GetIsolate(), (void*)content.data(), content.size());
args.GetReturnValue().Set(ab);
In Node I get an ArrayBuffer, but when I print its content to a file it is different from what a C++ cout shows.
How can I pass the binary data successfully?
Thanks.
Probably the best approach is to write your data to a binary disk file. Write to disk in C++; read from disk in NodeJS.
Very importantly, make sure you specify BINARY MODE.
For example:
myFile.open ("data2.bin", ios::out | ios::binary);
Do not use "strings" (at least not unless you want to uuencode). Use buffers. Here is a good example:
How to read binary files byte by byte in Node.js
var fs = require('fs');

fs.open('file.txt', 'r', function(status, fd) {
    if (status) {
        console.log(status.message);
        return;
    }
    var buffer = new Buffer(100);
    fs.read(fd, buffer, 0, 100, 0, function(err, num) {
        ...
    });
});
You might also find these links helpful:
https://nodejs.org/api/buffer.html
<= Has good examples for specific Node APIs
http://blog.paracode.com/2013/04/24/parsing-binary-data-with-node-dot-js/
<= Good discussion of some of the issues you might face, including "endianness" and "interpreting numbers"
ADDENDUM:
The OP clarified that he's considering using C++ as a NodeJS add-on (not a standalone C++ program).
Consequently, using buffers is definitely an option. Here is a good tutorial:
https://community.risingstack.com/using-buffers-node-js-c-plus-plus/
If you choose to go this route, I would DEFINITELY download the example code and play with it first, before implementing buffers in your own application.
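For the add-on route, here is a minimal sketch (the method name and file path are placeholders) that copies the file's bytes into a Node Buffer instead of wrapping the string's memory in an ArrayBuffer:

#include <node.h>
#include <node_buffer.h>
#include <fstream>
#include <iterator>
#include <string>

void GetFileContent(const v8::FunctionCallbackInfo<v8::Value>& args) {
    v8::Isolate* isolate = args.GetIsolate();

    // Read the whole file in binary mode so no byte is altered or dropped.
    std::ifstream file("data.fit", std::ios::binary);
    std::string content((std::istreambuf_iterator<char>(file)),
                        std::istreambuf_iterator<char>());

    // node::Buffer::Copy makes its own copy of the bytes, so the lifetime
    // of `content` no longer matters once this call returns.
    args.GetReturnValue().Set(
        node::Buffer::Copy(isolate, content.data(), content.size())
            .ToLocalChecked());
}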
It depends, but you could for example use Redis:
Values can be strings (including binary data) of every kind, for
instance you can store a jpeg image inside a value. A value can't be
bigger than 512 MB.
If the file is bigger than 512 MB, you can store it in chunks.
But I wouldn't suggest that, since Redis is an in-memory data store.
It's easy to implement in both C++ and Node.js.

C++ read text line-by-line, speed/efficiency savings needed

I have a series of large text files (tens to hundreds of thousands of lines) that I want to parse line by line. The idea is to check if the line has a specific word/character/phrase and, for now, to record the line to a secondary file if it does.
The code I've used so far is:
ifstream infile1("c:/test/test.txt");
while (getline(infile1, line)) {
if (line.empty()) continue;
if (line.find("mystring") != std::string::npos) {
outfile1 << line << '\n';
}
}
The end goal is to be writing those lines to a database. My thinking was to write them to the file first and then to import the file.
The problem I'm facing is the time taken to complete the task. I'm looking to minimize the time as far as possible, so any suggestions as to time savings on the read/write scenario above would be most welcome. Apologies if anything is obvious, I've only just started moving into C++.
Thanks
EDIT
I should say that I'm using VS2015
EDIT 2
So this was my own dumb fault: after switching to Release and changing the architecture type I saw noticeable speed increases. Thanks to everyone for pointing me in that direction. I'm also looking at the mmap stuff and that's proving useful too. Thanks guys!
When you use ifstream to read and process really big files, you should increase the default buffer size that is used (normally 512 bytes).
The best buffer size depends on your needs, but as a hint you can use the partition block size of the file(s) you're reading/writing. You can find that information with various tools, or even from code.
Example in Windows:
fsutil fsinfo ntfsinfo c:
Now, you have to create a new buffer to ifstream like this:
size_t newBufferSize = 4 * 1024; // 4K
char * newBuffer = new char[newBufferSize];
ifstream infile1;
infile1.rdbuf()->pubsetbuf(newBuffer, newBufferSize);
infile1.open("c:/test/test.txt");
while (getline(infile1, line)) {
/* ... */
}
delete newBuffer;
Do the same with the output stream, and don't forget to set the new buffer before opening the file, or it may not work.
You can play with the values to find the best size for you.
You'll note the difference.
C-style I/O functions are generally much faster than fstream.
You may use fgets/fputs to read/write each text line.
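A minimal sketch of the same filtering loop with C stdio (the input path and search string come from the question; the output path is a placeholder):

#include <cstdio>
#include <cstring>

int main()
{
    std::FILE* in  = std::fopen("c:/test/test.txt", "r");
    std::FILE* out = std::fopen("c:/test/out.txt", "w");
    if (!in || !out) return 1;

    char line[4096];
    while (std::fgets(line, sizeof line, in)) {
        if (std::strstr(line, "mystring"))   // keep only matching lines
            std::fputs(line, out);
    }

    std::fclose(in);
    std::fclose(out);
}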

Reducing Size / Compressing JSON String in C++

Is there a way to compress a JSON string in C++, so that the overall size can be reduced?
In my case, a mobile app retrieves XML created by CCUserDefault, then converts that XML to JSON using rapidjson. Now I want to reduce its size or compress it using any C++ library.
Assuming you just want to minimise the size of the string (as opposed to general compression such as gzip), a library such as rapidjson can be used.
There's an example in this unit test:
Roughly:
StringStream s("{ \"hello\" : \"world\" ");
StringBuffer buffer;
Writer<StringBuffer> writer(buffer);
Reader reader;
reader.Parse<0>(s, writer);
EXPECT_STREQ("{\"hello\":\"world\"}", buffer.GetString());
You could use zlib to compress the JSON string in memory and decompress it back. Perhaps using the ideas in here
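For instance, a minimal sketch of in-memory compression with zlib's one-shot helpers (error handling kept to a bare minimum):

#include <zlib.h>
#include <string>
#include <vector>

std::vector<unsigned char> CompressJson(const std::string& json)
{
    uLongf destLen = compressBound(json.size());
    std::vector<unsigned char> dest(destLen);
    if (compress(dest.data(), &destLen,
                 reinterpret_cast<const Bytef*>(json.data()),
                 json.size()) == Z_OK)
        dest.resize(destLen);   // shrink to the actual compressed size
    else
        dest.clear();           // compression failed
    return dest;
}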

ZLib Decompression in C++

I am trying to get a function going to unzip a single text file compressed with .gz. It needs to uncompress the .gz file given its path and write the uncompressed text file to a given destination. I am using C++, and from what I have seen zlib does exactly what I need, except I cannot find a single example anywhere on the net that shows it doing this. Can anyone show me an example or at least guide me in the right direction?
If you just want to inflate a file with raw deflated data (i.e. no archive) you can use something like this:
gzFile inFileZ = gzopen(fileName, "rb");
if (inFileZ == NULL) {
    printf("Error: Failed to gzopen %s\n", fileName);
    exit(0);
}
unsigned char unzipBuffer[8192];
int unzippedBytes;   // gzread returns an int (negative on error)
std::vector<unsigned char> unzippedData;
while (true) {
    unzippedBytes = gzread(inFileZ, unzipBuffer, 8192);
    if (unzippedBytes > 0) {
        unzippedData.insert(unzippedData.end(), unzipBuffer, unzipBuffer + unzippedBytes);
    } else {
        break;
    }
}
gzclose(inFileZ);
The unzippedData vector now holds your inflated data. There are probably more efficient ways to store the inflated data, especially if you know the uncompressed size in advance, but this approach works for me.
If you only want to save the inflated data to a file without any further processing you could skip the vector and just write the unzipBuffer contents to another file.
You can use the gzopen(), gzread(), and gzclose() functions of zlib, much like you would the corresponding stdio functions fopen(), etc. That will read the gzip file and decompress it. You can then use fopen(), fwrite(), etc. to write the uncompressed data back out.
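Roughly, a sketch along those lines (paths are placeholders):

#include <zlib.h>
#include <cstdio>

bool GunzipToFile(const char* srcPath, const char* dstPath)
{
    gzFile in = gzopen(srcPath, "rb");
    std::FILE* out = std::fopen(dstPath, "wb");
    if (!in || !out) return false;

    char buffer[8192];
    int bytesRead;
    while ((bytesRead = gzread(in, buffer, sizeof buffer)) > 0)
        std::fwrite(buffer, 1, bytesRead, out);

    gzclose(in);
    std::fclose(out);
    return bytesRead == 0;   // 0 means clean EOF; -1 means a read error
}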
You can use ZLibComplete to do this. There is a complete example of gzip decompression in C++ on the front page.
http://rudi-cilibrasi.github.io/zlibcomplete/
Ah, I'll assume http://zlib.net/zlib_how.html does what you want?