Deflation compression algorithm for huge data streams - c++

I have a C++ program that receives a data buffer from time to time and should append it to an existing compressed file.
As a proof of concept, I read 1 KB chunks from a file, pass them to a compressing stream, and decompress the result once the data is exhausted.
I use Poco::DeflatingOutputStream to compress each chunk and Poco::InflatingOutputStream to check that decompressing gives back the original file.
However, the decompressed data is almost identical to the original file, except that between every two consecutive chunks I get a few garbage characters such as: à¿_ÿ
Here is an example of a line that is split between two chunks. The original line looks like this:
elevated=0 path=/System/Library/CoreServices/Dock.app/Contents/MacOS/Dock exist
while the decompressed line is:
elevated=0 path=/System/Libr à¿_ÿary/CoreServices/Dock.app/Contents/MacOS/Dock exist
May 19 19:12:51 PANMMUZNG8WNREM kernel[0]: pid=904 uid=1873876126 sbit=0
Any idea what I am doing wrong? Here's my POC code:
int zip_unzip() {
std::ostringstream stream1;
Poco::DeflatingOutputStream gzipper(stream1, Poco::DeflatingStreamBuf::STREAM_ZLIB);
std::ifstream bigFile("/tmp/in.log");
constexpr size_t bufferSize = 1024;
char buffer[bufferSize];
while (bigFile) {
bigFile.read(buffer, bufferSize);
gzipper << buffer;
}
gzipper.close();
std::string zipped_string = stream1.str();
//////////////////
std::ofstream stream2("/tmp/out.log", std::ios::binary);
Poco::InflatingOutputStream gunzipper(stream2, Poco::InflatingStreamBuf::STREAM_ZLIB);
gunzipper << zipped_string;
gunzipper.close();
return 0;
}

OK, I just realized that I used the '<<' operator on each chunk read from the huge file (the original uncompressed file) without care: the buffer has no null terminator '\0' at the end of each chunk, so '<<' kept reading past the end of the data.
Here is the fixed version:
#include <cerrno>
#include <fstream>
#include <Poco/DeflatingStream.h>
#include <Poco/Exception.h>
#include <iostream>
int BetterZip()
{
try {
// Create gzip file.
std::ofstream output_file("/tmp/out.gz", std::ios::binary);
Poco::DeflatingOutputStream output_stream(output_file, Poco::DeflatingStreamBuf::STREAM_GZIP);
// INPUT
std::ifstream big_file("/tmp/hugeFile", std::ios::binary);
constexpr size_t ReadBufferSize = 1024;
char buffer[ReadBufferSize];
while (big_file) {
big_file.read(buffer, ReadBufferSize);
output_stream.write(buffer, big_file.gcount());
}
output_stream.close();
} catch (const Poco::Exception& ex) {
std::cout << "Error: code " << ex.code() << " (" << ex.displayText() << ")\n";
return EINVAL;
}
return 0;
}

Related

Inconsistent results with fstream::read

My program uses a small SQLite3 database. To make sure it actually exists when the program is launched, I keep a database creation script in a file and execute it at startup.
The script itself works without problems.
However, when I use C++ I/O functions to read that file, I very often get invalid characters at the end of the data, so the script contains errors and is not executed properly by the SQLite library. Here is an example of what I see when displaying the buffer content:
// Proper content from the file, then a random character is there
1
Error executing request: near "1": syntax error
Other characters also appear, whitespaces, numbers, letters...
Here is the code where I load my script :
std::cerr << "Creating database if needed...\n";
char sql_script[] = "/path/to/script.sql";
int script_length;
bool result = false;
std::ifstream script_fs(sql_script, std::fstream::binary | std::fstream::in);
if (script_fs) {
char* buffer;
char** err_msg = NULL;
script_fs.seekg(0, script_fs.end);
script_length = script_fs.tellg();
script_fs.seekg(0, script_fs.beg);
buffer = new char[script_length];
script_fs.read(buffer, script_length);
std::cout << "sql:\n" << buffer << "\n";
if (sqlite3_exec(m_db, buffer, NULL, NULL, err_msg) == SQLITE_OK){
result = true;
} else {
std::cerr << "Error executing: " << sqlite3_errmsg(m_db) << "\n" << err_msg << "\n";
}
delete buffer;
script_fs.close();
} else {
std::cerr << "Error opening script: " << strerror(errno) << "\n";
}
return result;
}
Why is this happening and how can I fix this ?
You need to make sure that you have a null-terminated string.
Allocate memory for one more character.
Assign the null character to the last element of buffer.
buffer = new char[script_length+1];
script_fs.read(buffer, script_length);
buffer[script_length] = '\0';
Also, use the array form of delete.
delete [] buffer;
Don't mix C and C++. If you want to read the SQL file in C++ using ifstream, the code below is one approach: std::string manages the memory for you, so there is no buffer to allocate and no extra '\0' character to append:
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
using namespace std;
int main() {
ifstream fin("test.sql", std::fstream::binary | std::fstream::in);
std::string sqlquery = std::string(std::istreambuf_iterator<char>(fin), std::istreambuf_iterator<char>());
std::cout<<sqlquery<<std::endl;
return 0;
}

Linux CGI Web API - how to stdout a binary JPEG image under C++?

I'm writing a Web API using CGI under Linux, compiled with gcc, and all is well so far. I am returning a JPEG image to the host: I send std::cout << "Content-Type: image/jpeg\n\n" and must then send the binary JPEG data. Loading the image into a char* buffer and calling std::cout << buffer; does not work: I get back an empty image. I suspect the output stops at the first 00 byte.
I'm receiving from the web server a 200 OK with an incomplete image.
I was going to redirect to the file in an open folder on the device, but this must be a secure transfer and not available to anyone who knows the url.
I'm stumped!
The code snippet looks like this:
std::string imagePath;
syslog(LOG_DEBUG, "Processing GetImage, Image: '%s'", imagePath.c_str());
std::cout << "Content-Type: image/jpeg\n\n";
int length;
char * buffer;
ifstream is;
is.open(imagePath.c_str(), ios::in | ios::binary);
if (is.is_open())
{
// get length of file:
is.seekg(0, ios::end);
length = (int)is.tellg();
is.seekg(0, ios::beg);
// allocate memory:
buffer = new char[length]; // gobble up all the precious memory, I'll optimize it into a smaller buffer later
// OH and VECTOR Victor!
syslog(LOG_DEBUG, "Reading a file: %s, of length %d", imagePath.c_str(), length);
// read data as a block:
is.read(buffer, length);
if (is)
{
syslog(LOG_DEBUG, "All data read successfully");
}
else
{
syslog(LOG_DEBUG, "Error reading jpg image");
return false;
}
is.close();
// Issue is this next line commented out - it doesn't output the full buffer
// std::cout << buffer;
// Potential solution by Captain Obvlious - I'll test in the morning
std::cout.write(buffer, length);
}
else
{
syslog(LOG_DEBUG, "Error opening file: %s", imagePath.c_str());
return false;
}
return true;
As it's already been pointed out to you, you need to use write() instead of IO formatting operations.
But you don't even need to do that. You don't need to manually copy one file to another, one buffer at a time, when iostreams will be happy to do it for you.
std::ifstream is;
is.open(imagePath.c_str(), std::ios::in | std::ios::binary);
if (is.is_open())
{
std::cout << is.rdbuf();
}
That's pretty much it.
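To see why the formatted insertion loses data, here is a small sketch that counts how many bytes each approach emits. The function names are hypothetical, and a std::ostringstream stands in for std::cout so the result can be inspected.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Bytes emitted when the buffer is inserted as a C string: operator<< on a
// const char* stops at the first '\0'.
std::size_t bytes_via_insertion(const std::string& data) {
    std::ostringstream out;
    out << data.c_str();
    return out.str().size();
}

// Bytes emitted by an unformatted write of the whole buffer.
std::size_t bytes_via_write(const std::string& data) {
    std::ostringstream out;
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    return out.str().size();
}
```

For a five-byte buffer holding 'a', 'b', a zero byte, 'c', 'd', the first function reports 2 and the second reports 5. Binary formats such as JPEG routinely contain zero bytes, which is consistent with the truncated image described in the question.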
This boiled down to a much simpler block. I hard-coded the imagePath for this example. Put this in your Linux web server's cgi_bin folder, place a JPEG at ../www_images/image0001.jpg, and call the web server from your client via http:///cgi_bin/test to get the image back.
#include <syslog.h>
#include <fstream>
#include <iostream>
#include <string>
int test()
{
std::ifstream fileStream;
std::string imagePath = "../www_images/image0001.jpg"; // pass this variable in
// output an image header - CGI
syslog(LOG_DEBUG, "Processing GetImage, Image: '%s'", imagePath.c_str());
std::cout << "Content-Type: image/jpeg\n\n";
// output binary image
fileStream.open(imagePath.c_str(), std::ios::in | std::ios::binary);
if (fileStream.is_open())
{
std::cout << fileStream.rdbuf();
}
else
{
return 1; // error - not handled in this code
}
return 0;
}
ps: no religious wars on brackets please. ;)

How often should I check whether an fstream object is open?

I use an fstream object to write data to a text file. I write some initial data to it once and then write more data to it in a loop. Every time before writing to the file stream, I check whether it's open. Is this unnecessary? Should I only check once immediately after creating the fstream object?
Basically, I do this:
#define STOP 32700 // some value that indicates no more data is available
#include <string>
#include <exception>
#include <fstream>
int main (int argc, char* argv[])
{
try {
double data[5] {};
const std::string output_file_name {"name.txt"};
std::fstream outputFile (output_file_name, std::ios::out | std::ios::trunc);
if (outputFile.is_open()) // successfully opened file
outputFile << "initial text\n";
else // if text file could not be opened
throw Fstream_Exception;
/* do some other stuff (in various threads) */
do { // ok, now get data and write it to the file!
getData(&data[0]);
if (outputFile.is_open())
outputFile << data[0] << '\n';
else
throw Fstream_Exception;
} while (data[0] != STOP);
}
catch (Fstream_Exception& fstream_exception) {
/* handle exception */
}
}
The stream itself can throw exceptions when an error occurs. Just use its exceptions() method and pass the types of errors you want it to detect. This is more convenient than checking the state flags after each operation.

Uncompress data in memory using Boost gzip_decompressor

I'm trying to decompress binary data in memory using Boost gzip_decompressor. From this answer, I adapted the following code:
vector<char> unzip(const vector<char> compressed)
{
vector<char> decompressed = vector<char>();
boost::iostreams::filtering_ostream os;
os.push(boost::iostreams::gzip_decompressor());
os.push(boost::iostreams::back_inserter(decompressed));
boost::iostreams::write(os, &compressed[0], compressed.size());
return decompressed;
}
However, the returned vector has zero length. What am I doing wrong? I tried calling flush() on the os stream, but it did not make a difference.
Your code works for me with this simple test program:
#include <iostream>
#include <string>
#include <vector>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
std::vector<char> unzip(const std::vector<char> compressed)
{
std::vector<char> decompressed = std::vector<char>();
boost::iostreams::filtering_ostream os;
os.push(boost::iostreams::gzip_decompressor());
os.push(boost::iostreams::back_inserter(decompressed));
boost::iostreams::write(os, &compressed[0], compressed.size());
return decompressed;
}
int main() {
std::vector<char> compressed;
{
boost::iostreams::filtering_ostream os;
os.push(boost::iostreams::gzip_compressor());
os.push(boost::iostreams::back_inserter(compressed));
os << "hello\n";
os.reset();
}
std::cout << "Compressed size: " << compressed.size() << '\n';
const std::vector<char> decompressed = unzip(compressed);
std::cout << std::string(decompressed.begin(), decompressed.end());
return 0;
}
Are you sure your input was compressed with gzip and not some other method (e.g. raw deflate)? Gzip-compressed data begins with the bytes 1f 8b.
I generally use reset() or put the stream and filters in their own block to make sure that output is complete. I did both for compression above, just as an example.
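The 1f 8b check mentioned above is easy to run before calling unzip. A minimal sketch, with looks_like_gzip as a hypothetical helper name:

```cpp
#include <vector>

// Returns true if the buffer starts with the gzip magic bytes 0x1f 0x8b.
// The casts matter because plain char may be signed, in which case 0x8b
// would otherwise compare as a negative value.
bool looks_like_gzip(const std::vector<char>& data) {
    return data.size() >= 2 &&
           static_cast<unsigned char>(data[0]) == 0x1f &&
           static_cast<unsigned char>(data[1]) == 0x8b;
}
```

If this returns false for your input, the data was probably produced by something other than a gzip compressor, and gzip_decompressor is the wrong filter for it.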

How can I get the duration of an MP3 file (CBR or VBR) with a very small library or native code c/c++?

I can't use any MP3 code that is patented by Fraunhofer, so no encoders or decoders (e.g. ffmpeg, lame, MAD), and those are too big anyway.
I am doing this on Windows, but DirectShow's IMediaDet seems to slow down over time: calling it a few hundred times brings my system to a crawl, even when I reuse the same interface object and only set the file name and get the duration.
So, is there some code out there that can read VBR files with C/C++ and get the duration?
There was another post on here about doing this for CBR in C++, but the code makes a ton of assumptions and of course won't work for VBR.
Most MP3 files have an ID3 header. If the tag contains a TLEN frame (the track length in milliseconds), it is not hard to decode that and get the duration.
Here is some very basic and ugly code that illustrates the technique.
#include <iostream>
#include <iomanip>
#include <string>
size_t GetMP3Duration(const std::string sFileName);
int main(int argc, char* argv[])
{
try
{
size_t nLen = GetMP3Duration(argv[1]);
if (nLen==0)
{
std::cout << "Not Found" << std::endl;
}
else
{
std::cout << nLen << " milliseconds" << std::endl;
std::cout << nLen/60000 << ":";
nLen %= 60000;
std::cout << nLen/1000 << ".";
std::cout << std::setw(3) << std::setfill('0') << nLen%1000 << std::endl;
}
}
catch (std::exception &e)
{
std::cout << "Exception: " << e.what() << std::endl;
}
return 0;
}
#include <cstring>
#include <vector>
#include <iostream>
#include <fstream>
#include <cctype>
#include <cstdlib>
#include <stdexcept>
#include <string>
unsigned DecodeMP3SafeInt(unsigned nVal)
{
// nVal has 4 bytes (8-bits each)
// - discard most significant bit from each byte
// - reverse byte order
// - concatenate the four 7-bit groups into a 28-bit size.
unsigned char *pValParts = reinterpret_cast<unsigned char *>(&nVal);
return (pValParts[3] & 0x7F) |
((pValParts[2] & 0x7F) << 7) |
((pValParts[1] & 0x7F) << 14) |
((pValParts[0] & 0x7F) << 21);
}
#pragma pack(1)
struct MP3Hdr {
char tag[3];
unsigned char maj_ver;
unsigned char min_ver;
unsigned char flags;
unsigned int size;
};
struct MP3ExtHdr {
unsigned int size;
unsigned char num_flag_bytes;
unsigned char extended_flags;
};
struct MP3FrameHdr {
char frame_id[4];
unsigned size;
unsigned char flags[2];
};
#pragma pack()
size_t GetMP3Duration(const std::string sFileName)
{
std::ifstream fin(sFileName.c_str(), std::ifstream::binary);
if (!fin)
throw std::invalid_argument("Cannot open file");
// Read Header
MP3Hdr hdr = { 0 };
fin.read(reinterpret_cast<char *>(&hdr), sizeof(hdr));
if (!fin.good())
throw std::invalid_argument("Error reading file");
if (0 != ::memcmp(hdr.tag, "ID3", 3))
throw std::invalid_argument("Not an MP3 File");
// Read extended header, if present
if (0 != (hdr.flags&0x40))
{
fin.seekg(sizeof(MP3ExtHdr), std::ifstream::cur);
if (!fin.good())
throw std::invalid_argument("Error reading file");
}
// read a chunk of file.
const size_t nDefaultSize(2048);
std::vector<char> vBuff(nDefaultSize);
fin.read(&vBuff[0], vBuff.size());
size_t nSize = fin.gcount();
if (!nSize)
throw std::invalid_argument("Error reading file");
vBuff.resize(nSize);
size_t nUsed = 0;
while (nSize-nUsed > sizeof(MP3FrameHdr))
{
MP3FrameHdr *pFrame = reinterpret_cast<MP3FrameHdr *>(&vBuff[nUsed]);
nUsed += sizeof(MP3FrameHdr);
size_t nDataLen = DecodeMP3SafeInt(pFrame->size);
if (nDataLen > (nSize-nUsed))
throw std::invalid_argument("Corrupt file");
if (!::isupper(static_cast<unsigned char>(pFrame->frame_id[0]))) // past end of tags
return 0;
if (0 == ::memcmp(pFrame->frame_id, "TLEN", 4))
{
// skip an int
nUsed += sizeof(int);
// data is next
return atol(&vBuff[nUsed]);
}
else
{
nUsed += nDataLen;
}
}
return 0;
}
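The syncsafe decoding used above can also be written directly against the raw bytes in file order, which avoids the pointer cast and any dependence on the size of unsigned. This is only a sketch, and decode_syncsafe is a hypothetical name:

```cpp
#include <cstdint>

// Decodes a 4-byte ID3v2 "syncsafe" integer given in file order
// (most significant byte first). Only the low 7 bits of each byte count,
// so the result has at most 28 significant bits.
std::uint32_t decode_syncsafe(const unsigned char bytes[4]) {
    return (static_cast<std::uint32_t>(bytes[0] & 0x7F) << 21) |
           (static_cast<std::uint32_t>(bytes[1] & 0x7F) << 14) |
           (static_cast<std::uint32_t>(bytes[2] & 0x7F) << 7) |
           static_cast<std::uint32_t>(bytes[3] & 0x7F);
}
```

For example, the bytes 00 00 02 01 decode to (2 << 7) | 1 = 257.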
Jeff,
the only valid way is to go through the whole MP3 file, find every MP3 frame in it, and sum the durations of those frames.
A main characteristic of MP3 files is that their bit density may vary, and that lots of other binary data can be embedded in them, for example ID3 tags, which any decoder skips when reading.
Anyway - look here for mp3 frame header info:
http://www.mp3-converter.com/mp3codec/mp3_anatomy.htm
Try to write code that correctly parses header by header, computes each frame's duration (from the sampling frequency), and then totals the durations of all frames.
You don't have to decode the frames, just use headers from them.
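As an illustration of working from the headers alone, here is a sketch that parses one 4-byte frame header. It is deliberately limited to MPEG-1 Layer III (other versions and layers use different bitrate and sample rate tables and a different sample count per frame), and FrameInfo and parse_frame_header are hypothetical names.

```cpp
#include <cstddef>

struct FrameInfo {
    bool valid;          // false if the header is not a usable MPEG-1 Layer III header
    std::size_t length;  // frame length in bytes, header included
    double duration_ms;  // playing time of this frame in milliseconds
};

// Parses a 4-byte MPEG audio frame header (MPEG-1 Layer III only).
FrameInfo parse_frame_header(const unsigned char h[4]) {
    static const int kBitrateKbps[16] = {0,   32,  40,  48,  56,  64,  80,  96,
                                         112, 128, 160, 192, 224, 256, 320, 0};
    static const int kSampleRateHz[4] = {44100, 48000, 32000, 0};
    const FrameInfo invalid = {false, 0, 0.0};

    // 11 sync bits set, version bits 11 (MPEG-1), layer bits 01 (Layer III);
    // the lowest bit of h[1] (CRC protection) is ignored.
    if (h[0] != 0xFF || (h[1] & 0xFE) != 0xFA) return invalid;

    const int bitrate = kBitrateKbps[(h[2] >> 4) & 0x0F] * 1000;
    const int sample_rate = kSampleRateHz[(h[2] >> 2) & 0x03];
    const int padding = (h[2] >> 1) & 0x01;
    if (bitrate == 0 || sample_rate == 0) return invalid;  // free-format or reserved

    FrameInfo info;
    info.valid = true;
    info.length = static_cast<std::size_t>(144 * bitrate / sample_rate + padding);
    info.duration_ms = 1152.0 * 1000.0 / sample_rate;  // 1152 samples per frame
    return info;
}
```

A full duration scan would skip any ID3 tag, call this on each header, add up duration_ms, and seek forward by length to reach the next header. That loop is not shown here.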
If you don't mind LGPL try http://sourceforge.net/projects/mpg123net/
I found a library that does it, LGPL v3: http://www.codeproject.com/KB/audio-video/mpegaudioinfo.aspx
How about TagLib or id3lib?
They are not decoders per se; they extract the track, artist, album, and a host of other information, which should enable you to do what you need.