Why am I getting poor compression with zlib? - c++

By default I output a file that is 120 MB. Here I have an input buffer and an output buffer that are each double that size. When I run this code I get an output of 10 MB (the default gives me 11 MB). When I zip the raw 128 MB file I get 700 KB. Why am I getting 11 MB instead of <1 MB like zip gives me? Using the 7-Zip manager I asked it to compress with gzip using deflate, and it gave me a 4.6 MB file, which is still much smaller. I'm very curious why this is happening. It feels like I am doing something wrong.
static UInt32 len=0;
static char buf[1024*1024*256];
static char buf2[1024*1024*256];
static char *curbuf=buf;
z_stream strm;
void initzstuff()
{
strm.zalloc = 0;
strm.zfree = 0;
strm.opaque = 0;
int ret = deflateInit(&strm, Z_BEST_COMPRESSION);
if (ret != Z_OK)
return;
}
void flush_file(MyOstream o, bool end){
strm.avail_in = len;
strm.next_in = (UInt8*)buf;
strm.avail_out = sizeof(buf2);
strm.next_out = (UInt8*)buf2;
int ret = deflate(&strm, (end ? Z_FINISH : Z_NO_FLUSH));
assert(ret != Z_STREAM_ERROR);
int have = sizeof(buf2) - strm.avail_out;
fwrite(buf2, 1, have, o);
if(end)
{
(void)deflateEnd(&strm);
}
len=0;
curbuf=buf;
/*
fwrite(buf, 1, len, o);
len=0;
curbuf=buf;
//*/
}

Zip can use Deflate64 or another compression algorithm (such as BZip2), and when your file is very sparse that can produce such a difference.
Also, the zlib standard only specifies the format of the compressed data; how the data is compressed is left to the archiver, so 7-Zip can use heuristics that make the output smaller.

Probably chunk-size? zlib.net/zpipe.c gives a fairly good example.
You'll probably get better performance too if you chunk rather than try to do the entire stream.

Decoding with OGG/Vorbis gives no sound

I'd like to play an Ogg/Vorbis audio/video file, but right now I can't manage to read the audio from the file.
My algorithm to read audio is:
Initialize required structures:
vorbis_info info;
vorbis_comment comment;
vorbis_dsp_state dsp;
vorbis_block block;
vorbis_info_init(&info);
vorbis_comment_init(&comment);
Read headers:
Call vorbis_synthesis_headerin(&info, &comment, packet); until it returns OV_ENOTVORBIS
vorbis_synthesis_init(&dsp, &info);
vorbis_block_init(&dsp, &block);
Pass the first non-header packet to function below
Parse packets, do it until audioReady == READY
void putPacket(ogg_packet *packet) {
int ret;
ret = vorbis_synthesis(&block, packet);
if( ret == 0 ) {
ret = vorbis_synthesis_blockin(&dsp, &block);
audioReady = (ret == 0) ? READY : NOT_READY;
} else {
audioReady = NOT_READY;
}
}
Read audio data:
float** rawData = nullptr;
readSamples = vorbis_synthesis_pcmout(&dsp, &rawData);
if( readSamples == 0 ) {
audioReady = NOT_READY;
return;
}
int16_t* newData = new int16_t[readSamples * getChannels()];
int16_t* dst = newData;
for(unsigned int i=0; i<readSamples; ++i) {
for(unsigned char ch=0; ch<getChannels(); ++ch) {
*(dst++) = math::clamp<int16_t>(rawData[ch][i]*32767 + 0.5f, -32767, 32767);
}
}
audioData.push_back({readSamples * getChannels() , newData});
vorbis_synthesis_read(&dsp, static_cast<int>(readSamples));
audioReady = NOT_READY;
This is where it goes wrong: examining the contents of newData reveals a very quiet sound. I doubt this is the right data, which means I did something wrong somewhere in my algorithm.
I tried to find examples of similar programs, but all I found were sources with very spaghetti-like code, which seem to follow the same algorithm as mine, yet they do their job. (One such library: https://github.com/icculus/theoraplay )
Is there any reason why I'm getting (almost) silence in my application?
PS: If you are wondering whether I might be getting the Ogg packets wrong, I assure you this part of my code works: I'm also reading video data from the same file using the same code, and the video displays correctly.
I've found it: while reading packets I assumed that one Ogg page = one Ogg packet. That's wrong: for audio, one page can contain many packets. To read them properly, one has to write code like:
do{
putPacket(&packet);
}while( ogg_stream_packetout(&state, &packet) == 1 );
I made this mistake because for video packets (which I did first) a page contains only one packet.
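Separately from the page/packet fix, the sample conversion step in the question can be sketched as a standalone helper. This is only a sketch under assumptions: the name interleaveToInt16 is hypothetical, the input is assumed to be planar float PCM nominally in [-1, 1] (the layout vorbis_synthesis_pcmout hands back), and it rounds with std::lround rather than adding 0.5f, so negative samples round symmetrically.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: interleave planar float PCM (one array per channel)
// into signed 16-bit samples ordered ch0, ch1, ..., ch0, ch1, ...
std::vector<std::int16_t> interleaveToInt16(const float* const* planar,
                                            std::size_t samples,
                                            std::size_t channels) {
    assert(planar != nullptr || samples == 0);
    std::vector<std::int16_t> out;
    out.reserve(samples * channels);
    for (std::size_t i = 0; i < samples; ++i) {
        for (std::size_t ch = 0; ch < channels; ++ch) {
            // Clamp before scaling so out-of-range input cannot overflow int16_t.
            const float s = std::max(-1.0f, std::min(1.0f, planar[ch][i]));
            // lround rounds to nearest, halves away from zero, for both signs.
            out.push_back(static_cast<std::int16_t>(std::lround(s * 32767.0f)));
        }
    }
    return out;
}
```

Returning a std::vector instead of a raw new[] buffer also removes the manual ownership that the question's audioData list has to track.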

Caching images in c++. Using buffer_body or other things instead of file_body?

I have a slightly modified version of this https://www.boost.org/doc/libs/develop/libs/beast/example/http/server/async/http_server_async.cpp.
What it does:
Depending on whether the request is valid, it returns the requested image or an error.
What I'm going to do:
I want to keep frequently requested images in a local cache, such as an LRU cache, to decrease response time.
What I've tried:
I wanted to use buffer_body instead of file_body, but I ran into difficulties with the response part, so I discarded this idea.
I tried to read a PNG image into a std::string, thinking it would be easier to keep in a std::unordered_map this way, but again problems arose with the response part of the code.
Here is the response part:
http::response<http::file_body> res {
std::piecewise_construct,
std::make_tuple(std::move(body)),
std::make_tuple(http::status::ok, req.version()) };
res.set(http::field::content_type, "image/png");
res.content_length(size);
res.keep_alive(req.keep_alive());
return send(std::move(res));
If encoding and decoding the image as a string is OK, here is the code where I read it into a string:
std::unordered_map<std::string, std::string> cache;
std::string load_file_contents(const std::string& filepath)
{
static const size_t MAX_LOAD_DATA_SIZE = 1024 * 1024 * 8 ; // 8 Mbytes.
std::string result;
static const size_t BUFF_SIZE = 8192; // 8 Kbytes
char buf[BUFF_SIZE];
FILE* file = fopen( filepath.c_str(), "rb" ) ;
if ( file != NULL )
{
size_t n;
while( result.size() < MAX_LOAD_DATA_SIZE )
{
n = fread( buf, sizeof(char), BUFF_SIZE, file);
if (n == 0)
break;
result.append(buf, n);
}
fclose(file);
}
return result;
}
template<class Body, class Allocator, class Send>
void handle_request(
beast::string_view doc_root,
http::request<Body, http::basic_fields<Allocator>>&& req,
Send&& send)
{
.... // skipping this part not to paste all the code
if(cache.find(path) == cache.end())
{
// if not in cache
std::ifstream image(path.c_str(), std::ios::binary);
// not in the cache and could open, so get it and decode it as a binary file
cache.emplace(path, load_file_contents(path));
}
.... // response part (provided above); the response should be taken from the cache
}
ANY HELP WILL BE APPRECIATED! THANK YOU!
Sometimes there is no need to cache these files. For example, in my case changing file_body to vector_body or string_body was enough to make responses almost twice as fast.
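If a cache does turn out to be worthwhile, the LRU cache mentioned in the question could look roughly like the following. This is a minimal, single-threaded sketch: the class name LruCache and its interface are hypothetical, it evicts by entry count rather than by total bytes, and an asynchronous server would additionally need locking or one cache per thread.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical LRU cache mapping a file path to the file's bytes.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns the cached bytes, or nullptr on a miss.
    // A hit moves the entry to the front (most recently used).
    const std::string* get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        items_.splice(items_.begin(), items_, it->second);
        return &it->second->second;
    }

    // Inserts or overwrites an entry, evicting the least recently used one
    // when the cache is full.
    void put(const std::string& key, std::string value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            items_.splice(items_.begin(), items_, it->second);
            return;
        }
        if (items_.size() == capacity_) {
            index_.erase(items_.back().first);
            items_.pop_back();
        }
        items_.emplace_front(key, std::move(value));
        index_[key] = items_.begin();
    }

    std::size_t size() const { return items_.size(); }

private:
    using Item = std::pair<std::string, std::string>;
    std::size_t capacity_;
    std::list<Item> items_;  // front = most recently used
    std::unordered_map<std::string, std::list<Item>::iterator> index_;
};
```

The std::list plus std::unordered_map pairing keeps both lookup and reordering at constant time, because splice moves a node without invalidating the iterators stored in the index. A cached std::string could then presumably serve as the body of a response that uses string_body.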

How to create a gz-compatible file with zlib?

I want to use zlib to produce a gz-compatible output file with C++.
I installed the developer package for zlib, which can be used -- as I understand it -- to create gz-compatible files both on Unix and on Windows.
sudo aptitude install libz-dev
Although I am writing a C++ program, I think I followed the usage example in the relevant points. I also compiled the example zpipe.c unchanged.
Alas, what I get is not a gz-compatible output.
$ ./zpipe.x < data.txt > x.gz
$ file x.gz
x.gz: data
$ gunzip x.gz
gzip: x.gz: not in gzip format
I thought that the reason here might be, because deflateSetHeader is not called. So I added that into my own source code, i.e. (excerpt, you can find the full code here):
struct DeflateWrap { // RAII wrapper
z_stream strm_ ; // C-Struct from zlib.h
explicit DeflateWrap() : strm_{} {
strm_.zalloc = Z_NULL;
strm_.zfree = Z_NULL;
strm_.opaque = Z_NULL;
auto ret = deflateInit2(&strm_, LEVEL,
Z_DEFLATED, 15, 9, Z_DEFAULT_STRATEGY);
if(ret != Z_OK) throw std::runtime_error("Error ZLib-Init");
}
// ...more, eg. operator-> and *...
};
void pack(const string& infn) {
DeflateWrap dwrap {};
//...
dwrap->avail_in = indata.size();
dwrap->next_in = reinterpret_cast<unsigned char*>(indata.data());
gz_header header {0}; // <<< HEADER HERE
header.name = const_cast<unsigned char*>(
reinterpret_cast<const unsigned char*>(infn.c_str()));
header.comment = Z_NULL;
header.extra = Z_NULL;
bool first = true;
do {
dwrap->avail_out = outdata.size();
dwrap->next_out = reinterpret_cast<unsigned char*>(outdata.data());
if(first) {
cerr << deflateSetHeader(&(dwrap.strm_), &header); // <<< SET HDR HERE
first = false;
}
deflate(&(dwrap.strm_), Z_FINISH); // zlib.h: this packs
auto toWrite = outdata.size() - dwrap->avail_out;
outf.write(outdata.data(), toWrite);
} while (dwrap->avail_out == 0);
}
As I understand it, I followed the manual for deflateSetHeader:
I even used deflateInit2 instead of deflateInit, probably unnecessarily
the call of deflateSetHeader is immediately after deflateInit2
the call of deflateSetHeader is before any call of deflate
...and still I get a -2, i.e. Z_STREAM_ERROR, from the deflateSetHeader call. However, the output I produce can be uncompressed with zpipe.c, so it can't be totally wrong, can it?
Any idea how to set a gz-compatible header?
Update:
As far as I can see, I use the C++ counterpart of
SET_BINARY_MODE(stdin);
SET_BINARY_MODE(stdout);
by opening the files like this:
ifstream inf{ infn, ifstream::binary };
ofstream outf { infn + ".gz", ofstream::binary };
Also, I wonder why the zpipe.c example I built does not produce a gunzip-compatible file either, as I described before. From what I read here it should.
Although I understood from the deflateSetHeader documentation that the output would be gz-compatible, a bit further down there is a hint that this may not be so.
This library supports reading and writing files in gzip (.gz) format with an interface similar to that of stdio, using the functions that start with "gz". The gzip format is different from the zlib format. gzip is a gzip wrapper, documented in RFC 1952, wrapped around a deflate stream.
Thus, when I use the different set of functions gz... I get gz-compatible output and simpler code:
struct GzWrite { // RAII-Wrapper
gzFile gz_ ; // C struct from zlib.h
explicit GzWrite(const string& filename)
: gz_{gzopen(filename.c_str(),"wb9")}
{
if(gz_==NULL) throw std::runtime_error(strerror(errno));
}
~GzWrite() {
gzclose(gz_);
}
int write(const char* data, size_t len) {
return gzwrite(gz_, data, len);
}
GzWrite(const GzWrite&) = delete; // no copying
GzWrite& operator=(const GzWrite&) = delete; // no assignment
};
void packe(const string& infn) {
vector<char> indata = lese(infn); // read the input
GzWrite gz{infn+".gz"}; // initialize the output
auto res = gz.write(indata.data(), indata.size());
if(res==0) throw std::runtime_error("Error while writing");
}
windowBits can also be -8..-15 for raw deflate. In this case, -windowBits determines the window size. deflate() will then generate raw deflate data with no zlib header or trailer, and will not compute an adler32 check value.
windowBits can also be greater than 15 for optional gzip encoding. Add 16 to windowBits to write a simple gzip header and trailer around the compressed data instead of a zlib wrapper. The gzip header will have no file name, no extra data, no comment, no modification time (set to zero), no header crc, and the operating system will be set to 255 (unknown). If a gzip stream is being written, strm->adler is a crc32 instead of an adler32.
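The windowBits rules quoted above can be summarized in a small helper. This is a sketch only: the enum Wrapper and the function windowBitsFor are hypothetical names, not part of zlib. The AutoDetect case (+32) comes from the inflateInit2 section of the zlib manual and applies to inflate only. With such a helper, the deflateInit2 call in the question would pass 15 + 16 = 31 instead of 15, which would plausibly explain the Z_STREAM_ERROR from deflateSetHeader, since the stream was not set up for gzip output.

```cpp
#include <cassert>

// Hypothetical names; only the resulting numbers are defined by zlib.
enum class Wrapper { Raw, Zlib, Gzip, AutoDetect };

// Computes the windowBits argument for deflateInit2()/inflateInit2().
// bits is the base-2 logarithm of the window size (8..15).
int windowBitsFor(Wrapper w, int bits = 15) {
    assert(bits >= 8 && bits <= 15);
    switch (w) {
        case Wrapper::Raw:        return -bits;      // no header or trailer
        case Wrapper::Zlib:       return bits;       // zlib header, adler32
        case Wrapper::Gzip:       return bits + 16;  // gzip header, crc32
        case Wrapper::AutoDetect: return bits + 32;  // inflate only: zlib or gzip
    }
    return bits;  // unreachable for valid enum values
}
```

For example, windowBitsFor(Wrapper::Gzip) yields 31 and windowBitsFor(Wrapper::Raw) yields -15, the value needed for the raw deflate data stored inside zip archives.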

ZLib Inflate() failing with -3 Z_DATA_ERROR

I am trying to unzip a file by calling the inflate function but it always fails with Z_DATA_ERROR even when I use the example program from the website. I am thinking that maybe the zip file I have is not supported. I have attached a picture of the zip header below.
And here is the function that I wrote to perform the unzipping. I read in the whole file at once (about 34 KB) and pass it into this function. Note that I have tried both passing the whole zip file including the zip header and skipping over the zip file header to pass only the compressed data; both fail with Z_DATA_ERROR when inflate() is called.
int CHttpDownloader::unzip(unsigned char * pDest, unsigned long * ulDestLen, unsigned char * pSource, int iSourceLen){
int ret = 0;
unsigned int uiUncompressedBytes = 0; // Number of uncompressed bytes returned from inflate() function
unsigned char * pPositionDestBuffer = pDest; // Current position in dest buffer
unsigned char * pLastSource = &pSource[iSourceLen]; // Last position in source buffer
z_stream strm;
// Skip over local file header
SLocalFileHeader * header = (SLocalFileHeader *) pSource;
pSource += sizeof(SLocalFileHeader) + header->sFileNameLen + header->sExtraFieldLen;
// We should now be at the beginning of the stream data
/* allocate inflate state */
strm.zalloc = Z_NULL;
strm.zfree = Z_NULL;
strm.opaque = Z_NULL;
strm.avail_in = 0;
strm.next_in = Z_NULL;
ret = inflateInit2(&strm, 16+MAX_WBITS);
if (ret != Z_OK){
return -1;
}
// Uncompress the data
strm.avail_in = header->iCompressedSize; //iSourceLen;
strm.next_in = pSource;
do {
strm.avail_out = *ulDestLen;
strm.next_out = pPositionDestBuffer;
ret = inflate(&strm, Z_NO_FLUSH);
assert(ret != Z_STREAM_ERROR); /* state not clobbered */
switch (ret) {
case Z_NEED_DICT:
ret = Z_DATA_ERROR; /* and fall through */
case Z_DATA_ERROR:
case Z_MEM_ERROR:
(void)inflateEnd(&strm);
return -2;
}
uiUncompressedBytes = *ulDestLen - strm.avail_out;
*ulDestLen -= uiUncompressedBytes; // ulDestLen holds number of free/empty bytes in buffer
pPositionDestBuffer += uiUncompressedBytes;
} while (strm.avail_out == 0);
// Close the decompression stream
inflateEnd(&strm);
ASSERT(ret == Z_STREAM_END);
return 0;
}
So my question is, is the type of zip file I am reading in not supported by ZLib's inflate() function? Or is there something wrong with my CHttpDownloader::unzip() function? Thanks for any help :)
inflate() was failing because it was looking for gzip headers, which were not present. Initialize the stream like this instead:
ret = inflateInit2(&strm, -MAX_WBITS);
Passing a negative window bits value prevents inflate() from checking for gzip or zlib headers, and unzipping then works as expected.
That file, which begins with 50 4B 03 04, is a zip file. The zlib library does not process zip files directly. zlib can help with the compression, decompression, and CRC calculations. However, you need other code to process the zip file format.
You can look at contrib/minizip in the zlib distribution, or libzip.
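To illustrate what "other code to process the zip file format" involves for the simplest case, here is a sketch that locates the compressed data behind a zip local file header. The names LocalHeaderInfo and parseLocalHeader are hypothetical. The field offsets follow the published zip layout (30 fixed bytes, little-endian fields), read byte by byte so that struct padding cannot shift them, which is a risk with sizeof on a header struct. It ignores the central directory, zip64, and encryption, so it is not a substitute for minizip or libzip.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch.
struct LocalHeaderInfo {
    bool valid;                     // signature matched and offsets fit in the buffer
    std::uint16_t method;           // 0 = stored, 8 = deflate
    std::uint32_t compressedSize;   // may be 0 if flag bit 3 (data descriptor) is set
    std::uint32_t uncompressedSize; // same caveat as compressedSize
    std::size_t dataOffset;         // offset of the first byte of file data
};

static std::uint16_t readLe16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

static std::uint32_t readLe32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

LocalHeaderInfo parseLocalHeader(const unsigned char* buf, std::size_t len) {
    assert(buf != nullptr);
    LocalHeaderInfo info{};               // valid starts out false
    const std::size_t kFixedSize = 30;    // fixed part of the local file header
    if (len < kFixedSize) return info;
    if (readLe32(buf) != 0x04034b50u) return info;  // bytes 50 4B 03 04
    info.method = readLe16(buf + 8);
    info.compressedSize = readLe32(buf + 18);
    info.uncompressedSize = readLe32(buf + 22);
    const std::size_t nameLen = readLe16(buf + 26);
    const std::size_t extraLen = readLe16(buf + 28);
    info.dataOffset = kFixedSize + nameLen + extraLen;
    info.valid = info.dataOffset <= len;
    return info;
}
```

When method is 8, the bytes starting at dataOffset are raw deflate data, which is why inflateInit2 needs a negative windowBits value for them.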

Libzip - read file contents from zip

I am using libzip to work with zip files, and everything went fine until I needed to read a file from a zip archive.
I only need to read whole text files, so it would be great to have something like PHP's "file_get_contents" function.
To read a file from a zip archive there is the function "int zip_fread(struct zip_file *file, void *buf, zip_uint64_t nbytes)".
The main problem is that I don't know what size buf must be or how many nbytes I must read (I need to read the whole file, but the files have different sizes). I could make one big buffer to fit them all and read its full size, or loop until the read call returns -1, but I don't think either is a sensible option.
You can try using zip_stat to get file size.
http://linux.die.net/man/3/zip_stat
I haven't used the libzip interface, but from what you write it looks very similar to a file interface: once you have a handle to the stream, you keep calling zip_fread() until this function returns an error (or, possibly, fewer bytes than requested). The buffer you pass in is just a reasonably sized temporary buffer through which the data is communicated.
Personally I would probably create a stream buffer for this so once the file in the zip archive is set up it can be read using the conventional I/O stream methods. This would look something like this:
struct zipbuf: std::streambuf {
zipbuf(???): file_(???) {}
private:
zip_file* file_;
enum { s_size = 8196 };
char buffer_[s_size];
int underflow() {
int rc(zip_fread(this->file_, this->buffer_, s_size));
this->setg(this->buffer_, this->buffer_,
this->buffer_ + std::max(0, rc));
return this->gptr() == this->egptr()
? traits_type::eof()
: traits_type::to_int_type(*this->gptr());
}
};
With this stream buffer you should be able to create an std::istream and read the file into whatever structure you need:
zipbuf buf(???);
std::istream in(&buf);
...
Obviously, this code isn't tested or compiled. However, when you replace the ??? with whatever is needed to open the zip file, I'd think this should pretty much work.
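The same idea can be written so that it compiles without libzip by hiding the read call behind a callback. This is a sketch under assumptions: ReadFn, ReaderBuf, and slurp are hypothetical names, and the callback stands in for a call such as zip_fread(file, buf, n), returning the number of bytes read, 0 at end of data, or a negative value on error.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <iterator>
#include <streambuf>
#include <string>
#include <utility>

// Stand-in for the real read call (for libzip, something wrapping zip_fread).
using ReadFn = std::function<long long(char* buf, std::size_t n)>;

class ReaderBuf : public std::streambuf {
public:
    explicit ReaderBuf(ReadFn read) : read_(std::move(read)) {
        assert(static_cast<bool>(read_));
    }

protected:
    // Called by the stream whenever the get area is exhausted.
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        const long long rc = read_(buffer_, sizeof buffer_);
        if (rc <= 0) return traits_type::eof();  // end of data or error
        setg(buffer_, buffer_, buffer_ + rc);
        return traits_type::to_int_type(*gptr());
    }

private:
    ReadFn read_;
    char buffer_[8192];
};

// Reads everything the callback delivers, similar to file_get_contents.
std::string slurp(ReadFn read) {
    ReaderBuf buf(std::move(read));
    std::istream in(&buf);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Because the result grows as data arrives, the caller does not need to know the uncompressed size up front; the 8192-byte buffer only bounds how much is requested per call.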
Here is a routine I wrote that extracts data from a zip-stream and prints out a line at a time. This uses zlib, not libzip, but if this code is useful to you, feel free to use it:
/*
 * compile with the -lz option in order to link in the zlib library
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>
#define Z_CHUNK 2097152
int unzipFile(const char *fName)
{
z_stream zStream;
char *zRemainderBuf = malloc(1);
unsigned char zInBuf[Z_CHUNK];
unsigned char zOutBuf[Z_CHUNK + 1]; /* +1 leaves room for the '\0' written at zOutBuf[zHave] */
char zLineBuf[Z_CHUNK];
unsigned int zHave, zBufIdx, zBufOffset, zOutBufIdx;
int zError;
FILE *inFp = fopen(fName, "rbR");
if (!inFp) { fprintf(stderr, "could not open file: %s\n", fName); return EXIT_FAILURE; }
zStream.zalloc = Z_NULL;
zStream.zfree = Z_NULL;
zStream.opaque = Z_NULL;
zStream.avail_in = 0;
zStream.next_in = Z_NULL;
zError = inflateInit2(&zStream, (15+32)); /* cf. http://www.zlib.net/manual.html */
if (zError != Z_OK) { fprintf(stderr, "could not initialize z-stream\n"); return EXIT_FAILURE; }
*zRemainderBuf = '\0';
do {
zStream.avail_in = fread(zInBuf, 1, Z_CHUNK, inFp);
if (zStream.avail_in == 0)
break;
zStream.next_in = zInBuf;
do {
zStream.avail_out = Z_CHUNK;
zStream.next_out = zOutBuf;
zError = inflate(&zStream, Z_NO_FLUSH);
switch (zError) {
case Z_NEED_DICT: { fprintf(stderr, "Z-stream needs dictionary!\n"); return EXIT_FAILURE; }
case Z_DATA_ERROR: { fprintf(stderr, "Z-stream suffered data error!\n"); return EXIT_FAILURE; }
case Z_MEM_ERROR: { fprintf(stderr, "Z-stream suffered memory error!\n"); return EXIT_FAILURE; }
}
zHave = Z_CHUNK - zStream.avail_out;
zOutBuf[zHave] = '\0';
/* copy remainder buffer onto line buffer, if not NULL */
if (zRemainderBuf) {
strncpy(zLineBuf, zRemainderBuf, strlen(zRemainderBuf));
zBufOffset = strlen(zRemainderBuf);
}
else
zBufOffset = 0;
/* read through zOutBuf for newlines */
for (zBufIdx = zBufOffset, zOutBufIdx = 0; zOutBufIdx < zHave; zBufIdx++, zOutBufIdx++) {
zLineBuf[zBufIdx] = zOutBuf[zOutBufIdx];
if (zLineBuf[zBufIdx] == '\n') {
zLineBuf[zBufIdx] = '\0';
zBufIdx = -1;
fprintf(stdout, "%s\n", zLineBuf);
}
}
/* copy some of line buffer onto the remainder buffer, if there are remnants from the z-stream */
if (strlen(zLineBuf) > 0) {
if (strlen(zLineBuf) > strlen(zRemainderBuf)) {
/* to minimize the chance of doing another (expensive) malloc, we double the length of zRemainderBuf */
free(zRemainderBuf);
zRemainderBuf = malloc(strlen(zLineBuf) * 2);
}
strncpy(zRemainderBuf, zLineBuf, zBufIdx);
zRemainderBuf[zBufIdx] = '\0';
}
} while (zStream.avail_out == 0);
} while (zError != Z_STREAM_END);
/* close gzip stream */
zError = inflateEnd(&zStream);
if (zError != Z_OK) {
fprintf(stderr, "could not close z-stream!\n");
return EXIT_FAILURE;
}
if (zRemainderBuf)
free(zRemainderBuf);
fclose(inFp);
return EXIT_SUCCESS;
}
With any streaming you should consider the memory requirements of your app.
A large buffer is good for speed, but depending on your RAM constraints you may not want to tie up too much memory. A small buffer requires more calls to your read and write operations, which are expensive in terms of time. So you need to find a buffer size somewhere between those two extremes.
Typically I use a size of 4096 (4 KB), which is sufficiently large for many purposes. If you want, you can go larger. But in the worst case, a size of 1 byte, you will be waiting a long time for your read to complete.
So to answer your question, there is no "right" size to pick. It is a choice you should make so that the speed of your app and the memory it requires are what you need.
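The trade-off can be made concrete by counting read calls for different buffer sizes. The sketch below is illustrative only: ChunkReader, ReadResult, and readAll are hypothetical names, and the callback again stands in for a call like zip_fread, returning the number of bytes read and 0 or a negative value once nothing more is available.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Stand-in for the real read call; returns bytes read, <= 0 when done.
using ChunkReader = std::function<long long(char* buf, std::size_t n)>;

struct ReadResult {
    std::string data;   // everything that was read
    std::size_t calls;  // how many times the reader was invoked
};

// Reads until the reader reports end of data, using a temporary buffer
// of chunkSize bytes.
ReadResult readAll(const ChunkReader& read, std::size_t chunkSize) {
    assert(chunkSize > 0);
    ReadResult result{std::string(), 0};
    std::vector<char> chunk(chunkSize);
    for (;;) {
        const long long n = read(chunk.data(), chunk.size());
        ++result.calls;
        if (n <= 0) break;
        result.data.append(chunk.data(), static_cast<std::size_t>(n));
    }
    return result;
}
```

For a 10,000-byte source, a 4096-byte buffer needs 4 calls (three that return data plus the final one that reports the end), while a 1-byte buffer needs 10,001, which is the cost the answer warns about.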