Decompression and extraction of files from streaming archive on the fly - c++

I'm writing a browser plugin, similar to Flash and Java in that it starts downloading a file (.jar or .swf) as soon as it gets displayed. Java waits (I believe) until the entire jar file is loaded, but Flash does not. I want the same ability, but with a compressed archive file. I would like to access files in the archive as soon as the bytes necessary for their decompression are downloaded.
For example I'm downloading the archive into a memory buffer, and as soon as the first file is possible to decompress, I want to be able to decompress it (also to a memory buffer).
Are there any formats/libraries that support this?
EDIT: If possible, I'd prefer a single file format instead of separate ones for compression and archiving, like gz/bzip2 and tar.

There are 2 issues here
How to write the code.
What format to use.
On the file format: you can't use the .ZIP format, because .ZIP puts the table of contents (the central directory) at the end of the file. That means you'd have to download the entire file before you can know what's in it. Zip has local headers you can scan for, but those headers are not the official list of what's in the file.
Zip explicitly puts the table of contents at the end because that allows files to be added quickly.
Assume you have a zip file which contains files 'a', 'b', and 'c', and you want to update 'c'. It's perfectly valid in zip to read the table of contents, append the new 'c', and write a new table of contents pointing to the new 'c', while the old 'c' is still in the file. If you scan for headers you'll end up seeing the old 'c' since it's still in the file.
This feature of appending was an explicit design goal of zip. It comes from the 1980s, when a zip could span multiple floppy discs. If you needed to add a file it would suck to have to read all N discs just to rewrite the entire zip file. So instead the format just lets you append updated files to the end, which means it only needs the last disc. It just reads the old TOC, appends the new files, and writes a new TOC.
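To make the "table of contents at the end" point concrete, here is a minimal sketch that looks for the ZIP end-of-central-directory record by scanning a buffer backwards for its "PK\5\6" signature. The struct and function names are made up for illustration, and the sketch assumes a single-disk archive without ZIP64 extensions. With only the first part of the download in hand, the lookup simply fails.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fields of interest from the end-of-central-directory (EOCD) record.
// (Hypothetical struct name; the field layout follows the ZIP format.)
struct EndOfCentralDirectory {
    std::uint16_t total_entries;     // number of entries in the central directory
    std::uint32_t directory_size;    // size of the central directory in bytes
    std::uint32_t directory_offset;  // offset of the central directory in the archive
};

// Little-endian readers for raw archive bytes.
static std::uint16_t read_u16(const std::vector<std::uint8_t>& b, std::size_t i) {
    return static_cast<std::uint16_t>(b[i] | (b[i + 1] << 8));
}

static std::uint32_t read_u32(const std::vector<std::uint8_t>& b, std::size_t i) {
    return static_cast<std::uint32_t>(b[i]) |
           (static_cast<std::uint32_t>(b[i + 1]) << 8) |
           (static_cast<std::uint32_t>(b[i + 2]) << 16) |
           (static_cast<std::uint32_t>(b[i + 3]) << 24);
}

// Hypothetical helper: returns true and fills `out` if an EOCD record is found.
bool find_eocd(const std::vector<std::uint8_t>& data, EndOfCentralDirectory& out) {
    const std::size_t kMinRecordSize = 22;  // EOCD record with an empty comment
    if (data.size() < kMinRecordSize) return false;

    // Scan backwards, because the record sits at the very end of the archive
    // (possibly followed by a variable-length comment).
    for (std::size_t i = data.size() - kMinRecordSize + 1; i-- > 0;) {
        if (data[i] == 'P' && data[i + 1] == 'K' && data[i + 2] == 5 && data[i + 3] == 6) {
            out.total_entries = read_u16(data, i + 10);
            out.directory_size = read_u32(data, i + 12);
            out.directory_offset = read_u32(data, i + 16);
            return true;
        }
    }
    return false;  // typical for a partially downloaded archive
}
```

Calling find_eocd on a buffer that holds only the first part of a download returns false, which is exactly the problem for streaming: the authoritative list of entries is not available until the tail of the file has arrived.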
Gzipped tar files don't have this problem. Tar files are stored as header, file, header, file, and the compression is applied on top of that, so it's possible to decompress the archive as it is downloaded and use the files as they become available. You can create gzipped tar files easily on Windows using WinRAR (commercial) or 7-Zip (free), and on Linux, OS X and Cygwin using the tar command.
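As a rough illustration of the header, file, header, file layout, here is a sketch of an incremental tar reader that is fed already-decompressed bytes in arbitrary chunks and hands back each file once all of its blocks have arrived. The names TarStreamReader, TarEntry and make_tar_member are invented for this example. It only looks at the name field (first 100 bytes) and the octal size field (offset 124, 12 bytes) of each 512-byte header, and it ignores checksums, type flags, the ustar prefix field and long-name extensions, so treat it as a starting point rather than a full tar implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

const std::size_t kTarBlock = 512;  // tar works in 512-byte blocks

struct TarEntry {
    std::string name;  // file name taken from the header
    std::string data;  // file contents
};

class TarStreamReader {
public:
    // Append newly available bytes and return every entry that is now complete.
    std::vector<TarEntry> feed(const char* bytes, std::size_t count) {
        buffer_.append(bytes, count);
        std::vector<TarEntry> complete;
        while (!finished_ && buffer_.size() >= kTarBlock) {
            // An all-zero header block marks the end of the archive.
            if (buffer_.find_first_not_of('\0') >= kTarBlock) {
                finished_ = true;
                break;
            }
            const std::size_t size = parse_octal(buffer_.data() + 124, 12);
            const std::size_t padded = (size + kTarBlock - 1) / kTarBlock * kTarBlock;
            if (buffer_.size() < kTarBlock + padded) break;  // wait for more bytes

            TarEntry entry;
            std::size_t name_len = 0;
            while (name_len < 100 && buffer_[name_len] != '\0') ++name_len;
            entry.name = buffer_.substr(0, name_len);
            entry.data = buffer_.substr(kTarBlock, size);
            complete.push_back(entry);
            buffer_.erase(0, kTarBlock + padded);  // drop header, data and padding
        }
        return complete;
    }

    bool finished() const { return finished_; }

private:
    // Parse an ASCII octal field, skipping leading spaces.
    static std::size_t parse_octal(const char* p, std::size_t n) {
        std::size_t i = 0, value = 0;
        while (i < n && p[i] == ' ') ++i;
        for (; i < n && p[i] >= '0' && p[i] <= '7'; ++i) value = value * 8 + (p[i] - '0');
        return value;
    }

    std::string buffer_;
    bool finished_ = false;
};

// Demo helper (hypothetical): builds a simplified tar member with only the
// name and size fields filled in. Real tar tools write more header fields.
std::string make_tar_member(const std::string& name, const std::string& data) {
    assert(name.size() < 100);
    std::string member(kTarBlock, '\0');
    name.copy(&member[0], 99);
    std::snprintf(&member[124], 12, "%011lo", static_cast<unsigned long>(data.size()));
    member += data;
    member.append((kTarBlock - data.size() % kTarBlock) % kTarBlock, '\0');
    return member;
}
```

In a plugin you would call feed() with whatever the decompressor has produced so far; each returned TarEntry is usable immediately, long before the rest of the archive has been downloaded. This sketch waits for an entry's padding as well as its data, which keeps the bookkeeping simple.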
On the code to write,
O3D does this and is open source so you can look at the code
http://o3d.googlecode.com
The decompression code is in o3d/import/cross/...
It targets the NPAPI using some glue which can be found in o3d/plugin/cross

Check out the Boost.Iostreams zlib filters. They make using zlib a snap.
Here's the sample from the boost docs that will decompress a file and write it to the console:
#include <fstream>
#include <iostream>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/zlib.hpp>
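int main()
{
    using namespace std;
    // The filter types live in boost::iostreams, not in std.
    namespace io = boost::iostreams;
    ifstream file("hello.z", ios_base::in | ios_base::binary);
    io::filtering_streambuf<io::input> in;
    in.push(io::zlib_decompressor());  // decompress whatever comes from below
    in.push(file);                     // the compressed source
    io::copy(in, cout);
}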

Sure. zlib, for example, uses z_stream for incremental compression and decompression via the functions inflateInit, inflate, deflateInit and deflate. libbz2 (the bzip2 library) has similar abilities.
For incremental extraction from the archive (as it gets inflated), look at, e.g., the good old tar format.

Related

Downloading my programs data from a webserver (Its basically just a .exe turned into .txt) but when I put it into a .exe it does not run?

So currently I am using a basic HTTP request to pull the exe data from my server (weblink.com/Program.exe).
It returns my program in .txt form, but when I put it into a file it will not run.
I assume this is because I need metadata, but I have no clue how to find that process or even how to google something as specific as that. So I am asking either for a solution (how to add proper .exe metadata) or whether there is a better way to download files like that in C++.
*Note: I cannot use basic Windows functions such as DownloadToFileA or external libraries (like libcurl/curl).
OutFile.open(XorStr("C:\\Users\\Program.exe").c_str(), std::ios::out);
if (OutFile.is_open())
{
OutFile << Output;
//Initialize .exe Meta Data???
}
OutFile.close();
You need to open your file in binary mode, otherwise newline translation will screw up your executable:
OutFile.open(XorStr("C:\\Users\\Program.exe").c_str(), std::ios::out | std::ios::binary);
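For completeness, here is a small self-contained sketch of the same idea without the XorStr wrapper. The function names are made up for the example, and it assumes the downloaded bytes are held in a std::string, which can contain embedded NUL bytes. It uses write() rather than operator<< so that the exact byte count is written.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Write raw bytes to disk exactly as given (no newline translation).
bool write_binary(const std::string& path, const std::string& bytes) {
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return out.good();
}

// Read a whole file back as raw bytes, e.g. to verify what was written.
std::string read_binary(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Comparing the size of the file on disk with the Content-Length reported by the server is a quick way to check that nothing was altered on the way.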

Read only a part of a binary zip file

I am looking for a way to read a part of a binary zip file (starting position and number of bytes to read). Currently I'm investigating this on Windows, but optimally it would be platform independent. For a normal binary file (unzipped), this can be achieved in the following way:
//Open the file
std::ifstream file (path, std::ios::in | std::ios::binary | std::ios::ate);
//Move to the position to start reading
file.seekg(64);
//Read 128 bytes of the file
std::vector<unsigned char> mDataBuffer;
mDataBuffer.resize( 128 ) ;
file.read( (char*)( &mDataBuffer[0]), 128 ) ;
//Read as string
std::string s_data( mDataBuffer.begin(), mDataBuffer.end());
file.close();
This example is a slightly modified version of this one.
There are also many unzip packages available (e.g. zlib or minizip), each providing functions to unzip a file. I could simply unzip my zipped file, save it on the disk and read it using the method above.
Unfortunately, I didn't find an example to read only a part of a binary zip file (if that is even possible), straight from the zipped file. Because my file is quite large, I don't want to unzip it completely onto the hard drive. Furthermore, the part that I want to read is quite small, so it would be a waste of cpu time to completely unzip the file. For the same reasons, I also don't want to decompress the complete file into my memory. I am looking for a genuine way to read only a part of a zipped file.
How could this be accomplished in c++?
Apparently there is no general way to seek in zip files. This was according to:
A comment by @πάντα ῥεῖ.
A general thread on searching in zipped files here.
A similar question here (although the question itself is about Python).

Setting std::ofstream to zip archived file

I'd like to write some files directly to a zip archive file (rather than creating them first in some folder and copying them to the archive in a second stage).
Therefore, I'm wondering if there's an option to set the ofstream to point directly at the file inside the archive.
For example, say I have an archive in /tmp called data.zip, and inside it there's a file data1.log.
Can I do something like:
std::ofstream ostr("/tmp/data.zip/data1.log", std::ios::binary);
and start pushing data using the '<<' operator?
Thanks,
can i do something like :
std::ofstream ostr("/tmp/data.zip/data1.log", std::ios::binary);
and start pushing data using the '<<' operator ?
No, that's not possible.
Also note that the std::ostream& operator<<(std::ostream&,const T&) operator is intended for formatted text output, not for writing binary data.
To achieve this you would need a std::streambuf implementation that wraps the incoming character data into a file that is (finally?) compressed and added to the archive.
The C++ standard library has no notion of how to magically interact with binary .zip files.
Also, what would you mean by "start pushing data"? A .zip archive also contains information about particular compressed file names and their relative paths.
How would you interact with the std::ofstream interface to specify which file's data to actually add?
You should research C++ wrappers for the LZMA/7zip library that let you control adding files to archives.
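To give an idea of what the std::streambuf approach mentioned above might look like, here is a minimal sketch of a stream buffer that just collects everything written through an std::ostream into memory. The class name MemoryEntryBuf is invented for this example. It does not compress anything or touch a .zip file; the assumption is that you would later pass the collected bytes, together with the entry name, to whatever archive library you choose.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Collects all characters written to it. A real implementation could
// compress in these hooks instead of merely storing the bytes.
class MemoryEntryBuf : public std::streambuf {
public:
    const std::string& bytes() const { return bytes_; }

protected:
    // Called for single characters, since no put area is set up.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            bytes_.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }

    // Called for bulk writes such as ostream::write or string insertion.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        bytes_.append(s, static_cast<std::size_t>(n));
        return n;
    }

private:
    std::string bytes_;
};
```

Usage would be along the lines of: create a MemoryEntryBuf buf, construct std::ostream out(&buf), write with out << ... or out.write(...), and when done hand buf.bytes() to the archive library under the name "data1.log". This also answers the "which file?" question: the entry name is supplied to the library, not encoded in a path given to std::ofstream.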

How to open a gzip file using fopen (or a function with the same return value as fopen) in C++?

I currently have some code reading files which are not compressed, it uses the following approach to read a file in C++
FILE* id = fopen("myfile.dat", "r");
after obtaining id, different parts of the code access the file using fread, fseek, etc.
I would like to adapt my code so as to open a gzip version of the file, e.g. "myfile.dat.gz" without needing to change too much.
Ideally I would implement a wrapper around fopen, call it fopen2, which can read both myfile.dat and myfile.dat.gz, i.e. it should return a pointer to a FILE object, so that the rest of the code does not need to be changed.
Any suggestions?
Thank you.
PS: it would be fine to decompress the whole file in memory, if this approach provides a solution
zlib provides analogs of fopen(), fread(), etc. called gzopen(), gzread(), etc. for reading and writing gzip files. If the file is not gzip-compressed, it will be read just as the f functions would. So you would only need to change the function names and link in zlib.

Loading files merged with executable

I'm trying to merge a file with my executable, and read the merged file. I merge them with the Windows command:
copy /b Game.exe+Image.jpg TheGame.exe
Here's what I've tried:
std::ifstream f("Image.jpg");
if (f.good()) {
std::cout << "Found Image.jpg" << std::endl;
}
Image.jpg is in the same directory as the resulting executable file, and it works. However when I use the command to merge them and then delete the Image.jpg file it is not found (although it is merged with the executable.)
Any suggestions?
ifstream only works with external files. You deleted the file it is trying to open, so of course it will not find the file. What you are attempting cannot (easily) be done using binary merges. If you want to store a file inside of an executable, the correct approach is to store it in a resource instead. Read the following chapter on MSDN for more information.
Introduction to Resources
In particular, the following example shows how to create a new resource in an .exe file and write data into it. The example copies a resource from another .exe file, but you can write whatever you want. In this case, replace RT_DIALOG with RT_RCDATA, and write your image data.
Using Resources