iostream to zlib and files with C++? - c++

.NET has spoiled me and made me realize how simple certain things can be :(
With C++ I'd like to use either fopen or ostream/istream to push data to zlib directly, or to some kind of memory buffer (and then zlib), and then dump it to a file. I'd like something similar to load it back in.
I looked at zlib's example, and while it looks simple, it isn't an iostream or a file, and I need to use buffers. Does anyone know of an existing solution?

You need to use the Boost.Iostreams API to read the zipped files and create an iostream. I wrote code for this based on the Boost documentation. The code, and a related question on how to read the zipped file line by line, is available here:
How can I read line-by-line using Boost IOStreams' interface for Gzip files?

Related

How do I input and output various file types in c++

I've seen a lot of examples of I/O with text files. I'm just wondering if you can do the same with other file types like MP3s, JPGs, ZIP files, etc.?
Will iostream and fstream work for all of these or do I need another library? Do I need a new SDK?
It's all binary data so I'd think it would be that simple. But I've been unpleasantly surprised before.
Could I convert all files to text or binary?
It depends on what you mean by "work".
You can think of those files as a book written in Greek.
If you want to just mess with binary representation (display text in Greek on screen) then yes, you can do that.
If you want to actually extract some info: edit video stream, remove voice from audio (actually understand what is written), then you would need to either parse file format yourself (learn Greek) or use some specialized library (hire a translator).
Either way, file streams are suited to accessing the data in those files (and many libraries use them under the hood).
You can work on binary streams by opening them with the binary openmode:
std::ifstream ifs("mydata.mp3", std::ios_base::binary);
Then you can read and write any binary content. However, if you need to generate or modify such content, play a video, or display a picture, then you need to know the inner details of the format you are using. This can be extremely complex, so a library is recommended. And even with a library, advanced programming skills are required.
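As a rough illustration of the binary-mode approach above, here is a minimal sketch that reads a whole file into a byte vector and writes one back. The helper names read_all_bytes and write_all_bytes are made up for this example, and error handling is reduced to an empty vector or a false return.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read an entire file into memory as raw bytes.
// Returns an empty vector if the file cannot be opened.
std::vector<std::uint8_t> read_all_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);  // binary: no newline translation
    if (!in) return {};
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                     std::istreambuf_iterator<char>());
}

// Hypothetical helper: write raw bytes to a file, replacing existing content.
bool write_all_bytes(const std::string& path,
                     const std::vector<std::uint8_t>& data) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size()));
    return static_cast<bool>(out);
}
```

Calling read_all_bytes("mydata.mp3") would give you the raw bytes of the file; interpreting them is still up to you or to a format library.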
Examples of open source libraries: ffmpeg for common audio/video formats, portaudio for audio, CImg for image processing (in C++), libpng for the PNG graphics format, libjpeg for JPEG. Note that most libraries offer a C API.
Some operating systems also support some file types natively (for example, Windows bitmaps).
You can open these files using fstream, but the important thing to note is you must be intricately aware of what is contained within the file in order to process it.
If you just want to open it and spit out junk, then you can definitely just start at the first line of the file and exhaustively push all data into your console.
If you know what the file looks like on the inside, then you can process it just as you would any other file.
There may be specific libraries for processing specific files, but the fstream library will allow you to access any file you'd like.
All files are just bytes. There's nothing stopping you from reading/writing those bytes however you see fit.
The trick is doing something useful with those bytes. You could read the bytes from a .jpg file, for example, but you have to know what those bytes mean, and that's complicated. Usually it's best to use libraries written by people who know about the format in question, and let them deal with that complexity.
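To make "knowing what the bytes mean" concrete, here is a small sketch that guesses a file type from its first few bytes. It only checks the well-known leading signatures of JPEG, PNG and ZIP; the function name guess_type is invented for this example, and a real format library does far more than this.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: guess a file type from the leading "magic" bytes.
// `bytes` holds the first few bytes of the file (8 are enough here).
std::string guess_type(const std::string& bytes) {
    auto starts_with = [&bytes](const std::string& sig) {
        return bytes.compare(0, sig.size(), sig) == 0;
    };
    if (starts_with("\xFF\xD8\xFF")) return "jpeg";       // JPEG start-of-image
    if (starts_with("\x89PNG\r\n\x1a\n")) return "png";   // 8-byte PNG signature
    if (starts_with("PK\x03\x04")) return "zip";          // ZIP local file header
    return "unknown";
}
```

Recognising the signature is only the first step: decoding the pixels or the archive entries behind it is where the libraries mentioned above earn their keep.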

Opening an existing .doc file using ofstream in C++

Assuming I have a file with a .doc extension on the Windows platform, how can I open the file and output its contents on the screen using the ofstream object in C++? I am aware that the object can be used to open files in text and binary modes. But I would like to know if a .doc (or even .pdf) file can be opened and its contents read.
I've never actually done this before, but after reading up on it, I think I have a suggestion. The .docx format (unlike the older binary .doc format) is essentially XML that is zipped up. After unzipping, the main document is located at word/document.xml. Doing this in a program is where it gets fun.
Two options: If you're using C++ CLR (.NET) then Microsoft has an SDK for you. It should make it pretty easy to open Office documents.
Otherwise if you're just using regular C++, you might have to do some extra work.
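Open the file and unzip it using a ZIP library, for example minizip, which ships in zlib's contrib directory (zlib on its own handles the compression but not the ZIP container)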
Find the document.xml file inside
Parse the XML document. You'll probably want to use some kind of XML parsing library for this. You'll have to look up the specs for the XML to figure out how to get the text you want.
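As a rough sketch of the last step, the code below pulls the visible text out of an already unzipped document.xml by scanning for <w:t> elements, which is where WordprocessingML keeps its text runs. This is a deliberately naive string scan used as a stand-in for a real XML parser: it does not decode entities such as &amp;, ignores paragraph breaks, and the name extract_wt_text is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: concatenate the contents of all <w:t> elements.
// Naive scan, not a real XML parser (no entity decoding, no namespaces).
std::string extract_wt_text(const std::string& xml) {
    std::string out;
    std::size_t pos = 0;
    while ((pos = xml.find("<w:t", pos)) != std::string::npos) {
        const std::size_t after = pos + 4;
        // Accept "<w:t>" or "<w:t attr=...>", but skip "<w:tbl>", "<w:tab/>", ...
        if (after >= xml.size() || (xml[after] != '>' && xml[after] != ' ')) {
            pos = after;
            continue;
        }
        const std::size_t open_end = xml.find('>', after);
        if (open_end == std::string::npos) break;
        if (xml[open_end - 1] == '/') {  // self-closing element, no text
            pos = open_end + 1;
            continue;
        }
        const std::size_t close = xml.find("</w:t>", open_end + 1);
        if (close == std::string::npos) break;
        out.append(xml, open_end + 1, close - open_end - 1);
        pos = close + 6;  // continue after "</w:t>"
    }
    return out;
}
```

For anything beyond a quick experiment, an XML parsing library as suggested above is the safer choice.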
The C++ standard library has the ifstream class, which can be used to read simple text files and binary files too.
It is up to you to interpret the bytes in the file. To interpret a binary file properly, you need to know the format of the file.
If you are thinking of MS Word files, then I would start here: http://en.wikipedia.org/wiki/Office_Open_XML to understand the MS Word 2007 format.
You might find the Boost Iostreams library ( http://www.boost.org/doc/libs/1_52_0/libs/iostreams/doc/home.html ) useful if you want to write a filter yourself.

Saving a webpage to disk using C++

I've managed to download a "file" from the internet with the help of wininet library, but I can't seem to save a "webpage" i.e. something I can edit later on with a text editor or with ifstream.
In this case, what are the tools I should resort to? Can wininet save a webpage to disk? Should I consider cURL (though I haven't managed to download regular files due to lack of documentation of cURL)? Do I need to learn what's called socket programming?
NB: I'm on Windows, using MinGW but can switch to MSVC if necessary, I'm looking for source code in the webpage, eventually I'm after the text in a webpage.
Also, I am not familiar with any of the functions in wininet, curl, or sockets. What do I need to learn of these?
Any help is greatly appreciated!
If your program is going to run on both Windows and Unix, then use cURL. Otherwise, stick with MSVC and the WinINet functions: http://msdn.microsoft.com/en-us/library/windows/desktop/aa385473(v=vs.85).aspx WinINet takes much less effort to get your program running and distributed, especially if you're not linking your program against libcurl statically (in that case you'll need to ship libcurl.dll everywhere your program runs on Windows). With WinINet, you simply need to include a header and link a library to use the functions.
If you're going to use WinINet, refer to this code snippet: http://www.programmershelp.co.uk/showcode.php?e=57
Use the same code except for the while loop. Instead of reading one byte at a time, read the data in chunks and write each chunk to the output file handle.
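The chunked loop described above might look roughly like the sketch below. It is written against std::istream and std::ostream so that it stays portable; with WinINet the read call would be InternetReadFile rather than istream::read, but the shape of the loop is the same. The function name copy_in_chunks and the 4096-byte default are choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: copy everything from `in` to `out` in fixed-size
// chunks instead of byte by byte. Returns the number of bytes copied.
std::size_t copy_in_chunks(std::istream& in, std::ostream& out,
                           std::size_t chunk_size = 4096) {
    assert(chunk_size > 0);
    std::vector<char> buffer(chunk_size);
    std::size_t total = 0;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();  // may be short on last chunk
        if (got <= 0) break;
        out.write(buffer.data(), got);
        if (!out) break;                          // stop on write failure
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

To save to disk, pass a std::ofstream opened with std::ios::binary as the output, so the page is written exactly as it was received.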
If you're going to use cURL, refer to this post: Download file using libcurl in C/C++

C++ file container (e.g. zip) for easy access

I have a lot of small files I need to ship with an application I build, and I want to put these files into an archive to make copying and redistributing easier.
I also really like the idea of having them all in one place so I need to compare the md5 of one file only in case something goes wrong.
I'm thinking about a class which can load the archive and return a list of files within the archive and load a file into memory if I need to access it.
I already searched the Internet for different methods of achieving what I want and found out about zlib and the lzma sdk.
Neither really appealed to me: I couldn't find out how portable zlib is, and I didn't like the LZMA SDK because it is just too much and I don't want to bloat the application because of this problem. Another downside of zlib is that I don't have the C/C++ experience (I'm really new to C++) to understand everything explained in the manual.
I also have to add that this is a time-critical problem. I thought for some time about implementing a simple format like tar in a way that lets me easily access the files within my application, but I just haven't found the time to do that yet.
So what I'm searching for is a library that allows me to access the files within an archive. I'd be glad if anybody could point me in the right direction here.
Thanks in advance,
Robin.
Edit: I need the archive to be accessed under Linux and Windows. Sorry I didn't mention that in the beginning.
For zipping, I've always been partial to ZipUtils, which makes the process easy and is built on top of the zlib and info-zip libraries.
The answer depends on whether you plan to modify the archive via code after the archive is initially built.
If you don't need to modify it, you can use TAR: it's a handy and simple format. If you want compression, you can implement a tar.gz reader or find a library that does this (I believe there are some available, including open-source ones).
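To give an idea of how simple the uncompressed TAR layout is, here is a minimal sketch that lists the entries of a tar archive already loaded into memory. It relies only on the basic header layout (512-byte blocks, the name in the first 100 bytes, the size as octal text at offset 124) and ignores checksums, entry types, the ustar name prefix and GNU extensions. TarEntry and list_tar are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record describing one file inside the archive.
struct TarEntry {
    std::string name;
    std::size_t size;         // payload size in bytes
    std::size_t data_offset;  // where the payload starts in the archive
};

// Hypothetical helper: walk the 512-byte headers of an in-memory tar archive.
std::vector<TarEntry> list_tar(const std::string& archive) {
    constexpr std::size_t kBlock = 512;
    std::vector<TarEntry> entries;
    std::size_t pos = 0;
    while (pos + kBlock <= archive.size()) {
        const char* header = archive.data() + pos;
        if (header[0] == '\0') break;  // zero block marks the end of the archive

        // Name: up to 100 bytes, NUL-terminated when shorter.
        std::size_t name_len = 0;
        while (name_len < 100 && header[name_len] != '\0') ++name_len;
        std::string name(header, name_len);

        // Size: octal digits in the 12-byte field at offset 124.
        std::size_t size = 0;
        std::size_t i = 124;
        while (i < 136 && header[i] == ' ') ++i;
        for (; i < 136 && header[i] >= '0' && header[i] <= '7'; ++i)
            size = size * 8 + static_cast<std::size_t>(header[i] - '0');

        entries.push_back({std::move(name), size, pos + kBlock});
        // Payload is padded up to a whole number of blocks.
        pos += kBlock + (size + kBlock - 1) / kBlock * kBlock;
    }
    return entries;
}
```

Once you have an entry, archive.substr(entry.data_offset, entry.size) yields the content of that file, which covers the "list the files and load one into memory" use case from the question.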
If your application needs random access to the data or needs to modify the archive, then regular TAR or ZIP archives are not a good fit. A virtual file system such as our SolFS or CodeBase File System will fit much better: virtual file systems are suited to frequent modifications of the storage, while archives mainly target write-once-read-many usage scenarios.
zlib is highly portable and very widely used. If you can't make sense of its interface, there are alternatives for many other languages - see 'Related External Links' here.
Take another look before you search for something different.
If you're using Qt or Windows you can also pack data into the executable's resource area. You would only have to distribute the executable file using this technique. There's a well defined API already written and tested to access that data.
The zlib API is the way to go. Simple and portable. Look at the unzip.h header for the APIs that access archive files. It is in C and very easy.
If the files are small, you can dump them into string literals (search for the bin2h utility) and include them in your project. Then change the code that reads the files. If all files are currently read using the ifstream class, simply change it to the istringstream class and recompile the code.
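The ifstream-to-istringstream swap works most smoothly when the reading code takes a std::istream& rather than a concrete stream type. A minimal sketch follows, in which kEmbeddedConfig stands in for whatever a bin2h-style tool would generate, and find_value is a made-up example of existing parsing code.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Stand-in for generated data: the content of a small file as a literal.
// (For binary data with embedded NULs, construct the std::string with an
// explicit length instead of relying on the terminating NUL.)
static const char kEmbeddedConfig[] = "width=640\nheight=480\n";

// Hypothetical parsing code written against std::istream, so it accepts a
// std::ifstream and a std::istringstream alike.
std::string find_value(std::istream& in, const std::string& key) {
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t eq = line.find('=');
        if (eq != std::string::npos && line.compare(0, eq, key) == 0)
            return line.substr(eq + 1);
    }
    return {};  // key not found
}
```

Where the code used to open a std::ifstream on the file, it would now build std::istringstream in(kEmbeddedConfig); and pass that to find_value unchanged.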
Try using Quazip - it's quite simple to use. You can use it as a stream from which you read the compressed file on the fly.

extracting compressed file with boost::iostreams

I'm searching for a way to extract a file in C++ by using the boost::iostreams classes.
There is an example in the boost documentation. But it outputs the content of the compressed file to std::cout.
I'm looking for a way to extract it to a file structure.
Does anybody know how to do that?
Thanks!
Boost.IOStreams does not support compressed archives, just single compressed files. If you want to extract a .zip or .tar file to a directory tree, you'll need to use a different library.
The example in the documentation shows how to decompress the file and push the result to another stream.
If you want the output to be directed to an in-memory array instead, you can use a stream of type boost::iostreams::stream<boost::iostreams::array_source>.
That is basically a stream-wrapper around an array.
I'm not sure what you mean when you say you want the output in "a file structure" though.
Looks to me like the call to boost::iostreams::copy takes an ostream as the second parameter. Have you tried creating an ofstream with your output file name and using that?
You probably don't want that library. You might want to look around for some others.
E.g. zziplib