I have a task to create a C++ console application that can access various data in a zip file (editing the comment, reading the names of the files in the archive, reading data in the local headers or the central directory, etc.). I am allowed to use only basic libraries.
I did some googling and found the zip file structure, then I wrote a simple program to check whether I could read any data with ifstream. It returned various characters (which, after further googling, seem to represent hexadecimal values in UTF-8 encoding). That's where my fairly limited knowledge, and what I can find on Google, ends.
How do I properly read the various pieces of information stored in the zip file, then?
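A minimal sketch of the usual approach with only the standard library: locate the End of Central Directory (EOCD) record near the end of the file, read the archive comment from it, then walk the central directory to list the file names. It assumes a single-disk archive without Zip64 extensions, and the names ZipInfo and readZipInfo are hypothetical. Zip stores multi-byte fields little-endian, which is why they are assembled byte by byte instead of being read straight into an int.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Assemble little-endian fields byte by byte (independent of host byte order).
static std::uint32_t le16(const unsigned char* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8);
}
static std::uint32_t le32(const unsigned char* p) {
    return le16(p) | (le16(p + 2) << 16);
}

struct ZipInfo {
    std::vector<std::string> names;  // entry names from the central directory
    std::string comment;             // archive comment stored in the EOCD record
};

ZipInfo readZipInfo(std::istream& in) {
    // The EOCD record is 22 bytes plus a comment of up to 65535 bytes,
    // so it must lie within the last 22 + 65535 bytes of the file.
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    const std::streamoff tailLen = std::min<std::streamoff>(size, 22 + 65535);
    std::vector<unsigned char> tail(static_cast<std::size_t>(tailLen));
    in.seekg(size - tailLen);
    in.read(reinterpret_cast<char*>(tail.data()), tailLen);

    // Scan backwards for the EOCD signature "PK\x05\x06" (0x06054b50).
    std::ptrdiff_t eocd = -1;
    for (std::ptrdiff_t i = static_cast<std::ptrdiff_t>(tail.size()) - 22; i >= 0; --i)
        if (le32(&tail[static_cast<std::size_t>(i)]) == 0x06054b50u) { eocd = i; break; }
    if (eocd < 0) throw std::runtime_error("EOCD record not found");

    const unsigned char* e = &tail[static_cast<std::size_t>(eocd)];
    const std::uint32_t entries = le16(e + 10);   // total number of entries
    const std::uint32_t cdOffset = le32(e + 16);  // start of the central directory
    const std::uint32_t commentLen = le16(e + 20);
    if (static_cast<std::size_t>(eocd) + 22 + commentLen > tail.size())
        throw std::runtime_error("truncated archive comment");

    ZipInfo info;
    info.comment.assign(reinterpret_cast<const char*>(e + 22), commentLen);

    // Each central directory header is 46 fixed bytes followed by the
    // file name, an extra field and a per-file comment.
    in.clear();
    in.seekg(cdOffset);
    for (std::uint32_t n = 0; n < entries; ++n) {
        unsigned char h[46];
        if (!in.read(reinterpret_cast<char*>(h), sizeof h) || le32(h) != 0x02014b50u)
            throw std::runtime_error("bad central directory header");
        const std::uint32_t nameLen = le16(h + 28);
        const std::uint32_t skip = le16(h + 30) + le16(h + 32);  // extra + comment
        std::string name(nameLen, '\0');
        if (!in.read(&name[0], nameLen)) throw std::runtime_error("truncated name");
        info.names.push_back(name);
        in.seekg(skip, std::ios::cur);
    }
    return info;
}

// Convenience overload: open the archive in binary mode (no newline translation).
ZipInfo readZipInfo(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) throw std::runtime_error("cannot open " + path);
    return readZipInfo(file);
}
```

For example, readZipInfo(std::string("archive.zip")).names lists the entries. Editing the comment follows the same path in reverse: since the EOCD record is the last thing in the file, you rewrite its comment-length field (offset 20) and the bytes after it.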
Is there a mature compression format that allows arbitrary operations on the items inside (delete/insert/update) without requiring the full archive to be recreated?
I'm aware of Sqlar, based on the SQLite file format, which naturally supports this, since those operations are just deleting/inserting/updating records containing blobs. But it is more of an experimental project created with other goals in mind, and it is not widely adopted.
UPDATE: To be more precise about what I have in mind: this is more like a file system inside the archive, where inserted files might occupy different "sectors" inside the container, depending on previous delete and update operations. But each file's "chain" is compressed as it is added, so it effectively occupies less space than the original file.
The .zip format. You may need to copy the zip file contents to do a delete, but you don't need to recreate the archive.
Update:
The .zip format can, in principle, support the deletion and addition of entries without copying the entire zip file, as well as the re-use of the space from deleted entries. The central directory at the end can be updated and cheaply rewritten. I have heard of it being done. You would have to deal with fragmentation, as with any file system. I am not aware of an open-source library that supports using a zip file as a file system. The .zip format does not support breaking an entry into sectors that could be scattered across the zip file, as file systems do. A single entry has to be contiguous in a zip file.
I'm looking to replace the zip library that I am using in a small utility with something a bit better.
One of the deficiencies in the library I am currently using is that it doesn't appear to validate zip files very well: I can corrupt a file by changing random characters and the library doesn't notice.
I am looking for a C++ zip library that has a function to validate a zip file without extracting all the files in the archive.
Someone recommended ziplib to me, but I don't see anything in it about checking the integrity of a zip archive.
Does anyone know if ziplib has this capability? Or have a better recommendation?
Libraries like libzip and libarchive allow you to read archive entries a chunk at a time. You can simply read the entire archive to verify it, repeatedly overwriting the same buffer in memory with the decompressed data and thereby discarding it.
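What makes this reading pass a validation is essentially CRC-32: each entry records the CRC-32 of its uncompressed data, and the reader recomputes it over the bytes that come out. A minimal sketch of that check, assuming the data has already been decompressed: a streaming CRC-32 (the reflected polynomial 0xEDB88320 that zip uses) fed from one reused buffer. Crc32 and crcMatches are hypothetical names, not part of libzip or libarchive.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>

// Streaming CRC-32 as used by zip: reflected polynomial 0xEDB88320,
// initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF.
class Crc32 {
public:
    Crc32() {
        for (std::uint32_t i = 0; i < 256; ++i) {
            std::uint32_t c = i;
            for (int k = 0; k < 8; ++k)
                c = (c & 1u) ? 0xEDB88320u ^ (c >> 1) : (c >> 1);
            table_[i] = c;
        }
    }
    // Fold one chunk into the running checksum; chunks may be any size.
    void update(const unsigned char* data, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i)
            crc_ = table_[(crc_ ^ data[i]) & 0xFFu] ^ (crc_ >> 8);
    }
    std::uint32_t value() const { return crc_ ^ 0xFFFFFFFFu; }

private:
    std::array<std::uint32_t, 256> table_{};
    std::uint32_t crc_ = 0xFFFFFFFFu;
};

// Read (already decompressed) entry data in fixed-size chunks, overwriting the
// same buffer each time, and compare with the CRC-32 recorded in the archive.
bool crcMatches(std::istream& in, std::uint32_t expectedCrc) {
    Crc32 crc;
    std::array<char, 4096> buf;
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) || in.gcount() > 0)
        crc.update(reinterpret_cast<const unsigned char*>(buf.data()),
                   static_cast<std::size_t>(in.gcount()));
    return crc.value() == expectedCrc;
}
```

Memory use stays at one buffer no matter how large the entry is, which is the point of reading chunk by chunk.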
I've seen a lot of examples of I/O with text files; I'm just wondering whether you can do the same with other file types, like MP3s, JPGs, zip files, etc.?
Will iostream and fstream work for all of these, or do I need another library? Do I need a new SDK?
It's all binary data, so I'd think it would be that simple. But I've been unpleasantly surprised before.
Could I convert all files to text or binary?
It depends on what you mean by "work".
You can think of those files as a book written in Greek.
If you just want to mess with the binary representation (display the Greek text on screen), then yes, you can do that.
If you want to actually extract some information (edit a video stream, remove the voice from audio; that is, actually understand what is written), then you need to either parse the file format yourself (learn Greek) or use a specialized library (hire a translator).
Either way, file streams are well suited to accessing those files' data (and many libraries use them under the hood).
You can work with binary files by opening the stream in binary mode:
std::ifstream ifs("mydata.mp3", std::ios_base::binary);
Then you can read and write any binary content. However, if you need to generate or modify such content, play a video, or display a picture, then you need to know the inner details of the format you are using. This can be extremely complex, so a library is recommended. And even with a library, advanced programming skills are required.
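A minimal sketch, using only the standard library, of reading an entire file of any type into a byte vector, writing bytes back, and showing the first bytes as hex. Printing raw bytes as text shows arbitrary characters; printing them as hex shows the actual values. readAllBytes, writeAllBytes and hexDump are hypothetical helper names.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Read every byte of any file (mp3, jpg, zip, ...) into memory.
// std::ios::binary stops the runtime from translating line endings.
std::vector<unsigned char> readAllBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Write raw bytes back out, again in binary mode.
void writeAllBytes(const std::string& path, const std::vector<unsigned char>& bytes) {
    std::ofstream out(path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot create " + path);
    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
}

// Show the first `max` bytes as hex, e.g. "50 4B 03 04" for a typical zip file.
std::string hexDump(const std::vector<unsigned char>& bytes, std::size_t max) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size() && i < max; ++i) {
        if (i > 0) out += ' ';
        out += digits[bytes[i] >> 4];
        out += digits[bytes[i] & 0x0F];
    }
    return out;
}
```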
Examples of open-source libraries: ffmpeg for common audio/video formats, portaudio for audio, CImg for image processing (in C++), libpng for the PNG format, libjpeg for JPEG. Note that most of these libraries offer a C API.
Some operating systems also natively support certain file types (for example, Windows bitmaps).
You can open these files using fstream, but the important thing to note is you must be intricately aware of what is contained within the file in order to process it.
If you just want to open it and spit out junk, then you can definitely start at the first byte of the file and dump all of the data to your console.
If you know what the file looks like on the inside, then you can process it just as you would any other file.
There may be specific libraries for processing specific files, but the fstream library will allow you to access any file you'd like.
All files are just bytes. There's nothing stopping you from reading/writing those bytes however you see fit.
The trick is doing something useful with those bytes. You could read the bytes from a .jpg file, for example, but you have to know what those bytes mean, and that's complicated. Usually it's best to use libraries written by people who know about the format in question, and let them deal with that complexity.
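As a small example of giving bytes a meaning: many formats begin with a fixed signature ("magic number"), so a file can often be identified from its first bytes regardless of its extension. A sketch with a few well-known signatures (JPEG FF D8 FF; PNG 89 'PNG' 0D 0A 1A 0A; the zip local file header 'PK' 03 04; MP3 files that begin with an ID3v2 tag). detectFormat is a hypothetical name, and real format detectors check many more cases.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Guess the format from the leading bytes, ignoring the file name.
// Only a handful of well-known signatures are checked.
std::string detectFormat(const std::vector<unsigned char>& bytes) {
    auto startsWith = [&bytes](std::initializer_list<unsigned char> sig) {
        if (bytes.size() < sig.size()) return false;
        std::size_t i = 0;
        for (unsigned char c : sig)
            if (bytes[i++] != c) return false;
        return true;
    };
    if (startsWith({0xFF, 0xD8, 0xFF})) return "jpeg";
    if (startsWith({0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A})) return "png";
    if (startsWith({'P', 'K', 0x03, 0x04})) return "zip";  // also .jar, .docx, ...
    if (startsWith({'I', 'D', '3'})) return "mp3 (ID3v2 tag)";
    return "unknown";
}
```

A .jar file, for instance, is reported as "zip", because it is a zip archive with a different extension. MP3 files without an ID3 tag start directly with a frame header and are not recognized by this sketch.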
I am using .txt files in my program for reading and writing records (the records contain both text and numbers). Recently I learned that .dat files can also be used like .txt files for file operations. I would like to know the difference between the two and the advantages and disadvantages of each.
Text (.txt) files are easy for people to read but a bit harder for programs to parse, whereas .dat is usually used to store data that is not just plain text.
Generally, .txt files contain letters, digits and symbols that are human-readable.
.dat files are often binary, so their data is not always printable on screen.
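To make the trade-off concrete, here is a sketch of the same hypothetical record (a name, an id and a score) stored as a text line and as raw binary fields. Whether the resulting file is called .txt or .dat changes nothing; what differs is the layout of the bytes. Record, writeText, readText, writeBinary and readBinary are illustrative names.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical record mixing text and numbers, as in the question.
struct Record {
    std::string name;
    std::int32_t id;
    double score;
};

// Text layout: one tab-separated line per record. Readable in any editor, but
// it must be parsed back and the name must not contain tabs or newlines.
// Doubles use the stream's default precision (6 significant digits).
void writeText(std::ostream& out, const Record& r) {
    out << r.name << '\t' << r.id << '\t' << r.score << '\n';
}

bool readText(std::istream& in, Record& r) {
    std::string line;
    if (!std::getline(in, line)) return false;
    std::istringstream fields(line);
    return std::getline(fields, r.name, '\t') && (fields >> r.id >> r.score);
}

// Binary layout: 4-byte name length, name bytes, then the raw bytes of id and
// score. Compact and exact, but byte order and type sizes are those of the
// machine that wrote it. Files holding this must be opened with std::ios::binary.
void writeBinary(std::ostream& out, const Record& r) {
    const std::uint32_t len = static_cast<std::uint32_t>(r.name.size());
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(r.name.data(), len);
    out.write(reinterpret_cast<const char*>(&r.id), sizeof r.id);
    out.write(reinterpret_cast<const char*>(&r.score), sizeof r.score);
}

bool readBinary(std::istream& in, Record& r) {
    std::uint32_t len = 0;
    if (!in.read(reinterpret_cast<char*>(&len), sizeof len)) return false;
    r.name.assign(len, '\0');
    in.read(&r.name[0], len);
    in.read(reinterpret_cast<char*>(&r.id), sizeof r.id);
    in.read(reinterpret_cast<char*>(&r.score), sizeof r.score);
    return static_cast<bool>(in);
}
```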
The extension of a file is a hint so that the operating system (or user) can choose an appropriate program to open it; it does not determine the actual file contents. There are conventions about which extensions to use, but nothing keeps you from using any arbitrary extension for your files. For instance, you can rename a .jar file to .zip and open it with pkunzip.
So for C++ the extension does not matter, but for you as a programmer it may hint at the file contents, e.g. whether to open the file in text or binary mode.
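In most languages, including C and C++, the file type makes no difference to file operations (read, write or edit).
If you want to work with binary files, though, open them in binary mode. In text mode the runtime may translate line endings (for example CRLF to LF on Windows), and on Windows a Ctrl+Z byte (0x1A) may be treated as end of file. A \0 byte does not end a file in either mode, although functions that treat the data as C strings stop at it. .dat files are often binary too, but the extension alone doesn't guarantee that.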
If you want to store and read structured data, XML files and sometimes DAT files are better, because good libraries exist to read them and you don't have to hand-parse text files.
I'm working on an application that must encrypt and zip files. I create some data in memory (text, binary or whatever), encrypt it and save it to disk (file1 and file2). Then I call e.g. "zip out.zip file1 file2".
I do not want to save these files to disk, but rather create the zip immediately and pack these files from memory.
How should I do that?
Thanks a lot!
You could try to use the zlib library to be able to create zip files from memory buffers.
Boost.Iostreams could also be a good solution.
For zlib there is a zip extension called minizip in the contrib directory. For minizip, you can find code to work with in-memory buffers on the author's page:
Justin Fletcher wrote a very simple implementation of a memory access method for the ioapi code (ioapi_mem_c.zip).
Note that you must compress first and then encrypt: well-encrypted data looks random, so it can't be meaningfully compressed afterwards.
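One rough way to see why the order matters is to measure byte entropy: compressors exploit redundancy, while output from a good cipher looks uniformly random, close to 8 bits per byte. A sketch with a hypothetical helper bitsPerByte; note that this order-0 measure only counts byte frequencies, so it rates repetitive data such as a repeated 0..255 sequence at 8 bits per byte even though that data compresses very well. It is an indicator, not a compressibility test.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Order-0 Shannon entropy in bits per byte: 0 when every byte is the same,
// 8 when all 256 byte values occur equally often. Only byte frequencies
// are considered, not repeated patterns.
double bitsPerByte(const std::vector<unsigned char>& data) {
    if (data.empty()) return 0.0;
    std::array<std::size_t, 256> counts{};
    for (unsigned char c : data) ++counts[c];
    double h = 0.0;
    for (std::size_t n : counts) {
        if (n == 0) continue;
        const double p = static_cast<double>(n) / static_cast<double>(data.size());
        h -= p * std::log2(p);
    }
    return h;
}
```

Ordinary text usually scores noticeably below 8 by this measure, leaving room for a compressor; ciphertext scores close to 8.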
Interestingly enough, I wasn't able to find a library to create ZIP files from C. zlib only lets you (de)compress the individual entries of a ZIP archive.
It comes with contrib/minizip; maybe that can get you started.
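If only the standard library is available, a zip archive can also be built in memory without zlib by storing the entries uncompressed (method 0). A minimal sketch, assuming the data was already encrypted by your own code (so compression would gain little anyway), no Zip64 (every size and offset below 4 GiB), and a fixed 1980-01-01 timestamp. MemFile, buildStoredZip and zipCrc32 are hypothetical names, and zip's own encryption is not implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory file: an archive name plus its (already encrypted) bytes.
struct MemFile {
    std::string name;
    std::vector<unsigned char> data;
};

namespace {
// CRC-32 of the stored bytes, as the zip format requires (polynomial 0xEDB88320).
std::uint32_t zipCrc32(const std::vector<unsigned char>& bytes) {
    std::uint32_t c = 0xFFFFFFFFu;
    for (unsigned char b : bytes) {
        c ^= b;
        for (int k = 0; k < 8; ++k) c = (c & 1u) ? 0xEDB88320u ^ (c >> 1) : (c >> 1);
    }
    return c ^ 0xFFFFFFFFu;
}
// Zip fields are little-endian.
void put16(std::vector<unsigned char>& o, std::size_t v) {
    o.push_back(static_cast<unsigned char>(v & 0xFF));
    o.push_back(static_cast<unsigned char>((v >> 8) & 0xFF));
}
void put32(std::vector<unsigned char>& o, std::size_t v) {
    put16(o, v & 0xFFFF);
    put16(o, (v >> 16) & 0xFFFF);
}
void putBytes(std::vector<unsigned char>& o, const std::string& s) {
    o.insert(o.end(), s.begin(), s.end());
}
}  // namespace

// Build a complete archive in memory; every entry is "stored" (no compression).
std::vector<unsigned char> buildStoredZip(const std::vector<MemFile>& files) {
    const std::size_t dosTime = 0;                            // 00:00:00
    const std::size_t dosDate = (0 << 9) | (1 << 5) | 1;      // year-1980, month, day
    std::vector<unsigned char> out, central;
    for (const MemFile& f : files) {
        const std::uint32_t crc = zipCrc32(f.data);
        const std::size_t localOffset = out.size();

        // Local file header, then the raw bytes.
        put32(out, 0x04034b50);                    // "PK\3\4"
        put16(out, 10);                            // version needed to extract
        put16(out, 0); put16(out, 0);              // flags, method 0 = stored
        put16(out, dosTime); put16(out, dosDate);
        put32(out, crc);
        put32(out, f.data.size()); put32(out, f.data.size());  // sizes (equal when stored)
        put16(out, f.name.size()); put16(out, 0);  // name length, extra length
        putBytes(out, f.name);
        out.insert(out.end(), f.data.begin(), f.data.end());

        // Central directory header for the same entry, appended after all data.
        put32(central, 0x02014b50);                // "PK\1\2"
        put16(central, 20); put16(central, 10);    // version made by, version needed
        put16(central, 0); put16(central, 0);      // flags, method
        put16(central, dosTime); put16(central, dosDate);
        put32(central, crc);
        put32(central, f.data.size()); put32(central, f.data.size());
        put16(central, f.name.size());
        put16(central, 0); put16(central, 0);      // extra, comment length
        put16(central, 0); put16(central, 0);      // disk number, internal attributes
        put32(central, 0);                         // external attributes
        put32(central, localOffset);
        putBytes(central, f.name);
    }
    const std::size_t cdOffset = out.size();
    out.insert(out.end(), central.begin(), central.end());

    // End of central directory record.
    put32(out, 0x06054b50);                        // "PK\5\6"
    put16(out, 0); put16(out, 0);                  // this disk, disk with central directory
    put16(out, files.size()); put16(out, files.size());
    put32(out, central.size());
    put32(out, cdOffset);
    put16(out, 0);                                 // archive comment length
    return out;
}
```

The returned buffer can be written to disk with a binary std::ofstream, or sent wherever it is needed, without ever creating file1 and file2.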