What is the fastest way to read individual files (in a random fashion) from a zip file?
As I understand it, zip files have a directory that stores the individual file entries, and I could scan this directory to build an external index. Are there any standardized ways (i.e. existing libraries) that already do that? Or could I use a specialized type of zip file?
Scanning the directory and building the index is the fastest and best way to provide random access to the compressed entries archived in a zip file. The directory is usually small and lies at the end of the archive. If you have seekable media, then this is what you want.
The zip format is documented pretty well, and it's not too hard to do. The devil is in the details, though: if your zip files use ZIP64 extensions, encryption, or split archives, that's when it gets tricky. For simple zip files, doing what you imagine is not so difficult.
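To make the idea concrete, here is a minimal sketch of such a directory scan, assuming a plain single-disk archive with no ZIP64 extensions and no encryption. The names `ZipIndexEntry`, `buildZipIndex`, `le16` and `le32` are made up for this illustration and do not come from any library. It finds the end-of-central-directory record near the end of the file, then walks the central directory and records where each entry's local header starts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>  // std::istringstream, handy for trying this on in-memory bytes
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical index record: where an entry lives and how it is stored.
struct ZipIndexEntry {
    std::uint32_t localHeaderOffset;  // offset of the entry's local file header
    std::uint32_t compressedSize;
    std::uint32_t uncompressedSize;
    std::uint16_t method;             // 0 = stored, 8 = deflate
};

// ZIP stores multi-byte integers in little-endian order.
static std::uint16_t le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
static std::uint32_t le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(le16(p)) |
           (static_cast<std::uint32_t>(le16(p + 2)) << 16);
}

// Scans the central directory and returns a name -> location index.
// Assumption: plain archive (no ZIP64, single disk).
std::map<std::string, ZipIndexEntry> buildZipIndex(std::istream& in) {
    // The end-of-central-directory (EOCD) record is 22 bytes plus a comment of
    // at most 65535 bytes, so it must start within the last 65557 bytes.
    in.seekg(0, std::ios::end);
    const std::streamoff fileSize = in.tellg();
    if (fileSize < 22) throw std::runtime_error("too small to be a zip file");
    const std::streamoff tailSize = std::min<std::streamoff>(fileSize, 22 + 65535);
    std::vector<unsigned char> tail(static_cast<std::size_t>(tailSize));
    in.seekg(fileSize - tailSize, std::ios::beg);
    in.read(reinterpret_cast<char*>(tail.data()), tailSize);
    if (!in) throw std::runtime_error("could not read the end of the file");

    // Search backwards for the EOCD signature "PK\5\6".
    std::size_t eocd = tail.size();
    for (std::size_t i = tail.size() - 22 + 1; i-- > 0;) {
        if (le32(tail.data() + i) == 0x06054b50u) { eocd = i; break; }
    }
    if (eocd == tail.size()) throw std::runtime_error("no end-of-central-directory record");
    assert(eocd + 22 <= tail.size());  // guaranteed by the loop bounds

    const unsigned char* e = tail.data() + eocd;
    const std::uint16_t entryCount = le16(e + 10);
    const std::uint32_t cdSize = le32(e + 12);
    const std::uint32_t cdOffset = le32(e + 16);
    if (entryCount == 0xFFFFu || cdOffset == 0xFFFFFFFFu)
        throw std::runtime_error("ZIP64 archive: not handled by this sketch");
    if (static_cast<std::uint64_t>(cdOffset) + cdSize > static_cast<std::uint64_t>(fileSize))
        throw std::runtime_error("central directory lies outside the file");

    // Read the whole central directory in one go; it is usually small.
    std::vector<unsigned char> cd(cdSize);
    in.seekg(static_cast<std::streamoff>(cdOffset), std::ios::beg);
    in.read(reinterpret_cast<char*>(cd.data()), static_cast<std::streamsize>(cdSize));
    if (!in) throw std::runtime_error("could not read the central directory");

    std::map<std::string, ZipIndexEntry> index;
    std::size_t pos = 0;
    for (std::uint16_t n = 0; n < entryCount; ++n) {
        const unsigned char* h = cd.data() + pos;
        if (pos + 46 > cd.size() || le32(h) != 0x02014b50u)
            throw std::runtime_error("corrupt central directory");
        const std::size_t nameLen = le16(h + 28);
        const std::size_t extraLen = le16(h + 30);
        const std::size_t commentLen = le16(h + 32);
        if (pos + 46 + nameLen > cd.size())
            throw std::runtime_error("corrupt central directory");
        const std::string name(reinterpret_cast<const char*>(h + 46), nameLen);
        index[name] = ZipIndexEntry{le32(h + 42), le32(h + 20), le32(h + 24), le16(h + 10)};
        pos += 46 + nameLen + extraLen + commentLen;  // each header is variable-length
    }
    return index;
}
```

To use it on a real archive, open the file with a `std::ifstream` in `std::ios::binary` mode and pass the stream in. The returned map can be saved as the external index, so that later lookups seek straight to an entry's local header without rescanning the directory.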
Still, it would be easier to use an external library.
Minizip seems to be a good library for reading or writing zip files. It uses the zlib library.
http://www.winimage.com/zLibDll/minizip.html
The reason I need this is, for example: there are lots of files and folders inside a "some_important_folder" folder. Normally the user can browse into "some_important_folder" and go deeper to see its subfolders and files, as in any file explorer. In my use case, however, the user doesn't need to interact with the files and folders in "some_important_folder" at all. So I was wondering whether there is any way to hide the complexity of that folder and show it to the user as a single file, while my programs (written in C++) can still access the files and folders inside it as usual, for example: "C:\Users\user\Documents\some_important_folder\someFolder\someFileThatUserDoesntNeedToKnow.exe"
Something like a .rar or .zip file would do, but since "some_important_folder" might be very big (more than a TB), I don't think converting the whole folder to a .zip file is a good idea: it would take a lot of redundant disk space and the process would be very slow.
Have you considered encrypting your folders? That way, if you only want to access the folder from your C++ app, you could pass the app the password or decryption key, making it the only access point to that folder.
Yes, both Windows and Linux have similar technology.
On Windows, you can use the "Compound File Binary Format". It is a general-purpose file format that provides a file-system-like structure within a file for the storage of arbitrary, application-specific streams of data. In fact, the earlier Office .doc file format is based on this technology. The following are the links to the Microsoft documentation and the Wikipedia article, and I believe you can find sample code with a search.
https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-cfb/53989ce4-7b05-4f8d-829b-d08d6148375b
https://en.wikipedia.org/wiki/Compound_File_Binary_Format
On Linux, you can loop-mount a file as a file system, as #stark mentioned. You can search for "linux loop mount file"; the following is the first article I found:
https://www.jamescoyle.net/how-to/2096-use-a-file-as-a-linux-block-device
I have many text files that are located in different directories -
dir1/.../textfiles/<various .txt files>
dir2/.../textfiles/<various .txt files>
and so on...
I need a C++ solution to compress and archive all these files, which live in different directories. I also need a way to search for, decompress and open only one particular file in this archive.
One solution I can think of is to use system calls to create a tar archive.
I actually want a purely C++ based solution to this problem that is simple and fast and gives the desired result.
I searched a lot about this on the internet and found a few solutions like using Chilkat or libtar libraries but I do not intend to use them.
Another one that I found out is this.
Is there any simple C++ solution to this problem?
After false starts with Poco's zip and minizip (both have issues: minizip can't decompress files larger than 2 GB, and Poco zip corrupts any zip file larger than 2 GB that it compresses), I was wondering if there was anything else left.
So any suggestions for a C++ archive library that can handle zip AND zip64?
7-zip handles both, as far as I could tell from a quick glance at their source code. It's also LGPL, which should allow its use in a closed source app.
Well there is the all-around very proven ZLIB : http://zlib.net/
I need to find a library that allows me to easily get a directory listing of all the files inside a ZIP archive and allows me to extract any given file inside the archive to memory (a buffer). Preferably, it should be a high-level library since my requirements aren't very complex (what I mentioned above is pretty much all I need).
Previously I tried PhysFS which has the behavior I need (easily access files inside an archive), but it's unsuitable because of other reasons (there are many archives and PhysFS would require me to mount all of them individually, which is not an option). Another library that kinda has the functionality I need is Chilkat, but it's shareware so I can't use it either.
Any other suggestions?
While .zip uses zlib http://zlib.net compression, it alone is not sufficient to get a directory listing from a .zip file.
You also need code that can read the .zip directory format. Check out Minizip http://www.winimage.com/zLibDll/minizip.html. It provides source code and simple zip/unzip command-line executables.
Edit 2: The code is entirely C (so is zlib). The page links to two C++ wrapper libraries, but both links seem to be dead.
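For a feel of what "extract to a buffer" involves underneath such a library, here is a rough sketch that reads one entry into memory directly from the file. This is not Minizip's API: `readStoredEntry`, `le16` and `le32` are hypothetical names for this illustration. It assumes the caller already knows the entry's local header offset and size from the directory, and it only handles entries stored without compression (method 0). A deflate entry (method 8) would additionally have to be run through zlib's `inflate`.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this on in-memory bytes
#include <stdexcept>
#include <vector>

// ZIP stores multi-byte integers in little-endian order.
static std::uint16_t le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
static std::uint32_t le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(le16(p)) |
           (static_cast<std::uint32_t>(le16(p + 2)) << 16);
}

// Reads one uncompressed ("stored") entry into a buffer.
// localHeaderOffset and size are assumed to come from the zip directory,
// because the sizes in the local header can be zero when a data descriptor
// follows the data.
std::vector<unsigned char> readStoredEntry(std::istream& in,
                                           std::uint32_t localHeaderOffset,
                                           std::uint32_t size) {
    unsigned char hdr[30];  // fixed part of the local file header
    in.clear();
    in.seekg(static_cast<std::streamoff>(localHeaderOffset), std::ios::beg);
    in.read(reinterpret_cast<char*>(hdr), sizeof hdr);
    if (!in || le32(hdr) != 0x04034b50u)
        throw std::runtime_error("bad local file header");
    if (le16(hdr + 6) & 0x0001u)
        throw std::runtime_error("entry is encrypted: not handled by this sketch");
    if (le16(hdr + 8) != 0)
        throw std::runtime_error("entry is compressed: inflate it with zlib instead");

    // The name and extra field follow the fixed header. Their lengths must be
    // taken from the local header, since the extra field can differ from the
    // copy in the central directory.
    const std::streamoff skip =
        static_cast<std::streamoff>(le16(hdr + 26)) + le16(hdr + 28);
    in.seekg(skip, std::ios::cur);

    std::vector<unsigned char> data(size);
    in.read(reinterpret_cast<char*>(data.data()), static_cast<std::streamsize>(size));
    if (!in) throw std::runtime_error("entry data is truncated");
    assert(in.gcount() == static_cast<std::streamsize>(size));
    return data;
}
```

In practice a library such as Minizip takes care of these details (plus decompression and CRC checks), so a sketch like this is mainly useful for understanding the format or for very constrained setups.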
How about zlib? http://zlib.net/ "A Massively Spiffy Yet Delicately Unobtrusive Compression Library (Also Free, Not to Mention Unencumbered by Patents)"
I like the idea of using compressed folders as containers for file formats. They are used by LibreOffice and Dia, for example. So if I want to define a special-purpose file format, I can define a folder and file structure, zip the root folder, and end up with a single file that holds all the data. Imported files just live as originals inside the compressed file. Defining a binary file format with these features from scratch would be a lot of work.
Now to my question: Are there applications which are using compressed folders as file formats and do versioning inside the folder? The benefits would be great. You could just commit a state in your project into your file and the versioning is just decorated with functions from your own application. Also diffs could be presented your own way.
Libraries for working with compressed files and for versioning are available. The versioning system used should be a distributed one, where the repository lives inside your working folder and not separately, as it does for example in Subversion with its client-server model.
What do you think? I'm sure there are applications out there using this approach, but I couldn't find one. Or is there a major drawback in this approach?
Sounds like an interesting idea.
I know many applications claim they have "unlimited" undo and redo,
but that's only back to the most recent time I opened this file.
With your system, your application could "undo" to previous versions of the file,
even before the version I saw the most recent time I opened this file -- that might be a nifty feature.
Have you looked at TortoiseHg?
TortoiseHg uses Mercurial, which is
"a distributed system, where the repository lives inside your working folder".
Rather than defining a new compressed versioned file format and all the software to work with it from scratch,
perhaps you could use the Mercurial file format and borrow the TortoiseHg and Mercurial source code to work with it.
What happens if I'm working on a project using 2 different applications,
and each application wants to store the entire project in its own slightly different compressed versioned file format?
What I have found since is that OpenOffice aka LibreOffice has a kind of versioning built in. A LibreOffice file is a zip file with structured content (XML files, directories, ...) inside. You can mark the current content as a version. This creates a VersionList.xml that contains information about all the versions. A Versions directory is also added, containing files like Version1, Version2 and so on. These files are the actual documents at that state.