I need to find a library that allows me to easily get a directory listing of all the files inside a ZIP archive and allows me to extract any given file inside the archive to memory (a buffer). Preferably, it should be a high-level library since my requirements aren't very complex (what I mentioned above is pretty much all I need).
Previously I tried PhysFS, which has the behavior I need (easy access to files inside an archive), but it's unsuitable for other reasons: there are many archives, and PhysFS would require me to mount each of them individually, which is not an option. Another library that has roughly the functionality I need is Chilkat, but it's shareware, so I can't use it either.
Any other suggestions?
While .zip uses zlib http://zlib.net compression, zlib alone is not sufficient to get a directory listing from a .zip file.
You also need code that can read the .zip directory format. Check out Minizip http://www.winimage.com/zLibDll/minizip.html. It provides source code and simple zip/unzip command-line executables.
Edit 2: The code is entirely C (as is zlib); the page links to two C++ wrapper libraries, but both links appear to be dead.
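For illustration, here is a minimal sketch of the unzip.h API that ships with minizip; the archive and entry names are placeholders, and error handling is omitted:

#include <cstdio>
#include <vector>
#include "unzip.h"

int main() {
    unzFile zip = unzOpen("archive.zip");            // hypothetical archive
    if (!zip) return 1;

    // Walk the central directory and print every entry name.
    if (unzGoToFirstFile(zip) == UNZ_OK) {
        do {
            unz_file_info info;
            char name[256];
            unzGetCurrentFileInfo(zip, &info, name, sizeof(name),
                                  NULL, 0, NULL, 0);
            std::printf("%s (%lu bytes)\n", name, info.uncompressed_size);
        } while (unzGoToNextFile(zip) == UNZ_OK);
    }

    // Extract one entry into a memory buffer.
    if (unzLocateFile(zip, "some/file.txt", 0) == UNZ_OK) {  // hypothetical entry
        unz_file_info info;
        unzGetCurrentFileInfo(zip, &info, NULL, 0, NULL, 0, NULL, 0);
        std::vector<char> buf(info.uncompressed_size);
        unzOpenCurrentFile(zip);
        unzReadCurrentFile(zip, buf.data(), (unsigned)buf.size());
        unzCloseCurrentFile(zip);
    }
    unzClose(zip);
}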
How about zlib? http://zlib.net/ "A Massively Spiffy Yet Delicately Unobtrusive Compression Library (Also Free, Not to Mention Unencumbered by Patents)"
I have two folders on my system: one is an "Image folder" (containing images) and the second is a "Text folder" (containing text files). In these folders, some images and text files share the same base name, e.g. abc.jpg and abc.txt.
Given an input image name, I want to find the corresponding text file in the second folder (that is, match the text file names against the image names).
I then want to copy the matched text file into the "Image folder".
I am working on the Windows operating system.
If your issue is to find two different file names (but with similar basenames), notice that:
directories and folders are unknown to the C++11 and C++14 standards. The future C++17 standard might provide a filesystem library (but you won't easily find a mature implementation today)
POSIX and Windows have directories (not folders). You could use (notably on Linux or MacOSX) POSIX functions like opendir(3), readdir(3), and closedir(3), combined with stat(2), to explore them, or use some higher-level library function like nftw(3).
basename(3) could be useful, but you can use plain string functions once you know that / is used as the directory separator.
some framework libraries, notably Qt, POCO, Boost, ..., provide useful functions on directories and may give a common abstraction of them that is usable on several operating systems. I actually recommend using a framework library, because it is easier and more portable.
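For instance, a minimal sketch with Boost.Filesystem (pre-C++17); the folder paths and the .jpg/.txt pairing are assumptions taken from the question, and error handling is omitted:

#include <boost/filesystem.hpp>
namespace fs = boost::filesystem;

int main() {
    // Hypothetical locations, taken from the question's description.
    fs::path images("C:/ImageFolder");
    fs::path texts("C:/TextFolder");

    for (fs::directory_iterator it(images), end; it != end; ++it) {
        if (it->path().extension() != ".jpg") continue;   // images only
        fs::path txt = texts / it->path().stem();
        txt += ".txt";                                    // abc.jpg -> abc.txt
        if (fs::exists(txt))                              // same base name?
            fs::copy_file(txt, images / txt.filename());  // copy into Image folder
    }
}

Note that fs::copy_file throws if the destination already exists; link with -lboost_filesystem -lboost_system.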
The notion of file, of file system, and of directory is very operating-system specific (and some academic OSes don't have them and provide a different notion of persistence). Read Operating Systems: Three Easy Pieces (freely downloadable) for an overview. On Linux and POSIX systems, a file is really some i-node, a directory is a kind of file whose entries map names to i-nodes, and a file can have several names in various directories (e.g. using link(2)). The C++ standard knows about standard streams, e.g. through its input/output library.
Copying a file generally means copying its content (byte by byte), so it is not an elementary operation. In practice it is better to copy in large blocks of at least 16 kilobytes. Some libraries provide functions to copy files.
On Windows (which I don't know well) the notions of file and "folder" are different, and the directory separator is \. You need to dive into the Microsoft documentation; even Microsoft's documentation speaks of directories. But using a framework library would be simpler (and more portable).
BTW, the folder terminology is generally wrong: you see some folders (not all of them) in your GUI or desktop environment, but the OS (and your program) knows about directories and files.
Sometimes a higher-level abstraction than files is useful. For example, the SQLite library provides you with a database abstraction, GDBM gives indexed files, and you might consider using a database system like PostgreSQL or MongoDB, etc. YMMV.
I was searching for ways to read .pdf files and wasn't able to get anywhere; I would probably need a library, but all the options I found are confusing and hard to deal with.
I was wondering which would be the best way to accomplish this task: searching through a .pdf and getting the content of its Abstract section (which is text).
The easiest and cheapest approach is to use an open-source library that is popular and known to other programmers.
Before trying to write your own PDF reader from scratch, take a look at these:
Parsing:
PoDoFo
The PoDoFo library is a free, portable C++ library which includes
classes to parse PDF files and modify their contents into memory. The
changes can be written back to disk easily. The parser can also be
used to extract information from a PDF file (for example the parser
could be used in a PDF viewer). Besides parsing PoDoFo includes also
very simple classes to create your own PDF files. All classes are
documented so it is easy to start writing your own application using
PoDoFo.
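To give a feel for it, a tiny sketch against the PoDoFo 0.9.x API (the file name is a placeholder; actually pulling the Abstract text out means tokenizing page content streams, which is more involved):

#include <podofo/podofo.h>
#include <cstdio>

int main() {
    PoDoFo::PdfMemDocument doc;
    doc.Load("paper.pdf");                       // hypothetical input file
    std::printf("pages: %i\n", doc.GetPageCount());
    // To extract text, you would walk each page's content stream
    // (see PoDoFo's PdfContentsTokenizer) and collect the text operators.
}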
Generating:
LibHaru
Haru is a free, cross platform, open-sourced software library for
generating PDF written in ANSI-C. It can work as both a static-library
(.a, .lib) and a shared-library (.so, .dll).
panda
A PDF generation API written in C
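For the generation side, here is a minimal LibHaru sketch (plain C API, also callable from C++; the output file name is a placeholder):

#include <hpdf.h>

int main() {
    HPDF_Doc pdf = HPDF_New(NULL, NULL);         /* no custom error handler */
    HPDF_Page page = HPDF_AddPage(pdf);
    HPDF_Page_BeginText(page);
    HPDF_Page_SetFontAndSize(page, HPDF_GetFont(pdf, "Helvetica", NULL), 12);
    HPDF_Page_TextOut(page, 50, 750, "Hello from LibHaru");
    HPDF_Page_EndText(page);
    HPDF_SaveToFile(pdf, "out.pdf");             /* hypothetical output */
    HPDF_Free(pdf);
}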
We use the Boost library on our side. It consists of a huge number of files which never change, and only a tiny portion of it is used. We swap the whole boost directory when changing versions. Currently we keep the Boost sources in our SVN file by file, which makes checkout operations very slow, especially on Windows.
It would be nice if there were a notation / plugin to address C++ files inside ZIP files, something like:
// #ZIPFS ASSIGN 'boost' 'boost.zip/boost'
#include <boost/smart_ptr/shared_ptr.hpp>
Is there any support for compiler hooks in g++? Is there any effort regarding ZIP support? Other ideas?
I assume that make or a similar build system is involved in building your software. I'd put the zip file in the repository and add a rule to the Makefile to extract it before the actual build starts.
For example, suppose your zip file is in the source tree at "external/boost.zip", and it shall be extracted to "external/boost", and it contains at its toplevel a file "boost_version.h".
# external/Makefile
unpack_boost: boost/boost_version.h

boost/boost_version.h: boost.zip
	unzip $<
I don't know the exact syntax of the unzip call; consult its manpage.
Then in other Makefiles, you can let your source files depend on the unpack_boost target in order to have make unpack Boost before a source file is compiled.
# src/Makefile (excerpt)
unpack_boost:
	make -C ../external unpack_boost

source_file.cpp: unpack_boost
If you're using a Makefile generator (or an entirely different buildsystem), please check the documentation for these programs for how to create something like the custom target unpack_boost. For example, in CMake, you can use the add_custom_command directive.
The fine print: the boost/boost_version.h file is not strictly necessary for the Makefile to work. You could just put the unzip command into the unpack_boost target, but then the target would effectively be phony, that is, it would be executed during each build. The intermediate file (which you of course need to replace with a file actually present in the zip archive) ensures that unzip runs only when necessary.
A year ago I was in the same position as you. We kept our source in SVN and, even worse, included boost in the same repository (same branch) as our own code. Trying to work on multiple branches was impossible, as it would take most of a day to check-out a fresh working copy. Moving boost into a separate vendor repository helped, but it would still take hours to check-out.
I switched the team over to git. To give you an idea of how much better it is than SVN, I have just created a repository containing the boost 1.45.0 release, then cloned it over the network. (Cloning copies all of the repository history, which in this case is a single commit, and creates a working copy.)
That clone took six minutes.
In the first six seconds a compressed copy of the repository was copied to my machine. The rest of the time was spent writing all of those tiny files.
I heartily recommend that you try git. The learning curve is steep, but I doubt you'll get much pre-compiler hacking done in the time it would take to clone a copy of boost.
We've been facing similar issues in our company. Managing boost versions in build environments is never going to be easy. With 10+ developers, all coding on their own system(s), you will need some kind of automation.
First, I don't think it's a good idea to store copies of big libraries like boost in SVN or any SCM system for that matter; that's not what those systems are designed for, unless you plan to modify the boost code yourself. But let's assume you're not doing that.
Here's how we manage it now; after trying lots of different methods, this works best for us.
For every version of boost that we use, we put the whole tree (unzipped) on a file server and we add extra subdirectories, one for each architecture/compiler-combination, where we put the compiled libraries.
We keep copies of these trees on every build system and in the global system environment we add variables like:
BOOST_1_48=C:\boost\1.48 # Windows environment var
or
BOOST_1_48=/usr/local/boost/1.48 # Linux environment var, e.g. in /etc/profile.d/boost.sh
This directory contains the boost tree (boost/*.hpp) and the added precompiled libs (e.g. lib/win/x64/msvc2010/libboost_system*.lib, ...)
All build configurations (VS solutions, VS property files, GNU makefiles, ...) define an internal variable that imports the environment vars, like:
BOOSTROOT=$(BOOST_1_48) # e.g. in a Makefile, or an included Makefile
and further build rules all use the BOOSTROOT setting for defining include paths and library search paths, e.g.
CXXFLAGS += -I$(BOOSTROOT)
LFLAGS += -L$(BOOSTROOT)/lib/linux/x64/ubuntu/precise
LFLAGS += -lboost_date_time
The reason for keeping local copies of boost is compilation speed. It takes up quite a bit of disk space, especially the compiled libs, but storage is cheap and a developer losing lots of time compiling code is not. Plus, this only needs to be copied once.
The reason for using global environment vars is that build configurations are transferable from one system to another, and can thus be safely checked in to your SCM system.
To smooth things a bit, we've developed a little tool that takes care of the copying and of setting the global environment. With a CLI, this can even be included in the build process.
Different working environments mean different rules and cultures, but believe me, we've tried lots of things and finally, we decided to define some kind of convention. Maybe ours can inspire you...
This is something you would not do in g++, because any other application that wants to do it would also have to be modified.
Store the files on a compressed filesystem. Then every application gets the benefit automatically.
It should be possible in an OS to allow transparent access to files inside a ZIP file. I know that I put it in the design of my own OS a long time ago (2004 or so) but never got it to a point where it was usable. The downside is that seeking backwards in a file inside a ZIP is slower as it's compressed (and you can't rewind the compressor state, so you have to seek from the start instead). This also makes using a zip-inside-a-zip slow for rewinding and reading. Fortunately, most cases just read a file sequentially.
It should also be retrofittable to current OSes, at least in user space. You can hook the filesystem access functions (fopen, open, ...) and add a set of virtual file descriptors that your own software returns for a given filename. If it's a real file, just pass the call through; if it's not, open the underlying zip file (possibly again via this very function) and return a virtual handle. When the file contents are accessed, read directly from the zip file without caching.
On Linux you would use an LD_PRELOAD to inject it into existing software (at usage time), on Windows you can hook the system calls or inject a DLL into the space of software to hook the same functions.
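A minimal sketch of that interposition idea on Linux; only the dlsym(RTLD_NEXT, ...) pattern is standard practice here, the zip handling is left as a stub:

// shim.cpp -- build: g++ -shared -fPIC shim.cpp -o shim.so -ldl
// run: LD_PRELOAD=./shim.so some_program
#include <cstdio>
#include <cstring>
#include <dlfcn.h>

extern "C" FILE* fopen(const char* path, const char* mode) {
    // Look up the real fopen once and cache it.
    static FILE* (*real_fopen)(const char*, const char*) =
        (FILE* (*)(const char*, const char*))dlsym(RTLD_NEXT, "fopen");
    if (std::strstr(path, ".zip/")) {
        // Stub: open the archive here and hand back a virtual FILE*
        // (e.g. via fmemopen) for the file inside it.
    }
    return real_fopen(path, mode);                // real file: pass through
}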
Does anybody know if this already exists? I can't see any clear reason it wouldn't...
Continuation of:
Standalone Cross Platform (Windows/Linux) File Compression for C/C++?
After many attempts with ZLIB, ZZLIB, LIBZIP and MINIZIP, I always run into problems at the compilation stage. Many Google searches turned up OS-specific libraries, and I can't really find anything that fits my 'simple' needs.
I reduced my needs for the library (Or wrapper?) to this:
Works on both Windows and Linux, OR two separate libraries, one for Windows and one for Linux; I can make two separate projects for Windows and Linux if it is really necessary
Unpack file from zip to specified directory
Check if file exists in zip file
C OR C++ OR Mixed (yeah, that doesn't matter)
Preferably Very Simple to include into any project
(e.g. 5 c/cpp files and 1-3 header files? Anyway, not tons of files; when I open the libzip and zlib archives my reaction is: "Oh my ...")
I've checked many Stack Overflow threads too, with the words "Windows Linux ZIP C C++", but all the results seem to have libraries which I either don't know how to compile, or which are too difficult to use, or which have too much 'needed stuff' for simply extracting from a zip and checking whether a file exists.
I had put that project away for a later date and have picked it up now, and all those compilation errors came up (especially since VC++ 2010 doesn't have the C99 inttypes.h).
I have had a very good experience with the Zipstream C++ library, which gives you a nice OOP way of handling zip files.
If your project already uses one of the bigger libs like Boost, then you could try boost::iostreams with the gzip filter; however, the functionality is somewhat limited.
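As a sketch of that route (note boost::iostreams handles gzip streams, not .zip archives, which is the limitation mentioned; the file name is a placeholder):

#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream file("data.gz", std::ios_base::binary);  // hypothetical input
    boost::iostreams::filtering_istream in;
    in.push(boost::iostreams::gzip_decompressor());        // decompress on the fly
    in.push(file);                                         // source at the bottom
    std::string line;
    while (std::getline(in, line))
        std::cout << line << '\n';
}

Link against boost_iostreams and zlib.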
Or if you happen to use Poco, take a look at their implementation, Poco::Zip.
I am starting to be proficient enough with C++ so that I can write my own C++ based scripts (to replace bash and PHP scripts I used to write before).
I find that I am starting to have a very small collection of utility functions and sub-routines that I'd like to use in several, otherwise unrelated C++ scripts.
I know I am not supposed to reinvent the wheel and that I could use external libraries for some of the utilities I'm creating for myself. However, it's fun to create my own utility functions, they are perfectly tailored to the job I have in mind, and it's for me a large part of the learning process. I'll see about using more polished external libraries when I am proficient enough to work on more serious, long term projects.
So, the question is: how do I manage my personal utility library in a way that the functions can be easily included in my various scripts?
I am using linux/Kubuntu, vim, g++, etc. and mostly coding CLI scripts.
Don't assume too much in terms of experience! ;) Links to tutorials or places where relevant topics are properly documented are welcome.
"Shared objects for the object disoriented!"
"Dissecting shared libraries"
Just stick your hpp and cpp files in separate directories somewhere. That way, it's easy to add the directory containing the C++ files to any new project, and easy to add the headers to the include path.
If you find compile time starts to suffer, then you might want to consider putting these files in a static library.
If you are compiling by hand you will want to create a makefile to remove the tedium of compiling your libraries. This tutorial helped me when I was learning to do what you are doing, and it has additional links on the site for more detailed tutorials on the makefile.
Unless it's very large, you should probably just keep your utility library in a .h file (for the declarations) and a .cpp file (for the implementation).
Just copy both files into your project folders and use #include "MyLibrary.h", or set the appropriate directory settings so you can use #include <MyLibrary.h> without copying the files each time you want to use them.
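As a shape for this, a minimal example of such a header/implementation pair (the trim utility is just an illustration):

// MyLibrary.h -- declarations only, with an include guard.
#ifndef MYLIBRARY_H
#define MYLIBRARY_H
#include <string>

std::string trim(const std::string& s);   // strip leading/trailing whitespace
#endif

// MyLibrary.cpp -- definitions, compiled once and linked into each script.
#include "MyLibrary.h"

std::string trim(const std::string& s) {
    const std::string::size_type b = s.find_first_not_of(" \t\n");
    if (b == std::string::npos) return "";
    return s.substr(b, s.find_last_not_of(" \t\n") - b + 1);
}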
If the library gains substantial size, you might consider looking into static libraries.