File formats with built-in versioning - compression

I like the idea of using compressed folders as containers for file formats; they are used by LibreOffice and Dia. So if I want to define a special-purpose file format, I can define a folder and file structure, zip the root folder, and end up with all the data in a single file. Imported files simply live as originals inside the compressed file. Defining a binary file format from scratch with these features would be a lot of work.
Now to my question: are there applications which use compressed folders as file formats and do versioning inside the folder? The benefits would be great: you could commit a state of your project into your file, with the versioning wrapped in functions of your own application, and diffs could be presented your own way.
Libraries for working with compressed files and for versioning are available. The versioning system used should be a distributed one, where the repository lives inside your working folder and not separately, as with, for example, Subversion and its client-server model.
What do you think? I'm sure there are applications out there using this approach, but I couldn't find one. Or is there a major drawback to this approach?
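
For concreteness, here is a minimal sketch of the container side in Python, using only the standard-library zipfile module; the layout (content.xml, an originals/ directory, the .myfmt extension) is invented for illustration:

import zipfile

def save_project(path, content_xml, imported_files):
    # One zip archive is the whole document: structured data plus
    # imported originals stored verbatim, as described above.
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", content_xml)
        for name, data in imported_files.items():
            zf.writestr("originals/" + name, data)

save_project("example.myfmt", "<project/>", {"photo.jpg": b"\x89PNG..."})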

Sounds like an interesting idea. I know many applications claim they have "unlimited" undo and redo, but that only reaches back to the most recent time I opened the file. With your system, your application could "undo" to previous versions of the file, even before the version I saw the most recent time I opened it -- that might be a nifty feature.
Have you looked at TortoiseHg? TortoiseHg uses Mercurial, which is "a distributed system, where the repository lives inside your working folder". Rather than defining a new compressed versioned file format and all the software to work with it from scratch, perhaps you could use the Mercurial file format and borrow the TortoiseHg and Mercurial source code to work with it (a sketch of the idea follows below).
What happens if I'm working on a project using 2 different applications, and each application wants to store the entire project in its own slightly different compressed versioned file format?
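
To make the Mercurial suggestion concrete, here is a rough sketch of an application wrapping a repository that lives inside its working folder by shelling out to the hg command line (the folder name, username, and commit message are placeholders):

import subprocess

def hg(workdir, *args):
    # The repository (.hg) lives inside the project's working folder,
    # which the application can later zip into a single document file.
    subprocess.run(["hg", "--config", "ui.username=MyApp", *args],
                   cwd=workdir, check=True)

hg("project_dir", "init")                      # once, when the document is created
hg("project_dir", "addremove")                 # track added/removed content files
hg("project_dir", "commit", "-m", "version saved by user")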

What I found now is that OpenOffice aka LibreOffice has a kind of versioning built in. A LibreOffice file is a zip file with structured content (XML files, directories, ...) inside. You are able to mark the current content as a version. This results in a VersionList.xml which contains information about all the versions. A Versions directory is also added, containing files like Version1, Version2 and so on; these files are the actual documents at that state.
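
That structure can be inspected with a few lines of Python (assuming a document.odt in which versions have actually been saved via File -> Versions):

import zipfile

with zipfile.ZipFile("document.odt") as zf:
    names = zf.namelist()
    if "VersionList.xml" in names:
        # Metadata describing every saved version of the document.
        print(zf.read("VersionList.xml").decode("utf-8"))
    for name in names:
        # Each entry under Versions/ is a full document snapshot.
        if name.startswith("Versions/"):
            print("stored version:", name)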

Related

Is there a way to create a folder that is interpreted by the OS (Windows, OS X, Linux) as a single file?

The reason why I need this: there are lots of files and folders inside a "some_important_folder" folder. A user can usually browse to "some_important_folder" and go deeper into it to see its subfolders and files, as in any normal file explorer. But in my use case, the user doesn't need to interact with the files and folders in "some_important_folder" at all. I was therefore wondering if there is any way to hide the complexity of "some_important_folder" and show it to the user as a single file, while my programs (written in C++) can still access the files and folders inside it as normal, e.g. "C:\Users\user\Documents\some_important_folder\someFolder\someFileThatUserDoesntNeedToKnow.exe".
Something like a .rar or .zip file -- but since "some_important_folder" might be very big (more than a terabyte), I don't think it would be good to convert the whole folder to a .zip file, as that would take a lot of redundant space on the hard disk and the process would be very slow.
Have you considered encrypting your folders? That way, if you wanted the folder to be accessible only through your C++ app, you could pass down the password/decryption key for it, making your app the only access point to that folder.
Yes, both Windows and Linux have technology for this.
On Windows, you can use the "Compound File Binary Format". It is a general-purpose file format that provides a file-system-like structure within a file for the storage of arbitrary, application-specific streams of data. In fact, the earlier Office .doc file format is based on this technology. The following are the documentation links from Microsoft and Wikipedia, and I believe you can find some sample code by searching.
https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-cfb/53989ce4-7b05-4f8d-829b-d08d6148375b
https://en.wikipedia.org/wiki/Compound_File_Binary_Format
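As a quick way to see that file-system-like structure, the third-party Python module olefile (an assumption here; installable via pip) can list the streams inside an existing compound file:

import olefile  # pip install olefile

ole = olefile.OleFileIO("legacy.doc")   # any CFBF container, e.g. an old .doc
print(ole.listdir())                    # the directory tree stored inside the file
ole.close()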
On Linux, you can loop-mount a file as a file system, as @stark mentioned. You can search for "linux loop mount file"; the following is the first article I found:
https://www.jamescoyle.net/how-to/2096-use-a-file-as-a-linux-block-device
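A sketch of the loop-mount approach, driven from Python with standard command-line tools (must run as root; the paths and image size are placeholders). A sparse image file takes almost no disk space until blocks are actually written, which addresses the multi-terabyte concern:

import subprocess

def run(*cmd):
    subprocess.run(cmd, check=True)

run("truncate", "-s", "1T", "container.img")   # sparse: allocates no real space yet
run("mkfs.ext4", "-F", "container.img")        # -F: it's a plain file, not a device
run("mkdir", "-p", "/mnt/container")
run("mount", "-o", "loop", "container.img", "/mnt/container")
# The C++ program can now use ordinary paths below /mnt/container/...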

vtk: Modelling examples not working (Delaunay3D, finance, etc.)

I'm new to VTK, and I've successfully built VTK 8.1.1 from source, using CMake and Visual Studio 2017, with the default options and examples.
I've already solved an issue with the Infovis folder examples.
Now, I'm trying to run the examples from the Modelling folder.
The problem is that when I try to run these examples, it opens a window that closes so fast I can't even see what it says, so I have no clue about the error.
The Delaunay3D.cxx file begins with these comments:
// Delaunay3D
// Usage: Delaunay3D InputFile(.vtp) OutputFile(.vtu)
// where
//   InputFile is an XML PolyData file with extension .vtp
//   OutputFile is an XML Unstructured Grid file with extension .vtu
So it looks like I need external data files, and the same is true for the other examples. But, where do I get these files, and where do I place them?
Some of the examples in the source tree are not complete, i.e. as you found out, some of them require external input files which may be missing, or have mistakes in CMakeLists.txt, etc. In the parent folder of the directory you attached a screenshot of (i.e. of the Modelling directory), there is also a folder of Python examples. In that folder there is a Delaunay3D.py file which creates random points as input instead of reading them from a file. You can do the same: the names and signatures of the functions are the same in Python and C++, so you can port that approach by modifying the Delaunay3D.cxx code or by adding some code to TestDelaunay3D.cxx. But there is no such file for the finance example, unfortunately.
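A minimal sketch of that random-points approach using the VTK Python wrappers (point count and output filename chosen arbitrarily):

import random
import vtk

# Random input points instead of a .vtp file, as Delaunay3D.py does.
points = vtk.vtkPoints()
for _ in range(50):
    points.InsertNextPoint(random.random(), random.random(), random.random())

poly = vtk.vtkPolyData()
poly.SetPoints(points)

delaunay = vtk.vtkDelaunay3D()
delaunay.SetInputData(poly)
delaunay.Update()

# Write the result as an XML unstructured grid (.vtu), like the C++ example.
writer = vtk.vtkXMLUnstructuredGridWriter()
writer.SetFileName("delaunay_output.vtu")
writer.SetInputConnection(delaunay.GetOutputPort())
writer.Write()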
I find it useful to use VTK code along with ParaView. ParaView is built on top of VTK and has most of the VTK filters available through the GUI. In ParaView you can also create some data and save it to a file using File -> Save Data; you can then use that as input for the examples. Once you become familiar with VTK file types and VTK sources, generating data does not require a lot of code, so you can do it yourself by modifying any of the example code (as is done in Delaunay3D.py).
As for where to place the input files: in this particular case you can place them anywhere, but when you run the executable that was built, you must give the path of the input file correctly on the command line.
Updates based on comments:
The Python wrappers provide almost all of the features available in the C++ version; the exceptions are noted here. If you decide to use VTK from Python, a good resource to read is the VTK NumPy interface.
ParaView implements the majority of VTK filters and sources, so it can do a lot of creation and modification of geometry. In addition, you can use programmable filters and sources for things that are not available through the GUI; in a programmable filter you can write any Python script that imports vtk and uses all of its functionality.
But if your use case only needs a subset of the functionality ParaView provides, you may want to write your own GUI.

How does Pismo File Mount mount ZIP files onto Windows Explorer?

I have been using Pismo File Mount for many years, and I have always wondered how it actually works.
Let's say, I am currently working on an application that creates a package format similar to the ZIP format. For ease of access, I want to create a shell extension that works similar to how Pismo File Mount works. For those who have not used Pismo File Mount before, this is how it works:
The user right-clicks a ZIP file in Windows Explorer.
The user then clicks "Mount" to mount the ZIP file.
The user can now access his/her files immediately.
The user does not have to extract the ZIP file to view its contents.
There's a catch: I do not want to use the Pismo File Mount API, for various reasons, such as commercial or legal ones.
The question is, how does Pismo File Mount integrate itself into Windows Explorer programmatically, in terms of the Windows API and C++?
I wrote Pismo File Mount, and the ZIP reader included in the PFM Audit Package.
There is no concise or realistically postable answer to the question. To do what PFM does, in C/C++, against the Windows APIs (kernel and user), would take tens of thousands of lines of difficult code and a large time investment.
PFM is built as a file system driver (kernel module), with user-mode support DLLs and executables. The driver uses a protocol to talk to user-mode formatting code that (for example) decodes the ZIP file format and serves the contents through the kernel-mode driver to client applications.
There are two ways:
A shell namespace extension. The folder created by a shell namespace extension is not an actual filesystem folder, and access to the files in such a folder is usually limited to Explorer itself and to applications aware of shell extensions and the ways to work with them.
A filesystem filter driver which creates a virtual directory on the existing disk. Such a directory is seen by all applications as a real directory, in which those applications can read and write files and subdirectories; all filesystem operations go through the driver.
Pismo File Mount works via the filter driver, AFAIK.
Our CallbackFilter product provides a way to create virtual directories and files; it includes a driver and calls your user-mode code for the actual operations. But the filter approach is a bit complicated -- a virtual disk created with a filesystem driver (e.g. with our Callback File System product) is easier to implement and manage, due to differences in the architectures of the filter driver stack and of filesystem drivers.
Sounds like a fairly ordinary shell extension. Explorer has a powerful extension mechanism which allows it to list non-file objects such as Printers and the contents of a zip file. The particular details (columns and rows) are provided by a DLL.
You can observe this by zipping up a set of images: the ordinary thumbnail view probably won't work, as that part of Explorer is usually not replicated.

Is there a build system that does not use timestamps to check for file changes?

Build systems like make use timestamps to check whether a dependency has changed between two builds. Here are a few common issues I run into with timestamps:
I open a file and make some changes, but later decide they are no good, so I revert them, for example with git checkout -- file if I am using git for the project.
I open a file and just accidentally hit the editor's keyboard shortcut for save.
Either way, the file's timestamp has changed. If I now want to build the project, everything depending on that file needs to be rebuilt, which often means the whole project.
Is there any way around these issues? For example, a build system using version control to check for file changes, preferably git. Any other solutions to the above issues are also welcome.
Many thanks in advance.
SCons uses checksums, not timestamps, by default. However, checksumming requires reading the entire contents of every source file on disk, which is much slower than simply reading directory entries; that is why most build systems use timestamps.
Software Build Systems gives a good overview of these issues.
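A minimal SConstruct to illustrate (SCons build files are Python scripts; the program and source names are placeholders):

# SConstruct
# 'MD5' tells SCons to rebuild only when a file's checksum changes,
# so reverting a file or re-saving it unchanged triggers no rebuild.
Decider('MD5')
env = Environment()
env.Program('hello', ['hello.c'])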

g++: Use ZIP files as input

We use the Boost library on our side. It consists of a huge number of files which never change, and only a tiny portion of it is used. We swap the whole Boost directory when we change versions. Currently we keep the Boost sources in our SVN, file by file, which makes checkout operations very slow, especially on Windows.
It would be nice if there were a notation / plugin to address C++ files inside ZIP files, something like:
// #ZIPFS ASSIGN 'boost' 'boost.zip/boost'
#include <boost/smart_ptr/shared_ptr.hpp>
Is there any support for compiler hooks in g++? Is there any effort regarding ZIP support? Other ideas?
I assume that make or a similar build system is involved in building your software. I'd put the zip file in the repository and add a rule to the Makefile to extract it before the actual build starts.
For example, suppose your zip file is in the source tree at "external/boost.zip", that it shall be extracted to "external/boost", and that it contains at its top level a file "boost_version.h".
# external/Makefile
unpack_boost: boost/boost_version.h

boost/boost_version.h: boost.zip
	unzip -d boost $<
I don't know the exact syntax of the unzip call off-hand; -d should set the extraction directory, but check the man page.
Then in other Makefiles, you can let your source files depend on the unpack_boost target in order to have make unpack Boost before a source file is compiled.
# src/Makefile (excerpt)
unpack_boost:
	$(MAKE) -C ../external unpack_boost

source_file.cpp: unpack_boost
If you're using a Makefile generator (or an entirely different buildsystem), please check the documentation for these programs for how to create something like the custom target unpack_boost. For example, in CMake, you can use the add_custom_command directive.
The fine print: the boost/boost_version.h file is not strictly necessary for the Makefile to work. You could just put the unzip command into the unpack_boost target, but then the target would effectively be phony, that is, it would be executed during every build. The file in between (which of course you need to replace with a file actually present in the zip archive) ensures that unzip only runs when necessary.
A year ago I was in the same position as you. We kept our source in SVN and, even worse, included Boost in the same repository (same branch) as our own code. Trying to work on multiple branches was impossible, as it would take most of a day to check out a fresh working copy. Moving Boost into a separate vendor repository helped, but it would still take hours to check out.
I switched the team over to git. To give you an idea of how much better it is than SVN, I have just created a repository containing the boost 1.45.0 release, then cloned it over the network. (Cloning copies all of the repository history, which in this case is a single commit, and creates a working copy.)
That clone took six minutes.
In the first six seconds a compressed copy of the repository was copied to my machine. The rest of the time was spent writing all of those tiny files.
I heartily recommend that you try git. The learning curve is steep, but I doubt you'll get much pre-compiler hacking done in the time it would take to clone a copy of boost.
We've been facing similar issues in our company. Managing boost versions in build environments is never going to be easy. With 10+ developers, all coding on their own system(s), you will need some kind of automation.
First, I don't think it's a good idea to store copies of big libraries like Boost in SVN, or any SCM system for that matter; that's not what those systems are designed for, unless you plan to modify the Boost code yourself. But let's assume you're not doing that.
Here's how we manage it now, after trying lots of different methods, this works best for us.
For every version of boost that we use, we put the whole tree (unzipped) on a file server and we add extra subdirectories, one for each architecture/compiler-combination, where we put the compiled libraries.
We keep copies of these trees on every build system and in the global system environment we add variables like:
BOOST_1_48=C:\boost\1.48 # Windows environment var
or
BOOST_1_48=/usr/local/boost/1.48 # Linux environment var, e.g. in /etc/profile.d/boost.sh
This directory contains the boost tree (boost/*.hpp) and the added precompiled libs (e.g. lib/win/x64/msvc2010/libboost_system*.lib, ...)
All build configurations (VS solutions, VS property files, GNU makefiles, ...) define an internal variable, importing the environment vars, like:
BOOSTROOT=$(BOOST_1_48) # e.g. in a Makefile, or an included Makefile
and further build rules all use the BOOSTROOT setting for defining include paths and library search paths, e.g.
CXXFLAGS += -I$(BOOSTROOT)
LFLAGS += -L$(BOOSTROOT)/lib/linux/x64/ubuntu/precise
LFLAGS += -lboost_date_time
The reason for keeping local copies of boost is compilation speed. It takes up quite a bit of disk space, especially the compiled libs, but storage is cheap and a developer losing lots of time compiling code is not. Plus, this only needs to be copied once.
The reason for using global environment vars is that build configurations are transferable from one system to another, and can thus safely be checked in to your SCM system.
To smooth things a bit, we've developed a little tool that takes care of the copying and of setting the global environment. With a CLI, this can even be included in the build process.
Different working environments mean different rules and cultures, but believe me, we've tried lots of things and finally, we decided to define some kind of convention. Maybe ours can inspire you...
This is something you would not do in g++, because any other application that wants to do it would also have to be modified.
Store the files on a compressed filesystem. Then every application gets the benefit automatically.
It should be possible for an OS to allow transparent access to files inside a ZIP file. I know I put it in the design of my own OS a long time ago (2004 or so) but never got it to a point where it was usable. The downside is that seeking backwards in a file inside a ZIP is slow because it's compressed (you can't rewind the compressor state, so you have to seek from the start instead). This also makes a zip-inside-a-zip slow for rewinding and reading. Fortunately, most software just reads files sequentially.
It should also be retrofittable to current OSes, at least in user space. You can hook the filesystem access functions (fopen, open, ...) and add a set of virtual file descriptors that your own software returns for a given filename. If it's a real file, just pass it on; if it's not, open the underlying archive (possibly again via this very function) and pass back a virtual handle. When the file contents are accessed, read directly from the zip file without caching.
On Linux you would use LD_PRELOAD to inject it into existing software (at load time); on Windows you can hook the system calls or inject a DLL into the process's address space to hook the same functions.
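To make the idea concrete, here is a user-space sketch of the lookup logic such a hook would perform, in Python (read-only, no caching; the zip-in-path convention is invented for illustration):

import io
import os
import zipfile

def zip_open(path):
    # Real file? Pass straight through.
    if os.path.isfile(path):
        return open(path, "rb")
    # Otherwise walk the path's prefixes looking for a zip archive,
    # e.g. "boost.zip/boost/config.hpp" -> archive "boost.zip",
    # inner name "boost/config.hpp".
    parts = path.replace("\\", "/").split("/")
    for i in range(len(parts) - 1, 0, -1):
        archive = "/".join(parts[:i])
        if os.path.isfile(archive) and zipfile.is_zipfile(archive):
            inner = "/".join(parts[i:])
            with zipfile.ZipFile(archive) as zf:
                return io.BytesIO(zf.read(inner))
    raise FileNotFoundError(path)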
Does anybody know if this already exists? I can't see any clear reason it wouldn't...