I am using QuestDB on Linux with the Ext4 file system and would like to explore the option of using a form of compression. I cannot find a list of supported file systems on the official website, so I wonder: has anyone tried to run QuestDB on a file system with full compression, like ZFS or BTRFS?
QuestDB will work on any file system that allows memory-mapped files, so that should include all local file systems. It definitely works on ZFS (FreeBSD 12-13) with compressed datasets: I have used it in those scenarios and it performs very well.
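If you want to check the memory-mapping requirement on a candidate mount point before moving data there, here is a minimal sketch in Python. The function name supports_mmap is made up for illustration, and this only tests that a shared read/write mapping works in the given directory. It is not a QuestDB compatibility test.

```python
import mmap
import os
import tempfile


def supports_mmap(directory):
    """Return True if a file in `directory` can be memory mapped read/write.

    Hypothetical helper: it only checks the basic capability.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # A mapping needs a non-empty file, so size it to one page first.
        os.ftruncate(fd, mmap.PAGESIZE)
        try:
            with mmap.mmap(fd, mmap.PAGESIZE) as mapped:
                mapped[:4] = b"test"   # write through the mapping
                mapped.flush()         # push the change back to the file
            os.lseek(fd, 0, os.SEEK_SET)
            # Read the bytes back through the normal file API.
            return os.read(fd, 4) == b"test"
        except (OSError, ValueError):
            return False
    finally:
        os.close(fd)
        os.remove(path)
```

Call it with the directory you plan to use for the database files, for example supports_mmap("/path/to/dataset").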
We recently ran into an issue with file corruption after large files are unzipped. The unzip process completes without error, but the output can be missing the last 5k bytes.
Our current process: a .ZIP file is downloaded from S3 onto the Linux pod, Perl code using IO::Uncompress::Unzip unzips a single .JSON file, and the .JSON is uploaded back to S3.
There is another layer to the problem. When using native Windows or Linux tools locally, the files unzip completely, with no missing bytes. However, at times single characters are changed within the file (we've seen corrupted JSON, with "}]}" changed to "}M}", and misspelled words, with "item" changed to "idem"). This problem seems worse with tools like 7zip and WinRAR.
Looking at the details of the .ZIP file, it appears that Windows was used for the encoding and compression, which, according to what I've read, means GBK encoding. I suspect there may be a decoding issue with Linux and with tools that assume UTF-8, but I've been unable to confirm that. Besides, we've seen even the local Windows unzip process change single characters.
We've tried using IO::Uncompress::Unzip locally, which resulted in an incomplete file.
We've tried using Archive::Zip locally, which errors out on any file over 4 GB.
We've tried using Compress::Raw::Zlib, but that also didn't work.
We've tried autoflush on the file handle, which resulted in an incomplete file.
Has anyone encountered similar behaviors?
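One way to narrow this down is to extract the member with a second, independent implementation and compare the byte count against what the archive records. Below is a minimal sketch using Python's zipfile module instead of the Perl modules above (a swapped-in tool, not a fix for IO::Uncompress::Unzip). The function name extract_verified is made up for illustration. zipfile checks the CRC-32 once the member has been read to the end, and the output file is closed before the function returns, so nothing is left in a buffer when the upload step starts.

```python
import zipfile


def extract_verified(zip_path, member, out_path, chunk_size=1 << 20):
    """Stream one member to disk and verify its size (hypothetical helper).

    zipfile raises BadZipFile if the CRC-32 does not match once the
    member has been read to the end.
    """
    with zipfile.ZipFile(zip_path) as archive:
        info = archive.getinfo(member)
        written = 0
        # Both handles are closed (and flushed) when this block exits.
        with archive.open(info) as src, open(out_path, "wb") as dst:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                dst.write(chunk)
                written += len(chunk)
        if written != info.file_size:
            raise IOError(
                "expected %d bytes, wrote %d" % (info.file_size, written)
            )
    return written
```

If this produces a complete file where the Perl pipeline does not, the archive itself is probably fine and the problem is in how the output is written or uploaded.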
I have been using Pismo File Mount for many years, and I have always wondered how it actually works.
Let's say, I am currently working on an application that creates a package format similar to the ZIP format. For ease of access, I want to create a shell extension that works similar to how Pismo File Mount works. For those who have not used Pismo File Mount before, this is how it works:
The user right-clicks a ZIP file in Windows Explorer.
The user then clicks "Mount" to mount the ZIP file.
The user can now access his/her files immediately.
The user does not have to extract the ZIP file to view its contents.
There's a catch. I do not want to use the Pismo File Mount API, perhaps for various reasons like commercial or legal ones.
The question is, how does Pismo File Mount integrate itself into Windows Explorer programmatically, in terms of the Windows API and C++?
I wrote Pismo File Mount, and the ZIP reader included in the PFM Audit Package.
There is no concise or realistically postable answer to the question. To do what PFM does, in C/C++, against the Windows APIs (kernel and user), would take tens of thousands of lines of difficult code and a large time investment.
PFM is built as a file system driver (kernel module), with user-mode support DLLs and executables. The driver uses a protocol to talk with user-mode formatting code that (for example) decodes the ZIP file format and serves the contents through the kernel-mode driver to client applications.
There exist two ways:
Shell namespace extension. The folder created by the shell namespace extension is not an actual filesystem folder and accessibility of the files in such folder is usually limited to Explorer itself and applications aware of shell extensions and the ways to work with them.
Filesystem filter driver which creates a virtual directory on the existing disk. Such directory is seen by all applications as a real directory, where those applications can read and write files and subdirectories. All filesystem operations go through such driver.
Pismo File Mount works via the filter driver, AFAIK.
Our CallbackFilter product provides a way to create virtual directories and files. It includes a driver and calls your user-mode code for the actual operations. But the filter approach is a bit complicated: a virtual disk created with a filesystem driver (e.g. with our Callback File System product) is easier to implement and manage, due to differences between the architectures of the filter driver stack and filesystem drivers.
Sounds like a fairly ordinary shell extension. Explorer has a powerful extension mechanism which allows it to list non-file objects such as Printers and the contents of a zip file. The particular details (columns and rows) are provided by a DLL.
You can observe this by zipping up a set of images; the ordinary thumbnail view probably won't work as that part of Explorer is usually not copied.
I created a large archive using an old version of minizip (1.01h), which is based on the zlib library. It does not support Zip64.
The source file was a text file much larger than 4GB. The compressed size of the archive is 2GB. Since it was created without Zip64 support the archive is corrupt. I am unable to restore the archive. Is there a way to recover at least a part of the text file from this corrupt archive?
You could try this streaming unzip, which ignores the central directory.
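The reason a streaming approach can work here: a deflate stream marks its own final block, so the data can be inflated without trusting the size fields that overflowed at 4 GB. Here is a minimal sketch in Python of that idea (this is not the tool linked above, and stream_first_member is a made-up name). It assumes the archive's first entry is the text file and that it was stored with deflate (method 8).

```python
import struct
import zlib

# Fixed 30-byte part of a ZIP local file header (little endian).
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def stream_first_member(zip_path, out_path, chunk_size=1 << 16):
    """Inflate the first entry without reading the central directory.

    Hypothetical helper. The size and CRC fields in the header are
    ignored on purpose, because they cannot be trusted here.
    """
    with open(zip_path, "rb") as src, open(out_path, "wb") as dst:
        header = src.read(LOCAL_HEADER.size)
        if len(header) != LOCAL_HEADER.size:
            raise ValueError("file too short for a local file header")
        (sig, _version, _flags, method, _time, _date,
         _crc, _csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack(header)
        if sig != b"PK\x03\x04":
            raise ValueError("no local file header at start of file")
        if method != 8:
            raise ValueError("this sketch only handles deflate (method 8)")
        name = src.read(name_len).decode("cp437", errors="replace")
        src.seek(extra_len, 1)  # skip the extra field

        # Negative wbits selects a raw deflate stream, as used inside ZIP.
        inflater = zlib.decompressobj(-zlib.MAX_WBITS)
        total = 0
        while not inflater.eof:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # input ended before the final deflate block
            data = inflater.decompress(chunk)
            dst.write(data)
            total += len(data)
        return name, total
```

If the compressed data itself is damaged, zlib raises an error at that point, but everything inflated before it has already been written to out_path, so at least part of the text can be recovered.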
How can I find out a mime-type or content-type of a given file?
I cannot use the suffix because the file could be renamed.
A possible addition would be categorizing the files: jpg, gif, png and so on are image files and can be opened by the editing applications that have been set for them in the OS.
Thank you in advance.
What platform? On *nix, you should look at how the file program does it. It relies on a few heuristics, including checks of the first few bytes of a file (many file formats, including many image formats, start with a fixed header).
If you're on Windows, the *nix file command is probably still instructive, even if you can't reuse its code as directly. There may also be some better solution in the Windows APIs (I'm not a Windows programmer).
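To illustrate the header check, here is a minimal sketch in Python. It is a tiny hand-written table, not the database the file command uses, and sniff_mime is a made-up name. The signatures listed are the well-known fixed prefixes of these formats.

```python
# A few well-known file signatures ("magic numbers") and their MIME types.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]


def sniff_mime(path, default="application/octet-stream"):
    """Guess a MIME type from the first bytes of a file (hypothetical helper)."""
    with open(path, "rb") as handle:
        head = handle.read(16)  # long enough for every signature above
    for magic, mime in MAGIC_SIGNATURES:
        if head.startswith(magic):
            return mime
    return default
```

Because it only looks at the content, renaming the file does not change the result. Formats without a fixed header (plain text, for example) fall through to the default.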
This could help, it is with C# but I think you can get the idea.
http://kseesharp.blogspot.com/2008/04/c-get-mimetype-from-file-name.html
You can use some sort of class for accessing the Windows Registry from Qt, or use the Windows API directly from Qt.
I am not a Qt programmer.
You can't, not from within Qt.
However, if all you want is to show a file with the correct application, you can use QDesktopServices::openUrl(), which wraps open (Windows/OSX) and xdg-open (Unix). The URL may also be a local file (in that case, use QUrl::fromLocalFile() to construct the URL).
For categorizing image files in Qt (this works for image files only):
QByteArray imageFormat = QImageReader::imageFormat(fileName); //Where fileName - path to your file
Mapping imageFormat further to a mime-type is not complicated.
It doesn't look like Qt offers that capability yet. However, you may want to look into the libmagic library which does what the file and other similar commands do. Assuming the library is maintained properly it will include new MIME types as time passes. From what I can tell it is part of the file tool suite. Under a Debian system do something like this:
sudo apt-get install libmagic-dev
to get the development library and use #include to make use of it.
I like the idea of using compressed folders as containers for file formats. They are used by LibreOffice and Dia. So if I want to define a special-purpose file format, I can define a folder and file structure, zip the root folder, and have all the data in a single file. Imported files just live as originals inside the compressed file. Defining a binary file format from scratch with these features would be a lot of work.
Now to my question: Are there applications which are using compressed folders as file formats and do versioning inside the folder? The benefits would be great. You could just commit a state in your project into your file and the versioning is just decorated with functions from your own application. Also diffs could be presented your own way.
Libraries for working with compressed files and for versioning are available. The versioning system used should be a distributed one, where the repository lives inside your working folder and not separately, as with Subversion and its client-server model.
What do you think? I'm sure there are applications out there using this approach, but I couldn't find one. Or is there a major drawback in this approach?
Sounds like an interesting idea.
I know many applications claim they have "unlimited" undo and redo, but that's only back to the most recent time I opened this file. With your system, your application could "undo" to previous versions of the file, even before the version I saw the most recent time I opened this file -- that might be a nifty feature.
Have you looked at TortoiseHg? TortoiseHg uses Mercurial, which is "a distributed system, where the repository lives inside your working folder". Rather than defining a new compressed versioned file format and all the software to work with it from scratch, perhaps you could use the Mercurial file format and borrow the TortoiseHg and Mercurial source code to work with it.
What happens if I'm working on a project using 2 different applications, and each application wants to store the entire project in its own slightly different compressed versioned file format?
What I have found since is that OpenOffice, aka LibreOffice, has a kind of versioning built in. A LibreOffice file is a zip file with structured content (XML files, directories, ...) inside. You can mark the current content as a version. This creates a VersionList.xml, which contains information about all the versions. A Versions directory is also added, and it contains files like Version1, Version2 and so on. These files are the actual documents at that state.
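Since the container is an ordinary zip file, the stored versions can be listed with any zip library. Here is a minimal sketch in Python, assuming the layout is exactly as described above (entries under a top-level Versions/ directory). The function name list_versions is made up for illustration, and the sketch does not parse VersionList.xml.

```python
import zipfile


def list_versions(document_path):
    """Return the archive entries stored under Versions/ (hypothetical helper).

    Assumes the layout described above; entries are returned in the
    order in which they appear in the archive.
    """
    with zipfile.ZipFile(document_path) as archive:
        return [
            name
            for name in archive.namelist()
            if name.startswith("Versions/") and not name.endswith("/")
        ]
```

Each returned name can then be passed to ZipFile.open() to read that snapshot.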