Goal
I'm porting a filesystem to Windows, and am writing a more Windows-like interface for the mounter executable. Part of this process is letting the user locate a partition and pick a drive letter. Ultimately the choice of partition has to result in something I can open using CreateFile(), open(), fopen() or similar.
Leads
Windows seems to revolve around the concept of volumes, which don't seem quite analogous to disks, and only occur for already mounted filesystems.
Promising leads I've had include:
IOCTL_DISK_GET_DRIVE_LAYOUT_EX
Physical Disks and Volumes
Displaying Volume Paths
However these all end in volumes or offsets thereof, not the /dev/sda1 partition-specific-style handle I'm after.
This question is after a very similar thing, I considered a bounty until I observed the OP is after physical disk names, not partitions. This answer contains a method to brute force partition names, I'd like to avoid that (or see documentation containing bounds for the possible paths).
Question
I'd like:
Correct terminology and documentation for unmounted partitions in Windows.
An effective and documented method to reliably retrieve all available partitions.
The closest fit to the partition file abstraction as available in Linux, wherein all IO is bound to the appropriate area of the disk for the partition opened.
Update0
While the main goal is still opening raw partitions, it appears the solution may involve first acquiring a handle to each disk drive, and then using that in turn to acquire each partition. How to enumerate all the disk drives (even those without mounted volumes on them already) is required.
As you noted, you can use IOCTL_DISK_GET_DRIVE_LAYOUT_EX to get a list of partitions.
There's a good overview of the related concepts here. I wonder if the missing link for you is
Detecting the Type of Disk
There is no specific function to
programmatically detect the type of
disk a particular file or directory is
located on. There is an indirect
method.
First, call GetVolumePathName. Then,
call CreateFile to open the volume
using the path. Next, use
IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS
with the volume handle to obtain the
disk number and use the disk number to
construct the disk path, such as
"\?\PhysicalDriveX". Finally, use
IOCTL_DISK_GET_DRIVE_LAYOUT_EX to
obtain the partition list, and check
the PartitionType for each entry in
the partition list.
The full list of disk management control codes may have more that would be useful. To be honest I'm not sure how the Unix partition name maps onto Windows, maybe it just doesn't directly.
If you can imagine moving from safe haven of userspace and the Windows API (win32) to coding a device driver with NTTDK, you could try IoReadPartitionTableEx or some other low level disk function.
To be blunt, the best way to reliably get all mounted/unmounted disk partitions is to parse the mbr/gpt yourself.
First to clear a few things up: Disks contain partitions and partitions combine to create volumes. Therefore, you can have one volume which consists of two partitions from two different disks.
IOCTL_DISK_GET_DRIVE_LAYOUT_EX is the closest solution you're going to get without doing it manually. The problem with this is that it relies on windows which can incorrectly parse the MBR for god knows what reason. My current working theory is that if Windows was installed via EFI but is being booted via MBR, youll see this sort of issue. Windows manages to get away with this because most partition managers copy the important partition information to the MBR alongside the GPT. But this means that you wont get important information like the partition UUID (which is only stored in the GPT).
All of the other solutions involve getting the Volume information which is completely different from the partition information.
Side Note: a Volume id will usually be of the form \\.\Volume{PARTITION_UUID}. Cases where this would not hold: if the drive is partitioned with MBR and not GPT (MBR does not have a partition UUID, therefore windows makes one up), if you have a raid drive, or if you have a volume consisting of partitions from multiple disks (kinda the same thing as raid). Those are just the cases that come to my mind, dont hold me to them.
I think you're slightly mistaken in an earlier phase. For instance, you seem to assume that "mounting" works in Windows like it works in Unix. It's a bit different.
Let's start at the most familiar end. Paths like C:\ use drive letters. Those are essentially just a set of symbolic links nowadays (On Windows, they're more formally known as "junctions"). There's a base set for all users, and each user can add their own. Even if there is no drive letter for a volume, there will still be a volume name like \\?\Volume{4c1b02c1-d990-11dc-99ae-806e6f6e6963}\. You can use this volume name in calls to CreateFile() etc. I'm not sure if fopen() likes them, though.
The function QueryDosDevice will get you the Windows device name for a drive letter or a volume name. A device name looks like "\Device\HarddiskVolume1", but you can't pass it to CreateFile
Microsoft has example code to enumerate all partitions.
On Windows, like on Linux, you can open the partition itself as if it were a file. This is quite well documented under CreateFile.
Related
There's windows program "Everything Search" http://www.voidtools.com/ that reads file names of the NTFS volume faster than I assume is possible by recursive descent (it reads filenames of almost 2bln files on 4TB HDD in less than 10 seconds).
I know that it probably reads NTFS folder structure directly of the volume in bulk, and makes sense of it without calling OS filesystem functions.
How exactly can it be done? What system functions should I call to get that information about NTFS volume that fast and how can I parse it into file and directory names? Are there any libraries in any language that help with that?
If you are not sure what I am asking, there are more details in my previous question (I was asked to rephrase it): Can I read whole NTFS directory tree into RAM at once?
The NTFS volume has a low-visiblity structure it relies on called the master file table. There are APIs for querying this table directly, but they require some privileges to invoke, because you have to get a handle to the volume. The main function to query the master file table is DeviceIOControl and the control code is FSCTL_ENUM_USN_DATA
The control code appears to be a USN-related code - which is a touch misleading in this particular case - but it will give the basic flavor of the call and related structures. You get back an enumeration of records that look like usn records, but they're thin wrappers around master file table entries.
The records each have FileName, IDs and parent IDs. The FileNames are the "local" name of the file or folder, and to get the full name, you would expect to traverse the table structure.
It is lightning fast - way faster than recursing through the file system. You'll get back (and will have to filter out) things that aren't exposed in any of the normal file APIs - things you definitely don't want to expose to users, for example.
As the title goes, I want to trigger a notification when some events happen.
A event above can be user-defined, such as updating specified files in 1-miniute.
If files are stored locally, I can easily make it with the system call inotify, but the case is that files locate on a distributed file system such as mfs..
How to make it? I wonder to know if there are some solutions or open-source project to solve this problem. Thanks.
If you have only black-box access (e.g. NFS protocol) to the remote system(s), you don't have much options unless the protocol supports what you need. So I'll assume you have control over the remote systems.
The "trivial" approach is running a local inotify/fanotify listener on each computer that would forward the notification over the network. FAM can do this over NFS.
A problem with all notification-based system is the risk of lost notifications in various edge cases. This becomes much more acute over a network - e.g. client confirms reciept of notification, then immediately crashes. There are reliable message queues you can build on but IMHO this way lies madness...
A saner approach is stateless hash-based scan.
I like to call the following design "hnotify" but that's not an established term. The ideas are widely used by many version control and backup systems, dating back to Plan 9.
The core idea is if you know cryptographic hashes for files, you can compose a single hash that represents a directory of files - it changes if any of the files changed - and you can build these bottom-up to represent the whole filesystem's state.
(Git stores things this way and is very efficient at it.)
Why are hash trees cool? If you have 2 hash trees — one representing the filesystem state you saw at point in the past, one representing the current state — you can easily find out what changed between them:
You start at the roots. If they are different you read the 2 root directories and compare hashes for subdirectories.
If a subdirectory has same hash in both trees, then nothing under it changed. No point going there.
If a subdirectory's hash changed, compare its contents recursively — call step (1).
If one has a subdirectory the other doesn't, well that's a change. With some global table you can also detect moves/renames.
Note that if few files changed, you only read a small portion of the current state. So the remote system doesn't have to send you the whole tree of hashes, it can be an interactive ping-pong of "give me hashes for this directory; ok now for this...".
(This is akin to how Git's dumb http protocol worked; there is a newer protocol with less round trips.)
This is as robust and bug-proof as polling the whole filesystem for changes — you can't miss anything — but reasonably efficient!
But how does the server track current hashes?
Unfortunately, fully hashing all disk writes is too expensive for most people. You may get if for free if you're lucky to be running a deduplicating filesystem, e.g. ZFS or Btrfs.
Otherwise you're stuck with re-reading all changed files (which is even more expensive than doing it in the filesystem layer) or using fake file hashes: upon any change to a file, invent a new random "hash" to invalidate it (and try to keep the fake hashes on moves). Still compute real hashes up the tree. Now you may have false positives — you "detect a change" when the content is the same — but never false negatives.
Anyway, the point is that whatever stateful hacks you do (e.g. inotify with periodic scans to be sure), you only do them locally on the server. Across the network, you only ever send hashes that represent snapshots of current state (or its subtrees)! This way you can have a distributed system with many servers and clients, intermittent connectivity, and still keep your sanity.
P.S. Btrfs can efficiently find differences from an older snapshot. But this is a snapshot taken on the server (and causing all data to be preserved!), less flexible than a client-side lightweight tree-of-hashes.
P.S. One of your tags is HadoopFS. I'm not really familiar with it, but I suspect a lot of its files are write-once-then-immutable, and it might be able to natively give you some kind of file/chunk ids that can serve as fake hashes?
Existing tools
The first tool that springs to my mind is bup index. bup is a very clever deduplicating backup tool built on git (only scalable to huge data), so it sits on the foundation described above. In theory, indexing data in bup on the server and doing git fetch over the network would even implement the hash-walking comparison of what's new that I described above — unfortunately the git repositories that bup produces are too big for git itself to cope with. Also you probably don't want bup to read and store all your data. But bup index is a separate subsystem that quickly scans a filesystem for potential changes, without yet reading the changed files.
Currently bup doesn't use inotify but it's been discussed in depth.
Oh, and bup uses Bloom Filters which are a nearly optimal way to represent sets with false positives. I'm almost certain Bloom filters have a role to play in optimizion stateless notification protocols ("here is a compressed bitmap of all I have; you should be able to narrow your queries with it" or "here is a compressed bitmap of what I want to be notified about"). Not sure if the way bup uses them is directly useful to you, but this data structure should definitely be in your toolbelt.
Another tool is git annex. It's also based on Git (are you noticing a trend?) but is designed to keep the data itself out of Git repos (so git fetch should just work!) and has a "WORM" option that uses fake hashes for faster performance.
Alternative design: compressed replayable journal
I used to think the above is the only sane stateless approach for clients to check what's changed. But I just read http://arstechnica.com/apple/2007/10/mac-os-x-10-5/7/ about OS X's FSEvents framework, which has a perhaps simpler design:
ALL changes are logged to a file. It's kept forever.
Clients can ask "replay for me everything since event 51348".
The magic trick is the log has coarse granularity ("something in this directory changed, go re-scan it to find out what", repeated changes within 30 seconds are combined) so this journal file is very compact.
At the low level you might resort to similar techniques — e.g. hashes — but the top-level interface is different: instead of snapshots you deal with a timeline of events. It may be an easier fit for some applications.
I'm trying to speedup directory enumeration in C++, where I'm recursing into subdirectories. I currently have an app which spends 95% of it's time in FindFirst/FindNextFile APIs, and it takes several minutes to enumerate all the files on a given volume. I know it's possible to do this faster because there is an app that does: Everything. It enumerates my entire drive in seconds.
How might I accomplish something like this?
I realize this is an old post, but there is a project on source forge that does exactly what you are asking and the source code is available.
You can find the project here: NTFS-Search
"Everything" builds an index in the background, so queries are against the index not the file system itself.
There are a few improvements to be made - at least over the straight-forward algorrithm:
First, breadth search over depth search. That is, enumerate and process all files in a single folder before recursing into the sub folders you found. This improves locality - usually a lot.
On Windows 7 / W2K8R2, you can use FindFirstFileEx with FindExInfoBasic, the main speedup being omitting the short file name on NTFS file systems where this is enabled.
Separate threads help if you enumerate different physical disks (not just drives). For the same disk it only helps if it's an SSD ("zero seek time"), or you spend significant time processing a file name (compared to the time spent on disk access).
[edit] Wikipedia actually has some comments -
Basically, they are skipping the file system abstraction layer, and access NTFS directly. This way, they can batch calls and skip expensive services of the file system - such as checking ACL's.
A good starting point would be the NTFS Technical Reference on MSDN.
"Everything" accesses directory information at a lower level than the Win32 FindFirst/FindNext APIs.
I believe it reads and interprets the NTFS MFT structures directly, and that this is one of the main reasons for its performance. It's also why it requires admin privileges and why "Everything" only indexes local or removable NTFS volumes (not network drives, for example).
A couple other utilities that do the similar things are:
FindOnClick by 2Brightsparks
Search GT
A little reverse engineering with a debugger on these tools might give you some insight on the techniques they use.
Don't recurse immediately, save a list of directories you find and dive into them when finished. You want to do linear access to each directory, to take advantage of locality of reference and any caching the OS is doing.
If you're already doing the best you can to get the maximum speed from the API, the next step is to do low-level disk accesses and bypass Windows altogether. You might get some guidance from the NTFS drivers for Linux, or perhaps you can use one directly.
If you are doing this on NTFS, here's a lib for low level access: NTFSLib.
You can enumerate through all file records in $MFT, each representing a real file on disk. You can get all file attributes from the record, including $DATA.
This may be the fastest way to enumerate all files/directories on NTFS volumes, 200k~300k files per minute as I tested.
I want to create a simple program which is working very similar to RAID1. It should work like this:
First i want to give the primary HDD-s drive letter and than the secondary one. I will only write to the primary HDD! If any new data is copied to the primary HDD it should automatically copy it to the secondary one.
I need some help where should i start all this? How to monitor the written data in the primary HDD? Obviously there are many ways to do what i want (i think), but i need the simpliest way.
If this isn't so complicated, than how can i handle that case if the primary HDD has two or more partition, because then i should check the secondary HDD's partition too, and then create/resize them if necessary?
Thanks in advance!
kampi
The concept of mirroring disk writes to another disk in real time is the basis for high availability, and implementing these schemes are not trivial.
The company I work for makes DoubleTake, which does real time mirroring & replication of file based IO to local or remote volumes. This is a little different than what you are describing, which appears to be block based disk/volume replication, but many of the concepts are similar.
For file based replication, there are a quite a few nasty scenarios, i'll describe a few:
Synchronizing the contents of one volume to another volume, keeping in mind that changes can occur while you are doing this. I suppose you could simply this by requiring that volumes start out totally formatted. But for people that have data that will not be a good solution!
keeping up with disk changes: What if the volume you are mirroring to is slower than the source volume? Where do you buffer? To Disk? Memory?
Anyways we use a kernel mode file system filter driver to capture the disk IO, and then our user mode service grabs this IO and forwards it to a local or remote disk.
If you want to learn about file system filtering, one of the best books (its old but good) is File System Internals, by Rajeev Nagar. Its a must read for doing any serious work with file system filters.
Also take a look at the file system filter samples on the Windows 7 WDK, its free, and they have good file mon examples that will get you seeing disk changes pretty quickly.
Good Luck!
In windows is it possible to know what kind of disk we are dealing with from a c/c++ program? forget about gpt or mbr, how to know whether it is basic or dynamic? Program input can be drive letter or any info related to disk, output should be dynamic or basic.
No need of a direct way of doing, even if it is lengthy process, its okay.
I couldn't find much in msdn. Please help me out.
There is a way in windows, but it's not straight forward.
There is no direct API to determine if a disk is Basic or Dynamic, however all dynamic disks will have LDM Information.
So if a drive has a partion with LDM information on it, then it's going to be a dynamic disk.
the DeviceIoControl() method with the IOCTL_DISK_GET_DRIVE_LAYOUT_EX control code can be used to get this information.
Here is a post with a sample console application to do what you're asking for.
As per from MSDN http://msdn.microsoft.com/en-us/library/aa363785(VS.85).aspx
Detecting the Type of Disk
There is no specific function to programmatically detect the type of disk a particular file or directory is located on. There is an indirect method.
First, call GetVolumePathName. Then, call CreateFile to open the volume using the path. Next, use IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS with the volume handle to obtain the disk number and use the disk number to construct the disk path, such as "\?\PhysicalDriveX". Finally, use IOCTL_DISK_GET_DRIVE_LAYOUT_EX to obtain the partition list, and check the PartitionType for each entry in the partition list.
check out for GetDriveType() .