How do I extract a file out of a virtual disk?

Given a block of data (which the filesystem thinks is the whole drive) and the type of filesystem (fat32, ntfs, ext3) I would like to know how to extract files out of that block of data. Any ideas on how to do this?

You ultimately have two options:
Mount the filesystem contained in the virtual disk image on the host machine. Tools like losetup can be helpful to accomplish this.
Find an appropriate library that will allow you to poke at the volume in userspace. Basically, you want a user-mode filesystem driver that will let a program inspect the directory structure and extract files. You might be able to repurpose parts of fuse-ext2 and ntfs-3g.
This all assumes that the virtual disk is just a flat image file, not a specialized container like VMDK or VDI. If it is such a container, you'll either need to extract the flat image first or find a library that can expose the flat content to other libraries.

You mount it to some point using
mount image /mount/point -o loop,ro
and access the files in it. Afterwards, you can unmount again.
But I do not understand what this has to do with C or C++.

Related

Detecting mounted drives on Linux and Mac OS X

I’m using QDir::drives() to get the list of drives. It works great on Windows, but on Linux and Mac it only returns a single item, “/”, i.e. root. That is the expected behavior, but how can I get a list of drives on Mac and Linux?
Non-Qt, native API solutions are also welcome.
Clarification on the "drive" definition: I'd like to get a list of mount points that are visible as "drives" in Finder or the Linux built-in file manager.

As far as the filesystem is concerned, there is no concept of drives in Unix/Linux (I can't vouch for MacOSX but I'd say it's the same). The closest thing would probably be mount points, but a normal application shouldn't bother about them since all is already available under the filesystem root / (hence the behaviour of QDir::drives() that you observe).
If you really want to see which mount points are in use, you could parse the output of the mount command (without any arguments) or, at least on Linux, the contents of the /etc/mtab file. Beware though, mount points can get pretty hairy real quick (loop devices, FUSE filesystems, network shares, ...) so, again, I wouldn't recommend making use of them unless your application is designed to administer them.
Keep in mind that on Unix-y OSes, mount points are normally a matter for system administrators, not end-users, unless we're speaking of removable media or transient network shares.
Edit: Following your clarifications in the comments, on Linux you should use getmntent or getmntent_r to parse the contents of the /etc/mtab file and thus get a list of all mount points and the corresponding devices.
The trick after that is to determine which ones you want to display (removable? network share?). I know that /sys/block/... can help with that, but I don't know all the details so you'll have to dig a bit more.
For example, to check whether /dev/sdd1 (a USB key) mounted on /media/usb0/ is a removable device, you could do (note how I use the device name sdd, not the partition name sdd1):
$ cat /sys/block/sdd/removable
1
As opposed to my main hard drive:
$ cat /sys/block/sda/removable
0
Hope this puts you on the right track.

For OS X, the Disk Arbitration framework can be used to list and monitor drives and mount points.

Scraping the output of the mount shell command is certainly one option on either platform - although, what is your definition of a drive here? Physical media, removable drives, network volumes? You'll need to do a lot of filtering.
On Mac OS X, the mount point for removable media, network volumes, and secondary hard drives is always under /Volumes/, so simply enumerating the items in this directory will do the trick if your definition of a drive is broad. This ought to be fairly safe as they're all automounted.
On Linux, there are a variety of locations depending on the particular distro in use. /mnt/ is the traditional one, but there are others.

On Linux, the way to get information about the drives currently mounted is to parse the mtab file. glibc provides the macro _PATH_MOUNTED to locate this file (_PATH_MNTTAB, by contrast, points to fstab). See http://www.gnu.org/software/libc/manual/html_node/Mount-Information.html#Mount-Information

If you know the filesystem type of the drive or drives in question, you can use the df command to list them, either from the console or programmatically as a system command. For example, to find all the ext4 drives:
df -t ext4
You can simply add additional formats onto the same command if you are interested in more than one type:
df -t ext4 -t tmpfs
This returns the device each filesystem lives on, its total size, the space used, the space free, the use% and where it is mounted in the filesystem.
Without a -t filter, df shows every filesystem mounted on the system, including some that aren't really what you are looking for, such as temporary filesystems.
Not sure if this will work on OSX or not, but it does work on my Ubuntu 12.04 distribution.

Another way is to check for "Volumes"
df -H | grep "/Volumes"

I know that this is old, but nobody mentioned getfsstat, which is what I ended up using on macOS. You can get a list of mounts (which will include most disks) using getfsstat. See man 2 getfsstat for further information.

hardlink multiple file to one file

I have many files in a folder. I want to concatenate all these files to a single file. For example cat * > final_file;
But this will use additional disk space. Is there a way I can hardlink all the files to final_file? For example ln * final_file.

This is not possible using links.
If you really need this kind of feature and cannot afford to create one large file, you could go for a custom file system driver. FUSE allows you to write a simple file system driver that runs in user space and lets you access the files as if they were one large file.
You could also write a custom block device (e.g. by emulating the NBD "Network Block Device" protocol) which combines two or more files into one large block device.
Getting to know the concrete use case would help to give a better answer.

No. A hard link is just another name for one existing file, nothing more; it cannot merge several files into one. The filesystem does not support that at an underlying level.

C++ file container (e.g. zip) for easy access

I have a lot of small files I need to ship with an application I build, and I want to put these files into an archive to make copying and redistributing easier.
I also really like the idea of having them all in one place so I need to compare the md5 of one file only in case something goes wrong.
I'm thinking about a class which can load the archive and return a list of files within the archive and load a file into memory if I need to access it.
I already searched the Internet for different methods of achieving what I want and found out about zlib and the lzma sdk.
Neither really appealed to me: I couldn't find out how portable zlib is, and I didn't like the lzma sdk because it is just too much and I don't want to bloat the application because of this problem. Another downside with zlib is that I don't have the C/C++ experience (I'm really new to C++) to understand everything explained in the manual.
I also have to add that this is a time-critical problem. I thought for some time about implementing a simple format like tar in a way that lets me easily access the files within my application, but I just haven't found the time to do that yet.
So what I'm searching for is a library that allows me to access the files within an archive. I'd be glad if anybody could point me in the right direction here.
Thanks in advance,
Robin.
Edit: I need the archive to be accessed under linux and windows. Sorry I didn't mention that in the beginning.

For zipping, I've always been partial to ZipUtils, which makes the process easy and is built on top of the zlib and info-zip libraries.

The answer depends on whether you plan to modify the archive via code after the archive is initially built.
If you don't need to modify it, you can use TAR - it's a handy and simple format. If you want compression, you can implement tar.gz reader or find some library that does this (I believe there are some available, including open-source ones).
If your application needs random access to the data or needs to modify the archive, then regular TAR or ZIP archives are not a good fit. A virtual file system such as our SolFS or CodeBase file system will fit much better: virtual file systems are suited to frequent modifications of the storage, while archives mainly target write-once-read-many usage scenarios.

zlib is highly portable and very widely used. If you can't make sense of the C interface, there are alternatives for many other languages - see 'Related External Links' here.
Take another look before you search for something different.

If you're using Qt or Windows you can also pack data into the executable's resource area. You would only have to distribute the executable file using this technique. There's a well defined API already written and tested to access that data.

The zlib API is the way to go. Simple and portable. Look at the unzip.h header for the APIs that access archive files. It is in C and very easy.
If the files are small, you can dump them into string literals (search for the bin2h utility) and include them in your project. Then change the code that reads the files. If all files are currently read using the ifstream class, simply change it to the istringstream class and recompile the code.

Try using Quazip - it's quite simple to use. You can use it as a stream from which you read the compressed file on the fly.

Obtain a list of partitions on Windows

Goal
I'm porting a filesystem to Windows, and am writing a more Windows-like interface for the mounter executable. Part of this process is letting the user locate a partition and pick a drive letter. Ultimately the choice of partition has to result in something I can open using CreateFile(), open(), fopen() or similar.
Leads
Windows seems to revolve around the concept of volumes, which don't seem quite analogous to disks, and only occur for already mounted filesystems.
Promising leads I've had include:
IOCTL_DISK_GET_DRIVE_LAYOUT_EX
Physical Disks and Volumes
Displaying Volume Paths
However these all end in volumes or offsets thereof, not the /dev/sda1 partition-specific-style handle I'm after.
This question is after a very similar thing; I considered a bounty until I noticed the OP is after physical disk names, not partitions. This answer contains a method to brute-force partition names, which I'd like to avoid (or else see documentation giving bounds for the possible paths).
Question
I'd like:
Correct terminology and documentation for unmounted partitions in Windows.
An effective and documented method to reliably retrieve all available partitions.
The closest fit to the partition file abstraction as available in Linux, wherein all IO is bound to the appropriate area of the disk for the partition opened.
Update0
While the main goal is still opening raw partitions, it appears the solution may involve first acquiring a handle to each disk drive, and then using that in turn to acquire each partition. That means I also need a way to enumerate all the disk drives, even those without mounted volumes on them.

As you noted, you can use IOCTL_DISK_GET_DRIVE_LAYOUT_EX to get a list of partitions.
There's a good overview of the related concepts here. I wonder if the missing link for you is
Detecting the Type of Disk
There is no specific function to programmatically detect the type of disk a particular file or directory is located on. There is an indirect method.
First, call GetVolumePathName. Then, call CreateFile to open the volume using the path. Next, use IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS with the volume handle to obtain the disk number and use the disk number to construct the disk path, such as "\\?\PhysicalDriveX". Finally, use IOCTL_DISK_GET_DRIVE_LAYOUT_EX to obtain the partition list, and check the PartitionType for each entry in the partition list.
The full list of disk management control codes may have more that would be useful. To be honest I'm not sure how the Unix partition name maps onto Windows, maybe it just doesn't directly.

If you can imagine moving from the safe haven of user space and the Windows API (Win32) to coding a device driver with the NT DDK, you could try IoReadPartitionTableEx or some other low-level disk function.

To be blunt, the best way to reliably get all mounted and unmounted disk partitions is to parse the MBR/GPT yourself.
First, to clear a few things up: disks contain partitions, and partitions combine to create volumes. Therefore, you can have one volume which consists of two partitions from two different disks.
IOCTL_DISK_GET_DRIVE_LAYOUT_EX is the closest solution you're going to get without doing it manually. The problem with this is that it relies on Windows, which can parse the MBR incorrectly for reasons I have not pinned down. My current working theory is that if Windows was installed via EFI but is being booted via MBR, you'll see this sort of issue. Windows manages to get away with this because most partition managers copy the important partition information to the MBR alongside the GPT. But this means that you won't get important information like the partition UUID (which is only stored in the GPT).
All of the other solutions involve getting the volume information, which is completely different from the partition information.
Side note: a volume ID will usually be of the form \\.\Volume{PARTITION_UUID}. Cases where this would not hold: if the drive is partitioned with MBR and not GPT (MBR does not have a partition UUID, so Windows makes one up), if you have a RAID drive, or if you have a volume consisting of partitions from multiple disks (much the same thing as RAID). Those are just the cases that come to mind; don't hold me to them.

I think you're slightly mistaken in an earlier phase. For instance, you seem to assume that "mounting" works in Windows like it works in Unix. It's a bit different.
Let's start at the most familiar end. Paths like C:\ use drive letters. Those are essentially just a set of symbolic links nowadays (On Windows, they're more formally known as "junctions"). There's a base set for all users, and each user can add their own. Even if there is no drive letter for a volume, there will still be a volume name like \\?\Volume{4c1b02c1-d990-11dc-99ae-806e6f6e6963}\. You can use this volume name in calls to CreateFile() etc. I'm not sure if fopen() likes them, though.
The function QueryDosDevice will get you the Windows device name for a drive letter or a volume name. A device name looks like "\Device\HarddiskVolume1", but you can't pass it to CreateFile.
Microsoft has example code to enumerate all partitions.
On Windows, like on Linux, you can open the partition itself as if it were a file. This is quite well documented under CreateFile.

how to know whether disk is basic or dynamic?

In Windows, is it possible to know what kind of disk we are dealing with from a C/C++ program? Forget about GPT or MBR; how do I know whether it is basic or dynamic? The program input can be a drive letter or any info related to the disk, and the output should be "dynamic" or "basic".
It doesn't need to be a direct way; even if it is a lengthy process, that's okay.
I couldn't find much in MSDN. Please help me out.

There is a way in Windows, but it's not straightforward.
There is no direct API to determine whether a disk is basic or dynamic; however, all dynamic disks will have LDM information.
So if a drive has a partition with LDM information on it, then it's going to be a dynamic disk.
The DeviceIoControl() function with the IOCTL_DISK_GET_DRIVE_LAYOUT_EX control code can be used to get this information.
Here is a post with a sample console application to do what you're asking for.

As per MSDN http://msdn.microsoft.com/en-us/library/aa363785(VS.85).aspx
Detecting the Type of Disk
There is no specific function to programmatically detect the type of disk a particular file or directory is located on. There is an indirect method.
First, call GetVolumePathName. Then, call CreateFile to open the volume using the path. Next, use IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS with the volume handle to obtain the disk number and use the disk number to construct the disk path, such as "\\?\PhysicalDriveX". Finally, use IOCTL_DISK_GET_DRIVE_LAYOUT_EX to obtain the partition list, and check the PartitionType for each entry in the partition list.

Check out GetDriveType().
