How to match linux device path to windows drive name? - c++

I'm writing an application that at some stage performs low-level disk operations in a Linux environment. The app actually consists of two parts: one runs on Windows and interacts with the user, and the other is a Linux part that runs from a LiveCD. The user chooses Windows drive letters and then the Linux part performs actions on the corresponding partitions. The problem is finding a match between a Windows drive letter (like C:) and a Linux device name (like /dev/sda1). This is my current solution, which I rate as ugly:
Store partition information (i.e. drive letter, number of blocks, drive serial number etc.) in Windows in some pre-defined place (e.g. the root of the system partition).
Read the list of partitions from /proc/partitions. Keep only those partitions that have a major number for SCSI or IDE hard drives and a minor number that identifies them as real partitions and not whole disks.
Try to mount each of them as either ntfs or vfat. Check whether the mounted partition contains the information stored by the Windows app.
Upon finding the information written by the Windows app, make the actual match. For each partition found in /proc/partitions, acquire the drive serial number (via the HDIO_GET_IDENTITY ioctl), the number of blocks (from /proc/partitions) and the partition offset (/sys/block/drive_path/partition_name/start), compare these to the Windows information and, if they match, store the Windows drive letter along with the Linux device name.
There are a couple of problems in this scheme:
This is ugly. Writing data in Windows and then reading it in Linux makes testing a nightmare.
The Linux device major number is compared only against those of IDE and SCSI devices. This would probably fail on, e.g., USB or FireWire disks. It's possible to add these types of disks, but limiting the app to a known subset of possible devices seems a rather bad idea.
It looks like HDIO_GET_IDENTITY works only on IDE and SATA drives.
The /sys/block hack may not work on anything other than IDE or SATA drives.
Any ideas on how to improve this scheme? Perhaps there is another way to determine the Windows names without writing all that data in the Windows app?
P.S. The language of the app is C++. I can't change this.
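For reference, the /proc/partitions step of the scheme above could be sketched roughly as follows. This is only an illustration: PartitionRow and parsePartitions are hypothetical names, and the parser reads from any std::istream so it can be tried without a real device. It does no filtering by major number; that policy is left to the caller.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One row of /proc/partitions: "major minor #blocks name".
// (Hypothetical struct; the field names avoid the major/minor macros.)
struct PartitionRow {
    unsigned major_no;
    unsigned minor_no;
    unsigned long long blocks;  // size in 1 KiB blocks
    std::string name;           // e.g. "sda1", without the /dev/ prefix
};

// Parses text in the /proc/partitions format. The header line and the
// blank line after it fail numeric extraction and are skipped.
std::vector<PartitionRow> parsePartitions(std::istream& in) {
    std::vector<PartitionRow> rows;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        PartitionRow row;
        if (fields >> row.major_no >> row.minor_no >> row.blocks >> row.name) {
            rows.push_back(row);
        }
    }
    return rows;
}
```

On a real system the stream would be a std::ifstream opened on /proc/partitions.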

Partitions have UUIDs associated with them
My knowledge of this is very shallow, but I thought that was only true for disks formatted with GPT (Guid Partition Table) partitions, rather than the old-style MBR format which 99% of the world is still stuck with?

Partitions have UUIDs associated with them. I don't know how to find these in Windows but in linux you can find the UUID for each partition with:
sudo vol_id -u device (e.g. /dev/sda1)
If there is an equivalent function in Windows you could simply store the UUIDs for whatever partitions they pick, then iterate through all known partitions in Linux and match the UUIDs.
Edit: This may be a Linux-only thing, and it may specifically be the vol_id util that generates these from something (instead of reading metadata off the drive). Having said that, there is nothing stopping you from getting the source for vol_id and checking out what it does.

My knowledge of this is very shallow, but I thought that was only true for disks formatted with GPT (Guid Partition Table) partitions, rather than the old-style MBR format which 99% of the world is still stuck with?
Not to sound like a Linux user cliche, but it Works For Me. I use it with NTFS partitions and have had no problems. As I said in my edit, vol_id may be generating them itself. If that were the case there would be no reliance on any particular partition format, which would be swell.

Partitions have UUIDs associated with them. I don't know how to find these in Windows but in linux you can find the UUID for each partition with:
sudo vol_id -u device (e.g. /dev/sda1)
If there is an equivalent function in Windows you could simply store the UUIDs for whatever partitions they pick, then iterate through all known partitions in Linux and match the UUIDs.
That's a good point, thank you! I've looked at the sources of vol_id (part of the udev tarball) and it seems that for FAT(32) and NTFS it generates the UUID from the volume serial number, which is read from a predefined location on the partition. Since I don't expect anything other than FAT32 and NTFS, I am considering using this information as a partition identifier.
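A minimal sketch of that idea, assuming the commonly documented boot-sector layouts (NTFS: OEM ID "NTFS    " at offset 3 and a 64-bit serial at offset 0x48; FAT32: type string "FAT32   " at offset 0x52 and a 32-bit volume ID at offset 0x43). The function name is hypothetical and the offsets are stated from those layouts rather than copied from the vol_id sources, so verify them before relying on this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Formats the volume serial number found in a 512-byte boot sector as an
// upper-case hex string: 16 digits for NTFS, XXXX-XXXX for FAT32.
// Returns "" if the sector looks like neither of the two.
std::string volumeSerialFromBootSector(const unsigned char* sector, std::size_t size) {
    if (sector == nullptr || size < 512) return "";
    char text[32];
    if (std::memcmp(sector + 3, "NTFS    ", 8) == 0) {
        // NTFS: 64-bit little-endian serial number at offset 0x48.
        unsigned long long serial = 0;
        for (int i = 7; i >= 0; --i) serial = (serial << 8) | sector[0x48 + i];
        std::snprintf(text, sizeof text, "%016llX", serial);
        return text;
    }
    if (std::memcmp(sector + 0x52, "FAT32   ", 8) == 0) {
        // FAT32: 32-bit little-endian volume ID at offset 0x43.
        const unsigned char* id = sector + 0x43;
        std::snprintf(text, sizeof text, "%02X%02X-%02X%02X",
                      unsigned(id[3]), unsigned(id[2]), unsigned(id[1]), unsigned(id[0]));
        return text;
    }
    return "";
}
```

The sector itself would come from reading the first 512 bytes of the partition device (e.g. /dev/sda1), which normally requires root.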

You need to either mark the drive in some way (e.g. write a file etc.), or find some identifier that is only associated with that particular drive.
It is very hard, almost impossible to figure out what letter Windows would assign to a particular drive partition, without actually running Windows. This is because Windows always associates the drive that it is run from with C:. Which could be any drive, if you have more than one operating system installed. Windows also allows you to choose what drive letter it will try first, for a specific partition, causing further problems.
It would be a whole lot easier to do the GUI stuff inside Linux than to try this mixed Windows/Linux solution. I'm not saying don't try it this way; what I am saying is that there are very many possible pitfalls with this approach. I'm sure I don't even know about all of them.
Another option would be to see if you could actually do the Linux part, inside of Windows. If you are a very good Windows programmer, you can actually get access to the raw file-system. There are probably just as many pitfalls with this approach, because Windows will be running while all of this is in operation.
So to re-iterate I would see if you could do everything from within Linux, if you can. It's just a whole lot simpler in the long run.

In Windows you can read the "NTFS Volume Serial Number", which seems to match the UUID under Linux.
Possibilities to get the "NTFS Volume Serial" from Windows:
commandline since XP: fsutil.exe fsinfo ntfsinfo C:
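in C++:
#include <windows.h>
#include <winioctl.h>
#include <iomanip>
#include <iostream>

int main() {
    // Open the volume itself; the syntax "\\?\Volume{GUID}" can be used instead.
    HANDLE volume = CreateFileW(L"\\\\.\\C:", GENERIC_READ,
                                FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                                OPEN_EXISTING, 0, NULL);
    if (volume == INVALID_HANDLE_VALUE) {
        std::cerr << "CreateFile failed: " << GetLastError() << std::endl;
        return 1;
    }
    DWORD bytesReturned = 0;
    NTFS_VOLUME_DATA_BUFFER ntfsInfo = {};
    BOOL ok = DeviceIoControl(volume, FSCTL_GET_NTFS_VOLUME_DATA, NULL, 0,
                              &ntfsInfo, sizeof(ntfsInfo), &bytesReturned, NULL);
    CloseHandle(volume);
    if (!ok) {
        std::cerr << "DeviceIoControl failed: " << GetLastError() << std::endl;
        return 1;
    }
    // Print all 64 bits zero-padded; printing HighPart and LowPart separately drops leading zeros.
    std::cout << "UUID is " << std::hex << std::uppercase << std::setfill('0') << std::setw(16)
              << static_cast<unsigned long long>(ntfsInfo.VolumeSerialNumber.QuadPart) << std::endl;
    return 0;
}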
Possibilities to get the UUID under Linux:
ls -l /dev/disk/by-uuid
ls -l /dev/disk/by-label
blkid /dev/sda1
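From C++ on the Linux side, the by-uuid entry can be turned into a device name by resolving the symlink. A small sketch using POSIX realpath(); deviceForUuid is a hypothetical helper, and the directory is a parameter only so the function can be tried against other paths.

```cpp
#include <cassert>
#include <stdlib.h>
#include <string>

// Resolves <baseDir>/<uuid> (normally a symlink under /dev/disk/by-uuid)
// to the canonical device path, e.g. "/dev/sda1".
// Returns "" if the entry does not exist.
std::string deviceForUuid(const std::string& uuid,
                          const std::string& baseDir = "/dev/disk/by-uuid") {
    const std::string link = baseDir + "/" + uuid;
    char* resolved = realpath(link.c_str(), nullptr);  // POSIX: result is malloc'ed
    if (resolved == nullptr) return "";
    std::string device(resolved);
    free(resolved);
    return device;
}
```

Called with the serial number stored by the Windows part, this returns the matching partition device, or an empty string if udev has not created an entry for it.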

Related

Retrieve physical address of file on disk

Using the Windows API, I'm trying to write a program to read data from a disk. I managed to get access to the content of the drive using CreateFile and I'm able to search through it. Let's say there are some files on that disk and I know their paths, but I'm actually interested in their physical location.
My question is:
Is it possible to retrieve the physical location or address of the files (or the sectors they're located in) on the drive without searching the whole drive? If so, what functions should I use? Using SetFilePointer or FindFirstFile doesn't seem to solve the problem either.
The whole point of any file system is to abstract the physical disk sectors and provide you a higher level abstraction (called files). So the answer to "Is it possible to retrieve the physical location" should be no! (in general); some code might even move the sectors of a file (e.g. a disk defragmenter and you could imagine it is running concurrently with your program, even if that is not recommended..)
For more, read wikipages on file systems and files, then read a good book such as Operating systems: Three Easy Pieces
Notice that by using files, you are expecting your program to behave the same after a file system has been moved to a different disk, provided the file paths, contents, and metadata remain the same. In particular, you could have two external USB disk enclosures with different geometries or capacities holding the same file contents (perhaps even in different file systems, e.g. VFAT on one and NTFS on the other), and you would then expect your program to behave identically when accessing such files (in the first box or the second one). Whichever box is plugged in, your program would (for example) access the same F:\MyDir\MyFile.dat file. As file systems, both boxes would appear identical. At the physical sector level, the data would be organized very differently.
BTW, the physical organization of files inside a file system varies greatly from one file system to another one. You could use some Ext3 file system on your machine (since there are Ext3 drivers for Windows) - and that is actually useful to share some data between Linux & Windows on a dual boot PC -, and the file organization is different from a FAT one or a NTFS one.
There might be some way to query the kernel for the actual physical sector location. But I am not sure it works for all file systems (what would a sector location mean for some remote NFS one?). And that information could be stale before your program gets it (e.g. if some defragmenter is working in parallel). Also, other processes could access and modify the same file system at the same time (so that metadata, e.g. the sector location, would be obsolete by the time your process is scheduled to run again).
On Windows and on Unix like systems, file system code runs in the kernel. And other processes could use that same code (and the same file system) while your process is not running. Both Windows and Unix have preemptive scheduling, so you have no guarantee that your process runs again in user mode before some other process is using the same file system.
Remember that in practice, your file data often stays in the page cache. And that is why you might not hear your disk working -if you still have a rotating hard disk- when accessing the same file several times in a row (e.g. running the same program on the same file twice, a few seconds apart; usually the second run is keeping the disk silent, because the file data is already in RAM).
In a comment you mention that you want
To watch the data of the file and for example see what happens to the data when it gets deleted or modified.
but that should work at the file system level. Linux has inotify(7) facilities for that (they work on most local file systems, e.g. Ext4 or BTRFS, but not on remote file systems à la nfs(5), and neither on pseudo file systems à la proc(5)). I don't know if Windows has something similar to Linux inotify (but probably yes, at least in some cases).
You should probably consider using some database (maybe as simple as sqlite), and perhaps you want ACID properties (then use some real RDBMS like PostgreSQL). With PostgreSQL you might use TRIGGERs to be aware that some data changed, even if some other program changes the same database.
You could also do some file locking, and adopt the convention that every program accessing your particular file should lock it appropriately.

USB drive WriteFile to sectors outside volume

I'm developing a C++ WinAPI program that writes data directly to my USB drive, using CreateFile on the Volume{GUID} path and WriteFile, as dozens of examples do. As mentioned here in Remarks:
A write on a volume handle will succeed if the volume does not have a mounted file system, or if one of the following conditions is true:
The sectors to be written to are boot sectors.
The sectors to be written to reside outside of file system space.
You have explicitly locked or dismounted the volume by using FSCTL_LOCK_VOLUME or FSCTL_DISMOUNT_VOLUME.
The volume has no actual file system. (In other words, it has a RAW file system mounted.)
I want to write 100 MB of data to the USB drive smoothly, without any unmounting. So I've tried two of the cases above.
The second case: writing outside of the file system (I've extended the number of sectors in the partition without extending the FAT32 table), but it doesn't work without unmounting!
The fourth case: writing to a volume without any file system (unformatted). But it also doesn't work without unmounting!
I've also tried to create a second partition (which is invisible to Windows), with and without a file system, and to write directly there at an offset from the end of the first partition, but that was unsuccessful too: I cannot read or write there.
So, if anybody knows the answer to at least one of these questions:
How can I create sectors inside the partition but outside the file system?
Is there a way to write smoothly to the USB drive directly?
Also, can I write in Windows to the second partition on the USB drive without playing around with drivers, NtCreateFile and other deep functions?
I'm pretty sure it actually worked, but you don't realize that you're bypassing Windows. That means Windows will not have noticed what you did. So if you use Windows to check what you did, it will not report a change.
To address the individual sub-questions: A normal file system fills the entire partition, so you can't. Your functions write smoothly to USB, that's not the problem. And no, Windows normally treats USB as unpartitioned storage.

How to programmatically discover the filesystem without mounting the device (like "fdisk -l")

I need to find a system call in Linux to discover the filesystem of a connected pendrive from my application. I found out that 'fdisk -l' does the job, but now I need to discover how it does it. I wasn't able to work that out from reading the fdisk code; the only things I am certain of are:
The structs statfs or statvfs are not used;
The fdisk doesn't need to mount the device to find the filesystem;
Note: My application is written in C++ and is running on an embedded Linux system.
The command fdisk -l displays the file system represented by the System ID byte. This byte is in the Partition Table, which is inside the Master Boot Record (MBR). The MBR is stored in the first sector of the hard drive (any partitioned hard drive can have an MBR, not just the first one).
I would think you could simply use the open and read system calls to read the MBR from the disk assuming the user running your program has permission:
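#include <fcntl.h>
#include <unistd.h>
unsigned char buf[512];                  // the whole first sector; bytes 0-445 are boot code, the partition table follows
int fd = open("/dev/hda", O_RDONLY);
ssize_t n = read(fd, buf, sizeof buf);
// if n == 512, buf[446 + 16*i + 4] is the System ID byte of partition entry i (0..3)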
Look over the MBR Format and then read out the partition table to get the System ID bytes. Here's a list of types for the System ID byte.
I am only aware of how fdisk on Linux works, and last time I checked it didn't support GPT or any other partitioning formats. So this answer is relevant only to the classical MBR format.
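To make the "read out the partition table" step concrete, here is a sketch that pulls the System ID bytes out of a 512-byte sector, assuming the classic MBR layout (four 16-byte entries starting at byte 446, type byte at offset 4 of each entry, 0x55 0xAA signature in the last two bytes). MbrEntry and parseMbr are hypothetical names. Note that the System ID is only the partition type recorded in the table; it is not a check of what file system is really on the partition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct MbrEntry {
    std::uint8_t systemId;      // partition type, e.g. 0x07, 0x0B/0x0C (FAT32), 0x83 (Linux)
    std::uint32_t firstSector;  // starting LBA
    std::uint32_t sectorCount;  // length in sectors
};

// Reads a 32-bit little-endian value.
static std::uint32_t readLe32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Extracts the non-empty primary partition entries from the first sector
// of a disk. Returns an empty vector if the boot signature is missing.
std::vector<MbrEntry> parseMbr(const unsigned char* sector, std::size_t size) {
    std::vector<MbrEntry> entries;
    if (sector == nullptr || size < 512 || sector[510] != 0x55 || sector[511] != 0xAA) {
        return entries;
    }
    for (int i = 0; i < 4; ++i) {
        const unsigned char* e = sector + 446 + 16 * i;  // four 16-byte entries
        if (e[4] == 0x00) continue;                      // type 0 marks an unused slot
        entries.push_back({e[4], readLe32(e + 8), readLe32(e + 12)});
    }
    return entries;
}
```

The buffer would be filled by the open/read calls shown above, pointed at the whole-disk device rather than at a partition.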
You can use libblkid from util-linux to do this. The source distribution includes a sample which lists the partitions on a specified device, including filesystem type.

linux -- getting from /proc/partitions contents to something I can 'ls'

I am writing (yet another) file manager (to learn stuff:) and have a silly/stupid block.
On Linux, to enumerate the storage devices which can contain files, I believe the best approach is to parse the contents of the /proc/partitions file and extract the /dev/sda* entries. (right?) However, how can I map the /dev/sda* to something that I can explore programmatically to get directory contents? I am planning on using boost/filesystem, but since I cannot ls /dev/sda I assume I cannot use boost to iterate over it.
Synopsis: how can I convert /dev/sda* to something that I can 'ls'
I think you're mis-understanding exactly what /dev/sd* actually are to a program. They are devices not directories. You use the mount command to tell the operating system to "interpret" the device as a filesystem, and to attach it somewhere (root, or otherwise). It's this step that makes it into "a directory" somewhere on your filesystem. So other than raw I/O commands (which you don't want to do), get the filesystem mounted, and THEN try and explore it.
It's kind of like opening a file really. When you do this, the operating system gives your program a stream of bytes that you can randomly access the file through. But on the disk, that file could actually be scattered all over the hard drive (or whatever device). But the OS is "making" it into a "nice" format for you to deal with transparently. The same is true of the disk itself when accessing directory/file listings.
I hope my example made it clearer as to why what you're trying to do isn't as simple as you think it is.
Off the cuff, I'd say that the output of mount with no arguments might be a faster choice. That should show you the mounted filesystems and devices while /proc would show you all the devices and partitions.
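A rough sketch of that lookup, with one substitution: instead of running mount and parsing its output, it reads text in the /proc/mounts format ("device mountpoint fstype options dump pass"), which carries the same information. mountPointFor is a hypothetical helper name.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Looks up where a device is mounted, given text in the /proc/mounts format.
// Returns "" if the device is not mounted.
// Spaces in paths appear escaped as \040 there; this sketch does not decode them.
std::string mountPointFor(std::istream& mounts, const std::string& device) {
    std::string line;
    while (std::getline(mounts, line)) {
        std::istringstream fields(line);
        std::string dev, dir;
        if (fields >> dev >> dir && dev == device) return dir;
    }
    return "";
}
```

With a std::ifstream opened on /proc/mounts, the returned directory is something boost/filesystem can iterate over; an empty result means the partition has to be mounted first.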
The device /dev/sda* is a block device and needs to be mounted. To be able to ls it you need something that can interpret the file system type. First step: identify the file system type. In raw code there is usually a type code in the partition table in the first sector of the hard drive, which would just be /dev/sda. On a Linux system it would be something like ext3.
Next you need to either write or use a library for interfacing with that filesystem. If you get the kernel source code for Linux, it has a LOT of API code for interfacing with common filesystems, and wrappers for standard POSIX calls, which is exactly what you're looking for. Things like ls and cwd use system calls to retrieve information about a mounted filesystem; the disk is a block (or sometimes a character) device, and you need the ability to talk to it and speak the same language.

Uniquely identify PC based on software/hardware

For a requirement to generate per-PC license keys, I need some code which will return a stable and (near) unique key on any PC. It doesn't have to be guaranteed unique, but close. It does need to be reasonably stable though, so that a given PC always generates the same result unless the hardware is substantially changed.
This is for a Windows application, using wxWidgets but a Win32 or other option is fine.
I was thinking about MAC address but what about laptops which can routinely disable the network card in power-saving mode? I came across GetCurrentHwProfile but it doesn't quite look like what I want?
One idea I had a while back for this is to use CryptProtectData as a way to identify a machine. Behind-the-scenes in that API, Microsoft has done what you're looking for. I never tested it though and I'm curious if it's actually viable.
Basically you would encode a constant magic value with CryptProtectData with CRYPTPROTECT_LOCAL_MACHINE, and the result is your machine ID.
I would just go with the MAC address method; when the wireless / LAN cards are turned off they still show up in Network Connections. You should therefore still be able to get the MAC.
Consider this: Any time you'd be able to contact your webserver or whatever you're cataloging these IDs with, the user is going to have to have some form of network card available.
Oh, and you might be able to use the CPU serial number if the customer's computer supports it.
I think there is no really easy and unique method discovered here so far.
GetVolumeInformation retrieves an ID that is not even close to unique.
Using any hardware serial number is problematic, because manufacturers are not committed to always supporting it, and especially not to keeping it globally unique.
GetCurrentHwProfile retrieves a GUID, but its value is affected by even minor hardware changes.
Using the Product Key means you will have to deal with stolen software: there are a lot of pirated installations around the globe.
Creating your own GUID and preserving it in the registry (in any place) will not prevent duplication by cloning an image.
etc...
From my point of view the best way is to combine:
Volume ID + MAC's list + Machine SID + Machine Name. And obviously manage license policy on the server side ;0)
Regards
Mickel.
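One possible way to fold such a combination into a single value, purely as an illustration: hash the component strings together. The sketch below uses 64-bit FNV-1a, which is my choice for brevity and is not cryptographically secure; machineFingerprint is a hypothetical name, and collecting the actual volume ID, MAC list, SID and machine name is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// 64-bit FNV-1a over all components. A separator byte is mixed in after
// each component so that component boundaries affect the result.
std::uint64_t machineFingerprint(const std::vector<std::string>& components) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    const std::uint64_t prime = 1099511628211ULL;  // FNV-1a 64-bit prime
    for (const std::string& part : components) {
        for (unsigned char c : part) {
            hash ^= c;
            hash *= prime;
        }
        hash ^= 0x1F;  // ASCII unit separator between components
        hash *= prime;
    }
    return hash;
}
```

Because every component feeds one hash, any single change (a new network card, a renamed machine) changes the whole value, so a server-side policy still has to decide how much change to tolerate.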
If you want something a bit harder to spoof than whatever the machine itself can tell you, you'll probably need to provide a USB dongle dedicated for this purpose (not just a flash drive).
For a pretty brain-dead test I am using the ProductID code of the OS and the computer name, both extracted from the registry. Not really secure, but it's all pretend security anyway.
edit
To answer John's question about what keys I am reading:
SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProductID
SYSTEM\CurrentControlset\Control\ComputerName\ComputerName\ComputerName
How about using the serial number of the harddisk where windows is installed?
The function GetVolumeInformation() will give you such serial number.
To access the ID assigned by the harddisk vendor instead of the ID assigned by Windows, you can use the Win32_PhysicalMedia Class.
To determine the drive where Windows is installed, you could expand the variable %windir% by using the function ExpandEnvironmentStrings().
Another option, if your architecture allows, is to use UuidCreate() to generate a random GUID at installation time and save it permanently in the registry. This GUID can then be used as the ID as long as the registry remains. A new registry database is generally considered as a new installation.
A third option is to have a well-known server assign the IDs. Upon starting up, the software could look up the ID in the registry and, if it is not found, contact the server and supply it with its MAC address, hostname, hard disk serial number, Machine SID and any other identifying information (keys).
The server then determines if the client is already registered or not based on the information given. The server could have a relaxed policy and for example only require most of the keys for a match, so that the mechanism would work even in the event of a complete wipe out of the registry and if part (but not all) of the hardware was replaced.
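The relaxed policy could look something like this on the server side. It is a sketch under assumptions: the key names ("mac", "disk", ...) and the threshold are made up for the example, and MachineKeys, matchingKeys and isSameMachine are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

typedef std::map<std::string, std::string> MachineKeys;  // e.g. "mac" -> "00:11:22:33:44:55"

// Counts the keys present in both sets with identical values.
std::size_t matchingKeys(const MachineKeys& registered, const MachineKeys& reported) {
    std::size_t matches = 0;
    for (MachineKeys::const_iterator it = registered.begin(); it != registered.end(); ++it) {
        MachineKeys::const_iterator other = reported.find(it->first);
        if (other != reported.end() && other->second == it->second) ++matches;
    }
    return matches;
}

// Relaxed policy: treat the client as already registered when at least
// `required` of the registered keys still match.
bool isSameMachine(const MachineKeys& registered, const MachineKeys& reported,
                   std::size_t required) {
    assert(required <= registered.size());
    return matchingKeys(registered, reported) >= required;
}
```

With four registered keys and a threshold of three, a replaced hard disk alone would still be recognised as the same machine, while a replaced disk plus a new network card would not.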
How about using the serial number of the CPU? I remember Microsoft used to provide an API for this that would run the necessary assembler code and give you back all sorts of info about the CPU, including the serial number. Not sure if it'd work with AMD chips or not; I think it was Intel-specific.
Surely CPU Id is secure and static enough!!