clusters the file is occupying [duplicate] - c++

I need to get any information about where a file is physically located on an NTFS disk: an absolute offset, a cluster number... anything.
I need to scan the disk twice: once to enumerate allocated files, and a second time opening the partition directly in raw mode to find the rest of the data (from deleted files). I need a way to recognize that data found in the raw scan is the same data I already handled as a file. Since I'm scanning the disk in raw mode, the offset of the data I find should somehow be convertible to an offset within a file (given information about the disk geometry). Is there any way to do this? Other solutions are accepted as well.
Right now I'm playing with FSCTL_GET_NTFS_FILE_RECORD, but I can't make it work at the moment and I'm not really sure it will help.
UPDATE
I found the following function
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364952(v=vs.85).aspx
It returns a structure that contains the nFileIndexHigh and nFileIndexLow members.
The documentation says:
The identifier that is stored in the nFileIndexHigh and nFileIndexLow members is called the file ID. Support for file IDs is file system-specific. File IDs are not guaranteed to be unique over time, because file systems are free to reuse them. In some cases, the file ID for a file can change over time.
I don't really understand what this is; I can't connect it to the physical location of the file. Is it possible to later extract this file ID from the MFT?
UPDATE
Found this:
This identifier and the volume serial number uniquely identify a file. This number can change when the system is restarted or when the file is opened.
This doesn't satisfy my requirements: I'm going to open the file, and the fact that the ID might change doesn't make me happy.
Any ideas?

Use the Defragmentation IOCTLs. For example, FSCTL_GET_RETRIEVAL_POINTERS will tell you the extents which contain file data.
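For illustration, here is a minimal sketch of calling FSCTL_GET_RETRIEVAL_POINTERS through DeviceIoControl. The path and the fixed-size output buffer are assumptions; a heavily fragmented file needs a larger buffer or a loop that resumes on ERROR_MORE_DATA. Multiplying each LCN by the volume's cluster size (available via FSCTL_GET_NTFS_VOLUME_DATA or GetDiskFreeSpace) gives the absolute byte offset on the volume.

```cpp
#include <windows.h>
#include <winioctl.h>
#include <cstdio>

int main()
{
    // Hypothetical path; any file you can open for read access works.
    HANDLE hFile = CreateFileW(L"C:\\path\\to\\file.bin", GENERIC_READ,
                               FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                               OPEN_EXISTING, 0, nullptr);
    if (hFile == INVALID_HANDLE_VALUE) return 1;

    STARTING_VCN_INPUT_BUFFER in = {};
    in.StartingVcn.QuadPart = 0;   // start from the first virtual cluster

    BYTE buf[4096];                // assumed large enough for this sketch
    auto* out = reinterpret_cast<RETRIEVAL_POINTERS_BUFFER*>(buf);

    DWORD bytes = 0;
    if (DeviceIoControl(hFile, FSCTL_GET_RETRIEVAL_POINTERS,
                        &in, sizeof(in), out, sizeof(buf), &bytes, nullptr))
    {
        LONGLONG vcn = out->StartingVcn.QuadPart;
        for (DWORD i = 0; i < out->ExtentCount; ++i) {
            // Each extent maps a run of virtual clusters onto logical
            // (on-disk) clusters starting at Lcn; NextVcn marks where the
            // next run begins.
            std::printf("extent %lu: VCN %lld -> LCN %lld, %lld clusters\n",
                        (unsigned long)i, (long long)vcn,
                        (long long)out->Extents[i].Lcn.QuadPart,
                        (long long)(out->Extents[i].NextVcn.QuadPart - vcn));
            vcn = out->Extents[i].NextVcn.QuadPart;
        }
    }
    CloseHandle(hFile);
    return 0;
}
```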

Related

hardlink multiple files to one file

I have many files in a folder and I want to concatenate them all into a single file, for example cat * > final_file.
But this will consume additional disk space. Is there a way to hardlink all the files into final_file, for example ln * final_file?
This is not possible using links.
If you really need this kind of feature and cannot afford to create one large file, you could go for a custom file system driver. FUSE will allow you to write a simple file system driver which runs in user space and allows accessing the files as if they were one large file.
You could also write a custom block device (e.g. by emulating the NBD "Network Block Device" protocol) which combines two or more files into one large block device.
Getting to know the concrete use case would help to give a better answer.
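As a rough illustration of the FUSE suggestion, here is a minimal sketch assuming libfuse 3.x on Linux; the backing paths /tmp/part1.bin and /tmp/part2.bin and the mount entry name /combined are made up for the example. A real driver would enumerate the source directory and handle short reads more carefully.

```cpp
// Build (assumption): g++ -std=c++17 concatfs.cpp $(pkg-config fuse3 --cflags --libs)
#define FUSE_USE_VERSION 31
#include <fuse3/fuse.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>

static const char* kParts[] = { "/tmp/part1.bin", "/tmp/part2.bin" };

static off_t part_size(const char* path) {
    struct stat st{};
    return stat(path, &st) == 0 ? st.st_size : 0;
}

static int concat_getattr(const char* path, struct stat* st, fuse_file_info*) {
    std::memset(st, 0, sizeof(*st));
    if (std::strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755; st->st_nlink = 2; return 0;
    }
    if (std::strcmp(path, "/combined") == 0) {
        st->st_mode = S_IFREG | 0444; st->st_nlink = 1;
        st->st_size = part_size(kParts[0]) + part_size(kParts[1]);
        return 0;
    }
    return -ENOENT;
}

static int concat_readdir(const char* path, void* buf, fuse_fill_dir_t fill,
                          off_t, fuse_file_info*, fuse_readdir_flags) {
    if (std::strcmp(path, "/") != 0) return -ENOENT;
    fill(buf, ".", nullptr, 0, static_cast<fuse_fill_dir_flags>(0));
    fill(buf, "..", nullptr, 0, static_cast<fuse_fill_dir_flags>(0));
    fill(buf, "combined", nullptr, 0, static_cast<fuse_fill_dir_flags>(0));
    return 0;
}

static int concat_read(const char* path, char* buf, size_t size, off_t off,
                       fuse_file_info*) {
    if (std::strcmp(path, "/combined") != 0) return -ENOENT;
    size_t done = 0;
    // Translate the logical offset into (backing file, local offset) pairs.
    for (const char* part : kParts) {
        off_t psz = part_size(part);
        if (off < psz) {
            int fd = open(part, O_RDONLY);
            if (fd < 0) return -EIO;
            ssize_t n = pread(fd, buf + done, size - done, off);
            close(fd);
            if (n < 0) return -EIO;
            done += (size_t)n;
            off = 0;                 // subsequent parts are read from the start
            if (done == size) break;
        } else {
            off -= psz;              // the offset lies past this part; skip it
        }
    }
    return (int)done;
}

int main(int argc, char* argv[]) {
    fuse_operations ops{};
    ops.getattr = concat_getattr;
    ops.readdir = concat_readdir;
    ops.read    = concat_read;
    return fuse_main(argc, argv, &ops, nullptr);
}
```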
No. A hard link just gives one existing file an additional name, nothing more. The filesystem does not support this kind of aggregation at an underlying level.

Get the oldest file in a directory

My problem is that I want to store the five oldest files from a directory in a list. Since the software should be robust against time changes made by the user, I'm looking for a way to extract this information without using the file time. Is there any internal counter implemented in Windows that can be extracted from a file's metadata? Or is it possible to set such a counter during file creation (e.g. in a specific field of the metadata)?
Are you saying you don't want to use "the file time" in case users have modified the files since they were created? If that is the case, your problem may be solved with the information that Windows stores three distinct FILETIMEs for each file: 1) the file's creation time, 2) the file's last access time, and 3) the file's last write time. You would want the first of these. You can get all of them by calling the Win API GetFileAttributesEx function, passing the file name. The WIN32_FILE_ATTRIBUTE_DATA structure that is returned to you contains all three times.
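A minimal sketch of that call, assuming a hypothetical path; collecting the times for every file in the directory and sorting by ftCreationTime is left out for brevity.

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    WIN32_FILE_ATTRIBUTE_DATA info = {};
    // Hypothetical path for illustration.
    if (!GetFileAttributesExW(L"C:\\path\\to\\file.txt",
                              GetFileExInfoStandard, &info)) {
        std::fprintf(stderr, "GetFileAttributesEx failed: %lu\n", GetLastError());
        return 1;
    }

    // Convert the creation time (the first of the three FILETIMEs) to a
    // readable form; the other two are ftLastAccessTime and ftLastWriteTime.
    SYSTEMTIME st = {};
    FileTimeToSystemTime(&info.ftCreationTime, &st);
    std::printf("created: %04u-%02u-%02u %02u:%02u:%02u UTC\n",
                st.wYear, st.wMonth, st.wDay, st.wHour, st.wMinute, st.wSecond);
    return 0;
}
```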

Fastest way to erase part of file in C++

I wonder what the fastest way is to erase part of a file in C++.
I know I can write a second file and skip the part I want to erase, but I think that is slow when you work with big files.
What about database systems, how do they remove records so fast?
A database keeps an index, with metadata listing which parts of the file are valid and which aren't. To delete data, just the index is updated to mark that section invalid, and the main file content doesn't have to be changed at all.
Database systems typically just mark deleted records as deleted, without physically recovering the unused space. They may later reuse the space occupied by deleted records. That's why they can delete parts of a database quickly.
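A tiny sketch of that tombstone idea; the names and the index layout are illustrative, not any particular database's format.

```cpp
#include <cstdint>
#include <vector>

// Side index describing where each record lives in the data file.
struct RecordIndexEntry {
    std::uint64_t offset;   // where the record starts in the data file
    std::uint32_t length;   // record length in bytes
    bool          deleted;  // tombstone: true means "skip this record"
};

// Deleting record i touches only the index, never the data file itself;
// the space can be reused or compacted later.
void erase_record(std::vector<RecordIndexEntry>& index, std::size_t i) {
    index[i].deleted = true;
}
```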
The ability to quickly delete a portion of a file depends on the portion of the file you wish to delete. If the portion of the file that you are deleting is at the end of the file, you can simply truncate the file, using OS calls.
Deleting a portion of a file from the middle is potentially time consuming. Your choice is to either move the remainder of the file forward, or to copy the entire file to a new location, skipping the deleted portion. Either way could be time consuming for a large file.
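For the end-of-file case, here is a sketch using Win32 calls (SetFilePointerEx plus SetEndOfFile; POSIX truncate/ftruncate is the equivalent elsewhere).

```cpp
#include <windows.h>

// Drop the last `bytesToRemove` bytes of a file by truncating it.
// Only works when the portion to erase is at the end.
bool truncate_tail(const wchar_t* path, LONGLONG bytesToRemove)
{
    HANDLE h = CreateFileW(path, GENERIC_WRITE, 0, nullptr,
                           OPEN_EXISTING, 0, nullptr);
    if (h == INVALID_HANDLE_VALUE) return false;

    LARGE_INTEGER size = {};
    bool ok = GetFileSizeEx(h, &size) && size.QuadPart >= bytesToRemove;
    if (ok) {
        LARGE_INTEGER newEnd;
        newEnd.QuadPart = size.QuadPart - bytesToRemove;
        // Move the file pointer to the new end and cut the file there.
        ok = SetFilePointerEx(h, newEnd, nullptr, FILE_BEGIN)
          && SetEndOfFile(h);
    }
    CloseHandle(h);
    return ok;
}
```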
The fastest way I know is to open the data file as a persisted memory-mapped file and simply move the remaining data over the part you don't need. That would be faster than writing a second file, but still not too fast with big files.
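A sketch of that approach, assuming the file fits in a single writable view:

```cpp
#include <windows.h>
#include <cstring>

// Remove [offset, offset + count) from the middle of a file by memmove-ing
// the tail forward inside a writable mapping, then truncating the file.
bool erase_middle(const wchar_t* path, size_t offset, size_t count)
{
    HANDLE file = CreateFileW(path, GENERIC_READ | GENERIC_WRITE, 0, nullptr,
                              OPEN_EXISTING, 0, nullptr);
    if (file == INVALID_HANDLE_VALUE) return false;

    LARGE_INTEGER size = {};
    GetFileSizeEx(file, &size);
    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READWRITE,
                                        0, 0, nullptr);
    char* view = mapping
        ? static_cast<char*>(MapViewOfFile(mapping, FILE_MAP_WRITE, 0, 0, 0))
        : nullptr;

    bool ok = false;
    if (view) {
        // Shift everything after the erased range forward over it.
        std::memmove(view + offset, view + offset + count,
                     (size_t)size.QuadPart - offset - count);
        UnmapViewOfFile(view);
        ok = true;
    }
    if (mapping) CloseHandle(mapping);

    if (ok) {
        // Cut off the now-duplicated tail (mapping must be closed first).
        LARGE_INTEGER newEnd;
        newEnd.QuadPart = size.QuadPart - (LONGLONG)count;
        ok = SetFilePointerEx(file, newEnd, nullptr, FILE_BEGIN)
          && SetEndOfFile(file);
    }
    CloseHandle(file);
    return ok;
}
```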

Creating metadata for binary file

I have a binary file I'm creating in C++. I'm tasked with creating a metadata format describing the data so that it can be read in Java using the metadata.
One record in the data file has a time, then 64 bytes of data, then a CRC, then a newline delimiter. What should the metadata look like to describe what is in the 64 bytes? I've never created a metadata file before.
Probably you want to generate a file which describes how many entries there are in the data file, and maybe the time range. Depending on what kind of data you have, the metadata might contain either a per-record entry (RawData, ImageData, etc.) or one global entry (data stored as float.)
It totally depends on what the Java-code is supposed to do, and what use-cases you have. If you want to know whether to open the file at all depending on date, that should be part of the metadata, etc.
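As one concrete possibility (the field names and widths here are assumptions, not the asker's actual format), the metadata could pin down the exact byte layout, endianness, format version, and record count, which is exactly what the Java side needs to parse records like this:

```cpp
#include <cstdint>

// Hypothetical on-disk record layout the metadata would describe.
#pragma pack(push, 1)
struct Record {
    std::uint64_t timeMicros;   // Time, e.g. microseconds since the epoch
    std::uint8_t  payload[64];  // the 64 data bytes the metadata must describe
    std::uint32_t crc32;        // CRC over timeMicros and payload
    char          delimiter;    // '\n'
};
#pragma pack(pop)

static_assert(sizeof(Record) == 8 + 64 + 4 + 1, "record must have no padding");
```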
I think that maybe you have the design backwards.
First, think about the end: what result do you want to see? Will a Java program create some kind of .csv file? What kind(s) of file(s)? What information will be needed to do this?
Then design the metadata to provide the information that is needed to perform the necessary tasks (and any extra tasks you anticipate).
Try to make the metadata extensible so that adding extra metadata in the future will not break the programs that you are writing now. e.g. if the Java program finds metadata it doesn't understand, it just skips it.

How to get file path from NTFS index number?

I have dwVolumeSerialNumber, nFileIndexHigh, nFileIndexLow values obtained from a call to GetFileInformationByHandle. How can I get file path from these values?
Because of hard links, there may be multiple paths that map to the given VolumeSerialNumber and FileIndex. To find all such paths:
Iterate volumes to find one whose root directory matches dwVolumeSerialNumber
Recursively enumerate all directories on the volume, skipping symbolic links and reparse points, to find all files with matching nFileIndexHigh and nFileIndexLow.
This can be quite time-consuming. If you really need to do this as fast as possible and your filesystem is NTFS, you can raw read the entire MFT into a buffer and parse it yourself. This will get all directories that fit inside an MFT entry in one fell swoop. The rest of the directories can be read through the OS or also through raw reads, depending on the amount of work you want to do. But any way you look at it, this is a lot of work and doesn't even apply to FAT, FAT32 or any other filesystem.
A better solution is probably to hang onto the original path if at all possible.
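A sketch of the first step, matching dwVolumeSerialNumber against each mounted volume via FindFirstVolume and GetVolumeInformation:

```cpp
#include <windows.h>

// Walk all mounted volumes and copy the name of the one whose serial
// number matches into `out`. Returns false if no volume matches.
bool find_volume_by_serial(DWORD serial, wchar_t* out, int outLen)
{
    wchar_t name[MAX_PATH];
    HANDLE it = FindFirstVolumeW(name, MAX_PATH);
    if (it == INVALID_HANDLE_VALUE) return false;

    bool found = false;
    do {
        DWORD volSerial = 0;
        // `name` looks like \\?\Volume{GUID}\ and is accepted as a root path.
        if (GetVolumeInformationW(name, nullptr, 0, &volSerial,
                                  nullptr, nullptr, nullptr, 0)
            && volSerial == serial) {
            lstrcpynW(out, name, outLen);
            found = true;
            break;
        }
    } while (FindNextVolumeW(it, name, MAX_PATH));
    FindVolumeClose(it);
    return found;
}
```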
This MSDN article shows how to get the path from a file handle.
You use OpenFileById to open a file given its file ID, but you also need an open file elsewhere on the same volume, I assume to get the volume serial number.
This blog posting raises an interesting issue: you need to pass in 24 for the structure size (worked out by looking at the assembly code).
I leave it as an interesting exercise (I couldn't find an easy answer) how you go from a dwVolumeSerialNumber to having a valid other handle open for that volume or a file on that volume, but maybe you already have enough information for your case. One possibility is to iterate all mounted volumes calling GetVolumeInformation to find the one with matching serial number.
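Putting those pieces together, here is a sketch; the volume-root hint handle and the way the 64-bit file ID is assembled from nFileIndexHigh/nFileIndexLow are assumptions based on the discussion above.

```cpp
#include <windows.h>
#include <cwchar>

int main()
{
    // Any open handle on the target volume serves as the "hint"; the volume
    // root is used here for illustration.
    HANDLE hVol = CreateFileW(L"C:\\", GENERIC_READ,
                              FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                              OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, nullptr);
    if (hVol == INVALID_HANDLE_VALUE) return 1;

    // Assemble the 64-bit file ID from BY_HANDLE_FILE_INFORMATION fields
    // (placeholder values here).
    DWORD nFileIndexHigh = 0, nFileIndexLow = 0;
    FILE_ID_DESCRIPTOR desc = {};
    desc.dwSize = sizeof(desc);   // the "24" the blog post mentions
    desc.Type = FileIdType;
    desc.FileId.QuadPart = ((LONGLONG)nFileIndexHigh << 32) | nFileIndexLow;

    HANDLE h = OpenFileById(hVol, &desc, GENERIC_READ,
                            FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr, 0);
    if (h != INVALID_HANDLE_VALUE) {
        // Recover one path for the handle (hard-linked files have several).
        wchar_t path[MAX_PATH];
        if (GetFinalPathNameByHandleW(h, path, MAX_PATH, FILE_NAME_NORMALIZED))
            std::wprintf(L"%s\n", path);
        CloseHandle(h);
    }
    CloseHandle(hVol);
    return 0;
}
```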
Note: If you don't have the file open, then you may not be able to rely on the nFileIndexHigh/Low combo (aka the file ID), as described in the notes for the BY_HANDLE_FILE_INFORMATION structure, which warn that it can change on FAT systems, but that "In the NTFS file system, a file keeps the same file ID until it is deleted."
Note: The original question had an error in it. Now that the question has been fixed this answer no longer applies.
In general you can't. The information you retrieved just tells you what disk the file is on and how big it is. It does not provide enough information to identify the actual file. Specifically:
dwVolumeSerialNumber identifies the volume, and
nFileSizeHigh and nFileSizeLow give you the size of the file
If the file happens to be the only file on that volume with that exact size, you could search the volume for a file of that size. But in general this is both expensive and unreliable, so I don't recommend it.