Does anybody know how to calculate the amount of space occupied by the file system alone?
I am trying to calculate how much space files and directories occupy on a disk without iterating through the entire disk.
This is a sample in C++:
ULARGE_INTEGER freeBytesAvailable, totalNumberOfBytes, totalNumberOfFreeBytes;
GetDiskFreeSpaceEx(NULL, &freeBytesAvailable, &totalNumberOfBytes, &totalNumberOfFreeBytes);
mCurrentProgress = 0;
mTotalProgress = totalNumberOfBytes.QuadPart - totalNumberOfFreeBytes.QuadPart;
But the problem is that I need to exclude the size of the file system itself, and I have no idea whether that is possible or whether there is an API to get this info.
Doesn't make sense. On NTFS, small files are stored right in the MFT record. I mean literally, they're inlined: the same sector that holds the file name also holds the file contents. Therefore, you can't count that sector as either "used for files" or "used for file system overhead".
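That said, if a partial number is acceptable, NTFS will at least report the size of its master file table. A minimal sketch, assuming an NTFS volume and administrator rights (the \\.\C: volume path is just an example):

#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int main() {
    // Opening the raw volume requires administrator rights.
    HANDLE h = CreateFileW(L"\\\\.\\C:", GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE,
                           NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    NTFS_VOLUME_DATA_BUFFER v;
    DWORD bytes = 0;
    if (DeviceIoControl(h, FSCTL_GET_NTFS_VOLUME_DATA, NULL, 0,
                        &v, sizeof(v), &bytes, NULL))
        // Only a partial measure of metadata overhead: resident file
        // data lives inside the MFT, as described above.
        printf("MFT size: %lld bytes\n", v.MftValidDataLength.QuadPart);
    CloseHandle(h);
    return 0;
}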
I simulated 50 million rows of data to observe the disk space occupied by the data files. value2 and str2 are written on every 10th row. The simulation code is as follows:
while (j <= 50000000) {
    sender.metric("xush").tag("tagName", "tag1").field("value1", 100).field("str1", "hello");
    if (j % 10 == 0) {
        sender.field("value2", 100).field("str2", "hello");
    }
    sender.$(beginTs);
    sender.flush();
    j++;
}
The file disk usage is as follows:
[root@idb23 2021-11-01T00]# du -hs ./*
382M ./timestamp.d
191M ./tagName.d
668M ./str1.d
382M ./str1.i
239M ./str2.d
382M ./str2.i
191M ./value1.d
191M ./value2.d
I have the following questions:
From the official docs I know that the .d file is a column file and the .k file is an index file, so what is the .i file used for?
It seems that NULL is also appended to the column file, and that it takes up as much space as the integer 100?
value1 is always 100 and never changes, but the column file still stores it for every row. Doesn't this design waste space? Or am I using it wrong?
QuestDB seems to take up much more disk space than other time-series databases such as IoTDB. Do the data files have a compression mechanism?
QuestDB does not compress data. All values in a column take equal space, including NULL values. Compression is in theory possible by using a compressed file system, but that is not documented. The only way to reduce the footprint is to use smaller data types and the SYMBOL type for repeatable strings.
The .i file contains the offsets of a variable-size column - of STRING type in your case.
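For what it's worth, your directory listing is consistent with plain fixed-width storage. A rough cross-check, assuming (this is my reading of the format, not authoritative) 8 bytes per timestamp or offset, 4 bytes per INT or SYMBOL key, and STRING stored as a 4-byte length prefix followed by UTF-16 characters:

50,000,000 x 8 bytes = 400 MB ~ 382M (timestamp.d; also str1.i and str2.i, one 8-byte offset per row)
50,000,000 x 4 bytes = 200 MB ~ 191M (value1.d, value2.d, tagName.d)
50,000,000 x (4 + 2x5) bytes = 700 MB ~ 668M (str1.d: "hello" is 5 UTF-16 characters)
45,000,000 x 4 + 5,000,000 x 14 bytes = 250 MB ~ 239M (str2.d: a NULL string still costs its 4-byte length marker)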
I want to delete the first 10 files of a mounted drive. This drive is a Unix system drive. I have written code which works fine for a local drive but not for the mounted drive. It's deleting randomly, not sequentially. The code is written in MFC C++. Please let me know if anyone knows the solution. The code is like below.
wchar_t fileFound[256];
WIN32_FIND_DATAW info;
HANDLE hp = INVALID_HANDLE_VALUE;
int count = 10;
swprintf_s(fileFound, 256, L"%s\\*.*", L"G:\\foldername");
hp = FindFirstFileW(fileFound, &info);
do
{
    swprintf_s(fileFound, 256, L"%s\\%s", L"G:\\foldername", info.cFileName);
    DeleteFileW(fileFound);
    count--;
} while (FindNextFileW(hp, &info) && count);
FindClose(hp);
It's deleting randomly but not sequentially.
This behavior is documented:
[...] FindFirstFile does no sorting of the search results.
As well as here:
The order in which the search returns the files, such as alphabetical order, is not guaranteed, and is dependent on the file system. If the data must be sorted, the application must do the ordering after obtaining all the results.
If you need to delete the first n files from a set of files, you need to gather the entire set of files, sort the set based on an arbitrary predicate, and then perform an action on the first n items.
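A minimal sketch of that approach (collect, sort, then delete), reusing the G:\foldername path from the question; lexicographic order stands in for whatever "first" should actually mean:

#include <windows.h>
#include <algorithm>
#include <string>
#include <vector>

void DeleteFirstNFiles(const std::wstring& dir, int n)
{
    std::vector<std::wstring> names;
    WIN32_FIND_DATAW info;
    HANDLE hp = FindFirstFileW((dir + L"\\*.*").c_str(), &info);
    if (hp == INVALID_HANDLE_VALUE)
        return;
    do {
        // Skip ".", ".." and subdirectories; collect plain files only.
        if (!(info.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))
            names.push_back(info.cFileName);
    } while (FindNextFileW(hp, &info));
    FindClose(hp);

    // Sort first; FindNextFile's order is file-system dependent.
    std::sort(names.begin(), names.end());
    for (int i = 0; i < n && i < (int)names.size(); ++i)
        DeleteFileW((dir + L"\\" + names[i]).c_str());
}

// Usage: DeleteFirstNFiles(L"G:\\foldername", 10);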
Say I have a file of an arbitrary length S and I need to remove the first N of its bytes (where N is much less than S). What is the most efficient way to do this on Windows?
I'm looking for a WinAPI function to do this, if one is available.
Otherwise, what are my options? Load the file into RAM and then rewrite the existing file with the remainder of the data? (In that case I cannot be sure that the PC has enough RAM.) Or write the remainder of the file data into a new file, erase the old one, and rename the new file to the old name? (In that case, what do I do if any of these steps fails? And what about the fragmentation this method causes on disk?)
There is no general way to do this built into the OS. There are theoretical ways to edit the file system's data structures underneath the operating system on sector or cluster boundaries, but this is different for each file system and would likely violate the security model.
To accomplish this you can read the data starting at byte N in chunks of, say, 4K, write them back out starting at byte zero, and then truncate the file with SetEndOfFile() to the new, smaller length when you are finished copying the data.
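A minimal sketch of that copy-down approach (error handling kept short; RemoveLeadingBytes is just an illustrative name):

#include <windows.h>

// Shifts the file contents left by 'n' bytes in place, then truncates.
bool RemoveLeadingBytes(const wchar_t* path, ULONGLONG n)
{
    HANDLE h = CreateFileW(path, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) return false;

    BYTE buf[4096];
    ULONGLONG readPos = n, writePos = 0;
    DWORD got = 0;
    for (;;) {
        LARGE_INTEGER li; li.QuadPart = (LONGLONG)readPos;
        SetFilePointerEx(h, li, NULL, FILE_BEGIN);
        if (!ReadFile(h, buf, sizeof(buf), &got, NULL) || got == 0)
            break;                                   // end of file reached

        li.QuadPart = (LONGLONG)writePos;
        SetFilePointerEx(h, li, NULL, FILE_BEGIN);
        DWORD written = 0;
        WriteFile(h, buf, got, &written, NULL);
        readPos += got; writePos += written;
    }
    LARGE_INTEGER end; end.QuadPart = (LONGLONG)writePos;
    SetFilePointerEx(h, end, NULL, FILE_BEGIN);
    SetEndOfFile(h);                 // truncate to the new, smaller length
    CloseHandle(h);
    return true;
}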
The most efficient method would be to modify the directory entry on the disk that records where the file's data starts.
Note: this may not be possible if the data must start on a sector boundary. If that is the case, you may have to copy the remainder of the data in the affected sector(s) to new sector(s), essentially moving the data.
The preferred method is to write a new file that starts with the data copied from after the deleted area.
Moving files on the same drive is faster than copying them since the data isn't duplicated; only the file pointer, (symbolic) links, and the file allocation/index table are updated.
The move command in CMD could be extended to let the user set file start and end markers, effecting file truncation without copying file data and saving valuable time and RAM/disk overhead.
An alternative would be to send the commands directly to the device/disk driver, bypassing the operating system, as long as the OS still knows where to find the file and its properties, e.g. size, name, and the sectors occupied on disk.
My question is: how can I get the disk offset of a file if this (very important) file is small (less than one cluster, only a few bytes)?
Currently I use this Windows API function:
DeviceIOControl(FileHandle, FSCTL_GET_RETRIEVAL_POINTERS, @InBuffer, SizeOf(InBuffer), @OutBuffer, SizeOf(OutBuffer), Num, Nil);
FirstExtent.Start := OutBuffer.Pair[0].LogicalCluster;
It works perfectly with files bigger than a cluster, but it fails with smaller files, always returning a null offset.
What is the procedure to follow with small files? Where are they located on an NTFS volume? Is there an alternative way to get a file's offset? This subtlety doesn't seem to be documented anywhere.
Note: the question is tagged as Delphi but C++ samples or examples would be appreciated as well.
The file is probably resident, meaning that its data is small enough to fit in its MFT entry. See here for a slightly longer description:
http://www.disk-space-guide.com/ntfs-disk-space.aspx
So you'd basically need to find the location of the MFT entry in order to know where the data is on disk. Do you control this file? If so, the easiest thing to do is to make sure it's always larger than the size of an MFT entry (not a documented value, but you could always just use 4K or something).
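You can at least detect the resident case: a resident file has no allocated clusters, so FSCTL_GET_RETRIEVAL_POINTERS fails with ERROR_HANDLE_EOF. A small C++ sketch (C:\small.txt is just an example path):

#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int main() {
    HANDLE h = CreateFileW(L"C:\\small.txt", GENERIC_READ, FILE_SHARE_READ,
                           NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    STARTING_VCN_INPUT_BUFFER in = {};   // start the extent scan at VCN 0
    union { RETRIEVAL_POINTERS_BUFFER rpb; BYTE raw[1024]; } out;
    DWORD bytes = 0;
    if (DeviceIoControl(h, FSCTL_GET_RETRIEVAL_POINTERS, &in, sizeof(in),
                        &out, sizeof(out), &bytes, NULL))
        printf("%lu extent(s) on disk\n", out.rpb.ExtentCount);
    else if (GetLastError() == ERROR_HANDLE_EOF)
        printf("No clusters allocated: the file is resident in the MFT\n");
    CloseHandle(h);
    return 0;
}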
I want to get the free space on a compressed disk to show it to an end user. I'm using C++ and MFC on Windows 2000 and later. The Windows API offers the GetDiskFreeSpaceEx() function.
However, this function seems to return the "uncompressed" size of the data. This causes me a problem.
For example:
- Disk size is 100 GB
- Data size is 90 GB
- Compressed data size is 80 GB
The user will see that the disk is 90% full, but in reality, it is only 80% full.
EDIT
As Gleb pointed out, the function is returning the correct information.
So here is the new question: is there a way to get both the compressed size and the uncompressed one?
I think you would have to map over all files, query GetFileSize() and GetCompressedFileSize(), and sum them up. Use GetFileAttributes() to know whether a file is compressed, in case only parts of the whole volume are compressed, which might certainly be the case.
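A sketch of that idea (recursive, no error handling; SumSizes is an illustrative name). Note that GetCompressedFileSize() already returns the plain size for uncompressed files, so for the sums themselves the attribute check can be skipped:

#include <windows.h>
#include <string>

// Recursively sum logical vs. on-disk (compressed) sizes under 'dir'.
void SumSizes(const std::wstring& dir, ULONGLONG& logical, ULONGLONG& onDisk)
{
    WIN32_FIND_DATAW fd;
    HANDLE h = FindFirstFileW((dir + L"\\*").c_str(), &fd);
    if (h == INVALID_HANDLE_VALUE) return;
    do {
        std::wstring name = fd.cFileName;
        if (name == L"." || name == L"..") continue;
        std::wstring path = dir + L"\\" + name;
        if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
            SumSizes(path, logical, onDisk);        // recurse into subdirectory
        } else {
            logical += ((ULONGLONG)fd.nFileSizeHigh << 32) | fd.nFileSizeLow;
            DWORD hi = 0;
            DWORD lo = GetCompressedFileSizeW(path.c_str(), &hi);
            if (lo != INVALID_FILE_SIZE || GetLastError() == NO_ERROR)
                onDisk += ((ULONGLONG)hi << 32) | lo;
        }
    } while (FindNextFileW(h, &fd));
    FindClose(h);
}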
Hum, so that's not a trivial operation. I suppose I must implement some mechanism to avoid querying all the file sizes all the time. I mean... if I have an 800 GB hard drive, it could take a very long time to get all the file sizes.
True.
Perhaps start off with a full scan (at application startup) and populate your own data structure, e.g. a hash/map from file name to a file-data struct/class, then poll the drive with FindFirstChangeNotification() and update your internal structure accordingly (see the sketch below).
You might also want to read about "Change Journals". I have never used them myself, so I don't know how they work, but they might be worth checking out.
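For reference, the FindFirstChangeNotification() polling loop mentioned above could look roughly like this (C:\ and the filter flags are just example choices):

#include <windows.h>
#include <stdio.h>

int main()
{
    HANDLE h = FindFirstChangeNotificationW(
        L"C:\\", TRUE,                                // TRUE = watch subtree
        FILE_NOTIFY_CHANGE_FILE_NAME | FILE_NOTIFY_CHANGE_SIZE);
    if (h == INVALID_HANDLE_VALUE) return 1;

    for (;;) {
        if (WaitForSingleObject(h, INFINITE) != WAIT_OBJECT_0) break;
        printf("Something changed - refresh the cached sizes here.\n");
        if (!FindNextChangeNotification(h)) break;    // re-arm the watch
    }
    FindCloseChangeNotification(h);
    return 0;
}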
The function returns the amount of free space correctly. It can be demonstrated by using this simple program.
#include <stdio.h>
#include <windows.h>
int main(void) {
    ULARGE_INTEGER p1, p2, p3;
    GetDiskFreeSpaceExA(".", &p1, &p2, &p3);
    printf("%llu %llu %llu\n", p1.QuadPart, p2.QuadPart, p3.QuadPart);
    return 0;
}
After compressing a previously uncompressed directory the free space grows.
So what are you talking about?