Get the size of a directory (not its content) - c++

(please before dismissing this as already answered, read the full question)
In C++ we can get the size of a regular file by calling std::filesystem::file_size(PATH), but this function does not work on directories, which is my problem.
I am in a situation where I need to know the size of the directory "inode" itself. On (most) Linux systems the standard size of a directory is 4 kB, i.e. one block:
$:~/tmp/test$ mkdir ex
$:~/tmp/test$ ls -l
total 4
drwxrwxr-x 2 secret secret 4096 Oct 15 08:43 ex
These 4 kB include space for a "list" of the files in that directory.
But if the number of files in the directory becomes significantly large, the size of the folder can increase (which is where I am).
I need to be able to track this increase.
So my question is: besides calling ls -l or du from C++, is there a C++-native way of getting the size of the directory?
I am aware that the reason it does not work with std::filesystem::file_size(path) is that file systems represent directories in different ways.

https://en.cppreference.com/w/cpp/filesystem/file_size
For a regular file p, returns the size determined as if by reading the
st_size member of the structure obtained by POSIX stat (symlinks are
followed)
The result of attempting to determine the size of a directory (as well
as any other file that is not a regular file or a symlink) is
implementation-defined.
file_size on a directory may actually work in some implementations, but I guess that yours doesn't. There doesn't appear to be a pure C++ alternative, but you can call the POSIX stat function yourself. That does work on directories and reports the number you want.
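A minimal sketch of calling POSIX stat directly, assuming a POSIX system. The function name directory_inode_size is made up for this example. It returns st_size of the directory entry itself, which is the number ls -l prints, not the sum of the contents.

```cpp
#include <sys/stat.h>

#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: returns st_size of the directory itself (not of its
// contents), or std::nullopt if stat() fails or the path is not a directory.
std::optional<std::uintmax_t> directory_inode_size(const std::string& path) {
    struct stat st {};
    if (::stat(path.c_str(), &st) != 0) {
        return std::nullopt;  // path missing, no permission, ...
    }
    if (!S_ISDIR(st.st_mode)) {
        return std::nullopt;  // exists, but is not a directory
    }
    return static_cast<std::uintmax_t>(st.st_size);
}
```

Polling directory_inode_size("ex") and comparing against the previous reading would show growth. What st_size means for a directory depends on the file system, so treat the value as file-system specific.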

Related

Get size of a directory on a disk in C++

Following is a code snippet I used to get directory size in c++:
boost::system::error_code ec;
boost::filesystem::space_info si =
boost::filesystem::space(path, ec);
if (ec.value() == 0) {
cout << si.capacity - si.available;
}
But the above snippet seems to be giving the entire disk size instead of a particular directory size on a disk since I passed 2 different directory paths to above but both of them gave the same answer. Can someone help me with finding what's wrong with this or give me another alternative for getting the directory size in c++? TIA!
Boost Reference states:
The value of the space_info object is determined as if by using ISO/IEC 9945 statvfs() to obtain an ISO/IEC 9945 struct statvfs.
Opengroup Reference states:
The statvfs() function shall obtain information about the file system containing the file named by path.
man page states:
The function statvfs() returns information about a mounted filesystem. path is the pathname of any file within the mounted filesystem.
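To illustrate the point of those quotes, here is a small sketch using std::filesystem::space, the standard-library counterpart of boost::filesystem::space (C++17 assumed; the function names are made up for this example). Any two directories on the same mounted file system should report the same capacity, because the numbers describe the file system and not the directory.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Capacity in bytes of the file system that contains `p`, or 0 on error.
std::uintmax_t filesystem_capacity(const fs::path& p) {
    std::error_code ec;
    const fs::space_info si = fs::space(p, ec);
    return ec ? 0 : si.capacity;
}

// Bytes in use on the file system that contains `p` (capacity - available),
// or 0 on error. This is a per-file-system figure, not a per-directory one.
std::uintmax_t filesystem_bytes_used(const fs::path& p) {
    std::error_code ec;
    const fs::space_info si = fs::space(p, ec);
    return ec ? 0 : si.capacity - si.available;
}
```

Calling filesystem_capacity on a directory and on one of its subdirectories gives the same value unless another file system is mounted in between, which matches the behaviour described in the question.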
Based on your comment:
Have a look at the example of std::filesystem::directory_entry::file_size
Or better: Overview of Filesystem library
If it's still not clear:
iterate over a directory
if the entry is of your interest, add the size to the total amount
if entry is another directory, recursively enumerate the size (hint: std::filesystem::recursive_directory_iterator)
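The steps above could look roughly like this, assuming C++17 and that "size" means the sum of the sizes of all regular files below the directory. The names directory_content_size and write_dummy_file are made up for this sketch, and errors on individual entries are simply skipped.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Sums the sizes of all regular files below `root`. Symlinks are not counted
// and unreadable subdirectories are skipped.
std::uintmax_t directory_content_size(const fs::path& root) {
    std::uintmax_t total = 0;
    std::error_code ec;
    fs::recursive_directory_iterator it(
        root, fs::directory_options::skip_permission_denied, ec);
    const fs::recursive_directory_iterator end;
    while (!ec && it != end) {
        std::error_code entry_ec;
        if (it->is_regular_file(entry_ec) && !it->is_symlink(entry_ec)) {
            const std::uintmax_t size = it->file_size(entry_ec);
            if (!entry_ec) total += size;
        }
        it.increment(ec);  // non-throwing step to the next entry
    }
    return total;
}

// Hypothetical helper for trying the function out: writes `bytes` bytes to `p`.
void write_dummy_file(const fs::path& p, std::size_t bytes) {
    std::ofstream out(p, std::ios::binary);
    out << std::string(bytes, 'x');
}
```

Note that this adds up file sizes, not allocated disk blocks, so the result can differ from what du reports.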

More explanation on `statfs64`

According to documentation, the structure fields explanation follows:
struct statfs {
__SWORD_TYPE f_type; /* type of file system (see below) */
__SWORD_TYPE f_bsize; /* optimal transfer block size */
fsblkcnt_t f_blocks; /* total data blocks in file system */
fsblkcnt_t f_bfree; /* free blocks in fs */
fsblkcnt_t f_bavail; /* free blocks available to
unprivileged user */
fsfilcnt_t f_files; /* total file nodes in file system */
fsfilcnt_t f_ffree; /* free file nodes in fs */
fsid_t f_fsid; /* file system id */
__SWORD_TYPE f_namelen; /* maximum length of filenames */
__SWORD_TYPE f_frsize; /* fragment size (since Linux 2.6) */
__SWORD_TYPE f_spare[5];
};
Does "total file nodes in file system" mean how much existing files we have? Does it include directories and links?
What does "free file nodes in fs" mean?
What is f_spare?
In some Linux forks (for example, in Android) I see that f_spare size is 4, and additional field f_flags is defined. What flags are defined for f_flags?
Is f_fsid just random number that uniquely identifies the file system, or what is it?
Does "total file nodes in file system" mean how much existing files we have? Does it include directories and links?
Almost. Yes, it includes directories and softlinks, but two files can share the same inode. In that case, they are hardlinked and share the same space on the hard disk, but are viewed as different files in the filesystem. Example to illustrate:
% echo Hello > test1.txt
% ln test1.txt test2.txt
% ls -i test1.txt test2.txt
14946320 test1.txt 14946320 test2.txt
The number you'll see to the left of the filenames are the inodes (you'll have a different number than in my example). As you can see, they have the same inode. If you make a change to one file, the same change will be visible through the other file.
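The same experiment can be sketched from C++, assuming a POSIX system and C++17. The names same_inode and demo_hard_link are made up for this example. std::filesystem::equivalent and std::filesystem::hard_link_count offer related checks in the standard library.

```cpp
#include <sys/stat.h>

#include <cassert>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// True if both paths resolve to the same inode on the same device.
bool same_inode(const fs::path& a, const fs::path& b) {
    struct stat sa {};
    struct stat sb {};
    if (::stat(a.c_str(), &sa) != 0 || ::stat(b.c_str(), &sb) != 0) {
        return false;
    }
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

// Mirrors the shell session: creates `original`, hard-links `alias` to it,
// and reports whether both names share one inode.
bool demo_hard_link(const fs::path& original, const fs::path& alias) {
    {
        std::ofstream out(original);
        out << "Hello\n";
    }
    std::error_code ec;
    fs::create_hard_link(original, alias, ec);
    if (ec) return false;  // e.g. alias already exists
    return same_inode(original, alias);
}
```

Both names count as one inode towards the file system's inode totals, which is why the inode count is not quite the same as the number of file names.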
What does "free file nodes in fs" mean?
A filesystem often has an upper limit on the number of inodes it can keep track of. The type fsfilcnt_t itself sets one limit (18446744073709551615 on my system), but the real limit is most probably lower. Unless you use your filesystem in very special ways, this limit is usually not a problem.
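To read these counters from code, here is a minimal sketch. It uses POSIX statvfs rather than the Linux-specific statfs from the question, because statvfs exposes the same f_files and f_ffree fields portably. The struct InodeUsage and the function inode_usage are names invented for this example.

```cpp
#include <sys/statvfs.h>

#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

struct InodeUsage {
    std::uintmax_t total;  // f_files: inodes the file system can hold
    std::uintmax_t free;   // f_ffree: inodes not yet in use
};

// Queries the file system containing `path`; std::nullopt if statvfs() fails.
std::optional<InodeUsage> inode_usage(const std::string& path) {
    struct statvfs vfs {};
    if (::statvfs(path.c_str(), &vfs) != 0) {
        return std::nullopt;
    }
    return InodeUsage{static_cast<std::uintmax_t>(vfs.f_files),
                      static_cast<std::uintmax_t>(vfs.f_ffree)};
}
```

The difference total - free is roughly the number of inodes in use. Some file systems allocate inodes dynamically and may report 0 for both fields.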
What is f_spare? In some Linux forks (for example, in Android) I see that f_spare size is 4, and additional field f_flags is defined.
f_spare is just spare bytes to pad the struct itself. The padding bytes are reserved for future use. If one __fsword_t of info is added to the struct in the future, they'll remove one spare __fsword_t from f_spare. My system only has 4 spare __fsword_ts for example (32 bytes).
What flags are defined for f_flags?
The mount flags defined for your system may be different, but my man statfs64 page shows these:
ST_MANDLOCK
Mandatory locking is permitted on the filesystem (see fcntl(2)).
ST_NOATIME
Do not update access times; see mount(2).
ST_NODEV
Disallow access to device special files on this filesystem.
ST_NODIRATIME
Do not update directory access times; see mount(2).
ST_NOEXEC
Execution of programs is disallowed on this filesystem.
ST_NOSUID
The set-user-ID and set-group-ID bits are ignored by exec(3) for executable files on this filesystem
ST_RDONLY
This filesystem is mounted read-only.
ST_RELATIME
Update atime relative to mtime/ctime; see mount(2).
ST_SYNCHRONOUS
Writes are synched to the filesystem immediately (see the description of O_SYNC in open(2)).
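Testing for one of these flags is a plain bit test. The sketch below assumes a POSIX system and uses statvfs, whose field is called f_flag (struct statfs calls it f_flags). It only checks ST_RDONLY and ST_NOSUID, the two flags POSIX itself defines; the function names are made up for this example.

```cpp
#include <sys/statvfs.h>

#include <cassert>
#include <optional>
#include <string>

// Mount-flag bitmask of the file system containing `path`,
// or std::nullopt if statvfs() fails.
std::optional<unsigned long> mount_flags(const std::string& path) {
    struct statvfs vfs {};
    if (::statvfs(path.c_str(), &vfs) != 0) {
        return std::nullopt;
    }
    return static_cast<unsigned long>(vfs.f_flag);
}

// True if the flag set says the file system is mounted read-only.
bool is_read_only(unsigned long flags) {
    return (flags & ST_RDONLY) != 0;
}

// True if set-user-ID and set-group-ID bits are ignored on this file system.
bool ignores_setuid(unsigned long flags) {
    return (flags & ST_NOSUID) != 0;
}
```

The other ST_* constants listed above can be tested the same way where the platform's headers define them.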
Is f_fsid just random number that uniquely identifies the file system, or what is it?
Directly from the man statfs64 page: "Nobody knows what f_fsid is supposed to contain (but see below)" and further below:
The f_fsid field
Solaris, Irix and POSIX have a system call statvfs(2) that returns a struct statvfs (defined in <sys/statvfs.h>) containing an unsigned long f_fsid. Linux, SunOS, HP-UX, 4.4BSD have a system call statfs() that returns a struct statfs (defined in <sys/vfs.h>) containing a fsid_t f_fsid, where fsid_t is defined as struct { int val[2]; }. The same holds for FreeBSD, except that it uses the include file <sys/mount.h>.
The general idea is that f_fsid contains some random stuff such that the pair (f_fsid,ino) uniquely determines a file. Some operating systems use (a variation on) the device number, or the device number combined with the filesystem type. Several operating systems restrict giving out the f_fsid field to the superuser only (and zero it for unprivileged users), because this field is used in the filehandle of the filesystem when NFS-exported, and giving it out is a security concern.
Under some operating systems, the fsid can be used as the second argument to the sysfs(2) system call.

iOS file size during write using only C/C++ APIs

Purpose: I am monitoring file writes in a particular directory on iOS using BSD kernel queues, and poll for file sizes to determine write ends (when the size stops changing). The basic idea is to refresh a folder only after any number of file copies coming from iTunes sync. I have a completely working Objective-C implementation for this but I have my reasons for needing to implement the same thing in C++ only.
Problem: The one thing stopping me is that I can't find a C or C++ API that will get the correct file size during a write. Presumably, one must exist because Objective-C's [NSFileManager attributesOfItemAtPath:] seems to work and we all know it is just calling a C API underneath.
Failed Solutions:
I have tried using stat() and lstat() to get st_size and even st_blocks for the allocated block count. They return correct sizes for most files in a directory, but while a file is being written its size never changes between poll intervals, and every subsequent file iterated in that directory has a bad size.
I have tried using fseek and ftell but they are also resulting in a very similar issue.
I have also tried modified date instead of size using stat() and st_mtimespec, and the date doesn't appear to change during a write - not that I expected it to.
Going back to NSFileManager's ability to give me the right values, does anyone have an idea what C API call that [NSFileManager attributesOfItemAtPath:] is actually using underneath?
Thanks in advance.
Update:
It appears that this has less to do with in-progress write operations and more with specific files. After closer inspection there are some files which always return a size, and other files that never return a size when using the C API (but work fine with the Objective-C API). Even when I create a copy of one of the "good" files, the C API does not give a size for the copy but works fine with the original "good" file. I have both failures and successes with text (xml) files and binary (zip) files. I am using iTunes to add these files to the iPad app's Documents directory. It is an iPad Mini Retina.
Update 2 - Answer:
Probably any of the above file size methods will work, if your path isn't invisibly trashed, like mine was. See accepted answer on why the path was trashed.
Well this weird behavior turned out to be a problem with the paths, which result in strings that will print normally, but are likely trashed in memory enough that file descriptors sometimes didn't like it (thus only occurring in certain file paths). I was using the dirent API to iterate over the files in a directory and concatenating the dir path and file name erroneously.
Bad Path Concatenation: Obviously (or apparently not-so-obvious at runtime) str-copying over three times is not going to end well. Each strcpy writes from the start of the buffer, so fullPath ends up holding only the file name.
char* fullPath = (char*)malloc(strlen(dir) + strlen(file) + 2);
strcpy(fullPath, dir);
strcpy(fullPath, "/");
strcpy(fullPath, file);
long sizeBytes = getSize(fullPath);
free(fullPath);
Correct Path Concatenation: Use proper str-concatenation.
char* fullPath = (char*)malloc(strlen(dir) + strlen(file) + 2);
strcpy(fullPath, dir);
strcat(fullPath, "/");
strcat(fullPath, file);
long sizeBytes = getSize(fullPath);
free(fullPath);
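If C++ is available anyway, std::string removes the manual buffer arithmetic altogether. A small sketch, assuming a POSIX system: join_path and file_size_bytes are names invented for this example and stand in for the concatenation above and for my own getSize.

```cpp
#include <sys/stat.h>

#include <cassert>
#include <string>

// Joins a directory and a file name with exactly one '/' between them.
std::string join_path(const std::string& dir, const std::string& file) {
    if (dir.empty()) return file;
    if (dir.back() == '/') return dir + file;
    return dir + "/" + file;
}

// Size of the file in bytes as reported by stat(), or -1 on failure.
long long file_size_bytes(const std::string& path) {
    struct stat st {};
    if (::stat(path.c_str(), &st) != 0) {
        return -1;
    }
    return static_cast<long long>(st.st_size);
}
```

With this, file_size_bytes(join_path(dir, file)) replaces the malloc/strcpy/strcat/free sequence, and there is no buffer to size or free by hand.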
Long story short, it was sloppy work on my part, via two typos.

Calculate NTFS and FAT file system size in Windows

Does anybody know how to calculate the amount of space occupied by the file system alone?
I am trying to calculate how much space files and directories occupy on a disk without iterating through the entire disk.
this is a sample in C++:
ULARGE_INTEGER freeBytesAvailable, totalNumberOfBytes, totalNumberOfFreeBytes;
GetDiskFreeSpaceEx(NULL, &freeBytesAvailable, &totalNumberOfBytes, &totalNumberOfFreeBytes);
mCurrentProgress = 0;
mTotalProgress = totalNumberOfBytes.QuadPart - totalNumberOfFreeBytes.QuadPart;
But the problem is that I need to exclude the size of the file system but I have no idea if it is possible or if there is an API to get this info.
Doesn't make sense. On NTFS, small files are stored in the directory. I mean literally, they're inlined. The same sector that holds the filename also holds the file contents. Therefore, you can't count that sector as either "used for files" or "used for file system overhead".

Efficiently List All Sub-Directories in a Directory

Please see edit with advice taken so far...
I am attempting to list all the directories(folders) in a given directory using WinAPI & C++.
Right now my algorithm is slow & inefficient:
- Use FindFirstFileEx() to open the folder I am searching
- I then look at every file in the directory (using FindNextFile()); if it's a directory then I store its absolute path in a vector, if it's just a file I do nothing.
This seems extremely inefficient because I am looking at every file in the directory.
Is there a WinAPI function that I can use that will tell me all the sub-directories in a given directory?
Do you know of an algorithm I could use to efficiently locate & identify folders in a directory(folder)?
EDIT:
So after taking the advice I have searched using FindExSearchLimitToDirectories but for me it still prints out all the files(.txt, etc.) & not just folders. Am I doing something wrong?
WIN32_FIND_DATA dirData;
HANDLE dir = FindFirstFileEx( "c:/users/soribo/desktop\\*", FindExInfoStandard, &dirData,
FindExSearchLimitToDirectories, NULL, 0 );
while ( FindNextFile( dir, &dirData ) != 0 )
{
printf( "FileName: %s\n", dirData.cFileName );
}
In order to see a performance boost there must be support at the file system level. If this does not exist then the system must enumerate every single object in the directory.
In principle, you can use FindFirstFileEx specifying the FindExSearchLimitToDirectories flag. However, the documentation states (emphasis mine):
This is an advisory flag. If the file system supports directory filtering, the function searches for a file that matches the specified name and is also a directory. If the file system does not support directory filtering, this flag is silently ignored.
If directory filtering is desired, this flag can be used on all file systems, but because it is an advisory flag and only affects file systems that support it, the application must examine the file attribute data stored in the lpFindFileData parameter of the FindFirstFileEx function to determine whether the function has returned a handle to a directory.
However, from what I can tell, and information is sparse, the FindExSearchLimitToDirectories flag is not widely supported on desktop file systems.
Your best bet is to use FindFirstFileEx with FindExSearchLimitToDirectories. You must still perform your own filtering in case you meet a file system that doesn't support directory filtering at file system level. If you get lucky and hit upon a file system that does support it then you will get the performance benefit.
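A sketch of that approach: pass the advisory flag, then still filter on FILE_ATTRIBUTE_DIRECTORY yourself. The function name list_subdirectories is made up for this example. It uses do/while so the first result returned by FindFirstFileEx is not skipped. The non-Windows branch is not part of the WinAPI approach; it is only a std::filesystem stand-in so that the sketch compiles on other platforms.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Returns the paths of the immediate subdirectories of `dir`.
std::vector<std::string> list_subdirectories(const std::string& dir) {
    std::vector<std::string> result;
#ifdef _WIN32
    WIN32_FIND_DATAA data;
    const std::string pattern = dir + "\\*";
    HANDLE h = FindFirstFileExA(pattern.c_str(), FindExInfoStandard, &data,
                                FindExSearchLimitToDirectories, nullptr, 0);
    if (h == INVALID_HANDLE_VALUE) return result;
    do {
        const std::string name = data.cFileName;
        // The flag is advisory, so check the attribute on every entry.
        const bool is_dir =
            (data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0;
        if (is_dir && name != "." && name != "..") {
            result.push_back(dir + "\\" + name);
        }
    } while (FindNextFileA(h, &data) != 0);
    FindClose(h);
#else
    // Stand-in for non-Windows builds (an assumption of this sketch).
    std::error_code ec;
    std::filesystem::directory_iterator it(dir, ec);
    const std::filesystem::directory_iterator end;
    while (!ec && it != end) {
        std::error_code entry_ec;
        if (it->is_directory(entry_ec)) {
            result.push_back(it->path().string());
        }
        it.increment(ec);
    }
#endif
    return result;
}
```

Either way every entry is still examined once; the flag only helps when the file system can do the filtering itself.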
If you're using FindFirstFileEx, then you should be able to specify the _FINDEX_SEARCH_OPS::FindExSearchLimitToDirectories option (to be used as the fSearchOp param in FindFirstFileEx) to limit the first search (and any subsequent FindNextFile() calls) to directories.