Optimize Disk Enumeration and File Listing in C++

I am writing C++ code that enumerates every drive on the machine, builds a listing of all files and folders, and writes the result to a binary file. However, it takes more than 15 minutes to enumerate all drives (a 500 GB HDD).
A third-party executable produces a listing of the whole disk in under two minutes, so the gap must be in my code. Can you please look at my code and suggest some performance improvement techniques?
void EnumFiles(CString FolderPath, CString SearchParameter, WIN32_FIND_DATAW *FileInfoData)
{
    CString SearchFile = FolderPath + SearchParameter;
    CString FileName;

    HANDLE hFile = FindFirstFileW(SearchFile, FileInfoData); // e.g. \\?\C:\*
    if (hFile == INVALID_HANDLE_VALUE)
    {
        // Error
        return;
    }

    do
    {
        FileName = FileInfoData->cFileName;
        if (FileInfoData->dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
        {
            if (!(FileName == L"." || FileName == L".."))
            {
                // Save the folder information, then recurse into the subdirectory
                EnumFiles(FolderPath + FileName + L"\\", SearchParameter, FileInfoData);
            }
        }
        else
        {
            // Save the file parameters
        }
    } while (FindNextFileW(hFile, FileInfoData));

    FindClose(hFile);
}
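
One change that often helps on Windows 7 and later is to switch from FindFirstFileW to FindFirstFileExW with FindExInfoBasic (which skips the 8.3 short-name lookup) and FIND_FIRST_EX_LARGE_FETCH (which requests a larger directory-enumeration buffer). The following is only a minimal sketch of the same recursion with those flags, not the original poster's code; the "save" steps are left as comments:

#include <windows.h>
#include <string>

void EnumFilesEx(const std::wstring& folder)   // folder must end with L"\\"
{
    WIN32_FIND_DATAW fd;
    // FindExInfoBasic skips the cAlternateFileName (8.3) lookup,
    // FIND_FIRST_EX_LARGE_FETCH asks for a larger enumeration buffer.
    HANDLE h = FindFirstFileExW((folder + L"*").c_str(),
                                FindExInfoBasic, &fd,
                                FindExSearchNameMatch, nullptr,
                                FIND_FIRST_EX_LARGE_FETCH);
    if (h == INVALID_HANDLE_VALUE)
        return;

    do
    {
        if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
        {
            if (wcscmp(fd.cFileName, L".") != 0 && wcscmp(fd.cFileName, L"..") != 0)
            {
                // Save the folder information here, then recurse.
                EnumFilesEx(folder + fd.cFileName + L"\\");
            }
        }
        else
        {
            // Save the file parameters here (buffer them and write the
            // binary output in large chunks rather than one record at a time).
        }
    } while (FindNextFileW(h, &fd));

    FindClose(h);
}

Beyond the flags, the biggest wins usually come from keeping per-file work out of the loop: buffer the results and flush the binary output in large writes. If that is still not fast enough, note that most very fast disk-indexing tools get their speed by reading the NTFS Master File Table (USN/MFT) directly rather than walking directories, but that requires administrator rights and volume-level access.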

Related

FindNextFile Failed with Space Character

I wrote a simple program to perform an operation on every file in every folder and subfolder.
It works perfectly until a path containing a space character comes along; then FindFirstFile returns INVALID_HANDLE_VALUE and the program crashes. This is the function:
int dirListFiles(char* startDir)
{
    HANDLE hFind;
    WIN32_FIND_DATAA wfd;
    char path[MAX_PATH];
    sprintf(path, "%s\\*", startDir);
    std::string fileName;
    std::string s_path = startDir;
    std::string fullPath;
    fprintf(stdout, "In Directory \"%s\"\n\n", startDir);
    if ((hFind = FindFirstFileA(path, &wfd)) == INVALID_HANDLE_VALUE)
    {
        printf("FindFirstFile failed on path = \"%s\"\n", path);
        abort();
    }
    BOOL cont = TRUE;
    while (cont == TRUE)
    {
        if ((strncmp(".", wfd.cFileName, 1) != 0) && (strncmp("..", wfd.cFileName, 2) != 0))
        {
            if (wfd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
            {
                sprintf(path, "%s\\%s", startDir, wfd.cFileName);
                dirListFiles(path);
            }
            else
            {
                fileName = wfd.cFileName;
                fullPath = s_path + "\\" + fileName;
                std::string fileExt = PathFindExtensionA(fullPath.c_str());
                if (fileExt == ".cpp")
                {
                    // ... some operation on the file
                }
            }
        }
        cont = FindNextFileA(hFind, &wfd);
    }
    FindClose(hFind);
    return 0;
}
For example, when the recursion reaches Program Files (x86), whose name contains spaces, the call fails and the program exits. What can I do to support spaces? What is the problem?
A space is a legal character in directory and file names.
First, I propose a slight modification to your code:
if ((hFind = FindFirstFileA(path, &wfd)) == INVALID_HANDLE_VALUE)
{
    printf("FindFirstFile failed on path = \"%s\". Error %lu\n", path, GetLastError());
    return 0; // I think you shouldn't abort on error, just skip this dir.
}
Now check error codes reported by your program.
For some paths I got error #5 (access denied). Examples:
c:\Program Files (x86)\Google\CrashReports\*
c:\ProgramData\Microsoft\Windows Defender\Clean Store\*
c:\Windows\System32\config\*
I also got two cases with code #123 (invalid name) for path names that FindFirstFileA cannot handle. To correct this it would be better to use the wide version of the function, FindFirstFileW. See both answers to c++ folder only search. For new Windows applications you should use the wide versions of the API, converting with MultiByteToWideChar and WideCharToMultiByte where needed.
You also have a logic error: the strncmp checks skip every directory and file whose name merely starts with a dot, not just "." and ".." (see the sketch below).
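A minimal sketch of how both fixes might look while keeping the overall shape of dirListFiles: wide-character strings end to end, errors skipped instead of aborted, and an exact comparison against "." and ".." so that names like ".git" are still visited. The function name listDirW is mine, not from the question:

#include <windows.h>
#include <string>
#include <cstdio>

void listDirW(const std::wstring& dir)           // hypothetical wide-char variant
{
    WIN32_FIND_DATAW wfd;
    HANDLE hFind = FindFirstFileW((dir + L"\\*").c_str(), &wfd);
    if (hFind == INVALID_HANDLE_VALUE)
    {
        wprintf(L"FindFirstFileW failed on \"%ls\". Error %lu\n",
                dir.c_str(), GetLastError());
        return;                                   // skip this directory, don't abort
    }
    do
    {
        // Exact match: only "." and ".." are skipped, not every dotted name.
        if (wcscmp(wfd.cFileName, L".") == 0 || wcscmp(wfd.cFileName, L"..") == 0)
            continue;

        if (wfd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
            listDirW(dir + L"\\" + wfd.cFileName);
        else
        {
            // ... operate on the file (dir + L"\\" + wfd.cFileName)
        }
    } while (FindNextFileW(hFind, &wfd));
    FindClose(hFind);
}

If the rest of the program still works with char* paths, a single MultiByteToWideChar conversion at the boundary is enough to feed this wide-character version.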

readdir on AWS EFS doesn't return all files in directory

After writing many files (10k or so) to a series of folders on EFS, readdir stops returning all of the files in each directory.
I have a C++ application that, in one part of its process, generates a lot of files, each with an accompanying symlink. After that I need to get a list of the files in a folder so I can select a subset to rename. When I run the function that gets the list of files, it does not return all the files that are actually there. This code runs fine on my local machine, but on an AWS server with a mounted EFS drive it stops working after a while.
To troubleshoot, I changed my code to write only one file at a time and to call getFiles() after each batch (around 17 files) to count how many files are in the folder. When the count reaches ~950 files, getFiles() starts reporting ~910 files and no longer increments. The files it writes are varied but fairly small (2 bytes - 300K), written at about 200 files a second, and each file also gets a symlink.
When reading and writing files I am using posix open(), write(), read() and close(). I have verified that I do in fact close all files after reading or writing.
I am trying to figure out:
1. Why is readdir not working? Or why is it not listing all the files?
2. What is different about EFS that could be causing issues?
These are the functions I am using to get the list of files in a folder:
DIR * FileUtil::getDirStream(std::string path) {
    if (!folderExists(path)) {
        return NULL;
    }
    DIR * dir = opendir(path.c_str());
    bool success = dir != NULL;
    int count = 0;
    while (!success) {
        int fileRetryDelay = BlazingConfig::getInstance()->getFileRetryDelay();
        const int sleep_milliseconds = (count + 1) * fileRetryDelay;
        std::this_thread::sleep_for(std::chrono::milliseconds(sleep_milliseconds));
        std::cout << "Was unable to get Dir stream for " << path << std::endl;
        dir = opendir(path.c_str());
        success = dir != NULL;
        count++;
        if (count > 6) {
            break;
        }
    }
    if (!success) {   // was `success == -1`, which can never be true for a bool
        std::cout << "Can't get Dir stream for " << path << ". Error was: " << errno << std::endl;
    }
    return dir;
}
int FileUtil::getDirEntry(DIR * dirp, struct dirent * & prevDirEntry, struct dirent * & dirEntry) {
    if (dirp == NULL) {
        return -1;
    }
    int returnCode = readdir_r(dirp, prevDirEntry, &dirEntry);
    bool success = (dirEntry == NULL && returnCode == 0) || dirEntry != NULL;
    int count = 0;
    while (!success) {
        int fileRetryDelay = BlazingConfig::getInstance()->getFileRetryDelay();
        const int sleep_milliseconds = (count + 1) * fileRetryDelay;
        std::this_thread::sleep_for(std::chrono::milliseconds(sleep_milliseconds));
        std::cout << "Was unable to get dirent with readdir" << std::endl;
        returnCode = readdir_r(dirp, prevDirEntry, &dirEntry);
        success = (dirEntry == NULL && returnCode == 0) || dirEntry != NULL;
        count++;
        if (count > 6) {
            break;
        }
    }
    if (!success) {   // was `success == -1`, which can never be true for a bool
        std::cout << "Can't get dirent with readdir. Error was: " << errno << std::endl;
    }
    return returnCode;
}
std::vector<std::string> FileUtil::getFiles(std::string baseFolder) {
    DIR *dir = getDirStream(baseFolder);
    std::vector<std::string> subFolders;
    if (dir != NULL) {
        struct dirent *prevDirEntry = NULL;
        struct dirent *dirEntry = NULL;
        int len_entry = offsetof(struct dirent, d_name) + fpathconf(dirfd(dir), _PC_NAME_MAX) + 1;
        prevDirEntry = (struct dirent *)malloc(len_entry);
        int returnCode = getDirEntry(dir, prevDirEntry, dirEntry);
        while (dirEntry != NULL) {
            if (dirEntry->d_type == DT_REG || dirEntry->d_type == DT_LNK) {
                std::string name(dirEntry->d_name);
                subFolders.push_back(name);
            }
            returnCode = getDirEntry(dir, prevDirEntry, dirEntry);
        }
        free(prevDirEntry);
        closedir(dir);
    } else {
        std::cout << "Could not open directory err num is " << errno << std::endl;
        /* could not open directory */
        perror("");
    }
    return subFolders;
}
The functions were written this way to be as robust as possible: since many threads can be performing file operations, I wanted the code to retry on any failure. Unfortunately, when getFiles() returns the wrong result it gives me no indication of failure.
Note: when I use readdir instead of readdir_r I still have the same issue.
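As an aside, readdir_r is deprecated in glibc; with plain readdir, the documented way to tell end-of-directory from a read error is to set errno to 0 before the call and inspect it when NULL comes back. A minimal sketch of that pattern (the function name listRegularAndLinks is mine, not part of FileUtil):

#include <dirent.h>
#include <cerrno>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

// Plain readdir with explicit errno checking, so that
// "end of directory" and "read error" can be told apart.
std::vector<std::string> listRegularAndLinks(const std::string& path) {
    std::vector<std::string> names;
    DIR* dir = opendir(path.c_str());
    if (dir == NULL) {
        std::cout << "opendir(" << path << ") failed: " << std::strerror(errno) << std::endl;
        return names;
    }
    for (;;) {
        errno = 0;                        // readdir returns NULL for both EOF and error
        struct dirent* entry = readdir(dir);
        if (entry == NULL) {
            if (errno != 0) {
                std::cout << "readdir(" << path << ") failed: " << std::strerror(errno) << std::endl;
            }
            break;                        // errno == 0 means the end of the stream was reached
        }
        if (entry->d_type == DT_REG || entry->d_type == DT_LNK) {
            names.push_back(entry->d_name);
        }
    }
    closedir(dir);
    return names;
}

Note also that d_type is not guaranteed to be populated on every filesystem; a network filesystem may report DT_UNKNOWN, in which case the entry would have to be classified with lstat() instead of being silently dropped.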

How do I organize a recursive file search with file operations?

I am writing a small program for myself in C++ that performs some operations on the files it finds (matching my filter). On start the program asks for a full path, then recursively searches all subdirectories of the selected directory for files of a given type. The trouble is that after performing an operation on a file (an fopen - operation - fclose cycle) I cannot rename or delete that file; the program simply exits with code 0. I suspect the file-search mechanism: most likely it keeps the file in use while it runs, so the rename or delete never happens. I have tried managing the files through WinAPI, the standard library (fstream), and plain fopen/fclose. Nothing works.
Code snippet:
int rez = 0;                       // result counter used by GetFileList
void GetFileList(LPTSTR sPath, LPTSTR sExt, LPTSTR sEXT);

int main() {
    char sPath[MAX_PATH] = "C:\\TmpDir";
    char sExt[10] = "doc";
    char sEXT[10] = "DOC";
    GetFileList(sPath, sExt, sEXT);
    printf("Results= %d\n", rez);
    system("pause");
    return 0;
}

void GetFileList(LPTSTR sPath, LPTSTR sExt, LPTSTR sEXT) {
    WIN32_FIND_DATA pFILEDATA;
    HANDLE hFile = FindFirstFile(strcat(sPath, "\\*.*"), &pFILEDATA);
    sPath[strlen(sPath) - strlen(strstr(sPath, "*.*"))] = '\0';
    if (hFile != INVALID_HANDLE_VALUE) {
        char * chBuf;
        do {
            if (strlen(pFILEDATA.cFileName) == 1 && strchr(pFILEDATA.cFileName, '.') != NULL)
                if (FindNextFile(hFile, &pFILEDATA) == 0)
                    break;
            if (strlen(pFILEDATA.cFileName) == 2 && strstr(pFILEDATA.cFileName, "..") != NULL)
                if (FindNextFile(hFile, &pFILEDATA) == 0)
                    break;
            if (pFILEDATA.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
                GetFileList(strcat(sPath, pFILEDATA.cFileName), sExt, sEXT);
                sPath[strlen(sPath) - strlen(pFILEDATA.cFileName) - 1] = '\0';
            } else {
                if ((chBuf = strrchr(pFILEDATA.cFileName, '.'))) {
                    if (strstr(chBuf + 1, sExt) || strstr(chBuf + 1, sEXT)) {
                        CharToOem(sPath, sPath);
                        printf("%s", sPath);
                        OemToChar(sPath, sPath);
                        CharToOem(pFILEDATA.cFileName, pFILEDATA.cFileName);
                        printf("%s\n", pFILEDATA.cFileName);
                        /* Some operation on the file.
                           ...
                           End of the operation on the file. */
                        rez++;
                    }
                }
            }
        } while (FindNextFile(hFile, &pFILEDATA));
        FindClose(hFile);              // the search handle was never closed in the original
    }
}
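
A minimal sketch of one common way to organize this kind of task, under the assumption that the rename/delete fails because a handle is still open at that point: first collect the matching paths, closing every find handle along the way, then perform the file operations in a second pass. The names CollectFiles and matches are mine, not from the question:

#include <windows.h>
#include <string>
#include <vector>

// First pass: enumerate only. Each level owns its own path string and the
// find handle is closed before control returns to the caller.
void CollectFiles(const std::string& dir, const std::string& ext,
                  std::vector<std::string>& matches) {
    WIN32_FIND_DATAA fd;
    HANDLE h = FindFirstFileA((dir + "\\*").c_str(), &fd);
    if (h == INVALID_HANDLE_VALUE)
        return;
    do {
        if (strcmp(fd.cFileName, ".") == 0 || strcmp(fd.cFileName, "..") == 0)
            continue;
        std::string full = dir + "\\" + fd.cFileName;
        if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
            CollectFiles(full, ext, matches);        // recurse with a fresh string
        } else {
            const char* dot = strrchr(fd.cFileName, '.');
            if (dot && _stricmp(dot + 1, ext.c_str()) == 0)   // covers "doc" and "DOC"
                matches.push_back(full);
        }
    } while (FindNextFileA(h, &fd));
    FindClose(h);                                    // closed before anything is modified
}

int main() {
    std::vector<std::string> matches;
    CollectFiles("C:\\TmpDir", "doc", matches);
    for (const std::string& f : matches) {
        // Second pass: fopen -> operation -> fclose, then MoveFileA/DeleteFileA
        // can run with no FindFirstFile handle outstanding on the directory.
    }
    return 0;
}

Separating enumeration from modification also avoids the classic pitfall of renaming or deleting entries while a FindFirstFile/FindNextFile iteration over the same directory is still in progress.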

PathFileExists returns false when executing application through RemoteApp

My executable, built with C++/WinAPI, checks for a file placed in the same folder as the EXE, and I use PathFileExists for that. When I run it on a normal computer it finds the file, but when I publish the executable through RemoteApp and run it from Web Access the file is not found. What am I missing?
// This is the file I want to find (located in the same directory as the EXE)
wstring myfile = L"myfile.conf";
BOOL abspath = FALSE;

// Trying to get the absolute path first
DWORD nBufferLength = MAX_PATH;
wchar_t szCurrentDirectory[MAX_PATH + 1];
if (GetCurrentDirectory(nBufferLength, szCurrentDirectory) == 0) {
    szCurrentDirectory[0] = L'\0';   // the call failed; the original wrote one element past the end of the buffer
} else {
    abspath = TRUE;
}

if (abspath) {
    // Create the absolute path to the file
    myfile = L'\\' + myfile;
    myfile = szCurrentDirectory + myfile;
    MessageBox(hWnd, ConvertToUNC(myfile).c_str(), L"Absolute Path", MB_ICONINFORMATION);
} else {
    // Get the UNC path
    myfile = ConvertToUNC(myfile);
    MessageBox(hWnd, myfile.c_str(), L"UNC Path", MB_ICONINFORMATION);
}

// Try to find file
int retval = PathFileExists(myfile.c_str());
if (retval == 1) {
    // Do something
} else {
    // File not found
}
The ConvertToUNC function is copied from here.
What I see is that, although the executable lives somewhere else, the absolute path comes back as C:\Windows. I really don't know what is causing this. The server is Windows 2012 R2 and, as I said, applications are run through RemoteApp Web Access. The returned UNC path is just the name of the file (no volume or folder).
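One common approach when the working directory cannot be trusted (as here, where it resolves to C:\Windows) is to derive the folder from the module's own location with GetModuleFileName rather than GetCurrentDirectory. A minimal sketch, assuming the same myfile.conf file name; the helper name PathNextToExe is mine:

#include <windows.h>
#include <shlwapi.h>      // PathFileExistsW, PathRemoveFileSpecW; link with shlwapi.lib
#include <string>

// Build the path to a file that sits next to the running EXE.
std::wstring PathNextToExe(const std::wstring& name) {
    wchar_t buf[MAX_PATH];
    DWORD len = GetModuleFileNameW(NULL, buf, MAX_PATH);   // full path of the EXE itself
    if (len == 0 || len >= MAX_PATH)
        return name;                                        // fall back to the bare name
    PathRemoveFileSpecW(buf);                               // strip the trailing "\myapp.exe"
    return std::wstring(buf) + L"\\" + name;
}

// Usage:
//   std::wstring conf = PathNextToExe(L"myfile.conf");
//   if (PathFileExistsW(conf.c_str())) { /* found */ }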

In an NTFS Compressed Directory, How Do I Read a File's Compressed and Uncompressed Size?

In our application we generate some large ASCII log files in a Windows NTFS-compressed directory. My users want to see both the compressed and uncompressed sizes of the files on a status screen. We are using RAD Studio 2010 C++ for this application.
I found this nice recursive routine online to read the size of the files on disk:
// Running total of compressed bytes (global, updated in the EDIT block below).
__int64 compressedSize = 0;

__int64 TransverseDirectory(std::string path)
{
    WIN32_FIND_DATA data;
    __int64 size = 0;
    std::string fname = path + "\\*.*";
    HANDLE h = FindFirstFile(fname.c_str(), &data);
    if (h != INVALID_HANDLE_VALUE)
    {
        do
        {
            if ((data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))
            {
                if (strcmp(data.cFileName, ".") != 0 && strcmp(data.cFileName, "..") != 0)
                {
                    // We found a sub-directory, so get the files in it too
                    fname = path + "\\" + data.cFileName;
                    // recursion here!
                    size += TransverseDirectory(fname);
                }
            }
            else
            {
                LARGE_INTEGER sz;
                sz.LowPart = data.nFileSizeLow;
                sz.HighPart = data.nFileSizeHigh;
                size += sz.QuadPart;
                // ---------- EDIT ------------
                if (data.dwFileAttributes & FILE_ATTRIBUTE_COMPRESSED)
                {
                    ULARGE_INTEGER csz;
                    fname = path + "\\" + data.cFileName;
                    csz.LowPart = GetCompressedFileSize(fname.c_str(), &csz.HighPart);
                    compressedSize += csz.QuadPart;   // combine high and low DWORDs
                }
                // ---------- End EDIT ------------
            }
        }
        while (FindNextFile(h, &data) != 0);
        FindClose(h);
    }
    return size;
}
But what I cannot find is any information on how to read compressed/uncompressed file size information. Suggestions on where to look?
The Win32 API GetFileSize will return the uncompressed file size. The API GetCompressedFileSize will return the compressed file size.
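A minimal sketch of reading both sizes for a single file, assuming an ANSI build as in the question; the WIN32_FIND_DATA fields already hold the uncompressed (logical) size, so GetCompressedFileSize is only needed for the on-disk figure. The helper name GetBothSizes is mine:

#include <windows.h>
#include <string>

// Returns true on success and fills in the logical (uncompressed) and
// on-disk (compressed) sizes of a single file.
bool GetBothSizes(const std::string& file, __int64& uncompressed, __int64& compressed)
{
    WIN32_FIND_DATAA data;
    HANDLE h = FindFirstFileA(file.c_str(), &data);
    if (h == INVALID_HANDLE_VALUE)
        return false;
    FindClose(h);

    ULARGE_INTEGER u;
    u.LowPart = data.nFileSizeLow;           // logical size from the directory entry
    u.HighPart = data.nFileSizeHigh;
    uncompressed = (__int64)u.QuadPart;

    ULARGE_INTEGER c;
    c.LowPart = GetCompressedFileSizeA(file.c_str(), &c.HighPart);
    if (c.LowPart == INVALID_FILE_SIZE && GetLastError() != NO_ERROR)
        return false;                         // a real error, not a size ending in 0xFFFFFFFF
    compressed = (__int64)c.QuadPart;         // equals the logical size if the file is not compressed
    return true;
}

For files that are not compressed or sparse, GetCompressedFileSize simply reports the normal size, so it is safe to call it unconditionally rather than checking FILE_ATTRIBUTE_COMPRESSED first.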