In an NTFS compressed directory, how to read a file's compressed and uncompressed size? - c++

In our application, we are generating some large ASCII log files to a Windows NTFS compressed directory. My users want to see both the compressed and uncompressed size of the files on a status screen for the application. We are using Rad Studio 2010 C++ for this application.
I found this nice recursive routine online to read the size of the files on the disk:
__int64 TransverseDirectory(string path)
{
    WIN32_FIND_DATA data;
    __int64 size = 0;
    string fname = path + "\\*.*";
    HANDLE h = FindFirstFile(fname.c_str(), &data);
    if (h != INVALID_HANDLE_VALUE)
    {
        do
        {
            if ((data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))
            {
                if (strcmp(data.cFileName, ".") != 0 && strcmp(data.cFileName, "..") != 0)
                {
                    // We found a sub-directory, so get the files in it too
                    fname = path + "\\" + data.cFileName;
                    // recursion here!
                    size += TransverseDirectory(fname);
                }
            }
            else
            {
                LARGE_INTEGER sz;
                sz.LowPart = data.nFileSizeLow;
                sz.HighPart = data.nFileSizeHigh;
                size += sz.QuadPart;
                // ---------- EDIT ------------
                if (data.dwFileAttributes & FILE_ATTRIBUTE_COMPRESSED)
                {
                    // compressedSize is assumed to be an __int64 total declared outside this function
                    DWORD highPart = 0;
                    fname = path + "\\" + data.cFileName;
                    DWORD lowPart = GetCompressedFileSize(fname.c_str(), &highPart);
                    // INVALID_FILE_SIZE can also be a valid low part, so confirm failure with GetLastError
                    if (lowPart != INVALID_FILE_SIZE || GetLastError() == NO_ERROR)
                    {
                        LARGE_INTEGER csz;
                        csz.LowPart = lowPart;
                        csz.HighPart = highPart;
                        compressedSize += csz.QuadPart;
                    }
                }
                // ---------- End EDIT ------------
            }
        }
        while (FindNextFile(h, &data) != 0);
        FindClose(h);
    }
    return size;
}
But what I cannot find is any information on how to read compressed/uncompressed file size information. Suggestions on where to look?

The Win32 API GetFileSize will return the uncompressed file size. The API GetCompressedFileSize will return the compressed file size.
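As a rough sketch of how the two sizes could be read for a single file by path: GetFileSize needs an open handle, so this variant uses GetFileAttributesExA for the uncompressed size instead, which is a substitution on my part. The names joinParts, FileSizes and queryFileSizes are hypothetical helpers, and the Windows part is guarded so the file still compiles elsewhere.

```cpp
#include <cstdint>

// Joins the low and high 32-bit halves that the Win32 size APIs return.
std::uint64_t joinParts(std::uint32_t low, std::uint32_t high) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

#ifdef _WIN32
#include <windows.h>

// Hypothetical result type: logical (uncompressed) size and bytes used on disk.
struct FileSizes {
    std::uint64_t logical;
    std::uint64_t onDisk;
};

// Returns false if either query fails; GetLastError() then holds the reason.
bool queryFileSizes(const char* path, FileSizes& out) {
    WIN32_FILE_ATTRIBUTE_DATA info;
    if (!GetFileAttributesExA(path, GetFileExInfoStandard, &info)) {
        return false;
    }
    DWORD high = 0;
    const DWORD low = GetCompressedFileSizeA(path, &high);
    // INVALID_FILE_SIZE is only an error when GetLastError() is not NO_ERROR.
    if (low == INVALID_FILE_SIZE && GetLastError() != NO_ERROR) {
        return false;
    }
    out.logical = joinParts(info.nFileSizeLow, info.nFileSizeHigh);
    out.onDisk = joinParts(low, high);
    return true;
}
#endif
```

Passing a non-null high part matters for log files that may grow past 4 GB; with only the low DWORD, the total would silently wrap.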

Related

How to get all file path from C:/ drive?

I'm trying to retrieve all files from the root (C:/) in C++.
First, I retrieve all logical drives on the computer, then I use the std::filesystem library (specifically recursive_directory_iterator) to loop through the directories.
DWORD dwSize = MAX_PATH;
char szLogicalDrives[MAX_PATH] = { 0 };
DWORD dwResult = GetLogicalDriveStrings(dwSize, szLogicalDrives);
if (dwResult > 0 && dwResult <= MAX_PATH)
{
char* szSingleDrive = szLogicalDrives;
while (*szSingleDrive)
{
szSingleDrive[strlen(szSingleDrive) - 1] = 0;
printf("%s", szSingleDrive);
for (fs::directory_entry p : std::filesystem::recursive_directory_iterator(szSingleDrive))
{
string filePath = p.path().string();
// Check the object's type
if (fs::is_regular_file(p.path()))
{
cout << filePath << endl;
}
// get the next drive
szSingleDrive += strlen(szSingleDrive) + 1;
}
}
}
However, the output I get is the path of my project.
E.g.: C:x64\Debug\myProject.exe
Desired output: C:\Users, C:\Windows, C:\Program Files...
To resolve the issue I had to launch VS 2019 as administrator (or run the .exe as administrator) and disable Windows Defender.
To avoid the UAC exception, I also passed the skip_permission_denied directory option.
However, my program still encounters a "Sharing Violation" error.

readdir on AWS EFS doesn't return all files in directory

After having written many files to a series of folders on EFS (10k or so), readdir stops returning all of the files in each directory.
I have a C++ application that, in one part of its process, generates a lot of files, and each file is given a symlink. After that I need to get a list of the files in a folder to then select a subset to rename. When I run the function that gets the list of files, it does not return all the files that are actually there. This code runs fine on my local machine, but on an AWS server with a mounted EFS drive, it stops working after a while.
In order to troubleshoot this issue, I have made my code write only one file at a time. I have also set up my code to use getFiles() to give me a count of how many files there are in a folder after writing each batch of files (around 17 files). When the number of files reaches ~950, getFiles() starts listing ~910 files and no longer increments. The files being written vary in size but are fairly small (2 bytes to 300K), and it's writing about 200 files a second. Each file also has a symlink created to it.
When reading and writing files I am using posix open(), write(), read() and close(). I have verified that I do in fact close all files after reading or writing.
I am trying to figure out:
1. Why is readdir not working? Or why is it not listing all the files?
2. What is different about EFS that could be causing issues?
These are the functions I am using to get the list of files in a folder:
DIR * FileUtil::getDirStream(std::string path) {
bool success = false;
if (!folderExists(path)){
return NULL;
}
DIR * dir = opendir(path.c_str());
success = dir != NULL;
int count = 0;
while(!success){
int fileRetryDelay = BlazingConfig::getInstance()->getFileRetryDelay();
const int sleep_milliseconds = (count+1)*fileRetryDelay;
std::this_thread::sleep_for(std::chrono::milliseconds(sleep_milliseconds));
std::cout<<"Was unable to get Dir stream for "<<path<<std::endl;
dir = opendir(path.c_str());
success = dir != NULL;
count++;
if(count > 6){
break;
}
}
if(!success){
std::cout<<"Can't get Dir stream for "<<path<<". Error was: "<<errno<<std::endl;
}
return dir;
}
int FileUtil::getDirEntry(DIR * dirp, struct dirent * & prevDirEntry, struct dirent * & dirEntry){
bool success = false;
if (dirp == NULL){
return -1;
}
int returnCode = readdir_r(dirp, prevDirEntry, &dirEntry);
success = (dirEntry == NULL && returnCode == 0) || dirEntry != NULL;
int count = 0;
while(!success){
int fileRetryDelay = BlazingConfig::getInstance()->getFileRetryDelay();
const int sleep_milliseconds = (count+1)*fileRetryDelay;
std::this_thread::sleep_for(std::chrono::milliseconds(sleep_milliseconds));
std::cout<<"Was unable to get dirent with readdir"<<std::endl;
returnCode = readdir_r(dirp, prevDirEntry, &dirEntry);
success = (dirEntry == NULL && returnCode == 0) || dirEntry != NULL;
count++;
if(count > 6){
break;
}
}
if(!success){
// readdir_r reports failure through its return value, not errno
std::cout<<"Can't get dirent with readdir. Error was: "<<returnCode<<std::endl;
}
return returnCode;
}
std::vector<std::string> FileUtil::getFiles(std::string baseFolder){
DIR *dir = getDirStream(baseFolder);
std::vector <std::string> subFolders;
if (dir != NULL) {
struct dirent *prevDirEntry = NULL;
struct dirent *dirEntry = NULL;
int len_entry = offsetof(struct dirent, d_name) + fpathconf(dirfd(dir), _PC_NAME_MAX) + 1;
prevDirEntry = (struct dirent *)malloc(len_entry);
int returnCode = getDirEntry(dir, prevDirEntry, dirEntry);
while (dirEntry != NULL) {
if( dirEntry->d_type == DT_REG || dirEntry->d_type == DT_LNK){
std::string name(dirEntry->d_name);
subFolders.push_back(name);
}
returnCode = getDirEntry(dir, prevDirEntry, dirEntry);
}
free(prevDirEntry);
closedir (dir);
} else {
std::cout<<"Could not open directory err num is"<<errno<<std::endl;
/* could not open directory */
perror ("");
}
return subFolders;
}
The functions were written this way to be as robust as possible: since there can be many threads performing file operations, I wanted the code to retry in case of any failure. Unfortunately, when getFiles() returns the wrong result, it does not give me any indication of failure.
Note: when I use readdir as opposed to readdir_r I still have the same issue.

DEBUG : MP3Player.h error loading a new music path : abort()

I'm using a library called MP3Player.h, which can be found here. I use it to help me write my own MP3 player on Windows in C++.
Everything was fine until today: I have made progress, but when I'm debugging, an abort() error pops up.
Screenshot: error
The error refers to a function called in MP3Player.h:
// Sequence of call to get the MediaType
// WAVEFORMATEX for mp3 can then be extract from MediaType
mp3Assert(wmSyncReader->QueryInterface(&wmProfile));
mp3Assert(wmProfile->GetStream(0, &wmStreamConfig));
mp3Assert(wmStreamConfig->QueryInterface(&wmMediaProperties));
// Retrieve sizeof MediaType
mp3Assert(wmMediaProperties->GetMediaType(NULL, &sizeMediaType));
// Retrieve MediaType
WM_MEDIA_TYPE* mediaType = (WM_MEDIA_TYPE*)LocalAlloc(LPTR, sizeMediaType);
mp3Assert(wmMediaProperties->GetMediaType(mediaType, &sizeMediaType));
// Check that MediaType is audio
assert(mediaType->majortype == WMMEDIATYPE_Audio);
// assert(mediaType->pbFormat == WMFORMAT_WaveFormatEx);
// Check that input is mp3
WAVEFORMATEX* inputFormat = (WAVEFORMATEX*)mediaType->pbFormat;
assert(inputFormat->wFormatTag == WAVE_FORMAT_MPEGLAYER3);
assert(inputFormat->nSamplesPerSec == 44100);
assert(inputFormat->nChannels == 2); // CRASHING HERE
// Release COM interface
// wmSyncReader->Close();
wmMediaProperties->Release();
wmStreamConfig->Release();
wmProfile->Release();
wmHeaderInfo->Release();
wmSyncReader->Release();
// Free allocated mem
LocalFree(mediaType);
And here is the code I'm using in my main:
int main(void){
MP3Player player;
DIR* rep = NULL;
int timeRead = 0;
const size_t concatenated_size = 512;
char concatenated[concatenated_size];
double time;
struct dirent* fichierLu = NULL;
rep = opendir("C:/Users/paul/Music/Playlisty");
if (rep == NULL)
exit(1);
while ((fichierLu = readdir(rep)) != NULL) {
printf("The file read is named '%s'\n", fichierLu->d_name);
// Open the mp3 from a file...
timeRead++;
if (timeRead > 79) {
printf("DEBUG : %s\n", fichierLu->d_name); // never pass a file name as the format string
concatenated[0] = '\0'; // set char array content to null
time = 0;
snprintf(concatenated, concatenated_size, "C:/Users/paul/Music/Playlisty/%s", fichierLu->d_name); // load music path name into char array
player.OpenFromFile(concatenated); // load music from path
player.Play();
time = player.GetDuration(); // get time music
Sleep(time * 1000); // wait music time : Time(s) * 1000 = milli
player.Close(); // Close the player with current music
}
}
if (closedir(rep) == -1)
exit(-1);
system("Pause");
return 0;
}
Thanks for reading my code. Do you think the error comes from my poor coding in main or from the MP3Player.h header, and how can I fix it?
Thanks a lot.
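The assert marked CRASHING HERE fires whenever nChannels is not 2, so a single mono MP3 in the playlist folder would be enough to trigger abort(); the loop also passes every directory entry to OpenFromFile, including "." and "..". A sketch of replacing the hard asserts with a check the caller can handle is below. AudioFormat and checkFormat are hypothetical stand-ins for the WAVEFORMATEX fields the header inspects, not part of MP3Player.h.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the WAVEFORMATEX fields checked by the header.
struct AudioFormat {
    std::uint16_t formatTag;
    std::uint16_t channels;
    std::uint32_t samplesPerSec;
};

// Numeric value of WAVE_FORMAT_MPEGLAYER3.
constexpr std::uint16_t kMp3FormatTag = 0x0055;

// Returns an empty string when the format looks playable,
// otherwise a short reason the caller can log before skipping the file.
std::string checkFormat(const AudioFormat& f) {
    if (f.formatTag != kMp3FormatTag) {
        return "not an MP3 stream";
    }
    if (f.channels != 1 && f.channels != 2) {
        return "unsupported channel count";
    }
    if (f.samplesPerSec == 0) {
        return "invalid sample rate";
    }
    return {};
}
```

With a check like this, the main loop could skip a file and continue to the next one instead of terminating the whole player.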

Using zlib 1.2.7 to uncompress gzip data, how to get the file names in the compressed package

I'm using zlib version 1.2.7 to uncompress gzip data, but I don't know how to get the names of the files in the compressed package, or the name of the one being extracted. The method I found seems to read all the data into a buffer and then return it,
like this:
int gzdecompress(Byte *zdata, uLong nzdata, Byte *data, uLong *ndata)
{
int err = 0;
z_stream d_stream = {0}; /* decompression stream */
static char dummy_head[2] = {
0x8 + 0x7 * 0x10,
(((0x8 + 0x7 * 0x10) * 0x100 + 30) / 31 * 31) & 0xFF,
};
d_stream.zalloc = NULL;
d_stream.zfree = NULL;
d_stream.opaque = NULL;
d_stream.next_in = zdata;
d_stream.avail_in = 0;
d_stream.next_out = data;
// windowBits must be MAX_WBITS + 16 so that inflate accepts data with a gzip header and trailer
if(inflateInit2(&d_stream, MAX_WBITS + 16) != Z_OK) return -1;
while(d_stream.total_out < *ndata && d_stream.total_in < nzdata) {
d_stream.avail_in = d_stream.avail_out = 1; /* force small buffers */
if((err = inflate(&d_stream, Z_NO_FLUSH)) == Z_STREAM_END) break;
if(err != Z_OK) {
if(err == Z_DATA_ERROR) {
d_stream.next_in = (Bytef*) dummy_head;
d_stream.avail_in = sizeof(dummy_head);
if((err = inflate(&d_stream, Z_NO_FLUSH)) != Z_OK) {
return -1;
}
} else return -1;
}
}
if(inflateEnd(&d_stream) != Z_OK) return -1;
*ndata = d_stream.total_out;
return 0;
}
Usage example:
// file you want to extract
const char* filename = "D:\\gzfile";
// read file to buffer
ifstream infile(filename, ios::binary);
if(!infile)
{
cerr<<"open error!"<<endl;
}
int begin = infile.tellg();
int end = begin;
int FileSize = 0;
infile.seekg(0,ios_base::end);
end = infile.tellg();
FileSize = end - begin;
char* buffer_bin = new char[FileSize];
char* buffer_bin2 = new char[FileSize * 2];
infile.seekg(0,ios_base::beg);
// read the whole file in one call instead of byte by byte
infile.read(buffer_bin, FileSize);
infile.close( );
// uncompress
uLong ts = (FileSize * 2);
gzdecompress((Byte*)buffer_bin, FileSize, (Byte*)buffer_bin2, &ts);
The array buffer_bin2 receives the extracted data, and ts holds its length.
The question is: I don't know what the file's name is, or whether there is only one file. How can I get this information?
Your question is not at all clear, but if you are trying to get the file name that is stored in the gzip header, then it would behoove you to read the zlib documentation in zlib.h. In fact that would be a good idea if you plan to use zlib in any capacity.
In the documentation, you will find that the inflate...() functions will decompress gzip data, and that there is an inflateGetHeader() function that will return the gzip header contents.
Note that when gzip decompresses a .gz file, it doesn't even look at the name in the header, unless explicitly asked to. gzip will decompress to the name of the .gz file, e.g. foo.gz becomes foo when decompressed, even if the gzip header says the name is bar. If you use gzip -dN foo.gz, then it will call it bar. It is not clear why you even care what the name in the gzip header is.
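To illustrate what that header name is, here is a sketch that reads the optional FNAME field straight from the raw bytes, following the gzip header layout in RFC 1952. This is a hand-rolled alternative shown only for illustration, not the inflateGetHeader() route recommended above, and gzipStoredName is a hypothetical name. A gzip member holds a single stream, so there is at most one name per member.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Extracts the original file name from a gzip header, if one is stored.
// Returns false when the data is not gzip or carries no FNAME field.
bool gzipStoredName(const std::vector<std::uint8_t>& buf, std::string& name) {
    const std::size_t n = buf.size();
    // Fixed 10-byte header: magic 1f 8b, method 8 (deflate), flags, mtime, xfl, os.
    if (n < 10 || buf[0] != 0x1f || buf[1] != 0x8b || buf[2] != 8) {
        return false;
    }
    const std::uint8_t flags = buf[3];
    std::size_t pos = 10;
    if (flags & 0x04) { // FEXTRA: 2-byte little-endian length, then that many bytes
        if (pos + 2 > n) {
            return false;
        }
        const std::size_t xlen = static_cast<std::size_t>(buf[pos]) |
                                 (static_cast<std::size_t>(buf[pos + 1]) << 8);
        pos += 2 + xlen;
    }
    if (!(flags & 0x08)) { // FNAME not present
        return false;
    }
    std::string result;
    while (pos < n && buf[pos] != 0) {
        result.push_back(static_cast<char>(buf[pos++]));
    }
    if (pos >= n) {
        return false; // header truncated before the terminating zero
    }
    name = result;
    return true;
}
```

Only the first few dozen bytes of the file are needed for this, so it can run before any decompression starts.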

Optimize Disk Enumeration and File Listing C++

I am writing C++ code to enumerate every drive on an HDD and list its contents; however, it takes more than 15 minutes to enumerate all drives (HDD capacity 500 GB) and write the results to a binary file.
However, I have a third-party executable that lists the whole disk in less than two minutes. Can you please look at my code and suggest some performance improvements?
void EnumFiles(CString FolderPath, CString SearchParameter, WIN32_FIND_DATAW *FileInfoData)
{
    CString SearchFile = FolderPath + SearchParameter;
    CString FileName;
    // Keep the handle local so that recursive calls cannot overwrite it
    HANDLE hFile = FindFirstFileW(SearchFile, FileInfoData); // \\?\C:\*
    if (hFile == INVALID_HANDLE_VALUE)
    {
        // Error
    }
    else
    {
        do
        {
            FileName = FileInfoData->cFileName;
            if (FileInfoData->dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
            {
                if (! (FileName == L"." || FileName == L".."))
                {
                    // Save the Folder Information
                    EnumFiles(FolderPath + FileName + (L"\\"), SearchParameter, FileInfoData);
                }
            }
            else
            {
                // Save the File Parameters
            }
        } while (FindNextFileW(hFile, FileInfoData));
        // Close only a handle that was opened successfully
        FindClose(hFile);
    }
}
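One adjustment that may reduce per-entry overhead is FindFirstFileExW with FindExInfoBasic (which skips the short 8.3 name) and FIND_FIRST_EX_LARGE_FETCH (a larger directory-query buffer); both are available from Windows 7 onward. Whether this closes the gap to the third-party tool is not something the question establishes, so treat it as a sketch to measure. The names isDotEntry and countFiles are hypothetical, and the Windows part is guarded so the file still compiles elsewhere.

```cpp
#include <cwchar>
#include <string>

// True for the "." and ".." pseudo-entries that every directory listing contains.
bool isDotEntry(const wchar_t* name) {
    return std::wcscmp(name, L".") == 0 || std::wcscmp(name, L"..") == 0;
}

#ifdef _WIN32
#include <windows.h>

// Counts files under dir; dir must end with a backslash, e.g. L"\\\\?\\C:\\".
unsigned long long countFiles(const std::wstring& dir) {
    WIN32_FIND_DATAW data;
    HANDLE h = FindFirstFileExW((dir + L"*").c_str(),
                                FindExInfoBasic,          // do not fetch 8.3 names
                                &data,
                                FindExSearchNameMatch,
                                nullptr,
                                FIND_FIRST_EX_LARGE_FETCH); // larger query buffer
    if (h == INVALID_HANDLE_VALUE) {
        return 0;
    }
    unsigned long long count = 0;
    do {
        if (data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
            // Skip reparse points (junctions, symlinks) to avoid cycles.
            if (!isDotEntry(data.cFileName) &&
                !(data.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT)) {
                count += countFiles(dir + data.cFileName + L"\\");
            }
        } else {
            ++count;
        }
    } while (FindNextFileW(h, &data));
    FindClose(h);
    return count;
}
#endif
```

Timing the enumeration with the "Save the File Parameters" step disabled would also show how much of the 15 minutes is spent on the listing itself and how much on writing the binary file.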