Half of read buffer is corrupt when using ReadFile - c++

Half of the buffer used with ReadFile is corrupt. Regardless of the buffer size, half of it contains the same corrupted character. I have looked for anything that could be causing the read to stop early, etc. If I increase the size of the buffer, I see more of the file, so it is not failing on a particular part of the file.
Visual Studio 2019. Windows 10.
#define MAXBUFFERSIZE 1024
DWORD bufferSize = MAXBUFFERSIZE;
_int64 fileRemaining;
HANDLE hFile;
DWORD dwBytesRead = 0;
//OVERLAPPED ol = { 0 };
LARGE_INTEGER dwPosition;
TCHAR* buffer;
hFile = CreateFile(
    inputFilePath,          // file to open
    GENERIC_READ,           // open for reading
    FILE_SHARE_READ,        // share for reading
    NULL,                   // default security
    OPEN_EXISTING,          // existing file only
    FILE_ATTRIBUTE_NORMAL,  // normal file | FILE_FLAG_OVERLAPPED
    NULL);                  // no attr. template
if (hFile == INVALID_HANDLE_VALUE)
{
    DisplayErrorBox((LPWSTR)L"CreateFile");
    return 0;
}
LARGE_INTEGER size;
GetFileSizeEx(hFile, &size);
_int64 fileSize = (__int64)size.QuadPart;
double gigabytes = fileSize * 9.3132e-10;
sendToReportWindow(L"file size: %lld bytes \(%.1f gigabytes\)\n", fileSize, gigabytes);
if (fileSize > MAXBUFFERSIZE)
{
    buffer = new TCHAR[MAXBUFFERSIZE];
}
else
{
    buffer = new TCHAR[fileSize];
}
fileRemaining = fileSize;
sendToReportWindow(L"file remaining: %lld bytes\n", fileRemaining);
while (fileRemaining) // outer loop. while file remaining, read file chunk to buffer
{
    sendToReportWindow(L"fileRemaining:%d\n", fileRemaining);
    if (bufferSize > fileRemaining) // as fileRemaining gets smaller as the file is processed, it eventually becomes smaller than the buffer
        bufferSize = fileRemaining;
    if (FALSE == ReadFile(hFile, buffer, bufferSize, &dwBytesRead, NULL))
    {
        sendToReportWindow(L"file read failed\n");
        CloseHandle(hFile);
        return 0;
    }
    fileRemaining -= bufferSize;
    // bunch of commented out code (verified that it does not cause the corruption)
}
delete [] buffer;
Debugger HTML view (512-byte buffer)
Debugger HTML view (1024-byte buffer). This shows that the file itself is probably not the source of the corruption.
Misc. notes: I have been told that memory-mapping the file does not provide an advantage, since I am processing the file sequentially. Another advantage of the current approach is that when I detect particular recurring tags in the WARC file I can skip ahead ~500 bytes and resume processing, which improves speed.

The reason is that you use a buffer array of type TCHAR, and in a Unicode build a TCHAR is 2 bytes. ReadFile works in bytes, so the bufferSize bytes you request fill only half of the TCHAR elements in the array.
The buffer itself occupies sizeof(TCHAR) * MAXBUFFERSIZE bytes, so the second half of the array is never written by ReadFile and still holds whatever was in that memory before, which is why half of the buffer looks "corrupted". Reading into a buffer of bytes (BYTE or char) avoids the mismatch.
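A minimal sketch of the same loop with a byte buffer, so that each byte ReadFile delivers fills exactly one array element (hFile, fileSize, MAXBUFFERSIZE and sendToReportWindow are taken from the question):
// Read the file in MAXBUFFERSIZE-byte chunks of raw bytes; ReadFile counts
// bytes, so a BYTE buffer avoids the 2-bytes-per-element TCHAR mismatch.
BYTE* byteBuffer = new BYTE[MAXBUFFERSIZE];
__int64 remaining = fileSize;
while (remaining > 0)
{
    DWORD toRead = (remaining < MAXBUFFERSIZE) ? (DWORD)remaining : MAXBUFFERSIZE;
    DWORD got = 0;
    if (!ReadFile(hFile, byteBuffer, toRead, &got, NULL) || got == 0)
    {
        sendToReportWindow(L"file read failed\n");
        break;
    }
    // ... process 'got' bytes of byteBuffer here ...
    remaining -= got;
}
delete[] byteBuffer;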

Related

ReadFile buffer output is weird (prints content + some more)

I am trying to open a file and read its content using the Win32 API:
HANDLE hFileRead = CreateFileA(FilePath,
    GENERIC_READ,
    0,
    NULL,
    OPEN_EXISTING,
    FILE_ATTRIBUTE_NORMAL,
    NULL);
LARGE_INTEGER fileSize = { 0 };
DWORD cbFileSize = GetFileSizeEx(hFileRead, &fileSize);
PBYTE buffer = (PBYTE)HeapAlloc(GetProcessHeap(), 0, fileSize.QuadPart);
DWORD dwBytesRead = 0;
NTSTATUS s = ReadFile(hFileRead,
    buffer,
    fileSize.QuadPart,
    &dwBytesRead,
    NULL);
std::cout << buffer << "\n"; // <<< expect to print "asdasd" but prints "asdasd"+random chars (1 or more each run)
What I want to get is the file content (.txt in this case).
What I get is the content of the .txt file plus some extra random characters (different on each run).
I tried printing the buffer element by element; it seems to print more than its size (?)
What am I doing wrong?
std::cout << buffer expects buffer to be null-terminated, but it is not. You need to allocate space for the terminator, eg:
PBYTE buffer = (PBYTE)HeapAlloc(GetProcessHeap(), 0, fileSize.QuadPart + 1);
...
buffer[dwBytesRead] = 0;
Alternatively, you can use cout.write() instead, then you don't need a terminator (the cast is needed because buffer is a PBYTE and write() takes const char*), eg:
std::cout.write(reinterpret_cast<char*>(buffer), dwBytesRead);
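Putting the two fixes together, a minimal sketch under the question's setup (assuming the file size fits in a DWORD; error handling omitted):
// Allocate one extra byte for the terminator, read, then terminate at the
// number of bytes actually read before printing.
PBYTE buffer = (PBYTE)HeapAlloc(GetProcessHeap(), 0, fileSize.QuadPart + 1);
DWORD dwBytesRead = 0;
if (ReadFile(hFileRead, buffer, (DWORD)fileSize.QuadPart, &dwBytesRead, NULL))
{
    buffer[dwBytesRead] = 0;
    std::cout.write(reinterpret_cast<char*>(buffer), dwBytesRead);
    std::cout << "\n";
}
HeapFree(GetProcessHeap(), 0, buffer);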

Writing zeroes to file block does not work

So I'm trying to write a sequence of zeroes from a file offset until the end of the file, here is my code:
HANDLE hFile = CreateFileA((LPCSTR)"hello.txt", GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if(hFile < 0) return -1;
DWORD fileSize = GetFileSize(hFile, NULL);
DWORD offset = 0x13d4;
DWORD check = NULL;
DWORD pos = SetFilePointer(hFile, offset, 0, FILE_BEGIN);
BYTE* zeroes = new BYTE[fileSize-offset];
ZeroMemory((PVOID)zeroes, fileSize-offset);
WriteFile(hFile, (PVOID)&zeroes, fileSize-offset, &check, NULL);
printf("Wrote %d bytes at %x\n", check, pos);
if (check < fileSize - offset)
{
    printf("[+] An error occured while trying to patch the file.");
    return EXIT_FAILURE;
}
CloseHandle(hFile);
Now, I checked that my fileSize is correct, the file offset (pos) is the same as offset, my file handle is valid, the number of bytes written stored in check is equal to the length of the zeroes buffer, and the last error is 0. However, when I check my file in hex mode, no zeroes were added at the end.
Any ideas?
Thanks in advance
The line
WriteFile(hFile, (PVOID)&zeroes, fileSize-offset, &check, NULL);
is wrong. You are writing the bytes of the pointer variable zeroes itself, not the memory it points to. A pointer is only 4 or 8 bytes, so this may also cause an out-of-range access if the file is large enough.
Remove the & before zeroes so that WriteFile writes the contents of the buffer that zeroes points to:
WriteFile(hFile, (PVOID)zeroes, fileSize-offset, &check, NULL);
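For completeness, a minimal sketch of the corrected patching step, reusing the question's names and adding the bytes-written check and cleanup:
DWORD toWrite = fileSize - offset;
BYTE* zeroes = new BYTE[toWrite];
ZeroMemory(zeroes, toWrite);
DWORD check = 0;
// Pass the buffer pointer itself (zeroes), not its address (&zeroes).
if (!WriteFile(hFile, zeroes, toWrite, &check, NULL) || check != toWrite)
    printf("[+] An error occurred while trying to patch the file.\n");
delete[] zeroes;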

After call to ReadFile program hits breakpoint in debug_heap.cpp

This function should read a string from a file and return it, but immediately after the call to ReadFile the program hits a breakpoint in debug_heap.cpp at line 985.
char* readFile()
{
    char curDirectory[MAX_PATH];
    GetCurrentDirectory(MAX_PATH, curDirectory);
    char filePath[MAX_PATH];
    char *name = "\\data.txt";
    sprintf_s(filePath, "%s%s", curDirectory, name);
    HANDLE hFile = CreateFile(filePath, GENERIC_ALL, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        DisplayError("Can't Create File");
        return NULL;
    }
    DWORD fileSize = GetFileSize(hFile, NULL);
    char *buffer = new char[fileSize / 2 + 1];
    DWORD bytesReaded;
    if (ReadFile(hFile, buffer, fileSize, &bytesReaded, NULL) == 0)
    {
        DisplayError("Can't read File");
        return NULL;
    }
    buffer[bytesReaded] = '\0';
    CloseHandle(hFile);
    return buffer;
}
This is because your code writes beyond the end of buffer. You allocate buffer like this:
char *buffer = new char[fileSize / 2 + 1];
But then you attempt to read fileSize bytes from the file. Your allocation should instead be:
char *buffer = new char[fileSize + 1];
Some other comments:
Your call to sprintf_s risks a buffer overrun.
Since you are coding in C++, use std::string and let that class manage the buffers. Do that for both filePath and buffer. It avoids the leaks in your current code (for instance, the failure return after ReadFile leaks the buffer) and removes the burden on the calling code to deallocate the memory.
You also leak the file handle if your code takes the failure return after ReadFile.
bytesReaded should be named bytesRead, to use the correct English word.
There is no real reason to believe that the executable file is located in the current working directory.
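As a rough sketch of the std::string approach the answer suggests (DisplayError is the question's helper; the ANSI ...A API variants are assumed so the char-based paths also work in a Unicode build):
#include <windows.h>
#include <string>

std::string readFile()
{
    char curDirectory[MAX_PATH];
    GetCurrentDirectoryA(MAX_PATH, curDirectory);
    std::string filePath = std::string(curDirectory) + "\\data.txt";

    HANDLE hFile = CreateFileA(filePath.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        DisplayError("Can't open file");       // helper from the question
        return std::string();
    }

    DWORD fileSize = GetFileSize(hFile, NULL);
    std::string buffer(fileSize, '\0');
    DWORD bytesRead = 0;
    if (!ReadFile(hFile, &buffer[0], fileSize, &bytesRead, NULL))
    {
        DisplayError("Can't read file");
        CloseHandle(hFile);                    // no handle leak on the failure path
        return std::string();
    }
    CloseHandle(hFile);
    buffer.resize(bytesRead);                  // keep only the bytes actually read
    return buffer;
}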

FILE_FLAG_NO_BUFFERING with overlapped I/O - bytes read zero

I observe a weird behavior while using the flag FILE_FLAG_NO_BUFFERING with overlapped I/O.
I invoke a series of ReadFile() function calls and query their statuses later using GetOverlappedResult().
The weird behavior is that even though the file handles are good and the ReadFile() calls return without any error other than the expected ERROR_IO_PENDING, the 'bytes read' value returned by GetOverlappedResult() is zero for some of the files, and it is a different set of files each time I run the code.
If I remove FILE_FLAG_NO_BUFFERING, things work properly and no 'bytes read' value is zero.
Here is how I have implemented overlapped I/O code with FILE_FLAG_NO_BUFFERING.
long overlappedIO(std::vector<std::string> &filePathNameVectorRef)
{
    long totalBytesRead = 0;
    DWORD bytesRead = 0;
    DWORD bytesToRead = 0;
    std::map<HANDLE, OVERLAPPED> handleMap;
    HANDLE handle = INVALID_HANDLE_VALUE;
    DWORD accessMode = GENERIC_READ;
    DWORD shareMode = 0;
    DWORD createDisposition = OPEN_EXISTING;
    DWORD flags = FILE_FLAG_OVERLAPPED | FILE_FLAG_NO_BUFFERING;
    DWORD fileSize;
    LARGE_INTEGER li;
    char * buffer;
    BOOL success = false;
    for (unsigned int i = 0; i < filePathNameVectorRef.size(); i++)
    {
        const char* filePathName = filePathNameVectorRef[i].c_str();
        handle = CreateFile(filePathName, accessMode, shareMode, NULL, createDisposition, flags, NULL);
        if (handle == INVALID_HANDLE_VALUE) {
            fprintf(stdout, "\n Error occured: %d", GetLastError());
            fprintf(stdout, " getting handle: %s", filePathName);
            continue;
        }
        GetFileSizeEx(handle, &li);
        fileSize = (DWORD)li.QuadPart;
        bytesToRead = (fileSize / g_bytesPerPhysicalSector) * g_bytesPerPhysicalSector;
        buffer = static_cast<char *>(VirtualAlloc(0, bytesToRead, MEM_COMMIT, PAGE_READWRITE));
        OVERLAPPED overlapped;
        ZeroMemory(&overlapped, sizeof(overlapped));
        OVERLAPPED * lpOverlapped = &overlapped;
        success = ReadFile(handle, buffer, bytesToRead, &bytesRead, lpOverlapped);
        if (!success && GetLastError() != ERROR_IO_PENDING) {
            fprintf(stdout, "\n Error occured: %d", GetLastError());
            fprintf(stdout, "\n reading file %s", filePathName);
            CloseHandle(handle);
            continue;
        }
        else
            handleMap[handle] = overlapped;
    }
    // Status check and bytes Read value
    for (std::map<HANDLE, OVERLAPPED>::iterator iter = handleMap.begin(); iter != handleMap.end(); iter++)
    {
        HANDLE handle = iter->first;
        OVERLAPPED * overlappedPtr = &(iter->second);
        success = GetOverlappedResult(handle, overlappedPtr, &bytesRead, TRUE);
        if (success)
        {
            /* bytesRead value in some cases is unexpectedly zero */
            /* no file is of size zero or lesser than 512 bytes(physical volume sector size) */
            totalBytesRead += bytesRead;
            CloseHandle(handle);
        }
    }
    return totalBytesRead;
}
With FILE_FLAG_NO_BUFFERING absent, totalBytesRead value is 57 MB. With the flag present, totalBytesRead value is much lower than 57 MB and keeps changing each time I run the code ranging from 2 MB to 15 MB.
Your calculation of bytesToRead rounds the file size down to a multiple of g_bytesPerPhysicalSector. It produces 0 when a file is smaller than one sector, so for small files you are requesting 0 bytes, and for larger files it silently drops the final partial sector.
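The usual fix, sketched below with the question's variables (g_bytesPerPhysicalSector is the volume sector size), is to round the request up to a sector multiple instead of down; FILE_FLAG_NO_BUFFERING requires the request length to be a sector multiple, and a read that extends past end-of-file completes with the smaller number of bytes actually available:
// Round the file size UP to the next sector multiple so that small files and
// the final partial sector are still requested. VirtualAlloc memory is page
// aligned, which satisfies the buffer-alignment rule of FILE_FLAG_NO_BUFFERING.
DWORD sector = g_bytesPerPhysicalSector;
bytesToRead = ((fileSize + sector - 1) / sector) * sector;
buffer = static_cast<char *>(VirtualAlloc(0, bytesToRead, MEM_COMMIT, PAGE_READWRITE));
// ReadFile(handle, buffer, bytesToRead, &bytesRead, lpOverlapped) as before;
// GetOverlappedResult later reports the true byte count, capped at end of file.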

Reading a Text File w/ WIN32

I'm trying to parse a text file in a Win32 program in C++. Is there a simple method of reading a text file line by line? My text file consists of strings that I would like to store in a char array (const char* cArray[67]). Here is what I have so far, using CreateFile and ReadFile. I get an access violation error (0x000003e6) from ReadFile:
CDECK::CDECK() : filename(".\\Deck/list.txt")
{
    LPVOID data = NULL;
    hFile = CreateFileA(filename, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        MessageBox(NULL, L"Failed to CreateFile - 'hFile'", L"CDECK::CDECK()", MB_OK);
    DWORD fileSize = GetFileSize(hFile, &fileSize);
    DWORD read = -1;
    if (!ReadFile(hFile, data, fileSize, &read, NULL))
    {
        DWORD err = GetLastError();
        MessageBox(NULL, L"Failed to ReadFile - 'hFile'", L"CDECK::CDECK()", MB_OK);
    }
    return;
}
Is there a simple method of reading a text file line by line?
Yes:
{
    std::ifstream hFile(filename);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(hFile, line))
        lines.push_back(line);
    return lines;
}
Consider this code:
LPVOID data = NULL;
if(!ReadFile(hFile, data, fileSize, &read, NULL))
Here data is null, and the following argument is the size of the entire file. You are supposed to allocate a buffer and then pass a pointer to that buffer, along with its size. That is where ReadFile will write the bytes it reads.
Here is a simple way of getting it to work with a statically sized buffer:
char data[4096] = {};
if(!ReadFile(hFile, static_cast< LPVOID >( &data ), 4096, &read, NULL))
Your problem is that you are reading the raw bytes of the file. To read it as a string, you need to allocate a string buffer, for example with SysAllocStringByteLen, and then call ReadFile.
You forgot to allocate buffer space before reading your data:
LPVOID data = NULL;
Before reading, you must allocate a buffer of fileSize bytes:
data = malloc(fileSize);
You should probably also declare your data variable as char* instead of void*.
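Putting those pieces together, a rough sketch of the Win32 path under the question's setup (filename and the member hFile as in the constructor; std::vector needs <vector>; splitting into lines is left out):
hFile = CreateFileA(filename, GENERIC_READ, FILE_SHARE_READ, NULL,
                    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hFile == INVALID_HANDLE_VALUE)
    return;

DWORD fileSize = GetFileSize(hFile, NULL);     // second argument is the HIGH dword pointer, not needed here
std::vector<char> data(fileSize + 1, '\0');    // +1 so the contents can be used as a C string
DWORD read = 0;
if (ReadFile(hFile, data.data(), fileSize, &read, NULL))
{
    data[read] = '\0';
    // split data.data() on newlines into cArray / a std::vector<std::string> here ...
}
CloseHandle(hFile);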