How to fix garbled text when using ReadFile? - c++

I'm writing a Win32 application.
It uses ReadFile to read a text file that is written in Unicode.
The contents are then displayed in an edit box.
const TCHAR FILE_DIRECTORY[] = TEXT("data/");
const TCHAR FILE_LIST[][MAX_LOADSTRING] = {
TEXT("fputs_fgets.h"), TEXT("fprintf_fscanf.h"),
TEXT("fprintfs_fscanfs.h"), TEXT("fread_fwrite.h"), TEXT("freads_fwrite.h") };
const int FILE_NAME_LENGTH = _tcslen(FILE_LIST[idx]);
const int FILE_DIRECTORY_LENGTH = _tcslen(FILE_DIRECTORY);
TCHAR* filePath = (TCHAR*)calloc(FILE_NAME_LENGTH + FILE_DIRECTORY_LENGTH + 1, sizeof(TCHAR));
_tcscpy_s(filePath, FILE_DIRECTORY_LENGTH + 1, FILE_DIRECTORY);
_tcscat_s(filePath, FILE_NAME_LENGTH + FILE_DIRECTORY_LENGTH + 1, FILE_LIST[idx]);
HANDLE file = CreateFile(filePath, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD fileSize = GetFileSize(file, NULL);
DWORD dwRead;
if (editText != NULL)
free(editText);
editText = (TCHAR*)calloc(1, fileSize + 1);
ReadFile(file, editText, fileSize, &dwRead, NULL);
CloseHandle(file);
free(filePath);
However, some strange characters appear at the end of the output:
printf("y좌표(정수): %d\n", point.y);
}
fclose(file);
}ﴀ﷽ý
How can I fix it?
Thank you.

Assuming your file is UTF-16 and you are compiling with _UNICODE defined (assumptions justified by the fact that the rest of your text is read correctly), in this line:
editText = (TCHAR*)calloc(1, fileSize + 1);
you should actually allocate fileSize + sizeof(TCHAR) bytes if you want to rely on the zeroing that calloc does to get a NUL-terminated string. As it is now, you have a wide string whose terminating character has only its first (low) byte zeroed; the second byte lies outside the allocation, so the rest of your code keeps reading garbage until it happens to find two consecutive zero bytes (suitably aligned).
Mind you, I'm extremely dubious about this code in general: using TCHAR means you want to compile both as ANSI (TCHAR == char) and as Unicode (TCHAR == wchar_t), and letting that build setting change how you interpret the bytes of external files is a questionable idea.
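To make the sizing rule concrete, here is a minimal sketch written portably, so it does not depend on windows.h. It uses char16_t as a stand-in for a 2-byte TCHAR, and the helper name toTerminatedUtf16 is hypothetical. It assumes the bytes are UTF-16 in the machine's native byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies raw UTF-16 bytes (native byte order assumed)
// into a buffer that ends with a complete 2-byte terminator.
std::vector<char16_t> toTerminatedUtf16(const std::vector<unsigned char>& bytes)
{
    // Whole code units only; a trailing odd byte is ignored.
    const std::size_t units = bytes.size() / sizeof(char16_t);

    // One extra element, value-initialized to 0, acts as the terminator.
    // In the calloc version this corresponds to fileSize + sizeof(TCHAR).
    std::vector<char16_t> text(units + 1);
    if (units > 0)
        std::memcpy(text.data(), bytes.data(), units * sizeof(char16_t));

    assert(text.back() == 0);   // the terminator is a full code unit
    return text;
}
```

The point is that the terminator is counted in characters, not bytes, so both of its bytes are inside the allocation.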

Related

Windows API ReadFile() skips one out of every two characters

My aim is to read all the text located in a file. For some reason whenever I read from the file and print the result (drawText), the buffer seems to be skipping one character every two positions. HELLO will become HLO and SCAVENGER becomes SAEGR.
This is for Windows API. I wonder if CreateFile() and ReadFile() are just fine and whether it's something else causing the issue.
void init(HDC hdc)
{
HANDLE hFile;
LPCSTR fileName = "c:\\Users\\kanaa\\Desktop\\code\\HW2_StarterCode\\words.txt";
hFile = CreateFileA(fileName, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD dwFileSize = GetFileSize(hFile, NULL);
DWORD dwBytesRead;
WCHAR* buffer = new WCHAR[dwFileSize / 2 + 1];
buffer[dwFileSize / 2] = 0;
bool read = ReadFile(hFile, buffer, dwFileSize, &dwBytesRead, NULL);
std::wstring wstr(buffer);
std::string str(wstr.begin(), wstr.end());
delete[] buffer;
CloseHandle(hFile);
if (read) parse(str, hdc);
}
void parse(std::string word, HDC hdc)
{
std::string to = word;
std::wstring wword = std::wstring(to.begin(), to.end());
const WCHAR* wcword = wword.c_str();
Graphics graphics(hdc);
drawText(&graphics, wcword);
}
The problem was the WCHAR buffer. Below are the corrections:
CHAR* buffer = new CHAR[dwFileSize/sizeof(char) + 1];
bool read = ReadFile(hFile, buffer, dwFileSize, &dwBytesRead, NULL);
buffer[dwBytesRead] = 0;
You are processing the file data using a wchar_t[] buffer. wchar_t is 2 bytes in size on Windows. So, in the statement:
std::string str(wstr.begin(), wstr.end());
You are iterating through the file data 2 bytes at a time, interpreting each byte pair as a single wchar_t that gets truncated to a 1-byte char, discarding the other byte. That is why your str ends up skipping every other character.
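The effect can be reproduced without any Windows API. The following is only a sketch: narrowPairs is a hypothetical helper, and it assumes little-endian 16-bit units with a zero byte padding an odd-length tail.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper reproducing the symptom: treat 8-bit text as 16-bit
// little-endian units, then narrow each unit back to a single char.
std::string narrowPairs(const std::string& bytes)
{
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        const unsigned lo = static_cast<unsigned char>(bytes[i]);
        const unsigned hi = (i + 1 < bytes.size())
                                ? static_cast<unsigned char>(bytes[i + 1])
                                : 0u;   // assumption: a zero byte pads an odd tail
        const char16_t unit = static_cast<char16_t>(lo | (hi << 8));
        // Narrowing keeps only the low byte, so the second character is lost.
        out.push_back(static_cast<char>(unit & 0xFF));
    }
    return out;
}

// Usage: the same inputs as in the question.
void demoNarrowPairs()
{
    assert(narrowPairs("HELLO") == "HLO");
    assert(narrowPairs("SCAVENGER") == "SAEGR");
}
```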
Process the file data using a char[] buffer instead. However, there are easier ways to read 7/8-bit file data into a std::string.
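One such easier way, sketched with the standard library instead of CreateFile/ReadFile. The helper name readFileBytes is hypothetical, and returning an empty string on failure is a simplification for the sketch.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads every byte of the file at `path` into a
// std::string. Binary mode prevents newline translation.
std::string readFileBytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        return std::string();   // sketch only: an open failure yields ""
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

The string tracks its own length, so there is no manual terminator and no size arithmetic to get wrong.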
Lastly, in this statement:
std::wstring wword = std::wstring(to.begin(), to.end());
This is not the correct way to convert a std::string to a std::wstring. All you are doing is iterating through the chars and converting each one as-is into a 2-byte wchar_t. Windows APIs expect wchar_t strings to be encoded in UTF-16, which your code does not produce. You need to use MultiByteToWideChar(), std::wstring_convert (deprecated since C++17), or an equivalent Unicode library call to perform that conversion. In either case, you first need to know the encoding of the source file in order to convert it to Unicode correctly.

MessageBox printing extra unicode characters from TCHAR buffer

I'm in the middle of trying to get a named pipe IPC project working so that my C# GUI can communicate with my C++ code. I should mention that I'm a fair bit out of my depth on the C++ side at the moment, although for the most part I have it working.
I cannot for the life of me figure out how to convert the chReply buffer received by the C++ listener into a simple string and show it in a MessageBox; I always get extra Unicode characters. I have included what I think are the most important parts.
C# Pipe Write
byte[] bReply = Encoding.Unicode.GetBytes("#TEST 123 456");
uint cbBytesWritten;
uint cbReplyBytes = (uint)bReply.Length;
bool bResult = PipeNative.WriteFile(hPipe, bReply, cbReplyBytes, out cbBytesWritten, IntPtr.Zero);
C++ Pipe Read
// Project's Character Set: Unicode
// BUFFER_SIZE = 1024
TCHAR chRequest[BUFFER_SIZE];
DWORD cbBytesWritten, cbRequestBytes;
TCHAR chReply[BUFFER_SIZE];
DWORD cbBytesRead, cbReplyBytes;
cbReplyBytes = sizeof(TCHAR) * BUFFER_SIZE;
do
{
bResult = ReadFile(hPipe, chReply, cbReplyBytes, &cbBytesRead, NULL);
}
while(!bResult);
MessageBox(NULL, chReply, _T("GUI Request"), MB_OK);
If somebody could save me from drowning I would be extremely grateful.
You have a few problems. The first is that you read and discard data. The second is that you don't pay attention to where the received data ends in the buffer.
// Project's Character Set: Unicode
// BUFFER_SIZE = 1024
TCHAR chRequest[BUFFER_SIZE];
DWORD cbBytesWritten, cbRequestBytes;
std::basic_string<TCHAR> result;
do {
    TCHAR chReply[BUFFER_SIZE];
    DWORD cbBytesRead = 0;
    bResult = ReadFile(hPipe, chReply, sizeof(chReply), &cbBytesRead, NULL);
    if (bResult)
        // cbBytesRead counts bytes, so convert it to a TCHAR count
        result.insert( result.end(), chReply, chReply + cbBytesRead / sizeof(TCHAR) );
}
while(!bResult);
// c_str() is guaranteed to return a null-terminated string
MessageBox(NULL, result.c_str(), _T("GUI Request"), MB_OK);
Here we copy the received characters into a basic_string<TCHAR>. It handles null termination automatically and permits long messages to be passed.
We read them up to 1024 characters at a time.
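The byte-count to character-count conversion is the step that is easy to get wrong, so here it is in isolation. This is a portable sketch: char16_t stands in for a 2-byte TCHAR, and appendChunk is a hypothetical helper, not part of any API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends one received chunk to `out`. `bytesRead` is a
// byte count, as reported by ReadFile, so it is converted to a character count.
void appendChunk(std::u16string& out, const char16_t* chunk, std::size_t bytesRead)
{
    const std::size_t chars = bytesRead / sizeof(char16_t);
    out.append(chunk, chars);   // copies exactly `chars` units; stale data is ignored
}

// Usage: only the first five characters of the buffer were "received".
void demoAppendChunk()
{
    const char16_t buffer[8] = {u'#', u'T', u'E', u'S', u'T', u'?', u'?', u'?'};
    std::u16string message;
    appendChunk(message, buffer, 5 * sizeof(char16_t));
    assert(message == u"#TEST");
}
```

Because only the received characters are copied, whatever was left in the rest of the buffer never reaches the string.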
VTT is right. You need to fill chReply with zeroes before each call to ReadFile, or terminate it yourself at index cbBytesRead / sizeof(TCHAR) after the call.

After call to ReadFile program hits breakpoint in debug_heap.cpp

This function should read a string from a file and return it, but immediately after the call to ReadFile the program hits a breakpoint in debug_heap.cpp at line 985.
char* readFile()
{
char curDirectory[MAX_PATH];
GetCurrentDirectory(MAX_PATH, curDirectory);
char filePath[MAX_PATH];
char *name = "\\data.txt";
sprintf_s(filePath, "%s%s", curDirectory, name);
HANDLE hFile = CreateFile(filePath, GENERIC_ALL, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hFile == INVALID_HANDLE_VALUE)
{
DisplayError("Can't Create File");
return NULL;
}
DWORD fileSize = GetFileSize(hFile, NULL);
char *buffer = new char[fileSize / 2 + 1];
DWORD bytesReaded;
if (ReadFile(hFile, buffer, fileSize, &bytesReaded, NULL) == 0)
{
DisplayError("Can't read File");
return NULL;
}
buffer[bytesReaded] = '\0';
CloseHandle(hFile);
return buffer;
}
This is because your code writes beyond the end of buffer. You allocate buffer like this:
char *buffer = new char[fileSize / 2 + 1];
But then you attempt to read fileSize bytes from the file. Your allocation should instead be:
char *buffer = new char[fileSize + 1];
Some other comments:
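Your call to sprintf_s fails if the combined path does not fit in filePath: the secure CRT then invokes the invalid parameter handler rather than truncating.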
Since you code in C++, use std::string and have that class manage buffers. You should do that for both filePath and buffer. That will allow you to avoid the leaks that your current code has. For instance, the failure return after ReadFile leaks memory. And it avoids placing a burden on the calling code to deallocate the memory.
You also leak the file handle if your code takes the failure return after ReadFile.
bytesReaded should be named bytesRead, to use the correct English word.
There is no real reason to believe that the executable file is located in the current working directory.
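The points about buffer size and leaks can be sketched together. This version is only an illustration under stated assumptions: it uses the C standard library instead of CreateFile/ReadFile so that it is portable, and FileCloser, FilePtr and readWholeFile are hypothetical names. The same pattern applies to a HANDLE with CloseHandle as the deleter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <string>

// Closes the file on every return path, including the failure returns.
struct FileCloser {
    void operator()(std::FILE* f) const { if (f) std::fclose(f); }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Hypothetical helper: reads the whole file into `out`, which owns the memory.
bool readWholeFile(const std::string& path, std::string& out)
{
    FilePtr file(std::fopen(path.c_str(), "rb"));
    if (!file) return false;

    if (std::fseek(file.get(), 0, SEEK_END) != 0) return false;
    const long size = std::ftell(file.get());
    if (size < 0) return false;
    std::rewind(file.get());

    out.resize(static_cast<std::size_t>(size));   // full size, not size / 2
    const std::size_t got =
        size > 0 ? std::fread(&out[0], 1, out.size(), file.get()) : 0;
    out.resize(got);                              // keep only what was read
    return got == static_cast<std::size_t>(size);
}
```

The caller has nothing to delete, and no early return can leak the file or the buffer.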

GetWindowTextLength returns 0 when text is in the edit control

I'm writing a text editor and am having some issues with saving files as utf-8.
I have a function that reads text from a rich edit control and writes it to a file
using a flag passed to the function which depends on a user setting. It can be utf-16, ascii, or utf-8. Both utf-16 and ascii file writing segments work fine and produce valid files.
The problem lies in that in the following block of code the call to GetWindowTextLength always returns 0. So, the result is that nothing ever gets retrieved from the window or written to the file.
HANDLE hFile;
if ((hFile = CreateFile (pstrFileName, GENERIC_WRITE, 0,
NULL, CREATE_ALWAYS, 0, NULL)) == INVALID_HANDLE_VALUE) {
return FALSE;
}
int iLength = 0;
DWORD dwBytesWritten = 0;
switch (encoding) {
/*other text encoding cases*/
case ID_SETTINGS_UTF_8: {
try {
iLength = GetWindowTextLength(hwndEdit); //returns 0
unique_ptr<wchar_t> wide_buf(new wchar_t[iLength + 1]);
GetWindowTextW(hwndEdit, wide_buf.get(), iLength + 1);
int bytes_needed = WideCharToMultiByte(CP_UTF8, WC_COMPOSITECHECK |
WC_DEFAULTCHAR | WC_NO_BEST_FIT_CHARS, wide_buf.get(), -1,
NULL, 0, NULL, NULL);
unique_ptr<char> utf8_buf(new char[bytes_needed]);
WideCharToMultiByte(CP_UTF8, WC_COMPOSITECHECK |
WC_DEFAULTCHAR | WC_NO_BEST_FIT_CHARS, wide_buf.get(), -1,
utf8_buf.get(), bytes_needed, NULL, NULL);
WriteFile(hFile, utf8_buf.get(), bytes_needed,
&dwBytesWritten, NULL);
if (bytes_needed != dwBytesWritten) {
CloseHandle (hFile);
return FALSE;
}
CloseHandle (hFile) ;
return TRUE;
} catch (bad_alloc& ba) {
UNREFERENCED_PARAMETER(ba);
CloseHandle (hFile);
return FALSE;
}
}
break;
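You corrupt the heap. new[] must be matched by delete[], not delete. unique_ptr<wchar_t> releases its pointer with delete; only the array form, unique_ptr<wchar_t[]>, uses delete[].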
It's simpler to just use std::vector:
std::vector<wchar_t> wide_buf(iLength + 1);
//...
std::vector<char> utf8_buf(bytes_needed);
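For comparison, a small sketch of the two correct ways to own such a buffer. The function names are hypothetical, and `length` plays the role of iLength from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Array form: the wchar_t[] specialization releases the memory with delete[].
std::unique_ptr<wchar_t[]> makeArrayBuffer(std::size_t length)
{
    return std::unique_ptr<wchar_t[]>(new wchar_t[length + 1]());  // () zero-fills
}

// Vector form: same kind of storage, and it also remembers its own size.
std::vector<wchar_t> makeVectorBuffer(std::size_t length)
{
    return std::vector<wchar_t>(length + 1);   // elements start as L'\0'
}

// Usage: both buffers have room for `length` characters plus a terminator.
void demoBuffers()
{
    std::unique_ptr<wchar_t[]> arr = makeArrayBuffer(4);
    std::vector<wchar_t> vec = makeVectorBuffer(4);
    assert(arr[4] == L'\0');
    assert(vec.size() == 5 && vec.back() == L'\0');
}
```

With the vector, vec.data() can be passed wherever the question passes wide_buf.get().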
Is your application compiled as UNICODE or ANSI? (You use both the GetWindowTextLength and the GetWindowTextW declarations.)
Can you show the code for the ANSI and UTF-16 cases, where you get the proper result?

WinAPI File In-/Output with std::strings instead of char arrays?

For performance reasons I didn't want to use fstream just this once. It seems to be a very bad idea to use WinAPI functions with a std::string instead of a plain char array. Could you tell me why the following snippet doesn't work (the empty stBuffer stays empty) and what I need to do to fix it?
Thanks in advance!
std::size_t Get(const std::string &stFileName, std::string &stBuffer)
{
HANDLE hFile = ::CreateFileA(stFileName.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD dwBytesRead = 0;
if(hFile != INVALID_HANDLE_VALUE)
{
DWORD dwFileSize = ::GetFileSize(hFile, NULL);
stBuffer.reserve(dwFileSize + 1);
::ReadFile(hFile, &stBuffer[0], dwFileSize, &dwBytesRead, NULL);
stBuffer[dwFileSize] = '\0';
::CloseHandle(hFile);
}
return dwBytesRead;
}
Because a std::string can contain embedded '\0' characters, it has to keep track of its own length in a separate way.
Your problem is that std::string::reserve() does not change the length of the string. It just pre-allocates some memory for the string to grow into. The solution is to use std::string::resize() and let the WinAPI function overwrite the string contents.
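A minimal, portable sketch of the difference. fakeRead is a hypothetical stand-in for ReadFile, and readInto is likewise only an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for ReadFile: copies `count` bytes into `dest`.
std::size_t fakeRead(char* dest, const char* src, std::size_t count)
{
    std::memcpy(dest, src, count);
    return count;
}

std::string readInto(const char* src, std::size_t size)
{
    std::string buffer;
    buffer.reserve(size);          // capacity grows, but size() is still 0
    assert(buffer.empty());

    buffer.resize(size);           // size() is now `size`, filled with '\0'
    const std::size_t got = size > 0 ? fakeRead(&buffer[0], src, size) : 0;
    buffer.resize(got);            // trim to the bytes actually delivered
    return buffer;
}
```

Resizing to exactly the number of bytes read also avoids storing an extra '\0' as part of the string's contents.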
As a side note: before C++11, the standard did not guarantee that std::string uses a contiguous buffer, although to my knowledge all implementations did. Since C++11, contiguous storage is a requirement.
Consider the difference between the reserve() and resize() members. So the solution would be:
stBuffer.resize(dwFileSize + 1);