Windows API ReadFile() skips one out of every two characters - c++

My aim is to read all the text in a file. For some reason, whenever I read from the file and print the result (drawText), the buffer seems to skip every other character: HELLO becomes HLO and SCAVENGER becomes SAEGR.
This is for the Windows API. I wonder whether CreateFile() and ReadFile() are fine and something else is causing the issue.
void init(HDC hdc)
{
HANDLE hFile;
LPCSTR fileName = "c:\\Users\\kanaa\\Desktop\\code\\HW2_StarterCode\\words.txt";
hFile = CreateFileA(fileName, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD dwFileSize = GetFileSize(hFile, NULL);
DWORD dwBytesRead;
WCHAR* buffer = new WCHAR[dwFileSize / 2 + 1];
buffer[dwFileSize / 2] = 0;
bool read = ReadFile(hFile, buffer, dwFileSize, &dwBytesRead, NULL);
std::wstring wstr(buffer);
std::string str(wstr.begin(), wstr.end());
delete[] buffer;
CloseHandle(hFile);
if (read) parse(str, hdc);
}
void parse(std::string word, HDC hdc)
{
std::string to = word;
std::wstring wword = std::wstring(to.begin(), to.end());
const WCHAR* wcword = wword.c_str();
Graphics graphics(hdc);
drawText(&graphics, wcword);
}

The problem was the WCHAR buffer. Below is the correction:
CHAR* buffer = new CHAR[dwFileSize + 1]; // one byte per char, plus room for the terminator
bool read = ReadFile(hFile, buffer, dwFileSize, &dwBytesRead, NULL);
buffer[dwBytesRead] = 0; // terminate after the bytes actually read

You are processing the file data using a wchar_t[] buffer. wchar_t is 2 bytes in size on Windows. So, in the statement:
std::string str(wstr.begin(), wstr.end());
You are iterating through the file data 2 bytes at a time, interpreting each byte pair as a single wchar_t that gets truncated to a 1-byte char, discarding the other byte. That is why your str ends up skipping every other character.
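To see why exactly every other character survives, here is a portable sketch of the two steps the original code performs. It is a simulation rather than Windows code: it assumes a 2-byte wchar_t and little-endian byte order (both true on Windows), and the function name simulateWcharTruncation is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper that mimics the original code on Windows, where
// wchar_t is 2 bytes and the CPU is little-endian.
// Step 1: the raw file bytes are viewed as 16-bit units (byte pairs).
// Step 2: each unit is truncated to a char, keeping only its low byte.
std::string simulateWcharTruncation(const std::string& fileBytes)
{
    std::string out;
    for (std::size_t i = 0; i < fileBytes.size(); i += 2)
    {
        unsigned lo = static_cast<unsigned char>(fileBytes[i]);
        // An odd trailing byte pairs with the pre-zeroed terminator slot.
        unsigned hi = (i + 1 < fileBytes.size())
                          ? static_cast<unsigned char>(fileBytes[i + 1])
                          : 0u;
        unsigned unit = lo | (hi << 8);   // little-endian 16-bit unit
        if (unit == 0)
            break;                        // std::wstring(buffer) stops at a zero unit
        out.push_back(static_cast<char>(unit & 0xFFu)); // truncation to 1 byte
    }
    return out;
}
```

For example, simulateWcharTruncation("HELLO") yields "HLO" and simulateWcharTruncation("SCAVENGER") yields "SAEGR", matching the symptoms in the question.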
Process the file data using a char[] buffer instead. Better still, there are easier ways to read 7/8-bit file data into a std::string than managing a raw buffer yourself.
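As one example of such an easier way, here is a minimal sketch using the portable C++ standard library instead of CreateFile()/ReadFile(); the name readWholeFile is hypothetical. It copies the bytes unchanged, so the result is still in whatever encoding the file uses.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Reads the whole file as raw bytes (binary mode, so no newline translation).
// Returns false if the file cannot be opened.
bool readWholeFile(const std::string& path, std::string& out)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;
    // Copy every byte from the stream into the string; the string sizes itself.
    out.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    return true;
}
```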
Lastly, in this statement:
std::wstring wword = std::wstring(to.begin(), to.end());
This is not the correct way to convert a std::string to a std::wstring. All you are doing is iterating through the chars and widening each one as-is into a 2-byte wchar_t, which only happens to work for ASCII text. Windows APIs expect wchar_t strings to be encoded in UTF-16, and your code does not convert to UTF-16. You need to use MultiByteToWideChar(), std::wstring_convert (deprecated since C++17), or an equivalent Unicode library call to perform that conversion. In any case, you first need to know the encoding of the source file in order to convert it to Unicode correctly.
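The difference between widening and decoding can be sketched portably. This illustration assumes the file is UTF-8 (as noted above, the real encoding must be known first). It uses char16_t because wchar_t is 2 bytes only on Windows, and both function names are hypothetical. The hand-written decoder performs no validation and is not a substitute for MultiByteToWideChar().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// What std::wstring(to.begin(), to.end()) effectively does: each byte
// becomes one code unit unchanged.
std::u16string naiveWiden(const std::string& s)
{
    std::u16string out;
    for (unsigned char c : s)
        out.push_back(static_cast<char16_t>(c));
    return out;
}

// Hypothetical minimal UTF-8 -> UTF-16 decoder, for illustration only.
// Assumes well-formed UTF-8 and performs no validation.
std::u16string utf8ToUtf16(const std::string& s)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < s.size())
    {
        unsigned char b0 = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t len;
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else                          { cp = b0 & 0x07; len = 4; }
        // Fold in the 6 payload bits of each continuation byte.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Code points above U+FFFF become a surrogate pair in UTF-16.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For the UTF-8 bytes of "é" (0xC3 0xA9), naiveWiden() produces the two code units U+00C3 U+00A9, which display as "Ã©", while utf8ToUtf16() produces the single code unit U+00E9.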

Related

How to fix garbled text with using ReadFile?

I'm writing a Win32 application. I use ReadFile to read a text file that is written in Unicode, and then print it in an EditBox.
const TCHAR FILE_DIRECTORY[] = TEXT("data/");
const TCHAR FILE_LIST[][MAX_LOADSTRING] = {
TEXT("fputs_fgets.h"), TEXT("fprintf_fscanf.h"),
TEXT("fprintfs_fscanfs.h"), TEXT("fread_fwrite.h"), TEXT("freads_fwrite.h") };
const int FILE_NAME_LENGTH = _tcslen(FILE_LIST[idx]);
const int FILE_DIRECTORY_LENGTH = _tcslen(FILE_DIRECTORY);
TCHAR* filePath = (TCHAR*)calloc(FILE_NAME_LENGTH + FILE_DIRECTORY_LENGTH + 1, sizeof(TCHAR));
_tcscpy_s(filePath, FILE_DIRECTORY_LENGTH + 1, FILE_DIRECTORY);
_tcscat_s(filePath, FILE_NAME_LENGTH + FILE_DIRECTORY_LENGTH + 1, FILE_LIST[idx]);
HANDLE file = CreateFile(filePath, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD fileSize = GetFileSize(file, NULL);
DWORD dwRead;
if (editText != NULL)
free(editText);
editText = (TCHAR*)calloc(1, fileSize + 1);
ReadFile(file, editText, fileSize, &dwRead, NULL);
CloseHandle(file);
free(filePath);
However, some strange characters appear at the end of the output:
printf("y좌표(정수): %d\n", point.y);
}
fclose(file);
}ﴀ﷽ý
How can I fix it?
Thank you.
Assuming your file is UTF-16 and you are compiling with _UNICODE defined (assumptions justified by the fact that the rest of your text is read correctly), in this line:
editText = (TCHAR*)calloc(1, fileSize + 1);
you should actually use fileSize + sizeof(TCHAR) if you want to rely on the zeroing that calloc does to get a NUL-terminated string. As it is now, you have a wide string whose last character has only its low byte set to zero, so the rest of your code keeps reading garbage until it happens to find two consecutive zero bytes (suitably aligned).
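The sizing rule can be illustrated with a portable sketch. It assumes UTF-16LE data on a little-endian host (as on Windows) and uses char16_t in place of a 2-byte TCHAR; terminatedFromBytes is a hypothetical name. The extra whole code unit of zeros plays the role of the sizeof(TCHAR) bytes added to the calloc size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper mirroring calloc(1, fileSize + sizeof(TCHAR)) with a
// 2-byte TCHAR: the data is always followed by a complete 16-bit terminator.
std::u16string terminatedFromBytes(const unsigned char* bytes, std::size_t byteCount)
{
    // Round an odd byte count up, then add one unit for the terminator.
    // The vector value-initializes every unit to zero, as calloc would.
    std::vector<char16_t> buf((byteCount + 1) / 2 + 1);
    std::memcpy(buf.data(), bytes, byteCount);
    // Like the Win32 code, this stops at the first zero unit.
    return std::u16string(buf.data());
}
```

In the Win32 code itself, the equivalent is editText = (TCHAR*)calloc(1, fileSize + sizeof(TCHAR));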
Mind you, I'm extremely dubious about this code in general: using TCHAR means you want to compile both as ANSI (TCHAR == char) and as Unicode (TCHAR == wchar_t), and letting that choice change how you interpret the bytes of external files is a questionable idea.
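One way to stop TCHAR from deciding how a file is interpreted is to inspect the file's byte order mark (BOM), if it has one. The following is a sketch of that heuristic, which is not part of the original answer; detectBom and TextEncoding are hypothetical names, and files without a BOM still need an agreed-upon encoding.

```cpp
#include <cassert>
#include <cstddef>

enum class TextEncoding { Utf8Bom, Utf16LE, Utf16BE, Unknown };

// Hypothetical helper: classify a file by its leading BOM bytes instead of
// by how TCHAR happens to be defined at compile time.
// Note: UTF-32 BOMs are not distinguished (FF FE 00 00 reports Utf16LE).
TextEncoding detectBom(const unsigned char* p, std::size_t n)
{
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return TextEncoding::Utf8Bom;
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return TextEncoding::Utf16LE;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return TextEncoding::Utf16BE;
    return TextEncoding::Unknown; // no BOM: caller must fall back to a known convention
}
```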

MessageBox printing extra unicode characters from TCHAR buffer

So I'm in the middle of trying to get a NamedPipe IPC project working so that my C# GUI can communicate with my C++ code. I should mention I'm a fair bit out of my depth on the C++ side at the moment, although for the most part I have it working.
I cannot for the life of me figure out how to 'translate' the chReply buffer received by the C++ listener into a simple string and show it in a MessageBox; I always get extra Unicode characters. I have added what I think are the most important parts.
C# Pipe Write
byte[] bReply = Encoding.Unicode.GetBytes("#TEST 123 456");
uint cbBytesWritten;
uint cbReplyBytes = (uint)bReply.Length;
bool bResult = PipeNative.WriteFile(hPipe, bReply, cbReplyBytes, out cbBytesWritten, IntPtr.Zero);
C++ Pipe Read
// Project's Character Set: Unicode
// BUFFER_SIZE = 1024
TCHAR chRequest[BUFFER_SIZE];
DWORD cbBytesWritten, cbRequestBytes;
TCHAR chReply[BUFFER_SIZE];
DWORD cbBytesRead, cbReplyBytes;
cbReplyBytes = sizeof(TCHAR) * BUFFER_SIZE;
do
{
bResult = ReadFile(hPipe, chReply, cbReplyBytes, &cbBytesRead, NULL);
}
while(!bResult);
MessageBox(NULL, chReply, _T("GUI Request"), MB_OK);
If somebody could save me from drowning, I would be extremely grateful.
You have a few problems. The first is that you read and discard data. The second is that you don't pay attention to where the received data ends in the buffer.
// Project's Character Set: Unicode
// BUFFER_SIZE = 1024
TCHAR chRequest[BUFFER_SIZE];
DWORD cbBytesWritten, cbRequestBytes;
std::basic_string<TCHAR> result;
BOOL bResult;
do {
    TCHAR chReply[BUFFER_SIZE];
    DWORD cbBytesRead = 0;
    bResult = ReadFile(hPipe, chReply, sizeof(chReply), &cbBytesRead, NULL);
    // ERROR_MORE_DATA: a message-mode pipe returned part of a larger message
    if (!bResult && GetLastError() != ERROR_MORE_DATA)
        break; // a real error: stop reading
    // cbBytesRead is a byte count; convert it to a TCHAR count
    result.append(chReply, cbBytesRead / sizeof(TCHAR));
}
while(!bResult);
MessageBox(NULL, result.c_str(), _T("GUI Request"), MB_OK);
Here we copy the bytes into a basic_string<TCHAR>. It handles null termination automatically and permits long messages to be passed.
We read them BUFFER_SIZE (1024) characters at a time.
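The essential point, that ReadFile reports bytes while the string counts characters, can be sketched portably. This version assumes the pipe carries UTF-16LE (what Encoding.Unicode produces in the C# writer) and that each chunk holds whole 2-byte units; appendUtf16leChunk is a hypothetical name, and the pipe itself is replaced by byte arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: append one chunk of raw UTF-16LE bytes to the
// accumulated message. byteCount is converted to a code-unit count
// (byteCount / 2); no terminator is needed because std::u16string keeps
// its own length. Assumes chunks never split a 2-byte unit.
void appendUtf16leChunk(std::u16string& message,
                        const unsigned char* bytes, std::size_t byteCount)
{
    for (std::size_t i = 0; i + 1 < byteCount; i += 2)
        message.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
}
```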
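VTT is right. You need to zero-initialize chReply before each call to ReadFile (and leave at least one TCHAR of room for the terminator), because ReadFile does not NUL-terminate the data it reads.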

HttpSendRequest - POST data not supporting Unicode

I'm working on making a C++ agent that will post information (such as the system hostname) back to a central server using HttpSendRequest(). One of the pieces of information that I want it to post back is the OS. I created the following function to obtain the OS name.
wstring getOS()
{
HKEY key;
RegOpenKeyEx(HKEY_LOCAL_MACHINE, L"SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion", 0, KEY_QUERY_VALUE, &key); // Obtains Registry handle
DWORD type;
wchar_t buffer[MAX_PATH]; // MAX_PATH = 260 - The product name should never exceed this value
DWORD size = sizeof(buffer);
RegQueryValueEx(key, L"ProductName", NULL, &type, (LPBYTE)&buffer, &size); // Queries Registry key - stores value in "buffer"
RegCloseKey(key); // Releases the Registry handle
wstring os(buffer); // Converts from C-style character array to wstring
return os; // Returns wstring to caller
}
This function will obtain the OS using the Registry and store it as a wstring. I then want to pass the returned "os" wstring to the following post() function, but I noticed that you must use a string instead of a wstring for the HTTP POST data. Below is the code for my post() function:
void post()
{
HINTERNET hInternetOpen = InternetOpen(userAgent.c_str(), INTERNET_OPEN_TYPE_PROXY, L"http://127.0.0.1:9999", NULL, 0);
HINTERNET hInternetConnect = InternetConnect(hInternetOpen, host.c_str(), INTERNET_DEFAULT_HTTP_PORT, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 0);
HINTERNET hHttpOpenRequest = HttpOpenRequest(hInternetConnect, L"POST", file.c_str(), NULL, NULL, NULL, 0, 0);
wstring headers = L"Content-Type: application/x-www-form-urlencoded"; // Content-Type is necessary to POST
string postData = "os="; // Why does this have to be a string and not a wstring?
HttpSendRequest(hHttpOpenRequest, headers.c_str(), headers.length(), (LPVOID)postData.c_str(), postData.size());
InternetCloseHandle(hInternetOpen);
InternetCloseHandle(hInternetConnect);
InternetCloseHandle(hHttpOpenRequest);
}
If I try to make "postData" a wstring, the POST data shown by my packet sniffer is garbled.
Can someone shed some light on the easiest way to include a wstring as the POST data?
HttpSendRequest() only knows about raw bytes, not strings. You can send UTF-16 data using a std::wstring, but you have to tell the server that you are sending UTF-16, via a charset attribute in the Content-Type header.
wstring headers = L"Content-Type: application/x-www-form-urlencoded; charset=utf-16";
// TODO: don't forget to URL-encode the value from getOS() to
// escape reserved characters, including '=' and '&'...
wstring postData = L"os=" + getOS();
HttpSendRequest(hHttpOpenRequest, headers.c_str(), headers.length(),
(LPVOID)postData.c_str(), postData.length() * sizeof(wchar_t)); // byte count, not character count
Note the use of sizeof(wchar_t) above. In your screenshot, your sniffer is showing the raw data, and the data it shows is what UTF-16 would look like, but you see only half of your wstring data because you are setting the dwOptionalLength parameter of HttpSendRequest() to a character count (7) instead of a byte count (14):
dwOptionalLength [in]
The size of the optional data, in bytes. This parameter can be zero if there is no optional data to send.
When you use std::string, the character count and the byte count are the same value.
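A small helper makes the character-count versus byte-count distinction explicit. It is a sketch with a hypothetical name, and char16_t stands in for wchar_t, which is 2 bytes on Windows but usually 4 bytes elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the byte count that a parameter such as
// dwOptionalLength expects. size() counts code units, so it is scaled by
// the size of one unit.
template <typename CharT>
std::size_t byteLength(const std::basic_string<CharT>& s)
{
    return s.size() * sizeof(CharT);
}
```

With a helper like this, the call would pass byteLength(postData) as dwOptionalLength for either string type.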
What you really should be sending is UTF-8 instead of UTF-16, e.g.:
string Utf8Encode(const wstring &wstr)
{
// NOTE: C++11 has built-in support for converting between
// UTF-8 and UTF-16 via the std::wstring_convert class
// (deprecated since C++17, but still available):
/*
wstring_convert<codecvt_utf8_utf16<wchar_t>> conv;
return conv.to_bytes(wstr);
*/
string out;
// First call: compute the required UTF-8 length in bytes
int len = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.length(), NULL, 0, NULL, NULL);
if (len > 0)
{
out.resize(len);
// Second call: convert into the resized string
WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.length(), &out[0], len, NULL, NULL);
}
return out;
}
wstring headers = L"Content-Type: application/x-www-form-urlencoded; charset=utf-8";
// TODO: don't forget to URL-encode the value from getOS() to
// escape reserved characters, including '=' and '&'...
string postData = "os=" + Utf8Encode(getOS());
HttpSendRequest(hHttpOpenRequest, headers.c_str(), headers.length(),
(LPVOID)postData.c_str(), postData.size());
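The URL-encoding TODO could be handled with something like the following sketch (the function name urlEncodeFormValue is hypothetical). It percent-encodes the UTF-8 bytes, maps spaces to '+' as form encoding allows, and leaves only a conservative set of unreserved characters unescaped.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: percent-encode a UTF-8 value for an
// application/x-www-form-urlencoded body.
std::string urlEncodeFormValue(const std::string& utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8)
    {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') ||
                          c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved)
            out.push_back(static_cast<char>(c));
        else if (c == ' ')
            out.push_back('+');            // form encoding allows '+' for space
        else
        {
            out.push_back('%');            // every other byte becomes %XX
            out.push_back(hex[c >> 4]);
            out.push_back(hex[c & 0x0F]);
        }
    }
    return out;
}
```

Usage would then be along the lines of string postData = "os=" + urlEncodeFormValue(Utf8Encode(getOS()));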

Writing new line character in file

I just want to write a blank line into the file. I use the following code, but it is not working.
char* RegID;
RegID = "10";
char* mndtime;
mndtime = "10";
char* resourcetype;
resourcetype = "Backup";
char* ressubtype;
ressubtype = "shadowprotect";
char* DataBuffer = new char[100];
StrCpy(DataBuffer,"<wpshadowprotectstatus>");
strcat(DataBuffer,"\n");
strcat(DataBuffer,"<mndtime>");
strcat(DataBuffer,RegID);
strcat(DataBuffer,"<\\mndtime>\n");
strcat(DataBuffer,"<resourcetype>");
strcat(DataBuffer,RegID);
strcat(DataBuffer,"<\\resourcetype>\n");
strcat(DataBuffer,"<ressubtype>");
strcat(DataBuffer,RegID);
strcat(DataBuffer,"<\\ressubtype>\n");
strcat(DataBuffer,"<jobname>");
strcat(DataBuffer,RegID);
strcat(DataBuffer,"<\\jobname>\n");
strcat(DataBuffer,"<jobstarttime>");
strcat(DataBuffer,RegID);
strcat(DataBuffer,"<\\jobstarttime>\n");
HANDLE hFile;
hFile = CreateFile("text.txt", // name of the write
GENERIC_WRITE, // open for writing
0, // do not share
NULL, // default security
CREATE_NEW, // create new file only
FILE_ATTRIBUTE_NORMAL, // normal file
NULL); // no attr. template
if (hFile == INVALID_HANDLE_VALUE)
{
return 0;
}
DWORD dwBytesToWrite = (DWORD)strlen(DataBuffer);
DWORD dwBytesWritten = 0;
BOOL bErrorFlag = FALSE;
bErrorFlag = WriteFile(hFile, // open file handle
DataBuffer, // start of data to write
dwBytesToWrite, // number of bytes to write
&dwBytesWritten, // number of bytes that were written
NULL); // no overlapped structure
But I don't know why the newline is not written to the text file.
Note:
1) I don't want to use the C++ std:: library.
2) I don't want to use an XML parser.
Use \r\n for line breaks on Windows.
And your XML is malformed. XML closing tags use the / character, not the \ character. And you are writing the same RegID variable for all of the XML values instead of using your other variables (mndtime, resourcetype, etc).
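Putting both fixes together, a sketch that builds the document with \r\n line endings and proper / closing tags might look like this. It relies only on the C library's snprintf, honoring the no-std:: requirement. buildStatusXml and its jobname/jobstarttime parameters are hypothetical, since the question has no variables for those values, and the sketch also adds the missing closing </wpshadowprotectstatus> tag.

```cpp
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: builds the status document with CRLF line endings
// and '/' closing tags, using each field's own value instead of RegID.
// Returns the number of chars written, or -1 if outSize is too small.
int buildStatusXml(char* out, size_t outSize,
                   const char* mndtime, const char* resourcetype,
                   const char* ressubtype, const char* jobname,
                   const char* jobstarttime)
{
    int n = snprintf(out, outSize,
        "<wpshadowprotectstatus>\r\n"
        "<mndtime>%s</mndtime>\r\n"
        "<resourcetype>%s</resourcetype>\r\n"
        "<ressubtype>%s</ressubtype>\r\n"
        "<jobname>%s</jobname>\r\n"
        "<jobstarttime>%s</jobstarttime>\r\n"
        "</wpshadowprotectstatus>\r\n",
        mndtime, resourcetype, ressubtype, jobname, jobstarttime);
    // snprintf returns the length it needed; n >= outSize means truncation.
    if (n < 0 || (size_t)n >= outSize)
        return -1;
    return n;
}
```

The result can then be passed to WriteFile() with the returned length as the number of bytes to write.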
Windows?
If so, replace \n with \r\n. For FILE* / iostream in text mode, the runtime does this translation automatically, but WriteFile does not.
And, of course, you need two line endings to get blank line.
BTW, generating a long string with repeated strcat calls has O(N^2) complexity, because each call rescans the destination from the start to find its end, which is very bad.
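To avoid the quadratic rescanning while still using plain char buffers, a common approach is to keep a pointer to the current end of the string. The following sketch does that (appendAt is a hypothetical name; it behaves like POSIX stpcpy) and also shows the two line endings needed for a blank line.

```cpp
#include <assert.h>
#include <string.h>

// Hypothetical helper: copy src to dst and return a pointer to the new
// terminating NUL, so the next append starts there instead of rescanning
// the whole buffer the way strcat does. Total work is linear in the output.
// The caller must make sure the buffer is large enough.
char* appendAt(char* dst, const char* src)
{
    while ((*dst = *src) != '\0')
    {
        ++dst;
        ++src;
    }
    return dst;
}
```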

WinAPI File In-/Output with std::strings instead of char arrays?

For performance reasons I didn't want to use fstream for a one-time read. Maybe it's a very bad idea to use WinAPI functions with a std::string instead of a plain char array, but all in all I would like you to tell me why the following snippet doesn't work (stBuffer stays empty) and what I'd need to do to fix it.
Thanks in advance!
std::size_t Get(const std::string &stFileName, std::string &stBuffer)
{
HANDLE hFile = ::CreateFileA(stFileName.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD dwBytesRead = 0;
if(hFile != INVALID_HANDLE_VALUE)
{
DWORD dwFileSize = ::GetFileSize(hFile, NULL);
stBuffer.reserve(dwFileSize + 1);
::ReadFile(hFile, &stBuffer[0], dwFileSize, &dwBytesRead, NULL);
stBuffer[dwFileSize] = '\0';
::CloseHandle(hFile);
}
return dwBytesRead;
}
Because a std::string can contain embedded '\0' characters, it has to keep track of its own length in a separate way.
Your problem is that std::string::reserve() does not change the length of the string. It just pre-allocates some memory for the string to grow into. The solution is to use std::string::resize() and let the WinAPI function overwrite the string contents.
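The resize() approach can be sketched portably. Here C stdio (fopen/fread) stands in for CreateFileA()/GetFileSize()/ReadFile(), and readFileIntoString is a hypothetical name. Resizing to the file size makes the string's length cover the data before the read, and resizing again to the byte count actually read trims anything that was not filled.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical portable analogue of Get(): returns the number of bytes read.
std::size_t readFileIntoString(const char* path, std::string& buffer)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f)
        return 0;
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);          // file size in bytes
    std::fseek(f, 0, SEEK_SET);
    if (size < 0) { std::fclose(f); return 0; }
    buffer.resize(static_cast<std::size_t>(size));   // length now covers the data
    std::size_t got = size > 0 ? std::fread(&buffer[0], 1, buffer.size(), f) : 0;
    buffer.resize(got);                               // keep only what was read
    std::fclose(f);
    return got;
}
```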
As a side note: at the time of writing (before C++11), the standard did not guarantee that std::string uses a contiguous buffer, although to my knowledge all implementations did. C++11 made contiguous storage a requirement.
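Consider the difference between the reserve() and resize() members. So the solution would be (and drop the stBuffer[dwFileSize] = '\0' line, since std::string tracks its own length):
stBuffer.resize(dwFileSize);
::ReadFile(hFile, &stBuffer[0], dwFileSize, &dwBytesRead, NULL);
stBuffer.resize(dwBytesRead); // trim to the bytes actually read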