HttpSendRequest - POST data not supporting Unicode - C++

I'm working on making a C++ agent that will post information (such as the system hostname) back to a central server using HttpSendRequest(). One of the pieces of information that I want it to post back is the OS. I created the following function to obtain the OS name:
wstring getOS()
{
    HKEY key;
    RegOpenKeyEx(HKEY_LOCAL_MACHINE, L"SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion", 0, KEY_QUERY_VALUE, &key); // Obtains a Registry handle
    DWORD type;
    wchar_t buffer[MAX_PATH] = {}; // MAX_PATH = 260 - the OS name should never exceed this; zero-init guarantees null termination
    DWORD size = sizeof(buffer);
    RegQueryValueEx(key, L"ProductName", NULL, &type, (LPBYTE)buffer, &size); // Queries the Registry value - stores it in "buffer"
    RegCloseKey(key); // Close the Registry handle so it doesn't leak
    wstring os(buffer); // Converts from C-style character array to wstring
    return os; // Returns wstring to caller
}
This function obtains the OS name from the Registry and stores it as a wstring. I then want to pass the returned "os" wstring to the following post() function, but it appears that the HTTP POST data must be a string rather than a wstring. Below is my post() function:
void post()
{
    HINTERNET hInternetOpen = InternetOpen(userAgent.c_str(), INTERNET_OPEN_TYPE_PROXY, L"http://127.0.0.1:9999", NULL, 0);
    HINTERNET hInternetConnect = InternetConnect(hInternetOpen, host.c_str(), INTERNET_DEFAULT_HTTP_PORT, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 0);
    HINTERNET hHttpOpenRequest = HttpOpenRequest(hInternetConnect, L"POST", file.c_str(), NULL, NULL, NULL, 0, 0);
    wstring headers = L"Content-Type: application/x-www-form-urlencoded"; // Content-Type is necessary to POST
    string postData = "os="; // Why does this have to be a string and not a wstring?
    HttpSendRequest(hHttpOpenRequest, headers.c_str(), headers.length(), (LPVOID)postData.c_str(), postData.size());
    InternetCloseHandle(hHttpOpenRequest); // Close handles in reverse order of creation
    InternetCloseHandle(hInternetConnect);
    InternetCloseHandle(hInternetOpen);
}
If I try to make "postData" a wstring, I get something that looks like the image below (sniffer screenshot not reproduced here):
Can someone shed some light onto the easiest way to include a wstring as the POST data?

HttpSendRequest() only knows about raw bytes, not strings. You can send UTF-16 data using a std::wstring, but you have to tell the server that you are sending UTF-16, via a charset attribute in the Content-Type header.
wstring headers = L"Content-Type: application/x-www-form-urlencoded; charset=utf-16";
// TODO: don't forget to URL-encode the value from getOS() to
// escape reserved characters, including '=' and '&'...
wstring postData = L"os=" + getOS();
HttpSendRequest(hHttpOpenRequest, headers.c_str(), headers.length(),
    (LPVOID)postData.c_str(), postData.length() * sizeof(wchar_t));
Note the use of sizeof(wchar_t) above. In your screenshot, your sniffer is showing the raw data, and the data it shows is what UTF-16 would look like, but you see only half of your wstring data because you are setting the dwOptionalLength parameter of HttpSendRequest() to a character count (7) instead of a byte count (14):
dwOptionalLength [in]
The size of the optional data, in bytes. This parameter can be zero if there is no optional data to send.
When you use std::string, the character count and the byte count are the same value.
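A quick illustration of that difference (assuming wchar_t is 2 bytes, as it is on Windows; the variable names are just for this sketch):
#include <string>

int main()
{
    std::wstring w = L"os=Win7";                    // 7 characters
    size_t chars = w.length();                      // 7  - character count
    size_t bytes = w.length() * sizeof(wchar_t);    // 14 - byte count, which is what dwOptionalLength wants
    std::string s = "os=Win7";
    size_t same = s.length();                       // 7  - character count and byte count coincide for std::string
    (void)chars; (void)bytes; (void)same;           // silence unused-variable warnings
    return 0;
}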
What you really should be sending is UTF-8 instead of UTF-16, e.g.:
string Utf8Encode(const wstring &wstr)
{
    // NOTE: C++11 has built-in support for converting between
    // UTF-8 and UTF-16. See the std::wstring_convert class
    // (deprecated since C++17)...
    /*
    wstring_convert<codecvt_utf8_utf16<wchar_t>> conv;
    return conv.to_bytes(wstr);
    */
    string out;
    int len = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.length(), NULL, 0, NULL, NULL);
    if (len > 0)
    {
        out.resize(len);
        WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int)wstr.length(), &out[0], len, NULL, NULL);
    }
    return out;
}
wstring headers = L"Content-Type: application/x-www-form-urlencoded; charset=utf-8";
// TODO: don't forget to URL-encode the value from getOS() to
// escape reserved characters, including '=' and '&'...
string postData = "os=" + Utf8Encode(getOS());
HttpSendRequest(hHttpOpenRequest, headers.c_str(), headers.length(),
    (LPVOID)postData.c_str(), postData.size());
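As for the URL-encoding TODO in the comments above, a minimal percent-encoding helper might look like this (a sketch, not the poster's code; it encodes everything except the RFC 3986 unreserved characters, which is safe if slightly aggressive):
#include <string>

std::string UrlEncode(const std::string &utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8)
    {
        // Unreserved characters (RFC 3986) pass through unchanged
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.' || c == '~')
        {
            out += (char)c;
        }
        else
        {
            // Everything else becomes %XX (two uppercase hex digits)
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
// usage: string postData = "os=" + UrlEncode(Utf8Encode(getOS()));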

Related

Windows API ReadFile() skips one out of every two characters

My aim is to read all the text located in a file. For some reason whenever I read from the file and print the result (drawText), the buffer seems to be skipping one character every two positions. HELLO will become HLO and SCAVENGER becomes SAEGR.
This is for the Windows API. I wonder whether CreateFile() and ReadFile() are fine and something else is causing the issue.
void init(HDC hdc)
{
    HANDLE hFile;
    LPCSTR fileName = "c:\\Users\\kanaa\\Desktop\\code\\HW2_StarterCode\\words.txt";
    hFile = CreateFileA(fileName, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    DWORD dwFileSize = GetFileSize(hFile, NULL);
    DWORD dwBytesRead;
    WCHAR* buffer = new WCHAR[dwFileSize / 2 + 1];
    buffer[dwFileSize / 2] = 0;
    bool read = ReadFile(hFile, buffer, dwFileSize, &dwBytesRead, NULL);
    std::wstring wstr(buffer);
    std::string str(wstr.begin(), wstr.end());
    delete[] buffer;
    CloseHandle(hFile);
    if (read) parse(str, hdc);
}
void parse(std::string word, HDC hdc)
{
    std::string to = word;
    std::wstring wword = std::wstring(to.begin(), to.end());
    const WCHAR* wcword = wword.c_str();
    Graphics graphics(hdc);
    drawText(&graphics, wcword);
}
The problem was the WCHAR buffer. Below are the corrections:
CHAR* buffer = new CHAR[dwFileSize + 1]; // sizeof(char) is always 1, so no division is needed
bool read = ReadFile(hFile, buffer, dwFileSize, &dwBytesRead, NULL);
buffer[dwBytesRead] = 0; // null-terminate using the actual byte count
You are processing the file data using a wchar_t[] buffer. wchar_t is 2 bytes in size on Windows. So, in the statement:
std::string str(wstr.begin(), wstr.end());
You are iterating through the file data 2 bytes at a time, interpreting each byte pair as a single wchar_t that gets truncated to a 1-byte char, discarding the other byte. That is why your str ends up skipping every other character. For example, the bytes 'H','E' are read as the single wchar_t 0x4548 (little-endian), which truncates to 0x48 = 'H'; 'L','L' becomes 0x4C4C, which truncates to 'L', which is exactly why "HELLO" prints as "HLO".
Process the file data using a char[] buffer instead. However, there are easier ways to read 7/8-bit file data into a std::string, as sketched below.
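For example, a minimal sketch using the standard library instead of raw Win32 calls (reads the whole file; assumes it fits in memory, and the function name is just for this sketch):
#include <fstream>
#include <sstream>
#include <string>

std::string readFileToString(const char* path)
{
    std::ifstream file(path, std::ios::binary); // binary: no newline translation
    std::ostringstream contents;
    contents << file.rdbuf();                   // stream the entire file into the buffer
    return contents.str();
}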
Lastly, in this statement:
std::wstring wword = std::wstring(to.begin(), to.end());
This is not the correct way to convert a std::string to a std::wstring. All you are doing is iterating through the chars, converting each one as-is into a 2-byte wchar_t. Windows APIs expect wchar_t strings to be encoded in UTF-16, which your code does not produce. You need to use MultiByteToWideChar(), std::wstring_convert, or an equivalent Unicode library call to perform that conversion (see the sketch below). In that case, you first need to know the encoding of the source file in order to convert it to Unicode correctly.
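For instance, a minimal MultiByteToWideChar() sketch, assuming the file data is UTF-8 (use CP_ACP instead if the file is in the system ANSI code page; the helper name is ours):
#include <Windows.h>
#include <string>

std::wstring toUtf16(const std::string& s)
{
    if (s.empty()) return std::wstring();
    // First call computes the required length in wchar_t units
    int len = MultiByteToWideChar(CP_UTF8, 0, s.data(), (int)s.size(), NULL, 0);
    std::wstring out(len, L'\0');
    // Second call performs the actual conversion
    MultiByteToWideChar(CP_UTF8, 0, s.data(), (int)s.size(), &out[0], len);
    return out;
}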

POST request truncated with wstring type and well formed using string

I want to make a POST request using WinINet with UTF-8 special characters like €.
void sendArticle()
{
    LPCWSTR browser = L"MyClientApp/1.0";
    LPCWSTR domain = L"127.0.0.1";
    LPCWSTR methode = L"POST";
    LPCWSTR page = L"/shopping/article.php";
    std::wstring strContentType = L"Content-Type: application/x-www-form-urlencoded; charset=utf-8";
    LPCWSTR contentType = strContentType.c_str();
    std::wstring dataStr = L"article=thecontent";
    LPVOID data = (LPVOID)dataStr.c_str();
    HINTERNET hInternet = InternetOpenW(browser, INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);
    HINTERNET hConnection = InternetConnectW(hInternet, domain, 80, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 1);
    HINTERNET hRequest = HttpOpenRequestW(hConnection, methode, page, NULL, NULL, NULL, 0, 1);
    HttpSendRequestW(hRequest, contentType, strContentType.size(), data, dataStr.size());
}
If I use string type for variable dataStr, the content is sent correctly. If I use wstring type for the variable (to send UTF-8 content), the request is truncated.
I can observe the TCP packets using Wireshark.
To summarize, this line std::string dataStr = "article=thecontent"; works, but this line std::wstring dataStr = L"article=thecontent"; does not.
I checked the int returned by the size() method and it's correct. What is this strange phenomenon and how do I solve it?
HttpSendRequestW takes a size in bytes for the data size parameter but you are using wide strings so you really want:
HttpSendRequestW(hRequest, contentType, strContentType.size(), data, sizeof(WCHAR)*dataStr.size());
However, HTTP really works best with UTF-8, so sending UTF-16 is unusual (although the data payload can be interpreted however you wish, so sending UTF-16 is technically allowed).
What you really want to do is send UTF-8 anyway, which you can do with narrow strings: make dataStr a std::string, fill it with UTF-8 encoded text, and you're good.
The server expects UTF-8 data (dataStr), so the data must be converted from UTF-16 to UTF-8. Keep everything else in wide-character format, as those parameters are handled properly by the wide-string APIs:
//convert dataStr to UTF8
int sz = WideCharToMultiByte(CP_UTF8, 0, dataStr.c_str(), (int)dataStr.size(), 0, 0, 0, 0);
std::string utf8(sz, 0);
WideCharToMultiByte(CP_UTF8, 0, dataStr.c_str(), (int)dataStr.size(), &utf8[0], sz, 0, 0);
//send UTF8 data
HttpSendRequestW(hRequest, contentType, strContentType.size(), (LPVOID)utf8.data(), utf8.size());
I had the same problem, and this is the solution I found; it works perfectly:
void sendHello()
{
    LPCSTR header = "Content-Type: application/x-www-form-urlencoded; charset=utf-8";
    std::string dataStr = u8"message=ééàà€€";
    LPVOID myMessage = (LPVOID)dataStr.c_str();
    HINTERNET hInternet = InternetOpenA("InetURL/1.0", INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);
    HINTERNET hConnection = InternetConnectA(hInternet, "127.0.0.1", 80, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 1);
    HINTERNET hRequest = HttpOpenRequestA(hConnection, "POST", "/SendMessage.php", NULL, NULL, NULL, 0, 1);
    HttpSendRequestA(hRequest, header, strlen(header), myMessage, dataStr.size());
}

BSTR conversion to UTF-8

I'm working with UIAutomation and I'm struggling with the localized BSTRs. I'm in Germany, so there are some special characters that are represented funny in the BSTRs. I'm logging the information and need to have them in UTF-8 to process later on.
I've already tried every version of the answers I could find regarding WideCharToMultiByte, but that just converts the funny character into an even funnier one. I'd really appreciate it if anyone could tell me what I'm doing wrong; it's really bugging me.
I tried both of the following versions, and both times got this result (screenshot not reproduced here; the upper string is the converted one, the lower the original):
The first word should be "Schaltfläche" and the second "Fünf".
The code I tried:
BSTR* origin;
_bstr_t originWrapper(*origin);
char* originChar = originWrapper;
size_t len = strlen(originChar) + 1;
int room = MultiByteToWideChar(CP_ACP, 0, originChar, -1, NULL, 0);
wchar_t* unicodeString = (wchar_t*)malloc((sizeof(wchar_t))*room);
MultiByteToWideChar(CP_ACP, 0, originChar, -1, unicodeString, room);
int size_needed = WideCharToMultiByte(CP_UTF8, 0, unicodeString, -1, NULL, 0, NULL, NULL);
char* utf8Char = (char*) malloc(size_needed);
WideCharToMultiByte(CP_UTF8, 0, unicodeString, -1, utf8Char, size_needed, NULL, NULL);
and
BSTR* origin;
_bstr_t originWrapper(*origin);
int size_needed = WideCharToMultiByte(CP_UTF8, 0, originWrapper, SysStringByteLen(*origin), NULL, 0, NULL, NULL);
std::string resultingString(size_needed, 0);
WideCharToMultiByte(CP_UTF8, 0, *origin, SysStringByteLen(*origin), &resultingString[0], size_needed, NULL, NULL);
BSTR is a pointer to UTF-16 (WCHAR) character data, preceded by the string length. So your round trip through narrow strings is misguided; you should use WideCharToMultiByte directly:
std::string BSTRtoUTF8(BSTR bstr) {
    int len = SysStringLen(bstr);
    // special case because a NULL BSTR is a valid zero-length BSTR,
    // but regular string functions would balk on it
    if (len == 0) return "";
    int size_needed = WideCharToMultiByte(CP_UTF8, 0, bstr, len, NULL, 0, NULL, NULL);
    std::string ret(size_needed, '\0');
    WideCharToMultiByte(CP_UTF8, 0, bstr, len, &ret[0], size_needed, NULL, NULL);
    return ret;
}
To check the validity of the conversion, don't output the result to the console, as it doesn't support UTF-8 output by default (it interprets narrow strings not even in CP_ACP, but in the OEM code page, go figure). Instead, write the output to a file and check it with a reliable editor that supports UTF-8.
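For instance, a minimal check along those lines, reusing BSTRtoUTF8 from above (the log file name is hypothetical):
#include <Windows.h>
#include <fstream>
#include <string>

void logUtf8(BSTR bstr)
{
    std::string utf8 = BSTRtoUTF8(bstr);
    // binary mode so the raw UTF-8 bytes are written untouched
    std::ofstream log("uia_log.txt", std::ios::binary | std::ios::app);
    log.write(utf8.data(), (std::streamsize)utf8.size());
    log.put('\n');
}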

Unable to obtain POST request response using WinINet

I need to make a POST request to an API to get some XML data (http://freecite.library.brown.edu/welcome/api_instructions). This works fine with curl:
curl -H "Accept: application/xml" --data "citation=Udvarhelyi, I.S., Gatsonis, C.A., Epstein, A.M., Pashos, C.L., Newhouse, J.P. and McNeil, B.J. Acute Myocardial Infarction in the Medicare population: process of care and clinical outcomes. Journal of the American Medical Association, 1992; 18:2530-2536. " http://freecite.library.brown.edu:80/citations/create
So I am trying to do something similar using the Win32 SDK. This is my code:
void LoadData()
{
    wil::unique_hinternet hInternet(InternetOpen(L"Dummy", INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0));
    wil::unique_hinternet hConnect(InternetConnect(hInternet.get(), L"http://freecite.library.brown.edu", 80, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 0));
    wil::unique_hinternet hRequest(HttpOpenRequest(hConnect.get(), L"POST", L"/citations/create", NULL, NULL, NULL, NULL, NULL));
    wstring data = L"citation=Udvarhelyi, I.S., Gatsonis, C.A., Epstein, A.M., Pashos, C.L., Newhouse, J.P. and McNeil, B.J. Acute Myocardial Infarction in the Medicare population: process of care and clinical outcomes. Journal of the American Medical Association, 1992; 18:2530-2536.";
    PCWSTR szHeaders = L"Accept: application/xml";
    HttpSendRequest(hRequest.get(), szHeaders, 0, (LPVOID)data.c_str(), static_cast<int>(data.length()));
    DWORD availableBytes = 0;
    InternetQueryDataAvailable(hRequest.get(), &availableBytes, 0, 0);
    PBYTE outputBuffer = (PBYTE)HeapAlloc(GetProcessHeap(), 0, availableBytes);
    PBYTE nextBytes = outputBuffer;
    DWORD bytesUsed = 0; // number of used bytes.
    while (availableBytes)
    {
        DWORD downloadedBytes;
        InternetReadFile(hRequest.get(), nextBytes, availableBytes, &downloadedBytes);
        bytesUsed = bytesUsed + downloadedBytes;
        InternetQueryDataAvailable(hRequest.get(), &availableBytes, 0, 0);
        if (availableBytes > 0)
        {
            // lazy buffer growth here. Only alloc for what we need. Could be optimized if we end up with huge payloads (>10MB).
            // Otherwise, this is good enough.
            outputBuffer = (PBYTE)HeapReAlloc(GetProcessHeap(), 0, outputBuffer, bytesUsed + availableBytes);
            nextBytes = outputBuffer + bytesUsed; // careful, pointer might have moved! Update cursor.
        }
    }
    // Convert the output XML to wide chars
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, (PCCH)outputBuffer, bytesUsed, NULL, 0);
    std::wstring wstrTo(size_needed, 0);
    MultiByteToWideChar(CP_UTF8, 0, (PCCH)outputBuffer, bytesUsed, &wstrTo[0], size_needed);
    wstring res = wstrTo;
}
The problem is that before entering the while loop, even after the call to InternetQueryDataAvailable, availableBytes comes out as 0. As a result, I end up with a blank string as the response, whereas I was expecting an XML response.
Can anyone point out what I am doing wrong, and how to fix it?
InternetConnect expects a server name or IP address, so don't include "http://" in the address. Change it to:
InternetConnect(handle, L"freecite.library.brown.edu"...);
Use UTF-8 for the data. The other parameters of the wide WinAPI functions correctly use UTF-16; the API makes the necessary conversions automatically.
Change the header:
std::wstring szHeaders = L"Content-Type: application/x-www-form-urlencoded\r\n";
The accept types should be passed through HttpOpenRequest:
const wchar_t *accept[] = { L"text/xml", NULL };
HINTERNET hrequest = HttpOpenRequest(hconnect, L"POST", L"/citations/create",
NULL, NULL, accept, 0, 0);
Note, if you don't specify accept types (use NULL in their place), the result can be plain HTML.
The example below should return XML.
Note, for simplicity I put optional as an ANSI string, but it should be UTF-8 (if your source data is a UTF-16 wstring, convert it to UTF-8 before sending). The result will be a UTF-8 string; it should be converted to UTF-16 for display on Windows.
#include <iostream>
#include <string>
#include <Windows.h>
#include <WinINet.h>
#pragma comment(lib, "wininet.lib") // include WinINet library
int main()
{
    std::string result;
    std::wstring server = L"freecite.library.brown.edu";
    std::wstring objectname = L"/citations/create"; // file in this case!
    std::wstring header = L"Content-Type: application/x-www-form-urlencoded\r\n";
    std::string optional = "citation=Udvarhelyi, I.S., Gatsonis, C.A., Epstein";
    HINTERNET hsession = InternetOpen(L"appname",
        INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);
    HINTERNET hconnect = InternetConnect(hsession, server.c_str(),
        INTERNET_DEFAULT_HTTP_PORT, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 0);
    const wchar_t* accept[] = { L"text/xml", NULL };
    HINTERNET hrequest = HttpOpenRequest(hconnect, L"POST", objectname.c_str(),
        NULL, NULL, accept, 0, 0);
    if (HttpSendRequest(hrequest, header.c_str(), header.size(),
        &optional[0], optional.size()))
    {
        DWORD blocksize = 4096;
        DWORD received = 0;
        std::string block(blocksize, 0);
        while (InternetReadFile(hrequest, &block[0], blocksize, &received)
            && received)
        {
            // append only the bytes actually read; don't shrink the buffer,
            // or the next read would overflow it
            result.append(&block[0], received);
        }
        std::cout << result << std::endl;
    }
    if (hrequest) InternetCloseHandle(hrequest);
    if (hconnect) InternetCloseHandle(hconnect);
    if (hsession) InternetCloseHandle(hsession);
    return 0;
}

Why does the filename have different bytes after converting UTF-16 -> UTF-8 -> UTF-16 in WinAPI?

I have a file in the watched folder (screenshot not reproduced here).
I use ReadDirectoryChangesW to read changes in the current folder.
And I get the path to this file: L"TEST Ӡ⬨☐.ipt".
Next, I want to convert it to UTF-8 and back:
#include <Windows.h>
#include <cassert>
#include <string>
#include <vector>

std::string wstringToUtf8(const std::wstring& source) {
    const int size = WideCharToMultiByte(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), NULL, 0, NULL, NULL);
    std::vector<char> buffer8(size);
    WideCharToMultiByte(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), buffer8.data(), size, NULL, NULL);
    return std::string(buffer8.data(), size);
}
std::wstring utf8ToWstring(const std::string& source) {
    const int size = MultiByteToWideChar(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), NULL, 0);
    std::vector<wchar_t> buffer16(size);
    MultiByteToWideChar(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), buffer16.data(), size);
    return std::wstring(buffer16.data(), size);
}
int main() {
    // Some code with ReadDirectoryChangesW and
    // ...
    // std::wstring fileName = L"TEST Ӡ⬨☐.ipt";
    // ...
    std::string filenameUTF8 = wstringToUtf8(fileName);
    std::wstring filename2 = utf8ToWstring(filenameUTF8);
    assert(fileName == filename2); // FAIL!
    return 0;
}
But the assert fails. The round-tripped filename2 differs from the original at position [29].
Why?
57216 (0xDF80) falls into the surrogate range (0xD800-0xDFFF), which UTF-16 uses to encode non-BMP code points. Surrogates must appear in pairs, or decoding won't give you a correct code point.
65533 (U+FFFD, the replacement character) is the special error character the decoder emits because the other surrogate of the pair is missing.
To put it another way: your original string is not a valid UTF-16 string.
More info on Wikipedia.
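If you would rather have the conversion fail loudly than silently substitute U+FFFD, WideCharToMultiByte() accepts the WC_ERR_INVALID_CHARS flag (Windows Vista and later, valid with CP_UTF8): the call then fails with ERROR_NO_UNICODE_TRANSLATION on invalid UTF-16 input. A minimal sketch (the helper name is our own):
#include <Windows.h>
#include <string>

// Returns false if 'source' is not valid UTF-16 (e.g. contains an unpaired surrogate)
bool tryWstringToUtf8(const std::wstring& source, std::string& out)
{
    if (source.empty()) { out.clear(); return true; }
    int size = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                                   source.data(), (int)source.size(), NULL, 0, NULL, NULL);
    if (size == 0) return false; // GetLastError() == ERROR_NO_UNICODE_TRANSLATION
    out.resize(size);
    WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                        source.data(), (int)source.size(), &out[0], size, NULL, NULL);
    return true;
}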