How to translate a virtual-key code to char (depending on locale)? - c++

I am playing around with translating a user's keystrokes between the different languages installed on their Windows machine.
I found this article about virtual-key codes, and how they map to characters, and also this function to perform the mapping. But it doesn't seem to work like I expected it to.
This is my attempt at taking the virtual-key code of "A" (which is 0x41) and translating it to the character "ש" on the Hebrew keyboard (which is what pressing that key outputs to the screen while the Hebrew layout is active). It still prints only "A", regardless of my current active layout.
#include <windows.h>
#include <iostream>
#include <stdlib.h>
#include <tchar.h>
int main()
{
    HKL lpList[2];
    GetKeyboardLayoutList(2, lpList); // returns {0x04090409, 0xf03d040d} on my machine, which is {en-US, he-IL}
    HKL hkl = lpList[1]; // sets to he-IL
    char ch = MapVirtualKeyEx(0x41, MAPVK_VK_TO_CHAR, hkl); // 0x41 is the virtual-key code of the keyboard button 'A'
    std::cout << "ch: " << ch << std::endl; // prints "ch: A", I want it to print "ch: ש"
}
What am I missing? Is there some other way to achieve what I am trying to do?
I just tried
UINT VKCode = LOBYTE(VkKeyScan('ש')); // returns 0xbf
UINT ScanCode = MapVirtualKeyEx(VKCode, MAPVK_VK_TO_VSC, hkl); // returns 0x35
UINT VKCode2 = MapVirtualKeyEx(ScanCode, MAPVK_VSC_TO_VK, hkl); // once again 0xbf - unsurprisingly
TCHAR ch = MapVirtualKeyEx(VKCode2, MAPVK_VK_TO_CHAR, hkl); // now it returns '.'
So I convert char -> vk -> sc -> vk -> char, and end up with a different character than the one I started with. Maybe there is a different way to convert a virtual-key code to a char?

You can use ToUnicodeEx API.
And if you want to output characters correctly, you can refer to: How to print Latin characters to the C++ console properly on Windows?
I created a sample and used the following code:
#include <windows.h>
#include <stdio.h>   // _fileno
#include <io.h>      // _setmode
#include <fcntl.h>   // _O_U16TEXT
#include <iostream>
using namespace std;

int main()
{
    SetConsoleOutputCP(1256);
    _setmode(_fileno(stdout), _O_U16TEXT);
    HKL lpList[2];
    GetKeyboardLayoutList(2, lpList);
    HKL hkl = lpList[1]; // sets to he-IL
    UINT VKCode = (VkKeyScanExW(L'ש', hkl)); // low byte is the VK code, high byte holds the shift state
    UINT ScanCode = MapVirtualKeyExW(VKCode, MAPVK_VK_TO_VSC, hkl);
    UINT VKCode2 = MapVirtualKeyExW(ScanCode, MAPVK_VSC_TO_VK, hkl);
    TCHAR ch1 = MapVirtualKeyExW(VKCode2, MAPVK_VK_TO_CHAR, hkl);
    BYTE uKeyboardState[256];
    WCHAR oBuffer[5] = {};
    // Initialization of the keyboard state (no modifier keys pressed)
    for (int i = 0; i < 256; ++i)
    {
        uKeyboardState[i] = 0;
    }
    TCHAR buffer[1024];
    ToUnicodeEx(VKCode, ScanCode, uKeyboardState, buffer, 1024, 0, hkl);
    wcout << buffer;
    return 0;
}
And it works for me:
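If you want to reuse this, here is a minimal sketch (the helper name and shape are mine, not from the answer) that wraps ToUnicodeEx; it assumes an empty keyboard state, so modifiers and dead keys are not handled:
#include <windows.h>
#include <string>

// Hypothetical helper: translate a virtual-key code to the character(s)
// it produces under the given layout, with no modifier keys pressed.
std::wstring VkToUnicode(UINT vk, HKL hkl)
{
    BYTE keyboardState[256] = {};                                  // no modifiers held
    UINT scanCode = MapVirtualKeyExW(vk, MAPVK_VK_TO_VSC, hkl);
    WCHAR buffer[8] = {};
    int n = ToUnicodeEx(vk, scanCode, keyboardState, buffer, 8, 0, hkl);
    return (n > 0) ? std::wstring(buffer, n) : std::wstring();     // n == -1 indicates a dead key
}
Per the answer above, VkToUnicode(0x41, hkl) should then yield L"ש" when hkl is the he-IL layout.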

According to the documentation pages (MapVirtualKeyExA function and MapVirtualKeyExW function) the function returns a UINT and not a char:
UINT MapVirtualKeyExW(
  UINT uCode,
  UINT uMapType,
  HKL  dwhkl
);
Depending on your project settings you'll need to interpret this result either as a char or as a wchar_t; that's the reason.
You can overcome this if you use TCHAR ch = ..., and let the project settings expand the TCHAR macro to the correct type.
The harder part is deciding whether you need std::cout or std::wcout. You could use a type check (e.g. if (std::is_same<TCHAR, wchar_t>::value) { ... } else { ... }) to do this properly.
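For illustration, here is a minimal sketch of what this answer suggests (my example, assuming a Unicode build and C++17): interpret the returned UINT, whose low word holds the character, and pick the matching output stream:
#include <windows.h>
#include <iostream>
#include <type_traits>

int main()
{
    HKL lpList[2];
    GetKeyboardLayoutList(2, lpList);
    HKL hkl = lpList[1]; // he-IL on the asker's machine

    UINT result = MapVirtualKeyExW(0x41, MAPVK_VK_TO_CHAR, hkl);
    wchar_t ch = static_cast<wchar_t>(result & 0xFFFF); // low word holds the character

    if constexpr (std::is_same_v<TCHAR, wchar_t>)   // Unicode build
        std::wcout << L"ch: " << ch << std::endl;
    else                                            // ANSI build
        std::cout << "ch: " << static_cast<char>(ch) << std::endl;
    return 0;
}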

Related

How to handle pasted text correctly via GetConsoleInput()?

In Windows console, we can use GetConsoleInput() to get raw keyboard (and more) input. I want to use it to implement a custom function that read keystrokes with possible CTRL, SHIFT, ALT status. A simplified version of the function is
// for demo only, no error checking ...
struct ret {
    wchar_t ch; // 2-byte UTF-16 in Windows
    DWORD control_keys;
};

ret getch() {
    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
    INPUT_RECORD buf;
    DWORD cnt;
    for (;;) {
        ReadConsoleInput(in, &buf, 1, &cnt);
        if (buf.EventType != KEY_EVENT)
            continue;
        const KEY_EVENT_RECORD& rec = buf.Event.KeyEvent;
        if (!rec.bKeyDown)
            continue;
        if (!rec.uChar.UnicodeChar)
            continue;
        return { rec.uChar.UnicodeChar, rec.dwControlKeyState };
    }
}
It works fine, except that when I try to paste a character not representable in 2 bytes in UTF-16 (i.e., one outside the BMP), the UnicodeChar field is 0 when bKeyDown==true, and the UnicodeChar field is the pasted content when bKeyDown==false. Can anyone tell me why this is the case and suggest possible workarounds?
Here is some demo code and result.
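One possible workaround sketch (purely my assumption, not something from this thread): remember whether the last key-down event carried a character, and also accept a character from a key-up event when its key-down did not.
ret getch2() {
    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
    INPUT_RECORD buf;
    DWORD cnt;
    bool lastDownHadChar = true;
    for (;;) {
        ReadConsoleInput(in, &buf, 1, &cnt);
        if (buf.EventType != KEY_EVENT)
            continue;
        const KEY_EVENT_RECORD& rec = buf.Event.KeyEvent;
        if (rec.bKeyDown) {
            lastDownHadChar = (rec.uChar.UnicodeChar != 0);
            if (lastDownHadChar)
                return { rec.uChar.UnicodeChar, rec.dwControlKeyState };
        } else if (!lastDownHadChar && rec.uChar.UnicodeChar) {
            // character delivered only on the key-up event, as observed above
            return { rec.uChar.UnicodeChar, rec.dwControlKeyState };
        }
    }
}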

I use the function "FillConsoleOutputCharacter" but the application stopped working

I want to make a console application, so I searched the Windows API and got this code, but it stopped working when I run it. What should I do?
Source code:
#include "windows.h"
#include "stdio.h"
#include <conio.h> //console i/o
int main()
{
    HANDLE hOut;
    // Get the standard output device handle
    hOut = GetStdHandle(STD_OUTPUT_HANDLE);
    // Console screen buffer info
    CONSOLE_SCREEN_BUFFER_INFO bInfo;
    // Get the screen buffer info
    GetConsoleScreenBufferInfo(hOut, &bInfo);
    printf("\n\nThe soul selects her own society,\n");
    printf("Then shuts the door;\n");
    printf("On her divine majority\n");
    printf("Obtrude no more.\n\n");
    _getch();
    COORD pos = {0, 0};
    // Fill the window with blanks to get a clear-screen effect
    FillConsoleOutputCharacter(hOut, ' ', bInfo.dwSize.X * bInfo.dwSize.Y, pos, NULL);
    // Close the standard output handle
    CloseHandle(hOut);
    return 0;
}
Could someone tell me how to solve this?
You are passing NULL to the lpNumberOfCharsWritten parameter of FillConsoleOutputCharacter(), but the documentation says:
lpNumberOfCharsWritten [out]
A pointer to a variable that receives the number of characters actually written to the console screen buffer.
It does not say that NULL is valid for that parameter. So give it what it wants - a pointer to a variable, e.g.:
DWORD dwNumWritten;
FillConsoleOutputCharacter(hOut, ' ', bInfo.dwSize.X * bInfo.dwSize.Y, pos, &dwNumWritten);

C++ - Serial (COM) port - asio | VS2015 error(s)

1. What are we trying to achieve (and why)
We're currently trying to communicate with an industrial robot over USB(COM)<->serial(RS232). We would like to control the robot from a C++ application.
2. What setup do we have
We're using Visual Studio 2015 with the built-in C++ compiler, creating a "Win32 Console Application".
3. What steps have we taken?
We've got the connection working in Processing (Java) using Serial but we would like to implement it in C++.
3.1 Boost ASIO
We're using Boost ASIO (installed with NuGet package manager).
At this point we get 2 compile errors indicating the same problem:
Error C2694 'const char *asio::detail::system_category::name(void) const': overriding virtual function has less restrictive exception specification than base class virtual member function 'const char *std::error_category::name(void) noexcept const'
I figured that this error is most likely not caused by my code (I haven't changed the library), so I believe the VS2015 C++ compiler is not fully compatible with boost::asio?
I've found two other links/posts with somewhat the same error:
https://github.com/chriskohlhoff/asio/issues/35
And I tried the following define:
#ifndef ASIO_ERROR_CATEGORY_NOEXCEPT
#define ASIO_ERROR_CATEGORY_NOEXCEPT noexcept(true)
#endif // !defined(ASIO_ERROR_CATEGORY_NOEXCEPT)
Error in websocketpp library and boost in windows Visual Studio 2015
With the following define:
#define ASIO_ERROR_CATEGORY_NOEXCEPT noexcept(true)
//or
#define ASIO_ERROR_CATEGORY_NOEXCEPT 1
But it did not resolve the errors. They even caused a lot of random syntax errors and undeclared identifiers (which would indicate a missing include of <iterator>).
3.2 Windows(base) and C
We've used some C code (and added a little C++ debugging) to detect COM ports, but it simply doesn't show them (they do show up in the device explorer, however). We even had to convert a char array to an LPCWSTR (wtf?).
#include <stdio.h>
#include <cstdio>
#include <iostream>
#include <windows.h>
#include <winbase.h>
wchar_t *convertCharArrayToLPCWSTR(const char* charArray)
{
    wchar_t* wString = new wchar_t[4096];
    MultiByteToWideChar(CP_ACP, 0, charArray, -1, wString, 4096);
    return wString;
}

BOOL COM_exists(int port)
{
    char buffer[7];
    COMMCONFIG CommConfig;
    DWORD size;
    if (!(1 <= port && port <= 255))
    {
        return FALSE;
    }
    snprintf(buffer, sizeof buffer, "COM%d", port);
    size = sizeof CommConfig;
    // COM port exists if GetDefaultCommConfig returns TRUE
    // or changes <size> to indicate COMMCONFIG buffer too small.
    std::cout << "COM" << port << " | " << (GetDefaultCommConfig(convertCharArrayToLPCWSTR(buffer), &CommConfig, &size)
        || size > sizeof CommConfig) << std::endl;
    return (GetDefaultCommConfig(convertCharArrayToLPCWSTR(buffer), &CommConfig, &size)
        || size > sizeof CommConfig);
}

int main()
{
    int i;
    for (i = 1; i < 256; ++i)
    {
        if (COM_exists(i))
        {
            printf("COM%d exists\n", i);
        }
    }
    std::cin.get();
    return 0;
}
3.3 Another Serial.h from the internet
I believe it was from: http://www.codeguru.com/cpp/i-n/network/serialcommunications/article.php/c2503/CSerial--A-C-Class-for-Serial-Communications.htm
Same as before: I include the library and everything compiles fine. (Test written below.)
#include <iostream>
#include <string>
#include "Serial.h"

int main(void)
{
    CSerial serial;
    if (serial.Open(8, 9600))
        std::cout << "Port opened successfully" << std::endl;
    else
        std::cout << "Failed to open port!" << std::endl;
    std::cin.get();
    return 0;
}
But it still doesn't show my COM ports... (They do show up in device explorer though.)
4 So what is actually working?
This particular piece of code WILL display the right COM port...
TCHAR lpTargetPath[5000]; // buffer to store the path of the COM ports
DWORD test;

for (int i = 0; i < 255; i++) // checking ports from COM0 to COM255
{
    CString str;
    str.Format(_T("%d"), i);
    CString ComName = CString("COM") + CString(str); // converting to COM0, COM1, COM2
    test = QueryDosDevice(ComName, lpTargetPath, 5000);
    // Test the return value and error if any
    if (test != 0) // QueryDosDevice returns zero if it didn't find an object
    {
        std::cout << "COM" << i << std::endl; // add to the ComboBox
    }
}
Maybe you need to update to a more recent version of boost if you have not already.
The issue with the second part is your naming of the COM port. Only COM1 to COM4 can be used as a 'bald' name. You need to format it like this:
\\.\COM9
Take care of the escape sequences in the string literal:
snprintf(buffer, sizeof(buffer), "\\\\.\\COM%d", port);
EDIT: Actually you don't need to do that with GetDefaultCommConfig, only with CreateFile for opening the port. It should work; I'd suspect your conversion to a wide string.
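For illustration, here is a minimal sketch of mine (not part of the answer) showing the \\.\ prefix when opening a higher-numbered port with CreateFile:
#include <windows.h>
#include <iostream>

int main()
{
    // The \\.\ prefix is required for ports whose plain name is not accepted.
    HANDLE hCom = CreateFileW(L"\\\\.\\COM9",
                              GENERIC_READ | GENERIC_WRITE,
                              0,              // COM ports cannot be shared
                              NULL,
                              OPEN_EXISTING,  // the port must already exist
                              0,
                              NULL);
    if (hCom == INVALID_HANDLE_VALUE)
    {
        std::cout << "CreateFile failed: " << GetLastError() << std::endl;
        return 1;
    }
    // ... configure with SetCommState and use ReadFile / WriteFile ...
    CloseHandle(hCom);
    return 0;
}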
You may also find a performance enhancement if you load the cfgmgr32.dll library first.
Using CreateFile for COM port detection can result in BSODs on some Windows systems. Particular culprits are some software modems and some Bluetooth devices which show up as COM ports. So using GetDefaultCommConfig is generally the way to go, though it may not work for all ports.
So what else can you do? Use setupapi.dll. Sadly this is not completely trivial.
namespace {
typedef HKEY (__stdcall *OpenDevRegKey)(HDEVINFO, PSP_DEVINFO_DATA, DWORD, DWORD, DWORD, REGSAM);
typedef BOOL (__stdcall *ClassGuidsFromName)(LPCTSTR, LPGUID, DWORD, PDWORD);
typedef BOOL (__stdcall *DestroyDeviceInfoList)(HDEVINFO);
typedef BOOL (__stdcall *EnumDeviceInfo)(HDEVINFO, DWORD, PSP_DEVINFO_DATA);
typedef HDEVINFO (__stdcall *GetClassDevs)(LPGUID, LPCTSTR, HWND, DWORD);
typedef BOOL (__stdcall *GetDeviceRegistryProperty)(HDEVINFO, PSP_DEVINFO_DATA, DWORD, PDWORD, PBYTE, DWORD, PDWORD);
} // namespace
typedef std::basic_string<TCHAR> tstring;
struct PortEntry
{
tstring dev;
tstring name;
bool operator== (tstring const& device) const {
return dev == device; // TODO maybe use case-insentitive compare.
}
bool operator!= (tstring const& device) const {
return !(*this == device);
}
};
typedef std::vector<PortEntry> PortList;
// ...
DllHandler setupapi; // RAII class for LoadLibrary / FreeLibrary
if (!setupapi.load(_T("SETUPAPI.DLL"))) {
throw std::runtime_error("Can\'t open setupapi.dll");
}
OpenDevRegKey fnOpenDevRegKey =
setupapi.GetProc("SetupDiOpenDevRegKey");
ClassGuidsFromName fnClassGuidsFromName =
#ifdef UNICODE
setupapi.GetProc("SetupDiClassGuidsFromNameW");
#else
setupapi.GetProc("SetupDiClassGuidsFromNameA");
#endif
DestroyDeviceInfoList fnDestroyDeviceInfoList =
setupapi.GetProc("SetupDiDestroyDeviceInfoList");
EnumDeviceInfo fnEnumDeviceInfo =
setupapi.GetProc("SetupDiEnumDeviceInfo");
GetClassDevs fnGetClassDevs =
#ifdef UNICODE
setupapi.GetProc("SetupDiGetClassDevsW");
#else
setupapi.GetProc("SetupDiGetClassDevsA");
#endif
GetDeviceRegistryProperty fnGetDeviceRegistryProperty =
#ifdef UNICODE
setupapi.GetProc("SetupDiGetDeviceRegistryPropertyW");
#else
setupapi.GetProc("SetupDiGetDeviceRegistryPropertyA");
#endif
if ((fnOpenDevRegKey == 0) ||
(fnClassGuidsFromName == 0) ||
(fnDestroyDeviceInfoList == 0) ||
(fnEnumDeviceInfo == 0) ||
(fnGetClassDevs == 0) ||
(fnGetDeviceRegistryProperty == 0)
) {
throw std::runtime_error(
"Could not locate required functions in setupapi.dll"
);
}
// First need to get the GUID from the name "Ports"
//
DWORD dwGuids = 0;
(*fnClassGuidsFromName)(_T("Ports"), NULL, 0, &dwGuids);
if (dwGuids == 0)
{
throw std::runtime_error("Can\'t get GUIDs from \'Ports\' key in the registry");
}
// Allocate the needed memory
std::vector<GUID> guids(dwGuids);
// Get the GUIDs
if (!(*fnClassGuidsFromName)(_T("Ports"), &guids[0], dwGuids, &dwGuids))
{
throw std::runtime_error("Can\'t get GUIDs from \'Ports\' key in the registry");
}
// Now create a "device information set" which is required to enumerate all the ports
HDEVINFO hdevinfoset = (*fnGetClassDevs)(&guids[0], NULL, NULL, DIGCF_PRESENT);
if (hdevinfoset == INVALID_HANDLE_VALUE)
{
throw std::runtime_error("Can\'t get create device information set.");
}
// Finished with the guids.
guids.clear();
// Finally do the enumeration
bool more = true;
int index = 0;
SP_DEVINFO_DATA devinfo;
while (more)
{
//Enumerate the current device
devinfo.cbSize = sizeof(SP_DEVINFO_DATA);
more = (0 != (*fnEnumDeviceInfo)(hdevinfoset, index, &devinfo));
if (more)
{
PortEntry entry;
//Did we find a serial port for this device
bool added = false;
//Get the registry key which stores the ports settings
HKEY hdevkey = (*fnOpenDevRegKey)(
hdevinfoset,
&devinfo,
DICS_FLAG_GLOBAL,
0,
DIREG_DEV,
KEY_QUERY_VALUE
);
if (hdevkey)
{
//Read in the name of the port
TCHAR port_name[256];
DWORD size = sizeof(port_name);
DWORD type = 0;
if ((::RegQueryValueEx(
hdevkey,
_T("PortName"),
NULL,
&type,
(LPBYTE) port_name,
&size
) == ERROR_SUCCESS) &&
(type == REG_SZ)
) {
// If it looks like "COMX" then
// add it to the array which will be returned
tstring s = port_name;
size_t len = s.length();
tstring const cmp(s, 0, 3);
if (CaseInsensitiveCompareEqual(tstring(_T("COM")), cmp)) { // case-insensitive compare helper left to the reader
entry.name = s;
entry.dev = _T("\\\\.\\") + s;
added = true;
}
}
// Close the key now that we are finished with it
::RegCloseKey(hdevkey);
}
// If the port was a serial port, then also try to get its friendly name
if (added)
{
TCHAR friendly_name[256];
DWORD size = sizeof(friendly_name);
DWORD type = 0;
if ((fnGetDeviceRegistryProperty)(
hdevinfoset,
&devinfo,
SPDRP_DEVICEDESC,
&type,
(PBYTE)friendly_name,
size,
&size
) &&
(type == REG_SZ)
) {
entry.name += _T(" (");
entry.name += friendly_name;
entry.name += _T(")");
}
//
// Add the port to our vector.
//
// If we already have an entry for the given device, then
// overwrite it (this will add the friendly name).
//
PortList::iterator i = std::find(
ports.begin(),
ports.end(),
entry.dev
);
if (i == ports.end()) {
ports.push_back(entry);
}
else {
(*i) = entry;
}
}
}
++index;
}
// Free up the "device information set" now that we are finished with it
(*fnDestroyDeviceInfoList)(hdevinfoset);
You'll need to do a bit of work to make that compilable, but it should work.
See https://support.microsoft.com/en-us/kb/259695

can't get current keyboard layout

I have tried GetKeyboardLayoutName() and GetKeyboardLayout() for getting the current keyboard layout, but they both give me the default layout and changing the layout doesn't affect the output!
while(1)
{
    Sleep(5);
    for(int i = 8; i < 191; i++)
    {
        if(GetAsyncKeyState(i)&1 ==1)
        {
            TCHAR szKeyboard[KL_NAMELENGTH];
            GetKeyboardLayoutName(szKeyboard);
            if(GetAsyncKeyState(i)&1 ==1)
            {
                TCHAR szKeyboard[KL_NAMELENGTH];
                GetKeyboardLayoutName(szKeyboard);
                cout << szKeyboard << endl;
            }
        }
    }
}
It always gives me "00000409" when the default layout is set to English, while I expect it to be "00000429" when I change the layout to Farsi.
This is my first question here; I usually find all my answers just by searching, but right now I'm going crazy after hours of searching and getting nothing...
One thing you need to notice is that ::GetKeyboardLayout(..) gets the language for the thread identifier passed as a parameter.
Each input thread can have a different input locale language.
For instance, if you put IE in the foreground and press Alt+Shift, the language changes to UK (you can see it in the taskbar).
Now if you Alt+Tab to another window (which will then be in the foreground), you will see that the language doesn't have to stay UK.
So what you need to check is which thread id you are passing.
Look at this code; it will get you the language for the currently active window:
GUITHREADINFO Gti;
::ZeroMemory ( &Gti,sizeof(GUITHREADINFO));
Gti.cbSize = sizeof( GUITHREADINFO );
::GetGUIThreadInfo(0,&Gti);
DWORD dwThread = ::GetWindowThreadProcessId(Gti.hwndActive,0);
HKL lang = ::GetKeyboardLayout(dwThread);
To use GUITHREADINFO you need to define WINVER as 0x500. Put this in stdafx.h before all the includes:
#ifdef WINVER
#undef WINVER
#endif
#define WINVER 0x500
source: GetKeyboardLayout not returning correct language ID (WINXP)
The following code is simple and works fine. Note that if you write a command-line program, the GetKeyboardLayout API doesn't work in the Windows cmd or PowerShell consoles; you can test it in babun (an open-source Windows shell).
#include <Windows.h>
int getInputMethod() {
    HWND hwnd = GetForegroundWindow();
    if (hwnd) {
        DWORD threadID = GetWindowThreadProcessId(hwnd, NULL);
        HKL currentLayout = GetKeyboardLayout(threadID);
        unsigned int x = (unsigned int)currentLayout & 0x0000FFFF;
        return ((int)x);
    }
    return 0;
}
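For example, a small usage sketch of mine (the hex values are just the low words of the HKLs reported in the original question):
#include <stdio.h>

int main() {
    printf("current input language id: %04x\n", getInputMethod()); // e.g. 0x0409 for en-US, 0x040d for he-IL
    return 0;
}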

Convert UTF-16 to UTF-8

I am currently using VC++ 2008 MFC. Because PostgreSQL doesn't support UTF-16 (the encoding used by Windows for Unicode), I need to convert strings from UTF-16 to UTF-8 before storing them.
Here is my code snippet.
// demo.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include "demo.h"
#include "Utils.h"
#include <iostream>
#ifdef _DEBUG
#define new DEBUG_NEW
#endif
// The one and only application object
CWinApp theApp;
using namespace std;
int _tmain(int argc, TCHAR* argv[], TCHAR* envp[])
{
    int nRetCode = 0;
    // initialize MFC and print an error on failure
    if (!AfxWinInit(::GetModuleHandle(NULL), NULL, ::GetCommandLine(), 0))
    {
        // TODO: change error code to suit your needs
        _tprintf(_T("Fatal Error: MFC initialization failed\n"));
        nRetCode = 1;
    }
    else
    {
        // TODO: code your application's behavior here.
    }
    CString utf16 = _T("Hello");
    std::cout << utf16.GetLength() << std::endl;
    CStringA utf8 = UTF8Util::ConvertUTF16ToUTF8(utf16);
    std::cout << utf8.GetLength() << std::endl;
    getchar();
    return nRetCode;
}
and the conversion functions.
namespace UTF8Util
{
//----------------------------------------------------------------------------
// FUNCTION: ConvertUTF8ToUTF16
// DESC: Converts Unicode UTF-8 text to Unicode UTF-16 (Windows default).
//----------------------------------------------------------------------------
CStringW ConvertUTF8ToUTF16( __in const CHAR * pszTextUTF8 )
{
//
// Special case of NULL or empty input string
//
if ( (pszTextUTF8 == NULL) || (*pszTextUTF8 == '\0') )
{
// Return empty string
return L"";
}
//
// Consider CHAR's count corresponding to total input string length,
// including end-of-string (\0) character
//
const size_t cchUTF8Max = INT_MAX - 1;
size_t cchUTF8;
HRESULT hr = ::StringCchLengthA( pszTextUTF8, cchUTF8Max, &cchUTF8 );
if ( FAILED( hr ) )
{
AtlThrow( hr );
}
// Consider also terminating \0
++cchUTF8;
// Convert to 'int' for use with MultiByteToWideChar API
int cbUTF8 = static_cast<int>( cchUTF8 );
//
// Get size of destination UTF-16 buffer, in WCHAR's
//
int cchUTF16 = ::MultiByteToWideChar(
CP_UTF8, // convert from UTF-8
MB_ERR_INVALID_CHARS, // error on invalid chars
pszTextUTF8, // source UTF-8 string
cbUTF8, // total length of source UTF-8 string,
// in CHAR's (= bytes), including end-of-string \0
NULL, // unused - no conversion done in this step
0 // request size of destination buffer, in WCHAR's
);
ATLASSERT( cchUTF16 != 0 );
if ( cchUTF16 == 0 )
{
AtlThrowLastWin32();
}
//
// Allocate destination buffer to store UTF-16 string
//
CStringW strUTF16;
WCHAR * pszUTF16 = strUTF16.GetBuffer( cchUTF16 );
//
// Do the conversion from UTF-8 to UTF-16
//
int result = ::MultiByteToWideChar(
CP_UTF8, // convert from UTF-8
MB_ERR_INVALID_CHARS, // error on invalid chars
pszTextUTF8, // source UTF-8 string
cbUTF8, // total length of source UTF-8 string,
// in CHAR's (= bytes), including end-of-string \0
pszUTF16, // destination buffer
cchUTF16 // size of destination buffer, in WCHAR's
);
ATLASSERT( result != 0 );
if ( result == 0 )
{
AtlThrowLastWin32();
}
// Release internal CString buffer
strUTF16.ReleaseBuffer();
// Return resulting UTF16 string
return strUTF16;
}
//----------------------------------------------------------------------------
// FUNCTION: ConvertUTF16ToUTF8
// DESC: Converts Unicode UTF-16 (Windows default) text to Unicode UTF-8.
//----------------------------------------------------------------------------
CStringA ConvertUTF16ToUTF8( __in const WCHAR * pszTextUTF16 )
{
//
// Special case of NULL or empty input string
//
if ( (pszTextUTF16 == NULL) || (*pszTextUTF16 == L'\0') )
{
// Return empty string
return "";
}
//
// Consider WCHAR's count corresponding to total input string length,
// including end-of-string (L'\0') character.
//
const size_t cchUTF16Max = INT_MAX - 1;
size_t cchUTF16;
HRESULT hr = ::StringCchLengthW( pszTextUTF16, cchUTF16Max, &cchUTF16 );
if ( FAILED( hr ) )
{
AtlThrow( hr );
}
// Consider also terminating \0
++cchUTF16;
//
// WC_ERR_INVALID_CHARS flag is set to fail if invalid input character
// is encountered.
// This flag is supported on Windows Vista and later.
// Don't use it on Windows XP and previous.
//
#if (WINVER >= 0x0600)
DWORD dwConversionFlags = WC_ERR_INVALID_CHARS;
#else
DWORD dwConversionFlags = 0;
#endif
//
// Get size of destination UTF-8 buffer, in CHAR's (= bytes)
//
int cbUTF8 = ::WideCharToMultiByte(
CP_UTF8, // convert to UTF-8
dwConversionFlags, // specify conversion behavior
pszTextUTF16, // source UTF-16 string
static_cast<int>( cchUTF16 ), // total source string length, in WCHAR's,
// including end-of-string \0
NULL, // unused - no conversion required in this step
0, // request buffer size
NULL, NULL // unused
);
ATLASSERT( cbUTF8 != 0 );
if ( cbUTF8 == 0 )
{
AtlThrowLastWin32();
}
//
// Allocate destination buffer for UTF-8 string
//
CStringA strUTF8;
int cchUTF8 = cbUTF8; // sizeof(CHAR) = 1 byte
CHAR * pszUTF8 = strUTF8.GetBuffer( cchUTF8 );
//
// Do the conversion from UTF-16 to UTF-8
//
int result = ::WideCharToMultiByte(
CP_UTF8, // convert to UTF-8
dwConversionFlags, // specify conversion behavior
pszTextUTF16, // source UTF-16 string
static_cast<int>( cchUTF16 ), // total source string length, in WCHAR's,
// including end-of-string \0
pszUTF8, // destination buffer
cbUTF8, // destination buffer size, in bytes
NULL, NULL // unused
);
ATLASSERT( result != 0 );
if ( result == 0 )
{
AtlThrowLastWin32();
}
// Release internal CString buffer
strUTF8.ReleaseBuffer();
// Return resulting UTF-8 string
return strUTF8;
}
} // namespace UTF8Util
However, during runtime, I get the exception at
ATLASSERT( cbUTF8 != 0 );
while trying to get the size of the destination UTF-8 buffer.
What have I missed?
If I am testing using Chinese characters, how can I verify that the resulting UTF-8 string is correct?
You can also use the ATL String Conversion Macros - to convert from UTF-16 to UTF-8 use CW2A and pass CP_UTF8 as the code page, e.g.:
CW2A utf8(buffer, CP_UTF8);
const char* data = utf8.m_psz;
The problem is you specified the WC_ERR_INVALID_CHARS flag:
Windows Vista and later: Fail if an invalid input character is encountered. If this flag is not set, the function silently drops illegal code points. A call to GetLastError returns ERROR_NO_UNICODE_TRANSLATION. Note that this flag only applies when CodePage is specified as CP_UTF8 or 54936 (for Windows Vista and later). It cannot be used with other code page values.
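For illustration, a sketch of mine (not from the answer) that detects this failure explicitly after the size query, using the question's pszTextUTF16 and cchUTF16 variables; per the quoted documentation, retrying with the flags set to 0 makes the function drop the invalid code points instead of failing:
int cbUTF8 = ::WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                                   pszTextUTF16, static_cast<int>(cchUTF16),
                                   NULL, 0, NULL, NULL);
if (cbUTF8 == 0 && ::GetLastError() == ERROR_NO_UNICODE_TRANSLATION)
{
    // The input contains invalid UTF-16 (e.g. unpaired surrogates).
    // Either reject it here, or retry without the strict flag:
    cbUTF8 = ::WideCharToMultiByte(CP_UTF8, 0,
                                   pszTextUTF16, static_cast<int>(cchUTF16),
                                   NULL, 0, NULL, NULL);
}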
Your conversion function seems quite long. How does this one work for you?
//----------------------------------------------------------------------------
// FUNCTION: ConvertUTF16ToUTF8
// DESC: Converts Unicode UTF-16 (Windows default) text to Unicode UTF-8.
//----------------------------------------------------------------------------
CStringA ConvertUTF16ToUTF8( __in LPCWSTR pszTextUTF16 ) {
    if (pszTextUTF16 == NULL) return "";
    int utf16len = wcslen(pszTextUTF16);
    int utf8len = WideCharToMultiByte(CP_UTF8, 0, pszTextUTF16, utf16len,
                                      NULL, 0, NULL, NULL);
    CArray<CHAR> buffer;
    buffer.SetSize(utf8len + 1);
    buffer.SetAt(utf8len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, pszTextUTF16, utf16len,
                        buffer.GetData(), utf8len, 0, 0);
    return buffer.GetData();
}
I see you use a function called StringCchLengthW to get the required length of the output buffer. Most of the places I have looked recommend using the WideCharToMultiByte function itself to tell you how many CHARs it wants.
Edit:
As Rob pointed out, you can use CW2A with the CP_UTF8 code page:
CStringA str = CW2A(wStr, CP_UTF8);
While I'm editing, I can answer your second question:
How can I verify the resultant UTF-8 string is correct?
Write it to a text file, then open it in Mozilla Firefox or an equivalent program. In the View menu, go to Character Encoding and switch manually to UTF-8 (assuming Firefox didn't guess it correctly to begin with). Compare it with a UTF-16 document containing the same text and see if there are any differences.
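A small sketch of mine (the helper is not from the answer) for getting the converted bytes into such a file without any re-encoding:
#include <fstream>

// Write the UTF-8 bytes out untouched so a UTF-8-aware viewer can display them.
void WriteUtf8ToFile(const CStringA& utf8, const char* path)
{
    std::ofstream out(path, std::ios::binary);   // binary mode: no newline translation
    out.write(utf8.GetString(), utf8.GetLength());
}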