Store non-English string in std::string - c++

I have a simple string in std::wstring
std::wstring tempStr = _T("F:\\Projects\\Current_자동_\\Cam.xml");
I want to store this string in a std::string.
I have tried the below code but the result is not the same as input string
std::wstring tempStr = _T("F:\\Projects\\Current_자동_\\Cam.xml");
//setup converter
typedef std::codecvt_utf8_utf16 <wchar_t> convert_type;
std::wstring_convert<convert_type, wchar_t> converter;
//use converter (.to_bytes: wstr->str, .from_bytes: str->wstr)
std::string converted_str = converter.to_bytes( tempStr );
The Korean string present in the input string is converted to "ìžë™".
Is there any way I can get the same string in std::string?
Expected result:
converted_str should contain F:\Projects\Current_자동_\Cam.xml
Below is a screenshot of the debugger showing three values from three conversion attempts. None of them gives the desired value.

Your conversion code is fine. The "ìžë™" you see is the correct UTF-8 byte sequence displayed as if it were Windows-1252 text.
In UTF-8 (the encoding you store in std::string), the characters 자동 correspond to:
자 (UTF-16 0xC790) ---> UTF-8: EC 9E 90
동 (UTF-16 0xB3D9) ---> UTF-8: EB 8F 99
If you run the following program, which just prints the converted UTF-8 bytes, you get this output:
ec 9e 90 eb 8f 99
#include <iomanip> // For std::hex
#include <iostream> // For console output
#include <string> // For STL strings
#include <codecvt> // For Unicode conversions
void print_char_hex(const char ch)
{
auto * p = reinterpret_cast<const unsigned char*>(&ch);
int i = *p;
std::cout << std::hex << i << ' ';
}
int main()
{
std::wstring utf16_str = L"\xC790\xB3D9";
// setup converter
typedef std::codecvt_utf8_utf16<wchar_t> convert_type;
std::wstring_convert<convert_type, wchar_t> converter;
// use converter (.to_bytes: wstr->str, .from_bytes: str->wstr)
std::string converted_str = converter.to_bytes( utf16_str );
// Output the converted bytes (UTF-8)
for (size_t i = 0; i < converted_str.length(); ++i)
{
print_char_hex(converted_str[i]);
}
std::cout << std::endl;
}
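As a cross-check that does not depend on <codecvt>, the bytes printed above can be decoded by hand back into code points. This is only a minimal sketch and not part of the original answer: decode_utf8 is a hypothetical helper name, and it checks the basic byte structure but does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into Unicode code points.
// Only the lead/continuation byte structure is checked; overlong
// encodings and surrogate code points are not rejected.
std::vector<char32_t> decode_utf8(const std::string& bytes)
{
    std::vector<char32_t> code_points;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t trailing = 0;  // continuation bytes that must follow
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; trailing = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; trailing = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; trailing = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + trailing >= bytes.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= trailing; ++k) {
            const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);  // append the 6 payload bits
        }
        code_points.push_back(cp);
        i += trailing + 1;
    }
    return code_points;
}
```

Feeding it the six bytes EC 9E 90 EB 8F 99 gives back U+C790 and U+B3D9, the two code points of 자동, which is why the converted std::string is correct even though a debugger that assumes an ANSI code page shows something else.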

I think the best solution would be to use the wide-char APIs to open the file, e.g. CreateFileW(...);, because then you can use the wide-char file name directly.
If this is not possible, maybe the string should not be converted to UTF8, but to the system default ANSI code page.
I think this might work:
char out[200];
const wchar_t* in = L"F:\\Projects\\Current_자동_\\Cam.xml";
// -1 means "in" is null-terminated; sizeof(out) is the output buffer size in bytes
WideCharToMultiByte(CP_ACP, 0, in, -1, out, sizeof(out), 0, 0);
or maybe another Korean code page:
WideCharToMultiByte(949, 0, in, -1, out, sizeof(out), 0, 0);
WideCharToMultiByte(1361, 0, in, -1, out, sizeof(out), 0, 0);
WideCharToMultiByte(10003, 0, in, -1, out, sizeof(out), 0, 0);
WideCharToMultiByte(20833, 0, in, -1, out, sizeof(out), 0, 0);
WideCharToMultiByte(20949, 0, in, -1, out, sizeof(out), 0, 0);
WideCharToMultiByte(50225, 0, in, -1, out, sizeof(out), 0, 0);
WideCharToMultiByte(50933, 0, in, -1, out, sizeof(out), 0, 0);
WideCharToMultiByte(51949, 0, in, -1, out, sizeof(out), 0, 0);
The code page ids can be found here:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx
good luck :-)

This works. You can tell because the conversion back to UTF-16 is valid. If you write the UTF-8 string to a file, it will also display properly, which gives you a second way of validating the result.
// UTF16ToUTF8.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <windows.h>
#include <iostream>
#include <codecvt>
std::wstring ToUTF16(const std::string &data)
{
return std::wstring_convert<std::codecvt_utf8<wchar_t>>().from_bytes(data);
}
std::string ToUTF8(const std::wstring &data)
{
return std::wstring_convert<std::codecvt_utf8<wchar_t>>().to_bytes(data);
}
int _tmain(int argc, _TCHAR* argv[])
{
std::wstring u16 = L"_자동_";
std::string u8 = ToUTF8(u16);
MessageBoxW(NULL, ToUTF16(u8).c_str(), L"", 0);
std::cin.get();
return 0;
}

You can store UTF-8 in a std::string as a regular char sequence. The library at http://utfcpp.sourceforge.net/ provides the utilities you may want on top of that, such as length() and everything related to indexing.
For the Windows console, set the code page to 65001 so that it interprets output as UTF-8.
For better or worse, std::wstring and wchar_t do not mandate any specific encoding.
By the way, you're using Managed C++, so why not use the .NET Framework's System::String^? It avoids these encoding problems. http://msdn.microsoft.com/ru-ru/library/system.string(v=vs.110).aspx?cs-save-lang=1&cs-lang=cpp

The problem is not in your string conversion code. This is a typical source file encoding problem. Visual Studio does not save source files as Unicode by default, so you should convert your source file's encoding to UTF-8 yourself. To do that, open the file in Notepad++ and click Encoding -> Convert to UTF-8.
Note 1: In VS2010 and VS2012, if you write non-ASCII characters to a source file, Visual Studio warns you and offers to make this conversion.
Note 2: From your use of the _T() macro, I assume this code targets Windows only. If you try to build UTF-8 encoded source files that contain a BOM with gcc, you may get different errors. In any case, the best approach would be to read your UTF-8 encoded text data from a file at run time.

Related

Can't write chinese character into textfile with wofstream

I'm using std::wofstream to write characters to a text file. My strings can contain characters from very different languages (English to Chinese).
I want to print my vector<wstring> into that file.
If my vector contains only English characters, I can print them without a problem.
But if I write Chinese characters, my file remains empty.
I browsed through Stack Overflow, and all the answers basically said to use functions from the library:
#include <codecvt>
I can't include that library, because I am using Dev-C++ version 5.11.
I added #define UNICODE to all my header files.
I guess there is a really simple solution for that problem.
It would be great if someone could help me out.
My code:
#define UNICODE
#include <string>
#include <fstream>
using namespace std;
int main()
{
string Path = "D:\\Users\\\t\\Desktop\\korrigiert_RotCommon_zh_check_error.log";
wofstream Out;
wstring eng = L"hello";
wstring chi = L"程序";
Out.open(Path, ios::out);
//works.
Out << eng;
//fails
Out << chi;
Out.close();
return 0;
}
Kind Regards
Even though the name wofstream implies a wide-character stream, it is not one. It is still a char stream that uses a conversion facet from a locale to convert the wchar_t characters to char.
Here is what cppreference says:
All file I/O operations performed through std::basic_fstream<CharT> use the std::codecvt<CharT, char, std::mbstate_t> facet of the locale imbued in the stream.
So you could either set the global locale to one that supports Chinese or imbue the stream. In both cases the file itself remains a byte stream.
#include <locale>
//...
const std::locale loc = std::locale(std::locale(), new std::codecvt_utf8<wchar_t>);
Out.open(Path, ios::out);
Out.imbue(loc);
Unfortunately, std::codecvt_utf8 is deprecated as of C++17. The MSDN Magazine article "C++ - Unicode Encoding Conversions with STL Strings and Win32 APIs" explains how to do UTF-8 conversion using MultiByteToWideChar.
Here is the Microsoft/vcpkg variant of a to_utf8 conversion:
std::string to_utf8(const CWStringView w)
{
const size_t size = WideCharToMultiByte(CP_UTF8, 0, w.c_str(), -1, nullptr, 0, nullptr, nullptr);
std::string output;
output.resize(size - 1);
WideCharToMultiByte(CP_UTF8, 0, w.c_str(), -1, output.data(), size - 1, nullptr, nullptr);
return output;
}
Alternatively, you can use a normal binary stream and write the wstring data with write(). On Windows, where wchar_t is 16 bits wide, this produces a UTF-16 file.
std::ofstream Out(Path, ios::out | ios::binary);
const uint16_t bom = 0xFEFF;
Out.write(reinterpret_cast<const char*>(&bom), sizeof(bom)); // optional Byte order mark
Out.write(reinterpret_cast<const char*>(chi.data()), chi.size() * sizeof(wchar_t));
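Since the question rules out <codecvt> and a Windows-only API may not be wanted either, the UTF-8 encoding can also be written by hand and sent through a plain binary std::ofstream. The following is a minimal sketch under that assumption; wide_to_utf8 and write_utf8_file are hypothetical helper names that do not come from the answers here, and invalid input such as unpaired surrogates is not validated.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: encode a wide string as UTF-8 without <codecvt>.
// Handles 16-bit wchar_t (UTF-16, as on Windows) by combining surrogate
// pairs, and 32-bit wchar_t (UTF-32) directly. Input is not validated.
std::string wide_to_utf8(const std::wstring& wide)
{
    std::string out;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(wide[i]);
        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < wide.size()) {
            const std::uint32_t low = static_cast<std::uint32_t>(wide[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Write the UTF-8 bytes through a narrow stream opened in binary mode,
// so that no locale facet is involved in the output.
bool write_utf8_file(const std::string& path, const std::wstring& text)
{
    std::ofstream out(path.c_str(), std::ios::out | std::ios::binary);
    if (!out) return false;
    const std::string bytes = wide_to_utf8(text);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}
```

With the question's variables, write_utf8_file(Path, eng + chi) would replace the wofstream entirely. The design choice is to keep the stream narrow and do the encoding explicitly, so the result does not depend on which locales the toolchain ships.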
You forgot to tell your stream what locale to use:
Out.imbue(std::locale("zh_CN.UTF-8"));
You'll obviously need to include <locale> for this.

On Windows, stat and GetFileAttributes fail for paths containing strange characters

The code below demonstrates how stat and GetFileAttributes fail when the path contains some strange (but valid) ASCII characters.
As a workaround, I would use the 8.3 DOS file name. But this does not work when the drive has 8.3 names disabled.
(8.3 names are disabled with the fsutil command: fsutil behavior set disable8dot3 1).
Is it possible to get stat and/or GetFileAttributes to work in this case?
If not, is there another way of determining whether or not a path is a directory or file?
#include "stdafx.h"
#include <sys/stat.h>
#include <string>
#include <Windows.h>
#include <atlpath.h>
std::wstring s2ws(const std::string& s)
{
int len;
int slength = (int)s.length() + 1;
len = MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, 0, 0);
wchar_t* buf = new wchar_t[len];
MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, buf, len);
std::wstring r(buf);
delete[] buf;
return r;
}
// The final characters in the path below are 0xc3 (Ã) and 0x3f (?).
// Create a test directory with the name à and set TEST_DIR below to your test directory.
const char* TEST_DIR = "D:\\tmp\\VisualStudio\\TestProject\\ConsoleApplication1\\test_data\\Ã";
int main()
{
std::string testDir = TEST_DIR;
// test stat and _wstat
struct stat st;
const auto statSucceeded = stat(testDir.c_str(), &st) == 0;
if (!statSucceeded)
{
printf("stat failed\n");
}
std::wstring testDirW = s2ws(testDir);
struct _stat64i32 stW;
const auto statSucceededW = _wstat(testDirW.data(), &stW) == 0;
if (!statSucceededW)
{
printf("_wstat failed\n");
}
// test PathIsDirectory
const auto isDir = PathIsDirectory(testDirW.c_str()) != 0;
if (!isDir)
{
printf("PathIsDirectory failed\n");
}
// test GetFileAttributes
const auto fileAttributes = ::GetFileAttributes(testDirW.c_str());
const auto getFileAttributesWSucceeded = fileAttributes != INVALID_FILE_ATTRIBUTES;
if (!getFileAttributesWSucceeded)
{
printf("GetFileAttributes failed\n");
}
return 0;
}
The problem you have encountered comes from using the MultiByteToWideChar function. Using CP_ACP can default to a code page that does not support some characters. If you change the default system code page to UTF8, your code will work. Since you cannot tell your clients what code page to use, you can use a third party library such as International Components for Unicode to convert from the host code page to UTF16.
I ran your code using console code page 65001 and VS2015 and your code worked as written. I also added positive printfs to verify that it did work.
Don't start with a narrow string literal and try to convert it, start with a wide string literal - one that represents the actual filename. You can use hexadecimal escape sequences to avoid any dependency on the encoding of the source code.
If the actual code doesn't use string literals, the best resolution depends on the situation; for example, if the file name is being read from a file, you need to make sure that you know what encoding the file is in and perform the conversion accordingly.
If the actual code reads the filename from the command line arguments, you can use wmain() instead of main() to get the arguments as wide strings.
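For the "is it a directory?" part of the question, one more option is std::filesystem, assuming a C++17 toolchain is available (the code in the question predates it). This is a minimal sketch: is_existing_directory is a hypothetical helper name, and the path in the usage note is made up for illustration.

```cpp
#include <filesystem>
#include <system_error>

// Hypothetical helper: true when `p` names an existing directory.
// The error_code overload is used so that a missing or inaccessible
// path yields false instead of throwing an exception.
bool is_existing_directory(const std::filesystem::path& p)
{
    std::error_code ec;
    return std::filesystem::is_directory(p, ec);
}
```

On Windows the path can be built from a wide literal with a hexadecimal escape, for example is_existing_directory(L"test_data\\\xC3"), which keeps the name independent of both the source file encoding and the ANSI code page.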

Renaming a file with an en dash in the name in C++

In the project I'm working on, I work with files and I check if they exist before proceeding. Renaming or even working with files that have an en dash in the file path seems impossible.
std::string _old = "D:\\Folder\\This – by ABC.txt";
std::rename(_old.c_str(), "New.txt");
Here the _old variable is interpreted as D:\Folder\This û by ABC.txt
I tried
setlocale(LC_ALL, "");
//and
setlocale(LC_ALL, "C");
//or
setlocale(LC_ALL, "en_US.UTF-8");
but none of them worked. What should be done?
It depends on the operating system. On Linux, file names are plain byte arrays: forget about encoding and just rename the file.
But it seems you are using Windows, where a file name is a null-terminated string of 16-bit characters. In this case the best way is to use wstring instead of messing with encodings.
Don't try to write platform-independent code to solve platform-specific problems. Windows uses Unicode for file names, so you have to write platform-specific code instead of using the standard function rename.
Just write L"D:\\Folder\\This \u2013 by ABC.txt" and call _wrename.
The Windows ANSI Western encoding has the Unicode n-dash, U+2013, “–”, as code point 150 (decimal). When you output that to a console with active code page 437, the original IBM PC character set, or compatible, then it's interpreted as an “û”. So you have the right codepage 1252 character in your string literal, either because
you're using Visual C++, which defaults to the Windows ANSI codepage for encoding narrow string literals, or
you're using an old version of g++ that doesn't do the standard-mandated conversions and checking but just passes narrow character bytes directly through its machinery, and your source code is encoded as Windows ANSI Western (or compatible), or
something I didn't think of.
For either of the first two possibilities the rename call will work.
I tested that it does indeed work with Visual C++. I do not have an old version of g++ around, but I tested that it works with version 5.1. That is, I tested that the file is really renamed to New.txt.
// Source encoding: UTF-8
// Execution character set: Windows ANSI Western a.k.a. codepage 1252.
#include <stdio.h> // rename
#include <stdlib.h> // EXIT_SUCCESS, EXIT_FAILURE
#include <string> // std::string
using namespace std;
auto main()
-> int
{
string const a = ".\\This – by ABC.txt"; // Literal encoded as CP 1252.
return rename( a.c_str(), "New.txt" ) == 0? EXIT_SUCCESS : EXIT_FAILURE;
}
Example:
[C:\my\forums\so\265]
> dir /b *.txt
File Not Found
[C:\my\forums\so\265]
> g++ r.cpp -fexec-charset=cp1252
[C:\my\forums\so\265]
> type nul >"This – by ABC.txt"
[C:\my\forums\so\265]
> run a
Exit code 0
[C:\my\forums\so\265]
> dir /b *.txt
New.txt
[C:\my\forums\so\265]
> _
… where run is just a batch file that reports the exit code.
If your Windows ANSI codepage is not codepage 1252, then you need to use your particular Windows ANSI codepage.
You can check the Windows ANSI codepage via the GetACP API function, or e.g. via this command:
[C:\my\forums\so\265]
> wmic os get codeset /value | find "="
CodeSet=1252
[C:\my\forums\so\265]
> _
The code will work if that codepage supports the n-dash character.
This model of coding is based on having one version of the executable for each relevant main locale (including character encoding).
An alternative is to do everything in Unicode. This can be done portably via Boost file system, which will be adopted into the standard library in C++17. Or you can use the Windows API, or de facto standard extensions to the standard library in Windows, i.e. _wrename.
Example of using the experimental file system module with Visual C++ 2015:
// Source encoding: UTF-8
// Execution character set: irrelevant (everything's done in Unicode).
#include <stdlib.h> // EXIT_SUCCESS, EXIT_FAILURE
#include <filesystem> // In C++17 and later, or Visual C++ 2015 and later.
using namespace std::tr2::sys;
auto main()
-> int
{
path const old_path = L".\\This – by ABC.txt"; // Literal encoded as wide string.
path const new_path = L"New.txt";
try
{
rename( old_path, new_path );
return EXIT_SUCCESS;
}
catch( ... )
{}
return EXIT_FAILURE;
}
To do this properly for portable code you can use Boost, or you can create a wrapper header that uses whatever implementation is available.
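With a C++17 compiler, the same idea can be written against std::filesystem instead of the experimental std::tr2::sys namespace. This is a minimal sketch under that assumption; rename_path is a hypothetical helper name, and it uses the non-throwing overload rather than the try/catch shown above.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: rename `from` to `to`.
// Returns true on success; file-system errors are reported through the
// error_code overload and turned into a false return value.
bool rename_path(const fs::path& from, const fs::path& to)
{
    std::error_code ec;
    fs::rename(from, to, ec);
    return !ec;
}
```

On Windows the call would look like rename_path(L".\\This \u2013 by ABC.txt", L"New.txt"). Writing the en dash as \u2013 in a wide literal makes the result independent of the source file encoding.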
It is really platform dependent, and Unicode is a headache. It also depends on which compiler you use. For older Microsoft compilers (VS2010 or older), you need to use the API described in MSDN. This test example creates a file with the name you have a problem with, then renames it:
// #define _UNICODE // might be defined in project
#include <string>
#include <tchar.h>
#include <windows.h>
using namespace std;
// Convert a wide Unicode string to an UTF8 string
std::string utf8_encode(const std::wstring &wstr)
{
if( wstr.empty() ) return std::string();
int size_needed = WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), NULL, 0, NULL, NULL);
std::string strTo( size_needed, 0 );
WideCharToMultiByte (CP_UTF8, 0, &wstr[0], (int)wstr.size(), &strTo[0], size_needed, NULL, NULL);
return strTo;
}
// Convert an UTF8 string to a wide Unicode String
std::wstring utf8_decode(const std::string &str)
{
if( str.empty() ) return std::wstring();
int size_needed = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0);
std::wstring wstrTo( size_needed, 0 );
MultiByteToWideChar (CP_UTF8, 0, &str[0], (int)str.size(), &wstrTo[0], size_needed);
return wstrTo;
}
int _tmain(int argc, _TCHAR* argv[] ) {
std::string pFileName = "C:\\This \xe2\x80\x93 by ABC.txt";
std::wstring pwsFileName = utf8_decode(pFileName);
// CreateFile can be used instead when UNICODE is defined
HANDLE hf = CreateFileW( pwsFileName.c_str() ,
GENERIC_READ | GENERIC_WRITE,
0,
0,
CREATE_NEW,
FILE_ATTRIBUTE_NORMAL,
0);
CloseHandle(hf);
MoveFileW(utf8_decode("C:\\This \xe2\x80\x93 by ABC.txt").c_str(), utf8_decode("C:\\This \xe2\x80\x93 by ABC 2.txt").c_str());
}
Here is a variant of those helpers that passes -1 as the input length, so the API treats the input as a null-terminated string. The target buffer still has to be sized by asking the API first:
std::string utf8_encode(const std::wstring &wstr)
{
    // With -1 as the input length, the returned size includes the terminating null.
    int size_needed = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), -1, NULL, 0, NULL, NULL);
    if( size_needed <= 0 ) return std::string();
    std::string strTo( size_needed, 0 );
    WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), -1, &strTo[0], size_needed, NULL, NULL);
    strTo.resize( size_needed - 1 ); // drop the terminating null written by the API
    return strTo;
}
// Convert an UTF8 string to a wide Unicode String
std::wstring utf8_decode(const std::string &str)
{
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, NULL, 0);
    if( size_needed <= 0 ) return std::wstring();
    std::wstring wstrTo( size_needed, 0 );
    MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, &wstrTo[0], size_needed);
    wstrTo.resize( size_needed - 1 ); // drop the terminating null written by the API
    return wstrTo;
}
The usual pitfall is the size of the target buffer. A call to WideCharToMultiByte with 0 as the size of the target buffer returns the number of bytes the conversion needs. The length of the wide string is not a substitute, because one UTF-16 code unit can take up to three bytes in UTF-8. All this juggling explains why frameworks like Qt ended up with such convoluted code to support Unicode-based file systems. The most cost-effective way to avoid these bugs is to use such a framework.
For VS2015:
std::string _old = u8"D:\\Folder\\This \xe2\x80\x93 by ABC.txt"s;
according to their docs; I can't check that one.
For MinGW:
std::string _old = u8"D:\\Folder\\This \xe2\x80\x93 by ABC.txt";
std::cout << _old.data();
The output contains the proper file name, but for the file API you still need to do the proper conversion.

Get the user's codepage name for functions in boost::locale::conv

The task at hand
I'm parsing a filename from an UTF-8 encoded XML on Windows. I need to pass that filename on to a function that I can't change. Internally it uses _fsopen() which does not support Unicode strings.
Current approach
My current approach is to convert the filename to the user's charset hoping that the filename is representable in that encoding. I'm then using boost::locale::conv::from_utf() to convert from UTF-8 and I'm using boost::locale::util::get_system_locale() to get the name of the current locale.
Life is good?
I'm on a German system using code page Windows-1252 thus get_system_locale() correctly yields de_DE.windows-1252. If I test the approach with a filename containing an umlaut everything works as expected.
The Problem
Just to make sure I switched my system locale to Ukrainian which uses code page Windows-1251. Using some Cyrillic letter in the filename my approach fails. The reason is that get_system_locale() still yields de_DE.windows-1252 which is now incorrect.
On the other side GetACP() correctly yields 1252 for the German locale and 1251 for the Ukrainian locale. I also know that Boost.Locale can convert to a given locale as this small test program works as I expect:
#include <boost/locale.hpp>
#include <iostream>
#include <string>
#include <windows.h>
int main()
{
std::cout << "Codepage: " << GetACP() << std::endl;
std::cout << "Boost.Locale: " << boost::locale::util::get_system_locale() << std::endl;
namespace blc = boost::locale::conv;
// Cyrillic small letter zhe -> \xe6 (ж on 1251, æ on 1252)
std::string const test1251 = blc::from_utf(std::string("\xd0\xb6"), "windows-1251");
std::cout << "1251: " << static_cast<int>(test1251.front()) << std::endl;
// Latin small letter sharp s -> \xdf (Я on 1251, ß on 1252)
auto const test1252 = blc::from_utf(std::string("\xc3\x9f"), "windows-1252");
std::cout << "1252: " << static_cast<int>(test1252.front()) << std::endl;
}
Questions
How can I query the name of the user locale in a format Boost.Locale supports? Using std::locale("").name() yields German_Germany.1252, using it results in a boost::locale::conv::invalid_charset_error exception.
Is it possible that the system locale remains de_DE.windows-1252 although I'm supposedly changing it as local admin? Similarly system language is German although my account's language is English. (Log in screen is German until I log in)
Should I stick with using short filenames? That does not seem to work reliably, though.
Fine-print
Compiler is MSVC18
Boost is version 1.56.0, backend supposedly winapi
System is Win7, system language is German, user language English
ANSI is deprecated so don't bother with it.
Windows uses UTF16, you must convert from UTF8 to UTF16 using MultiByteToWideChar. This conversion is safe.
std::wstring getU16(const std::string &str)
{
if (str.empty()) return std::wstring();
int sz = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), 0, 0);
std::wstring res(sz, 0);
MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &res[0], sz);
return res;
}
You then use _wfsopen (from the link you provided) to open file with UTF16 name.
int main()
{
//UTF8 source:
std::string filename_u8;
//This line works in VS2015 only
//For older version comment out the next line, obtain UTF8 from another source
filename_u8 = u8"c:\\test\\__ελληνικά.txt";
//convert to UTF16
std::wstring filename_utf16 = getU16(filename_u8);
FILE *file = NULL;
_wfopen_s(&file, filename_utf16.c_str(), L"w");
if (file)
{
//Add BOM, optional...
//Write the file name in to file, for testing...
fwrite(filename_u8.data(), 1, filename_u8.length(), file);
fclose(file);
}
else
{
std::cout << "access denied, or folder doesn't exist...\n";
}
return 0;
}
Edit, getting ANSI from UTF8, using GetACP()
std::wstring string_to_wstring(const std::string &str, int codepage)
{
if (str.empty()) return std::wstring();
int sz = MultiByteToWideChar(codepage, 0, &str[0], (int)str.size(), 0, 0);
std::wstring res(sz, 0);
MultiByteToWideChar(codepage, 0, &str[0], (int)str.size(), &res[0], sz);
return res;
}
std::string wstring_to_string(const std::wstring &wstr, int codepage)
{
if (wstr.empty()) return std::string();
int sz = WideCharToMultiByte(codepage, 0, &wstr[0], (int)wstr.size(), 0, 0, 0, 0);
std::string res(sz, 0);
WideCharToMultiByte(codepage, 0, &wstr[0], (int)wstr.size(), &res[0], sz, 0, 0);
return res;
}
std::string get_ansi_from_utf8(const std::string &utf8, int codepage)
{
std::wstring utf16 = string_to_wstring(utf8, CP_UTF8);
std::string ansi = wstring_to_string(utf16, codepage);
return ansi;
}
Barmak's way is the best way to do it.
To clear up the locale stuff, the process always starts with the "C" locale. You can use the setlocale function to set the locale to the system default or any arbitrary locale.
#include <clocale>
// Get the current locale
setlocale(LC_ALL,NULL);
// Set locale to system default
setlocale(LC_ALL,"");
// Set locale to German
setlocale(LC_ALL,"de-DE");
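Returning to the original question of a charset name that boost::locale::conv accepts: the test program in the question already shows that names of the form "windows-1251" work, so one option is to build that name from the code page number. The following is only a sketch under that assumption. charset_from_codepage is a hypothetical helper, it covers just the Windows-125x family, and whether every name in that family is accepted by a given Boost.Locale backend is untested here.

```cpp
#include <string>

// Hypothetical helper: build a charset name such as "windows-1251" from a
// Windows code page number (for example the value returned by GetACP()).
// Only the Windows-125x family is handled; any other code page yields an
// empty string so that the caller can fall back to something else.
std::string charset_from_codepage(unsigned codepage)
{
    if (codepage >= 1250 && codepage <= 1258)
        return "windows-" + std::to_string(codepage);
    return std::string();
}
```

On Windows the result of charset_from_codepage(GetACP()) could then be passed as the second argument of boost::locale::conv::from_utf, with a check for the empty string first.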

C++: socket encoding (working with TeamSpeak)

As I'm currently working on a program for a TeamSpeak server, I need to retrieve the names of the currently online users, which I'm doing with sockets. That's working fine so far. In my UI I'm displaying all clients in a ListBox, which basically works. Nevertheless, I'm having problems with incorrectly displayed characters and symbols in the ListBox.
I'm using the following code:
//...
auto getClientList() -> void{
i = 0;
queryString.str("");
queryString.clear();
queryString << clientlist << " \n";
send(sock, queryString.str().c_str(), strlen(queryString.str().c_str()), NULL);
TeamSpeak::getAnswer(1);
while(p_1 != -1){
p_1 = lastLog.find(L"client_nickname=", sPos + 1);
if(p_1 != -1){
sPos = p_1;
p_2 = lastLog.find(L" ", p_1);
temporary = lastLog.substr(p_1 + 16, p_2 - (p_1 + 16));
users[i].assign(temporary.begin(), temporary.end());
SendMessage(hwnd_2, LB_ADDSTRING, (WPARAM)NULL, (LPARAM)(LPTSTR)(users[i].c_str()));
i++;
}
else{
sPos = 0;
p_1 = 0;
break;
}
}
TeamSpeak::getAnswer(0);
}
//...
I've already checked lastLog, temporary and users[i] (by writing them to a file), and none of them has an encoding problem with characters or symbols (for example Andrè). If I add a string directly: SendMessage(hwnd_2, LB_ADDSTRING, (WPARAM)NULL, (LPARAM)(LPTSTR)L"Andrè"), it is displayed correctly in the ListBox. What might be the issue here? Is it a problem with my code or something else?
Update 1: I recently continued working on this problem and looked at the word Olè! as received from the socket. The result I got is the following: O (79) | l (108) | � (-61) | � (-88) | ! (33). How can I convert this char array to a wstring containing the correct characters?
Solution: As @isanae mentioned in his post, the std::wstring_convert template did the trick for me. Thank you very much!
Many things can go wrong in this code, and you don't show much of it. What's particularly lacking is the definition of all those variables.
Assuming that users[i] contains meaningful data, you also don't say how it is encoded. Is it ASCII? UTF-8? UTF-16? The fact that you can output it to a file and read it with an editor doesn't mean anything, as most editors are able to guess at encoding.
If it really is UTF-16 (the native encoding on Windows), then I see no reason for this code not to work. One way to check would be to break into the debugger and look at the individual bytes in users[i]. If you see every character with a value less than 128 followed by a 0, then it's probably UTF-16.
If it is not UTF-16, then you'll need to convert it. There are a variety of ways to do this, but MultiByteToWideChar may be the easiest. Make sure you set the codepage to same encoding used by the sender. It may be CP_UTF8, or an actual codepage.
Note also that hardcoding a string with non-ASCII characters doesn't help you much either, as you'd first have to find out the encoding of the file itself. I know some versions of Visual C++ will convert your source file to UTF-16 if it encounters non-ASCII characters, which may be what happened to you.
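The debugger check described above (ASCII-range bytes each followed by a zero byte) can also be done in code. This is only a rough heuristic sketch: looks_like_utf16le is a hypothetical helper, and the "more than half of the pairs" threshold is an arbitrary assumption, so treat the result as a hint and not as proof of the encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical heuristic: does this raw byte buffer look like UTF-16LE
// text that is mostly ASCII? In such data every ASCII character is stored
// as its byte value followed by a zero byte.
bool looks_like_utf16le(const std::string& bytes)
{
    if (bytes.empty() || bytes.size() % 2 != 0)
        return false;  // UTF-16 data always has an even number of bytes

    std::size_t ascii_pairs = 0;
    const std::size_t pairs = bytes.size() / 2;
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        const unsigned char low = static_cast<unsigned char>(bytes[i]);
        const unsigned char high = static_cast<unsigned char>(bytes[i + 1]);
        if (low < 128 && high == 0)
            ++ascii_pairs;
    }
    // Arbitrary threshold: more than half of the pairs look like ASCII.
    return ascii_pairs * 2 > pairs;
}
```

A five-byte buffer such as the UTF-8 form of Olè! fails the check at once because its length is odd, which matches the diagnosis that the socket data is not UTF-16.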
O (79) | l (108) | � (-61) | � (-88) | ! (33).
How can I convert this char array to a wstring containing the correct characters?
This is a UTF-8 string. It has to be converted to UTF-16 so Windows can use it.
This is a portable, C++11 solution on implementations where sizeof(wchar_t) == 2. If this is not the case, then char16_t and std::u16string may be used, but the most recent version of Visual C++ as of this writing (2015 RC) doesn't implement std::codecvt for char16_t and char32_t.
#include <string>
#include <codecvt>
std::wstring utf8_to_utf16(const std::string& s)
{
static_assert(sizeof(wchar_t)==2, "wchar_t needs to be 2 bytes");
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> conv;
return conv.from_bytes(s);
}
std::string utf16_to_utf8(const std::wstring& s)
{
static_assert(sizeof(wchar_t)==2, "wchar_t needs to be 2 bytes");
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> conv;
return conv.to_bytes(s);
}
Windows-only:
#include <string>
#include <cassert>
#include <memory>
#include <codecvt>
#include <Windows.h>
std::wstring utf8_to_utf16(const std::string& s)
{
// getting the required size in characters (not bytes) of the
// output buffer
const int size = ::MultiByteToWideChar(
CP_UTF8, 0, s.c_str(), static_cast<int>(s.size()),
nullptr, 0);
// error handling
assert(size != 0);
// creating a buffer with enough characters in it
std::unique_ptr<wchar_t[]> buffer(new wchar_t[size]);
// converting from utf8 to utf16
const int written = ::MultiByteToWideChar(
CP_UTF8, 0, s.c_str(), static_cast<int>(s.size()),
buffer.get(), size);
// error handling
assert(written != 0);
return std::wstring(buffer.get(), buffer.get() + written);
}
std::string utf16_to_utf8(const std::wstring& ws)
{
// getting the required size in bytes of the output buffer
const int size = ::WideCharToMultiByte(
CP_UTF8, 0, ws.c_str(), static_cast<int>(ws.size()),
nullptr, 0, nullptr, nullptr);
// error handling
assert(size != 0);
// creating a buffer with enough characters in it
std::unique_ptr<char[]> buffer(new char[size]);
// converting from utf16 to utf8
const int written = ::WideCharToMultiByte(
CP_UTF8, 0, ws.c_str(), static_cast<int>(ws.size()),
buffer.get(), size, nullptr, nullptr);
// error handling
assert(written != 0);
return std::string(buffer.get(), buffer.get() + written);
}
Test:
// utf-8 string
const std::string s = {79, 108, -61, -88, 33};
::MessageBoxW(0, utf8_to_utf16(s).c_str(), L"", MB_OK);