How do I convert wchar_t* to string? - c++

I am new to C++ and am trying to convert a wchar_t* to a std::string.
I cannot use wstring in my situation.
I have the code below:
const wchar_t *wide = L"中文";
wstring ret = wstring( wide );
string str2( ret.begin(), ret.end() );
But str2 contains some strange characters.
What do I have to fix?

You're trying to do it backwards. Instead of truncating wide characters to chars (which is very lossy), expand your chars to wide characters.
That is, transform your std::string into an std::wstring and concatenate the two std::wstrings.

I'm not sure which platform you're targeting. If you're on Windows, you can call the WideCharToMultiByte API function; refer to MSDN for the documentation.
If you're on Linux, I think you can use the libiconv functions; try searching for them.
There is also a port of libiconv for Windows.
In general this is quite a complex topic for a beginner who knows nothing about character encodings; there is a lot of background knowledge to learn.
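As a platform-neutral illustration of what such a conversion does, here is a minimal sketch that encodes a wchar_t string as UTF-8 by hand. The names append_utf8 and wide_to_utf8 are hypothetical helpers, not part of any library. The sketch assumes wchar_t holds UTF-16 code units (as on Windows) or UTF-32 code points (as on most Linux systems), and it does not validate its input: unpaired surrogates and out-of-range values are encoded as they come.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append one Unicode code point to `out` as UTF-8.
static void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a null-terminated wide string to UTF-8.
std::string wide_to_utf8(const wchar_t* wide) {
    std::string out;
    if (!wide) return out;
    for (std::size_t i = 0; wide[i] != L'\0'; ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(wide[i]);
        // With 16-bit wchar_t, characters outside the BMP arrive as a
        // surrogate pair; combine the pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            std::uint32_t lo = static_cast<std::uint32_t>(wide[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

Calling wide_to_utf8(L"中文") yields a six-byte std::string (three bytes per character). Whether those bytes then display correctly depends on the console or editor interpreting them as UTF-8.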

Related

Converting UTF16(Windows wchar_t) to UTF8 in C++ Non-English letters corrupted(Korean)

I'm trying to make a multiplatform app. On the Windows Store App (WinRT) side, I open a file and read its path as a Platform::String, which is wchar_t-based, i.e. UTF-16 on Windows.
Since my core logic is platform independent and only uses standard C++ data types, I've converted the path into a UTF-8 std::string with this code:
Platform::String^ copyPath = copy->Path;
std::wstring source(copyPath->Data());
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t >, wchar_t > convert;
std::string u8CopyPath = convert.to_bytes(source);
However, when I check u8CopyPath in the debugger, it shows corrupted letters for non-English characters. As far as I know, UTF-8 is perfectly capable of encoding non-English languages, since it can use multiple bytes for a single letter. Is there something in the conversion that corrupts the non-English letters?
It turns out it was just a debugger display issue. Once I wrote the string to a file and examined it, it came out correctly.
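One way to check a converted string independently of how a debugger renders it is to look at the raw bytes. The sketch below uses a hypothetical helper, to_hex, that formats each byte of a std::string as two hex digits so the result can be compared against the expected UTF-8 sequence.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: format every byte of `bytes` as two lowercase
// hex digits, separated by single spaces.
std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

For example, the Korean syllable 한 (U+D55C) is the three bytes ED 95 9C in UTF-8, so a correctly converted string containing only that character gives "ed 95 9c".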

Convert wide CString to char*

This question has been asked many times and has as many answers, none of which work for me or, it seems, for many others. The question is about wide CStrings and 8-bit chars under MFC. We all want an answer that will work in ALL cases, not a specific instance.
void Dosomething(CString csFileName)
{
    char cLocFileNamestr[1024];
    char cIntFileNamestr[1024];
    // Convert from whatever version of CString is supplied
    // to an 8 bit char string
    // (pseudocode: ConvertCStochar is the conversion I am looking for)
    cIntFileNamestr = ConvertCStochar(csFileName);
    sprintf_s(cLocFileNamestr, "%s_%s", cIntFileNamestr, "pling.txt" );
    m_KFile = fopen(cLocFileNamestr, "wt");
}
This is an addition to existing code (by somebody else) for debugging.
I don't want to change the function signature, it is used in many places.
I cannot change the signature of sprintf_s, it is a library function.
You are leaving out a lot of details, or ignoring them. If you are building with UNICODE defined (which it seems you are), then the easiest way to convert to MBCS is like this:
CStringA strAIntFileNameStr = csFileName.GetString(); // uses default code page
CStringA is the 8-bit/MBCS version of CString.
However, characters in the Unicode string that are not in the default code page cannot be represented; they are replaced by a substitute character (typically '?'), so the result will contain garbage.
Instead of using fopen(), you could use _wfopen(), which opens a file with a Unicode filename. To create your file name, you would use swprintf_s().
"an answer that will work in ALL cases, not a specific instance..."
There is no such thing.
It's easy to convert "ABCD..." from wchar_t* to char*, but it doesn't work that way with non-Latin languages.
Stick to CString and wchar_t when your project is Unicode.
If you need to upload data to a web page or something similar, use CW2A and CA2W to convert between UTF-16 and UTF-8.
CStringW unicode = L"Россия";
MessageBoxW(0,unicode,L"Russian",0);//should be okay
CStringA utf8 = CW2A(unicode, CP_UTF8);
::MessageBoxA(0,utf8,"format error",0);//WinApi doesn't get UTF-8
char buf[1024];
strcpy(buf, utf8);
::MessageBoxA(0,buf,"format error",0);//same problem
//send this buf to webpage or other utf-8 systems
//this should be compatible with notepad etc.
//text will appear correctly
ofstream f(L"c:\\stuff\\okay.txt");
f.write(buf, strlen(buf));
//convert utf8 back to utf16
unicode = CA2W(buf, CP_UTF8);
::MessageBoxW(0,unicode,L"okay",0);

urlDecode - php function in c++

I have a urlDecode function.
But when I decode a string like:
P%C4%99dz%C4%85cyJele%C5%84
I get the output: PędzącyJeleń
Of course this is not the correct output. I think it's broken because of the Polish characters.
I tried setting the character set in the compiler options:
Use Unicode Character Set
or Use Multi-Byte Character Set
I tried to do that using wstrings, but I got a lot of errors.
I suppose I should use wstring rather than string, but could you tell me how? Is there an easier way to solve my problem? (I have heard a lot about wstring and string and understand little of it: supposedly wstring should not be used on Linux, but I am on Windows.)
//link to my functions at bottom
http://bogomip.net/blog/cpp-url-encoding-and-decoding/
//EDIT
When I change every string to wstring and fstream to wfstream, the problem remains:
z%C5%82omiorz - this wstring (from a file) != złomiorz; instead the function prints L"z197130omiorz"
What is 197130? How do I fix that?
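Two observations may help here. First, 197 and 130 are the decimal values of 0xC5 and 0x82, which suggests the decoded bytes were appended as numbers instead of as characters. Second, the %XX escapes in these inputs are UTF-8 bytes, so a byte-oriented decoder on std::string is enough; output like PÄ™dzÄ… is what correctly decoded UTF-8 looks like when it is displayed using a single-byte code page. The following is a minimal sketch, not the code from the linked blog post; url_decode and hex_value are hypothetical names, and the treatment of '+' as a space follows PHP's urldecode.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if `c` is not one.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode %XX escapes into raw bytes and '+' into a space.
// The result holds exactly the bytes that were encoded (UTF-8 here);
// malformed escapes are copied through unchanged.
std::string url_decode(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                // Append the byte itself, not its decimal representation.
                out += static_cast<char>((hi << 4) | lo);
                i += 2;
                continue;
            }
        }
        out += (in[i] == '+') ? ' ' : in[i];
    }
    return out;
}
```

The returned std::string is UTF-8. To show it with a wide-character Windows API it would still need a UTF-8 to UTF-16 conversion; changing string to wstring alone does not perform that conversion.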

LPCWSTR Error - C++

I'm trying to draw text to a window. There are two things I'm wondering about. First, why can the tutorial I'm using write a plain "String Here" while I have to write L"String Here"?
I'm confused about that. Anyway, back to the main point: I'm trying to draw text and I'm getting an error.
If you have UNICODE defined in your project (which it should be by default) then you can either use
wstring s = L"Hello, World!";
or the ANSI API for TextOut
TextOutA(hdc, 10, 10, s.c_str(), s.size());
See the following question:
What does LPCWSTR stand for and how should it be handled with?
Basically, you're trying to convert a regular character string to a wide character string implicitly and it won't allow you to do that. From the top answer:
To get a normal C literal string to assign to a LPCWSTR, you need to prefix it with L
LPCWSTR a = L"TestWindow";

OSX and C++ unicode conversion from NFD to NFC

I have a problem with NFD Unicode strings that I get from the OSX filesystem.
This is what I get for the umlaut "Ä" on OSX: "A\xcc\x88", and this is what I expect: "\xc3\x84". The same function gets it right under Windows (a simple boost filesystem operation, listing a directory).
After searching for a while, I found out that Apple uses the NFD form for UTF-8 while the rest of the world uses NFC. I experimented with converting through NSStrings and with boost::locale::normalize, but without success.
Does anybody know a way to do this in C++ (I can use Cocoa through Objective-C if necessary)?
Afterwards I would like to have the raw Unicode string as a std::string (with Unicode encoding).
This is the solution to get the precomposed form.
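#include <CoreFoundation/CoreFoundation.h>
#include <string>

// Returns the NFC (precomposed) form of a UTF-8 name; on failure the input is returned unchanged.
std::string precomposeFilename(const std::string& name)
{
    CFStringRef cfStringRef = CFStringCreateWithCString(kCFAllocatorDefault, name.c_str(), kCFStringEncodingUTF8);
    if (!cfStringRef) return name; // string could not be created (for example, invalid UTF-8)
    CFMutableStringRef cfMutable = CFStringCreateMutableCopy(NULL, 0, cfStringRef);
    CFRelease(cfStringRef);
    if (!cfMutable) return name;
    CFStringNormalize(cfMutable, kCFStringNormalizationFormC);
    char c_str[255 + 1];
    // CFStringGetCString reports failure if the result does not fit into the buffer
    bool ok = CFStringGetCString(cfMutable, c_str, sizeof(c_str)-1, kCFStringEncodingUTF8);
    CFRelease(cfMutable);
    return ok ? std::string(c_str) : name;
}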
NSString has the method - (NSString *)precomposedStringWithCanonicalMapping and some related ones; they look like they will help you.