C++ Unicode Issue

I'm having a bit of trouble with handling unicode conversions.
The following code outputs this into my text file.
HELLO??O
std::string test = "HELLO";
std::string output;
int len = WideCharToMultiByte(CP_OEMCP, 0, (LPCWSTR)test.c_str(), -1, NULL, 0, NULL, NULL);
char *buf = new char[len];
int len2 = WideCharToMultiByte(CP_OEMCP, 0, (LPCWSTR)test.c_str(), -1, buf, len, NULL, NULL);
output = buf;
std::wofstream outfile5("C:\\temp\\log11.txt");
outfile5 << test.c_str();
outfile5 << output.c_str();
outfile5.close();
But as you can see, output is just a unicode conversion from the test variable. How is this possible?

Check whether len is correct after the first (measuring) call. In general, you should not cast test.c_str() to LPCWSTR: test is a char string (std::string), not a wchar_t string (std::wstring). You could cast it to LPCSTR (note the missing 'W'); the WinAPI distinguishes between the two. If you want to keep wide characters in it, you really should be using std::wstring. After re-reading your code: test should be a std::wstring, whose c_str() already yields an LPCWSTR, so no cast is needed.
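To see where the stray characters in HELLO??O may come from, it helps to look at what the cast actually does: it converts nothing and merely makes the API read the bytes of "HELLO" as 16-bit units. The sketch below mimics that portably. The helper name reinterpret_as_utf16le is made up for illustration, and it assumes a 16-bit little-endian wchar_t as on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: packs the bytes of s (plus its terminating '\0') into
// 16-bit little-endian units, which is how a char buffer looks when it is
// cast to LPCWSTR on Windows. No character conversion takes place.
std::vector<std::uint16_t> reinterpret_as_utf16le(const std::string& s) {
    const char* bytes = s.c_str();           // s.size() + 1 valid bytes
    const std::size_t count = s.size() + 1;  // include the terminator
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i < count; i += 2) {
        const std::uint16_t lo = static_cast<unsigned char>(bytes[i]);
        // Pad with 0 if the byte count is odd (the real API would read past the end).
        const std::uint16_t hi =
            (i + 1 < count) ? static_cast<unsigned char>(bytes[i + 1]) : 0;
        units.push_back(static_cast<std::uint16_t>(lo | (hi << 8)));
    }
    assert(units.size() == (count + 1) / 2);
    return units;
}
```

For "HELLO" this yields 0x4548, 0x4C4C and 0x004F: two CJK ideographs that the OEM code page presumably cannot represent (hence "??"), followed by a plain "O".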

After reading the Microsoft wstring reference, I changed
std::string test = "HELLO";
to
std::wstring test = L"HELLO";
The string was then output correctly, and I got
HELLOHELLO

Related

Converting shift-jis encoded file to utf-8 in c++

I am trying to convert a Shift-JIS file to UTF-8 with the code below, but the output file contains corrupted characters when opened. It looks like something is missing here; any thoughts?
// From file
FILE* shiftJisFile = _tfopen(lpszShiftJs, _T("rb"));
int nLen = _filelength(fileno(shiftJisFile));
LPSTR lpszBuf = new char[nLen];
fread(lpszBuf, 1, nLen, shiftJisFile);
// convert multibyte to wide char
int utf16size = ::MultiByteToWideChar(CP_ACP, 0, lpszBuf, -1, 0, 0);
LPWSTR pUTF16 = new WCHAR[utf16size];
::MultiByteToWideChar(CP_ACP, 0, lpszBuf, -1, pUTF16, utf16size);
wstring str(pUTF16);
// convert wide char to multi byte utf-8 before writing to a file
fstream File("filepath", std::ios::out);
string result = string();
result.resize(WideCharToMultiByte(CP_UTF8, 0, str.c_str(), -1, NULL, 0, 0, 0));
char* ptr = &result[0];
WideCharToMultiByte(CP_UTF8, 0, str.c_str(), -1, ptr, result.size(), 0, 0);
File << result;
File.close();
There are multiple problems.
The first problem is that when you are writing the output file, you need to set it to binary for the same reason you need to do so when reading the input.
fstream File("filepath", std::ios::out | std::ios::binary);
The second problem is that when you are reading the input file, you are only reading the bytes of the input stream and treat them like a string. However, those bytes do not have a terminating null character. If you call MultiByteToWideChar with a -1 length, it infers the input string length from the terminating null character, which is missing in your case. That means both utf16size and the contents of pUTF16 are already wrong. Add it manually after reading the file:
int nLen = _filelength(fileno(shiftJisFile));
LPSTR lpszBuf = new char[nLen+1];
fread(lpszBuf, 1, nLen, shiftJisFile);
lpszBuf[nLen] = 0;
The last problem is that you are using CP_ACP. That means "the system's default ANSI code page". In your question, you were specifically asking how to convert Shift-JIS. The code page Windows uses for its closest equivalent to what is commonly called "Shift-JIS" is 932 (you can look that up on Wikipedia, for example). So use 932 instead of CP_ACP:
int utf16size = ::MultiByteToWideChar(932, 0, lpszBuf, -1, 0, 0);
LPWSTR pUTF16 = new WCHAR[utf16size];
::MultiByteToWideChar(932, 0, lpszBuf, -1, pUTF16, utf16size);
Additionally, there is no reason to create wstring str(pUTF16). Just use pUTF16 directly in the WideCharToMultiByte calls.
Also, I'm not sure how kosher char *ptr = &result[0] is. I personally would not create a string specifically as a buffer for this.
Here is the corrected code. I would personally not write it this way, but I don't want to impose my coding ideology on you, so I made only the changes necessary to fix it:
// From file
FILE* shiftJisFile = _tfopen(lpszShiftJs, _T("rb"));
int nLen = _filelength(fileno(shiftJisFile));
LPSTR lpszBuf = new char[nLen+1];
fread(lpszBuf, 1, nLen, shiftJisFile);
lpszBuf[nLen] = 0;
// convert multibyte to wide char
int utf16size = ::MultiByteToWideChar(932, 0, lpszBuf, -1, 0, 0);
LPWSTR pUTF16 = new WCHAR[utf16size];
::MultiByteToWideChar(932, 0, lpszBuf, -1, pUTF16, utf16size);
// convert wide char to multi byte utf-8 before writing to a file
fstream File("filepath", std::ios::out | std::ios::binary);
string result;
result.resize(WideCharToMultiByte(CP_UTF8, 0, pUTF16, -1, NULL, 0, 0, 0));
char *ptr = &result[0];
WideCharToMultiByte(CP_UTF8, 0, pUTF16, -1, ptr, result.size(), 0, 0);
File << ptr;
File.close();
Also, you have a memory leak -- lpszBuf and pUTF16 are not cleaned up.
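Both the missing terminator and the leak can be avoided by reading the file into a container that tracks its own length and frees itself. A minimal sketch, assuming a hypothetical helper named read_all_bytes; its data() and size() could then be passed to MultiByteToWideChar with an explicit length instead of -1.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file in binary mode. The vector knows
// its size, so no terminating '\0' is needed, and it releases its memory
// automatically when it goes out of scope.
std::vector<char> read_all_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return {};  // could not open; an empty result stands in for error handling
    }
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

Note that with an explicit length the conversion functions neither expect nor write a terminating null, so the output buffer has to be sized and terminated accordingly.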
You could try using std::locale to perform this conversion:
#include <filesystem>
#include <fstream>
#include <locale>
#include <string>

namespace fs = std::filesystem;

void convert(const fs::path& inName, const fs::path& outName)
{
    std::wifstream in{inName};
    in.imbue(std::locale{".932"}); // or "ja_JP.SJIS"
    if (in) {
        std::wofstream out{outName};
        out.imbue(std::locale{".utf-8"});
        std::wstring line;
        // Each line is decoded from Shift-JIS on input and encoded as UTF-8 on output.
        while (getline(in, line)) {
            out << line << L'\n';
        }
    }
}
Note that locale names are platform specific; I think I used the proper ones for Windows.
Update: I've tested this on my Windows 10 machine with MSVC 19.29.30145 and it works perfectly. I took some valid Japanese text from a wiki page and used Notepad++ to save it in the proper encoding (Shift-JIS).
I also used Beyond Compare to verify the results.
Note that I used a similar method here for Korean and it worked nicely.
wstring str(pUTF16); - pUTF16 there does not end with zero char. It should be wstring str(pUTF16, utf16size);

Can "const char[18]* be changed to an entity of type LPCWSTR(C++)? [duplicate]

After getting a struct from C# to C++ using C++/CLI:
public value struct SampleObject
{
LPWSTR a;
};
I want to print its instance:
printf(sampleObject->a);
but I got this error:
Error 1 error C2664: 'printf' : cannot convert parameter 1 from
'LPWSTR' to 'const char *'
How can I convert from LPWSTR to char*?
Thanks in advance.
Use the wcstombs() function, which is located in <stdlib.h>. Here's how to use it:
LPCWSTR wideStr = L"Some message"; // string literals are const, so use LPCWSTR
char buffer[500];
// First arg is the pointer to destination char, second arg is
// the pointer to source wchar_t, last arg is the size of char buffer
wcstombs(buffer, wideStr, 500);
printf("%s", buffer);
Hope this helped someone! This function saved me from a lot of frustration.
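A fixed 500-byte buffer works for short messages, but the required size can also be measured first. The sketch below is one way to do that; it swaps in the restartable sibling std::wcsrtombs (for which a null destination is standard behaviour) rather than wcstombs itself, and the helper name narrow is made up for illustration. The result depends on the current C locale.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: converts a wide string to the multibyte encoding of
// the current C locale, sizing the buffer from a first measuring call.
// Returns an empty string if some character cannot be converted.
std::string narrow(const wchar_t* wide) {
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wide;
    // Null destination: only report the byte count needed (terminator excluded).
    const std::size_t needed = std::wcsrtombs(nullptr, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1)) {
        return std::string();
    }
    std::string out(needed + 1, '\0');  // one extra byte for the terminator
    state = std::mbstate_t();
    src = wide;
    const std::size_t written = std::wcsrtombs(&out[0], &src, out.size(), &state);
    assert(written == needed);
    out.resize(written);                // std::string keeps its own terminator
    return out;
}
```

The result can then be printed with printf("%s", narrow(wideStr).c_str()).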
Just use printf("%ls", sampleObject->a). The use of l in %ls means that you can pass a wchar_t[] such as L"Wide String".
(No, I don't know why the L and w prefixes are mixed all the time)
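The %ls conversion can be tried out without a console by formatting into a buffer. This is only a sketch: format_wide is a hypothetical helper, the 128-byte buffer is an arbitrary choice, and which characters convert successfully depends on the current C locale.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a wide string through the %ls conversion
// into a narrow buffer, the same conversion printf("%ls", ...) performs.
std::string format_wide(const wchar_t* wide) {
    char buffer[128];
    const int n = std::snprintf(buffer, sizeof buffer, "%ls", wide);
    if (n < 0) {
        return std::string();  // conversion failed in the current locale
    }
    // snprintf reports the untruncated length; clamp it to what actually fits.
    const std::size_t len = (static_cast<std::size_t>(n) < sizeof buffer)
                                ? static_cast<std::size_t>(n)
                                : sizeof buffer - 1;
    return std::string(buffer, len);
}
```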
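// cp is the target code page, for example CP_ACP or CP_UTF8
int length = WideCharToMultiByte(cp, 0, sampleObject->a, -1, 0, 0, NULL, NULL);
char* output = new char[length];
WideCharToMultiByte(cp, 0, sampleObject->a, -1, output, length, NULL, NULL);
printf("%s", output); // never pass converted text as the format string
delete[] output;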
Use the WideCharToMultiByte() function to convert a wide-character string to a multi-byte one.
Here is an example of converting from LPWSTR to char*, that is, from wide characters to narrow characters.
/* LPWSTR to char* example.c */
#include <stdio.h>
#include <wchar.h>
#include <windows.h>

void LPWSTR_2_CHAR(LPCWSTR, LPSTR, int);

int main(void)
{
    wchar_t w_char_str[] = L"This is wide character string test!";
    /* Two bytes per wide character (terminator included) is assumed to be enough for ANSI code pages. */
    char char_str[2 * (sizeof w_char_str / sizeof w_char_str[0])] = {0};
    LPWSTR_2_CHAR(w_char_str, char_str, (int)sizeof char_str);
    puts(char_str);
    return 0;
}

/* out_size is the size of out_char in bytes, including room for the terminating '\0'. */
void LPWSTR_2_CHAR(LPCWSTR in_char, LPSTR out_char, int out_size)
{
    WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK, in_char, -1, out_char, out_size, NULL, NULL);
}
Here is a Simple Solution. Check wsprintf
LPCWSTR wideStr = L"some text";
char* resultStr = new char [wcslen(wideStr) + 1];
wsprintfA ( resultStr, "%S", wideStr);
The "%S" will implicitly convert UNICODE to ANSI.
Don't convert.
Use wprintf instead of printf; see the examples in its documentation, which explain how to use it.
Alternatively, you can use std::wcout as:
const wchar_t *wstr1 = L"string";
LPCWSTR wstr2 = L"string"; // same as above
std::wcout << wstr1 << L", " << wstr2;
Similarly, use functions that are designed for wide characters, and forget the idea of converting wchar_t to char, as it may lose data.
Have a look at the functions that deal with wide characters here:
Unicode in Visual C++

C++ Arabic UTF8 string to CString

In a Visual Studio 2008 MFC project I have to manage UTF-8 strings containing Arabic city names. After searching online, I wrote this little piece of code:
CString MyClass::convertString(string input) {
int l = MultiByteToWideChar(CP_UTF8, 0, input.c_str(), -1, NULL, 0);
wchar_t *str = new wchar_t[l];
int r = MultiByteToWideChar(CP_UTF8, 0, input.c_str(), -1, str, l);
CString output = str;
delete str ;
return output;
}
When I try to convert a string, it remains the same, and if I print the two strings the result is identical.
What am I doing wrong?
Thanks in advance.
You don't want to convert strings to UTF-8 for display purposes. There is no UTF-8 charset that will allow you to display them correctly. If you already have them in Unicode, just keep them in Unicode. I would build your application in Unicode and avoid MBCS if you can. It makes life easier. Otherwise, for displaying those Arabic strings, you would have to convert them to the Arabic code page and then use an Arabic font/charset to display them.
Thanks for all the replies. I've found a solution: the input string was not encoded in UTF-8 (I should have checked that before posting on Stack Overflow). I then edited the code, changing the output from CString to wstring.
wstring MyClass::convertString(string input) {
    int l = MultiByteToWideChar(CP_UTF8, 0, input.c_str(), -1, NULL, 0);
    wchar_t *str = new wchar_t[l];
    int r = MultiByteToWideChar(CP_UTF8, 0, input.c_str(), -1, str, l);
    wstring output = wstring(str);
    delete[] str; // array form, to match new[]
    return output;
}
Now everything works fine. Thanks.

Converting string to LPCTSTR

I ran into a problem while writing my code. I use a function that takes an argument of type LPCTSTR. The declaration looks like this:
LPCTSTR lpFileName;
At first I used a macro, which was then assigned to lpFileName like this:
#define COM_NR L"COM3"
lpFileName = COM_NR;
This way I could easily pass lpFileName to the function. However, I had to change how the port number is defined. I now read it from a *.txt file and store it in a string variable, e.g. "COM3" or "COM10". The main problem is converting that string to LPCTSTR properly. I found what looked like a good solution, but it does not seem to work. My code looks like this:
string temp;
// code that fills temp
wstring ws;
ws.assign(temp.begin(),temp.end());
I thought the conversion had worked, and maybe it did, but what I see when I print a few things makes me wonder why it does not behave as I expect:
cout temp.c_str(): COM3
cout L"COM3": 0x40e586
cout ws.c_str(): 0x8b49b2c
Why don't L"COM3" and ws.c_str() contain the same thing? When I pass lpFileName = ws.c_str() to my function, it does not work correctly. Passing lpFileName = L"COM3", on the other hand, succeeds.
I am writing C++, and my IDE is Qt Creator.
Eventually I got around the pitfall with a conversion function, s2ws(), and a few extra steps. I am posting my solution here for people who run into similar trouble converting strings. In my first post I wrote that I needed to convert a string to LPCTSTR; it turned out that the argument of my function is not LPCTSTR but LPCWSTR, that is, const wchar_t*.
So, the solution:
string port_nr = "COM3";
wstring stemp;
LPCWSTR result_port;
stemp = s2ws(port_nr);
result_port = stemp.c_str(); // passing result_port to my function now succeeds
Definition of s2ws:
wstring s2ws(const std::string& s)
{
int len;
int slength = (int)s.length() + 1;
len = MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, 0, 0);
wchar_t* buf = new wchar_t[len];
MultiByteToWideChar(CP_ACP, 0, s.c_str(), slength, buf, len);
std::wstring r(buf);
delete[] buf;
return r;
}
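The two hexadecimal values printed in the question are most likely just addresses: a narrow stream such as cout has no text overload for const wchar_t*, so it would print the pointers, and two different buffers naturally have different addresses. Comparing the contents is a more telling check. The sketch below does that with a hypothetical helper, widen_ascii, which is only valid for 7-bit ASCII text such as port names and is not a substitute for a real code page conversion like s2ws.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: widens each char to wchar_t one by one. This is only
// correct for 7-bit ASCII text such as "COM3"; it is not a code page conversion.
std::wstring widen_ascii(const std::string& s) {
    std::wstring out;
    out.reserve(s.size());
    for (unsigned char c : s) {
        assert(c < 0x80 && "widen_ascii expects ASCII input only");
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
}
```

With this, widen_ascii("COM3") == L"COM3" compares characters rather than addresses and evaluates to true.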
Try to use wostringstream:
string temp;
// code that fills temp
wostringstream ost;
ost << temp.c_str();
wstring ws = ost.str();
I have struggled with this for quite a while. After quite a bit of digging I found this works the best; you could try this.
std::string t = "xyz";
CA2T wt (t.c_str());

WNetUseConnection SystemErrorCode 1113 No Mapping Exist

I am trying to convert a string into a wchar_t string to use it in a call to WNetUseConnection.
Basically it is a UNC name that looks like this: "\\remoteserver".
I get a return code 1113, which is described as:
No mapping for the Unicode character
exists in the target multi-byte code
page.
My code looks like this:
std::string serverName = "\\uncDrive";
wchar_t *remoteName = new wchar_t[ serverName.size() ];
MultiByteToWideChar(CP_ACP, 0, serverName.c_str(), serverName.size(), remoteName, serverName.size()); //also doesn't work if CP_UTF8
NETRESOURCE nr;
memset( &nr, 0, sizeof( nr ));
nr.dwType = RESOURCETYPE_DISK;
nr.lpRemoteName = remoteName;
wchar_t pswd[] = L"user"; //would have the same problem if converted and not set
wchar_t usrnm[] = L"pwd"; //would have the same problem if converted and not set
int ret = WNetUseConnection(NULL, &nr, pswd, usrnm, 0, NULL, NULL, NULL);
std::cerr << ret << std::endl;
The interesting thing is that if remoteName is hard-coded like this:
wchar_t remoteName[] = L"\\\\uncName";
everything works fine. But since the server, user and password will later be parameters that I receive as strings, I need a way to convert them (I also tried the mbstowcs function, with the same result).
MultiByteToWideChar will not 0-terminate the converted string with your current code, and therefore you get garbage characters following the converted "\uncDrive".
Use this:
std::string serverName = "\\uncDrive";
int CharsNeeded = MultiByteToWideChar(CP_ACP, 0, serverName.c_str(), serverName.size() + 1, 0, 0);
wchar_t *remoteName = new wchar_t[ CharsNeeded ];
MultiByteToWideChar(CP_ACP, 0, serverName.c_str(), serverName.size() + 1, remoteName, CharsNeeded);
This first checks with MultiByteToWideChar how many chars are needed to store the specified string and the 0-termination, then allocates the string and converts it. Note that I didn't compile/test this code, beware of typos.
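The same measure, allocate, convert pattern can be wrapped so that the caller gets a std::wstring and never handles a raw buffer. The sketch below is a portable stand-in, not the WinAPI call: it uses std::mbsrtowcs from the standard library, so it converts from the current C locale's encoding rather than a chosen code page, and the helper name widen is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: measure first, then convert, using the C locale's
// multibyte encoding. Returns an empty string if the input is invalid.
std::wstring widen(const std::string& s) {
    std::mbstate_t state = std::mbstate_t();
    const char* src = s.c_str();
    // Null destination: only count the wide characters needed
    // (the terminating L'\0' is not included in the count).
    const std::size_t needed = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1)) {
        return std::wstring();
    }
    std::wstring out(needed + 1, L'\0');  // one extra slot for the terminator
    state = std::mbstate_t();
    src = s.c_str();
    const std::size_t written = std::mbsrtowcs(&out[0], &src, out.size(), &state);
    assert(written == needed);
    out.resize(written);                  // std::wstring keeps its own terminator
    return out;
}
```

The returned string's c_str() is always 0-terminated, so it could be handed to an API expecting a const wchar_t* for as long as the std::wstring stays alive.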