Conversion from char * to wchar* does not work properly - c++

I'm getting a string like "aña!a¡a¿a?a" from the server, so I decode it and then pass it to a function.
What I need to do with the message is load paths depending on its letters.
The header of my function is: void SetInfo(int num, char *descr[4]) so it receives one number and an array of 4 char pointers (sentences). To keep it simple, let's say I only need to work with descr[0].
When I debug and reach SetInfo(), I see the exact message in the debug view: "aña!a¡a¿a?a", so up to here everything is OK.
Originally, the data I received in that function was a std::wstring, so all my code working with that message used wstrings and strings, but now what I receive is a char * as shown in the header. The message arrives intact, but I can't work with it, because when I inspect each position of descr[0] in the debugger I get
descr[0][0] = 'a'; //ok
descr[0][1] = 'Ã '; // BAD
so I tried converting char* to wchar_t* with some code I found here:
size_t size = strlen(descr[0]) + 1;
wchar_t* wa = new wchar_t[size];
mbstowcs(wa,descr[0],size);
But then the debugger shows me that wa has:
wa wchar_t * 0x185d4be8 L"a-\uffffffff刯2e2e牵6365⽳6f73歯6f4c楲6553䈯736f獵6e6f档6946琯7361灭6569湰2e6f琀0067\021ᡰ9740슃b8\020\210=r"
which I suppose is incorrect (I'm assuming that I should see the same initial message, "aña!a¡a¿a?a". If this output is fine, then I don't know how to get what I need...)
So my question is: how can I get descr[0][0] = 'a' and descr[0][1] = 'ñ'? I can't convert char to wchar_t (you've already seen what I got). Am I doing it wrong? Or is there another way? I am really stuck on this, so any idea will be much appreciated.
Before, when I was working with wstrings (and it worked so fine) I was doing something like this:
if (word[i]==L'\x00D1' or word[i]==L'\x00F1') // Ñ or ñ
path ="PathOfÑ";
where word[i] is the same as descr[0][1] in this case, but with wstrings. That way I knew that word[i] was the letter 'ñ'. Maybe this helps to explain what I'm doing.
(btw...I'm working on eclipse, on linux. )

The mbstowcs function works on C-style strings, and one of the things about C-style strings is that they have a special terminating character, '\0'. You don't seem to be adding this terminator to the string, leading mbstowcs to go out of bounds of the actual string and giving you undefined behavior.
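A minimal sketch of a conversion that rules out both the terminator and the buffer size, assuming the input really is a NUL-terminated multibyte string. The helper name toWide is hypothetical. Note that mbstowcs converts according to the LC_CTYPE category of the current C locale, so in the default "C" locale non-ASCII bytes will typically be rejected; the sketch reports that as an error instead of returning garbage.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a NUL-terminated multibyte string to a
// std::wstring using the current C locale (LC_CTYPE).
std::wstring toWide(const char* narrow)
{
    assert(narrow != nullptr);

    // Every wide character consumes at least one byte, so strlen() is an
    // upper bound on the number of wide characters; +1 leaves room for L'\0'.
    std::vector<wchar_t> buffer(std::strlen(narrow) + 1);

    const std::size_t converted =
        std::mbstowcs(buffer.data(), narrow, buffer.size());
    if (converted == static_cast<std::size_t>(-1))
        throw std::runtime_error("byte sequence is not valid in the current locale");

    // `converted` excludes the terminator, so copy exactly that many characters.
    return std::wstring(buffer.data(), converted);
}
```

For non-ASCII input, the program would need to select a matching locale once at startup, for example with std::setlocale(LC_ALL, "") on a system whose environment locale is UTF-8 (an assumption about the environment, not something the question confirms).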

Related

Can I decode € (euro sign) as a char and not as a wstring/wchar?

Let me try to explain my problem. I have to receive a message from a server (programmed in Delphi) and do some things with that message on the client side (which is the side I program, in C++).
Let's say that the message is "Hello €". That means I have to work with std::wstring, as € (the euro sign) needs 2 bytes instead of 1, so I have written all my code with wstrings, and if I set the message by hand it works fine. Now I have to receive the real one from the server, and here comes the problem.
The person on the server side is sending that message as a string. He uses an EncodeString() function in Delphi and he says that he is not going to change it. So my question is: if I decode that string into a string in C++ and then convert it into a wstring, will it work? Or will I have problems and end up with a different message in my string variable instead of "Hello €"?
If yes, if I can receive that string with no problem, then I have another problem. The function that I have to use to decode the string is void DecodeString(char *buffer, int length);
so normally if you receive a text, you do something like:
char Text[255];
DecodeString(Text, length); // length is a number decoded before
So... can I decode it with no problem and have in Text the "Hello €" message? with that I'll just need to convert it and get the wstring.
Thank you
EDIT:
I'll add another example. If i know that the server is going to send me always a text of length 30 max, in the server they do something like:
EncodeByte(lengthText);
EncodeString(text)
and in the client you do:
int length;
char myText[30];
DecodeByte(length);
DecodeString(myText,length);
and then you can work with myText as a string later.
Hope that helps a little more. I'm sorry for not having more information but I'm new in that work and I don't know much more about the server.
EDIT 2
Trying to summarize... I have to receive a message and do something with it, and I have to decode it with the tool I mentioned. Since DecodeString() needs a char * and I need a wstring, I just need a way to take the data received from the server, decode it with DecodeString(), and get it into a wstring. But I don't really know if that's possible, and if it is, I'm not sure how to do it or what types of variables to use.
EDIT 3
Finally! I know which code pages are being used. It seems that the client uses the ANSI ones and the server doesn't, so I'll have to tell the person who does that part to change it to the ANSI ones. Thanks, everybody, for helping me with my big, big ignorance about the existence of code pages.
Since you're using wstring, I guess that you are on Windows (wstring isn't popular on *nix).
If so, you need the Delphi app to send you UTF-16, which you can use in the wstring constructor. Example:
const char input[] = "\xac\x20\0"; // UTF-16LE bytes for the euro sign (U+20AC), followed by a two-byte terminator
const wchar_t* input2 = reinterpret_cast<const wchar_t*>(input);
wstring ws(input2);
If you're on Linux/Mac, etc., you need to receive UTF-32.
This method is far from perfect though. There can be pitfalls and edge cases for code points beyond 0xFFFF (Chinese, etc.). Supporting that probably requires a PhD.
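The pointer cast above depends on the platform's wchar_t width, byte order, and alignment. Here is a sketch that avoids the cast and also handles code points beyond 0xFFFF, assuming the server sends UTF-16 in little-endian byte order (an assumption; the question does not say what EncodeString() emits). The helper name decodeUtf16Le is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-16LE bytes into Unicode code points.
// Unpaired surrogates are replaced with U+FFFD.
std::u32string decodeUtf16Le(const unsigned char* bytes, std::size_t byteCount)
{
    assert(byteCount % 2 == 0);  // UTF-16 code units are two bytes each

    std::u32string out;
    for (std::size_t i = 0; i + 1 < byteCount; i += 2) {
        // Assemble one 16-bit code unit from its low and high byte.
        char32_t unit = static_cast<char32_t>(bytes[i] | (bytes[i + 1] << 8));

        // A high surrogate followed by a low surrogate encodes one
        // code point in the range U+10000..U+10FFFF.
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 3 < byteCount) {
            const char32_t low =
                static_cast<char32_t>(bytes[i + 2] | (bytes[i + 3] << 8));
            if (low >= 0xDC00 && low <= 0xDFFF) {
                out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
                i += 2;  // the low surrogate has been consumed as well
                continue;
            }
        }

        if (unit >= 0xD800 && unit <= 0xDFFF)
            unit = 0xFFFD;  // surrogate without a valid partner
        out.push_back(unit);
    }
    return out;
}
```

On Linux and macOS, where wchar_t is 32 bits wide, each resulting code point can be copied into a std::wstring as is; on Windows the original UTF-16 code units are what std::wstring expects.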

urlDecode - php function in c++

I have a urlDecode function.
But when I decode a string like:
P%C4%99dz%C4%85cyJele%C5%84
I get output: PędzącyJeleń
Of course this is not the correct output. I think it's broken because of the Polish characters.
I tried setting these in the compiler options:
Use Unicode Character Set
or Use Multi-Byte Character Set
I tried to do it using wstrings, but I get a lot of errors :|
I suppose that I should use wstring, not string, but could you tell me how? Is there no easier way to solve my problem? (I've heard a lot about wstring and string and understand little of it: wstring should not be used on Linux, but I'm on Windows.)
//link to my functions at bottom
http://bogomip.net/blog/cpp-url-encoding-and-decoding/
//EDIT
When I change every string to wstring and fstream to wfstream,
there is still a problem. Look:
z%C5%82omiorz - this wstring (from a file) != złomiorz; instead the function prints L"z197130omiorz"
What is 197130? How can I fix that?
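The 197130 is consistent with the two bytes 0xC5 (197) and 0x82 (130), which are the UTF-8 encoding of ł, being appended as decimal numbers instead of as characters. Below is a minimal sketch of a percent-decoder that keeps the decoded bytes in a std::string. It is not the code from the linked blog post, and the name percentDecode is hypothetical. The result is UTF-8 only if the input was percent-encoded UTF-8, which the sample strings suggest but the question does not confirm.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the numeric value (0..15) of one hexadecimal digit.
static int hexValue(char c)
{
    assert(std::isxdigit(static_cast<unsigned char>(c)));
    if (c >= '0' && c <= '9')
        return c - '0';
    return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
}

// Hypothetical helper: decodes %XX escapes into raw bytes and, like PHP's
// urldecode(), turns '+' into a space. Malformed escapes are copied as is.
std::string percentDecode(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool isEscape = in[i] == '%' && i + 2 < in.size()
            && std::isxdigit(static_cast<unsigned char>(in[i + 1]))
            && std::isxdigit(static_cast<unsigned char>(in[i + 2]));
        if (isEscape) {
            // Append the byte itself, not its decimal representation.
            out.push_back(static_cast<char>(
                hexValue(in[i + 1]) * 16 + hexValue(in[i + 2])));
            i += 2;
        } else if (in[i] == '+') {
            out.push_back(' ');
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

The returned std::string holds bytes, so whether it displays correctly depends on the encoding the console or file viewer expects.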

Why does my colon character disappear when I go from char[] to string?

In an old Windows application I'm working on I need to get a path from an environment variable and then append onto it to build a path to a file. So the code looks something like this:
static std::string PathRoot; // Private variable stored in class' header file
char EnvVarValue[1024];
if (! GetEnvironmentVariable(L"ENV_ROOT", (LPWSTR) EnvVarValue, 1024))
{
cout << "Could not retrieve the ROOT env variable" << endl;
return;
}
else
{
PathRoot = EnvVarValue;
}
// Added just for testing purposes - Returning -1
int foundAt = PathRoot.find_first_of(':');
std::string FullFilePath = PathRoot;
FullFilePath.append("\\data\\Config.xml");
The environment value for ENV_ROOT is set to "c:\RootDir" in the Windows System Control Panel. But when I run the program I keep ending up with a string in FullFilePath that is missing the colon char and anything that followed in the root folder. It looks like this: "c\data\Config.xml".
Using the Visual Studio debugger I looked at EnvVarValue after passing the GetEnvironmentVariable line and it shows me an array that seems to have all the characters I'd expect, including the colon. But after it gets assigned to PathRoot, mousing over PathRoot only shows the C and drilling down it says something about a bad ptr. As I noted the find_first_of() call doesn't find the colon char. And when the append is done it only keeps the initial C and drops the rest of the RootDir value.
So there seems to be something about the colon character that is confusing the string constructor. Yes, there are a number of ways I could work around this by leaving the colon out of the env variable and adding it later in the code. But I'd prefer to find a way to have it read and used properly from the environment variable as it is.
You cannot simply cast a char* to a wchar_t* (by casting to LPWSTR) and expect things to work. The two are fundamentally distinct types, and in Windows, they signify different encoding.
You obviously have WinAPI defines set such that GetEnvironmentVariable resolves to GetEnvironmentVariableW, which uses UTF-16 to encode the string. In practice, this means a 0 byte follows every ASCII character in memory.
You then construct a std::string out of this, so it takes the first 0 byte (at char index 1) as the string terminator, so you get just "c".
You have several options:
Use std::wstring and wchar_t EnvVarValue[1024];
Call GetEnvironmentVariableA() (which uses char and the ANSI code page)
Use wchar_t EnvVarValue[1024]; and convert the returned value to a std::string using something like wcstombs.
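A minimal sketch of the third option, assuming every character in the value can be represented in the current C locale. The helper name toNarrow is hypothetical, and the wide string is passed in as a parameter so the sketch stays portable; in the question's code it would be the wchar_t buffer filled by GetEnvironmentVariable.

```cpp
#include <cassert>
#include <cstdlib>
#include <cwchar>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a NUL-terminated wide string to a
// std::string using the current C locale (LC_CTYPE).
std::string toNarrow(const wchar_t* wide)
{
    assert(wide != nullptr);

    // MB_CUR_MAX is the largest number of bytes one character can need in
    // the current locale; +1 leaves room for the terminating '\0'.
    std::vector<char> buffer(std::wcslen(wide) * MB_CUR_MAX + 1);

    const std::size_t written =
        std::wcstombs(buffer.data(), wide, buffer.size());
    if (written == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in the current locale");

    // `written` excludes the terminator, so copy exactly that many bytes.
    return std::string(buffer.data(), written);
}
```

With a value such as L"c:\\RootDir" the result keeps the colon, so find_first_of(':') and the later append behave as expected.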
It seems you are building with wide-character functions (as indicated by your cast to LPWSTR). This means that the string in EnvVarValue is a wide-character string, and you should be using wchar_t and std::wstring instead.
I would guess that the contents of the array after the GetEnvironmentVariable call are actually the byte values 0x43 0x00 0x3a 0x00 0x5c 0x00 etc. (that is the wide-char representation of "C:\"). The first zero acts as the string terminator for a narrow-character string, which is why the narrow-character string PathRoot only contains the 'C'.
The problem might be that EnvVarValue is not a wchar_t array. Try using wchar_t and std::wstring.

Adding a string to a listbox results in weird characters

I have made a function that sends strings to a listbox using Win32:
char data[] = "abcd";
addToList(hWnd,data);
void addToList(HWND hWnd,char data[] ){
SendMessage(GetDlgItem(hWnd,IDC_LISTBOX),LB_ADDSTRING,0,(LPARAM)data);
}
When I execute this, it sends the data to the list box, but the text appears as weird characters. I have also tried wchar_t, but the problem remains.
First of all, you should be checking your API calls for errors. You need to check the return values of all your calls to API functions.
That said, given the code in the question,
SendMessage(GetDlgItem(hWnd,IDC_LISTBOX),LB_ADDSTRING,0,(LPARAM)data);
If that results in an item being added to the list box, then it means that GetDlgItem did indeed return a valid window handle, and data did indeed point to valid memory. In which case the only explanation for what you report is that the text encoding did not match.
So, we can assume that the SendMessage macro evaluates to SendMessageW. And since you are passing ANSI encoded text, that mismatch explains the symptoms. The function treats the text as UTF-16 encoded.
One obvious solution is to use SendMessageA instead. However, a better solution, in my view, would be to pass UTF-16 encoded data.
wchar_t data[] = L"abcd";
....
void addToList(HWND hWnd, const wchar_t *data)
{
SendMessage(GetDlgItem(hWnd,IDC_LISTBOX), LB_ADDSTRING, 0, (LPARAM)data);
}
Obviously your code would add in the error checking that I mentioned at the start.

Conversion from UTF-8 to ANSI wcstombs fails at one special character

I want to change a wchar_t* like it is displayed to a char*.
No conversions like in WideCharToMultiByte should be done.
I found the wcstombs function and it looked like it works perfectly, but there is one char which does not get changed correctly.
It is the 'œ', it has the ANSI Number 156, but in UTF-8 it is the number 339.
Of course ASCII does not have that many numbers, but why does it get the wrong one?
Here is part of my source code; I added a loop and an if so that it works:
wchar_t *wc; // source string
char *cc; // destination string
int len = 0; // length of the strings
...
for(int i = 0; i < len; i++) {
if(wc[i] != 339) {
cc[i] = wc[i];
}else{
cc[i] = 156;
}
}
This Code is working, but seriously, is this the best way to solve that problem?
Many thanks in advance!
I want to change a wchar_t* like it is displayed to a char*.
Okay, you want to convert from wchar_t strings to char strings.
No conversions like in WideCharToMultiByte should be done.
What? I presume you don't mean 'no conversion should be done,' but with only one example I can't tell what you want to avoid. Just WideCharToMultiByte or are there other functions?
I found the wcstombs function and it looked like it works perfectly,
wcstombs seems like WideCharToMultiByte to me, but I guess it's different in some way that's important to you? It'd be good if you could describe what exactly makes wcstombs acceptable and WideCharToMultiByte unacceptable.
but there is one char which does not get changed correctly.
Sounds like it's not working perfectly...
It is the 'œ', it has the ANSI Number 156, but in UTF-8 it is the number 339. Of course ASCII does not have that many numbers, but why does it get the wrong one?
You probably mean that in CP1252 'œ' is encoded as 156 in decimal or 0x9C in hex, and that this character has the Unicode codepoint 339 in decimal, or more conventionally U+0153. I don't see where UTF-8 comes into this at all.
Here is part of my source code; I added a loop and an if so that it works:
As for why you're not getting the results you expect, it's probably because you're not using wcstombs() correctly. It's hard to tell because you're not showing how you're doing the conversion with wcstombs().
wcstombs() converts between wchar_t and char using the encodings specified by the program's current C locale. If you've set the locale to one that uses a Unicode encoding for wchar_t and uses CP1252 for char then it should do what you expect.
This Code is working, but seriously, is this the best way to solve that problem?
No.
Please bear with my complete ignorance of c/c++, but you can either use a custom lookup table
or some premade function.
Here is an array of 256 integers, where the index i contains the unicode codepoint for the Windows-1252
codepoint i.
So for instance, the index 156, contains 0x0153 which is 339 in decimal.
const int windows1252ToUnicodeCodePoints[256] = {
0x0000,0x0001,0x0002,0x0003,0x0004,0x0005,0x0006,0x0007,0x0008,0x0009,0x000A,0x000B,0x000C,0x000D,0x000E,0x000F
,0x0010,0x0011,0x0012,0x0013,0x0014,0x0015,0x0016,0x0017,0x0018,0x0019,0x001A,0x001B,0x001C,0x001D,0x001E,0x001F
,0x0020,0x0021,0x0022,0x0023,0x0024,0x0025,0x0026,0x0027,0x0028,0x0029,0x002A,0x002B,0x002C,0x002D,0x002E,0x002F
,0x0030,0x0031,0x0032,0x0033,0x0034,0x0035,0x0036,0x0037,0x0038,0x0039,0x003A,0x003B,0x003C,0x003D,0x003E,0x003F
,0x0040,0x0041,0x0042,0x0043,0x0044,0x0045,0x0046,0x0047,0x0048,0x0049,0x004A,0x004B,0x004C,0x004D,0x004E,0x004F
,0x0050,0x0051,0x0052,0x0053,0x0054,0x0055,0x0056,0x0057,0x0058,0x0059,0x005A,0x005B,0x005C,0x005D,0x005E,0x005F
,0x0060,0x0061,0x0062,0x0063,0x0064,0x0065,0x0066,0x0067,0x0068,0x0069,0x006A,0x006B,0x006C,0x006D,0x006E,0x006F
,0x0070,0x0071,0x0072,0x0073,0x0074,0x0075,0x0076,0x0077,0x0078,0x0079,0x007A,0x007B,0x007C,0x007D,0x007E,0x007F
,0x20AC,0xFFFD,0x201A,0x0192,0x201E,0x2026,0x2020,0x2021,0x02C6,0x2030,0x0160,0x2039,0x0152,0xFFFD,0x017D,0xFFFD
,0xFFFD,0x2018,0x2019,0x201C,0x201D,0x2022,0x2013,0x2014,0x02DC,0x2122,0x0161,0x203A,0x0153,0xFFFD,0x017E,0x0178
,0x00A0,0x00A1,0x00A2,0x00A3,0x00A4,0x00A5,0x00A6,0x00A7,0x00A8,0x00A9,0x00AA,0x00AB,0x00AC,0x00AD,0x00AE,0x00AF
,0x00B0,0x00B1,0x00B2,0x00B3,0x00B4,0x00B5,0x00B6,0x00B7,0x00B8,0x00B9,0x00BA,0x00BB,0x00BC,0x00BD,0x00BE,0x00BF
,0x00C0,0x00C1,0x00C2,0x00C3,0x00C4,0x00C5,0x00C6,0x00C7,0x00C8,0x00C9,0x00CA,0x00CB,0x00CC,0x00CD,0x00CE,0x00CF
,0x00D0,0x00D1,0x00D2,0x00D3,0x00D4,0x00D5,0x00D6,0x00D7,0x00D8,0x00D9,0x00DA,0x00DB,0x00DC,0x00DD,0x00DE,0x00DF
,0x00E0,0x00E1,0x00E2,0x00E3,0x00E4,0x00E5,0x00E6,0x00E7,0x00E8,0x00E9,0x00EA,0x00EB,0x00EC,0x00ED,0x00EE,0x00EF
,0x00F0,0x00F1,0x00F2,0x00F3,0x00F4,0x00F5,0x00F6,0x00F7,0x00F8,0x00F9,0x00FA,0x00FB,0x00FC,0x00FD,0x00FE,0x00FF
};
What you need is this table inverted (or do a linear scan every time); in any other language I would use a construct like Map<int, int>.
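In C++ the Map<int, int> idea could be sketched with std::unordered_map. Only bytes 0x80 to 0x9F differ from their Unicode code points in the table above, so this sketch inverts just those 32 entries and passes the rest through unchanged. The names kCp1252High, buildInverse, and unicodeToWindows1252 are hypothetical.

```cpp
#include <cassert>
#include <unordered_map>

// Unicode code points for Windows-1252 bytes 0x80..0x9F, copied from the
// table above; 0xFFFD marks bytes that have no assigned character.
static const int kCp1252High[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
};

// Builds the inverse mapping (code point -> byte) for 0x80..0x9F.
static std::unordered_map<int, int> buildInverse()
{
    std::unordered_map<int, int> inverse;
    for (int i = 0; i < 32; ++i) {
        if (kCp1252High[i] != 0xFFFD)
            inverse[kCp1252High[i]] = 0x80 + i;
    }
    assert(inverse.size() == 27);  // 32 slots minus 5 unassigned bytes
    return inverse;
}

// Hypothetical helper: returns the Windows-1252 byte for a Unicode code
// point, or -1 if the code point has no Windows-1252 equivalent.
int unicodeToWindows1252(int codePoint)
{
    static const std::unordered_map<int, int> inverse = buildInverse();

    // Below 0x80 and from 0xA0 to 0xFF both encodings use the same value.
    if ((codePoint >= 0x00 && codePoint < 0x80) ||
        (codePoint >= 0xA0 && codePoint <= 0xFF))
        return codePoint;

    const auto it = inverse.find(codePoint);
    return it == inverse.end() ? -1 : it->second;
}
```

With this, the special case in the question's loop becomes a single call per character: 339 ('œ') maps to 156, and anything without an equivalent returns -1 so the caller can choose a replacement character.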