Unicode in C++ support [duplicate] - c++

There are some posts on this matter but I wanted to double check.
In Joel Spolsky's article (link) one reads:
In C++ code we just declare strings as wchar_t ("wide char") instead
of char and use the wcs functions instead of the str functions (for
example wcscat and wcslen instead of strcat and strlen). To create a
literal UCS-2 string in C code you just put an L before it as so:
L"Hello".
My question is: is what is written above not enough to support Unicode in a C++ app?
My confusion started when I couldn't output simple text like this (in Russian):
wcout<<L"логин";
to the console.
Also, I recently saw some code written for an embedded device in which, I think, Unicode-related strings are handled using wchar_t.
Any help greatly appreciated.

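This works in C++11 through C++17 on a Linux machine with a UTF-8 terminal (in C++20 a u8 literal has type const char8_t[], and std::cout no longer accepts it):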
#include <iostream>
int main(int, char**) {
std::cout << u8"Humberto Massa Guimarães\nлогин\n";
}

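First, printing non-English characters in the Windows command line does not work out of the box; the console code page and the output stream have to be set up for it first.
Second, briefly: in a UNICODE build on Windows, strings are wchar_t strings encoded as UTF-16, where each code unit takes two bytes (characters outside the Basic Multilingual Plane take two code units), while char uses a single byte. For example, the string "ABC" is stored in char as ABC\0 (3 bytes plus a one-byte end-of-string character)
but in UTF-16 it is stored as A\0B\0C\0\0\0 (6 bytes plus an end-of-string character that is two bytes, like the other code units).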
To view some text, I suggest MessageBoxW:
First include the Windows header file: #include <windows.h>
Then call the MessageBoxW API function:
MessageBoxW(0, L"UNICODE text body", L"title", MB_ICONINFORMATION);

Related

How to traverse a wstring properly? [duplicate]

I was trying to traverse a std::wstring, here's my code:
#include <iostream>
int main() {
std::wstring ws;
std::getline(std::wcin, ws);
for (auto wc : ws) {
std::wcout << wc << std::endl;
}
}
When I ran this program and typed “你好” into the console, it just printed 4 blank lines.
What I expect the program to output:
你
好
I have searched this site and came back with no solution.
What should I do to produce the result I expect?
First:
This is an encoding problem, so it has little to do with wstring; a string would probably have the same problem. The size of wchar_t and its encoding are system dependent, so your code would probably work under Linux.
Under Windows a wstring uses 2 bytes per code unit and UTF-16 encoding. UTF-16 is a variable-length encoding, but that is not the cause here: 你 (U+4F60) and 好 (U+597D) each fit in a single 2-byte code unit. Four blank lines for two characters suggest that the console input was not decoded as text at all, most likely because each byte of the console's multibyte encoding was widened into its own wchar_t.
For characters that do need two code units, you could use some function or wrapper class that gives you full code points instead of code units, but I personally do not know of any library that does so, because I follow my own advice:
I recommend reading http://utf8everywhere.org/ , especially the part about code point, code unit, abstract character and so on, and then sticking to UTF-8 and the opaque data argument.

Is there a C++ string class which I can use to process UTF-8 strings? [duplicate]

I think string can handle ANSI and wstring can handle UTF-16, is that right?
Does the C++ STL have a class which can support UTF-8 strings?
Just use std::string. It handles UTF-8 strings just fine.
Obviously you need to be aware that a codepoint can be 1 to 4 chars, and that a character can actually be any number of codepoints, but that rarely matters to you, and when it matters, std::wstring would have the same problems.
The big advantage is that std::string works the same everywhere. With std::wstring, different implementations use 16-bit or 32-bit numbers with very different meanings, and there are problems with byte ordering and so on.

Command Line arguments handling on Visual C++ 2010 [duplicate]

I have a console application, which takes one command line parameter.
int _tmain(int argc, char* argv[])
{
...
printf("Path: %s\n", argv[1]);
...
}
When I run the program with an argument (myprogram.exe D:\myfolder\myfile), it prints Path: D instead of Path: D:\myfolder\myfile.
How should I change the procedure for extracting first command-line parameter (argv[1]) so that it returns the full path, not just the first letter?
I tried to put the path in quotes (myprogram.exe "D:\myfolder\myfile"), but it didn't help.
Use _tprintf instead of printf. Your program is almost certainly set to compile as UNICODE and, therefore, each "character" takes up two bytes.
You call printf which operates on single byte characters. The second byte happens to be 0, and so printf thinks it reached the end of the string since a zero byte is considered the termination of a string.
And make sure your argv is declared as TCHAR* argv[] and not char* argv[]. Be consistent: either use the TCHAR routines, or don't. But don't mix unless you have a very good reason (i.e. you know what you're doing and you are gearing up to do some magic). Chances are you don't and you shouldn't.
Visual C++ has a special way of dealing with Unicode support. Macros in tchar.h expand to the narrow or wide char versions of the symbols depending on the project setting, so _tprintf expands to printf or wprintf. It seems you are calling printf directly; if your project settings enable Unicode support, what you are running into is exactly the problem to expect. Have a look at tchar.h and try using _tprintf instead.
In Visual Studio the main function usually takes arguments of type wchar_t. If that is the case, try outputting via the wprintf function:
wprintf(L"Path : %s", argv[1]);
Notice that the format string is itself wide (the L prefix). If you stay with printf, Microsoft's runtime accepts a capital 'S' for a wide string, as in printf("Path : %S", argv[1]); with a lowercase 's', printf prints the first character only, because the second byte of a wide 'D' is zero. You can read more about wide characters on MSDN.

WNDCLASSEX vs WNDCLASSEXW? [duplicate]

What is the difference between calling the Win32 API functions that have an A appended to the name and those that have a W?
I know it means ASCII and wide character (Unicode), but what is the difference in the input or the output?
For example, if I call GetDefaultCommConfigA, will it fill my COMMCONFIG structure with ASCII strings instead of WCHAR strings? (Or vice versa for GetDefaultCommConfigW.)
In other words, how do I know which encoding a string is in, ASCII or Unicode? It must be determined by the version of the function I call, A or W, correct?
I have found this question, but I don't think it answers my question.
The A functions use ANSI (not ASCII) strings as input and output, that is, strings in the system's current code page, and the W functions use Unicode strings instead (UCS-2 on NT4 and earlier, UTF-16 on W2K and later). Refer to MSDN for more details.

C win32 api unicode to ascii [duplicate]

I have a function that gets an LPCWSTR string for Win32 API usage.
I also have a function that sends data in ASCII to the IRC server.
So I need to convert the string to ASCII. I haven't found any solutions; I only found ways to convert ASCII to Unicode.
WideCharToMultiByte converts from Unicode (UTF-16) to a multibyte character set (MBCS). If your original wide char string consists only of characters that can be represented in ASCII, the result is an ASCII string.
You can also use wcstombs, which in the Microsoft C runtime calls WideCharToMultiByte internally.
Either way, you will get an LPSTR or char * (which are the same).