Windows UTF-8 to UTF-16 - C++

I've been trying to convert between UTF-8 and UTF-16 LE with BOM using C++, to make characters output correctly on Windows without having to change the terminal font.
I tried changing the code pages, but they didn't work.
I have two questions:
How can I convert a normal (narrow) string to a wide string?
Is it a bad idea to create a C++ map that maps each Unicode character to the corresponding character in the Windows code page?
For example,
wcout << L"\u00A0" << endl;
This code outputs the letter á on Windows when using code page 850. How can I put a variable in place of the "\u00A0" so that I can convert a normal string to a wide string on Windows?
What I'd like is this:
wcout << Lsome_variable << endl;
I realise it's not valid C++ syntax, but does anyone know how I can do this, or whether there's a better way?

As noted in the comments, the standard library now provides things like std::wstring_convert (and other functions/classes in the See Also section of that page).
Since you're on Windows, the WinAPI also has conversion functions. In this case you would be looking for MultiByteToWideChar which can be used to convert from UTF-8 to UTF-16.
Between those options, something should fit your use case. Generally speaking, you should never need to write your own conversion map.
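As a minimal sketch of the MultiByteToWideChar route (the API and flags are the real WinAPI ones; the helper name Utf8ToUtf16 is just for illustration, and error handling is omitted):

#include <windows.h>
#include <string>
#include <iostream>

// Hypothetical helper: convert a UTF-8 std::string to a UTF-16 std::wstring.
std::wstring Utf8ToUtf16(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    // First call: ask how many UTF-16 code units the result needs.
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                  static_cast<int>(utf8.size()), nullptr, 0);
    std::wstring utf16(len, L'\0');
    // Second call: perform the actual conversion into the buffer.
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                        static_cast<int>(utf8.size()), &utf16[0], len);
    return utf16;
}

int main()
{
    std::string some_variable = "\xC2\xA0";                 // UTF-8 bytes for U+00A0
    std::wcout << Utf8ToUtf16(some_variable) << std::endl;  // the "Lsome_variable" idea
}

Whether the character then shows up correctly still depends on the console's code page and font, but the string itself is now genuine UTF-16.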

Related

C++ Infinity Sign

Hello, I was just wondering how I can display the infinity sign (∞) in C++? I am using Code::Blocks. I read a couple of Q&As on this topic, but I'm a newbie at this stuff, especially with hex codes and the like. What do I have to include and what do I type out exactly? If someone could write the code and explain it, that'd be great! Thanks!
The symbol is not part of ASCII. However, in code page 437 (usually the default in the Windows Command Prompt with English locales/US regional settings) it is represented as character 236. So in principle
std::cout << static_cast<unsigned char>(236);
should display it, but the result depends on the current locale/encoding. On my Mac (OS X) it is not displayed properly.
The best way to go about it is to use the Unicode character set (which standardizes a huge number of characters and symbols). In this case,
std::cout << "\u221E";
should do the job, as the Unicode code point U+221E is the infinity sign.
However, to be able to display Unicode, your output device has to support a UTF encoding. On my Mac, the Terminal uses UTF-8; the Windows Command Prompt, however, still defaults to the old code page 437 (thanks to @chris for pointing this out). According to this answer, you can switch the console to UTF-8 by typing
chcp 65001
in a Command Prompt.
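As a sketch of the same idea done from code rather than by hand (Windows only, and it still assumes the console font contains the glyph):

#include <windows.h>
#include <iostream>

int main()
{
    // Same effect as typing `chcp 65001`: the console now interprets output as UTF-8.
    SetConsoleOutputCP(CP_UTF8);
    // "\xE2\x88\x9E" is the UTF-8 byte sequence for U+221E (infinity).
    std::cout << "\xE2\x88\x9E" << std::endl;
}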
You can show it through its Unicode escape:
∞ has the value \u221E
You can display any character from the Character Map via its Unicode code point.

How to insert check mark "✓" in std::string in c++ programming using VS2010

In one of the applications I'm developing in C++ I have to display the "✓" mark. For that I need to first store it in a std::string or in a char, but when I do that I get a "?" mark as output. I'm using VS2010 to code. Please suggest how to solve this. Thanks in advance.
There seems to be some basic misunderstanding.
The checkmark character is Unicode 0x2713. You cannot store it as a single character in std::string. The maximum value for a char is 0xff (255). It won't fit.
If you are developing a GUI using C++ for Windows then I would guess MFC. However if you are using std::string then perhaps that's not so. Some choices:
For MFC, you can rebuild your application in UNICODE mode. Then characters are wchar_t (16 bits on Windows) and your checkmark will fit fine.
You could use std::wstring instead of string. That means changes to existing code.
You could use UTF-8, which replaces the character by a multi-byte sequence. Not recommended in Windows, even if you think you know what you're doing. Very unfriendly.
In any case, if you are using GUI and dialog boxes you will have to make sure they are Unicode dialog boxes or nothing will work.
With a few more details, we could give more specific advice.
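As a minimal sketch of the std::wstring route (plain Win32 rather than MFC, just to keep the example self-contained): store the character as UTF-16 in a wstring and hand it to a wide ("W") API.

#include <windows.h>
#include <string>

int main()
{
    // U+2713 fits in a single UTF-16 code unit, so a std::wstring holds it directly.
    std::wstring text = L"Done \u2713";
    // Any wide (W) API can display it, provided the dialog font has the glyph.
    MessageBoxW(nullptr, text.c_str(), L"Status", MB_OK);
    return 0;
}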
You can print a check mark in your console in C++ using the following code (0xFB is the square-root character in code page 437, which is often used as a check mark):
cout << " (\xfb) " << endl;
Output:
(√)

Printing arabic string in c++

I'm beginner in C++
I'd like to print an Arabic statement in C++ using Borland C, but I failed. I also tried saving the file as UTF-8, but that didn't help.
So please, if anyone knows anything about this, what the problem is, or how to configure the compiler to print Arabic, please help me.
#include <iostream.h>
#include <conio.h>

void main()
{
    clrscr();
    char x[5] = {'ا', 'ح', 'م', 'د'};   // Arabic letters as narrow char literals
    for (int i = 0; i < 4; i++)
        cout << x[i];
    getche();
}
First of all, you are assuming that your source code can contain Arabic characters. This is a very bad idea; it depends on the compiler interpreting your source file in the same code page your editor saved it in.
The safest way to handle Arabic or other arbitrary Unicode in Windows C++ is to compile with _UNICODE, declare variables as wchar_t (and friends), and use Unicode constants like L'\u0641' for your Arabic characters. If you must do things with char, you will have to come up with the multi-byte \x sequences in the right code page for your Arabic characters, and deal with the fact that a single char can't hold an Arabic character in UTF-8.
Finally, since you are using cout, this will only show you Arabic if the current code page of your DOS box is an Arabic code page.
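As a sketch of that approach on a modern Windows compiler (MSVC rather than the old Borland toolchain; the letters are written as Unicode escapes so the source-file encoding doesn't matter, though the console may still not shape or join the Arabic correctly):

#include <io.h>
#include <fcntl.h>
#include <iostream>

int main()
{
    // Put stdout into UTF-16 mode so wcout can emit non-ANSI characters.
    _setmode(_fileno(stdout), _O_U16TEXT);
    // "احمد" spelled as escapes: alef, hah, meem, dal.
    std::wcout << L"\u0627\u062D\u0645\u062F" << std::endl;
}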
If your Borland C++ is under DOS:
By default you don't have any character set that shows Arabic. But back in those days, there were applications which remapped the extended ASCII characters to other scripts such as Arabic, Persian, and so on.
Steps you should take:
If you are using Windows Vista/7+, you should first use DOSBox (you need fullscreen mode)
You must change the default ASCII font table in memory
Something like vegaf.com, which defines a Persian/Arabic alphabet
Note: UTF-8 is not an option for this setup
C++11 is the first C++ standard to offer native support for the UTF-8 encoding (and the other UTF encodings).
In pre-C++11 code you can simply use a third-party library if you need UTF-8 support, like this one.
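For illustration, the C++11 facilities being referred to are the u8/u/U literal prefixes and the char16_t/char32_t types (this assumes C++11/14/17, where u8 literals are still plain char arrays):

#include <string>

int main()
{
    // The compiler guarantees this literal is encoded as UTF-8 in the binary.
    std::string utf8 = u8"\u0627\u062D\u0645\u062F";      // "احمد"
    // UTF-16 and UTF-32 literals and string types also exist.
    std::u16string utf16 = u"\u0627\u062D\u0645\u062F";
    std::u32string utf32 = U"\u0627\u062D\u0645\u062F";
    return 0;
}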

what locale does wstring support?

In my program I used wstring to print out the text I needed, but it gave me random characters (due to a different encoding scheme). For example, I have this block of code.
wstring text;
text.append(L"Some text");
Then I use DirectX to render it on screen. I used to use wchar_t, but I heard it has portability problems, so I switched to wstring. wchar_t worked fine, but it seemed to only take English characters from what I can tell (the printout just totally ignored the non-English characters entered), which was fine, until I switched to wstring: I only got random characters that looked like Chinese and Korean mixed together. Interestingly, my computer's locale for non-Unicode text is Chinese. Based on what I saw, I suspected it would render Chinese characters correctly, so I tried that, and it does display the characters correctly, but with a square in front (which is still kind of an incorrect display). I then guessed the encoding might depend on the language locale, so I switched the locale to English (US) (I use Windows 8), restarted, and saw that my Chinese test characters in the source file became random garbage (my file is not saved in a Unicode format since all the text is English). Then I tried with English characters, but no luck: the display seemed exactly the same and had nothing to do with the locale. I don't understand why it doesn't display correctly and looks like Asian characters (even when I use the English locale).
Is there some conversion that should be done, or should I save my file in a different encoding format? The problem is I want to display English characters correctly, which should be the default.
In the absence of code that demonstrates your problem, I will give you a correspondingly general answer.
You are trying to display English characters, but see Chinese characters. That is what happens when you pass 8-bit ANSI text to an API that expects UTF-16 text. Look for somewhere in your program where you cast from char* to wchar_t*.
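A tiny, deliberately wrong illustration of the kind of bug being described (the point is that a cast relabels bytes, it does not convert them):

#include <iostream>

int main()
{
    const char* ansi = "Hello";
    // WRONG: this does not convert anything, it just reinterprets the bytes.
    const wchar_t* bogus = reinterpret_cast<const wchar_t*>(ansi);
    // On Windows, 'H' (0x48) and 'e' (0x65) are now read as one UTF-16 code unit,
    // 0x6548, which lands in the CJK ideograph range -- hence English text
    // coming out looking "Chinese".
    std::wcout << bogus[0] << std::endl;
}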
First of all, what type of file are you trying to store the text in? Normal txt files are stored as ANSI by default (so does Excel). So when you try to print a Unicode character to an ANSI file, it will print junk. Two ways of overcoming this problem are:
try to open the file in UTF-8 or UTF-16 mode and then write
convert the Unicode to ANSI before writing to the file. If you are using Windows, then MSDN documents the particular APIs to do Unicode-to-ANSI conversion and vice versa (see the sketch after this answer). If you are using Linux, then Google for conversion of Unicode to ANSI; there are lots of solutions out there.
Hope this helps!
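On Windows the APIs being referred to are WideCharToMultiByte and MultiByteToWideChar; a rough sketch of the wide-to-ANSI direction (error handling omitted, the helper name is made up):

#include <windows.h>
#include <string>

// Hypothetical helper: convert UTF-16 text to the system ANSI code page.
std::string Utf16ToAnsi(const std::wstring& wide)
{
    if (wide.empty()) return std::string();
    int len = WideCharToMultiByte(CP_ACP, 0, wide.data(),
                                  static_cast<int>(wide.size()),
                                  nullptr, 0, nullptr, nullptr);
    std::string ansi(len, '\0');
    WideCharToMultiByte(CP_ACP, 0, wide.data(),
                        static_cast<int>(wide.size()),
                        &ansi[0], len, nullptr, nullptr);
    return ansi;   // characters with no ANSI equivalent are replaced (typically with '?')
}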
std::wstring does not have any locale/internationalisation support at all. It is just a container for storing sequences of wchar_t.
The problem with wchar_t is that its encoding is unspecified. It might be Unicode UTF-16, or Unicode UTF-32, or Shift-JIS, or something completely different. There is no way to tell from within a program.
You will have the best chances of getting things to work if you ensure that the encoding of your source code is the same as the encoding used by the locale under which the program will run.
But, the use of third-party libraries (like DirectX) can place additional constraints due to possible limitations in what encodings those libraries expect and support.
Bug solved: it turns out to be a casting problem (not a rendering problem, as I previously said).
The garbled text is an intermediate product of some internal conversion process using wstringstream (which I forgot to mention); the code is as follows:
wstringstream wss;
wstring text;
text.append(L"some text");
wss << timer->getTime();
text.append(wss.str());
Right after this process the debugger shows the text as a bunch of random characters, but later it somehow becomes readable again. The real problem appears at the rendering stage using DirectX: I had somehow left in a C-style cast to LPCWSTR, which resulted in the incorrect rendering.
old:
LPCWSTR lpcwstrText = (LPCWSTR)textToDraw->getText();
new:
LPCWSTR lpcwstrText = (*textToDraw->getText()).c_str();
By changing that solves the problem.
So, this was caused by a bad cast, as some kind people pointed out when correcting my statement.

utf-8 encoding a std::string?

I use a drawing API which takes in a const char* to a UTF-8 encoded string. Doing myStdString.c_str() does not work; the API fails to draw the string.
for example:
std::string someText = "■□▢▣▤▥▦▧▨▩▪▫▬▭▮▯▰▱";
will render ???????????????
So how do I get the std::string to act properly?
Thanks
Try using std::codecvt_utf8 to write the string to a stringstream and then pass the result (stringstream::str) to the API.
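Something along these lines, using std::wstring_convert with std::codecvt_utf8 (available since C++11, deprecated in C++17 but still present; it handles BMP characters like the ones in the question):

#include <codecvt>
#include <locale>
#include <string>

int main()
{
    // Wide string containing the characters to hand to the drawing API.
    std::wstring wide = L"\u25A0\u25A1\u25A2";
    // Convert the wchar_t data to a UTF-8 encoded std::string.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    std::string utf8 = conv.to_bytes(wide);
    // utf8.c_str() is now a UTF-8 encoded const char* for the API.
    return 0;
}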
There are so many variables here it's hard to know where to begin.
First, verify that the API really and truly supports UTF-8 input, and that it doesn't need some special setup or O/S support to do so. In particular make sure it's using a font with full Unicode support.
Your compiler is responsible for converting the source file into a string. It probably does not use UTF-8 encoding by default, and may not have an option to do so at all. In that case you may have to declare the string as a std::wstring and convert it to UTF-8 from there. Alternatively, you can look up each character beyond the first 128 and encode it as hex escapes in the string (as sketched below), but that's a hassle and makes for an unreadable source.
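For example, the first few characters of the string in the question can be written as explicit UTF-8 byte sequences, which sidesteps the source-encoding question entirely (the hex values below are the UTF-8 encodings of U+25A0 and U+25A1):

#include <string>

int main()
{
    // "\xE2\x96\xA0" is U+25A0 (black square), "\xE2\x96\xA1" is U+25A1 (white square).
    std::string someText = "\xE2\x96\xA0\xE2\x96\xA1";
    // someText.c_str() can now be passed to an API that expects UTF-8.
    return 0;
}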