Unicode Support in MFC Controls - c++

I'm exploring converting an existing MFC app from MBCS to Unicode, and I'm compiling a simple starter app in Unicode mode to check out how edit controls, for example, behave differently in Unicode/W or MBCS/A mode.
But I'm getting some strange results.
If I enter Alt+1702 into Word, for example, I get the Arabic character (ڦ) which is expected from the Unicode table.
But if I enter Alt+1702 into an edit control in the Unicode MFC app, I get a superscript "a" (ª) instead. This is the same behaviour that I get from the existing MBCS app.
This second behaviour also happens in Word (2007) if I use File > Open and enter Alt+1702 in the File name field. But it comes through properly if I enter it in the Font combo box on the Ribbon.
What am I missing here?

Windows disables hex numpad input by default. You must enable it, then enter the value by holding Alt, pressing + on the numeric keypad, and typing the hex value.
How to enable it:
Insert Unicode characters via the keyboard?
How to enter Unicode characters in Microsoft Windows
As for why Alt+1702 produces ª:
Alt codes are generally limited to the ANSI or OEM code pages and don't work for code points larger than 255. A few apps (such as MS Word, as you experienced) do support larger values, so there Alt+1702 produces U+06A6 (ARABIC LETTER PEHEH, ڦ) as expected, because 1702 = 0x06A6. Some other apps simply discard any digits after the third. But by default, almost all applications take only the low byte of a larger value as the code point, i.e. the value modulo 256.
So pressing Alt+1702 is equivalent to Alt+166, because 1702 ≡ 166 (mod 256). On US Windows, which uses code page 437 as the OEM code page, the character at code point 166 is ª.

Related

How to store the character ú in char array with std::cin?

In the console, the user types several characters, including ú. I would like to store these characters in a char array using std::cin, but the character ú is stored as 163 '£'. I really want it stored as 163 'ú'. How can I do that?
The character set of the console defines how a char value will be displayed. For example:
if the console uses the ISO 8859-1 or Windows-1252 character set, the value 163 is a £;
if the console uses the old DOS code page 850, the same value 163 is a ú.
In principle, if you input a char from the console and output this char on the same console, you should graphically get the same result.
However, if the two sides use different encodings, this is not the case. For example, if you input ú in a CMD window using code page 850 but then output the result in a window that uses Windows-1252, you get £. The same thing happens if you write the data to a file on disk and open it in an editor that uses a different character encoding.
Unfortunately, console settings and default encodings are very much system-dependent, and more information is needed to give you accurate advice on the best way to solve the issue.

C++ Infinity Sign

Hello, I was just wondering how I can display the infinity sign (∞) in C++? I am using Code::Blocks. I read a couple of Q&As on this topic, but I'm a newbie at this stuff, especially with hex codes. What do I have to include, and what exactly do I type? If someone can write the code and explain it, that'd be great! Thanks!
The symbol is not part of ASCII. However, in code page 437 (usually the default in the Windows Command Prompt with English locales/US regional settings) it is character #236. So in principle
std::cout << static_cast<unsigned char>(236);
should display it, but the result depends on the current locale/encoding. On my Mac (OS X) it is not displayed properly.
The best approach is to use Unicode, which standardizes a very large number of characters and symbols. In this case,
std::cout << "\u221E";
should do the job, as U+221E (hexadecimal 221E) is the Unicode code point for ∞ (INFINITY).
However, to display Unicode, your output device must support a Unicode encoding such as UTF-8. On my Mac, Terminal uses UTF-8, but the Windows Command Prompt still uses the legacy OEM code page 437 by default (thanks to @chris for pointing this out). According to this answer, you can switch it to UTF-8 by typing
chcp 65001
in a Command Prompt.
You can show it through its Unicode escape:
∞ is U+221E, written \u221E in a C++ string literal.
The same works for any character in the Character Map: look up its Unicode code point and use it in a \u escape.

How to access box drawing characters in ascii in c++ on Mac

The first character I'm looking for is usually 201 in normal ASCII code, but it's different on the Mac. How do I work around this?
It's possible to input the Unicode characters on a Mac by switching to the Unicode Hex Input keyboard layout.
Open System Preferences
Choose Keyboard
Add Unicode Hex Input to the list
Select "Show Input menu in menu bar"
Close the preferences
Click the flag that has appeared in the menu bar
Select Unicode Hex Input
Next you need the codes; Wikipedia has a nice summary of the box-drawing characters.
To enter a code:
Hold down Option (alt)
Type the code without the preceding U, i.e. for U+2560, type 2560
Release Option
I drew this example using that method: ╠╩╬╩╣
After you're finished, you can change your keyboard input back to your normal one using the flag in the menu bar.
This character is not available in any single-byte character set on OS X.
Unicode is readily available on OS X, unlike on Windows, which requires special coding to use it.
Use Unicode U+2554, which is E2 95 94 in UTF-8.
You can simply put ╔ directly in a string literal, e.g. "╔". It occupies three bytes in UTF-8, so it does not fit in a single char.
There is no such thing as ASCII character 201. ASCII is a 7-bit single byte character encoding, where code points go from 0 to 127, inclusive. Maybe you are referring to “╔” in the original IBM PC character set?
Then you can do this:
Use a Windows PC with a keyboard that has a numeric keypad.
In a console window with input (e.g. the command interpreter), hold down Alt and type 201 on the numeric keypad, in number mode (NumLock on).
Start Word or Windows’ WordPad.
Copy and paste the character into Word or WordPad.
Type Alt+X.
On my laptop WordPad reports 2554, which means it's Unicode character U+2554 (hexadecimal).
In C++ you can express that character as L'\u2554', which is of type wchar_t.
On the other hand, if you prefer names to numbers, ncurses has supported double-line and thick-line drawing characters in Unicode terminals since late 2009. That is later than the (rather old) ncurses 5.7 bundled with OS X, but newer releases are available through MacPorts and similar package managers.

C++ Console character display

I'm using Visual Studio as my C++ IDE.
When I try to output OEM characters such as "█ ░" with std::cout, I get an error saying:
"Some Unicode characters could not be saved in the current codepage. Do you want to resave this file as Unicode in order to maintain your data?"
So I press "Save with Other Encoding" and switch it to Western European (DOS) - Codepage 850, and it displays the characters perfectly fine in the console.
My question is: even though the characters display just fine for me, if I were to give the compiled program.exe to someone else, would they see the same characters I see (█), or an entirely different set of characters such as Ä?
Not necessarily. If their console uses the same code page (850), you can hope that the characters will be displayed the same way, but you should not rely on it.

Unicode character for superscript shows a square box: ࠚ

Using the following code to create a Unicode string:
wchar_t HELLO[20];
wsprintf(HELLO, TEXT("%c"), 0x2074);
When I display this in a Win32 control such as a text box or a button, it is shown as a square box ([]).
How do I fix this?
I tried compiling with both Eclipse (MinGW) and Microsoft Visual C++ 2010.
Also, UNICODE is defined at the top.
Edit:
I think it might have something to do with my system, because when I visit http://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts some of the Unicode characters don't appear.
The font you are using does not contain a glyph for that character. You will likely need to install some new fonts to overcome this deficiency.
The character you have picked out is U+081A, SAMARITAN MODIFIER LETTER EPENTHETIC YUT. Perhaps you were after U+2074, SUPERSCRIPT FOUR, for which you need the hex value 0x2074.
Note that you changed the question to read 0x2074, but the original version read 2074. Either way, if you see a box, your font is missing that glyph.
The characters you are getting from Wikipedia are expressed in hexadecimal, so your code should be:
wchar_t HELLO[20];
wsprintf(HELLO, TEXT("%c"), (wchar_t)0x2074); // or TEXT('\x2074')
If it still doesn't work, it's a font problem; if you need a pan-Unicode font, it seems that Code2000 is one of the most complete out there.
Funny fact: the character with decimal code 2074 (i.e. hex 81A) seems to actually be a box (or it's such a strange beast that even the image outline at FileFormat.Info is wrong). :)
For the curious: 0x081A turns out to be SAMARITAN MODIFIER LETTER EPENTHETIC YUT (ࠚ).