If I have a string with non-printable characters, are they supposed to appear or not when I use CDC::DrawText?
CString str = L"ItemOne\x1EItemTwo\x1EItemThree\x1E";
In WinCE5, the non-printable characters did not appear, but in WinCE7 they appeared as squares. Which is the correct behavior?
Or does it depend on the font used, or perhaps it is something that is configurable in the OS?
This depends on the font and on the charset you're using in the OS.
Don't forget that Windows CE is natively Unicode, so something like \x1E ends up as the Unicode control character U+001E, and whether a font draws it as nothing or as a box is up to that font.
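If you need consistent output across CE versions, one option (a minimal sketch, not from the original answer, assuming MFC and that replacing each control code with a space is acceptable) is to sanitize the string before drawing it:
CString SanitizeForDisplay(const CString& src)
{
    CString out;
    for (int i = 0; i < src.GetLength(); ++i)
    {
        wchar_t ch = src[i];
        out += (ch >= 0x20) ? ch : L' ';   // replace control codes such as \x1E
    }
    return out;
}
// pDC->DrawText(SanitizeForDisplay(str), &rect, DT_LEFT | DT_SINGLELINE);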
All the ASCII codes greater than 127 are replaced by the diamond question-mark symbol. How can I display those characters? I have an unsigned char buffer[1024] which contains values from 0 to 255.
Use the QString class's fromAscii() method. By default this will treat characters above 127 as Latin-1 characters. To change this behavior, use the QTextCodec::setCodecForCStrings() method to set the correct codec for your usage.
I believe Qt5 may have removed the setCodecForCStrings method.
EDIT: Adnan supplied the Qt5 alternative to setCodecForCStrings; adding it to the answer for completeness.
The Qt5 alternative to setCodecForCStrings is QTextCodec::setCodecForLocale(QTextCodec::codecForName("UTF-8"));
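For reference, a minimal sketch of how the buffer from the question might be converted in Qt5 using that approach (assuming the bytes are actually Latin-1; substitute the codec that matches your data):
#include <QString>
#include <QTextCodec>

QString decodeBuffer(const unsigned char* buffer, int len)
{
    // Assumption: the bytes are Latin-1. Pick whatever codec your data really uses.
    QTextCodec::setCodecForLocale(QTextCodec::codecForName("ISO 8859-1"));
    return QString::fromLocal8Bit(reinterpret_cast<const char*>(buffer), len);
}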
This is a rabbit hole with no end. Qt does not fully support printing characters above 127, as their meaning is not well defined. The current method is to use fromLocal8Bit(), which will take a char array and transform it into the "right" Unicode string (the only thing Qt supports printing).
QTextCodec::setCodecForLocale can be used to identify the character set you wish to transform from. Many codecs are supported, but for some reason IBM437 (the character set used by IBM PCs in the US for decades) is not, while several codecs used in Europe, etc., are. Perhaps some characters in IBM437 were never assigned proper code points in Unicode, so transforming it isn't possible?
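As a concrete illustration of choosing a specific codec (a hedged sketch; QTextCodec::codecForName() returns null for codecs Qt doesn't ship, such as IBM437):
#include <QString>
#include <QTextCodec>

QString decodeDosText(const QByteArray& raw)
{
    // IBM437 is unavailable, but the related "IBM 850" codec is documented as supported.
    QTextCodec* codec = QTextCodec::codecForName("IBM 850");
    if (!codec)
        return QString::fromLatin1(raw);   // fallback assumption: treat the bytes as Latin-1
    return codec->toUnicode(raw);
}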
What's frustrating is that there are fonts with glyphs for all 256 values of these legacy code pages, but it is simply not possible to display them all in Qt, which only works with Unicode strings. There is a handful of code points Qt won't render, and the list seems to grow with newer versions of Qt. Currently I know of 9, 10, 12, 13, and 173. Some of these are missing for obvious reasons (usually you don't want to print a carriage-return glyph, though one did exist in DOS), but others used to work in Qt and now do not.
In my application, I resorted to creating a new font that has copies of the unprintable glyphs at higher Unicode code points, and translating them before printing them on the screen. It's quite silly, but Qt gave up on ASCII many years ago, so it's the best option I could find.
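A hypothetical sketch of that remapping trick (the 0xE000 Private Use Area base and the custom font layout are my assumptions, not details from the post):
#include <QString>
#include <QByteArray>

QString remapForDisplay(const QByteArray& raw)
{
    QString out;
    for (int i = 0; i < raw.size(); ++i) {
        const ushort b = static_cast<unsigned char>(raw.at(i));
        if (b < 0x20 || b == 173)
            out += QChar(ushort(0xE000 + b));   // assumes the custom font holds copies here
        else
            out += QChar(b);                    // assumes the remaining bytes are Latin-1
    }
    return out;
}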
We are using a Korean font and the FreeType library and trying to display a Korean character, but it displays some other characters instead of the Korean glyph.
Code:
std::wstring text3 = L"놈";
Are there any tricks to typing Korean characters?
For maximum portability, I'd suggest avoiding encoding Unicode characters directly in your source code and using \u escape sequences instead. The character 놈 is Unicode code point U+B188, so you could write this as:
std::wstring text3 = L"\uB188";
The question is: what is the encoding of the source code?
It is likely UTF-8, which is one of the reasons not to use wstring. Use a regular string. For more information on my way of handling characters, see http://utf8everywhere.org.
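A minimal sketch of that UTF-8 route (assuming the source file is saved as UTF-8 and the compiler supports C++11 u8 literals):
#include <string>

std::string text3 = u8"\uB188";   // the same Korean character, U+B188, stored as the UTF-8 bytes EB 86 88
With FreeType you would then decode those bytes back to the code point (0xB188) before calling FT_Get_Char_Index.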
Using the following code to create a Unicode string:
wchar_t HELLO[20];
wsprintf(HELLO, TEXT("%c"), 0x2074);
When I display this in a Win32 control like a text box or a button, it gets mapped to a [] square.
How do I fix this?
I tried compiling with both Eclipse (MinGW) and Microsoft Visual C++ (2010).
Also, UNICODE is defined at the top of the file.
Edit:
I think it might have something to do with my system, because when I visit http://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts, some of the Unicode characters don't appear.
The font you are using does not contain a glyph for that character. You will likely need to install some new fonts to overcome this deficiency.
The character you have picked out is 'SAMARITAN MODIFIER LETTER EPENTHETIC YUT' (U+081A). Perhaps you were after 'SUPERSCRIPT FOUR' (U+2074); you need to write that in hex: 0x2074.
Note that you changed the question to read 0x2074, but the original version read 2074. Either way, if you see a box, that indicates your font is missing that glyph.
The characters you are getting from Wikipedia are expressed in hexadecimal, so your code should be:
wchar_t HELLO[20];
wsprintf(HELLO, TEXT("%c"), (wchar_t)0x2074); // or TEXT('\x2074')
If it still doesn't work, it's a font problem; if you need a pan-Unicode font, it seems that Code2000 is one of the most complete out there.
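An alternative sketch that sidesteps the printf-style formatting entirely (hwndCtrl is a hypothetical control handle; only wcscpy_s and SetWindowTextW are assumed):
wchar_t HELLO[20];
wcscpy_s(HELLO, 20, L"\u2074");    // SUPERSCRIPT FOUR written as a universal character name
SetWindowTextW(hwndCtrl, HELLO);   // still shows a box if the selected font lacks the glyph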
Funny fact: the character that has the decimal code 2074 (i.e. hex 81a) seems to actually be a box (or it's such a strange beast that even the image outline at FileFormat.Info is wrong). :)
For the curious ones: it turns out that 0x081A is 'SAMARITAN MODIFIER LETTER EPENTHETIC YUT'.
I want to directly embed non-ASCII Unicode characters in string literals and use them in printf. This implies my source code must be saved in UTF-8 or UTF-16. Visual Studio 2010 does support editing and saving C++ source files in either format, but when compiled and executed, it does not produce the correct Unicode characters. Does the compiler support string literals with Unicode characters embedded?
e.g.
wprintf(L" chinese characters:中文字\n"); the trailing chinese characters cannot be displayed
I don't have a Chinese version of Windows to test with, so this is complete speculation.
The console and file output functions are aware that files are not coded in UTF-16, so they attempt to convert the characters to a code page before output. Just as the default locale is "C" rather than anything based on your system settings, so too the default code page is probably an inappropriate one that does not include Chinese characters.
There is a function SetConsoleOutputCP to change the code page for the console. It is not clear if this function changes the code page used by the actual console window, or if it only affects conversions from Unicode within the program.
The easy way to test wide literals is to skip the formatting part of printf and give your string straight to the OS: WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), L" chinese characters:中文字", ...).
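For completeness, a full version of that test (a sketch assuming a console program built with the Unicode character set):
#include <windows.h>

int wmain()
{
    const wchar_t text[] = L" chinese characters:\u4E2D\u6587\u5B57\n";   // 中文字
    DWORD written = 0;
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), text,
                  (DWORD)(sizeof(text) / sizeof(wchar_t) - 1), &written, NULL);
    return 0;
}
If the characters still come out as boxes here, the console font rather than the CRT's code-page conversion is the limiting factor.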
It's possible that #pragma setlocale may be what you need.
When I try to run this code in C++
cout << char(219);
the output on my Mac is a question mark (?).
However, on a PC it gives me a black square.
Does anyone have any idea why the Mac seems to have only 128 characters, when it should be 256?
Thanks for your help.
There's no such thing as ASCII character 219. ASCII only goes up to 127. Characters 128-255 are defined in different ways in different character encodings for different languages and different OSs.
MacRoman defines it as €.
IBM code page 437 (used at the Windows command prompt) defines it as █.
Windows code page 1252 (used in Windows GUI programs) defines it as Û.
UTF-8 defines it as a part of a 2-byte character. (Specifically, the lead byte of the characters U+06C0 to U+06FF.)
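If the goal is the CP437-style black square specifically, a hedged workaround is to print the Unicode character it maps to (U+2588 FULL BLOCK) instead of the raw byte, assuming the terminal accepts UTF-8 (the default on a Mac):
#include <iostream>

int main()
{
    std::cout << "\xE2\x96\x88\n";   // the UTF-8 bytes for U+2588 FULL BLOCK
    return 0;
}
On Windows you would additionally need the console code page set to UTF-8, e.g. via SetConsoleOutputCP(CP_UTF8).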
ASCII is really a 7-bit encoding. If you are printing char(219), it is being interpreted in some other encoding: on Windows, most probably CP 1252. On Mac, I have no idea...
When a character is missing from an encoding, Windows shows a box (it's not character 219, which doesn't exist), and Macs show a question mark in a diamond symbol because a designer wanted it that way. But they both mean the same thing: a missing/invalid character.