Unicode chars are printing, but not as expected - C++

When I want a string containing a Unicode character, I would normally put u8 before the string literal.
Since I'm making addons and gamemodes for a game, I usually need to build with the multi-byte character set.
However, this project no longer prints Unicode correctly. For instance, as a loading animation, I use a Unicode character that is moved along a row of periods. After applying some optimizations this no longer works.
case 1: game::functions.loadanim(XorStr(u8"•...... "), "Loading"); break;
But now, instead of printing the •, it prints â€Ç
All my Unicode symbols have similar issues, project-wide. Some will actually print, but with unexpected results. For example, when I try to append a degree symbol like this
XorStr("°");
it actually prints this
°
I have tried re-saving all files with UTF-8 encoding, and I have also tried changing the encoding from the command line. If I don't use the multi-byte character set, it simply crashes. This was working fine before and only changed after the optimization changes in the property pages; however, even after undoing all the changes in the property pages, I still have this issue.
What did I break to cause this?
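For reference, "â€Ç"-style output is the classic sign of UTF-8 bytes being decoded with an ANSI code page. A minimal sketch of the usual sanity check on a Windows console (the SetConsoleOutputCP call and the hard-coded bytes are illustrative only, not part of the original project):
#include <windows.h>
#include <cstdio>

int main() {
    // Without this, the UTF-8 bytes E2 80 A2 of U+2022 (the bullet) get decoded
    // in the console's ANSI code page and come out as mojibake such as "â€¢".
    SetConsoleOutputCP(CP_UTF8);             // console now interprets output as UTF-8
    const char bullet[] = "\xE2\x80\xA2";    // UTF-8 encoding of U+2022
    std::printf("%s...... Loading\n", bullet);
    return 0;
}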

Related

C++ char spacing in console output, UTF-16 characters

I'm making a console game in C++ that uses UTF-16 characters to make it a little more interesting, but some characters are a different size than others. So, when I print the level, everything after such a character is pushed further right than the rest of the line. Is there any way to add spacing between characters with some console function? I tried to google something helpful, but I haven't found anything.
I tried to change the font size with CONSOLE_FONT_INFOEX, but it changed nothing; maybe I implemented it the wrong way, or it doesn't work with UTF-16 characters.
// i tried this
CONSOLE_FONT_INFOEX cfi;
cfi.cbSize = sizeof(cfi);
GetCurrentConsoleFontEx(GetStdHandle(STD_OUTPUT_HANDLE), FALSE, &cfi); // keep the current face/weight
cfi.dwFontSize.X = 24;
cfi.dwFontSize.Y = 24;
SetCurrentConsoleFontEx(GetStdHandle(STD_OUTPUT_HANDLE), FALSE, &cfi); // apply the new size
Unfortunately I expect that this will heavily depend on the particular console you're using. Some less Unicode-friendly consoles will treat all characters as the same size (possibly cutting off the right half of larger characters), and some consoles will cause larger characters to push the rest of the line to the right (which is what I see in the linked image). The most reasonable consoles I've observed have a set of characters considered "double-wide" and reserve two monospace columns for those characters instead of one, so the rest of the line still fits into the grid.
That said, you may want to experiment with different console programs. Can I assume you are on Windows? In that case, you might want to give Windows Terminal a try. If that doesn't work, there are other console programs available, such as msys's Mintty, or ConEmu.
So, after some intense googling I found the solution, and the solution is to fight fire with fire. Unicode includes a Thin Space character that is about 1/5 the width of a normal space, so if I include two of them along with one normal space after my problematic character, the output displays how I want. If anybody runs into a similar issue, Unicode has a lot of differently sized spaces. I found a website that shows all of them, with their properties.
[picture of the fixed output]
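For example, a minimal sketch of that padding trick on Windows (the glyph and the exact number of thin spaces are placeholders; what lines up depends on the console and font):
#include <fcntl.h>
#include <io.h>
#include <cstdio>
#include <iostream>

int main() {
    // Switch stdout to UTF-16 mode so wide characters reach the Windows console intact.
    _setmode(_fileno(stdout), _O_U16TEXT);

    // U+2009 THIN SPACE is roughly 1/5 of a regular space in many fonts; a couple of
    // them after an over-wide glyph can pull the rest of the line back onto the grid.
    const wchar_t thin = L'\x2009';
    std::wcout << L"[" << L'\x2605' << thin << thin << L"] [.] [.]" << std::endl;
    return 0;
}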

Unicode in wxWidgets

I'm creating a calculator application in C++ wxWidgets using Visual Studio 2019. I have created a custom button class that I want to use for all mathematical operations and symbols.
How can I set the button's label to √ instead of sqrt? If I do that, a ? symbol is displayed instead. I also need to display these symbols in a wxTextCtrl; when I try, I get a compile error (shown in a screenshot that is not included here).
Do I need to change the current character set from ASCII to Unicode? How do you do that?
For a single character, you can just use wxUniChar. You create a wxUniChar from the value of the Unicode code point for the desired character (written in hexadecimal here). Since the Unicode code point of the square root character is U+221A, you can create a wxUniChar for this character like so:
wxUniChar c(0x221A);
wxUniChar is implicitly convertible to wxString, so (assuming wxWidgets was built in Unicode mode) you can use wxUniChar variables exactly as you would use a wxString. For example, you could do something like:
control->SetLabel(c);
or
dc.DrawText(c,0,0);
The answer by @New-Pagodi (sorry, I don't know how to tag people with spaces in their names) works, but just saving your file in UTF-8 encoding, as MSVS proposes, is a much nicer solution. Even in this case, notice that you still need to either use wxString::FromUTF8("√"), or explicitly set your locale encoding to UTF-8 by using setlocale() (which is finally supported by recent Windows versions), in which case you can use just "√", or use wide strings, i.e. L"√".
I.e. you must both have the correct bytes (e2 88 9a for the UTF-8-encoded representation of U+221A) in the file and use the correct encoding when creating a wxString from it if you're using char* strings. By default this encoding is not UTF-8 under Windows, so using just wxString("√") doesn't work.
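Putting the two answers together, a rough sketch (MyFrame, m_sqrtButton and m_display are made-up names for the custom button and the wxTextCtrl, not from the question):
#include <wx/wx.h>

void MyFrame::SetupSqrtControls()
{
    // Option 1: build the label directly from the code point.
    wxUniChar sqrtChar(0x221A);                        // U+221A SQUARE ROOT
    m_sqrtButton->SetLabel(wxString(sqrtChar));

    // Option 2: decode the UTF-8 bytes of the literal explicitly.
    m_display->SetValue(wxString::FromUTF8("\xE2\x88\x9A"));
}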

how to display extended ascii character in QTextEdit

All the character codes greater than 127 are replaced by a diamond (?) symbol. How can I display those characters? I have an unsigned char buffer[1024] which contains values from 0 to 255.
Use the QString class's fromAscii() method. By default this will treat characters above 127 as Latin-1 characters. To change this behavior, use the QTextCodec::setCodecForCStrings() method to set the correct codec for your usage.
I believe Qt 5 may have removed the setCodecForCStrings() method.
EDIT: Adnan supplied the Qt 5 alternative to setCodecForCStrings(); adding it to the answer for completeness.
The Qt 5 alternative for setCodecForCStrings() is QTextCodec::setCodecForLocale(QTextCodec::codecForName("UTF-8"));
This is a rabbit hole with no end. Qt does not fully support printing extended ASCII (codes above 127), as it is not well defined. The current method is to use fromLocal8Bit(), which takes a char array and transforms it into the "right" Unicode string (the only thing Qt supports printing).
QTextCodec::setCodecForLocale can be used to identify the character set you wish to transform from. Many codecs are supported, but for some reason IBM437 (the character set used by IBM PCs in the US for decades) is not supported, where several other codecs used by Europe, etc. are. Probably some characters in IBM437 were never assigned proper code points in Unicode, so transforming it isn't possible?
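As an illustration, a small sketch of that transform with an explicit codec (the function and variable names are mine; IBM 850 is used only because it is one of the DOS-era code pages Qt does ship, unlike IBM437):
#include <QTextCodec>
#include <QTextEdit>

void showBuffer(QTextEdit *edit, const unsigned char *buffer, int len)
{
    // Decode the 8-bit data with a specific legacy codec instead of the locale default.
    QTextCodec *codec = QTextCodec::codecForName("IBM850");
    if (!codec)
        codec = QTextCodec::codecForLocale();   // fall back if that codec is unavailable
    const QString text = codec->toUnicode(reinterpret_cast<const char *>(buffer), len);
    edit->setPlainText(text);
}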
What's frustrating is that there are fonts with glyphs for all 256 code points, but it is simply not possible to display them in Qt, since Qt only works with Unicode strings. There are a handful of glyphs it doesn't support, and the list seems to grow with newer versions of Qt. Currently I know of 9, 10, 12, 13, and 173. Some of these are unsupported for obvious reasons (usually you don't want to print a carriage-return glyph, though it did exist in DOS), but others used to work in Qt and now do not.
In my application, I resorted to creating a new font that has copies of the unprintable glyphs at higher Unicode code points, and I translate them before printing them on the screen. It's quite silly, but Qt gave up on ASCII many years ago, so it's the best option I could find.

what locale does wstring support?

In my program I used wstring to print out the text I needed, but it gave me random ciphers (due to a different encoding scheme). For example, I have this block of code.
wstring text;
text.append(L"Some text");
Then I use DirectX to render it on screen. I used to use wchar_t, but I heard it has portability problems, so I switched to wstring. wchar_t worked fine, but it seemed to only accept English characters from what I can tell (the printout just totally ignored any non-English character entered), which was fine, until I switched to wstring: I only got random ciphers that looked like Chinese and Korean mixed together. Interestingly, my computer's locale for non-Unicode text is Chinese. Based on what I saw, I suspected it would render Chinese characters correctly, so I tried, and it does display the character correctly, but with a square in front (which is still an incorrect display). I then guessed the encoding might depend on the language locale, so I switched the locale to English (US) (I use Windows 8). After restarting, I saw that the Chinese test character in the source file became some random stuff (my file is not saved in a Unicode format, since all the text is English), so I tried with English characters, but no luck: the display looked exactly the same and seemed to have nothing to do with the locale. I don't understand why it doesn't display correctly and looks like Asian characters (even when I use the English locale).
Is there some conversion that should be done, or should I save my file in a different encoding? The problem is that I want to display English characters correctly, which is the default.
In the absence of code that demonstrates your problem, I will give you a correspondingly general answer.
You are trying to display English characters, but see Chinese characters. That is what happens when you pass 8 bit ANSI text to an API that receives UTF-16 text. Look for somewhere in your program where you cast from char* to wchar_t*.
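As a sketch of what that fix usually looks like on Windows (the helper name and the commented comparison are mine, not from the question):
#include <windows.h>
#include <string>

// Convert 8-bit ANSI text to UTF-16 instead of casting the pointer.
std::wstring AnsiToWide(const std::string& ansi)
{
    if (ansi.empty()) return std::wstring();
    const int len = MultiByteToWideChar(CP_ACP, 0, ansi.c_str(), (int)ansi.size(), nullptr, 0);
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(CP_ACP, 0, ansi.c_str(), (int)ansi.size(), &wide[0], len);
    return wide;
}

// const wchar_t* bad  = (const wchar_t*)narrowText.c_str(); // reinterprets bytes: CJK-looking garbage
// std::wstring   good = AnsiToWide(narrowText);             // actual conversion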
First of all, what type of file are you trying to store the text in? Normal .txt files are stored as ANSI by default (and so is Excel). So when you try to print a Unicode character to an ANSI file, it will print junk. Two ways of overcoming this problem are:
try to open the file in UTF-8 or UTF-16 mode and then write, or
convert the Unicode to ANSI before writing to the file. If you are using Windows, MSDN documents APIs for Unicode-to-ANSI conversion and vice versa. If you are using Linux, Google for conversion of Unicode to ANSI; there are lots of solutions out there.
Hope this helps!!!
std::wstring does not have any locale/internationalisation support at all. It is just a container for storing sequences of wchar_t.
The problem with wchar_t is that its encoding is unspecified. It might be Unicode UTF-16, or Unicode UTF-32, or Shift-JIS, or something completely different. There is no way to tell from within a program.
You will have the best chances of getting things to work if you ensure that the encoding of your source code is the same as the encoding used by the locale under which the program will run.
But, the use of third-party libraries (like DirectX) can place additional constraints due to possible limitations in what encodings those libraries expect and support.
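To illustrate the point that the locale, not std::wstring, decides how narrow text is interpreted, a small sketch using the standard conversion functions (the 0xB0 byte only means U+00B0 under a Latin-1-compatible locale, which is assumed here):
#include <clocale>
#include <cstdlib>
#include <iostream>

int main()
{
    // How narrow bytes map to wchar_t depends on the selected C locale;
    // std::wstring just stores whatever wchar_t values it is handed.
    std::setlocale(LC_ALL, "");                 // use the user's locale encoding

    const char narrow[] = "degrees: \xB0";      // 0xB0 is U+00B0 only in Latin-1-style locales
    wchar_t wide[64];
    std::size_t n = std::mbstowcs(wide, narrow, 64);
    if (n != static_cast<std::size_t>(-1))
        std::wcout << wide << std::endl;        // same bytes give different text under other locales
    return 0;
}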
Bug solved; it turned out to be a CASTING problem (not a rendering problem as previously stated).
The bugged text is an intermediate product of some internal conversion process using wstringstream (which I forgot to mention); the code is as follows
wstringstream wss;
wstring text;
text.append(L"some text");
wss << timer->getTime();
text.append(wss.str());
Right after this process the debugger shows the text as a bunch of random stuff, but later it somehow converts back so it is readable. The problem appears at the rendering stage using DirectX: I had somehow left in the cast to wchar_t*, which results in the incorrect rendering.
old:
LPCWSTR lpcwstrText = (LPCWSTR)textToDraw->getText(); // wrong: casts the wstring* pointer, so the object's internals are read as character data
new:
LPCWSTR lpcwstrText = (*textToDraw->getText()).c_str(); // right: dereference and use the actual character data
Changing that solved the problem.
So, this was caused by a bad cast, as some kind people pointed out when correcting my statement.

Possible to pass UTF-8/UTF-16 options to JVM invoked from C++?

I've got a Windows C++ program where I want to invoke a JVM and be able to pass it an option that might be given from the command line invocation of the C++ program (the command line option might not be plain text, for example "-Dblah=japan日本"). The JavaVMOption struct in jni.h appears to define the option string as chars only, so it looks like I can't just pass it a wide string.
I tried converting it to UTF-8 and storing it as a narrow string on the C++ side, then converting it back on the Java side, but it seems the "日本" gets replaced with literal "??" characters and is therefore lost in the conversion/unconversion process.
Am I thinking about this incorrectly? Would this not be expected to work?
The Invocation API documentation makes it clear:
typedef struct JavaVMOption {
char *optionString; /* the option as a string in the default platform encoding */
void *extraInfo;
} JavaVMOption;
The term "default platform encoding" is unambiguous, that does not mean utf-8 on Windows. It means the encoding used by the default system code page. If your machine is not configured to use a Japanese code page (like 932) then the conversion from the utf-16 string is going to produce question marks for Japanese characters that cannot be converted. This is not normally a problem since a Japanese user will have the correct code page selected. No workaround for having the wrong one.
Ensure you've got the correct system code page selected (Control Panel > Region and Language to change it), and use WideCharToMultiByte() with CP_ACP to make the conversion.
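A rough sketch of that conversion (the helper and variable names are mine; whether the Japanese text survives depends entirely on the system code page, as described above):
#include <windows.h>
#include <jni.h>
#include <string>

// Re-encode a UTF-16 option into the system ANSI code page, which is the
// "default platform encoding" that JavaVMOption::optionString expects.
static std::string ToPlatformEncoding(const std::wstring& wide)
{
    if (wide.empty()) return std::string();
    const int len = WideCharToMultiByte(CP_ACP, 0, wide.c_str(), (int)wide.size(),
                                        nullptr, 0, nullptr, nullptr);
    std::string narrow(len, '\0');
    WideCharToMultiByte(CP_ACP, 0, wide.c_str(), (int)wide.size(),
                        &narrow[0], len, nullptr, nullptr);
    return narrow;
}

// Usage sketch: the option only round-trips if the ANSI code page (e.g. 932) can encode it.
// std::string opt = ToPlatformEncoding(L"-Dblah=japan\u65E5\u672C");
// JavaVMOption option;
// option.optionString = const_cast<char*>(opt.c_str());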