Printing Chinese Characters in C++ - c++

I've been trying to print Chinese characters in C++. I've searched around on the Internet: some say you have to use wcout, others suggest other methods. I also stumbled on this post, where someone uses this piece of code:
#include <iostream>
int main()
{
char x[] = "中";
char y[] = u8"中";
wchar_t z = L'中';
char16_t b = u'\u4e2d';
char32_t a = U'\U00004e2d';
std::cout << x << '\n';
std::cout << y << '\n';
std::wcout << z << '\n';
std::cout << a << '\n';
std::cout << b << '\n';
}
which, on an internet site that shows the output of C++ code, prints:
中
中
-
20013
20013
However, for me it just prints
õ©¡
õ©¡
20013
20013
I'm using JetBrains CLion, with the encoding set to UTF-8. I've also tried Visual Studio and Qt Creator and get the same result. I hope someone can help me out.

If you're using the OS X Terminal, check its encoding:
Terminal -> Preferences -> Encodings tab
Then check whether Unicode (UTF-8) or Traditional Chinese is selected.
On Windows, you can try this to switch the console to UTF-8:
Go to Start, then run "regedit" -> navigate to [HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor] -> set the Autorun value to "chcp 65001"
Hope this helps.

Related

Cout unsigned char

I'm using Visual Studio 2019: why does this command do nothing?
std::cout << unsigned char(133);
The statement literally gets skipped (I verified this by stepping through it in the debugger).
I expected it to print à.
Any output after it in the same statement is dropped, but output before it is printed. (std::cout << "12" << unsigned char(133) << "34"; prints "12")
I've also tried to change it to these:
std::cout << unsigned char(133) << std::flush;
std::cout << (unsigned char)(133);
std::cout << char(-123);
but the result is the same.
I remember that it worked before, and some of my programs that use this statement have mysteriously stopped working... A blank new project gives the same result!
I thought that my new custom keyboard layout could be the cause, but disabling it changes nothing.
On online compilers it works properly, so could this be a bug in Visual Studio 2019?
The "sane" answer is: don't rely on extended-ASCII characters. Unicode is widespread enough to make this the preferred approach:
#include <iostream>
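int main() {
// Note: from C++20 on, u8 literals have type char8_t and cannot be streamed to std::cout directly.
std::cout << u8"\u00e0\n";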
}
This explicitly prints the character à you asked for. It is also how your browser understands it, which you can easily verify by pasting it into a Unicode character search: the result is LATIN SMALL LETTER A WITH GRAVE, with the code point U+00E0, which you can spot in the code above.
In your example, it makes no difference whether the char is signed or unsigned; the byte value 133 is written to the terminal either way, but how the terminal interprets it depends on how the machine is set up. In a UTF-8 console, a lone byte 0x85 is not a valid UTF-8 sequence, so if your OS was switched to UTF-8, that might be why you're seeing no output.
You can try to use static_cast
std::cout << static_cast<unsigned char>(133) << std::endl;
Or
std::cout << static_cast<char>(133) << std::endl;
Since all of these work on my machine, it's hard to pinpoint the problem; common sense points to some configuration issue.

架 (U+67B6) is not graphical with en_US.UTF-8. What's going on?

This is a follow up question to:
std::isgraph asserts, how to fix?
After setting locale to "en_US.UTF-8", std::isgraph no longer asserts.
However, the Unicode character 架 (U+67B6) is reported as false by the same function. What is going on?
It's a Unicode build on the Windows platform.
If you want to test characters that are too large to fit in an unsigned char, you can try the wide-character versions, or a Unicode library as already suggested (which is really the better option for portable code, as it removes any system- or locale-based differences in behavior).
This program:
#include <clocale>
#include <cwctype>
#include <iostream>
int main() {
wchar_t x = L'\u67B6';
char *loc = std::setlocale(LC_CTYPE, "");
std::wcout << "Using locale " << loc << ".\n";
std::wcout << "Character " << x << " is graphical: " << std::boolalpha
<< static_cast<bool>(std::iswgraph(x)) << '\n';
return 0;
}
when compiled and run on my Ubuntu test system, outputs
Using locale en_US.utf8.
Character 架 is graphical: true
You said you're using Windows, but I don't have a Windows computer available for testing, so I can't confirm if this'll work there or not.
std::isgraph is not a Unicode-aware function.
It's an antiquity from C.
From the documentation:
The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF.
It only takes int because .. it's an antiquity from C. Just like std::tolower.
You should be using something like ICU instead.

In C/C++, how do you edit a certain 'coordinate' in stdout?

I've been using Vim a lot lately, and I was wondering how the program manages to change the characters at certain positions in the terminal. For example, when using :rc, it replaces the character under the cursor with c.
I have also seen similar things done with Homebrew, which prints a progress bar to the screen and updates it when necessary.
How is this done in C/C++?
There is no standard way of doing this in C++.
It is done with OS-dependent libraries, such as curses and similar libraries (ncurses) in the Unix/Linux world. Some of these libraries have been ported across platforms (for example, PDCurses).
For very simple things such as a progress bar or a counter, and as long as you stay on a single line, there is the trick of writing "\r" (carriage return) to the output to move the cursor back to the beginning of the current line. Example:
for (int i = 0; i < 100; i++) {
// flush so that each update appears immediately instead of sitting in the buffer
cout << "\rProgress: " << setw(3) << i << flush;
this_thread::sleep_for(chrono::milliseconds(100));
}
Certainly, using ncurses or a similar library is a good answer. An alternative is to use ANSI escape codes to control the cursor in terminal emulators that support them (the classic Windows command shell does not by default; on Windows 10 and later, virtual terminal processing has to be enabled first). For example, this code prints a line in multiple colors, then moves the cursor to 2,2 (coordinates are 1-based, with 1,1 being the upper left corner) and prints the word "red" in red.
#include <iostream>
#include <string>
const std::string CSI{"\x1b["};
const std::string BLUE{CSI + "34m"};
const std::string RED{CSI + "31m"};
const std::string RESET{CSI + "0m"};
std::ostream &curpos(int row, int col)
{
return std::cout << CSI << row << ';' << col << 'H';
}
int main()
{
std::cout << "This is " << BLUE << "blue" << RESET << " and white.\n";
curpos(2,2);
std::cout << RED << "red" << RESET << '\n';
}
As mentioned, this is not covered by any standard C/C++ operation on stdout or cout (besides writing the necessary control characters to the screen).
Controlling the screen cursor of an ASCII terminal depends entirely on the implementation of the particular terminal program, and apart from a very narrow set of control characters, there is no established standard.
There are libraries like ncurses for a broad variety of Linux terminal implementations, or PDCurses for the Windows CMD shell.
I'm not sure I understand you completely, but if you create an array of 100 elements of type char, you can modify any position in the array and then loop over it with std::cout to show it on the console.
It might be better to define the array with 50 chars to reduce the size of the printed result.
For example, to print a progress bar at 1% progress, you would print:
char progressbar[100] = {'X',' ',' ',' ',' ',' ',' ',' ',' ',........}

Windows console does not display square root sign

For a console application, I need to display the symbol: √
When I try to simply output it using:
std::cout << '√' << std::endl; or
std::wcout << '√' << std::endl;,
it outputs the number 14846106 instead.
I've tried searching for an answer and have found several recommendations of the following:
std::cout << "\xFB" << std::endl; and
std::cout << (unsigned char)251 << std::endl;
which both display a superscript 1.
This is using the Windows console with Lucida font. I've tried this with various character pages and always get the same superscript 1. When I try to find its value through getchar() or cin, the symbol is converted into the capital letter V. I am, however, sure that it can display this character simply by pasting it in. Is there an easy way of displaying Unicode characters?
Actually, "\xFB" and (unsigned char)251 are the same byte, and that byte does correspond to the root symbol √ in code page 437. In other code pages, such as 850, the same byte is ¹ (superscript 1), which matches what you are seeing.
Switching to Unicode with the standard library is a possibility, but I doubt it will work on Windows...
#include <iostream>
#include <locale>
int main() {
std::locale::global(std::locale("en_US.UTF8"));
std::wcout.imbue(std::locale());
wchar_t root = L'√';
std::wcout << root << std::endl;
return 0;
}
Since this will probably not satisfy you, here is a portable Unicode library: http://site.icu-project.org/

Strange newline issue after DLL call C++ Windows

The Problem
I'm developing a 32-bit unmanaged application in C++ on Windows using Visual Studio 2010. Forgive my lack of Windows knowledge, as I usually develop on *nix.
Initially, in my program my calls to std::cout's stream insertion operator work fine. For example, the following statement outputs as expected:
std::cout << "hello" << std::endl;
However, the following code does not work:
std::cout << "\thello" << std::endl;
...call to DLL from Japanese company who won't respond to support requests...
std::cout << "\thello" << std::endl;
The above code prints:
hello
(inverted diamond symbol)hello(eighth note music symbol)(inverted o symbol)
Once I have called this DLL for the first time my output to std::cout is forever messed up. The symbols that are printed are not found in an ASCII table. The inverted o symbol is a single unicode char that looks like the letter 'o' but the black part of the o is white, and the white part is black(inverted colors). The music symbol is the unicode 8th note character.
Any ideas on why this is happening and how to fix it? It seems that this DLL is messing up how control characters (chars starting with \) are outputted.
What I have tried so far
I thought this might be a locale issue since the DLL is from a Japanese company. However, after the DLL call the locale is still "C" just as it was before the call. I use the following to query the locale:
printf ("Locale is: %s\n", setlocale(LC_ALL,NULL) );
I also thought this might be some kind of bizarre memory corruption but it seems that the \r\n gets replaced by (music symbol)(inverted o) whereas \t gets replaced by an inverted diamond symbol. There seems to be a regular "replace A by B" pattern for all the control chars, which would not indicate memory corruption.
Lastly, I also tried this:
std::cout << "blah" << '\r' << '\n';
and I see the same garbage characters created by:
std::cout << "blah" << std::endl;
Thanks in advance for any help and insight.
See whether this fixes it:
#include <iostream>
#include <locale>
int main()
{
std::cout << "\thello" << std::endl;
// ...call to DLL from Japanese company who won't respond to support requests...
std::locale mylocale(""); // or "C"; constructs a locale object from the user's default preferences
std::cout.imbue( mylocale ); // Imbue that locale
std::cout << "\thello" << std::endl;
return 0;
}
Consult the documentation for that library to find out whether the change of locale is by design and whether it can be configured otherwise.
You could perhaps create another stream that shares cout's buffer (std::ostream has no public default constructor, so the buffer is passed to the constructor):
std::ostream cout2(std::cout.rdbuf());
And use it. I'm sure that won't be thread safe. Flushing might be 'awkward' - but it should work