Change Console Code Page in Windows C++ - c++

I'm trying to output UTF-8 characters in the Windows command line. I can't seem to get the function SetConsoleOutputCP to work. I also heard that you have to change the font to "Lucida Console" for it to work, but I can't get that working either. Can someone please provide a short example of how to use these functions to correctly output UTF-8 characters to the console?
I also heard that those functions don't work in Windows XP. Is there a better alternative that will work in Windows XP?

[I know this question is old and was about Windows XP, but it still seemed like a good place to drop this information so I (and maybe others) can find it again in the future.]
Support for Unicode in CMD windows has improved in newer versions of Windows. This program will work on Windows 10.
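#include <iostream>
#include <Windows.h>
// RAII guard: switches the console output code page to UTF-8 and
// restores the previous code page when it goes out of scope.
class UTF8CodePage {
public:
    UTF8CodePage() : m_old_code_page(::GetConsoleOutputCP()) {
        ::SetConsoleOutputCP(CP_UTF8);
    }
    ~UTF8CodePage() { ::SetConsoleOutputCP(m_old_code_page); }
private:
    UINT m_old_code_page;
};
int main() {
    UTF8CodePage use_utf8;
    // Since C++20, u8 literals have type const char8_t[], so this line
    // compiles as written only up to C++17.
    const char *text = u8"This text is in UTF-8. ¡Olé! 佻\n";
    std::cout << text;
    return 0;
}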
I made an RAII class to ensure the code page is restored because it would be rude to leave the code page changed if the user had purposely selected a specific one. All the Windows-specific code (SetConsoleOutputCP) is contained within that class. The definition of the use_utf8 variable in main changes the code page to UTF-8, and that code page will stay in effect until the variable is destructed at the end of the scope.
Note that I used the u8 prefix on the string literal, which is a newer feature of C++ to ensure that the string is encoded using UTF-8 regardless of the encoding used for the source file. You don't have to use that feature if you have another way to make a string of valid UTF-8 text.
You still have to be sure that the CMD window is using a font that supports the glyphs you need. I don't think there's a way to get font linking automatically.
But this will at least show the replacement character if the font is missing the glyph. For example, in my window, the ¡Olé! looks right but the CJK glyph is shown approximately like �. If the user copies that replacement character, the clipboard will receive the original glyph, so they can paste it into other programs without any loss of fidelity.
Note that command line parameters you get from main's argv will be in the original code page. One way to work around this is to get the unconverted "wide" command line with GetCommandLineW, convert it to UTF-8 with WideCharToMultiByte, and then parse it yourself. Alternatively, you can pass the result of GetCommandLineW to CommandLineToArgvW, which will parse it, and then you'd convert each argument to UTF-8.
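The second approach (GetCommandLineW, then CommandLineToArgvW, then a per-argument conversion) might look roughly like the sketch below. The function name utf8_command_line_arguments is made up for illustration, and the non-Windows branch, which just returns an empty list, is there only so the sketch compiles elsewhere. With MSVC, CommandLineToArgvW needs Shell32.lib at link time.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>  // CommandLineToArgvW
#endif
#include <string>
#include <vector>

// Hypothetical helper: returns the process arguments encoded as UTF-8.
// On non-Windows builds it returns an empty vector (placeholder only).
std::vector<std::string> utf8_command_line_arguments() {
    std::vector<std::string> args;
#ifdef _WIN32
    int argc = 0;
    LPWSTR *wide_argv = ::CommandLineToArgvW(::GetCommandLineW(), &argc);
    if (wide_argv == nullptr) return args;
    for (int i = 0; i < argc; ++i) {
        // First call: ask for the required size, including the trailing NUL.
        int size = ::WideCharToMultiByte(CP_UTF8, 0, wide_argv[i], -1,
                                         nullptr, 0, nullptr, nullptr);
        if (size <= 0) continue;
        std::string utf8(static_cast<std::size_t>(size), '\0');
        // Second call: perform the conversion into the buffer.
        int written = ::WideCharToMultiByte(CP_UTF8, 0, wide_argv[i], -1,
                                            &utf8[0], size, nullptr, nullptr);
        if (written <= 0) continue;
        utf8.resize(static_cast<std::size_t>(written) - 1);  // drop the NUL
        args.push_back(utf8);
    }
    ::LocalFree(wide_argv);  // the array is allocated by CommandLineToArgvW
#endif
    return args;
}
```

Calling this at the top of main and ignoring argv keeps the rest of the program working with UTF-8 only.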
Finally, note that changing the code page affects only the output. If you input text from the user, it arrives encoded using the original code page (often called the OEM code page).
TODO: Figure out input. SetConsoleCP isn't doing what I think the documentation says it should do.

The Windows console doesn't play nice with Unicode, and particularly not with UTF-8.
Setting a console code page to UTF-8 won't work.
One approach is to use MultiByteToWideChar() (or something else) to convert the text to UTF-16, then WideCharToMultiByte() (or something else) to convert it to a localised ISO encoding. Then set the console code page to that ISO code page.
It's ugly, but it sort of works.

In short: SetConsoleOutputCP(CP_UTF8) and cout/wcout don't work together by default.
Though the Windows CRT supports UTF-8 output, a robust way to output UTF-8 characters to the console is to convert them to the console's current code page, especially if you want to use cout/wcout.
The standard high-level functions of basic_ostream do not work properly with UTF-8 by default.
I've seen usage of MultiByteToWideChar and WideCharToMultiByte with CP_OEMCP and CP_UTF8 parameters.
You can set up your application environment, including the console font, via SetCurrentConsoleFontEx, but it is available only from Windows Vista and Server 2008 onward.
Also, check this about cout and console.
_setmode and wprintf work together as well, but this may lead to a crash if non-wide-character functions are used.
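A rough sketch of the "convert to the console's current code page" idea: go from UTF-8 to UTF-16 with MultiByteToWideChar, then from UTF-16 to whatever GetConsoleOutputCP reports with WideCharToMultiByte. The function name is hypothetical, characters the target code page cannot represent are replaced by its default character, and the non-Windows branch returns the input unchanged purely so the sketch compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif
#include <string>

// Hypothetical helper: re-encodes UTF-8 text in the console's output code page.
std::string utf8_to_console_code_page(const std::string &utf8) {
#ifdef _WIN32
    if (utf8.empty()) return utf8;  // the API rejects zero-length input
    const int in_len = static_cast<int>(utf8.size());
    // Step 1: UTF-8 -> UTF-16.
    int wide_len = ::MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len,
                                         nullptr, 0);
    if (wide_len <= 0) return utf8;
    std::wstring wide(static_cast<std::size_t>(wide_len), L'\0');
    ::MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, &wide[0], wide_len);
    // Step 2: UTF-16 -> current console output code page.
    const UINT code_page = ::GetConsoleOutputCP();
    int out_len = ::WideCharToMultiByte(code_page, 0, wide.data(), wide_len,
                                        nullptr, 0, nullptr, nullptr);
    if (out_len <= 0) return utf8;
    std::string out(static_cast<std::size_t>(out_len), '\0');
    ::WideCharToMultiByte(code_page, 0, wide.data(), wide_len,
                          &out[0], out_len, nullptr, nullptr);
    return out;
#else
    return utf8;  // placeholder: no console code pages to convert to
#endif
}
```

The result can then be written with std::cout as usual, for example std::cout << utf8_to_console_code_page(text).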

The problem occurs because the code page that Windows uses in your console differs from the encoding of your source file.
Qt uses UTF-8 by default, but another editor may use a different encoding, so you must verify which one you're using.
To change the console to UTF-8, use:
#include <windows.h>
int main() {
    SetConsoleOutputCP(CP_UTF8);  // switch console output to code page 65001 (UTF-8)
}

Related

How to make ACS variables display on terminal

Is there any way to force displaying ACS variables from ncurses in terminal?
On urxvt and in text mode everything displays well, but on other terminals (I tested xfce4-terminal, xterm, and gnome-terminal) there is always a problem. I thought I could do nothing about this, but I saw that in alsamixer everything displays properly. I looked this up in the alsamixer code and saw that they use exactly the same method to display these characters, for example
addch(ACS_RARROW);
gives them this result, while the same command gives me this on the same terminal.
On a terminal where your locale says to use UTF-8 (you can see this by the naming convention of values shown by the locale command), you must do this:
compile/link with ncursesw
initialize the locale before initscr, e.g.,
setlocale(LC_ALL, "");
See the Initialization section of the ncurses manual, as well as the Line Graphics section of the addch manual page.

making sure program is in a terminal

I was trying to add colors to some strings that are displayed in a terminal using ANSI escape codes. So far I haven't grasped the whole escape code thing; I'm just trying things out by copy-pasting some escape codes. Then I saw this answer, which said that a program should check that it is being executed in a terminal, or else continue without polluting strings with escape codes.
The answer explains how to use the *nix function isatty(), which I found resides in unistd.h, which in turn has no cunistd counterpart in the C++ standard; my understanding is that this is because it was never in the C standard in the first place. I tried searching SO again but wasn't able to understand it well. Now I have two questions regarding this:
In what environment (right word?) can a program that uses ANSI escape codes be executed such that it requires an initial check? I'm building for the CLI only.
What would be a proper solution according to the ISO C++ standard for handling this issue? Using unistd.h? Would that conform to modern C++ practices?
Also, is there anything I should read or understand before dealing with ANSI colors and related things?
On a POSIX system (like Linux or OSX) the isatty function is indeed the correct function to determine if you're outputting to a terminal or not.
Use it like this:
#include <unistd.h>  // isatty, STDOUT_FILENO
if (isatty(STDOUT_FILENO))
{
    // Output using VT100 control codes
}
else
{
    // Output is not a TTY, could be a pipe or redirected to a file
    // Use normal output without control codes
}
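As a slightly fuller sketch of the same check, the terminal test can be kept separate from the formatting so the formatting stays easy to reuse. The names stdout_is_terminal and colorize_red are made up for this example, and the code assumes a POSIX system where unistd.h is available.

```cpp
#include <string>
#include <unistd.h>  // POSIX: isatty, STDOUT_FILENO

// True when standard output is attached to a terminal.
bool stdout_is_terminal() {
    return ::isatty(STDOUT_FILENO) != 0;
}

// Wraps text in the ANSI "red foreground" and "reset" sequences,
// but only when use_color is true; otherwise returns text unchanged.
std::string colorize_red(const std::string &text, bool use_color) {
    if (!use_color) return text;
    return "\033[31m" + text + "\033[0m";
}
```

A call such as std::cout << colorize_red("error", stdout_is_terminal()) then prints plain text when the output is piped or redirected to a file.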

Windows version of wcswidth_l

I have some text to write to the Windows console that I need to know the real width of in columns. wcswidth_l seems to be the best option on platforms that have it (though mbswidth_l() would be better since I have no desire to use wchar_t, but for some reason it doesn't exist). But in addition to other platforms, I need something that works on Windows. Although it's unlikely that there's a portable solution, I don't know of any solution at all on Windows. I think the console has an API for getting cursor position and such, so I could write the text out and check the change in position. That would be accurate I guess, but writing out extra output isn't acceptable at all.
How does one go about getting the column width of a string or character on Windows?
Edit:
wcswidth_l returns the number of console columns used to display a string. Some characters take up one column and others, e.g. Japanese characters, take up two.
As an example the 'column width' of "a あ" is four. 'a' is one, ' ' is one, and 'あ' is two. (Assuming the console is set up to actually display non-ascii characters that is). Also it'd be nice if the API supports strings using codepage 65001 (UTF-8).
First of all, the Windows Console API is located here.
Secondly, is the function you're looking for GetConsoleFontSize?
I'll try to quickly type an example in a second.
EDIT: Here you go. Forgive me if there's a small error. I actually found it was even easier: GetCurrentConsoleFont fills in a COORD structure on the way to getting the index you would pass to GetConsoleFontSize, so that's a step saved :)
#define _WIN32_WINNT 0x0501 // XP; 0x0601 = Windows 7
#include <windows.h>
int main()
{
    HANDLE hStdOutput = GetStdHandle(STD_OUTPUT_HANDLE);
    CONSOLE_FONT_INFO cfi;
    GetCurrentConsoleFont(hStdOutput, FALSE, &cfi);
    // cfi.dwFontSize.X == x size
    // cfi.dwFontSize.Y == y size
}
EDIT:
If you don't mind invisible output, you can use CreateConsoleScreenBuffer to pretty much have an invisible console window at your command while leaving yours unaffected. GetConsoleScreenBufferInfoEx will tell you the cursor position, at which point you can use WriteConsole to write to your buffer (invisibly), and check the cursor location again versus the number of characters actually written. Note that checking the cursor location beforehand would not require clearing the screen to use this method.
If you cannot afford to do extra output, visible or invisible, I'm not sure there really is a possibility.
Portable approach
Since the width of a character depends more on the character itself than on the system on which it is displayed (there may be exceptions, but they should be rare), one can use a separate function to compute it (on Windows too). This requires Unicode characters, since that makes it much easier to analyze the width of strings, but one can certainly write a wrapper to convert between encodings.
Available implementation
Here is a suitable, portable implementation that one can plug into an application and fall back on when running on Windows.
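To illustrate the idea only (this is not the implementation referred to above), here is a minimal sketch that decodes UTF-8 and counts two columns for code points in a handful of East Asian wide ranges. The ranges are an approximation modelled on commonly used wcwidth tables, control characters and a single block of combining marks are simply counted as zero, and all function names are made up for this example.

```cpp
#include <cstddef>
#include <string>

// Decodes one UTF-8 sequence starting at s[i] and advances i past it.
// Malformed or truncated input yields U+FFFD and consumes one byte.
char32_t next_code_point(const std::string &s, std::size_t &i) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    char32_t cp = 0;
    std::size_t extra = 0;
    if (b0 < 0x80) { cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
    else { ++i; return 0xFFFD; }
    if (i + extra >= s.size()) { ++i; return 0xFFFD; }
    for (std::size_t k = 1; k <= extra; ++k) {
        const unsigned char c = static_cast<unsigned char>(s[i + k]);
        if ((c & 0xC0) != 0x80) { ++i; return 0xFFFD; }
        cp = (cp << 6) | (c & 0x3F);
    }
    i += extra + 1;
    return cp;
}

// Approximate column width of one code point: 0, 1, or 2.
int code_point_width(char32_t cp) {
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0)) return 0;  // control characters
    if (cp >= 0x0300 && cp <= 0x036F) return 0;            // combining diacritics
    const bool wide =
        (cp >= 0x1100 && cp <= 0x115F) ||    // Hangul Jamo initial consonants
        (cp >= 0x2E80 && cp <= 0xA4CF) ||    // CJK radicals through Yi
        (cp >= 0xAC00 && cp <= 0xD7A3) ||    // Hangul syllables
        (cp >= 0xF900 && cp <= 0xFAFF) ||    // CJK compatibility ideographs
        (cp >= 0xFE30 && cp <= 0xFE6F) ||    // CJK compatibility forms
        (cp >= 0xFF00 && cp <= 0xFF60) ||    // fullwidth forms
        (cp >= 0xFFE0 && cp <= 0xFFE6) ||
        (cp >= 0x20000 && cp <= 0x3FFFD);    // supplementary ideographic planes
    return wide ? 2 : 1;
}

// Sums the approximate column widths of all code points in a UTF-8 string.
int utf8_column_width(const std::string &s) {
    int width = 0;
    for (std::size_t i = 0; i < s.size();) {
        width += code_point_width(next_code_point(s, i));
    }
    return width;
}
```

With this sketch, the string from the question ("a あ" encoded as UTF-8) comes out as four columns. A production version would need a complete width table generated from the Unicode data files.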

Unexpected output of std::wcout << L"élève"; in Windows Shell

While testing some functions to convert strings between wchar_t and utf8 I met the following weird result with Visual C++ express 2008
std::wcout << L"élève" << std::endl;
prints out "ÚlÞve:" which is obviously not what is expected.
This is obviously a bug. How can that be? How am I supposed to deal with such a "feature"?
The C++ compiler does not support Unicode in code files. You have to replace those characters with their escaped versions instead.
Try this:
std::wcout << L"\x00E9l\x00E8ve" << std::endl;
Also, your console must support Unicode as well.
UPDATE:
It's not going to produce the desired output in your console, because the console does not support Unicode.
I found these related questions with useful answers
Is there a Windows command shell that will display Unicode characters?
How can I embed unicode string constants in a source file?
You might also want to take a look at this question. It shows how you can actually hard-code Unicode characters into files using some compilers (I'm not sure what the options would be for MSVC).
This is obviously a bug. How can that be?
While other operating systems have dispensed with legacy character encodings and switched to UTF-8, Windows uses two legacy encodings: An "OEM" code page (used at the command prompt) and an "ANSI" code page (used by the GUI).
Your C++ source file is in ANSI code page 1252 (or possibly 1254, 1256, or 1258), but your console is interpreting it as OEM code page 850.
Your IDE and the compiler use the ANSI code page.
The console uses the OEM code page.
It also matters what you are doing with those conversion functions.
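The mismatch can be reproduced without a console. In Windows-1252, é is byte 0xE9 and è is byte 0xE8; in code page 850 those same bytes are Ú and Þ. The sketch below decodes bytes "as if" they were code page 850 and returns UTF-8. The function name is made up, and the table deliberately covers only the two bytes needed here.

```cpp
#include <string>

// Interprets bytes as OEM code page 850 and returns the text as UTF-8.
// Only ASCII and the two bytes relevant to "élève" are mapped; every other
// byte becomes '?'. This partial table is for illustration only.
std::string decode_as_cp850(const std::string &bytes) {
    std::string utf8;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            utf8 += static_cast<char>(b);  // ASCII is the same in both code pages
        } else if (b == 0xE9) {
            utf8 += "\xC3\x9A";            // U+00DA, capital U with acute
        } else if (b == 0xE8) {
            utf8 += "\xC3\x9E";            // U+00DE, capital thorn
        } else {
            utf8 += '?';
        }
    }
    return utf8;
}
```

Feeding it "élève" as stored in Windows-1252, that is the bytes E9 6C E8 76 65, produces the UTF-8 encoding of "ÚlÞve", which matches what the console showed.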

Multi-byte character set in MFC application

I have a MFC application in which I want to add internationalization support. The project is configured to use the "multi-byte character set" (the "unicode character set" is not an option in my situation).
Now, I would expect the CWnd::OnChar() function to send me multi-byte characters if I set my keyboard to some foreign language, but it doesn't seem to work that way. The OnChar() function always sends me a 1-byte character in its nChar variable.
I thought that the _getmbcp() function would give me the current code page for the application, but this function always returns 0.
Any advice would be appreciated.
Any help here? Multibyte Functions in Microsoft C Run-time
As far as changing the default code page:
The default code page for a user (for WinXP - not sure how it is on Vista) is set in the "Regional and Languages options" Control Panel applet on the "Advanced" tab.
The "Language for non-Unicode programs" setting sets the default code page for the current user. Unfortunately it does not actually tell you the code page number it's configuring; it just gives the language (which might be further specified with a region variant). This makes sense from an end-user perspective, because I think code page numbers have no meaning to 99.999% of end users. You need to reboot for a change to take effect. If you use regmon to determine what changes, you could probably come up with a somewhat easier way to specify the default code page.
Microsoft also has an unsupported utility, AppLocale, for testing localization that changes the codepage for particular applications: http://www.microsoft.com/globaldev/tools/apploc.mspx
Also you can change the code page for a thread by calling SetThreadLocale() - but you also have to call the C runtime's setlocale() function because some CRT functions don't talk to the Win API locale functions (and vice versa). See "Windows SetThreadLocale and CRT setlocale" by Chris Grimes for details.
As always in non Unicode scenarios, you'll get a reliable result only if the system locale (aka in Control Panel "language for non-unicode applications") is set accordingly. If not, don't expect anything good.
For example, if the system locale is Chinese Traditional, you'll receive 2 successive WM_CHAR messages (one for each byte, assuming the user composed a 2-byte character).
isleadbyte() should help you determine if a 2nd byte is coming soon.
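A minimal sketch of how the two WM_CHAR bytes could be reassembled: keep the lead byte until the trail byte arrives. The class name and the predicate big5_like_lead are hypothetical. The predicate is only a stand-in so the sketch is self-contained; in a real MFC handler one would pass something based on isleadbyte() or the Win32 IsDBCSLeadByte function, which consult the actual code page.

```cpp
#include <string>

// Collects the single bytes delivered by successive WM_CHAR messages
// into complete 1-byte or 2-byte characters.
class DbcsAssembler {
public:
    using LeadBytePredicate = bool (*)(unsigned char);

    explicit DbcsAssembler(LeadBytePredicate is_lead) : m_is_lead(is_lead) {}

    // Feed one byte. Returns the complete character, or an empty string
    // when the byte was a lead byte and the trail byte is still to come.
    std::string feed(unsigned char byte) {
        if (m_pending) {
            std::string ch;
            ch += static_cast<char>(m_lead);
            ch += static_cast<char>(byte);
            m_pending = false;
            return ch;
        }
        if (m_is_lead(byte)) {
            m_lead = byte;
            m_pending = true;
            return std::string();
        }
        return std::string(1, static_cast<char>(byte));
    }

private:
    LeadBytePredicate m_is_lead;
    unsigned char m_lead = 0;
    bool m_pending = false;
};

// Stand-in predicate (assumption): treats 0x81..0xFE as lead bytes.
bool big5_like_lead(unsigned char b) {
    return b >= 0x81 && b <= 0xFE;
}
```

In OnChar, each nChar value would be passed to feed(), and the character handled only when the returned string is non-empty.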
If your system locale is NOT set to Chinese, don't expect to receive correct messages even when using a Chinese keyboard/IME. The misleading part is that some scenarios work, e.g. using a Greek keyboard, you'll receive WM_CHAR char values based on the Greek code page even if your system locale is Latin-based. But you should really stay away from trying to cope with such scenarios: success is not guaranteed and will likely vary according to Windows version and locale.
As MikeB wrote, MS AppLocale is your friend to make basic tests.
[ad] and appTranslator is your friend if you need to translate your UI [/ad]
For _getmbcp, MSDN says "A return value of 0 indicates that a single byte code page is in use." That sure makes it not very useful. Try one of these: GetUserDefaultLCID GetSystemDefaultLCID GetACP. (Now why is there no "user" equivalent for GetACP?)
Anyway if you want _getmbcp to return an actual value then set your system default language to Chinese, Japanese, or Korean.
There is actually a very simple (but weird) way to force the OnChar function to send Unicode characters to the application even if it's configured to use the multi-byte character set:
SetWindowLongW( m_hWnd, GWL_WNDPROC, GetWindowLong( m_hWnd, GWL_WNDPROC ) );
Simply by calling the unicode version of "SetWindowLong", it forces the application to receive unicode characters.