Printing Copyright Symbol in Visual Studio 2010 - c++

I have been struggling to print the copyright symbol on Windows using Visual Studio. I understand that 0xA9 is the ASCII code for the copyright symbol, and it works on non-Windows platforms. But on Windows I can't print the copyright symbol using the same code.
#include "iostream.h"
using namespace std;
int main(int argc, char * argv[])
{
cout << (char)0xA9 << " Copyright symbol" << endl;
return 0;
}
Output on Linux/HP-UX and AIX: © Copyright symbol
Output on Windows: ⌐ Copyright symbol
I am new to Windows; can someone help me out?

As Basile points out, the copyright symbol (©) is not an ASCII character. In other words, it is not one of the characters included in the 7-bit ASCII character set.
You need to switch to a Unicode encoding in order to use "special" characters like this that fall outside the range of 7-bit ASCII. That's not difficult on Windows; it just requires that you use wide characters (wchar_t) instead of narrow characters (char). Unlike most Unix-based systems, which implement Unicode support using UTF-8 (and therefore the regular char data type), Windows does not have built-in support for UTF-8. It uses UTF-16 instead, which requires the larger wchar_t type.
Conveniently, the C++ standard library also supports wide character strings; you just need to use the appropriate versions of the classes. The ones you want have a w prefixed to their names. So, rewritten to use wide (Unicode) characters on Windows, your code would look like this:
#include <iostream> // (standard C++ headers should be in angle brackets)

int main(int argc, char * argv[])
{
    std::wcout << (wchar_t)0xA9 << L" Copyright symbol" << std::endl;
    return 0;
}
The reason you're getting that strange ⌐ character when you run the original code on Windows is that the value 0xA9 maps to that character in your default Windows character set. You see, the char type holds 8-bit values, but I said above that the ASCII character set only defines 7 bits' worth of characters. That extra bit is used on Windows to define some additional useful characters.
There are two different extended narrow (non-Unicode) character sets: one is called the OEM character set, and the other is (for historical reasons) called the ANSI character set. Generally, the Command Prompt uses the OEM character set, which fills most of the upper range with characters for drawing lines, boxes, and other simulated graphics in a text-based environment. Legacy, non-Unicode Windows applications generally use the ANSI character set, which is specific to your localized version of Windows and fills the upper range with the characters needed to display all of the letters/symbols in your language.
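If you're curious which two code pages are in play on your machine, the Windows API can report them; a minimal sketch using the documented GetACP and GetOEMCP functions:

#include <windows.h>
#include <iostream>

int main()
{
    // GetACP() reports the ANSI code page used by legacy narrow-character
    // Windows applications; GetOEMCP() reports the OEM code page used by
    // the Command Prompt. On a US-English system these are typically
    // 1252 and 437, respectively.
    std::cout << "ANSI code page: " << GetACP() << '\n';
    std::cout << "OEM code page:  " << GetOEMCP() << '\n';
}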
If it sounds complicated, that's because it is. That's why everyone has forgotten all of this stuff and uses exclusively Unicode on Windows. I strongly recommend that path to you as well. :-)
Edit: Nuts, I forgot it was more complicated than this. Changing your code to output wide characters may not be sufficient. The Windows Command Prompt is backwards-compatible in ways that break things, severely hobbling its support for Unicode characters.
By default, it uses raster fonts, which probably don't even provide glyphs for most Unicode characters (the copyright symbol is likely common enough to be an exception). You need to change the font used by the Command Prompt to something else like Lucida Console or Consolas to ensure that it works correctly. Fortunately, you can set the defaults for all Command Prompt windows. Unfortunately, this is a per-user setting.
Additionally, the Command Prompt still uses the active code page, so all of that stuff I was explaining above is still relevant and you can't forget about it. You can change the particular code page that it uses with the chcp xxxx command, where xxxx is the number of the code page you wish to use. Unfortunately, this applies only to the current console session and must be reset each time. Not a good solution for an application program that needs to output Unicode characters.
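For completeness, chcp has a programmatic counterpart, SetConsoleOutputCP; a minimal sketch (the change is still per-session, and as noted above the console may still mishandle the output):

#include <windows.h>
#include <cstdio>

int main()
{
    UINT oldCP = GetConsoleOutputCP(); // remember the session's code page
    SetConsoleOutputCP(CP_UTF8);       // programmatic equivalent of `chcp 65001`

    std::printf("\xC2\xA9 Copyright symbol\n"); // the UTF-8 bytes for U+00A9

    SetConsoleOutputCP(oldCP);         // restore before exiting
    return 0;
}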
More information on these problems and how to output Unicode strings on the Command Prompt is available in the answers to these questions:
What encoding/code page is cmd.exe using?
Can command prompt display unicode characters?

Notice that 0xA9 is not ASCII (which has 7-bit characters, limited to the 0 - 0x7F range). It could be ISO/IEC 8859-1. Many current systems (including most Linux terminals today) use UTF-8 these days, in which the copyright glyph is encoded as two bytes, so you would write "\302\251" or "\xc2\xa9" in your C or C++ source. So your program doesn't display a copyright sign in my Linux xfce4-terminal, which uses UTF-8.
Some Windows machines use different encoding systems.
I would set up your system (be it Linux or Windows) to use the UTF-8 character encoding, if possible, on its terminal (or use UTF-16 wide chars). Read about UTF-8 everywhere.
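For example, on a UTF-8 terminal the original program starts working once both bytes of the encoding are written; a minimal sketch:

#include <iostream>

int main()
{
    // U+00A9 COPYRIGHT SIGN encodes as the two bytes 0xC2 0xA9 in UTF-8.
    std::cout << "\xc2\xa9 Copyright symbol\n";
    return 0;
}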
A conventional ASCII evocation of copyright is (C), used precisely because the ASCII encoding does not have a copyright glyph.

Taken and adapted from here:
#if defined(WIN32)
#include <windows.h>
#endif
#include <stdio.h>

void print_copyright_hint() {
    printf("Copyright ");
#if defined(WIN32)
    // Write the symbol as UTF-16 directly to the console, bypassing the
    // narrow-character code page.
    const wchar_t *copyright = L"©";
    auto handle = GetStdHandle(STD_OUTPUT_HANDLE);
    WriteConsoleW(handle, copyright, static_cast<DWORD>(wcslen(copyright)), nullptr, nullptr);
#else
    printf("©");
#endif
    printf(" my Company");
}

You can use Alt+0169.
Forgive me if I am wrong.

Related

C++ output Unicode in variable

I'm trying to output a string containing Unicode characters, which is received via a curl call. Therefore, I'm looking for something similar to the u8 and L prefixes for string literals, but applicable to variables. E.g.:
const char *s = u8"\u0444";
However, since I have a string containing unicode characters, such as:
mit freundlichen Grüßen
When I want to print this string with:
cout << UnicodeString << endl;
it outputs:
mit freundlichen Gr??en
When I use wcout, it returns me:
mit freundlichen Gren
What am I doing wrong, and how can I achieve the correct output? I return the output with RapidJSON, which returns the string as:
mit freundlichen Gr��en
Important to note: the application is a CGI running on Ubuntu, replying to browser requests.
If you are on Windows, what I would suggest is using Unicode UTF-16 at the Windows boundary.
It seems to me that on Windows with Visual C++ (at least up to VS2015), std::cout cannot output UTF-8-encoded text, but std::wcout correctly outputs UTF-16-encoded text.
This compilable code snippet correctly outputs your string containing German characters:
#include <fcntl.h>
#include <io.h>
#include <iostream>

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT);

    // ü : U+00FC
    // ß : U+00DF
    const wchar_t * text = L"mit freundlichen Gr\u00FC\u00DFen";
    std::wcout << text << L'\n';
}
Note the use of a UTF-16-encoded wchar_t string.
On a more general note, I would suggest using the UTF-8 encoding (for example, storing text in std::strings) in the cross-platform portions of your C++ code, and converting to UTF-16-encoded text at the Windows boundary.
To convert between UTF-8 and UTF-16 you can use Windows APIs like MultiByteToWideChar and WideCharToMultiByte. These are C APIs that can be safely and conveniently wrapped in C++ code (more details can be found in this MSDN article, and you can find compilable C++ code here on GitHub).
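A minimal sketch of that two-call wrapping pattern, for the UTF-8 to UTF-16 direction (the linked article and GitHub code are more thorough):

#include <windows.h>
#include <stdexcept>
#include <string>

std::wstring Utf8ToUtf16(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();

    // First call: ask how many UTF-16 code units are needed.
    const int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                        utf8.data(), static_cast<int>(utf8.size()),
                                        nullptr, 0);
    if (len == 0) throw std::runtime_error("invalid UTF-8");

    // Second call: perform the actual conversion.
    std::wstring utf16(len, L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8.data(), static_cast<int>(utf8.size()),
                        &utf16[0], len);
    return utf16;
}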
On my system the following produces the correct output. Try it on your system. I am confident that it will produce similar results.
#include <string>
#include <iostream>
using namespace std;

int main()
{
    string s = "mit freundlichen Grüßen";
    cout << s << endl;
    return 0;
}
If it is ok, then this points to the web transfer not being 8-bit clean.
Mike.
containing unicode characters
You forgot to specify which Unicode encoding the string contains. There is the "narrow" UTF-8, which can be stored in a std::string and printed using std::cout, as well as wider variants, which can't. It is crucial to know which encoding you're dealing with. For the remainder of my answer, I'm going to assume you want to use UTF-8.
When I want to print this string with:
cout << UnicodeString << endl;
EDIT:
Important to note, the application is a CGI running on Ubuntu, replying on browser requests
The concerns here are slightly different from printing onto a terminal.
You need to set the Content-Type response header appropriately, or else the client cannot know how to interpret the response; for example Content-Type: application/json; charset=utf-8 (see the sketch after this list).
You still need to make sure that the source string is in fact in the correct encoding corresponding to the header. See the old answer below for an overview.
The browser has to support the encoding. Most modern browsers have supported UTF-8 for a long time now.
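Putting the first two points together, a minimal sketch of a CGI response; the JSON body is made up for illustration, and the important part is that the charset in the header matches the bytes actually written:

#include <iostream>
#include <string>

int main()
{
    // The u8 prefix (C++11) guarantees the \u escapes become UTF-8 bytes.
    std::string body = u8"{\"closing\": \"mit freundlichen Gr\u00FC\u00DFen\"}";

    // CGI: headers first, then a blank line, then the body.
    std::cout << "Content-Type: application/json; charset=utf-8\r\n"
              << "Content-Length: " << body.size() << "\r\n"
              << "\r\n"
              << body;
    return 0;
}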
Answer regarding printing to terminal:
Assuming that
1. UnicodeString indeed contains a UTF-8 encoded string,
2. the terminal uses the UTF-8 encoding,
3. and the font that the terminal uses has the graphemes that you use,
the above should work.
it outputs:
mit freundlichen Gr??en
Then it appears that at least one of the above assumptions doesn't hold.
You can verify whether 1. is true by inspecting the numeric value of each code unit separately and comparing it to what you would expect of UTF-8. If 1. isn't true, then you need to figure out what encoding the string actually uses, and either convert the encoding or configure the terminal to use that encoding.
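A quick sketch of such an inspection: dump each code unit as a hex byte and compare against the expected UTF-8 sequences (ü should appear as c3 bc, ß as c3 9f):

#include <cstdio>
#include <string>

// Print each code unit of the string as an unsigned byte in hex.
// In valid UTF-8, "Grüßen" is 47 72 c3 bc c3 9f 65 6e.
void dump_code_units(const std::string& s)
{
    for (unsigned char c : s)
        std::printf("%02x ", c);
    std::printf("\n");
}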
The terminal typically, but not necessarily, uses the system native encoding. The first step of figuring out what encoding your terminal / system uses is to figure out what terminal / system you are using in the first place. The details are probably in a manual.
If the terminal doesn't use UTF-8, then you need to convert the UTF-8 string within your program into the character encoding that the terminal does use - unless that encoding doesn't have the graphemes that you want to print. Unfortunately, the standard library doesn't provide arbitrary character encoding conversion support (there is some support for converting between narrow and wide Unicode, but even that support is deprecated). You can find the Unicode standard here, although I would like to point out that using an existing conversion implementation can save a lot of work.
In case the character encoding of the terminal doesn't have the needed graphemes - or if you don't want to implement encoding conversion - the alternative is to re-configure the terminal to use UTF-8. If the terminal / system can be configured to use UTF-8, there should be details in the manual.
You should be able to test whether the font itself has the required graphemes simply by typing the characters into the terminal and seeing if they show as they should - although this test will also fail if the terminal encoding does not have the graphemes, so check that first. The manual of your terminal should explain how to change the font, should it be necessary. That said, I would expect ü and ß to exist in most fonts.

c++ Lithuanian language, how to get more than ascii

I am trying to use Lithuanian in my C++ application, but every attempt has been unsuccessful.
A multi-byte character set is used. I have tried everything I could think of; I am new to C++ and have never tried to do anything in Lithuanian before.
I tried every setlocale variant: setlocale(LC_ALL, "en_US.utf8"); setlocale(LC_ALL, "Lithuanian"); ...
I researched for 2 hours and didn't find proper examples or a solution.
I have an average-sized project which needs Lithuanian translations from a database, and it can't understand most of "ĄČĘĖĮŠŲŪąčęėįšųū".
Compiler - "Visual studio 2013"
Database - sqlite3.
I can't even get simple strings (defined by myself) to work and output as Lithuanian in a Win32 application.
In Windows use wide character strings (UTF-16 encoding(1), wchar_t type) for internal text handling, and preferably UTF-8 for external text files and networking.
Note that Visual C++ will translate narrow text literals from the source encoding to Windows ANSI, which is a platform-dependent usually single-byte encoding (you can check which one via the GetACP API function), i.e., Visual C++ has the platform-specific Windows ANSI as its narrow C++ execution character set.
But also do note that for an app restricted to non-Windows platforms, i.e. Unix-land, it makes practical sense to do everything in UTF-8, based on char type.
For the database communication you may need to translate to and from the program's internal text representation.
This depends on what the database interface requires, which is not stated.
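Since the question mentions sqlite3: its C API stores and returns text as UTF-8 through sqlite3_column_text (a UTF-16 variant, sqlite3_column_text16, also exists). A sketch with a made-up table name and no error handling:

#include <sqlite3.h>
#include <string>

// Read one text column as UTF-8. Before handing the result to wide
// Windows APIs, convert it to UTF-16 (e.g. with MultiByteToWideChar).
std::string read_translation(sqlite3* db)
{
    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db, "SELECT text FROM translations LIMIT 1", -1, &stmt, nullptr);

    std::string utf8;
    if (sqlite3_step(stmt) == SQLITE_ROW)
        utf8 = reinterpret_cast<const char*>(sqlite3_column_text(stmt, 0));

    sqlite3_finalize(stmt);
    return utf8;
}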
Example for console output in Windows:
#include <iostream>
#include <fcntl.h>
#include <io.h>

auto main() -> int
{
    _setmode( _fileno( stdout ), _O_WTEXT );
    using namespace std;
    wcout << L"ĄČĘĖĮŠŲŪąčęėįšųū" << endl;
}
To make this compile by default with g++, the source code encoding needs to be UTF-8. Then, to make it produce correct results with Visual C++, the source code encoding needs to be UTF-8 with BOM, which happily is also accepted by modern versions of g++. Otherwise the Visual C++ compiler will assume the Windows ANSI encoding and produce an incorrect UTF-16 string.
Not coincidentally this is the default meaning of UTF-8 in Windows, e.g. in the Notepad editor, namely UTF-8 with BOM.
But note that while on Windows the problem is that the main system compiler requires a BOM for UTF-8, in Unix-land the problem is the opposite: many old tools can't handle the BOM (for example, even MinGW g++ 4.9.1 isn't entirely up to speed: it sometimes includes the BOM bytes, then incorrectly interpreted, in error messages).
(1) On other platforms wide character text can be encoded in other ways, e.g. with UTF-32. In fact the Windows convention is in direct conflict with the C and C++ standards, which require that a single wchar_t be able to encode any character in the extended character set. However, this requirement was, AFAIK, imposed after Windows adopted UTF-16, so the fault probably lies with the politics of the C and C++ standardization process, not yet another Microsoft'ism.
Complexity of internationalisation
There are several related but distinct topics here, and mismatches between them make a trial-and-error approach very tedious:
type used for storing strings and chars: Windows uses wchar_t by default, but for most APIs you also have char-equivalent functions
character set encoding: this defines how the chars stored in the type are to be understood, for example Unicode (UTF-8, UTF-16, UTF-32), 7-bit ASCII, or 8-bit ANSI. On Windows, by default it is UTF-16 for wchar_t and ANSI/Windows for char
locale: defines, among other things, the character set assumptions when processing strings. This permits using language-independent functions like isalpha(i, loc), islower(i, loc), ispunct(i, loc) to find out whether a given character is alphanumeric, a lower-case letter, or punctuation, for example to break down a user's text into words. C++ offers portable functions here (see the sketch after this list).
output code page or font used to show a character to the user. This assumes that the font used shows the characters using the same character set used in the code internals.
source code encoding: for example, your editor could assume an ANSI encoding, with the Windows 1252 character set.
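As a sketch of the locale point above: the classification functions live in <locale>, and writing the character as a \u escape sidesteps the source encoding pitfalls discussed next.

#include <iostream>
#include <locale>

int main()
{
    std::locale loc("");  // the user's default locale

    wchar_t ch = L'\u0104';  // Ą, written as an escape to avoid editor/compiler mismatch
    std::wcout << std::boolalpha
               << std::isalpha(ch, loc) << L' '   // alphabetic in this locale?
               << std::islower(ch, loc) << L'\n'; // lower case? (Ą is upper case)
}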
Most typical errors
Problem number one is Win32 console output, as Unicode is not well supported by the console. But this is not your problem here.
Another cause of mismatch is the encoding used by your text editor. It might not be Unicode, but a Windows code page. In that case, you type "Č" and the editor displays it as such, but the editor might use the Windows 1257 encoding for Lithuanian and store 0xC8 in the file. If you then display this literal with a Windows Unicode function, it will interpret 0xC8 as "Latin E with grave accent" and print something else, because the right Unicode code point for "Č" is 0x010C!
It can be even worse: the compiler may have its own assumptions about the character set encoding used and convert your literals into Unicode using false assumptions (this happened to me when I used some exotic code generation switch).
What to do?
To figure out what goes wrong, proceed by elimination:
First, for plain Windows, use the native Unicode setting. OK, it's UTF-16 and wchar_t instead of UTF-8, and as such comes with some drawbacks, but it's native and well supported.
Then use explicit Unicode codes in literals, for example TEXT("\u010C") instead of TEXT("Č"). This avoids editor and compiler mismatches.
If it's still not the right character, make sure that your font FULLY supports Unicode. The default system font, for instance, doesn't, while most others do. You can easily check with the Windows fonts panel (WindowKey+R, fonts, then click on "search char") to display the character table of your font.
Set fonts explicitly in your code.
For example, a very tiny experiment:
...
case WM_PAINT:
{
    hdc = BeginPaint(hWnd, &ps);
    auto hf = CreateFont(24, 0, 0, 0, 0, TRUE, 0, 0, 0, 0, 0, 0, 0, L"Times New Roman");
    auto hfOld = SelectObject(hdc, hf); // if you comment this out, € and Č won't display
    TextOut(hdc, 50, 50, L"Test with éç € \u010C special chars", 30);
    SelectObject(hdc, hfOld);
    DeleteObject(hf);
    EndPaint(hWnd, &ps);
    break;
}

Swedish characters don't compare correctly

For some reason if/else statements aren't working correctly for me in C++.
The problem is that when a variable is compared to the right word (höger), the if statement won't match; instead execution falls through to the else statement. If I replace the letter 'ö' with, say, 'o', so it becomes 'hoger' instead, then the if statement works. So whenever I type the word 'höger' it goes to the else statement instead of the if statement. However, if I make the variable equal to 'hoger' and then type 'hoger', it works. How can I make it possible to type 'höger' so that the if statement recognizes it? It's as if Swedish letters don't work.
My code look like this:
#include <iostream>
#include <string>
#include <clocale> // for setlocale
using namespace std;

int main() {
    setlocale(LC_ALL, "");
    string test; // Define variable
    cout << " Höger elle vänster" << endl; // Right or left
    cin >> test;
    if (test == "höger") { // If right, then output this
        cout << "Du valde höger" << endl;
    }
    else if (test == "vänster") { // If left, then output this
        cout << "Du valde vänster" << endl;
    } else {
        // Do this
    }
}
The problem is almost certainly to do with encodings.
The C/C++ language specs do not automatically handle anything other than 7-bit ASCII. The o-umlaut character is outside that range, and the exact behaviour depends on the encoding of your source code file.
The most likely possibilities are ISO 8859-1, Windows ANSI-1252, UTF-8 or Windows OEM 850. The first two encode this character the same, but in each of the others it is different.
With a bit more information about the encoding and tool set you are using it may be possible to provide more specific diagnosis and advice.
[And by the way, if/else statements in C/C++ work just fine, thank you.]
If we assume for the moment that this is Windows and Visual C++, then this is what you're dealing with.
Source code written inside Visual Studio: code page 1252. Code point for the o-umlaut character is 0xf6.
Keyboard input read from the console: code page 850. Code point for the o-umlaut character is 0x94.
Obviously not a good match. However, Visual Studio can also quite happily edit source code files in many encodings, including UTF-8 (with byte order mark), UTF-16 (wide characters) and code page 850. So:
Source code written inside Visual Studio: code page 850. Code point for the o-umlaut character is 0x94. Now it works.
You can also change the code page for your console using the CHCP command.
Change the console to code page 1252 with CHCP 1252 and it works.
The behaviour of the compiler when reading source code is obliged by the standard to be consistent with the execution character set. See n3797 S2.2.5:
Each source character set member in a character literal or a string literal, as well as each escape
sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set
S2.3/3:
The basic execution character set and the basic execution wide-character set shall each contain all the members of the basic source character set, plus control characters representing alert, backspace, and carriage return, plus a null character (respectively, null wide character), whose representation has all zero bits. For each basic execution character set, the values of the members shall be non-negative and distinct from one another. In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous. The execution character set and the execution wide-character set are implementation-defined supersets of the basic execution character
set and the basic execution wide-character set, respectively. The values of the members of the execution character sets and the sets of additional members are locale-specific.
n3797 S2.14.3/1:
A character literal that does not begin with u, U, or L is an ordinary character literal, also referred to as a narrow-character literal. An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set.
n3297 S2.14.5/6:
a string literal that does not begin with an encoding-prefix is an ordinary string
literal, and is initialized with the given characters.
The execution character set is implementation-defined. Microsoft's statement regarding implementation-defined behaviour for the C compiler is here: http://msdn.microsoft.com/en-us/library/hx3yt8af.aspx. [I can't find a separate one for C++, so I assume this applies to both.]
The source character set is the set of legal characters that can appear in source files. For Microsoft C, the source character set is the standard ASCII character set.
Sorry about the language-lawyer stuff, but what this says is that the MSVC compiler is independent of locale/encoding and implements 8-bit ASCII, code page unspecified. Obviously the standard library functions may need to know the encoding for various purposes, but that is a whole other story.
As a final point, the Microsoft C compiler dates back around 30 years, to before Windows. It has always been possible to write source code in code page 850 and have it run correctly on the console, subject to careful handling of extended (8-bit) characters. Many people still do. The problem here is source code written in Windows ANSI or Unicode with keyboard input from an OEM (cp850) console. Change either one to get it to work correctly.
In practice this problem will only manifest itself in Windows, so I'll assume Windows.
Then the problem is that the C++ narrow extended execution character set(1) (encoding) does not match the encoding used by the console window. "Narrow" refers to the char type. "Execution character set" is a formal term employed by the C++ standard, and refers to the encoding that is assumed for text stored in the executable. The compiler translates source code literals to this encoding. It's also assumed for translation to/from any external encoding, such as translation to/from a console's encoding.
With Visual C++ the narrow encoding is always Windows ANSI(2), regardless of source code encoding, unless you trick the compiler. And assuming you're using Visual C++, this is then one encoding that you know.
The encoding in the console window is by default the one used for original IBM PC, in your case probably codepage 850 (a Western European variant of the original IBM PC English codepage 437). Run the Windows command interpreter cmd (Windows-key+R, type cmd, OK). Type chcp to check the current codepage. Type chcp 1252 to switch to Windows ANSI Western, which presumably is the Windows ANSI codepage on your machine. Run your program [.exe] file, e.g. by typing its full path, or by going to its directory and typing just its name, e.g.
[H:\dev\test\0046]
> cl /nologo /EHsc /GR encoding.cpp /Fe:b.exe
encoding.cpp
[H:\dev\test\0046]
> chcp & b
Active code page: 850
Höger elle vänster
höger
← No output here, didn't compare as equal.
[H:\dev\test\0046]
> chcp 1252
Active code page: 1252
[H:\dev\test\0046]
> b
Höger elle vänster
höger
Du valde höger
[H:\dev\test\0046]
> _
… where cl (short for original “Lattice C”) is the Visual C++ compiler.
You can change the console codepage more permanently by running regedit, going to this registry key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage
and in the list in the right pane double-click the value named OEMCP (short for Original Equipment Manufacturer Code Page, referring to the IBM PC), change it to 1252, or more generally to the same value as the ACP value, and reboot the machine.
Oh, it's also necessary to change the console window font to a TrueType font such as Lucida Console, because the default is (an emulation of) a bitmapped font that only works correctly with the original console codepage. You can right click the console window title to get a menu, choose [Defaults], and configure the default font, size, colors etc. The changes won't affect the current console window, but they will apply to future console windows, except for those that have been configured individually(3).
An alternative to such console window configuration is to use the Console2 program. If you do, then in Windows 7 and later be sure to use the 64-bit version. Otherwise some things, such as invoking links to 64-bit programs, won't work.
Summing up, you can either
run the program from the command interpreter (using chcp to change the codepage), or
change the console codepage more permanently, as discussed above.
In either case it's a Good Idea™ to change the console window font to a TrueType font – and yes, this affects the functionality, not just the looks.
Note on additional Microsoft absurdity: in Windows 7 and later the "System" font used by default in console windows is actually, behind the scenes, a TrueType font with umpteen thousand glyphs, but it's used to emulate the old 16-bit Windows bitmapped fonts, with the same silly restrictions, so that you still have to change to some other TrueType font…
(1) See the C++11 standard §2.3/3.
(2) “Windows ANSI” depends on the Windows configuration and is always the codepage specified by the GetACP API function. In practice this function gets its value from the registry key/value referenced above. However, that's largely undocumented.
(3) In Windows XP Windows would ask if you wanted to save an individual console window configuration. Starting with Windows Vista it's saved with no question asked and no information that it's been saved. There is no user interface for removing such saved configurations, but they can be removed by programmatically altering shortcut files, and/or by registry editing, which however is both an impractical and brittle solution.
The only change I made to your code was the following:
// setlocale(LC_ALL, "");
char *l = setlocale(LC_ALL, NULL);
cout << "Current Locale: " << l << endl;
Because I don't have an “ISO” keyboard layout, I used Alt codes to type the character I needed. These are the key combinations I used for the different code pages:
First run: I had to type Alt+246 for code page 437.
Second run: Alt+148 for Windows-1252.
Below is the output when I change the code page between executions.
It seems the problem is the encoding of your source file when your IDE compiles it. If you are using Visual Studio, you can change the file's encoding when saving it (for example via File > Advanced Save Options).

printing Unicode characters C++

I'm trying to write a simple command line app to teach myself Japanese, but I can't seem to get Unicode characters to print. What am I missing?
#include <cstdlib> // for system()
#include <iostream>
using namespace std;

int main()
{
    wcout << L"こんにちは世界\n";
    wcout << L"Hello World\n";
    system("pause");
}
In this example only "Press any key to continue" is displayed. Tested on Visual C++ 2013.
This is not easy on Windows. Even when you manage to get the text to the Windows console you still need to configure cmd.exe to be able to display Japanese characters.
#include <iostream>

int main() {
    std::cout << "こんにちは世界\n";
}
This works fine on any system where:
The compiler's source and execution encodings include the characters.
The output device (e.g., the console) expects text in the same encoding as the compiler's execution encoding.
A font with the appropriate characters is available (usually not a problem).
Most platforms these days use UTF-8 by default for all these encodings and so can support the entire Unicode range with code similar to the above. Unfortunately Windows is not one of these platforms.
wcout << L"こんにちは世界\n";
In this line the string literal data is (at compile time) converted from the source encoding to the execution wide encoding and then (at run time) wcout uses the locale it is imbued with to convert the wchar_t data to char data for output. Where things go wrong is that the default locale is only required to support characters from the basic source character set, which doesn't even include all ASCII characters, let alone non-ASCII characters.
So the conversion results in an error, putting wcout into a bad state. The error has to be cleared before wcout will function again, which is why the second print statement does not output anything.
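A sketch of that failure mode; clearing the stream's error state lets later output through (the characters that failed are still lost):

std::wcout << L"こんにちは世界\n"; // conversion fails; wcout sets failbit
if (std::wcout.fail())
    std::wcout.clear();            // clear the error state...
std::wcout << L"Hello World\n";    // ...so this statement can now print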
You can work around this for a limited range of characters by imbuing wcout with a locale that will successfully convert the characters. Unfortunately, the encoding that is needed to support the entire Unicode range this way is UTF-8; although Microsoft's implementation of streams supports other multibyte encodings, it very specifically does not support UTF-8.
For example:
#include <codecvt>
#include <locale>
#include <windows.h>
std::wcout.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8_utf16<wchar_t>()));
SetConsoleOutputCP(CP_UTF8);
std::wcout << L"こんにちは世界\n";
Here wcout will correctly convert the string to UTF-8, and if the output were written to a file instead of the console then the file would contain the correct UTF-8 data. However the Windows console, even though configured here to accept UTF-8 data, simply will not accept UTF-8 data written in this way.
There are a few options:
Avoid the standard library entirely:
DWORD n;
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), L"こんにちは世界\n", 8, &n, nullptr);
Use a non-standard magical incantation that will break standard code:
#include <fcntl.h>
#include <io.h>
_setmode(_fileno(stdout), _O_U8TEXT);
std::wcout << L"こんにちは世界\n";
After setting this mode std::cout << "Hello, World"; will crash.
Use a low-level IO API along with manual conversion:
#include <codecvt>
#include <cstdio>
#include <locale>
SetConsoleOutputCP(CP_UTF8);
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
std::fputs(convert.to_bytes(L"こんにちは世界\n").c_str(), stdout);
Using any of these methods, cmd.exe will display the correct text to the best of its ability, by which I mean it will display unreadable boxes. Seven little boxes, for the given string.
You can copy the text out of cmd.exe and into notepad.exe or whatever to see the correct glyphs.
There are whole articles about dealing with Unicode in the Windows console:
http://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/
http://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/
Basically, you can implement your own streambuf for std::cout (or std::wcout) in terms of WriteConsoleW and enjoy writing UTF-8 (or whatever Unicode form you want) to the Windows console without depending on locales, console code pages, or even wide characters.
It may not look very straightforward, but it's a convenient and reusable solution, which is also able to give you portable utf8-everywhere style user code. Please don't beat me for my English :)
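As a rough illustration of the idea - a deliberately unbuffered sketch, not the articles' full solution - here is a wide streambuf whose overflow hands each character straight to WriteConsoleW:

#include <windows.h>
#include <iostream>
#include <streambuf>

// Every wide character handed to wcout goes directly to the console API,
// bypassing locales and console code pages entirely.
class ConsoleWBuf : public std::wstreambuf {
protected:
    int_type overflow(int_type ch) override {
        if (ch != traits_type::eof()) {
            wchar_t wc = traits_type::to_char_type(ch);
            DWORD written = 0;
            WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), &wc, 1, &written, nullptr);
        }
        return ch;
    }
};

int main()
{
    ConsoleWBuf buf;
    std::wstreambuf* old = std::wcout.rdbuf(&buf); // redirect wcout
    std::wcout << L"こんにちは世界\n";
    std::wcout.rdbuf(old);                         // restore before buf is destroyed
}

A real implementation would buffer, override sync() and xsputn(), and fall back to ordinary byte output when stdout is redirected to a file; the linked articles cover those details.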
Or you can change the Windows locale to Japanese.

C++ Spanish question mark

I am beginning to develop in C++, and I am writing a simple console calculator. When my program asks the user whether they want to exit, the character '¿' doesn't appear (questions in Spanish are written between '¿' and '?').
Can someone help me?
PS: The problem only happens on Windows, not on Linux.
EDIT: Here is the code that produces the output:
cout << '¿' << "Desea salir (S/N)? ";
There are a few ways to deal with this problem.
The fundamental problem is not that the ¿ doesn't exist in the console, but that the console and your C++ text editor disagree on what that character is. The two are using different character codes for many characters beyond those needed for English. Character codes 32-126 (letters, numbers, punctuation and brackets) are universally the same. However, character codes 128 through 255, which from a Spanish point of view include all the accented characters, "u with diaeresis" (e.g. "pingüino"), Ñ, and the opening ¿ and ¡, depend on the specific environment.
Why there is such an inconvenient disagreement in character codes is a historical accident, interesting on its own but out of the scope of this question. To keep it simple: in the Windows OS, "consoles" (typically) use the list of characters described in OEM Code Page 437, while Windows applications like your C++ editor (typically) use the Windows-1252 Code Page.
There is no portable (universal) solution for this problem, because the issue of differing charsets is a platform-specific problem. Windows is unfortunately somewhat unique in that the editor and (console) outputs use different sets.
The first and simplest solution - which is fine for toy programs - is to just look up the character code that you want from the OEM 437 code page and use that. For ¿, that's #168 (0xA8 in hex, or \250 in octal). You can embed the character code directly in the string, either of these ways:
std::cout << "\xA8" "Cu" "\xA0" "l es el primer n" "\xA3" "mero?\n"; // hex (split so following letters aren't read as hex digits)
std::cout << "\250Cu\240l es el primer n\243mero?\n"; // octal
Outputs:
¿Cuál es el primer número?
Note how I had to do the same thing with the ú and the á. Unfortunately, writing strings like this gets unwieldy quickly. Using macros or const chars can help, but not much.
A second alternative is to use a Windows function such as CharToOemA. For example(1):
#include <windows.h>
...
...
char pregunta[] = "¿Cuál es el primer número\n";
char *pregunta_oem = new char[sizeof(pregunta)/sizeof(char)];
CharToOemA(pregunta, pregunta_oem);
std::cout << pregunta_oem;
delete[] pregunta_oem;
For a more complex program, I would wrap that pattern into a utility function or class.
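For instance, a sketch of such a utility, assuming the input is Windows ANSI text (which is what CharToOemA expects):

#include <windows.h>
#include <string>

// Convert a Windows ANSI string to the console's OEM code page.
std::string to_oem(const std::string& ansi)
{
    std::string oem(ansi.size() + 1, '\0'); // room for the terminator
    CharToOemA(ansi.c_str(), &oem[0]);
    oem.resize(ansi.size());                // trim back to the text itself
    return oem;
}

It could then be used as std::cout << to_oem("¿Cuál es el primer número?\n");.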
A different approach is to change the code page of the console so that it agrees with your C++ editor and the rest of Windows. You can do that via the CHCP console command, or via the SetConsoleOutputCP() function, but neither works with the default "raster font" used by consoles, so you have to change the font as well. When the font is set to a Unicode font like Lucida Console, this works:
std::cout << "¿Cuál es el primer número?\n"; // ┐Cußl es el...
UINT originalCP = GetConsoleOutputCP();
SetConsoleOutputCP(1252);
std::cout << "¿Cuál es el primer número?\n"; // ¿Cuál es el...
SetConsoleOutputCP(originalCP);
(I don't know if you can change the font from the program itself; I would have to look that up. The standard way to do it from the console is to click on the tiny icon in the corner, click Properties, then the Font tab, and pick a font from the list.)
(1) I have to warn that this snippet contains a number of subtleties that can easily trip up a beginner. You have to make sure the source of the text is a char array; if you're using a char pointer, sizeof won't work correctly and you have to use strlen(source)+1. For the source I used the natural option of a char array initialized from a literal, but you can't do that for the destination, because the contents of such an array are read-only. If you are using a new'd char array or one that is not initialized from a literal, you can use the same char array for the source and destination. This example feels very C-like.
You can use the _setmode function to do that:
#include <cstdlib> // for system()
#include <iostream>
#include <string>
#if defined(WIN32) && !defined(UNIX)
#  include <io.h>    // for _setmode()
#  include <fcntl.h> // for _O_U16TEXT
#endif // WIN32 && !UNIX

int main()
{
#if defined(WIN32) && !defined(UNIX)
    _setmode(_fileno(stdout), _O_U16TEXT);
    //^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#endif // WIN32 && !UNIX

    std::wstring wstr = L"'¿' and '?'";
    std::wcout << L"WString : " << wstr << std::endl;

    system("pause");
    return 0;
}
To write Unicode characters out with the iostream library (assuming the standard Windows variant of UTF-16, i.e. UTF-16 LE), call _setmode() with _O_U16TEXT and then use wcout.
But you can't use cout anymore; it throws an assert.
Check this answer.
Assuming you are using a simple call to std::cout, you should be able to print Unicode strings if you set your command line to Unicode mode:
1. Change code page to UTF-8
You can do this by simply calling the command below in your cmd:
chcp 65001
2. Make sure you are using a font which has the characters you want to display
Lucida Console should do the trick, as it supports ¿ (and other characters included in WGL4).
This character is simply not included in basic ASCII. Try using wstring: http://www.cplusplus.com/reference/string/wstring/
As you can see in an extended (code page 437) ASCII table, the symbol ¿ has code 168. You can use a \ddd octal escape in an output stream to print special characters, as shown below.
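For example, a minimal sketch (\250 is octal for decimal 168):

#include <iostream>

int main()
{
    // In code page 437, byte 168 (octal \250) displays as ¿.
    std::cout << "\250Desea salir (S/N)? ";
    return 0;
}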
This is because the command console does not support non-ASCII characters by default (ASCII has mainly English-language characters and a few accented characters). To get support for characters in other character classes, play around with the chcp command. Refer to its documentation here.
In your case I think you need to run chcp 850 in the console before running your program.