Printing Unicode in C++

How to print Unicode characters in a loop in C++?
I tried this code, but it does not work:
#include <iostream>

int main()
{
    wchar_t chr1 = L'\u0985';
    while (chr1 <= L'\u098C')
        std::wcout << chr1++ << std::endl;
}

Assuming you use Windows: open the command prompt's Properties, go to Font, and set a font that includes the Unicode characters you are trying to output.
If you are not on Windows, try setting your terminal font similarly.

The issue here seems to be that your terminal (or the font it is set to) does not support Unicode. You should try a different terminal, or set your font to one that does support Unicode.
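If the font is right but the output is still wrong, the stream mode can also matter on Windows. A minimal sketch (my addition, assuming a Windows build where _setmode and _O_U16TEXT are available):
#include <fcntl.h>  // _O_U16TEXT
#include <io.h>     // _setmode, _fileno
#include <iostream>

int main()
{
    // Put stdout into UTF-16 mode so wcout can emit characters
    // outside the console's current codepage.
    _setmode(_fileno(stdout), _O_U16TEXT);
    for (wchar_t chr = L'\u0985'; chr <= L'\u098C'; ++chr)
        std::wcout << chr << L'\n';
}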

How to initialize or assign 中文 to wstring?

I tried to use L"string", but it doesn't work.
#include <iostream>
using namespace std;

int main() {
    wstring wstr = L"你好"; // [Error] converting to execution character set: Illegal byte sequence
    wcout << wstr << endl;
}
Using wcin to input 中文 works fine:
#include <iostream>
using namespace std;

int main() {
    wstring wstr;
    wcin >> wstr; // inputting Chinese works fine
    wcout << wstr << endl;
}
How to initialize or assign 中文 to wstring?
Edit: I tried some online compilers (e.g. cpp.sh, jdoodle, onlinegdb, repl.it). They all compile it, but all output "??".
Edit 2: I installed i686 MinGW-W64 8.1.0 g++, used Visual Studio to save the .cpp file in UTF-8 format, then compiled it from the command line. It still outputs nothing.
Your compiler clearly doesn't like Unicode characters in its source files. Try initializing your string with Unicode escapes, instead:
wstring wstr = L"\u4E2D\u6587"; // These MAY be the correct codes.
Where 4E2D and 6587 are replaced with the actual hexadecimal values for the characters you want. (Sorry, but I don't have access to a full Unicode table for Chinese characters: I pasted them into my compiler, and these are the values it gave me on translating.)
The Unicode values given are for the character string in your question (中文); for the different one in your posted code (你好), use L"\u4F60\u597D".
Also see the answer by @MarekR.
This must be a configuration issue!
Apparently your compiler uses a different encoding than the one your file is written in.
Since you are using Windows, the file on your machine is most probably not UTF-8 encoded (and you have copied this file to Linux) but something else.
Since gcc is more Linux-friendly, it may expect UTF-8, so you have a conflict.
This is a common problem, since for a long time Windows maintained backward compatibility with DOS (where only single-byte characters were allowed and the system used code pages for the respective languages).
As you can see here, most compilers with default settings do not have a problem with code that uses Chinese characters.
I do not see the TCM-GCC 4.9.2 compiler on godbolt, but it is not a very old gcc after all.
I recommend ensuring that the code is written in UTF-8 and that the compiler treats the sources as UTF-8 encoded.
Edit: Adding std::locale::global(std::locale("")); made your code display this string properly on godbolt.
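For reference, one way to pin both encodings down explicitly (my addition; these are the documented switches for GCC/MinGW and MSVC):
g++ -finput-charset=utf-8 -fexec-charset=utf-8 main.cpp
cl /utf-8 main.cpp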
I tried it in Visual Studio. It works (outputs "你好") if I save my C++ file in either Unicode or UTF-8 format. Can you try saving your .cpp file as either Unicode or UTF-8?

GetFileAttributesW fails for non-ASCII characters

So I am trying to check whether a given file exists. Following this answer I tried GetFileAttributesW. It works just fine for any ASCII input, but it fails for ß, ü and á (and, I suspect, any other non-ASCII character). I get ERROR_FILE_NOT_FOUND for filenames containing them and ERROR_PATH_NOT_FOUND for pathnames containing them, as one would expect if they didn't exist.
I made 100% sure that they do exist. I spent 15 minutes copying filenames to avoid typos and used literals to rule out bad input. I couldn't find any mistake.
Since all of these characters are non-ASCII, I stopped trying, because I suspected I might have screwed up the encodings somewhere. I just can't spot it. Is there something I am missing? I link against Kernel32.lib.
Thanks!
#include <stdio.h>
#include <iostream>
#include <string>
#include "Windows.h"

int main() {
    while (true) {
        std::wstring file_path;
        std::getline(std::wcin, file_path);
        DWORD dwAttrib = GetFileAttributesW(file_path.data());
        if (dwAttrib == INVALID_FILE_ATTRIBUTES) {
            printf("error: %lu\n", GetLastError());
            continue;
        }
        if (!(dwAttrib & FILE_ATTRIBUTE_DIRECTORY))
            printf("valid!\n");
        else
            printf("invalid!\n");
    }
}
It's extremely hard to make Unicode work well in a console program on Windows, so let's start by removing that aspect of it (for now).
Modify your program so that it looks like this:
#include <cstdio>
#include <iostream>
#include <string>
#include "Windows.h"

int main() {
    std::wstring file_path = L"fooß.txt";
    DWORD dwAttrib = GetFileAttributesW(file_path.data());
    if (dwAttrib == INVALID_FILE_ATTRIBUTES)
        printf("error: %lu\n", GetLastError());
    if (!(dwAttrib & FILE_ATTRIBUTE_DIRECTORY))
        printf("valid!\n");
    else
        printf("invalid!\n");
    return 0;
}
Make sure this file is saved with a byte-order mark (BOM), even if you're using UTF-8. Windows applications, including Visual Studio and the compilers, can be very picky about that. If your editor won't do that, use Visual Studio to edit the file and then use Save As, click the down arrow next to the Save button, choose With Encoding. In the Advanced Save Options dialog, choose "Unicode (UTF-8 with signature) - Codepage 65001".
Make sure you have a file named fooß.txt in the current folder. I strongly recommend using a GUI program to create this file, like Notepad or Explorer.
This program works. If you still get a file-not-found error, make sure the test file is in the working directory, or change the program to use an absolute path. If you use an absolute path, use backslashes and make sure they are all properly escaped. Check for typos, the extension, etc. This code does work.
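For example (a hypothetical path; note the escaped backslashes):
std::wstring file_path = L"C:\\temp\\fooß.txt";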
Now, if you take the file name from standard input:
std::wstring file_path;
std::getline(std::wcin, file_path);
And you enter fooß.txt in the console window, you'll probably find that it doesn't work. And if you look in the debugger, you'll see that the character that should be ß is something else. For me, it's á, but it might be different for you if your console codepage is something else.
ß is U+00DF in Unicode. In Windows 1252 (the most common codepage for Windows users in the U.S.), it's 0xDF, so it might seem like there's no chance of a conversion problem. But the console windows (by default) use OEM code pages. In the U.S., the common OEM codepage is 437. So when I try to type ß in the console, that's actually encoded as 0xE1. Surprise! That's the same as the Unicode value for á. And if you manage to enter a character with the value 0xDF, you'll see that corresponds to the block character you reported in the original question.
You would think (well, I would think) that asking for the input from std::wcin would do whatever conversion is necessary. But it doesn't, and there's probably some legacy backward compatibility reason for that. You could try to imbue the stream with the "proper" codepage, but that gets complicated, and I've never bothered trying to make it work. I've simply stopped trying to use anything other than ASCII on the console.
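If you do need non-ASCII input from the console, one workaround (a sketch of my own, not part of the original answer) is to bypass the stream and read UTF-16 directly with ReadConsoleW:
#include <windows.h>
#include <string>

// Reads one line as true UTF-16, bypassing the codepage conversion
// that std::wcin performs, so 'ß' arrives as U+00DF regardless of
// the console's OEM codepage.
std::wstring read_console_line() {
    HANDLE hIn = GetStdHandle(STD_INPUT_HANDLE);
    wchar_t buf[512];
    DWORD read = 0;
    if (!ReadConsoleW(hIn, buf, 512, &read, NULL))
        return L"";
    std::wstring line(buf, read);
    // Strip the trailing CR/LF that the console appends.
    while (!line.empty() && (line.back() == L'\r' || line.back() == L'\n'))
        line.pop_back();
    return line;
}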

C++ - using Unicode in winconsole

I was trying to properly handle Unicode input/output in the Windows console (Polish Windows 10) using std::wcin, std::wcout, std::wstring, etc. I am using Visual Studio 2015, and Character Set in Property Pages is set to Unicode. After some searching and a couple of tries I managed to get it to work with:
#include <io.h>     // for _setmode(), _fileno()
#include <fcntl.h>  // for _O_U16TEXT
#include <locale>   // for std::locale
_setmode(_fileno(stdout), _O_U16TEXT);
_setmode(_fileno(stdin), _O_U16TEXT);
std::locale::global(std::locale(""));
The problem is that I do not quite understand why this works, and why I have to use both _setmode and std::locale. When I remove the _setmode calls, output is fine, but non-ASCII characters on input aren't handled properly. Commenting out the std::locale line makes the program stop working at the very first non-ASCII character on output.
Does someone use the same thing? Could you explain briefly what those functions do? I want to understand what exactly I am doing.

C++ spanish question mark

I am beginning to develop in C++ and I am writing a simple console calculator. When my program asks the user whether they want to exit, the character '¿' doesn't appear (questions in Spanish are written between '¿' and '?').
Can someone help me?
PS: The problem only happens on Windows, not on Linux.
EDIT: Here is the code that produces the output:
cout << '¿' << "Desea salir (S/N)? ";
There are a few ways to deal with this problem.
The fundamental problem is not that the ¿ doesn't exist in the console, but that the console and your C++ text editor disagree on what that character is. The two use different character codes for many characters beyond those needed for English. Character codes 32-126 (letters, numbers, punctuation and brackets) are universally the same. However, character codes 128 through 255, which from a Spanish point of view include all the accented characters, "u with diaeresis" (e.g. "pingüino"), Ñ, and the opening ¿ and ¡, depend on the specific environment.
Why there is such an inconvenient disagreement in character codes is a historical accident, interesting on its own but outside the scope of this question. To keep it simple: in the Windows OS, consoles (typically) use the list of characters described in OEM Code Page 437, while Windows applications like your C++ editor (typically) use the Windows-1252 Code Page.
There is no portable (universal) solution for this problem, because the issue of differing charsets is a platform-specific problem. Windows is unfortunately somewhat unique in that the editor and (console) outputs use different sets.
The first and simplest solution - which is fine for toy programs - is to just look up the character code that you want from the OEM 437 code page and use that. For ¿, that's #168 (0xa8 in hex, or \250 in octal). You can embed the character code in the string directly, in either of these forms:
std::cout << ""\x0a8""Cu""\x0a0""l es el primer n""\x0a3""mero?\n"; // hex
std::cout << "\250Cu\240l es el primer n\243mero?\n"; // octal
Outputs:
¿Cuál es el primer número?
Note how I had to do the same thing with the ú and the á. Unfortunately, writing strings like this gets unwieldy quickly. Using macros or const chars can help a little (see the sketch below), but not much.
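For instance (a small sketch of my own; the constant names are made up):
// Naming the OEM 437 escapes keeps the strings somewhat readable.
const char OEM_INV_QUESTION[] = "\xa8"; // ¿ in OEM 437
const char OEM_A_ACUTE[]      = "\xa0"; // á in OEM 437
const char OEM_U_ACUTE[]      = "\xa3"; // ú in OEM 437
std::cout << OEM_INV_QUESTION << "Cu" << OEM_A_ACUTE << "l es el primer n" << OEM_U_ACUTE << "mero?\n";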
A second alternative is to use a Windows function such as CharToOemA. For example [1]:
#include <windows.h>
...
...
char pregunta[] = "¿Cuál es el primer número\n";
char *pregunta_oem = new char[sizeof(pregunta) / sizeof(char)];
CharToOemA(pregunta, pregunta_oem);
std::cout << pregunta_oem;
delete[] pregunta_oem;
For a more complex program, I would wrap that pattern into a utility function or class, as sketched below.
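A minimal sketch of such a wrapper (my own; the function name is made up):
#include <windows.h>
#include <string>
#include <vector>

// Converts Windows-1252 text to the console's OEM character set.
std::string to_oem(const std::string &text) {
    // +1 for the terminating null that CharToOemA writes.
    std::vector<char> buf(text.size() + 1);
    CharToOemA(text.c_str(), buf.data());
    return std::string(buf.data());
}

Usage would then be: std::cout << to_oem("¿Cuál es el primer número?\n");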
A different approach is to change the code page of the console so that it agrees with your C++ editor and the rest of Windows. You can do that via the CHCP console command or via the SetConsoleOutputCP() function, but that doesn't work with the default "raster font" used by consoles, so you have to change the console font as well. When the font is set to a Unicode font like Lucida Console, this works:
std::cout << "¿Cuál es el primer número?\n"; // ┐Cußl es el...
UINT originalCP = GetConsoleOutputCP();
SetConsoleOutputCP(1252);
std::cout << "¿Cuál es el primer número?\n"; // ¿Cuál es el...
SetConsoleOutputCP(originalCP);
(I don't know if you can change the font from the program itself; I'd have to look that up. The standard way to do it from the console is to click the tiny icon in the corner, click Properties, go to the Font tab, and pick a font from the list.)
[1] I have to warn that this snippet contains a number of subtleties that can easily trip up a beginner. You have to make sure the source of the text is a char array; if you're using a char pointer, sizeof won't work correctly and you have to use strlen(source)+1 instead. For the source I used the natural option of a char array initialized from a literal, but you can't do that for the destination, because the contents of such an array are read-only. If you are using a new'd char array, or one that is not initialized from a literal, you can use the same char array for both source and destination. This example feels very C-like.
You can use the _setmode function to do that:
#include <iostream>
#include <string>
#include <cstdlib>    // for system()

#if defined(WIN32) && !defined(UNIX)
#  include <cstdio>   // for _fileno()
#  include <io.h>     // for _setmode()
#  include <fcntl.h>  // for _O_U16TEXT
#endif // WIN32 && !UNIX

int main()
{
#if defined(WIN32) && !defined(UNIX)
    _setmode(_fileno(stdout), _O_U16TEXT);
  //^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#endif // WIN32 && !UNIX

    std::wstring wstr = L"'¿' and '?'";
    std::wcout << L"WString : " << wstr << std::endl;
    system("pause");
    return 0;
}
To write Unicode characters (assuming UTF-16 LE, the standard Windows variant) with the iostream library, call _setmode() with _O_U16TEXT and then use wcout.
But then you can't use cout anymore; it triggers an assert.
Check this answer.
Assuming you are using a simple call to std::cout, you should be able to print Unicode strings if you set your command line to Unicode mode:
1. Change code page to UTF-8
You can do this by simply calling the command below in your cmd:
chcp 65001
2. Make sure you are using a font which has the characters you want to display
Lucida Console should do the trick, as it supports ¿ (and the other characters included in WGL4).
This character is simply not included in basic ASCII. Try using std::wstring: http://www.cplusplus.com/reference/string/wstring/
As you can see in the extended ASCII (OEM 437) table, the symbol ¿ has code 168. You can use a \ddd octal escape in the output stream to print such special characters.
This is because the command console does not support non-ASCII characters by default (ASCII covers mainly English-language characters and no accented ones). To get support for characters in other character classes, play around with the chcp command. Refer to its documentation here.
In your case I think you need to run chcp 850 in the console before running your program.

C++: output contents of a Unicode file to console in Windows

I've read a bunch of articles and forum posts discussing this problem, and all of the solutions seem way too complicated for such a simple task.
Here's sample code straight from cplusplus.com:
// reading a text file
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main() {
    string line;
    ifstream myfile("example.txt");
    if (myfile.is_open())
    {
        while (myfile.good())
        {
            getline(myfile, line);
            cout << line << endl;
        }
        myfile.close();
    }
    else cout << "Unable to open file";
    return 0;
}
It works fine as long as example.txt has only ASCII characters. Things get messy if I try to add, say, something in Russian.
In GNU/Linux it's as simple as saving the file as UTF-8.
In Windows, that doesn't work. Converting the file into UCS-2 Little Endian (what Windows seems to use by default) and changing all the functions into their wchar_t counterparts doesn't do the trick either.
Isn't there some kind of a "correct" way to get this done without doing all kinds of magic encoding conversions?
The Windows console supports Unicode, sort of. It does not support right-to-left scripts or "complex scripts". To print a UTF-16 file with Visual C++, use the following:
_setmode(_fileno(stdout), _O_U16TEXT);
And use wcout instead of cout.
There is no support for a "UTF-8" code page in this mode, so for UTF-8 you will have to convert with MultiByteToWideChar.
More on console support for Unicode can be found in this blog.
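A minimal sketch of that conversion (my addition; the helper name is made up):
#include <windows.h>
#include <string>

// Converts a UTF-8 buffer (e.g. read from a file) to UTF-16.
std::wstring utf8_to_wide(const std::string &utf8) {
    if (utf8.empty()) return L"";
    // First call computes the required length in wchar_t units.
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                  (int)utf8.size(), NULL, 0);
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                        (int)utf8.size(), &wide[0], len);
    return wide;
}
The result can then go to wcout (after the _setmode call above) or to WriteConsoleW.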
The right way to output to a console on Windows using cout is to first call GetConsoleOutputCP, and then convert the input you have into the console code page. Alternatively, use WriteConsoleW, passing a wchar_t*.
For reading UTF-8 or UTF-16 strings from a file, you can use the extended mode string of _wfopen_s and fgetws. I don't think there is a C++ interface for these extensions yet. The easiest way to print to the console is described in Michael Kaplan's blog:
#include <fcntl.h>
#include <io.h>
#include <stdio.h>

int main(void) {
    _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd\n");
    return 0;
}
Avoid GetConsoleOutputCP, it is only retained for compatibility with the 8-bit API.
While Windows console windows are UCS-2 based, they don't support UTF-8 properly.
You might make things work by temporarily setting the console window's active output code page to UTF-8, using the appropriate API functions. Note that those functions distinguish between the input code page and the output code page. However, cmd.exe really doesn't like UTF-8 as the active code page, so don't set it as a permanent code page.
Otherwise, you can use the Unicode console window functions.
Cheers & hth.,
#include <stdio.h>

int main(int argc, char *argv[])
{
    // do chcp 65001 in the console before running this
    printf("γασσο γεο!\n");
}
Works perfectly if you do chcp 65001 in the console before running your program.
Caveats:
I'm using 64 bit Windows 7 with VC++ Express 2010
The code is in a file encoded as UTF-8 without BOM - I wrote it in a text editor, not using the VC++ IDE, then used VC++ to compile it.
The console has a TrueType font - this is important
Don't know if these things make too much difference...
Can't speak for chars off the BMP, give it a whirl and leave a comment.
Just to be clear, some here have mentioned UTF-8. UTF-8 is a multibyte format, which some documentation mistakenly refers to as Unicode. What Windows calls "Unicode" is actually UTF-16, with two-byte code units; Unicode itself is a character set, not an encoding.
I've used this previously posted solution with Visual Studio 2008. I don't know if it works with later versions of Visual Studio.
#include <iostream>
#include <fcntl.h>
#include <io.h>
#include <tchar.h>

<code omitted>

_setmode(_fileno(stdout), _O_U16TEXT);
std::wcout << _T("This is some text to print\n");
I used macros to switch between std::wcout and std::cout, and also to remove the _setmode call for ASCII builds, thus allowing compilation for either ASCII or UNICODE. This works. I have not yet tested std::endl, but that might work with wcout and Unicode too (not sure), i.e.:
std::wcout << _T("This is some text to print") << std::endl;
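A rough sketch of that macro switching (my own; the macro names are made up):
#include <fcntl.h>
#include <io.h>
#include <iostream>
#include <tchar.h>

#if defined(UNICODE)
#  define TCOUT std::wcout
#  define INIT_CONSOLE() _setmode(_fileno(stdout), _O_U16TEXT)
#else
#  define TCOUT std::cout
#  define INIT_CONSOLE() ((void)0) // no-op for ASCII builds
#endif

// Usage:
// INIT_CONSOLE();
// TCOUT << _T("This is some text to print") << std::endl;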