Unicode to integer conversion visual studio bug - c++

I'm trying to convert a Unicode character to an integer and have hit what looks like a bug in Visual Studio; I'm not sure whether it's a bug or something I'm doing wrong.
The project's character set is set to Unicode, not Multi-Byte.
#include <windows.h>
#include <cstdio>   // printf
#include <iostream>
int main()
{
constexpr int a = L'🦀';
printf("%i\n", a);
std::cout << a << std::endl;
return 0;
}
Problem:
Hovering the mouse over variable 'a' shows that it is 129408 (0x1F980), which is correct, but when it prints out to the console I get 55358.
I created a new project and wrote the same code, and it printed the correct value, but after switching that same project from Unicode to Multi-Byte and back to Unicode it produces this issue. I'm not sure how to fix it.

Wide characters in Visual Studio are only 16 bits (wchar_t holds one UTF-16 code unit), so a single one won't hold a value greater than 65535. The character is encoded in UTF-16 as the surrogate pair d83e dd80, and you're getting the first half of it: 0xD83E = 55358.
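If what you want is the full code point rather than the first UTF-16 code unit, one option is a char32_t literal, which holds the whole value. A minimal sketch (this changes the question's code; it is not the original project):
#include <cstdio>
int main()
{
    // char32_t stores the full Unicode scalar value (UTF-32), so nothing
    // gets truncated to a single 16-bit code unit.
    constexpr char32_t crab = U'\U0001F980';
    constexpr int a = static_cast<int>(crab);
    printf("%i\n", a); // prints 129408
    return 0;
}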

Related

Unicode output not showing

I'm trying to learn Unicode programming in Windows.
I have this simple program:
#include <iostream>
#include <string>
int main()
{
std::wstring greekWord = L"Ελληνικά";
std::wcout << greekWord << std::endl;
return 0;
}
However, it outputs nothing. Any ideas how to make it output Greek?
I tried adding non-Greek letters, and that didn't work quite right either.
The first thing to try is to make the program independent of the encoding of the source file, so use Unicode escapes rather than literal Unicode letters:
std::wstring greekWord = L"\u0395\u03BB\u03BB\u03B7\u03BD\u03B9\u03BA\u03AC";
An incorrect encoding in the source file is only one of many things that could be preventing you from printing Greek. The other obvious issue is the ability of your terminal to print Greek letters: if it can't do that, or needs to be set up correctly before it can, then nothing you do in your program is going to work.
You probably also want to fix the source-encoding issue so that you can use unescaped literals in your code, but that depends on the compiler/IDE you are using.
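As a concrete example (this assumes the Microsoft compiler, and the file name is hypothetical), MSVC's /utf-8 switch sets both the source and execution character sets to UTF-8, so unescaped literals work as long as the file really is saved as UTF-8:
cl /EHsc /utf-8 greek.cpp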
If you are sending your cout output to a normal console, the console usually doesn't support Unicode text such as Greek out of the box. Try setting it up for Unicode text, or find another way to output your data, such as text files or some GUI.
There are two ways to do this.
The old, non-standard Microsoft way is as follows:
#include <fcntl.h>   // _O_U16TEXT, _O_WTEXT
#include <io.h>      // _setmode
#include <stdio.h>   // _fileno, stdout, stdin
int main()
{
_setmode(_fileno(stdout), _O_U16TEXT); // wide (UTF-16) text mode for stdout
_setmode(_fileno(stdin), _O_WTEXT);    // wide text mode for stdin
// your code here
}
You will find this everywhere, but it is not necessarily a good way to solve the problem.
The more standards-compliant way is as follows:
#include <locale>
int main()
{
std::locale l(""); // or std::locale l("en_US.utf-8");
std::locale::global(l); // or std::wcout.imbue(l); std::wcin.imbue(l);
// your code here
}
This should work with other modern compilers and operating systems too.
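Putting the escaped literal from the first answer together with the locale approach gives a complete program. This is only a sketch; whether the Greek actually appears still depends on the console's font and code page, as discussed above:
#include <iostream>
#include <locale>
#include <string>
int main()
{
    std::locale::global(std::locale(""));   // use the user's default locale
    std::wcout.imbue(std::locale());        // make wcout use it as well
    std::wstring greekWord = L"\u0395\u03BB\u03BB\u03B7\u03BD\u03B9\u03BA\u03AC";
    std::wcout << greekWord << std::endl;
    return 0;
}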
Try this; it works for me:
#include <iostream>
#include <io.h>
#include <fcntl.h>
using namespace std;
int main() {
_setmode(_fileno(stdout),_O_U16TEXT);
wcout<<L"Ελληνικά";
setlocale(LC_ALL,"");
return 0;
}

How to output low ASCII using C++ in Windows 10?

I'm trying to output directional arrows for a simple snake game in C++ on Windows 10. However, using this table as reference:
ASCII reference
All I got is this tiny question mark in the console:
Tiny question mark
I would like to output the symbols 16, 17, 30 and 31. I'm not much of a programmer, so it could be some basic mistake, but some symbols do work while others result in that symbol above.
A small example:
void showSnake() {
char snakeHead;
snakeHead = 31;
cout << snakeHead; //THIS SHOWS THE TINY QUESTION MARK
snakeHead = 62;
cout << snakeHead; //THIS SHOWS THE ">" SYMBOL
}
You should use Unicode, you'll have much more choices for characters.
On https://en.wikipedia.org/wiki/List_of_Unicode_characters I found the symbol '►', which looks similar to what you wanted to use.
Its unicode value is U+25BA which means you can create a character with a value of '\u25BA' in C++.
In practice, however, that value goes outside the range of the char type, so you have to use wide characters (wchar_t) to get the job done.
As per this answer you should also toggle support for Unicode character in stdout using the _setmode function (see here) from the C run-time library.
#include <iostream>
#include <io.h>
#include <fcntl.h>
int main() {
_setmode(_fileno(stdout), _O_U16TEXT);
std::wcout << L'\u25BA';
}
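If you want all four directions for the snake, the Unicode characters corresponding to the old code page 437 glyphs 16, 17, 30 and 31 are U+25BA, U+25C4, U+25B2 and U+25BC. A small sketch along the same lines (the constant names are just for illustration):
#include <iostream>
#include <cstdio>    // _fileno, stdout
#include <io.h>      // _setmode
#include <fcntl.h>   // _O_U16TEXT
int main() {
    _setmode(_fileno(stdout), _O_U16TEXT); // UTF-16 mode so std::wcout works

    const wchar_t right = L'\u25BA'; // ► (code page 437 character 16)
    const wchar_t left  = L'\u25C4'; // ◄ (17)
    const wchar_t up    = L'\u25B2'; // ▲ (30)
    const wchar_t down  = L'\u25BC'; // ▼ (31)

    std::wcout << right << left << up << down << L'\n';
}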

how can C++ wcout utf-16 encoded char array?

I was reading the well-known answer about std::string and std::wstring and ran into some confusion.
Source charset and execution charset are both set to UTF-8, on Windows x64 with the VC++ compiler and a git bash console (which can print Unicode characters); the system default code page is 936 (GB2312).
My experiment code:
#include <cstring>
#include <iostream>
using namespace std;
int main(int argc, char* argv[])
{
wchar_t c[] = L"olé";
wchar_t d[] = L"abc";
wcout << c << endl;
wcout << d << endl;
return 0;
}
Can print "abc" but can't print "é".
I understand that wchar_t is used along with L prefix string literal. And under Windows wchar_t is encoded with UTF-16(It's hard coded right? No matter what source charset or execution charset I choose, L"abc" would always have the same UTF-16 code units).
The question is:How can it wcout a UTF-16 encoded string("abc"), while my source file is utf-8 and execution charset is utf-8. The program should not be able to recognize UTF-16 encoded stuff unless I set everything to utf-16.
And if it can print UTF-16 in some way, then why can't it print é?
You need a non-standard Windows system call to enable UTF-16 output.
#include <iostream>
#include <io.h>
#include <fcntl.h>
#include <stdio.h>
int main()
{
_setmode(_fileno(stdout), _O_U16TEXT); // <=== Windows madness
std::wcout << L"olé\n";
}
Note you cannot use cout after doing this, only wcout.
Also note that your source code file must have a BOM, otherwise the compiler will not recognise it as Unicode.
The Windows console does not support UTF-16 output. It only supports 8-bit output, with partial support for 8-bit MBCS encodings such as Big5 or UTF-8.
To display Unicode characters on the console you will need to convert to UTF-8 or another MBCS in your code, and also put the console into UTF-8 mode (for example with SetConsoleOutputCP(CP_UTF8)).
See also this answer
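A sketch of that second approach (one possible implementation, not the only way to do it): convert the UTF-16 string to UTF-8 with the Win32 API and switch the console to the UTF-8 code page before writing to the narrow stream.
#include <windows.h>
#include <iostream>
#include <string>

// Convert a UTF-16 string to UTF-8 using WideCharToMultiByte.
std::string to_utf8(const std::wstring& wide)
{
    if (wide.empty()) return std::string();
    const int size = WideCharToMultiByte(CP_UTF8, 0, wide.data(), (int)wide.size(),
                                         nullptr, 0, nullptr, nullptr);
    std::string utf8(size, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), (int)wide.size(),
                        &utf8[0], size, nullptr, nullptr);
    return utf8;
}

int main()
{
    SetConsoleOutputCP(CP_UTF8);            // the console now expects UTF-8 bytes
    std::cout << to_utf8(L"ol\u00E9") << "\n";
}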

Trying to print chess pieces in Visual Studio 2013 [duplicate]

This question already has answers here: Unicode character Visual C++ (2 answers). Closed 8 years ago.
I am trying to print a "pawn" while using visual studio and it doesn't recognize the unicode.
can anyone tell me how to fix it?
this is a basic example:
#include <iostream>
using namespace std;
int main()
{
cout << "\33[37;42m\u2659\33[0m";
}
and the output I get is:
"<-[37;42m?<-[0m".
The ordinary Windows console windows do not support ANSI escape sequences.
To display general Unicode characters you can
use the Windows console functions instead, or
set up standard data streams to Unicode (Microsoft extension, see _setmode) and use wide streams, or
e.g. display a text file with the desired text, e.g. encoded in UCS-2.
Example 1: Using the Windows console functions.
Code:
#undef UNICODE
#define UNICODE
#include <windows.h>
#include <string>

namespace console {
    using std::wstring;

    const HANDLE output = GetStdHandle( STD_OUTPUT_HANDLE );

    // Write wide text directly through the Windows console API.
    void write( const wstring& text )
    {
        if( text.length() == 0 ) { return; }
        DWORD n_chars_written = 0;
        WriteConsole( output, &text[0], text.length(), &n_chars_written, 0 );
    }
}

auto main() -> int
{
    console::write( L"Oh look, a \u2659!\n" );
}
Running this in the ordinary Windows console will most likely produce a square instead of a pawn symbol, even in Windows 8.x with e.g. Lucida Console font. That’s because the console window implementation simply does not support presentation of such characters. The character is correctly stored, and you can copy it out and e.g. present it in Windows Write (standard accessory program), but that’s a tad impractical, shall we say, for the ordinary end user.
A simple solution is to require the use of a console window replacement, e.g. the Console program.
Then you get nifty tabs, too. ;-)
H:\dev\test\so\0208>g++ using_console_api.cpp
H:\dev\test\so\0208>a
Oh look, a ♙!
H:\dev\test\so\0208>_
Example 2: using wide streams.
Code:
#include <assert.h>
#include <iostream>
#include <fcntl.h>   // _O_WTEXT
#include <io.h>      // _setmode, _isatty
#include <stdio.h>   // _fileno, stdout

namespace console {
    // Switches stdout to wide-character mode when constructed.
    struct Usage
    {
        Usage()
        {
            const int out_fileno = _fileno( stdout );
            assert( _isatty( out_fileno ) );
            _setmode( out_fileno, _O_WTEXT );
        }
    };
}  // console

const console::Usage console_usage;

auto main() -> int
{
    using namespace std;
    wcout << L"Oh look, a \u2659!" << endl;
}
I recall that at one time, probably with the original MinGW g++, one had to define some special preprocessor symbol to get the Microsoft library extensions defined. However, this compiled nicely as-is with MinGW g++ 4.9.1 from the Nuwen distribution. And of course, it also compiles nicely with Visual C++.

Getline Function Messing with Code

Picking up C++ and having a go at it on OS X 10.9, using Xcode 5.0.2 and Alex Allain as a reference.
The following code compiles just fine and outputs correctly
#include <iostream>
#include <string>
using namespace std;
int main()
{
std::string user_first_name = "test";
std::cout << user_first_name << "\n";
return 0;
}
When I add a getline call, the code appears to compile, but there is no output.
#include <iostream>
#include <string>
using namespace std;
int main()
{
std::string user_first_name = "test";
std::getline( std::cin, user_first_name, '\n' );
std::cout << user_first_name << "\n";
return 0;
}
In fact, the debug navigator shows memory filling up with bars (although actual memory use is fixed at 276 KB). Why am I getting stumped on such a simple thing/concept?
I did a bit of digging around, and it's quite likely this is related to a text-encoding issue. I'm using the defaults, which is Unicode (UTF-8). Encoding is not something I'm familiar with; it was never something I had to deal with when learning on Windows. How do I get past this?
I can't comment on Xcode or OS X, but my understanding is that std::cin always gives you a narrow (single-byte) character stream. On Windows (at least with Visual Studio), I think it works whether you compile for UTF-8 (one byte for every ASCII character) or UTF-16 (two bytes for every ASCII character); the runtime library presumably does the conversion for you as necessary.
I'm not sure what "filling up with bars" means, but maybe it's just that you're looking at uninitialized memory. If you think it is an encoding issue, perhaps try using wstring/wcin instead of string/cin and see if that helps.
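For what it's worth, the wide-character version of the snippet suggested above would look something like this (just a sketch of the wstring/wcin idea, not a confirmed fix):
#include <iostream>
#include <string>
int main()
{
    std::wstring user_first_name = L"test";
    std::getline(std::wcin, user_first_name, L'\n'); // read a wide-character line
    std::wcout << user_first_name << L"\n";
    return 0;
}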