Strange Output Encoding in WSL - c++

I have created a program in C++ which uses wstring. The program is something like this:-
#include<iostream>
using namespace std;
const wstring NAME = L"STACKOVERFLOW";
int main()
{
wcout<<NAME;
}
Now, I have tried this program in windows and linux. The program is working on both platform correctly with output as STACKOVERFLOW but while using in wsl(Ubuntu), the output is very strange. It is something like this:-
?????r????????????????
I have two questions now:-
Is this error related to Unicode? because I know that Windows have UTF-16 and Linux has UTF-32 for wchar whereas Wsl uses UTF-8.
If it is related to Unicode, then how is it possible that it can work on Windows and linux but not in wsl since Windows and Linux have diffrent encoding.
P.S.:- If anybody has any idea, please do write your views.
Update:-
Recently, added a new line in the code and the output is like this now(in wsl):-
#include<iostream>
using namespace std;
const wstring NAME = L"STACKOVERFLOW";
int main()
{
cout<<"Hello"<<endl; //outputting hello along with the widestring
wcout<<NAME;
}
and Now, The output is changed to this:-
Hello
SAKVRLW��pH�� �0�`� (�$p

Related

Cannot wcout wide string other than English [duplicate]

I've read a bunch of articles and forums posts discussing this problem all of the solutions seem way too complicated for such a simple task.
Here's a sample code straight from cplusplus.com:
// reading a text file
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main () {
string line;
ifstream myfile ("example.txt");
if (myfile.is_open())
{
while ( myfile.good() )
{
getline (myfile,line);
cout << line << endl;
}
myfile.close();
}
else cout << "Unable to open file";
return 0;
}
It works fine as long as example.txt has only ASCII characters. Things get messy if I try to add, say, something in Russian.
In GNU/Linux it's as simple as saving the file as UTF-8.
In Windows, that doesn't work. Converting the file into UCS-2 Little Endian (what Windows seems to use by default) and changing all the functions into their wchar_t counterparts doesn't do the trick either.
Isn't there some kind of a "correct" way to get this done without doing all kinds of magic encoding conversions?
The Windows console supports unicode, sort of. It does not support left-to-right and "complex scripts". To print a UTF-16 file with Visual C++, use the following:
_setmode(_fileno(stdout), _O_U16TEXT);
And use wcout instead of cout.
There is no support for a "UTF8" code page so for UTF-8 you will have to use MultiBytetoWideChar
More on console support for unicode can be found in this blog
The right way to output to a console on Windows using cout is to first call GetConsoleOutputCP, and then convert the input you have into the console code page. Alternatively, use WriteConsoleW, passing a wchar_t*.
For reading UTF-8 or UTF-16 strings from a file, you can use the extended mode string of _wfopen_s and fgetws. I don't think there is a C++ interface for these extensions yet. The easiest way to print to the console is described in Michael Kaplan's blog:
#include <fcntl.h>
#include <io.h>
#include <stdio.h>
int main(void) {
_setmode(_fileno(stdout), _O_U16TEXT);
wprintf(L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd\n");
return 0;
}
Avoid GetConsoleOutputCP, it is only retained for compatibility with the 8-bit API.
While Windows console windows are UCS-2 based, they don't support UTF-8 properly.
You might make things work by setting the console window's active output code page to UTF-8 temporarily, using the appropriate API functions. Note that those functions distinguish between input code page and output code page. However, [cmd.exe] really doesn't like UTF-8 as active code page, so don't set that as a permanent code page.
Otherwise, you can use the Unicode console window functions.
Cheers & hth.,
#include <stdio.h>
int main (int argc, char *argv[])
{
// do chcp 65001 in the console before running this
printf ("γασσο γεο!\n");
}
Works perfectly if you do chcp 65001 in the console before running your program.
Caveats:
I'm using 64 bit Windows 7 with VC++ Express 2010
The code is in a file encoded as UTF-8 without BOM - I wrote it in a text editor, not using the VC++ IDE, then used VC++ to compile it.
The console has a TrueType font - this is important
Don't know if these things make too much difference...
Can't speak for chars off the BMP, give it a whirl and leave a comment.
Just to be clear, some here have mentioned UTF8. UTF8 is a multibyte format, which in some documentation is mistakenly referred to as Unicode. Unicode is always just two bytes.
I've used this previously posted solution with Visual Studio 2008. I don't know if if works with later versions of Visual Studio.
#include <iostream>
#include <fnctl.h>
#include <io.h>
#include <tchar.h>
<code ommitted>
_setmode(_fileno(stdout), _O_U16TEXT);
std::wcout << _T("This is some text to print\n");
I used macros to switch between std::wcout and std::cout, and also to remove the _setmode call for ASCII builds, thus allowing compiling either for ASCII and UNICODE. This works. I have not yet tested using std::endl, but I that might work wcout and Unicode (not sure), i.e.
std::wcout << _T("This is some text to print") << std::endl;

C++ Getting basic properties of a program

I am writing an algorithm that wants to check if google-chrome or has the same version as the input given by the user. for that I need a way to check what version google-chrome has. I am using a linux machine to program but I want to make it work at home where I use win 8.1
Is there a way to check in C/C++ what the version of a program is?
I thing it is best to get the awnser in a string because then i can just compare with
if(strcmp(version, input)=1)
Thanks for reading.
PS. I started in C++ but I can change, even to java if neccesary
here is the basic version of what i have now:
#include <iostream>
#include <string>
using namespace std;
#define x 256;
int main(){
std::string version;
std::string input;
//get version
if(strcmp(version, input)=1){
//versions are equal
}
//chrome needs to be updaded
return 0;
}
You can launch a terminal process from C with the popen() command. You'll need to include the stdio.h header. Here's a code snippet that might help you:
FILE *pd = popen("google-chrome --version", "r");
char output[50];
fgets(output,50,pd);
pclose(pd);
In the output array you'll get something like "Google Chrome 25.0.1364.97"

Windows Unicode C++ Stream Output Failure

I am currently writing an application which requires me to call GetWindowText on arbitrary windows and store that data to a file for later processing. Long story short, I noticed that my tool was failing on Battlefield 3, and I narrowed the problem down to the following character in its window title:
http://www.fileformat.info/info/unicode/char/2122/index.htm
So I created a little test app which just does the following:
std::wcout << L"\u2122";
Low and behold that breaks output to the console window for the remainder of the program.
Why is the MSVC STL choking on this character (and I assume others) when APIs like MessageBoxW etc display it just fine?
How can I get those characters printed to my file?
Tested on both VC10 and VC11 under Windows 7 x64.
Sorry for the poorly constructed post, I'm tearing my hair out here.
Thanks.
EDIT:
Minimal test case
#include <fstream>
#include <iostream>
int main()
{
{
std::wofstream test_file("test.txt");
test_file << L"\u2122";
}
std::wcout << L"\u2122";
}
Expected result: '™' character printed to console and file.
Observed result: File is created but is empty. No output to console.
I have confirmed that the font I"m using for my console is capable of displaying the character in question, and the file is definitely empty (0 bytes in size).
EDIT:
Further debugging shows that the 'failbit' and 'badbit' are set in the stream(s).
EDIT:
I have also tried using Boost.Locale and I am having the same issue even with the new locale imbued globally and explicitly to all standard streams.
To write into a file, you have to set the locale correctly, for example if you want to write them as UTF-8 characters, you have to add
const std::locale utf8_locale
= std::locale(std::locale(), new std::codecvt_utf8<wchar_t>());
test_file.imbue(utf8_locale);
You have to add these 2 include files
#include <codecvt>
#include <locale>
To write to the console you have to set the console in the correct mode (this is windows specific) by adding
_setmode(_fileno(stdout), _O_U8TEXT);
(in case you want to use UTF-8).
For this you have to add these 2 include files:
#include <fcntl.h>
#include <io.h>
Furthermore you have to make sure that your are using a font that supports Unicode (such as for example Lucida Console). You can change the font in the properties of your console window.
The complete program now looks like this:
#include <fstream>
#include <iostream>
#include <codecvt>
#include <locale>
#include <fcntl.h>
#include <io.h>
int main()
{
const std::locale utf8_locale = std::locale(std::locale(),
new std::codecvt_utf8<wchar_t>());
{
std::wofstream test_file("c:\\temp\\test.txt");
test_file.imbue(utf8_locale);
test_file << L"\u2122";
}
_setmode(_fileno(stdout), _O_U8TEXT);
std::wcout << L"\u2122";
}
Are you always using std::wcout or are you sometimes using std::cout? Mixing these won't work. Of course, the error description "choking" doesn't say at all what problem you are observing. I'd suspect that this is a different problem to the one using files, however.
As there is no real description of the problem it takes somewhat of a crystal ball followed by a shot in the dark to hit the problem... Since you want to get Unicode characters from you file make sure that the file stream you are using uses a std::locale whose std::codecvt<...> facet actually converts to a suitable Unicode encoding.
I just tested GCC (versions 4.4 thru 4.7) and MSVC 10, which all exhibit this problem.
Equally broken is wprintf, which does as little as the C++ stream API.
I also tested the raw Win32 API to see if nothing else was causing the failure, and this works:
#include <windows.h>
int main()
{
HANDLE stdout = GetStdHandle(STD_OUTPUT_HANDLE);
DWORD n;
WriteConsoleW( stdout, L"\u03B2", 1, &n, NULL );
}
Which writes β to the console (if you set cmd's font to something like Lucida Console).
Conclusion: wchar_t output is horribly broken in both large C++ Standard library implementations.
Although the wide character streams take Unicode as input, that's not what they produce as output - the characters go through a conversion. If a character can't be represented in the encoding that it's converting to, the output fails.

International UTF-32 strings output to console in Linux

#include <stdio.h>
#include <iostream>
#include <locale>
int main()
{
const wchar_t *str = L"\u041F\u043E\u0440\u044F\u0434\u043E\u043A";
std::locale::global(std::locale(""));
std::wcout << str << std::endl;
}
Here's a piece of code that outputs a russian phrase in UTF-32 wchar_t string as:
The correct one: Порядок when run from UTF-8 gnome terminal in Ubuntu 11.10
РџРѕСЂСЏРґРѕРє in Eclipse in the test run as above
45=B8D8:0B># in Eclipse in a real program (where I don't even know who does what and where, but I suppose someone does mess with locales)
??????? if I don't call locale
str is shown as Details:0x400960 L"\320\237\320\276\321\200\321\217\320\264\320\276\320\272" in Eclipse Watch window
is shown as ASCII only byte chars in Eclipse memory window (and there's no way to specify that this is UTF-32 string)
I believe this is a misconfiguration in either eclipse console or the program, because, for example, other people that just run my code in Eclipse they do see the correct output.
Could someone shed a light on this confusion? What is the correct way to setup all the pieces (OS, gcc, terminal, Eclipse, sources...) to output international symbols that are stored in UTF-32 wchar_t strings?
And as a side note, why should I still care about all this when we have UTF-32 and that should be enough to know what is inside...
It turned out to be that other code changed locale.

C++: output contents of a Unicode file to console in Windows

I've read a bunch of articles and forums posts discussing this problem all of the solutions seem way too complicated for such a simple task.
Here's a sample code straight from cplusplus.com:
// reading a text file
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main () {
string line;
ifstream myfile ("example.txt");
if (myfile.is_open())
{
while ( myfile.good() )
{
getline (myfile,line);
cout << line << endl;
}
myfile.close();
}
else cout << "Unable to open file";
return 0;
}
It works fine as long as example.txt has only ASCII characters. Things get messy if I try to add, say, something in Russian.
In GNU/Linux it's as simple as saving the file as UTF-8.
In Windows, that doesn't work. Converting the file into UCS-2 Little Endian (what Windows seems to use by default) and changing all the functions into their wchar_t counterparts doesn't do the trick either.
Isn't there some kind of a "correct" way to get this done without doing all kinds of magic encoding conversions?
The Windows console supports unicode, sort of. It does not support left-to-right and "complex scripts". To print a UTF-16 file with Visual C++, use the following:
_setmode(_fileno(stdout), _O_U16TEXT);
And use wcout instead of cout.
There is no support for a "UTF8" code page so for UTF-8 you will have to use MultiBytetoWideChar
More on console support for unicode can be found in this blog
The right way to output to a console on Windows using cout is to first call GetConsoleOutputCP, and then convert the input you have into the console code page. Alternatively, use WriteConsoleW, passing a wchar_t*.
For reading UTF-8 or UTF-16 strings from a file, you can use the extended mode string of _wfopen_s and fgetws. I don't think there is a C++ interface for these extensions yet. The easiest way to print to the console is described in Michael Kaplan's blog:
#include <fcntl.h>
#include <io.h>
#include <stdio.h>
int main(void) {
_setmode(_fileno(stdout), _O_U16TEXT);
wprintf(L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd\n");
return 0;
}
Avoid GetConsoleOutputCP, it is only retained for compatibility with the 8-bit API.
While Windows console windows are UCS-2 based, they don't support UTF-8 properly.
You might make things work by setting the console window's active output code page to UTF-8 temporarily, using the appropriate API functions. Note that those functions distinguish between input code page and output code page. However, [cmd.exe] really doesn't like UTF-8 as active code page, so don't set that as a permanent code page.
Otherwise, you can use the Unicode console window functions.
Cheers & hth.,
#include <stdio.h>
int main (int argc, char *argv[])
{
// do chcp 65001 in the console before running this
printf ("γασσο γεο!\n");
}
Works perfectly if you do chcp 65001 in the console before running your program.
Caveats:
I'm using 64 bit Windows 7 with VC++ Express 2010
The code is in a file encoded as UTF-8 without BOM - I wrote it in a text editor, not using the VC++ IDE, then used VC++ to compile it.
The console has a TrueType font - this is important
Don't know if these things make too much difference...
Can't speak for chars off the BMP, give it a whirl and leave a comment.
Just to be clear, some here have mentioned UTF8. UTF8 is a multibyte format, which in some documentation is mistakenly referred to as Unicode. Unicode is always just two bytes.
I've used this previously posted solution with Visual Studio 2008. I don't know if if works with later versions of Visual Studio.
#include <iostream>
#include <fnctl.h>
#include <io.h>
#include <tchar.h>
<code ommitted>
_setmode(_fileno(stdout), _O_U16TEXT);
std::wcout << _T("This is some text to print\n");
I used macros to switch between std::wcout and std::cout, and also to remove the _setmode call for ASCII builds, thus allowing compiling either for ASCII and UNICODE. This works. I have not yet tested using std::endl, but I that might work wcout and Unicode (not sure), i.e.
std::wcout << _T("This is some text to print") << std::endl;