How to get output in spoken languages other than English? - c++

I want to make a calculator in a spoken language other than English, e.g. Hindi. I tried to make one using cout, but it gave me unexpected errors. Please tell me what I should do.
Code Sample
cout<<name<<", "<<"कृपया ऑपरेटर दर्ज करें--> ";
cin>>op;
switch (op)
{
case '+':
cout<<name<<", "<<"कृपया पहला नंबर दर्ज करें--> ";
cin>>x;
cout<<name<<", "<<"कृपया दूसरा नंबर दर्ज करें--> ";
cin>>y;
Result:
कृपया ऑपरेटर दर्ज करें--> +
कृपया पहला नंबर दर्ज करें-->
कृपया दूसरा नंबर दर्ज करें-->

I am not familiar with the Hindi fonts used in the console, but you need to do the following:
Set the correct code page using std::locale, or use Unicode (for example std::wstring and std::wcout).
Set your console to a font that can display those characters.
I would suggest finding the character set for Hindi, just as English has ASCII (the ISO 646 series).
You can find more info about wstring and wcout here, posted by a fellow user. Remember to save the source file as UTF-8 (or in the correct locale's encoding); otherwise the characters will simply 'disappear' from the code.

To display output in Hindi or any other non-English language, we first need to set the console output encoding to UTF-8:
#include <string>
#include <iostream>
#include <Windows.h>
int main() {
std::string test = u8"Greek: αβγδ; German: Übergrößenträger";
SetConsoleOutputCP(65001);
std::cout << test;
}

You can use the setlocale function in C++. It is declared in the locale.h header; the locale also controls other C-specific behaviour such as currency formatting, language and so on.
Example code:
#include <locale.h>
int main(){
setlocale(LC_ALL,"en_US.utf8");
// Your code
return 0;
}
Explanation: the setlocale function sets the locale information used by the current program only; it does not change the system-wide locale. It accepts two parameters: 1. category (LC_ALL); 2. locale ("en_US.utf8"). Other locale names can be used.
Reference: setlocale in c++ (reference)
This function helped me a lot in my projects.

Related

How to get special character input and store it in a char or string variable? Special characters means © or ž

// Microsoft visual studio 2010
#include <iostream>
#include <string>
using namespace std;
int main()
{
char password;
string strPassword;
cin >> password; // only for single character
cin >> strPassword; // for string storage
/* 1. but when I give input © it will store as single character c, not as ©.
I want to store it as © in my variable 'password' or 'strPassword' so that I can compare it with my pre-stored password
2. I also want to know about stdint.h
what is it ? why we need it and how it helps us (especially for streaming).
*/
return 0;
}
Regarding the first question about text encodings, you need the program to assume the same text encoding as it receives via input. One way to do that in Windows is to check what your ANSI codepage is, e.g. via the command wmic os get codeset. For example, a US or Norwegian system will report 1252, the Windows ANSI Western codepage.
Then configure your console window to that, e.g. include
system( "chcp 1252" );
at the start of your program.
More generally you will need to use Unicode to deal with a larger character set. See my Q+A “How can I make Unicode iostream i/o work in both Windows and Unix-land?”. Hopefully it's not been invalidated by compiler evolution.
Regarding the second question about stdint.h, I suggest that you just google it. It's surely a duplicate on SO, and SO is not a site for tutorials.

How can I use std::imbue to set the locale for std::wcout?

I am trying to use the std::locale mechanism in C++11 to count words in different languages. Specifically, I have std::wstringstream which contains the title of a famous Russian novel ("Crime and Punishment" in English). What I want to do is to use the appropriate locale (ru_RU.utf8 on my Linux machine) to read the stringstream, count the words and print the results. I should also probably note that my system is set to use the en_US.utf8 locale.
The desired result is this:
0: "Преступление"
1: "и"
2: "наказание"
I counted 3 words.
and the last word was "наказание"
That all works when I set the global locale, but not when I attempt to imbue the wcout stream. When I try that, I get this result instead:
0: "????????????"
1: "?"
2: "?????????"
I counted 3 words.
and the last word was "?????????"
Also, when I attempt to use a solution suggested in the comments (which can be activated by changing #define USE_CODECVT 0 to #define USE_CODECVT 1), I get the error mentioned in this other question.
Those interested in experimenting with the code, or with compiler settings or both may wish to use this live code.
My questions
Why does that not work? Is it because wcout is already open?
Is there a way to use imbue rather than setting the global locale to do what I want?
If it makes a difference, I'm using g++ 4.8.3. The full code is shown below.
getwords.cpp
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <locale>
#define USE_CODECVT 0
#define USE_IMBUE 1
#if USE_CODECVT
#include <codecvt>
#endif
using namespace std;
int main()
{
#if USE_CODECVT
locale ru("ru_RU.utf8",
new codecvt_utf8<wchar_t, 0x10ffff, consume_header>{});
#else
locale ru("ru_RU.utf8");
#endif
#if USE_IMBUE
wcout.imbue(ru);
#else
locale::global(ru);
#endif
wstringstream in{L"Преступление и наказание"};
in.imbue(ru);
wstring word;
unsigned wordcount = 0;
while (in >> word) {
wcout << wordcount << ": \"" << word << "\"\n";
++wordcount;
}
wcout << "\nI counted " << wordcount << " words.\n"
<< "and the last word was \"" << word << "\"\n";
}
First I did some more tests using your code, and I can confirm that L"Преступление и наказание" is a correct UTF-16 string. I checked the codes of the individual characters, and they are correctly 0x41f, 0x440, 0x435, 0x441, 0x442, 0x443, 0x43f, 0x43b, 0x435, 0x43d, 0x438, 0x435, 0x20, 0x438, 0x20, 0x43d, 0x430, 0x43a, 0x430, 0x437, 0x430, 0x43d, 0x438, 0x435
I could not find any reference about it, but it looks like simply calling imbue is not enough. imbue is a method of basic_ios, which is a base class of the stream types behind cout and wcout. It does act on numeric conversions, but in all my tests it has no effect on the charset used for output.
By default, the locale used in a C++ (or C) program is ... the C locale which knows nothing about unicode. All printable ASCII characters (below 128) are outputted as is, and others are replaced with a ?. It is exactly what your program does.
To make it work correctly, you have to select a locale that knows about unicode characters with setlocale. Once this is done, you can change the numeric conversion by calling imbue, and as you selected a unicode charset all will be fine.
So provided your current locale uses an UTF-8 charset, you only have to add
setlocale(LC_ALL, "");
as first line in your program, and the output will be as expected :
0: "Преступление"
1: "и"
2: "наказание"
I counted 3 words.
and the last word was "наказание"
If your current locale does not use UTF-8, choose one that is installed on your system and that supports it. I used setlocale(LC_ALL, "fr_FR.UTF-8");, or even setlocale(LC_ALL, "en_US.UTF-8"); and both worked.
Edit :
In fact, the best way to correctly output unicode to screen is to use setlocale(LC_ALL, "");. It automatically adapts to the current charset. I tested with a stripped down variant using Latin1 charset (my system speaks natively french and not russian ...)
#include <iostream>
#include <locale>
using namespace std;
int main() {
setlocale(LC_ALL, "");
wchar_t ws[] = { 0xe8, 0xe9, 0 };
wcout << ws << endl;
}
I tried it under Linux using the UTF-8 and ISO-8859-1 (latin1) charsets (resp. export LANG=fr_FR.UTF-8 and export LANG=fr_FR.ISO-8859-1) and I correctly got èé in the proper charset. I also tried it under Windows XP, with codepage 850 (OEM) and 1252 (ANSI) (resp. chcp 850 and chcp 1252 with the Lucida Console font), and got èé on the console too.
Edit 2 :
Of course, you can also set a global C++ locale with locale::global(locale("")); for the default locale or locale::global(locale("ru_RU.UTF-8")); for a Russian locale, but that does more than simply calling setlocale. According to the documentation of the GNU implementation of the C++ Standard Library about locale: "there is only one relation (of the C++ locale mechanism) to the C locale mechanism: the global C locale is modified if a named C++ locale object is set as the global locale", that is: std::locale::global(std::locale("")); affects the C functions as if the following call was made: std::setlocale(LC_ALL, "");. On the other hand, there is no vice versa, that is, calling setlocale has no effect whatsoever on the C++ locale mechanism, in particular on the working of locale("").
So it really looks like there is an underlying C library mechanism that must first be enabled with setlocale to allow the imbue conversion to work correctly.
In this answer, I'm taking the questions in reverse order, and adding another (with answer) that came up along the way.
Is there a way to use imbue rather than setting the global locale to do what I want?
Yes. By default, std::wcout is synchronized to the underlying stdout C stream. So std::wcout can use imbue if that synchronization is turned off, allowing the C++ stream to operate independently. So to modify the original code to use imbue and work as intended only a single line need be added, calling std::ios_base::sync_with_stdio:
std::ios_base::sync_with_stdio(false);
std::wcout.imbue(ru);
Why didn't the original version work?
The standard (I'm referring to INCITS/ISO/IEC 14882-2011[2012]) says very little about the tie to the underlying stdio stream, but in 27.4.3 it says
The object wcout controls output to a stream buffer associated with the object stdout, declared in <cstdio>
Further, without explicitly setting a global locale, the locale is the "C" locale which is US English ASCII, so this appears to imply that stdout will, by default, have an ASCII mapping. Since no Cyrillic characters are represented in ASCII, the underlying stdout is what converts the proper Russian into a series of ? characters.
Why must the sync_with_stdio call precede imbue?
According to 27.5.3.4 of the standard:
If any input or output operation has occurred using the standard streams prior to the call,
the effect is implementation-defined. Otherwise, called with a false argument, it allows the standard streams to operate independently of the standard C streams.
I don't know what languages you're planning on supporting, but there are languages where your algorithm doesn't apply, eg. Japanese. I suggest checking out the word iterators in International Components for Unicode. http://userguide.icu-project.org/boundaryanalysis

printing Unicode characters C++

I'm trying to write a simple command line app to teach myself Japanese, but can't seem to get Unicode characters to print. What am I missing?
#include <iostream>
using namespace std;
int main()
{
wcout << L"こんにちは世界\n";
wcout << L"Hello World\n";
system("pause");
}
In this example only "Press any key to continue" is displayed. Tested on Visual C++ 2013.
This is not easy on Windows. Even when you manage to get the text to the Windows console you still need to configure cmd.exe to be able to display Japanese characters.
#include <iostream>
int main() {
std::cout << "こんにちは世界\n";
}
This works fine on any system where:
The compiler's source and execution encodings include the characters.
The output device (e.g., the console) expects text in the same encoding as the compiler's execution encoding.
A font with the appropriate characters is available (usually not a problem).
Most platforms these days use UTF-8 by default for all these encodings and so can support the entire Unicode range with code similar to the above. Unfortunately Windows is not one of these platforms.
wcout << L"こんにちは世界\n";
In this line the string literal data is (at compile time) converted from the source encoding to the execution wide encoding and then (at run time) wcout uses the locale it is imbued with to convert the wchar_t data to char data for output. Where things go wrong is that the default locale is only required to support characters from the basic source character set, which doesn't even include all ASCII characters, let alone non-ASCII characters.
So the conversion results in an error, putting wcout into a bad state. The error has to be cleared before wcout will function again, which is why the second print statement does not output anything.
You can work around this for a limited range of characters by imbuing wcout with a locale that will successfully convert the characters. Unfortunately the encoding that is needed to support the entire Unicode range this way is UTF-8; although Microsoft's implementation of streams supports other multibyte encodings, it very specifically does not support UTF-8.
For example:
wcout.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8_utf16<wchar_t>()));
SetConsoleOutputCP(CP_UTF8);
wcout << L"こんにちは世界\n";
Here wcout will correctly convert the string to UTF-8, and if the output were written to a file instead of the console then the file would contain the correct UTF-8 data. However the Windows console, even though configured here to accept UTF-8 data, simply will not accept UTF-8 data written in this way.
There are a few options:
Avoid the standard library entirely:
DWORD n;
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), L"こんにちは世界\n", 8, &n, nullptr);
Use non-standard magical incantation that will break standard code:
#include <fcntl.h>
#include <io.h>
_setmode(_fileno(stdout), _O_U8TEXT);
std::wcout << L"こんにちは世界\n";
After setting this mode std::cout << "Hello, World"; will crash.
Use a low level IO API along with manual conversion:
#include <codecvt>
#include <locale>
SetConsoleOutputCP(CP_UTF8);
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
std::puts(convert.to_bytes(L"こんにちは世界\n").c_str()); // puts needs a const char*, so call c_str()
Using any of these methods, cmd.exe will display the correct text to the best of its ability, by which I mean it will display unreadable boxes. Seven little boxes, for the given string.
You can copy the text out of cmd.exe and into notepad.exe or whatever to see the correct glyphs.
There's a whole article about dealing with Unicode in Windows console
http://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/
http://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/
Basically, you may implement your own streambuf for std::cout (or std::wcout) in terms of WriteConsoleW and enjoy writing UTF-8 (or whatever Unicode you want) to the Windows console without depending on locales or console code pages, and even without using wide characters.
It may not look very straightforward, but it's a convenient and reusable solution, which can also give you portable, utf8-everywhere style user code. Please, don't beat me for my English :)
Or you can change Windows locale to Japanese.

C++ spanish question mark

I am beginning to develop in C++ and am writing a simple console calculator. When my program asks the user whether they want to exit, the character '¿' doesn't appear (questions in Spanish are written between '¿' and '?').
Can someone help me?
PS: The problem only happens on Windows, not on Linux.
EDIT: Here is the code that produces the output:
cout << '¿' <<"Desea salir (S/N)? " ;
There are a few ways to deal with this problem.
The fundamental problem is not that the ¿ doesn't exist in the console, but that the console and your C++ text editor disagree on what that character is. The two are using different character codes for many characters beyond those needed for English. Character codes 32-126 (letters, numbers, punctuation and brackets), are universally the same. However, character codes 128 through 255, which from a Spanish point of view includes all the accented characters, "u with diaeresis" (e.g. "pingüino"), Ñ, and the starting ¿ and ¡, depend on the specific environment.
Why have such an inconvenient disagreement in character codes is a historical accident, interesting on its own but out of the scope of this question. To keep it simple: in the Windows OS, "consoles" (typically) use the list of characters described in OEM Code Page 437, while Windows applications like your C++ editor (typically) use the Windows-1252 Code Page.
There is no portable (universal) solution for this problem, because the issue of differing charsets is a platform-specific problem. Windows is unfortunately somewhat unique in that the editor and (console) outputs use different sets.
The first and simplest solution - which is fine for toy programs - is to just look up the character code that you want from the OEM 437 code-page, and use that. For ¿, that's #168 (0xa8 in hex, or \250 in octal). You can just embed the character code in the string to make clear what you're trying to do, either of these:
std::cout << "\x0a8""Cu""\x0a0""l es el primer n""\x0a3""mero?\n"; // hex
std::cout << "\250Cu\240l es el primer n\243mero?\n"; // octal
Outputs:
¿Cuál es el primer número?
Note how I had to do the same thing with the ú and the á. Unfortunately, writing strings like this gets unwieldy quickly. Using macros or const chars can help, but not much.
A second alternative is to use a Windows function such as CharToOemA. For example (see footnote 1 below):
#include <windows.h>
...
...
char pregunta[] = "¿Cuál es el primer número?\n";
char *pregunta_oem = new char[sizeof(pregunta)/sizeof(char)];
CharToOemA(pregunta, pregunta_oem);
std::cout << pregunta_oem;
delete []pregunta_oem;
For a more complex program, I would wrap that pattern into a utility function or class.
A different approach is to change the Code Page of the console, so that it agrees with your C++ editor and the rest of Windows. You can do that via the CHCP console command, or via the SetConsoleOutputCP() function, but that doesn't work on the default "raster font" used by consoles, so you have to change the font as well. When the font is set to a unicode font like Lucida Console, this works:
std::cout << "¿Cuál es el primer número?\n"; // ┐Cußl es el...
UINT originalCP = GetConsoleOutputCP();
SetConsoleOutputCP(1252);
std::cout << "¿Cuál es el primer número?\n"; // ¿Cuál es el...
SetConsoleOutputCP(originalCP);
(I don't know if you can change the font from the program itself; I have to look that up. The standard way to do it from the console is to click on the tiny icon on the corner, click Properties, Font tab, and pick a font from the list).
1 I have to warn that this snippet contains a number of subtleties that can easily trip up a beginner. You have to make sure the source of the text is a char array; if you're using a char pointer, sizeof won't work correctly and you have to use strlen(source)+1. For the source I used the natural option of a char array initialized from a literal. The destination must be a writable buffer, so it cannot be a pointer to a string literal, whose contents are read-only. With CharToOemA, a writable char array can serve as both source and destination. This example feels very C-like.
You can use _setmode function to do that :
#include <iostream>
#include <string>
#if defined(WIN32) && !defined(UNIX)
# include <io.h> // for _setmode()
# include <fcntl.h> // for _O_U16TEXT
#endif // WIN32 && !UNIX
int main()
{
#if defined(WIN32) && !defined(UNIX)
_setmode(_fileno(stdout), _O_U16TEXT);
//^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#endif // WIN32 && !UNIX
std::wstring wstr = L"'¿' and '?'";
std::wcout << L"WString : " << wstr << std::endl;
system("pause");
return 0;
}
To write UNICODE chars (assuming LE is the standard Windows variant of UTF-16...) out with the iostream library, call _setmode() with _O_U16TEXT and then use wcout.
But you can't use cout anymore. It throws an assert.
Check this answer.
Assuming you are using simple call to std::cout, you should be able to print Unicode strings, if you set your command line to Unicode mode:
1. Change code page to UTF-8
You can do this by simply calling the command below in your cmd:
chcp 65001
2. Make sure you are using a font which has the characters you want to display
Lucida Console should do the trick, as it supports ¿ (and other characters included in WGL4).
This character is simply not included in basic ASCII. Try using wstring http://www.cplusplus.com/reference/string/wstring/
As you can see in the extended ASCII table (the OEM code page), the symbol ¿ has the code 168. You can use an octal escape of the form \ddd in the output stream to print a special character, for example \250, which is 168 in octal.
This is because the command console does not display characters outside its default code page correctly (ASCII itself covers only unaccented English-language characters). To get support for other character sets, play around with the chcp command. Refer to its documentation here.
In your case I think you need to run chcp 850 in the console before running your program.

How to write Cyrillic text in C++ console

For example, if I write:
cout << "Привет!" << endl; //it's hello in Russian
In the console it would be something like ╧ЁштхЄ!.
OK, I know that we can use:
setlocale(LC_ALL, "Russian");
But after that, command line arguments in Russian do not work (if I start my program through a BAT file):
StartProgram.bat
chcp 1251
MyProgram.exe -user=Олег -password=Пароль
So, after setlocale the program can't read Russian arguments properly.
This happens because the BAT file is in CP1251, but the console is in CP866.
So, there is a question:
How can I write Russian text in the C++ console and at the same time have Russian command line arguments read properly?
See this entry from Michael Kaplan's blog:
http://www.siao2.com/2008/03/18/8306597.aspx
Have you tried using wcout? It is similar to cout, but it accepts "wide" characters, which should permit the proper unicode encodings.
This article about localization, and another, both from MSDN may be of use.
Console set to be in 1251 instead of in 866:
//Save As Windows 1251
#include<stdio.h>
#include<windows.h>
int main(int argc, char **argv){
SetConsoleOutputCP(1251);
SetConsoleCP(1251);
if(argc<3)return 0; // both argv[1] and argv[2] are used below
else printf("Hello %s %s\n",argv[1],argv[2]);
}
Program is argument.exe and result:
D:\Debug>argument Олег Пароль
Hello Олег Пароль
You can try using the following functions setlocale() and SetConsoleOutputCP()
setlocale(LC_ALL, "Russian");
SetConsoleOutputCP(866);
For me this seems to resolve the problem:
#include <fcntl.h>
#include <io.h>
#include <iostream>
using namespace std;
int main(void) {
_setmode(_fileno(stdout), _O_U16TEXT);
wcout << L"Огњен" << endl;
return 0;
}
Have you set the language for non-unicode programs to be Russian, in the Regional and Language Options section of the Control Panel?
(I have no idea what the usual setup for Russian-speaking programmers might be; I just wonder whether it is common to set this to some kind of English to avoid confusing overly-parochial tools.)
Unless my memory is playing tricks, when I was working with some code from Japanese developers it was this step that got the console displaying non-Unicode Japanese text (Shift-JIS encoding) properly.
WriteConsoleW can handle Unicode, e.g. Cyrillic letters, without problems. If you don't want to give up the formatting features of wcout, you can redirect the standard wcout stream buffer and print its contents with WriteConsoleW.
A full example is shown here
// save and redirect cout buffer
wostringstream newCoutBuffer;
wstreambuf* oldCoutBuffer = wcout.rdbuf(newCoutBuffer.rdbuf());
// do your wcout stuff here
// do your wcout stuff here
DWORD dwWritten;
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), newCoutBuffer.str().c_str(),newCoutBuffer.tellp(),&dwWritten,NULL);
// restore cout buffer
wcout.rdbuf(oldCoutBuffer);
The most correct way is using wcout + std::imbue.
But one should know that there were some changes in the setlocale API in Windows Vista/7. The "Russian" locale string isn't recognized as "cp866" anymore, at least in the Visual C++ CRT.
To get cp866 output, try using this instead:
::setlocale( LC_ALL , "russian_russia.866" );