I was trying to create a simple program to help my students practise German irregular verbs, but I am having problems with special characters and the if statement. Basically it does not recognise ä, ö, ü and ß, so the output is always the else branch ("Nicht Gut"). How could I fix it?
#include <iostream>
#include <conio.h>
#include <locale.h>
#include <string> // needed for std::string and getline
using namespace std;
int main () {
setlocale(LC_CTYPE, "German");
string Antwort1;
string Antwort2;
string Antwort3;
cout << "Präsens von BEHALTEN (du)" << endl;
cin >> Antwort1;
if (Antwort1 == "behältst") {
cout << "Gut!" << endl;
}
else {
cout << "Nicht Gut" << endl;
}
cout << "Präsens von BEHALTEN (er/sie/es/man) " << endl;
cin >> Antwort1;
if (Antwort1 == "behält") {
cout << "Gut!" << endl;
}
else {
cout << "Nicht Gut" << endl;
}
return 0;
}
I tried with
if (Antwort1 == (LC_CTYPE, "German"),"behält")
but that has the opposite effect: every single string I write is then accepted as valid ("Gut").
My answer applies to the Windows 10 console using the classic default Command Prompt (I haven't tried it with other shells like PowerShell, nor have I tried these experiments on Linux yet).
It seems to me that, as of today (23 February 2022), Windows 10's Command Prompt and the Microsoft C/C++ Runtime of VS2019 don't support Unicode UTF-8 well: See, for example, this blog post showing a CRT crash you get when trying to call:
_setmode(_fileno(stdout), _O_U8TEXT);
and printing UTF-8 text using std::cout.
In my experience, you can make Unicode work in Windows Command Prompt using Unicode UTF-16. You can still use UTF-8 in your C++ application, but you have to convert between UTF-8 and UTF-16 at the Windows boundaries.
I modified your code to use Unicode UTF-16, and the code seems to work correctly when compiled with Visual Studio 2019, and executed inside the Windows Command Prompt:
// Used for _setmode calls
#include <fcntl.h>
#include <io.h>
#include <stdio.h>
// Console I/O with Unicode UTF-16 wcin, wcout and wstring
#include <iostream>
#include <string>
using std::wcin;
using std::wcout;
using std::wstring;
int main() {
// Enable Unicode UTF-16 console input/output
_setmode(_fileno(stdout), _O_U16TEXT);
_setmode(_fileno(stdin), _O_U16TEXT);
wcout << L"Präsens von BEHALTEN (du) \n";
wstring Antwort1;
wcin >> Antwort1;
if (Antwort1 == L"behältst") {
wcout << L"Gut! \n";
} else {
wcout << L"Nicht Gut \n";
}
}
Note the use of L"..." to represent UTF-16 string literals, and the use of wchar_t-based std::wcout, std::wcin, and std::wstring instead of the char-based std::cout, std::cin and std::string.
I work with Russian a lot and I've been trying to read data from a file with an input stream. Here's the code; it's supposed to output only the words that contain no more than 5 characters.
#include <iostream>
#include <fstream>
#include <string>
#include <Windows.h>
using namespace std;
int main()
{
setlocale(LC_ALL, "ru_ru.utf8");
ifstream input{ "in_text.txt" };
if (!input) {
cerr << "Ошибка при открытии файла" << endl;
return 1;
}
cout << "Вывод содержимого файла: " << "\n\n";
string line{};
while (input >> line) {
if (line.size() <= 5)
cout << line << endl;
}
cout << endl;
input.close();
return 0;
}
Here's the problem:
I noticed the output didn't pick up all of the words that actually contained no more than 5 characters. So I did a simple test with the word "Test" in English and its Russian translation "тест", which has the same number of characters. My text file looks like this:
Test тест
I used the debugger to see how the program would run, and it printed the English word but skipped the Russian one. I can't understand why this is happening.
P.S. When I changed the code to if (line.size() <= 8) it printed both of them. Very odd.
I think I messed up my system locale somehow, I don't know. I did once try to use std::locale without really understanding it; maybe that did something to my PC, I'm not really sure. Please help.
I'm very unsure about this, but using codecvt_utf8 and wstring_convert seems to work (note that both are deprecated since C++17):
#include <codecvt> // codecvt_utf8
#include <string>
#include <iostream>
#include <locale> // std::wstring_convert
int main() {
// ...
while (input >> line) {
// convert the utf8 encoded `line` to utf32 encoding:
std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> u8_to_u32;
std::u32string u32s = u8_to_u32.from_bytes(line);
if (u32s.size() <= 5) // check the utf32 length
std::cout << line << '\n'; // but print the utf8 encoded string
}
// ...
}
Demo
I'm trying to open a file whose path contains non-ASCII characters. The user drags the file into the cmd window and I read the path with getline.
When I try to open the file using the path provided by the user, it doesn't work.
#include <iostream>
#include <string>
#include <fstream>
#include <cstdlib>
using namespace std;
int main() {
string userInput; //Saves the user input
string autoInput = "C:\\Espolón\\file.txt"; //Specifying the path like this works
ifstream file1; //For opening the file with the userInput
ifstream file2; //For opening the file with autoInput
getline(cin, userInput);
system("CLS"); //Clears the CMD
file1.open(userInput); //This throws an error. Note that I didn't use is_open here to keep the code cleaner, but I'm actually using it in my tests
file2.open(autoInput); //This works perfectly
cout << "User input: " + userInput << endl<<"Auto input: " + autoInput << endl; //Both show correctly in the CMD
system("pause");
}
Although the cout shows everything correctly, while debugging I found out that the non-ASCII character 'ó' in userInput is being replaced by a '¢' ("C:\Espol¢n\file.txt"), while autoInput is stored correctly ("C:\Espolón\file.txt"):
screen capture. Therefore, I can't open a file using the path provided by the user if it has special characters.
I tried using wide characters as I've read in other similar questions:
#include <iostream>
#include <string>
#include <fstream>
#include <cstdlib>
using namespace std;
int main() {
wstring userInput; //Saves the user input
string autoInput = "C:\\Espolón\\file.txt"; //Specifying the path like this works
ifstream file1; //For opening the file with the userInput
ifstream file2; //For opening the file with a fixed string
getline(wcin, userInput);
system("CLS"); //Clears the CMD
file1.open(userInput); //This throws an error. Note that I didn't use is_open here to keep the code cleaner, but I'm actually using it in my tests
file2.open(autoInput); //This works perfectly
wcout << L"User input: " + userInput << endl;
cout<<"Auto input: " + autoInput << endl; //Both show correctly in the CMD
system("pause");
}
But the issue persists.
I also tried changing the encoding to Unicode but nothing changed.
Any help would be welcomed!
(Sorry for my bad English)
Windows APIs expect Unicode in UTF-16LE format, not UTF-8. Wide-string characters and functions are treated as UTF-16 on Windows.
The letter ó is ANSI, not ASCII. Windows also supports ANSI. You can generally use ANSI if your program only runs on your own computer, but even then you can run into problems. If you do use ANSI, try not to mix it with UTF-16 and wide strings (std::wstring).
Windows 8 and 10 do have limited console support for UTF-8, but it is recommended to use UTF-16 if you want to avoid the pitfalls of UTF-8.
Here is a UTF-16 version:
#include <iostream>
#include <string>
#include <fstream>
#include <cstdlib>
#include <io.h>
#include <fcntl.h>
using namespace std;
int main()
{
_setmode(_fileno(stdout), _O_U16TEXT);
_setmode(_fileno(stdin), _O_U16TEXT);
wstring userInput;
wstring autoInput = L"C:\\Espolón\\file.txt";
ifstream file1, file2;
getline(wcin, userInput);
file1.open(userInput);
file2.open(autoInput);
wcout << L"file1: " << (file1.good() ? L"good\n" : L"bad\n");
wcout << L"file2: " << (file2.good() ? L"good\n" : L"bad\n");
wcout << L"User input: " + userInput << endl;
wcout << L"Auto input: " + autoInput << endl;
system("pause");
return 0;
}
Unix-based systems use UTF-8. Note that these systems don't get along easily with UTF-16; there, std::wstring is used for UTF-32. For the most part you can simply use std::string for UTF-8 on Linux, Mac, etc.
A text file may contain text in ASCII, ANSI, UTF-8, or UTF-16 format. It is recommended to use UTF-8 for compatibility.
On Windows, store the text file in UTF-8 format. You can use these functions to convert back and forth between UTF-8 and UTF-16:
// Needs <Windows.h>. Passing the explicit length (instead of -1)
// keeps the implicit null terminator out of the returned string.
std::string get_u8(const std::wstring& u16)
{
    if (u16.empty()) return std::string();
    int size = WideCharToMultiByte(CP_UTF8, 0, u16.c_str(), (int)u16.size(), nullptr, 0, nullptr, nullptr);
    std::string u8(size, 0);
    WideCharToMultiByte(CP_UTF8, 0, u16.c_str(), (int)u16.size(), &u8[0], size, nullptr, nullptr);
    return u8;
}
std::wstring get_u16(const std::string& u8)
{
    if (u8.empty()) return std::wstring();
    int size = MultiByteToWideChar(CP_UTF8, 0, u8.c_str(), (int)u8.size(), nullptr, 0);
    std::wstring u16(size, 0);
    MultiByteToWideChar(CP_UTF8, 0, u8.c_str(), (int)u8.size(), &u16[0], size);
    return u16;
}
On Linux with g++, if I set a UTF-8 global locale, then wcin correctly transcodes UTF-8 to the internal wchar_t encoding.
However, if I use the classic locale and imbue a UTF-8 locale into wcin, this doesn't happen: input either fails altogether, or each individual byte gets converted to a wchar_t independently.
With clang++ and libc++, neither setting the global locale nor imbuing the locale in wcin works.
#include <iostream>
#include <locale>
#include <string>
using namespace std;
int main() {
if(true)
// this works with g++, but not with clang++/libc++
locale::global(locale("C.UTF-8"));
else
// this doesn't work with either implementation
wcin.imbue(locale("C.UTF-8"));
wstring s;
wcin >> s;
cout << s.length() << " " << (s == L"áéú");
return 0;
}
The input stream contains only áéú characters. (They are in UTF-8, not any single-byte encoding).
Live demo: one two (I can't reproduce the other behaviour with online compilers).
Is this standard-conforming? Shouldn't I be able to leave the global locale alone and use imbue instead?
Should either of the described behaviours be classified as an implementation bug?
First of all, you should use wcout together with wcin.
Now you have two possible solutions to that:
1) Deactivate synchronization of the iostream and cstdio streams by using
ios_base::sync_with_stdio(false);
Note that this should be the first statement, otherwise the behavior is implementation-defined.
int main() {
ios_base::sync_with_stdio(false);
wcin.imbue(locale("C.UTF-8"));
wstring s;
wcin >> s;
wcout << s.length() << " " << (s == L"áéú");
return 0;
}
2) Set the global C locale and imbue wcout as well:
int main() {
std::setlocale(LC_ALL, "C.UTF-8");
wcout.imbue(locale("C.UTF-8"));
wstring s;
wcin >> s;
wcout << s.length() << " " << (s == L"áéú");
return 0;
}
Tested both of them on ideone; they work fine. I don't have clang++/libc++ at hand, so I wasn't able to test that behavior, sorry.
The following code:
#include <iostream>
using std::wcin;
using std::wcout;
using std::locale;
int main()
{
locale::global(locale("Portuguese_Brazil"));
wcout << "wcin Test using \"ção\": "; // shows that wcout works properly
wchar_t wcinTest[] = L"";
wcin >> wcinTest;
wcout << wcinTest << " should be \"ção\".";
return 0;
}
Results in:
wcin Test using "ção": ção
╬Æo should be "ção".
The ╬ character is U+2021 or 8225, and the ç is U+00E7 or 231.
I changed the multi-byte character set option, and set and unset UNICODE in the project properties. Nothing worked.
I already set the console font to Consolas, a TrueType font capable of displaying the ç character correctly.
I'd like this to be as simple and reproducible as possible, to use as a standard practice for future Unicode console applications.
Any ideas?
wcinTest is a wchar_t buffer of length 1; you overflow it when you read into it. Use a std::wstring instead.
This finally worked:
#include <iostream>
#include <string>
#include <Windows.h>
using std::cin;
using std::cout;
using std::string;
int main()
{
SetConsoleOutputCP(1252);
SetConsoleCP(1252);
cout << "wcin Test using \"ção\": "; // shows that wcout works properly
string wcinTest;
cin >> wcinTest;
cout << wcinTest << " should be \"ção\".";
return 0;
}
I'm too much of a newbie to understand why I need both SetConsoleOutputCP and SetConsoleCP. I thought maybe SetConsoleCP alone would fix everything, but no, I need both: SetConsoleOutputCP fixed cout, and SetConsoleCP fixed cin.
Thanks anyway @StoryTeller
I'm using Qt/C++ on a Linux system. I need to convert a QLineEdit's text to std::wstring and write it to a std::wofstream. It works correctly for ASCII strings, but when I enter any other characters (Arabic or Uzbek), nothing is written to the file (its size is 0 bytes).
this is my code:
wofstream customersFile;
customersFile.open("./customers.txt");
std::wstring ws = lne_address_customer->text().toStdWString();
customersFile << ws << ws.length() << std::endl;
Output for John Smith entered in the line edit is John Smith10, but for Unicode strings, nothing.
At first I thought the problem was with QString::toStdWString(), but customersFile << ws.length(); writes the correct length for all strings, so I guess I'm doing something wrong when writing the wstring to the file.
EDIT:
I wrote it again in Eclipse and compiled it with g++ 4.5. The result is the same:
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
int main()
{
cout << "" << endl; // prints
wstring ws = L"سلام"; // this is an Arabic "Hello"
wofstream wf("new.txt");
if (!wf.bad())
wf << ws;
else
cerr << "some problem";
return 0;
}
Add
#include <locale>
and at the start of main,
std::locale::global(std::locale(""));