Unable to write a std::wstring into wofstream - c++

I'm using Qt/C++ on a Linux system. I need to convert a QLineEdit's text to std::wstring and write it into a std::wofstream. It works correctly for ASCII strings, but when I enter any other characters (Arabic or Uzbek) nothing gets written to the file (the file size is 0 bytes).
This is my code:
wofstream customersFile;
customersFile.open("./customers.txt");
std::wstring ws = lne_address_customer->text().toStdWString();
customersFile << ws << ws.length() << std::endl;
The output for John Smith entered in the line edit is John Smith10, but for Unicode strings there is nothing.
First I thought it was a problem with QString::toStdWString(), but customersFile << ws.length(); writes the correct length for all strings. So I guess I'm doing something wrong when writing the wstring to the file.
EDIT:
I wrote it again in Eclipse and compiled it with g++ 4.5. The result is the same:
#include <iostream>
#include <string>
#include <fstream>

using namespace std;

int main()
{
    cout << "" << endl; // prints nothing
    wstring ws = L"سلام"; // this is an Arabic "Hello"
    wofstream wf("new.txt");
    if (!wf.bad())
        wf << ws;
    else
        cerr << "some problem";
    return 0;
}

Add
#include <locale>
and at the start of main,
std::locale::global(std::locale(""));

Related

C++, If...Else with strings and special characters

I was trying to create a simple program to help my students train German irregular verbs, but I have had problems with special characters and the if statement. Basically it does not recognise ä, ö, ü, and ß, and the output is therefore the else branch ("Nicht Gut"). How could I fix it?
#include <iostream>
#include <string>
#include <conio.h>
#include <locale.h>

using namespace std;

int main() {
    setlocale(LC_CTYPE, "German");

    string Antwort1;
    string Antwort2;
    string Antwort3;

    cout << "Präsens von BEHALTEN (du)" << endl;
    cin >> Antwort1;
    if (Antwort1 == "behältst") {
        cout << "Gut!" << endl;
    }
    else {
        cout << "Nicht Gut" << endl;
    }

    cout << "Präsens von BEHALTEN (er/sie/es/man) " << endl;
    cin >> Antwort1;
    if (Antwort1 == "behält") {
        cout << "Gut!" << endl;
    }
    else {
        cout << "Nicht Gut" << endl;
    }
    return 0;
}
I tried with
if (Antwort1 == (LC_CTYPE, "German"), "behält")
but then it has the opposite effect: every single string I type is accepted as valid ("Gut").
My answer applies to the Windows 10 console using the classic default Command Prompt (I haven't tried it with other shells like PowerShell, nor have I tried these experiments on Linux yet).
It seems to me that, as of this writing (23 February 2022), Windows 10's Command Prompt and the Microsoft C/C++ runtime of VS2019 don't support Unicode UTF-8 well: see, for example, this blog post showing a CRT crash you get when trying to call:
_setmode(_fileno(stdout), _O_U8TEXT);
and printing UTF-8 text using std::cout.
In my experience, you can make Unicode work in the Windows Command Prompt using UTF-16. You can still use UTF-8 inside your C++ application, but you have to convert between UTF-8 and UTF-16 at the Windows boundaries.
I modified your code to use UTF-16, and it seems to work correctly when compiled with Visual Studio 2019 and executed inside the Windows Command Prompt:
// Used for _setmode calls
#include <fcntl.h>
#include <io.h>
#include <stdio.h>

// Console I/O with Unicode UTF-16: wcin, wcout and wstring
#include <iostream>
#include <string>

using std::wcin;
using std::wcout;
using std::wstring;

int main() {
    // Enable Unicode UTF-16 console input/output
    _setmode(_fileno(stdout), _O_U16TEXT);
    _setmode(_fileno(stdin), _O_U16TEXT);

    wcout << L"Präsens von BEHALTEN (du) \n";

    wstring Antwort1;
    wcin >> Antwort1;

    if (Antwort1 == L"behältst") {
        wcout << L"Gut! \n";
    } else {
        wcout << L"Nicht Gut \n";
    }
}
Note the use of L"..." to represent UTF-16 string literals, and the use of wchar_t-based std::wcout, std::wcin, and std::wstring instead of the char-based std::cout, std::cin and std::string.

fstream not working properly with russian text?

I work with Russian a lot and I've been trying to get data from a file with an input stream. Here's the code; it's supposed to output only the words that contain no more than 5 characters.
#include <iostream>
#include <fstream>
#include <string>
#include <Windows.h>

using namespace std;

int main()
{
    setlocale(LC_ALL, "ru_ru.utf8");

    ifstream input{ "in_text.txt" };
    if (!input) {
        cerr << "Ошибка при открытии файла" << endl; // "Error opening the file"
        return 1;
    }

    cout << "Вывод содержимого файла: " << "\n\n"; // "File contents:"

    string line{};
    while (input >> line) {
        if (line.size() <= 5)
            cout << line << endl;
    }
    cout << endl;

    input.close();
    return 0;
}
Here's the problem:
I noticed the output didn't pick up all of the words that actually contained no more than 5 characters. So I did a simple test with the word "Test" in English and its Russian translation "тест", which has the same number of characters. My text file looked like this:
Test тест
I used the debugger to see how the program would run, and it printed the English word but skipped the Russian one. I can't understand why this is happening.
P.S. When I changed the condition to if (line.size() <= 8) it printed both of them. Very odd.
I think I messed up my system locale somehow, I don't know. I once tried to use std::locale without really understanding it; maybe that did something to my PC, I'm not really sure. Please help.
I'm very unsure about this but using codecvt_utf8 and wstring_convert seems to work:
#include <codecvt>  // std::codecvt_utf8
#include <iostream>
#include <locale>   // std::wstring_convert
#include <string>

int main() {
    // ...
    while (input >> line) {
        // convert the UTF-8 encoded `line` to UTF-32:
        std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> u8_to_u32;
        std::u32string u32s = u8_to_u32.from_bytes(line);
        if (u32s.size() <= 5)          // check the UTF-32 length
            std::cout << line << '\n'; // but print the UTF-8 encoded string
    }
    // ...
}

Cin and getline won't save non ascii characters correctly

I'm trying to open a file whose path has non-ASCII characters. The user drags the file into the cmd window and I read the path with getline.
When I try to open the file using the path provided by the user, it doesn't work.
#include <iostream>
#include <string>
#include <fstream>
#include <cstdlib>

using namespace std;

int main() {
    string userInput;                           // Saves the user input
    string autoInput = "C:\\Espolón\\file.txt"; // Specifying the path like this works
    ifstream file1;                             // For opening the file with userInput
    ifstream file2;                             // For opening the file with autoInput

    getline(cin, userInput);
    system("CLS"); // Clears the CMD

    file1.open(userInput); // This throws an error. Note that I didn't use is_open for cleaner code, but I'm actually using it in my tests
    file2.open(autoInput); // This works perfectly

    cout << "User input: " + userInput << endl << "Auto input: " + autoInput << endl; // Both show correctly in the CMD
    system("pause");
}
Although cout shows everything correctly, while debugging I found that the non-ASCII character 'ó' in userInput is replaced by '¢' ("C:\Espol¢n\file.txt"), while autoInput is stored correctly ("C:\Espolón\file.txt"). Therefore, I can't open a file whose path contains special characters when it comes from user input.
I tried using wide characters as I've read in other similar questions:
#include <iostream>
#include <string>
#include <fstream>
#include <cstdlib>

using namespace std;

int main() {
    wstring userInput;                          // Saves the user input
    string autoInput = "C:\\Espolón\\file.txt"; // Specifying the path like this works
    ifstream file1;                             // For opening the file with userInput
    ifstream file2;                             // For opening the file with a fixed string

    getline(wcin, userInput);
    system("CLS"); // Clears the CMD

    file1.open(userInput); // This throws an error. Note that I didn't use is_open for cleaner code, but I'm actually using it in my tests
    file2.open(autoInput); // This works perfectly

    wcout << L"User input: " + userInput << endl;
    cout << "Auto input: " + autoInput << endl; // Both show correctly in the CMD
    system("pause");
}
But the issue persists.
I also tried changing the encoding to Unicode but nothing changed.
Any help would be welcomed!
(Sorry for my bad English!)
Windows APIs expect Unicode in UTF-16LE format, not UTF-8. Wide-character strings and functions are treated as UTF-16 on Windows.
The letter ó is ANSI, not ASCII. Windows also supports ANSI. You can generally use ANSI if your program only runs on your own computer, but even then you can run into problems. If you wish to use ANSI, try not to mix it with UTF-16 and wide strings (std::wstring).
Windows 8 and 10 do have limited console support for UTF-8, but it is recommended to use UTF-16 if you want to avoid the pitfalls of UTF-8.
Here is UTF16 version:
#include <iostream>
#include <string>
#include <fstream>
#include <cstdlib>
#include <io.h>
#include <fcntl.h>

using namespace std;

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT);
    _setmode(_fileno(stdin), _O_U16TEXT);

    wstring userInput;
    wstring autoInput = L"C:\\Espolón\\file.txt";
    ifstream file1, file2;

    getline(wcin, userInput);
    file1.open(userInput);
    file2.open(autoInput);

    wcout << L"file1: " << (file1.good() ? L"good\n" : L"bad\n");
    wcout << L"file2: " << (file2.good() ? L"good\n" : L"bad\n");
    wcout << L"User input: " + userInput << endl;
    wcout << L"Auto input: " + autoInput << endl;
    system("pause");
    return 0;
}
Unix-based systems use UTF-8. Note that these systems don't easily get along with UTF-16; there, std::wstring is used for UTF-32. For the most part you can simply use std::string for UTF-8 on Linux, macOS, etc.
The text file itself may contain text in ASCII, ANSI, UTF-8, or UTF-16 format; UTF-8 is recommended for compatibility.
On Windows, store the text file in UTF-8 format and use these functions to convert back and forth between UTF-8 and UTF-16:
// Needs C++17 for the writable std::string::data() overload
// (with C++11, &u8[0] works instead).
std::string get_u8(const std::wstring& u16)
{
    if (u16.empty()) return std::string();
    // Pass explicit lengths so the result carries no trailing '\0'.
    int size = WideCharToMultiByte(CP_UTF8, 0, u16.c_str(), (int)u16.size(),
                                   nullptr, 0, nullptr, nullptr);
    std::string u8(size, 0);
    WideCharToMultiByte(CP_UTF8, 0, u16.c_str(), (int)u16.size(),
                        u8.data(), size, nullptr, nullptr);
    return u8;
}

std::wstring get_u16(const std::string& u8)
{
    if (u8.empty()) return std::wstring();
    int size = MultiByteToWideChar(CP_UTF8, 0, u8.c_str(), (int)u8.size(),
                                   nullptr, 0);
    std::wstring u16(size, 0);
    MultiByteToWideChar(CP_UTF8, 0, u8.c_str(), (int)u8.size(),
                        u16.data(), size);
    return u16;
}

type in special characters and save in file [duplicate]

This question already has answers here:
Output unicode strings in Windows console app
(16 answers)
Closed 4 years ago.
I want to use a plain text file to save input from a user and read it out later. In the application, German and Spanish special characters may occur. Unfortunately the code below doesn't save these characters properly in the file. What is the best way to solve this?
I solved this issue in a similar application in C by saving to a .bin file instead of a .txt, but isn't there a better solution in C++?
#include <string>
#include <fstream>
#include <iostream>

int main() {
    std::string s = "abcöäüÑ"; // (Alt+165 for Ñ)
    std::ofstream ofs{ "test.txt" };
    ofs << s << '\n'; // this works fine: "abcöäüÑ"

    std::string s2;
    std::cin >> s2;   // typing abcöäüÑ in the app (Alt+165 for Ñ);
                      // I use a Windows system with a German keyboard setting
    ofs << s2 << '\n'; // this doesn't; it gets "abc”„¥"
}
I use a 64-bit Windows 7 system with Visual Studio 2017 and a German keyboard setting.
The easiest solution (which is deprecated) is to use ANSI code page for German:
setlocale(LC_ALL, "gr");
cout << "abcöäüÑ\n";
ofstream fout("ansi.txt");
fout << "abcöäüÑ\n";
This will work for some limited character sets; it's relatively safe if you stick to Western Latin characters. Maybe this is what you did in your C code. It doesn't have much to do with saving the file in binary rather than text.
In Windows it is recommended to use Unicode with wide string functions. Example:
#include <iostream>
#include <string>
#include <fstream>
#include <codecvt>
#include <io.h>
#include <fcntl.h>

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT); // wcout instead of cout
    _setmode(_fileno(stdin), _O_U16TEXT);  // wcin instead of cin

    std::locale loc_utf16(std::locale(), new std::codecvt_utf16<wchar_t>);

    std::wofstream fout(L"utf16.txt", std::ios::binary);
    if (fout)
    {
        fout.imbue(loc_utf16);
        fout << L'\xFEFF'; // insert optional BOM for UTF-16
        fout << L"abcöäüÑ ελληνική\r\n";
        fout.close();
    }

    std::wifstream fin(L"utf16.txt", std::ios::binary);
    if (fin)
    {
        fin.imbue(loc_utf16);
        fin.seekg(2, std::ios::beg); // skip the optional BOM if it was added
        std::wstring ws;
        while (std::getline(fin, ws)) std::wcout << ws << std::endl;
        fin.close();
    }
    return 0;
}
The disadvantage of UTF-16 is that programs on systems like Linux may have a hard time with this format. Some people save the file in UTF-8 instead, because UTF-8 is better supported on other systems.
Windows itself is UTF-16 based, so you have to read and display the input in UTF-16. But you can read/write the file in UTF-8 format. Example:
std::wcout << "enter:";
std::wstring sw;
std::wcin >> sw;

std::locale loc_utf8(std::locale(), new std::codecvt_utf8<wchar_t>);
std::wofstream fout(L"utf8.txt", std::ios::binary);
if (fout)
{
    fout.imbue(loc_utf8);
    fout << sw << L"\r\n";
    fout << L"abcöäüÑ ελληνική\r\n";
}
Here the binary flag does matter: in text mode the runtime would translate the 0x0A byte of the explicit L"\r\n" into 0x0D 0x0A on output, producing \r\r\n in the file.

Characters not recognized while reading from file

I have the following C++ code in Visual Studio to read characters from a file.
ifstream infile;
infile.open(argv[1]);
if (infile.fail()) {
    cout << "Error reading from file: " << strerror(errno) << endl;
    cout << argv[0] << endl;
}
else {
    char currentChar;
    while (infile.get(currentChar)) {
        cout << currentChar << " " << int(currentChar) << endl;
        // ... do something with currentChar
    }
    ofstream outfile("output.txt");
    outfile << /* output some text based on currentChar */;
}
infile.close();
The file in this case is expected to contain mostly plain ASCII characters, with the exception of two: “ and ”.
The problem is that the code in its current form is not able to recognise those characters: printing the character with cout outputs garbage, and its int conversion yields a negative number that differs depending on where in the file it occurs.
I have a hunch that the problem is the encoding, so I've tried to imbue infile based on some examples from the internet, but I haven't managed to get it right. infile.get either fails when reaching the quote character, or the problem remains. What details am I missing?
The file you are trying to read is likely UTF-8 encoded. The reason most characters read fine is because UTF-8 is backwards compatible with ASCII.
In order to read a UTF-8 file I'll refer you to this: http://en.cppreference.com/w/cpp/locale/codecvt_utf8
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <locale>
#include <codecvt>

...

// Write file in UTF-8
// (std::locale::empty() in the original is an MSVC-only extension;
// a default-constructed std::locale() is the portable equivalent.)
std::wofstream wof;
wof.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::generate_header>));
wof.open(L"file.txt");
wof << L"This is a test.";
wof << L"This is another test.";
wof << L"\nThis is the final test.\n";
wof.close();

// Read file in UTF-8
std::wifstream wif(L"file.txt");
wif.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::consume_header>));
std::wstringstream wss;
wss << wif.rdbuf();
Try:
while (infile.get(&currentChar, 1))
Also, be sure that you actually pass argv[1]. Print its value:
cout << argv[1] << endl;