C++ write utf-8 to text file

How do I write UTF-8 encoded characters to a text file in C++?
#include <fstream>
using namespace std;
int main() {
ofstream fout("tree.txt");
fout << "┌───────────────22───────┐" << endl;
fout.close();
return 0;
}
The wrong characters are written to the file.

If you want to write UTF-8 literals, encode the source file as UTF-8 and make sure that the compiler reads its input as UTF-8. If both hold, the output should be correct. You could use the u8"┌───────────────22───────┐" literal just to be sure. Note, however, that since C++20 a u8 literal has type char8_t, and unfortunately there is no std::basic_ofstream<char8_t> to write it with.
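As a minimal sketch of the idea above, the UTF-8 bytes can also be spelled out explicitly so that the result does not depend on the source file encoding or compiler settings. The helper names write_bytes and box_line, and the shortened box-drawing line, are made up for this illustration.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes `text` plus a newline to `path` byte-for-byte.
// Binary mode keeps the runtime from translating any bytes on the way out.
bool write_bytes(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out << text << '\n';
    return static_cast<bool>(out);
}

// A shortened version of the line from the question, spelled as UTF-8 bytes:
//   U+250C (┌) = E2 94 8C, U+2500 (─) = E2 94 80, U+2510 (┐) = E2 94 90.
// The literals are split so a hex escape cannot swallow the digits that follow.
std::string box_line() {
    return "\xE2\x94\x8C" "\xE2\x94\x80" "22" "\xE2\x94\x80" "\xE2\x94\x90";
}
```

Calling write_bytes("tree.txt", box_line()) should produce a file that a UTF-8 aware editor shows as ┌─22─┐. Whether a console shows it correctly is a separate matter that depends on the console's encoding.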

Related

C++ Semicolon separated file reading

I have a semicolon-separated text file. It contains commands such as "A", "P", "R" and "S", together with the inputs those commands operate on: names such as "Ali Aksu" and "Mithat Köse", and transactions such as "Process" and "withdraw". I have a program that processes these inputs without any problems when the user types them at the console, but I need it to take the inputs from the semicolon-separated file instead. Here is a test of the reading:
This is an example input file:
A;Ali;Aksu;N;2;deposit;withdraw
P
A;Mithat;Köse;P;3;deposit;credit;withdraw
This is the output on the console:
ï»¿A/Ali/Aksu/N/2/deposit/withdraw
P
A/Mithat/KÃ¶se/P/3/deposit/credit/withdraw
/
Problem 1: It cannot read special characters like "ö".
Problem 2: Why does the output start with the strange "ï»¿" characters?
#include <iostream>
#include <fstream>
using namespace std;
int main(){
setlocale(LC_ALL, "Turkish");
fstream myfile;
char *string;
string = new char[50];
myfile.open("input_file.txt",ios::in);
while(!myfile.eof()){
myfile.getline(string, 49, ';');
cout << string << "/";
}
myfile.close();
cout << endl;
system("pause");
return 0;
}

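I will assume that the file is in UTF-8 format. If so, your question is really: how do I read UTF-8 files in C++? The "ï»¿" at the start is the UTF-8 byte order mark (the bytes EF BB BF) displayed in a single-byte code page, and "Ã¶" is the two-byte UTF-8 encoding of "ö" displayed the same way.
Here is somebody reading Chinese: "How to read an UTF-8 encoded file containing Chinese characters and output them correctly on console?". You should be able to adapt this to your locale.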
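A minimal sketch, assuming the input really is UTF-8: keep the data as bytes in std::string, drop a leading byte order mark if present, and split on ';'. The function names strip_utf8_bom and split_fields are hypothetical, not part of the question's program.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Removes a leading UTF-8 byte order mark (EF BB BF), if there is one.
void strip_utf8_bom(std::string& s) {
    if (s.compare(0, 3, "\xEF\xBB\xBF") == 0) {
        s.erase(0, 3);
    }
}

// Splits one line into fields at `sep`. Multi-byte UTF-8 sequences are left
// untouched because ';' never occurs inside one.
std::vector<std::string> split_fields(const std::string& line, char sep) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, sep)) {
        fields.push_back(field);
    }
    return fields;
}
```

Reading each line with std::getline(file, line), calling strip_utf8_bom on the first line only, and then calling split_fields(line, ';') avoids both the fixed-size buffer and the eof() loop. The console must still be set to interpret the output as UTF-8 for "ö" to display correctly.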

How to read non-ASCII lines from file with std::ifstream on Linux?

I was trying to read a plain text file. In my case, I need to read it line by line and process that information. I know C++ has wide-character facilities for reading wchar_t. I tried the following:
#include <fstream>
#include <iostream>
int main() {
std::wfstream file("file"); // aaaàaaa
std::wstring str;
std::getline(file, str);
std::wcout << str << std::endl; // aaa
}
But as you can see, it did not read the full line. It stops when it reads "à", which is non-ASCII. How can I fix it?

You will need to understand some basic concepts of encodings. I recommend reading this article: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets. Basically, you can't assume that every byte is a letter or that every letter fits in a char. Also, the system must know how to extract letters from the sequence of bytes in the file.
Let's assume your file is encoded in UTF-8, which is likely given that you are on Linux. I'll assume your terminal also supports it. If you read directly into a std::string, with chars, everything appears to work. Look:
// olá
#include <iostream>
#include <fstream>
int main() {
std::fstream file("test.cpp");
std::string str;
std::getline(file, str);
std::cout << str << std::endl;
}
The output is what you expect, but this is not really correct. Look at what is going on: The file is encoded in utf-8. This means the first line is this byte sequence:
/ / o l á
47 47 32 111 108 195 161
Note that á is encoded with two bytes. If you ask for the size of the string (str.size()), you will get 7, which is the number of bytes, not the number of characters. This happens because the string treats every byte as a char. When you send it to std::cout, the bytes are handed to the terminal to print. And the magical part: the terminal works with UTF-8 by default, so it just assumes the bytes are UTF-8 and correctly prints 6 characters.
You see that it works, but it is not really right. Perform any string operation on the data that is unaware of the encoding and you may split a multi-byte sequence, breaking the UTF-8 encoding so that the result no longer prints correctly!
Let's go for wstrings. They store each letter in a wchar_t that, on Linux, is 4 bytes wide. This is enough to hold any possible Unicode character. But it will not work directly because C++ uses the "C" locale by default. A locale is a specification of how to deal with various aspects of the system, like "how to print a date" or "how to format a currency value" or even "how to decode text". The last one is the important one here, and the default "C" locale says: "Assume everything is ASCII". When the stream is reading the file and tries to decode a non-ASCII byte, it just fails silently.
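To make the byte-versus-character difference concrete, here is a small sketch that counts code points in a UTF-8 std::string by skipping continuation bytes (those of the form 10xxxxxx). The name utf8_length is made up for this example, and the function assumes its input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by counting every byte that is NOT a
// continuation byte. Assumes the input is valid UTF-8; it does no validation.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the line "// olá" from the example, size() reports 7 bytes while utf8_length reports 6 code points.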
The correction is simple: Use a UTF-8 locale. Look:
// olá
#include <iostream>
#include <fstream>
#include <locale>
int main() {
std::ios::sync_with_stdio(false);
std::locale loc("en_US.UTF-8"); // You can also use "" for the default system locale
std::wcout.imbue(loc); // Use it for output
std::wfstream file("test.cpp");
file.imbue(loc); // Use it for file input
std::wstring str;
std::getline(file, str); // str.size() will be 6
std::wcout << str << std::endl;
}
You may be asking what std::ios::sync_with_stdio(false); means. It is required because, by default, C++ streams are kept in sync with C streams. This is useful because it lets you use both cout and printf in the same program. We have to disable it here because the C streams will break the UTF-8 encoding and produce garbage on the output.

Creating the same text file over and over

I need to create a program that writes a text file in the current folder, the text file always contains the same information, for example:
Hello,
This is an example of how the text file may look
some information over here
and here
and so on
So I was thinking in doing something like this:
#include <iostream>
#include <fstream>
using namespace std;
int main(){
ofstream myfile("myfile.txt");
myfile << "Hello," << endl;
myfile << "This is an example of how the text file may look" << endl;
myfile << "some information over here" << endl;
myfile << "and here" << endl;
myfile << "and so on";
myfile.close();
return 0;
}
Which works if the number of lines in my text file is small, the problem is that my text file has over 2000 lines, and I'm not willing to give the myfile << TEXT << endl; format to every line.
Is there a more effective way to create this text file?
Thanks.

If your problem is writing to the same file repeatedly, you need to open it in append mode,
i.e., your file must be opened like this:
ofstream myfile("ABC.txt", ios::app);

You can use a raw string literal in C++11:
#include <fstream>
const char* my_text =
R"(Hello,
This is an example of how the text file may look
some information over here
and here
and so on)";
int main()
{
std::ofstream myfile("myfile.txt");
myfile << my_text;
myfile.close();
return 0;
}
Alternatively, you can use a tool such as xxd -i to generate the array for you.
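For illustration, running xxd -i myfile.txt emits a C array named after the file plus a length variable. The sketch below shows that shape with a made-up two-line sample instead of the real 2000-line content, followed by a hypothetical helper, write_embedded, that writes the array back out.

```cpp
#include <fstream>
#include <string>

// Shape of the output of `xxd -i myfile.txt`; the bytes are a made-up sample.
unsigned char myfile_txt[] = {
    0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x2c, 0x0a,             // "Hello,\n"
    0x61, 0x6e, 0x64, 0x20, 0x73, 0x6f, 0x20, 0x6f, 0x6e  // "and so on"
};
unsigned int myfile_txt_len = 16;

// Hypothetical helper: writes the embedded bytes to `path` unchanged.
bool write_embedded(const std::string& path) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(myfile_txt), myfile_txt_len);
    return static_cast<bool>(out);
}
```

Because the data is stored as raw bytes, no quoting or escaping of the text is needed, at the cost of the source no longer being human-readable.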

If you don't care about the subtle differences between '\n' and std::endl, then you can create a static string with your text outside of your function, and then it's just:
myfile << str; // Maybe << std::endl too
If your text is really big, you can write a small script to format it, for example by replacing every newline with "\n".

It sounds like you should really be using resource files. I won't copy and paste all of the information here, but there's a very good Q&A already on this website, over here: Embed Text File in a Resource in a native Windows Application
Alternatively, you could even put the string in a header file and include that header wherever it's needed
(assuming no C++11; if you have it, you could simply use a raw string literal to make things a little easier, but an answer for that has already been posted, so there is no need to repeat it):
#pragma once
#include <string>
// const gives the variable internal linkage, so including this header from several source files is safe
const std::string fileData =
"data line 1\r\n"
"data line 2\r\n"
"etc.\r\n"
;
Use std::wstring and prefix the string literals with L if you need characters outside the basic character set.
All you need to do is write a little script (or even just use Notepad++ if it's a one-off) to replace each backslash with a double backslash, replace each double quotation mark with a backslash followed by a double quotation mark, and replace each line break with \r\n"{line break}{tab}". Tidy up the beginning and end, and you're done. Then just write the string to a file.
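The replacement steps described above can be sketched as a small function that turns one line of the text file into one line of the header. The name to_cpp_literal_line is hypothetical, and the sketch only handles backslashes and double quotes, as the answer describes.

```cpp
#include <string>

// Turns one line of plain text into a quoted C++ string literal ending in
// \r\n, escaping backslashes and double quotes along the way.
std::string to_cpp_literal_line(const std::string& line) {
    std::string out = "\"";
    for (char c : line) {
        if (c == '\\' || c == '"') {
            out += '\\';  // escape the character that follows
        }
        out += c;
    }
    out += "\\r\\n\"";    // literal backslash-r backslash-n, then closing quote
    return out;
}
```

Feeding every line of the text file through this function and printing the results gives the body of the fileData initializer.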

c++ read Arabic text from file

In C++, I have a text file that contains Arabic text like:
شكلك بتعرف تقرأ عربي يا ابن الذين
and I want to parse each line of this file into a string and use string functions on it (like substr, length, at...etc.) then print some parts of it to an output file.
I tried doing it but it prints some garbage characters like "\'c7\'e1\'de\'d1\"
Is there any library to support Arabic characters?
edit: just adding the code:
#include <iostream>
#include <fstream>
using namespace std;
int main(){
ifstream ip;
ip.open("d.rtf");
if(ip.is_open() != true){
cout<<"open failed"<<endl;
return 0;
}
string l;
while(!ip.eof()){
getline(ip, l);
cout<<l<<endl;
}
return 0;
}
Note: I still need to add some processing code like
if(l == "كلام بالعربي"){
string s = l.substr(0, 4);
cout<<s<<" is what you are looking for"<<endl;
}

You need to find out which text encoding the file uses. For example, to read a UTF-8 file into wchar_t strings you can write (C++11):
std::wifstream fin("text.txt");
fin.imbue(std::locale("en_US.UTF-8"));
std::wstring line;
std::getline(fin, line);
std::wcout << line << std::endl;
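If a named locale such as "en_US.UTF-8" is not available on the target system, the decoding step can also be done by hand. The following is a minimal sketch, assuming the input is UTF-8, of a hypothetical decode_utf8 function that produces a std::u32string, on which length() and substr() count code points rather than bytes. It rejects malformed lead and continuation bytes but does not check for overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes UTF-8 bytes into one char32_t per code point.
// Simplified: no checks for overlong forms or surrogate code points.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= bytes.size()) {
            throw std::runtime_error("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::runtime_error("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

With this, a line read into a std::string via std::getline can be decoded once and then sliced safely. Note that one code point is still not always one user-perceived character, since combining marks are separate code points.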

The best way to deal with this, in my opinion, is to use a Unicode helper library. Strings in C, and even in C++, are just arrays of bytes. When you call, for example, strlen() [C] or somestring.length() [C++], you only get the number of bytes in the string rather than the number of characters.
Some auxiliary functions, like mbstowcs(), can help with this. But in my opinion they are rather old and hard to use.
Another way is to use C++11, which in theory has support for many things related to UTF-8. But I have never seen it work perfectly, at least when you need to be multi-platform.
The best solution I have found is the ICU library. With it I can work with UTF-8 strings easily and with the same "charm" as working with a regular std::string. You get a string class with methods for length, substrings and so on, and it's very portable. I use it on Windows, Mac and Linux.

You can use Qt too.
A simple example:
#include <QDebug>
#include <QTextStream>
#include <QFile>
int main()
{
QFile file("test.txt");
file.open(QIODevice::ReadOnly | QIODevice::Text);
QTextStream stream(&file);
QString text=stream.readAll();
if(text == "شكلك بتعرف تقرأ عربي يا ابن الذين")
qDebug()<<",,,, ";
}

It is better to process Arabic text line by line. To get all lines of Arabic text from a file, try this:
std::wifstream fin("arabictext.txt");
fin.imbue(std::locale("en_US.UTF-8"));
std::wstring line;
std::wstring text;
while ( std::getline(fin, line) )
{
text= text+ line + L"\n";
}

no output with wide streams

I have a problem with wide stream output. My primary concern is wofstream, but wcout doesn't work properly either.
Neither produces any output other than Latin characters.
That is,
#include <string>
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
wstring wstr = L"Андрей";
wofstream fout(L"C:\\Work\\report.htm");
wcout << wstr << L"Привет мир";
fout << wstr << L"Привет мир";
fout.close();
}
produces no output, and the file stays 0 bytes long.
Mixing scripts, as in wcout<<L"zuhщзг", prints just "zuh" and ignores the rest.
I use MVS 2013 with Intel C++ Composer 14.0
EDIT:
"Windows Unicode C++ Stream Output Failure" describes a similar problem, but I don't quite understand how the solution works.
MVS/Windows use UTF-16 for wide strings, and I would like them to be written to the file as is, that is, as UTF-16, without any unnecessary conversion.
