UTF-16LE Encoding woes with Qt text editor written in C++

So I have a Qt text editor that I have started creating. I started with this http://doc.qt.io/archives/qt-5.7/gettingstartedqt.html and I have added on to it. So far I have added a proper save/save-as function (the version in the link only really has a save-as function), a "find" function, and an "open new window" function. Very soon I will add a find-and-replace function.
I am mainly doing this for the learning experience, but I am also going to eventually add a few more functions that will specifically help me create PLC configuration files at work. These configuration files could be in many different encodings, but most of them seem to be UTF-16LE (according to Emacs, anyway). My text editor originally had no problem reading the UTF-16LE files, but it wrote plain text back out, so I needed to change that.
Here is the snippet from the Emacs description of the encoding system of one of these UTF-16LE files.
U -- utf-16le-with-signature-dos (alias: utf-16-le-dos)
UTF-16 (little endian, with signature (BOM)).
Type: utf-16
EOL type: CRLF
This coding system encodes the following charsets:
unicode
And here is an example of the code that I am using to encode the text in my Qt text editor.
First, this is similar to the link that I gave earlier. The only difference here is that "saveFile" is a global variable that I created so I can perform a simple "Save" instead of a "Save As". This saves the text as plain text and works like a charm.
void findreplace::on_actionSave_triggered()
{
    if (!saveFile.isEmpty())
    {
        QFile file(saveFile);
        if (!file.open(QIODevice::WriteOnly))
        {
            // error message
        }
        else
        {
            QTextStream stream(&file);
            stream << ui->textEdit->toPlainText();
            stream.flush();
            file.close();
        }
    }
}
Below is my newer version, which attempts to save the text as UTF-16LE. My text editor can read the text just fine after saving it with this, but Emacs will not read it at all. That tells me the configuration file will probably not be readable by the programs that consume it. Something changed; I am not sure what.
void findreplace::on_actionSave_triggered()
{
    if (!saveFile.isEmpty())
    {
        QFile file(saveFile);
        if (!file.open(QIODevice::WriteOnly))
        {
            // error message
        }
        else
        {
            QTextStream stream(&file);
            stream << ui->textEdit->toPlainText();
            stream.setCodec("UTF-16LE");        // set after the text was already written
            QString stream3 = stream.readAll(); // reading from a write-only file
            //QString stream2 = stream3.setUnicode();
            //QTextCodec *codec = QTextCodec::codecForName("UTF-16LE");
            //QByteArray stream2 = codec->fromUnicode(stream3);
            //file.write(stream3);
            stream.flush();
            file.close();
        }
    }
}
The parts that are commented out I also tried, but they ended up writing the file as Asian (Chinese or Japanese) characters. Like I said, my text editor (and Notepad in Wine) can read the file just fine, but after saving, Emacs now describes the encoding as the following.
= -- no-conversion (alias: binary)
Do no conversion.
When you visit a file with this coding, the file is read into a
unibyte buffer as is, thus each byte of a file is treated as a
character.
Type: raw-text (text with random binary characters)
EOL type: LF
This indicates to me that something is not right in the file. Eventually this text editor will be used to create multiple text files at once and modify their contents via user input. It would be great if I could get this encoding right.

Thanks to the kind fellows who commented on my post here, I was able to answer my own question. This code solved my problem.
void findreplace::on_actionSave_triggered()
{
    if (!saveFile.isEmpty())
    {
        QFile file(saveFile);
        if (!file.open(QIODevice::WriteOnly))
        {
            // error message
        }
        else
        {
            QTextStream stream(&file);
            stream.setCodec("UTF-16LE");
            stream.setGenerateByteOrderMark(true);
            stream << ui->textEdit->toPlainText();
            stream.flush();
            file.close();
        }
    }
}
I set the codec of the stream, and then set "generate byte order mark" to true. I guess I have more to learn about encodings; I thought that the byte order mark had to be set to a specific value or something. I wasn't aware that I just had to set this flag to true and that it would take care of itself. Emacs can now read the files that are generated by saving a document with this code, and the encoding description from Emacs matches the original files. I will eventually add options for the user to pick which encoding they need while saving. Glad that I was able to learn something here.
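For completeness: reading such files back does not require naming the codec explicitly, because QTextStream auto-detects UTF-16/UTF-32 byte order marks by default. Here is a minimal sketch of the matching open/read path (the loadFile slot name is mine, not from the tutorial):
void findreplace::loadFile(const QString &path)
{
    QFile file(path);
    if (!file.open(QIODevice::ReadOnly))
    {
        // error message
        return;
    }
    QTextStream stream(&file);
    // On by default: a UTF-16/UTF-32 BOM at the start of the file
    // overrides whatever codec is set on the stream.
    stream.setAutoDetectUnicode(true);
    ui->textEdit->setPlainText(stream.readAll());
    file.close();
}
With auto-detection on, the same reader handles both the old plain-text files and the new UTF-16LE ones.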

Related

How do I save/load information onto/from my device?

I'm developing an application, but I need it to save its information onto the computer and load it from there the next time it's opened.
To give the simplest example: I have an array of strings and I want to save them as a *.txt file in the application's directory, with every member of the array on a new row of the file.
And I want to load the entries of the file into the array when I open the app, or create an empty *.txt file if one doesn't exist.
Note: if there is an easier way to do this instead of saving them into a *.txt, please tell me. Saving them strictly in *.txt format isn't mandatory.
Also, I am using wxWidgets for my application, if that makes it any easier.
MainFrame::MainFrame() {
    wxFileName f(wxStandardPaths::Get().GetExecutablePath());
    wxString appPath(f.GetPath());
    std::ifstream inputFileStream;
    inputFileStream.open(std::string(appPath.mb_str(wxConvUTF8)) + "data.txt");
    std::string data;
    inputFileStream >> data;
}
MainFrame::~MainFrame()
{
    wxFileName f(wxStandardPaths::Get().GetExecutablePath());
    wxString appPath(f.GetPath());
    std::ofstream outputFileStream;
    outputFileStream.open(std::string(appPath.mb_str(wxConvUTF8)) + "data.txt");
    std::string data = "something";
    outputFileStream << data;
    outputFileStream.close();
}
When the frame is created, I load the data; when the frame is destroyed, I save it. For the paths I don't use C++ standard library classes but wxWidgets classes and methods, for UTF-8 support. (I haven't checked whether this piece of code works; it's taken from my old project.)
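Note that inputFileStream >> data only reads a single whitespace-delimited token, so for the one-entry-per-row layout asked about, you would read line by line instead. Here is a minimal sketch under the same assumptions (the loadLines/saveLines helper names are mine, not from any library):
#include <fstream>
#include <string>
#include <vector>

// Read every line of the file into a vector (empty vector if the file is missing).
std::vector<std::string> loadLines(const std::string &path)
{
    std::vector<std::string> lines;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}

// Write each entry on its own row, overwriting the file.
void saveLines(const std::string &path, const std::vector<std::string> &lines)
{
    std::ofstream out(path);
    for (const std::string &line : lines)
        out << line << '\n';
}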

CStdioFile problems with encoding on read file

I can't read a file correctly using CStdioFile.
I open notepad.exe, type àèìòùáéíóú, and save twice: once with the encoding set to ANSI (which is really CP-1252) and once as UTF-8.
Then I try to read it from MFC with the following block of code:
BOOL ReadAllFileContent(const CString &FilePath, CString *fileContent)
{
    CString sLine;
    BOOL isSuccess = false;
    CStdioFile input;
    isSuccess = input.Open(FilePath, CFile::modeRead);
    if (isSuccess) {
        while (input.ReadString(sLine)) {
            fileContent->Append(sLine);
        }
        input.Close();
    }
    return isSuccess;
}
When I call it with the ANSI file I get the expected result àèìòùáéíóú,
but when I try to read the UTF-8 encoded file I get à èìòùáéíóú.
I would like my function to work with all files regardless of their encoding.
What do I need to implement?
EDIT:
Unfortunately, in the real app the files come from an external app, so changing the file encoding isn't an option. I must be able to read both UTF-8 and CP-1252 files.
Any file is valid "ANSI"; what Notepad calls ANSI is really the Windows-1252 encoding.
I've figured out a way to read UTF-8 and CP-1252 correctly based on the example provided here. Although it works, I need to pass in the file encoding, which I don't know in advance.
Thanks!
I personally use the class as advertised here:
https://www.codeproject.com/Articles/7958/CTextFileDocument
It has excellent support for reading and writing text files of various encodings, including Unicode in its various flavours.
I have not had a problem with it.
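If pulling in a whole class is overkill, a common heuristic (a sketch of my own, not from the linked article) is to check for a UTF-8 BOM and otherwise test whether the bytes validate as UTF-8, falling back to CP-1252 when they don't:
#include <windows.h>
#include <string>

// Guess the code page of a raw byte buffer: UTF-8 if it starts with a
// BOM or validates as UTF-8, otherwise assume Windows-1252.
UINT GuessCodePage(const std::string &bytes)
{
    if (bytes.size() >= 3 &&
        (unsigned char)bytes[0] == 0xEF &&
        (unsigned char)bytes[1] == 0xBB &&
        (unsigned char)bytes[2] == 0xBF)
        return CP_UTF8;

    // MB_ERR_INVALID_CHARS makes the conversion fail on malformed UTF-8.
    int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                  bytes.data(), (int)bytes.size(), NULL, 0);
    return (len > 0 || bytes.empty()) ? CP_UTF8 : 1252;
}
Note that pure-ASCII CP-1252 text also validates as UTF-8, which is harmless here because the two encodings agree on ASCII.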

How to create an ISO 8859-15 (instead of default UTF-8) encoded text file on Linux using QTextStream?

The function below is something I created in a unit test for a Qt project I'm working on.
It creates a file (empty or filled) that is then opened in various use cases, processed, and the outcome evaluated. One special use case I have identified is that the encoding actually does affect my application, so I decided to cover non-UTF-8 files too (as far as this is possible).
void TestCsvParserOperators::createCsvFile(QString& path, CsvType type, bool utf8)
{
    path = "test_data.txt";
    QFile csv(path);
    // Make sure both reading and writing access is possible. Also turn on truncation to replace any existing files
    QVERIFY(csv.open(QIODevice::ReadWrite | QIODevice::Truncate | QIODevice::Text) == true);
    QTextStream csvStream(&csv);
    // Set encoding
    if (utf8)
    {
        csvStream.setCodec("UTF-8");
    }
    else
    {
        csvStream.setCodec("ISO 8859-15");
        csvStream.setGenerateByteOrderMark(false);
    }
    switch (type)
    {
    case EMPTY: // File doesn't contain any data
        break;
    case INVALID: // File contains data that is not supported
        csvStream << "abc" << '\n';
        break;
    case VALID:
    {
        // ...
        break;
    }
    }
    csv.close();
}
While the project runs on Linux, the data is exported as a plain text file on Windows (and possibly edited with Notepad) and used by my application as-is. I discovered that it is encoded not as UTF-8 but as ISO 8859-15, which led to a bunch of problems, including incorrectly processed characters.
The actual part of my application that is tested is:
// ...
QTextStream in(&csvFile);
if (in.codec() != QTextCodec::codecForName("UTF-8"))
{
    LOG(WARNING) << this->sTag << "Expecting CSV file with UTF-8 encoding. Found " << QString(in.codec()->name()) << ". Will attempt to convert to supported encoding";
    // Handle encoding
    // ...
}
// ...
Regardless of the combination of values for type and utf8, I always get my test text file. However, the file's encoding is UTF-8 no matter what the utf8 flag is set to.
Calling file on the CSV file with the actual data (shipped by the client) returns
../trunk/resources/data.txt: ISO-8859 text, with CRLF line terminators
while doing the same on test_data.txt gives me
../../build/test-bin/test_data.txt: UTF-8 Unicode text
I've read somewhere that if I want to use some encoding other than UTF-8 I have to work with QByteArray, but I am unable to verify this in the Qt documentation. I've also read that setting the BOM should do the trick, but I tried both enabling and disabling its generation without any luck.
I've already written a small bash script which converts the encoding to UTF-8 (given that the input file is ISO 8859), but I'd like to
have this integrated in my actual application,
not be forced to take care of this every single time, and
have at least some basic test coverage for the encoding that the client uses.
Any ideas how to achieve this?
UPDATE: I replaced the content I'm writing to the text file with
csvStream << QString("...").toLatin1() << ...;
and now I get
../../build/test-bin/test_data.txt: ASCII text
which is still not what I'm looking for.
Usually this is what I do:
QTextCodec *codec1 = QTextCodec::codecForName("ISO 8859-15");
QByteArray csvStreambyteArray = " .... "; // from your file
QString csvStreamString = codec1->toUnicode(csvStreambyteArray);
csvStream << csvStreamString;
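One likely reason why file kept reporting UTF-8/ASCII for the test file (my reading of the question, not part of the original answers): the test only ever writes "abc", and ISO 8859-15 and UTF-8 produce byte-identical output for pure ASCII, so there is nothing for file to distinguish. Writing at least one character that the two encodings represent differently makes the difference visible on disk:
#include <QFile>
#include <QTextStream>

// Sketch: the euro sign is a single byte (0xA4) in ISO 8859-15 but a
// three-byte sequence (0xE2 0x82 0xAC) in UTF-8, so `file` can tell
// the two encodings apart once it appears in the data.
void writeLatin9Sample(const QString &path)
{
    QFile csv(path);
    if (!csv.open(QIODevice::WriteOnly | QIODevice::Text))
        return;
    QTextStream csvStream(&csv);
    csvStream.setCodec("ISO 8859-15");
    csvStream << "col1;col2;price " << QChar(0x20AC) << '\n';
    csv.close();
}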

Extra character when reading a file. C++

I'm writing two programs that communicate by reading files which the other one writes.
My problem is that when the second program reads a file created by the first one, it outputs a weird character at the end of the last data item. This only happens seemingly at random, as adding data to the text file can result in normal output.
I'm using C++ and Qt4. This is the part of program 1:
std::ofstream idxfile_new;
QString idxtext;
std::string fname2 = "some_textfile.txt"; //Imported from a file browser in the real code.
idxfile_new.open(fname2.c_str(), std::ios::out);
idxtext = ui->indexBrowser->toPlainText(); //Grabs data from a dialog of the GUI.
//See 'some_textfile.txt' below
idxfile_new << idxtext.toStdString();
idxfile_new.clear();
idxfile_new.close();
some_textfile.txt:
3714.1 3715.1 3716.1 3717.1 3719.1 3739.1 3734.1 3738.1 3562.1 3563.1 3623.1
Part of program 2:
std::string indexfile = "some_textfile.txt"; //Imported from file browser in the real code
std::ifstream file;
std::string sub;
file.open(indexfile.c_str(), std::ios::in);
while (file >> sub)
{
    std::cerr << sub << "\n"; //Stores values in an array in the real code
}
This outputs:
3714.1
3715.1
3716.1
3717.1
3719.1
3739.1
3734.1
3738.1
3562.1
3563.1
3623.1�
If I add more data it works at times. Sometimes it outputs data such as
3592.�
or
359�
at the end, so it is not consistent in reading the whole data either. At first I figured it wasn't reading the EOF properly, and I have read and tried many solutions to similar problems, but I can't get it to work correctly.
Thank you guys for the help!
I managed to solve the problem by myself this morning.
For anyone with the same problem, I will post my solution.
The problem was the encoding used when creating the file. Here's my solution:
Part of program 1:
std::ofstream idxfile_new;
QString idxtext;
std::string fname2 = "some_textfile.txt";
idxfile_new.open(fname2.c_str(), std::ios::out);
idxtext = ui->indexBrowser->toPlainText();
QByteArray qstr = idxtext.toUtf8(); //Converts the QString to UTF-8 encoded bytes
idxfile_new << qstr.data();
idxfile_new.clear();
idxfile_new.close();
The other program is left unchanged.
A hex converter displayed the extra character as ef bf bd, which is the UTF-8 encoding of the replacement character U+FFFD that replaces invalid bytes when converting to UTF-8.
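For future debugging of this kind, a few lines of standard C++ can stand in for the hex converter (a throwaway sketch, not part of either program):
#include <cstdio>
#include <fstream>

// Dump every byte of the file given as argv[1] as two hex digits.
int main(int argc, char *argv[])
{
    if (argc < 2)
        return 1;
    std::ifstream in(argv[1], std::ios::binary);
    char byte;
    while (in.get(byte))
        std::printf("%02x ", static_cast<unsigned char>(byte));
    std::printf("\n");
    return 0;
}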

std::ifstream not reading in after switching text editors from Notepad++ to Sublime Text 2 for the file it's reading in?

I read some data for my application from a file, and it recently stopped working. I feel like the time when it stopped working corresponds to when I switched from Notepad++ to Sublime Text 2... Anyway, here is my code to read in the data:
std::ifstream stream;
stream.open("parsing_model.txt");
char ignore_char;
std::string model_class;
int parsing_model;
while (stream >> model_class >> ignore_char >> parsing_model)
{
    // snip
    // doesn't even make it into a single run of this while loop.
}
My data is organized as
Item1, 12
Item2, 4
foo, 42
bar, 1
Is it something in the text encoding? How can I make my code robust against this and solve my problem? This code absolutely worked for months up until recently. Thanks
Check to see if the stream is in a good state before using it.
stream.open("parsing_model.txt");
if (stream.good()) {
//... read the stream
} else {
std::cerr << "failed to open input file\n";
}
If there is a failure, make sure the current working directory is the same location where you saved the input file. It seems you are running on Windows, so you should be able to use this command to view your current directory.
system("dir & pause");