QXmlStreamWriter and Cyrillic - C++

I have a problem with encoding when writing XML files with QXmlStreamWriter on Windows. How can I resolve it? Calling stream.setCodec("UTF-8") or stream.setCodec("windows-1251") did not help.
QFile *file = new QFile(filename);
if (file->open(QIODevice::WriteOnly | QIODevice::Text))
{
    QXmlStreamWriter stream(file);
    stream.setAutoFormatting(true);
    stream.writeStartDocument();
    stream.writeStartElement("СЕКЦИЯ"); // start root section
    stream.writeStartElement("FIELD");
    stream.writeAttribute("name", "Имя");
    stream.writeAttribute("value", "Иван");
    stream.writeEndElement();
    stream.writeEndElement(); // END СЕКЦИЯ
    file->close();
}

Most likely the problem is how the string literals in your source file are interpreted, not the configuration of the stream writer.
Make sure your source file is encoded in UTF-8 and use QString::fromUtf8("Имя") etc. instead of relying on the implicit conversion from string literal to QString. In Qt 4, that implicit conversion decodes the bytes as Latin-1 unless a different codec has been set with QTextCodec::setCodecForCStrings().
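To see why the interpretation of the literal matters, here is a Qt-free sketch of what goes wrong when the UTF-8 bytes of "Имя" are decoded one byte per character as Latin-1 and then written out as UTF-8. The helper latin1ToUtf8 and the constant kName are made up for this illustration and are not part of Qt.

```cpp
#include <string>

// Hypothetical helper, not part of Qt: treats every input byte as one
// Latin-1 character and re-encodes the result as UTF-8.
std::string latin1ToUtf8(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out += static_cast<char>(b);                  // ASCII stays one byte
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));    // lead byte
            out += static_cast<char>(0x80 | (b & 0x3F));  // continuation byte
        }
    }
    return out;
}

// "Имя" as UTF-8 bytes, spelled with escapes so the example does not
// depend on the encoding of this source file.
const std::string kName = "\xD0\x98\xD0\xBC\xD1\x8F";
```

latin1ToUtf8(kName) yields 12 bytes instead of the original 6: each byte of the Cyrillic text has become a separate character, starting with Ð. That is the kind of garbage that ends up in the XML file, and no codec setting on the writer can undo it.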

Related

Read Arabic file contents using string in c++

I have a text file (ANSI encoding) that contains Arabic text, and I read it in C++ as follows:
ifstream ifs(file.GetFileName());
std::string content((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());
Unfortunately, the content variable holds garbled strings instead of the Arabic text, for example:
121101 ÇáÒÈæä ßãÇá
121102 ÇáÒÈæä ÓÚíÏ
121103 ÇáÒÈæä ÚãÇÑ
any solution???
Thanks :)

Why are Unicode characters not showing properly in the QTextBrowser when Unicode content is read from an HTML file?

I am reading an HTML file. The file contains Unicode text such as:
<b>akko- sati (ā + kruś), akkhāti (ā + khyā), abbahati (ā + bṛh)</b>
But the QTextBrowser does not interpret the Unicode characters correctly and shows them as follows:
akko- sati (Ä + kruÅ›), akkhÄti (Ä + khyÄ), abbahati (Ä + bá¹›h)
The QTextBrowser interprets the HTML tags correctly. But what's wrong with the Unicode characters?
Here is my code for reading and displaying the Unicode content:
void MainWindow::populateTextBrowser(const QModelIndex &index)
{
    QFile file("Data\\" + index.data().toString() + ".html");
    if (!file.open(QFile::ReadOnly | QFile::Text)) {
        statusBar()->showMessage("Cannot open file: " + file.fileName());
        return; // nothing to read if the file could not be opened
    }
    QTextStream textStream1(&file);
    QString string = "<meta http-equiv='Content-Type' content='text/html; charset=utf-8' /><link rel='stylesheet' type='text/css' href='Data/Accessories/qss.css' />";
    string += textStream1.readAll();
    ui->textBrowser->setHtml(string);
}
However, if I do not read the Unicode content from an HTML file but pass it directly as the argument, the Unicode characters are displayed correctly. For example, the following works fine:
ui->textBrowser->setHtml("<b>akko- sati (ā + kruś), akkhāti (ā + khyā), abbahati (ā + bṛh)</b>");
How can I read Unicode content from HTML files and show it in the QTextBrowser?
I would be very thankful if someone could point out the buggy parts of my code or suggest a better way to solve my problem.
You read the file's bytes into a QString without telling the program which bytes correspond to which Unicode characters, i.e. you don't specify the "encoding", also known as the "codec".
To debug your problem, ask QTextStream which codec it uses by default:
QTextStream textStream1(&file);
qDebug() << textStream1.codec()->name();
On my Linux system, that is already "UTF-8", but it might be different on your system. To force QTextStream to interpret the input as UTF-8, call QTextStream::setCodec, i.e. textStream1.setCodec("UTF-8"), before calling readAll().

Load and display QString with proper encoding

I am trying to load a name that contains several special characters from a file and, if it is present (it looks like meno: Marek Ružička/), display it. The code is here:
QFile File("info/"+meno+".txt");
File.open(QIODevice::ReadOnly);
QVariant Data(File.readAll());
QString in = Data.toString(), pom;
if(in.contains("meno:")){
    pom = in.split("meno:").at(1);
    pom=pom.split("/").at(0);
    ui->label_meno->setText(trUtf8("Celé meno: ")+pom);
}
The part trUtf8("Celé meno: ") displays well, but I can't find out how to display the string in pom. On its own it looks like Marek RužiÄka, and using the toUtf8() function makes it Marek RuþiÃÂka. I've also tried converting it to std::string, but that doesn't work either. I am not sure whether the conversion from QFile to QVariant to QString is right. If this causes the problem, how do I read the data properly?
Try this:
QTextCodec* utf = QTextCodec::codecForName("UTF-8");
QByteArray data = <<INPUT QBYTEARRAY>>; // the raw bytes, e.g. the result of File.readAll()
QString utfString = utf->toUnicode(data);
qDebug() << utfString;
One of the right ways is to use QTextStream for reading, which lets you specify the UTF-8 codec as follows (here in is a QTextStream opened on the file, not the QString from the question):
in.setCodec("UTF-8");
See the documentation for further details:
void QTextStream::setCodec(const char * codecName)
Sets the codec for this stream to the QTextCodec for the encoding specified by codecName. Common values for codecName include "ISO 8859-1", "UTF-8", and "UTF-16". If the encoding isn't recognized, nothing happens.
Example:
QTextStream out(&file);
out.setCodec("UTF-8");
Another right way is to fix your current code without QTextStream, using the dedicated QString method:
QString in = QString::fromUtf8(File.readAll()), pom;
Note that you may want to add more error handling than the code has now, for example checking the return value of File.open().

Detect text file encoding

In my program I load plain text files supplied by the user:
QFile file(fileName);
file.open(QIODevice::ReadOnly);
QTextStream stream(&file);
const QString &text = stream.readAll();
This works fine when the files are UTF-8 encoded, but some users try to import Windows-1252 encoded files, and if they have words with special characters (for example "è" in "boutonnière"), those will show incorrectly.
Is there a way to detect the encoding, or at least distinguish between UTF-8 (possibly without BOM), and Windows-1252, without asking the user to tell me the encoding?
It turns out that auto-detecting the encoding is impossible in the general case.
However, there is a workaround that at least falls back to the system locale if the text is not recognized as UTF-8/UTF-16/UTF-32. It uses QTextCodec::codecForUtfText(), which looks for a UTF-8, UTF-16 or UTF-32 byte order mark at the start of the byte array and returns the supplied default codec if it finds none.
Code to do it:
QTextCodec *codec = QTextCodec::codecForUtfText(byteArray, QTextCodec::codecForName("System"));
const QString &text = codec->toUnicode(byteArray);
Update
The above code will not detect UTF-8 without BOM, however, as codecForUtfText() relies on the BOM markers. To detect UTF-8 without BOM, see https://stackoverflow.com/a/18228382/492336.
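To make the BOM dependence concrete, here is a Qt-free sketch of a byte order mark check. It is only an illustration of the idea, not Qt's implementation of codecForUtfText(); the function name encodingFromBom and its string return values are invented for this example.

```cpp
#include <string>

// Hypothetical helper, not part of Qt: names the encoding announced by a
// byte order mark at the start of `data`, or returns "" if there is none.
std::string encodingFromBom(const std::string& data) {
    auto startsWith = [&data](const std::string& bom) {
        return data.compare(0, bom.size(), bom) == 0;
    };
    // UTF-32 is tested before UTF-16 because the UTF-32LE mark begins
    // with the same two bytes as the UTF-16LE mark.
    if (startsWith(std::string("\x00\x00\xFE\xFF", 4))) return "UTF-32BE";
    if (startsWith(std::string("\xFF\xFE\x00\x00", 4))) return "UTF-32LE";
    if (startsWith("\xEF\xBB\xBF")) return "UTF-8";
    if (startsWith("\xFE\xFF")) return "UTF-16BE";
    if (startsWith("\xFF\xFE")) return "UTF-16LE";
    return "";
}
```

A UTF-8 file saved without a BOM, such as the bytes of "boutonnière" on their own, gives "" here, which is why a BOM-based check sends such a file to the fallback codec even though it is valid UTF-8.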
This trick works for me, at least so far. This method does not require BOM to work:
QTextCodec::ConverterState state;
QTextCodec *codec = QTextCodec::codecForName("UTF-8");
const QByteArray data(readSource());
const QString text = codec->toUnicode(data.constData(), data.size(), &state);
if (state.invalidChars > 0)
{
    // Not valid UTF-8: fall back to the codec of the system locale
    QTextCodec *localeCodec = QTextCodec::codecForLocale();
    if (!localeCodec)
        return;
    // Reuse the bytes already read instead of reading the source again
    ui->textBrowser->setPlainText(localeCodec->toUnicode(data));
}
else
{
    ui->textBrowser->setPlainText(text);
}
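The same "try UTF-8 first, otherwise fall back" idea can be sketched without Qt as a plain validity check on the bytes. The function isValidUtf8 below is a hypothetical helper written for this illustration. It is not how QTextCodec counts invalidChars internally, only a check in the same spirit.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Qt: returns true if `s` is well-formed
// UTF-8 (no stray or missing continuation bytes, no overlong forms, no
// surrogates, nothing above U+10FFFF).
bool isValidUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;   // continuation bytes expected after `lead`
        unsigned long cp;    // code point being assembled
        unsigned long min;   // smallest code point allowed for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        min = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min = 0x10000; }
        else return false;   // stray continuation byte or invalid lead byte
        if (s.size() - i <= extra) return false;  // sequence is cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF) return false;     // overlong or too large
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        i += extra + 1;
    }
    return true;
}
```

"boutonnière" encoded as Windows-1252 contains the single byte 0xE8 followed by "re", which fails the check, so the caller would switch to the fallback codec. Pure ASCII text passes, which is harmless because ASCII decodes identically in both encodings. A Windows-1252 text whose high bytes happen to form valid UTF-8 sequences would be misdetected, so this remains a heuristic.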

Replace single quote in QFile file name

I want to open a file whose name contains a single quote, but I can't open it.
File name example : QFile file("my'file.example")
I've tried with file.fileName().replace("\'", "\\\'") but it's the same result.
In a C++ string literal, "\'" is just the single-quote character, so your call replaces ' with \', which is not part of the real file name; a single quote needs no escaping in a file name. Furthermore, QFile::fileName() returns a copy of the file name, so any modification (like replace) is made on the copy and then discarded. To change the file name (before opening the file), use
file.setFileName(file.fileName().myModificationOperation())
Have you tried with QFile file("my\'file.example")?
To test your parameter, use the static call:
QString filename = "my\'file.example";
bool okay = QFile::exists(filename);
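As a quick check of what the escapes in this thread actually produce, here is a small sketch in standard C++ (std::string rather than QString, so it runs without Qt). The function names are invented for this illustration.

```cpp
#include <string>

// Both spellings of the literal produce the same 15 characters: the
// backslash in \' belongs to the C++ source, not to the string.
bool escapedQuoteIsPlainQuote() {
    const std::string plain = "my'file.example";
    const std::string escaped = "my\'file.example";
    return plain == escaped;
}

// Hypothetical helper mirroring replace("\'", "\\\'"): every ' becomes
// the two characters \ and '.
std::string backslashQuotes(const std::string& name) {
    std::string out;
    for (char c : name) {
        if (c == '\'') out += '\\';
        out += c;
    }
    return out;
}
```

So writing "my\'file.example" changes nothing compared to "my'file.example", while the replace call from the question would turn the 15-character name into a 16-character name containing a real backslash, which names a different file.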