UTF-8 error in GtkTextView while decoding base64 - c++

I have been trying to figure this out for a few days now. All I am trying to do is decode a base64 string and add it to a Gtk::TextView. Below is the code:
txtbuffer_ = Gtk::TextBuffer::create();
txtview_.set_buffer(txtbuffer_);
const Glib::ustring str = Glib::Base64::decode("YmJi3A==");
txtbuffer_->set_text(str);
When I run the program I get the error:
Gtk-CRITICAL **: gtk_text_buffer_emit_insert: assertion 'g_utf8_validate (text, len, NULL)' failed
This error only occurs with Unicode characters. When the text is ASCII it all works fine.
I have tried three different base64 decoders, I tried using std::string and Glib::ustring with all the different decoders. I also tried using the function Glib::locale_to_utf8(), but that gives me the error terminate called after throwing an instance of 'Glib::ConvertError'. And I tried using Glib::convert with the same error.
I know that Gtk::TextView can display Unicode because if I set the text to a string with Unicode it will display the text.
I read that Gtk::TextView displays text in UTF-8, so I think my problem is that the decoded string is not coded in UTF-8, but I am not sure. So my question is how can I get Gtk::TextView to display the decoded base64?
Added note: I am using version 3.8 of Gtkmm
Tested using version 3.12, same error message
Minimal program:
//test.h
#ifndef TEST_H_
#define TEST_H_
#include <gtkmm.h>
class MainWindow : public Gtk::Window
{
public:
    MainWindow();
    virtual ~MainWindow();
protected:
    Gtk::Box box_main;
    Gtk::TextView txtview_;
    Glib::RefPtr<Gtk::TextBuffer> txtbuffer_;
};
#endif /* TEST_H_ */
//test.cpp
#include "test.h"
MainWindow::MainWindow()
{
    Gtk::Window::add(box_main);
    box_main.pack_start(txtview_);
    txtbuffer_ = Gtk::TextBuffer::create();
    txtview_.set_buffer(txtbuffer_);
    const Glib::ustring str = Glib::Base64::decode("YmJi3A==");
    txtbuffer_->set_text(str);
    Gtk::Window::show_all_children();
}
MainWindow::~MainWindow()
{
}
//main.cpp
#include "test.h"
int main(int argc, char* argv[])
{
    Glib::RefPtr<Gtk::Application> app = Gtk::Application::create(argc, argv, "test.program");
    MainWindow mw;
    return app->run(mw);
}

The reason it was not working is that the string I had encoded was not UTF-8. Thanks to https://mail.gnome.org/archives/gtk-list/2014-April/msg00016.html I found out that the encoding was ISO-8859-1. So there are two possible fixes. The first is to make sure the string is UTF-8 before it is base64-encoded:
const Glib::ustring str2 = Glib::Base64::encode("bbbÜ");
The second is to figure out the original encoding of the decoded string and convert it to UTF-8. For me this worked:
Glib::convert(base64_str, "UTF-8", "ISO-8859-1");
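To illustrate what the ISO-8859-1 to UTF-8 conversion does to the bytes, here is a minimal sketch in plain C++ (no glibmm). The function name latin1_to_utf8 is made up for this example, and this is not how Glib::convert is implemented; it only shows the byte mapping for this one encoding.

```cpp
#include <string>

// Convert ISO-8859-1 (Latin-1) bytes to UTF-8.
// Every Latin-1 byte maps to the Unicode code point with the same value,
// so bytes below 0x80 are copied and the rest become two-byte sequences.
std::string latin1_to_utf8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            utf8.push_back(static_cast<char>(byte));
        } else {
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return utf8;
}
```

With the decoded bytes from the question ('b', 'b', 'b', 0xDC), the lone byte 0xDC becomes the pair 0xC3 0x9C, which is the UTF-8 encoding of Ü and passes UTF-8 validation.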

From documentation:
Note that the returned binary data is not necessarily zero-terminated,
so it should not be used as a character string.
That means the UTF-8 validation may read beyond the end of the data and will, with a likelihood near 1, hit a sequence of bytes that is not valid UTF-8.
But even that did not fix it. It seems that the length is one too long and the last value is just garbage.
So you can either use (which I'd recommend)
std::string stdstr = Glib::Base64::decode (x);
const Glib::ustring str(stdstr.c_str(), stdstr.length()-1);
or
gsize len = 0;
gchar *ret = reinterpret_cast<gchar*>(g_base64_decode (x, &len)); // non-const, so that g_free() accepts it
len --;
const Glib::ustring str(ret, len);
g_free (ret);
So I guess this is a bug in gtk+ (which gtkmm wraps).
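For what it is worth, decoding the sample string by hand suggests that the last byte is not garbage. The sketch below is a small stand-alone base64 decoder in plain C++ (the name base64_decode is made up here, and it is not the glib implementation). It is only meant to show which bytes "YmJi3A==" contains.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Minimal base64 decoder (standard alphabet). Decoding stops at '=' padding
// or at any character outside the alphabet. Returns the raw decoded bytes.
std::string base64_decode(const std::string& input)
{
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::uint32_t buffer = 0;  // holds the most recent 6-bit groups
    int bits = 0;              // number of not yet emitted bits in buffer
    for (char c : input) {
        const std::size_t pos = alphabet.find(c);
        if (c == '=' || pos == std::string::npos)
            break;
        buffer = (buffer << 6) | static_cast<std::uint32_t>(pos);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<char>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}
```

For "YmJi3A==" this yields four bytes: 'b', 'b', 'b' and 0xDC. In ISO-8859-1, 0xDC is Ü, which matches the other answer. So cutting off the last byte makes the validation pass, but it also drops the Ü.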

Related

QString(const char* p) constructor misrepresents non ASCII characters

I am trying to upgrade a larger C++ program from Qt4 to Qt5 or higher and have some problems with legacy code that was written in ISO-LATIN1. From Qt5 on, source code is expected to be in UTF-8, and that is what we tried to do. We use our own string class (let's call it myQString here), which was basically a char* under Qt4 and became a QString-derived class in Qt5. So far so good.
The cases where I still have problems are those where I pass char* variables containing non-ASCII characters (letters with a diaeresis, for example: 'ä', 'Ä', 'ö', 'Ö', etc.) to the myQString class.
I tried to write a mini program that reproduces the problem. I could post some code to make it clearer, but a picture is better in this case:
[Image: debugger output]
Here we can see in the debugger view that the desired end products (cyan and yellow: mstr2, mstr4, qstr2, qstr4), which should store "Äa", misrepresent the first byte "Ä". All of them use the green-marked constructor, myQString(const char* p).
The function that illustrates the problem (in the debugger) is charPointerToQstring(). It is part of the file main.cpp (see the last code block).
If you want to run that "mini" program, here are all four files needed to do so. I am using QtCreator and have a project file, which you can name whatever you like, let's say "testingQString.pro":
QT -= gui
CONFIG += c++17 console
SOURCES += \
    main.cpp \
    myqstring.cpp
HEADERS += \
    myqstring.h
Then we have a "stripped" myQString class, with the two files:
myqstring.h:
#ifndef MYQSTRING_H
#define MYQSTRING_H
#include <QString>
class myQString : public QString
{
public:
    myQString();
    myQString(const QString& str);
    myQString(char c);
    myQString(const char* p);
    myQString(const QByteArray& ba);
};
#endif // MYQSTRING_H
and "stripped" myqstring.cpp:
#include "myqstring.h"
#include <QDebug>
#define ENTER_FUNCTION qDebug() << "========== Entering:" << Q_FUNC_INFO
myQString::myQString()
{
    ENTER_FUNCTION;
}
myQString::myQString(const QString& str) : QString(str)
{
    ENTER_FUNCTION;
}
myQString::myQString(char c) : QString(QChar(c))
{
    ENTER_FUNCTION;
}
myQString::myQString(const char* p) : QString(p)
{
    ENTER_FUNCTION;
}
myQString::myQString(const QByteArray& ba)
{
    ENTER_FUNCTION;
    foreach (auto c, ba) {
#if QT_VERSION_MAJOR == 5
        append(QChar(c));
#endif
#if QT_VERSION_MAJOR == 6
        append(char(c));
#endif
    }
}
The file main.cpp is also "stripped" here and only shows that one specific problem:
#include "myqstring.h"
#include <QDebug>
#include <array>
// -----------------------------------------------------------------------------
// -----------------------------------------------------------------------------
void charPointerToQstring() {
    // case 1 - const char* with string as initialiser
    const char* buf1("Äa");
    myQString mstr1(buf1);
    QString qstr1(buf1);
    // case 2 - char* with char assignment
    const int len = 2;
    char* buf2 = new char[len+1];
    buf2[0] = char(0xC4); // 0xC4 == 196 == AE (umlaut)
    buf2[1] = 'a';
    buf2[len] = '\0';
    myQString mstr2(buf2);
    QString qstr2(buf2);
    // case 3 - str
    myQString mstr3("Äa");
    QString qstr3("Äa");
    // case 4 - std::array<char>
    std::array<char, len+1> stda1;
    stda1[0] = char(0xC4);
    stda1[1] = 'a';
    stda1[len] = '\0';
    myQString mstr4(stda1.data());
    QString qstr4(stda1.data());
    qDebug() << "Set a breakpoint exactly on ME (nr 3) and check the results via Debugger!!!";
}
// -----------------------------------------------------------------------------
// -----------------------------------------------------------------------------
int main(int argc, char *argv[])
{
    Q_UNUSED(argc)
    Q_UNUSED(argv)
    // missing code with more tests here...
    charPointerToQstring();
}
The big question is: why does Qt mishandle a single char in a char* argument, while a string literal argument (with the same content) works? With a char* argument, each char can only range from 0x00 to 0xFF (unsigned). Why not map those to 0x0000 to 0x00FF?
Edit:
The answer from Artyer explains the behavior for buf1 but not for buf2. buf2 is a char[3] { 0xC4, 0x61, '\0' }, which gets converted (with Artyer's help) to a QString with elements QChar { 0x00C4, 0x0061 }. So Qt can easily convert those 0xC4 characters to 0x00C4. In fact, qstr1 shows that it can convert the two chars { 0xC3, 0x84 } of 'Ä' to one correct QChar { 0x00C4 }. With a char* argument, each char can only range from 0x00 to 0xFF (unsigned). Why not map those to 0x0000 to 0x00FF?
And by the way, I can't accept that approach yet because it now breaks mstr1 and mstr3. They then get exactly the "same" elements as buf1 but as QChar (so, without the closing '\0', from char[3] { 0xC3, 0x84, 0x61 } to QChar { 0x00C3, 0x0084, 0x0061 }, but it should be QChar { 0x00C4, 0x0061 }).
What is probably the case is that "Äa" is three UTF-8 encoded bytes in the source file (Equivalent to char[4]{ 0xC3, 0x84, 'a', '\0' }), and the QString constructor expects UTF-8 encoded data.
The 65533 character (U+FFFD) is the replacement character for the invalid UTF-8 data.
Use QString::fromLatin1:
#include <cstring> // for std::strlen
myQString::myQString(const char* p) : QString(QString::fromLatin1(p, std::strlen(p)))
{
    ENTER_FUNCTION;
}
myQString::myQString(const QByteArray& ba) : QString(QString::fromLatin1(ba))
{
    ENTER_FUNCTION;
}
Also consider using QLatin1StringView instead of char* to avoid confusion about the encoding (it is called QLatin1String in older Qt versions).
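The difference between the two interpretations can be shown without Qt. The sketch below is plain C++ with made-up function names (decode_latin1, decode_utf8_simplified). It is not Qt's implementation, and the UTF-8 decoder is deliberately limited to one- and two-byte sequences, which is enough for the bytes in the question.

```cpp
#include <cstddef>
#include <string>

// Latin-1 decoding: every byte becomes the code point with the same value.
std::u16string decode_latin1(const std::string& bytes)
{
    std::u16string out;
    for (unsigned char b : bytes)
        out.push_back(static_cast<char16_t>(b));
    return out;
}

// Simplified UTF-8 decoding, limited to one- and two-byte sequences.
// Anything else (including a lead byte without a valid continuation byte)
// becomes U+FFFD, the replacement character.
std::u16string decode_utf8_simplified(const std::string& bytes)
{
    std::u16string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b < 0x80) {
            out.push_back(static_cast<char16_t>(b));
        } else if (b >= 0xC2 && b <= 0xDF && i + 1 < bytes.size() &&
                   (static_cast<unsigned char>(bytes[i + 1]) & 0xC0) == 0x80) {
            const unsigned char next = static_cast<unsigned char>(bytes[i + 1]);
            out.push_back(static_cast<char16_t>(((b & 0x1F) << 6) | (next & 0x3F)));
            ++i;  // the continuation byte has been consumed
        } else {
            out.push_back(u'\uFFFD');
        }
    }
    return out;
}
```

For buf2 ({ 0xC4, 'a' }) the Latin-1 path gives { 0x00C4, 0x0061 }, while the UTF-8 path gives { 0xFFFD, 0x0061 }, because 0xC4 announces a two-byte sequence and 'a' is not a continuation byte. For buf1 ({ 0xC3, 0x84, 'a' }) the UTF-8 path gives { 0x00C4, 0x0061 } and the Latin-1 path gives { 0x00C3, 0x0084, 0x0061 }. The same bytes cannot be right under both readings, so each call site has to know which encoding its char* carries.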

Trouble with QT timers: "function definition is not allowed here"

I am trying to make a program that takes images and puts them on your wallpaper using a timer, but I kept getting the error "Timers can only be started with QThread", so I am trying to make this timer with more QThread elements (simpler designs like QThread::msleep haven't worked). Currently, my problem is that my calling slot for when the timer goes off is not working where it currently is, but if I put it in any other location, then the program spits out more errors as it is designed to go in that specific spot. The code itself is mainly a copy/paste of a bunch of other code, and I am new to QT, so I may be going about this completely wrong. If I am, I will gladly accept help so I can understand this better!
#include <mainwindow.h>
#include <mythread.h>
QMediaPlayer * BadAppleS = new QMediaPlayer();
int main(int argc, char *argv[])
{
    QApplication app(argc,argv);
    int fileN = 0;
    BadAppleS->setMedia(QUrl("qrc:/SongN/Bad Apple.mp3"));
    BadAppleS->play();
    mythread t;
    t.start();
    if (fileN <= 1625) {
        void mythread::doIt(){ //Error here. No more errors elsewhere, though there may be in this function/signal.
            QString fileNQ = QString::number(fileN);
            QString filepath = (("qrc:/BAPics/scene (") + fileNQ + (")"));
            char path[150];
            wchar_t wtext[20];
            strcpy_s(path, filepath.toStdString().c_str());
            mbstowcs(wtext, path, strlen(path)+1);
            LPWSTR pathp = wtext;
            int result;
            result = SystemParametersInfo(SPI_SETDESKWALLPAPER, 0, pathp, SPIF_UPDATEINIFILE);
            fileN++;
        }
        return app.exec();
    }
}
Thank you for the help!
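On the error message itself: C++ does not allow a function to be defined inside another function, so mythread::doIt() cannot be written inside main(). A member function is declared in its class and defined at namespace scope, and any state it needs (such as the frame counter) has to live in the object instead of in a local variable of main(). Below is a minimal sketch of that structure in plain C++ without Qt. FrameStepper is a hypothetical stand-in for mythread, and the Windows wallpaper call is left out.

```cpp
#include <string>

// Hypothetical stand-in for the asker's mythread class (no Qt here).
class FrameStepper
{
public:
    explicit FrameStepper(int lastFrame) : lastFrame_(lastFrame) {}

    // Declared in the class, defined below at namespace scope.
    bool doIt();

    const std::string& currentPath() const { return path_; }

private:
    int fileN_ = 0;     // state lives in the object, not in a local of main()
    int lastFrame_;
    std::string path_;
};

// A member function definition must sit at namespace scope. Placing it
// inside main() is what triggers "function definition is not allowed here".
bool FrameStepper::doIt()
{
    if (fileN_ > lastFrame_)
        return false;   // no frames left
    path_ = "qrc:/BAPics/scene (" + std::to_string(fileN_) + ")";
    ++fileN_;
    return true;
}
```

In the Qt program, doIt() would be declared as a slot in the class, and a QTimer's timeout() signal could be connected to it, so that each tick advances one frame. That wiring is not shown here.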

Qt convert unicode entities

In QT 5.4 and C++ I try to decode a string that has unicode entities.
I have this QString:
QString string = "file\u00d6\u00c7\u015e\u0130\u011e\u00dc\u0130\u00e7\u00f6\u015fi\u011f\u00fc\u0131.txt";
I want to convert this string to this: fileÖÇŞİĞÜİçöşiğüı.txt
I tried QString's toUtf8 and fromUtf8 methods. Also tried to decode it character by character.
Is there a way to convert it by using Qt?
Qt provides a macro called QStringLiteral for handling string literals correctly.
Here's a full working example:
#include <QString>
#include <QDebug>
int main(void) {
    QString string = QStringLiteral("file\u00d6\u00c7\u015e\u0130\u011e\u00dc\u0130\u00e7\u00f6\u015fi\u011f\u00fc\u0131.txt");
    qDebug() << string;
    return 0;
}
As mentioned in the above comments, you do need to print to a console that supports these characters for this to work.
I have just tested this code:
int main(int argc, char *argv[])
{
    QApplication a(argc, argv);
    QString s = "file\u00d6\u00c7\u015e\u0130\u011e\u00dc\u0130\u00e7\u00f6\u015fi\u011f\u00fc\u0131.txt";
    qDebug() << s.length(); //Outputs: 22
    qDebug() << s; //Outputs: fileÖÇŞİĞÜİçöşiğüı.txt
    return a.exec();
}
This is with Qt 5.4 on Ubuntu, so it looks like your problem is specific to your OS.
#include <QTextDocument>
QTextDocument doc;
QString string = "file\u00d6\u00c7\u015e\u0130\u011e\u00dc\u0130\u00e7\u00f6\u015fi\u011f\u00fc\u0131.txt";
doc.setHtml(string); // to convert entities to text
QString result = doc.toPlainText(); // result = "fileÖÇŞİĞÜİçöşiğüı.txt"
Note that this is not useful in a console application, because QTextDocument needs the GUI module.
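One case the answers above do not cover: in a C++ string literal the compiler already replaces each \uXXXX escape, so nothing is left to decode at run time. If, on the other hand, the text reaches the program with the escapes still written out (a backslash, a 'u' and four hex digits, for example read from a file), they have to be decoded by hand. The sketch below assumes that situation. It is plain C++ with made-up function names, produces UTF-8, and only handles code points in the Basic Multilingual Plane.

```cpp
#include <cstddef>
#include <string>

// Append one code point (up to U+FFFF) to a UTF-8 string.
static void append_utf8(std::string& out, unsigned int cp)
{
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Value of a hex digit, or -1 if c is not one.
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Replace each written-out "\uXXXX" with the UTF-8 encoding of that code
// point. Malformed escapes are copied unchanged. Surrogate pairs are not
// combined, so characters outside the BMP are not handled correctly.
std::string decode_unicode_escapes(const std::string& text)
{
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\\' && i + 5 < text.size() && text[i + 1] == 'u') {
            unsigned int cp = 0;
            bool valid = true;
            for (std::size_t k = i + 2; k < i + 6; ++k) {
                const int v = hex_value(text[k]);
                if (v < 0) { valid = false; break; }
                cp = (cp << 4) | static_cast<unsigned int>(v);
            }
            if (valid) {
                append_utf8(out, cp);
                i += 5;  // skip the rest of the escape
                continue;
            }
        }
        out.push_back(text[i]);
    }
    return out;
}
```

The result is UTF-8, so in a Qt program it could be handed to QString::fromUtf8.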

Wrong output of qDebug() (UTF - 8)

I'm trying to store a string with special characters:
qDebug() << "ÑABCgÓ";
Output (I can't even type the correct output here; some garbage is missing after each Ã):
ÃABCgÃ
I suspect a UTF-8 / Latin1 / ASCII mix-up, but I can't find the setting that controls the encoding for output to console or file. What I have written in my code is "ÑABCgÓ".
(Qt 4.8.5 / Ubuntu 12.04 / C++98)
You could use the static function QString QString::fromUtf8(const char * str, int size = -1), as the sample code below shows. This is one of the main reasons why QString exists.
See the documentation for details:
http://qt-project.org/doc/qt-5.1/qtcore/qstring.html#fromUtf8
main.cpp
#include <QString>
#include <QDebug>
int main()
{
    qDebug() << QString::fromUtf8("ÑABCgÓ");
    return 0;
}
Building (customize for your scenario)
g++ -fPIC -I/usr/include/qt -I/usr/include/qt/QtCore main.cpp -lQt5Core && ./a.out
Output
"ÑABCgÓ"
That being said, depending on your locale, a plain qDebug() << "ÑABCgÓ"; could work as well, as it does here, but it is recommended to be explicit and request UTF-8 handling.
Try this (Qt 4 only; setCodecForCStrings() was removed in Qt 5):
QTextCodec *codec = QTextCodec::codecForName("UTF-8");
QTextCodec::setCodecForCStrings(codec);
qDebug() << "ÑABCgÓ";

How to handle Korean character set between QT & Oracle

I'd like to use Oracle with ODBC.
I can get data from Oracle successfully, but the Korean characters come out broken, like ????.
As many programmers on internet forums suggest, I tried to apply a QTextCodec as shown below.
I tried EUC-KR and other codec names, but nothing changed.
QTextCodec *codec = QTextCodec::codecForName("UTF-8");
QSqlQuery q("select * from temp", db);
q.setForwardOnly(true);
QString contect= "";
while(q.next())
{
    QByteArray name = q.value(0).toByteArray();
    QString age = q.value(1).toString();
    contect = contect + codec->toUnicode(name);
    ui.textEdit->setText(contect);
}
The Oracle-side settings are:
NLS_CHARACTERSET : KO16MSWIN949
NLS_NCHAR_CHARACTERSET : AL16UTF16
NLS_LANG : KOREAN_KOREA.KO16MSWIN949
I'm developing with eclipse (on windows 7) and the default file text encoding is utf-8.
I'll appreciate it if you give me comment.
Thanks.
I think you need to change the codec name: the data arrives in a Korean character set, so the codec must decode from that set rather than from UTF-8.
Try changing your code to:
QTextCodec *codec = QTextCodec::codecForName("cp949");
Since the Wikipedia page for Code page 949 describes it as a non-standard Microsoft version of EUC-KR, you could also try EUC-KR.
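To see what such a decoding step does to the bytes, the same idea can be tried outside Qt with POSIX iconv. This is only a sketch under assumptions: convert_to_utf8 is a made-up helper, and it assumes that the local iconv implementation knows an encoding called "CP949", which is not guaranteed on every system.

```cpp
#include <iconv.h>
#include <cstddef>
#include <string>

// Convert `input`, encoded as `fromEncoding`, to UTF-8 using POSIX iconv.
// Returns false if iconv does not know the encoding or the input is invalid.
bool convert_to_utf8(const std::string& input, const char* fromEncoding,
                     std::string& utf8)
{
    iconv_t cd = iconv_open("UTF-8", fromEncoding);
    if (cd == (iconv_t)(-1))
        return false;

    std::string in = input;                       // iconv takes a non-const pointer
    std::string out(input.size() * 4 + 4, '\0');  // generous upper bound
    char* inPtr = &in[0];
    std::size_t inLeft = in.size();
    char* outPtr = &out[0];
    std::size_t outLeft = out.size();

    const std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    iconv_close(cd);
    if (rc == (std::size_t)(-1))
        return false;

    out.resize(out.size() - outLeft);  // keep only the bytes actually written
    utf8 = out;
    return true;
}
```

For example, the two CP949 bytes 0xC7 0xD1 are the syllable 한 and should come out as the three UTF-8 bytes 0xED 0x95 0x9C. If the bytes are decoded with a UTF-8 codec instead, as in the question, they are not valid input, which would explain the broken characters.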
Try the following program to get the list of text codecs and aliases:
test.cpp
#include <QtCore>
int main(int argc, char** argv)
{
    QCoreApplication app(argc, argv);
    const auto codecs = QTextCodec::availableCodecs();
    for (auto it = codecs.begin(); it != codecs.end(); ++it)
    {
        const auto codec = QTextCodec::codecForName(*it);
        qDebug() << codec->name() << codec->aliases();
    }
    return 0;
}
test.pro
QT += core
SOURCES=test.cpp
QMAKE_CXXFLAGS += -std=c++0x
Note that the program uses auto for brevity, but this requires a C++11 compiler (tested on GCC 4.4).