How to load large data from txt file in Qt - c++

I need to load a text file containing 5 million entries (strings: one 9-character word per line) into a QVector as fast as possible. The code works, but when the user hits upload, the application takes 3-5 seconds to load the data before it can be processed further. I need to reduce this loading time. What is the right approach? I'm OK with Qt/STL/Boost, though I prefer Qt. The code I'm using for this task is the one suggested in the Qt documentation:
QFile file("in.txt");
if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
return;
QTextStream in(&file);
while (!in.atEnd()) {
QString line = in.readLine();
process_line(line);
}

Try this.
I tested it and it read the file in 2.1 seconds.
I reserve the vector's capacity before reading and use QElapsedTimer to measure the reading time.
void MainWindow::readDataText()
{
    const QString filePath = "F:\\Qt\\Big_File\\Big_File\\data.txt";
    QVector<qint64> vector; // local here; make it a member if the data is needed later
    vector.reserve(5000000); // avoid repeated reallocation while appending
    QFile readFile(filePath);
    if (!readFile.open(QFile::ReadOnly | QFile::Text))
    {
        // Can't open file.
        return;
    }
    QElapsedTimer timer;
    timer.start();
    // Read until the end of the file instead of assuming exactly 5000000 lines.
    while (!readFile.atEnd())
    {
        // readLine() keeps the trailing '\n', so trim before converting.
        vector.append(readFile.readLine().trimmed().toInt());
    }
    const qint64 time = timer.elapsed(); // elapsed time in milliseconds
    ui->txtReadTimeText->setText(QString::number(time));
    readFile.close();
}
Loading would also be faster if the file were stored in a binary format.
Another solution is to use the readAll() function, which read the file in 116 milliseconds in my test, and to process the data (split it by '\n') later, like this:
void MainWindow::readDataText()
{
    const QString filePath = "D:\\ProjectTest\\ProjectTest\\data.txt";
    QFile readFile(filePath);
    if (!readFile.open(QFile::ReadOnly | QFile::Text))
    {
        // Can't open file.
        return;
    }
    QElapsedTimer timer;
    timer.start();
    // readAll() allocates a buffer for the whole file, so no reserve() is needed.
    const QByteArray data = readFile.readAll();
    const qint64 time = timer.elapsed(); // elapsed time in milliseconds
    ui->txtReadTimeText->setText(QString::number(time));
    readFile.close();
}
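The "split by '\n' later" step can be sketched with the standard library alone. This is a minimal illustration, not Qt code: the function name splitLines and its expectedLines parameter are made up for the example, and it assumes C++17 and a buffer that has already been read into memory (for instance the bytes returned by readAll()).

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Splits an in-memory buffer into lines without copying the text.
// The returned views point into `buffer`, so the underlying storage
// must outlive them. (Hypothetical helper, for illustration only.)
std::vector<std::string_view> splitLines(std::string_view buffer,
                                         std::size_t expectedLines = 0)
{
    std::vector<std::string_view> lines;
    lines.reserve(expectedLines); // e.g. 5000000 for the file in the question

    std::size_t start = 0;
    while (start < buffer.size()) {
        std::size_t end = buffer.find('\n', start);
        if (end == std::string_view::npos)
            end = buffer.size(); // last line without a trailing newline

        std::string_view line = buffer.substr(start, end - start);
        if (!line.empty() && line.back() == '\r')
            line.remove_suffix(1); // tolerate Windows line endings

        lines.push_back(line);
        start = end + 1;
    }
    return lines;
}
```

Because each entry is only a view into the original buffer, no per-line allocation happens, which is usually where most of the time goes with 5 million short lines.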

Your example code implicitly decodes the text. It reads 8-bit encoded text from the file and converts it to QString, which internally uses 16-bit Unicode encoding.
You will probably gain a big speedup if, instead of using QTextStream, you use plain QFile directly and read from it with its readLine method, which returns a QByteArray, in other words the "raw" file contents. The purpose of doing it this way is to avoid creating QString objects for the entire file contents.
If you have 5 million lines, you will also get a significant memory saving if you store them in memory as QByteArray instead of QString. Convert to QString only when you are actually going to display the text in the GUI.
Note: Be aware of text encoding! Text in a file is always encoded, even if English speakers in particular might not notice it. The most straightforward encoding is 7-bit ASCII; a lot of pure English text is actually this, and almost every encoding, including UTF-8, is a superset of 7-bit ASCII, so a 7-bit ASCII file can be loaded using almost any encoding. But for multilingual text, you need to know which encoding the file uses, or you will get accented and other special characters, like ÄÅÁÀÃ, wrong. Unicode encodings such as UTF-8 can store "everything", while other encodings such as Latin1 are designed for specific language families.
Note 2: QByteArray corresponds to std::string for most purposes. QString is more like std::wstring. I'm not saying these are identical 1:1 matches, but it helps to think of them as similar.

Related

how to get usable data between -1 and 1 from QAudioBuffer? Qt6

I would like to know how to efficiently extract data from a QAudioBuffer object. I have a wav audio file that I am decoding with a QAudioDecoder object. I want to extract the results of the decoding contained in the QAudioBuffer object to apply filtering operations on it and finally send the filtered data to a subclass of QIODevice to listen to the result in real time.
As a first step, I am running a test to make sure the extraction works: I save the data contained in the QAudioBuffer object to a TXT file. But I encounter two problems:
The resulting TXT file contains only characters, no numbers.
When I plot the signal from the TXT file in MATLAB, I get the shape of the original audio signal (the one in the WAV file), but the amplitudes are too large; they should be between -1 and 1.
Can you please tell me how to extract the data so that I can apply a filter to it, and how to get values between -1 and 1?
I use Qt 6.4.
Thanks in advance.
The code and the slot:
QAudioFormat *format_decoder;
format_decoder = new QAudioFormat;
format_decoder->setSampleRate(44100);
format_decoder->setChannelCount(1);
format_decoder->setSampleFormat(QAudioFormat::Int16);
QAudioDecoder decoder;
decoder.setSource(filenameSource);
decoder.setAudioFormat(*format_decoder);
decoder.start();
QObject::connect(&decoder, &QAudioDecoder::bufferReady, this, &MainWindow::test_copy_to_txt);
the slot
void MainWindow::test_copy_to_txt(){
QAudioBuffer audioBuffer = decoder.read();
const qint16* samples = audioBuffer.constData<qint16>(); // Signal shape ok, but not the amplitudes
QFile outputFile(filenameTest1);
if(!outputFile.open(QIODevice::WriteOnly|QIODevice::Append)){
qDebug() << "ERROR";}
QTextStream out(&outputFile);
for (int i = 0; i < audioBuffer.sampleCount(); ++i) {
out << samples[i] << "\n"; // only characters, no numbers.
}
outputFile.close();
}
Another question: can you recommend documentation other than the Qt site that gives more detail on audio processing with Qt, in particular on how the classes interact with each other? An example of why I am looking for such documentation is the pure virtual function qint64 readData(char *data, qint64 maxSize) from QIODevice. For my project I will have to reimplement it, but I would like to know which function calls it and how the maxSize parameter is determined.
Thank you for your answers. I followed your recommendations and everything is ok now.
Here is the corrected code
out << static_cast< float >(samples[i]) / std::numeric_limits<qint16>::max() << "\r\n";
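The scaling in the corrected line can be sketched without Qt as follows. The function names are invented for this example. Note one deliberate difference from the line above: dividing by the maximum value (32767) maps the most negative sample, -32768, slightly below -1, so this sketch divides by 32768 instead, which yields values in [-1, 1). Either convention is common; which one is appropriate depends on the filter that consumes the data.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Maps one signed 16-bit PCM sample to a float in [-1.0, 1.0).
// 32768 is the magnitude of the most negative qint16/int16_t value.
float normalizeSample(std::int16_t sample)
{
    return static_cast<float>(sample) / 32768.0f;
}

// Converts a block of raw samples (e.g. what constData<qint16>() points to)
// into normalized floats that a filter can work on.
std::vector<float> normalizeBuffer(const std::int16_t* samples, std::size_t count)
{
    std::vector<float> result;
    result.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        result.push_back(normalizeSample(samples[i]));
    return result;
}
```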

QT diacritic characters from Base64

I have a problem getting diacritic characters back from strings encoded in Base64 in Qt. I create a string, encode it with Base64, and save it to a file. Then I want to decode the characters from the opened file. Here is how I do it:
void MainWindow::on_treeWidget_2_doubleClicked(const QModelIndex &index){
encode("([ęśćźół35:11");
decode();
}
void MainWindow::encode(QString input){
QString item_to_change = input.toUtf8().toBase64();
QString filename="output.txt";
QFile file( filename );
if ( file.open(QIODevice::ReadWrite) )
{
QTextStream stream( &file );
stream << item_to_change;
}
}
void MainWindow::decode(){
QFile input("output.txt");
if (input.open(QIODevice::ReadOnly)) {
data_in = input.readAll();
}
QString strRestored(QByteArray::fromBase64(data_in));
qDebug() << strRestored;
}
Doing this, I get only ([esczol35:11 instead of ([ęśćźół35:11
Please help me to get all chars as entered at the beginning.
Thanks
I suspect the problem isn't in the Base64 encoding/decoding because that's going to return exactly what you put in every time; that's the point of it. My belief is that your call to your "encode" method is converting the string literal, or that the toUtf8 is converting it incorrectly. Print out each of those steps using qDebug to see where the problem is, and then you'll probably need to look at the QTextCodec class to set up the conversions properly.
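To illustrate why Base64 itself is unlikely to be the culprit, here is a small hand-written encoder and decoder (for demonstration only; in a Qt program QByteArray::toBase64 and QByteArray::fromBase64 do this job). The function names are made up for the sketch. It operates purely on bytes, so any UTF-8 sequence that goes in comes out unchanged; characters can only be lost where text is converted to or from bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Standard Base64 alphabet (RFC 4648).
const std::string kBase64Alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes arbitrary bytes as Base64 text with '=' padding.
std::string base64Encode(const std::string& bytes)
{
    std::string out;
    std::uint32_t acc = 0; // bit accumulator
    int bits = 0;          // number of not yet emitted bits in acc
    for (unsigned char c : bytes) {
        acc = (acc << 8) | c;
        bits += 8;
        while (bits >= 6) {
            bits -= 6;
            out.push_back(kBase64Alphabet[(acc >> bits) & 0x3F]);
        }
    }
    if (bits > 0) // flush the remaining 2 or 4 bits, left-aligned
        out.push_back(kBase64Alphabet[(acc << (6 - bits)) & 0x3F]);
    while (out.size() % 4 != 0)
        out.push_back('=');
    return out;
}

// Decodes Base64 text back to bytes; stops at padding or any
// character outside the alphabet.
std::string base64Decode(const std::string& text)
{
    std::string out;
    std::uint32_t acc = 0;
    int bits = 0;
    for (char c : text) {
        const std::size_t index = kBase64Alphabet.find(c);
        if (index == std::string::npos)
            break;
        acc = (acc << 6) | static_cast<std::uint32_t>(index);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<char>((acc >> bits) & 0xFF));
        }
    }
    return out;
}
```

Since the decoded result is again a sequence of UTF-8 bytes, the step to check in the Qt code is how those bytes are turned back into a QString, for example with QString::fromUtf8.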

How to write text to a text file only if not already written

I would like to add text to a text file only if the text doesn't already exist in the text file. My implementation below adds text even if it already exists. How can I fix my implementation to only add new non-existent items?
My implementation so far:
WriteToFile::WriteToFile(QString data)
{
path += "C:/Data.txt";
QFile file(path);
if ( file.open(QFile::Append) )
{
QTextStream in (&file);
QString line;
do {
line = in.readAll();
qDebug() << in.readLine();
if (!line.contains(data)) {
QTextStream stream( &file );
data += "\r\n";
stream << data << endl;
}
} while (!line.isNull());
}
}
You will have to either:
parse the entire file and extract all paths from it, or
keep track of all paths written to the file, to avoid parsing it again and again.
From there it is simple: create a QSet<QString> writtenSoFar and, for every path, check whether the set contains it. If it does, skip writing; if not, write the path and add it to the set. In the first case you have to parse the file and fill the set just to make a single check, which is wildly inefficient, so it is better to keep track of the paths as you go.
The set is important for good lookup performance. It is quite fast because it is hash based; it is essentially a value-less QHash.
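A minimal standard-library sketch of this idea follows, with std::unordered_set standing in for QSet<QString>. The class name UniqueLineWriter is invented for the example. It assumes one entry per line: the file is parsed once on construction, and the set is then kept up to date on every write.

```cpp
#include <fstream>
#include <string>
#include <unordered_set>

// Appends lines to a text file, skipping lines that are already present.
// (Hypothetical helper class, for illustration only.)
class UniqueLineWriter {
public:
    explicit UniqueLineWriter(const std::string& path)
    {
        // Parse the existing file once to learn what was already written.
        std::ifstream in(path);
        std::string line;
        while (std::getline(in, line))
            written_.insert(line);
        in.close();

        out_.open(path, std::ios::app); // creates the file if it is missing
    }

    // Writes `line` only if it has not been written before.
    // Returns true when the line was actually appended.
    bool write(const std::string& line)
    {
        if (!written_.insert(line).second)
            return false; // already in the set, so already in the file
        out_ << line << '\n';
        out_.flush();
        return true;
    }

private:
    std::unordered_set<std::string> written_;
    std::ofstream out_;
};
```

Each call to write() costs one hash lookup on average, regardless of how large the file has grown.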

QT: Finding and replacing text in a file

I need to find and replace some text in a text file. I've googled and found out that the easiest way is to read all the data from the file into a QStringList, find and replace the exact line of text, and then write all the data back to the file. Is this the shortest way? Can you provide an example, please?
UPD1 my solution is:
QString autorun;
QStringList listAuto;
QFile fileAutorun("./autorun.sh");
if(fileAutorun.open(QFile::ReadWrite |QFile::Text))
{
while(!fileAutorun.atEnd())
{
autorun += fileAutorun.readLine();
}
listAuto = autorun.split("\n");
int indexAPP = listAuto.indexOf(QRegExp("*APPLICATION*",Qt::CaseSensitive,QRegExp::Wildcard)); //searching for string with *APPLICATION* wildcard
if (indexAPP >= 0) // indexOf returns -1 when no line matches
listAuto[indexAPP] = *(app); //replacing string on QString* app
autorun = listAuto.join("\n"); // from QStringList to QString
fileAutorun.seek(0);
fileAutorun.resize(0); // discard the old contents so no stale tail is left behind
QTextStream out(&fileAutorun);
out << autorun; //writing to the same file
fileAutorun.close();
}
else
{
qDebug() << "cannot read the file!";
}
If the required change is, for example, to replace the 'ou' with the American 'o' such that
"colour behaviour flavour neighbour" becomes "color behavior flavor neighbor", you could do something like this: -
QByteArray fileData;
QFile file(fileName);
file.open(QIODevice::ReadWrite); // open for read and write
fileData = file.readAll(); // read all the data into the byte array
QString text = QString::fromUtf8(fileData); // decode to a text string for easy string replace
text.replace(QString("ou"), QString("o")); // replace text in string
const QByteArray newData = text.toUtf8();
file.seek(0); // go to the beginning of the file
file.write(newData); // write the new text back to the file
file.resize(newData.size()); // truncate, since the new text is shorter than the old
file.close(); // close the file handle.
I haven't compiled this, so there may be errors in the code, but it gives you the outline and general idea of what you can do.
To complete the accepted answer, here is tested code. QByteArray needs to be used instead of QString.
QFile file(fileName);
file.open(QIODevice::ReadWrite);
QByteArray text = file.readAll();
text.replace(QByteArray("ou"), QByteArray("o"));
file.seek(0);
file.write(text);
file.resize(text.size()); // cut off leftover bytes when the new text is shorter
file.close();
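One detail worth spelling out: when the replacement makes the text shorter, writing it back from offset 0 leaves the tail of the old contents in the file unless the file is truncated. In Qt, QFile::resize can do the truncation. The sketch below shows the same read, replace, and rewrite cycle with the standard library only; the function name replaceInFile is invented for the example, and it reopens the file with std::ios::trunc so nothing stale remains.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Replaces every occurrence of `from` with `to` in the file at `path`.
// Returns false if the file cannot be read or written.
// (Hypothetical helper, for illustration only.)
bool replaceInFile(const std::string& path,
                   const std::string& from,
                   const std::string& to)
{
    if (from.empty())
        return false; // an empty search string would never advance

    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;
    std::string text((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    in.close();

    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size(); // continue after the text just inserted
    }

    // trunc discards the old contents, so a shorter result leaves no tail.
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        return false;
    out << text;
    return static_cast<bool>(out);
}
```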
I have been using regular expressions in a batch file with sed.exe (from GnuWin32, http://gnuwin32.sourceforge.net/). It's good enough for replacing a single piece of text.
By the way, the regexp syntax there is not simple. Let me know if you would like an example script.

How to avoid reading "\n" when using QFile::readAll function

I have a "sequence.dat" file that contains "1"s and "-1"s, one per line. I am trying to read the file as follows:
QFile sequence("Sequences.dat");
sequence.open(QIODevice::ReadOnly);
QByteArray data = sequence.readAll();
for(int i=0; i<29; i++){
signedNo[i] = data[i]; // debugging breaking point
}
sequence.close();
However, at the debugging breakpoint, the QByteArray "data" contains "1, -, 1, \n" instead of "1, -1" ...
Is there a way to read a whole line at once instead of each byte individually? And ...
if there is not, how can I tell the readAll function to skip the "\n"? (This is not an optimal solution, because I also want to read "-1" as one value and not "-" and "1" separately.)
QFile::readAll() returns a byte array which contains each and every byte of the file as a separate element.
For your use case, you need to read the file line by line.
The QFile documentation shows some approaches how to do this, for example:
QVector<int> elements;
QFile sequence("Sequences.dat");
if (!sequence.open(QIODevice::ReadOnly | QIODevice::Text))
return;
QTextStream in(&sequence);
while (!in.atEnd()) {
QString line = in.readLine();
elements.append(line.toInt());
}
Although this sample is from the Qt documentation, I would recommend checking the return value of in.readLine(), which is a null QString once the end of the file has been reached, instead of using atEnd().
You could read line by line, and you could process it right after you read the line:
int i = 0;
while (!sequence.atEnd()) {
QByteArray line = sequence.readLine();
signedNo[i] = line.trimmed().toInt(); // parse the whole line ("1" or "-1") as one number
i++;
}