Is it possible to mix QTextStream and QDataStream? - c++

Needing to read a mixed text/binary file, I thought using QTextStream and QDataStream together would be the most effective approach.
The file looks like this:
Some line of text
Another line of text
42
<100 bytes of binary data>
12
<100 bytes of binary data>
... etc. (an int in a line, then 100 bytes of binary data in a line, and so on)
Here is the initialization, variables etc.:
// a QFile named in is already opened successfully as binary (without QIODevice::Text)
QTextStream stream(&in);
QDataStream data(&in);
int nr;
int nr_bytes;
char buffer[200];
First I tested whether reading from one stream advances the other. If I read 10 bytes with data.readRawData() from the file, then stream.readLine() will read "of text", so it works!
However, if I do the following, from the beginning of the example file:
stream.readLine();
stream.readLine();
for (/*...*/)
{
    stream >> nr;
    stream.readLine();
    nr_bytes = data.readRawData(buffer, 100);
    stream.readLine();
}
it does not work, and the buffer remains empty. Strangely, the numbers (42, 12, etc.) are read correctly into nr, no matter how many bytes I read with data.readRawData(). It can be 1000 bytes, and it still does not seem to read anything. The value in nr_bytes, however, indicates that the bytes were successfully read! Stranger still, the last readLine() in the loop actually reads the binary data (at least until it encounters a zero, a line feed, or other special characters). This means that data.readRawData() did not read anything at all, yet it still reports the requested number of bytes in its return value.
Does this mean I cannot use QTextStream and QDataStream together, or am I doing something else wrong?
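For reference, here is a sketch that sidesteps the issue by doing all reads through QFile itself, so only one buffering layer is involved (QTextStream keeps its own internal read buffer, see the 16 KB constant quoted further down, so interleaving it with a second stream on the same device can leave the device position out of sync). The file name and record layout are assumptions based on the description above:
#include <QFile>
#include <QByteArray>

int main() {
    QFile in("mixed.dat"); // hypothetical file name
    if (!in.open(QIODevice::ReadOnly)) // binary mode, no QIODevice::Text
        return 1;
    in.readLine(); // "Some line of text"
    in.readLine(); // "Another line of text"
    char buffer[100];
    while (!in.atEnd()) {
        QByteArray numberLine = in.readLine(); // e.g. "42\n"
        int nr = numberLine.trimmed().toInt();
        qint64 nr_bytes = in.read(buffer, sizeof buffer); // binary payload
        in.getChar(nullptr); // discard the '\n' after the binary block
        Q_UNUSED(nr); Q_UNUSED(nr_bytes); // process nr and buffer here
    }
    return 0;
}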

Related

How can I skip N lines of a QFile without temporarily storing them in QStrings?

Basically, if I call QFile::readLine, the entire line of the QFile will be copied into a char* or a QByteArray. If I want to skip 999 lines to go straight to the line of interest (the 1,000th one), then I will be copying the first 999 lines for no reason, whereas I just want to skip them.
I know that istream::ignore enables the user to skip any number of characters until the delimiter is found, so
#include <fstream>
#include <limits>
#include <string>

std::ifstream file("file.txt");
for (auto i = 0u; i < 999u; ++i)
    file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
std::string str;
getline(file, str); // The 1,000th line is copied into str
would make you go straight to the 1,000th line without wasting any time copying and pasting. How can I do the same thing with QFile?
Qt has no API for seeking a file to the next occurrence of a specific byte without outputting the read data.
You can get quite close, though:
QFile has QIODevice::readLine(char *data, qint64 maxSize), which reads into a preallocated buffer and could be used like this:
QFile f("..."); f.open(...);
int maxSize = 1024; // guess that 1 KB will be enough per line
QByteArray lineData(maxSize, '\0');
int skipLines = 100;
// check skipLines first, so the 101st line is not consumed by the condition
while (skipLines > 0 && f.readLine(lineData.data(), maxSize) > 0) {
    --skipLines;
}
This overload of readLine() reuses the preallocated buffer.
You can see that the critical part here is guessing the best preallocation size. If a line is longer than the guessed size, you will skip fewer than 100 lines, because each longer line takes several reads.
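A variant of the loop above (just a sketch, reusing the same variables) compensates for that: a line only counts as skipped when its newline was actually consumed, so an over-long line takes several reads but is still counted once:
qint64 n;
while (skipLines > 0 && (n = f.readLine(lineData.data(), maxSize)) > 0) {
    if (lineData.at(int(n) - 1) == '\n') // full line, newline consumed
        --skipLines;
}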
QTextStream uses an internal buffer size of 16 KB:
from qtextstream.cpp:
static const int QTEXTSTREAM_BUFFERSIZE = 16384;
QIODevice uses the same buffer size:
from qiodevice_p.h:
#define QIODEVICE_BUFFERSIZE Q_INT64_C(16384)
Sidenote:
QTextStream also has readLineInto(QString *line, qint64 maxlen = 0), which reuses line's existing allocation when the read line fits into line->capacity() and reallocates it otherwise. But, because of encoding, reading into a QString is always slower than reading into a QByteArray.
A function like readLineInto(...) doesn't exist for QByteArray, though.
The solution using QIODevice::getChar(char *c) (proposed in the OP comments) is suitable, too: it uses the same internal read buffer as readLine and has a bit of per-call overhead, but the caller doesn't have to worry about lines longer than the chosen buffer size.
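For completeness, a minimal sketch of that getChar()-based approach (the function name is illustrative):
#include <QIODevice>

// Skip n lines by consuming bytes until '\n', copying nothing.
bool skipLinesByChar(QIODevice &dev, int n) {
    char c;
    while (n > 0 && dev.getChar(&c)) {
        if (c == '\n')
            --n;
    }
    return n == 0; // false if EOF was reached first
}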

Writing preceding zeros with ofstream

I am writing a program that will read and write a file format that dictates the content of the file, byte by byte. The nature of this format is that the first two bytes detail how many bytes are left in that part of the file, followed by another two bytes that indicate what that part of the file actually represents. This pattern is repeated for the length of the file. This means I have to write the exact numbers, padded with preceding zeros, so that each component is exactly the size it needs to be. I have written up a dummy file that illustrates my point:
#include <fstream>
#include <stdint.h>
int main() {
    std::ofstream outputFile;
    outputFile.open("test.txt",
                    std::ios::out | std::ios::ate | std::ios::binary);
    const int16_t HEADER = 0x0002;
    int16_t recordSize = 2 * sizeof(int16_t);
    int16_t version = 0x0258;
    outputFile << recordSize << HEADER << version;
    outputFile.close();
}
which writes a file named "test.txt" whose hex contents are:
34 32 36 30 30
and for those of us that can read straight hex this translates to:
42600
As you can see, the preceding zeros are removed and my record is not what I was hoping it would be. Is there a way to use ofstream to pad my numbers with zeros, as I naively tried to do by using int16_t for all of the writes that I wanted to be exactly two bytes long? Is there another, possibly more stylistically correct way of doing this?
operator<< is for text formatting. You probably want to use .write() instead.
e.g.
outputFile.write(reinterpret_cast<const char*>(&recordSize), sizeof recordSize);
outputFile.write(reinterpret_cast<const char*>(&HEADER), sizeof HEADER); // const char*, since HEADER is const
// ...
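For illustration, a fuller sketch that also pins down the byte order explicitly (the question's format doesn't say which endianness it expects, so little-endian here is an assumption):
#include <cstdint>
#include <fstream>

// Write a 16-bit value as exactly two little-endian bytes.
static void writeU16LE(std::ofstream &out, uint16_t v) {
    const char bytes[2] = { static_cast<char>(v & 0xFF),
                            static_cast<char>((v >> 8) & 0xFF) };
    out.write(bytes, 2);
}

int main() {
    std::ofstream out("test.bin", std::ios::out | std::ios::binary);
    writeU16LE(out, 2 * sizeof(int16_t)); // record size in bytes
    writeU16LE(out, 0x0002);              // HEADER
    writeU16LE(out, 0x0258);              // version
}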

Reading text file in blocks - c++

#include <fstream>
#include <iostream>
#include <string>
using namespace std;

string lineValue;
ifstream myFile("file.txt");
if (myFile.is_open()) {
    //getline(myFile, lineValue);
    //cout<<lineValue;
    while (getline(myFile, lineValue)) {
        cout << lineValue << '\n';
    }
    myFile.close();
}
else cout << "Unable to open file";
The txt file format is like this:
0 1
1 2
2 3
3 4
4 5
5 5
6 6
7 7
8 8
9 9
The above code reads data from a text file line by line, but the file is quite large (10 GB).
So how can I read the data from the file in chunks/blocks, with less I/O and more efficiently?
If you are thinking of reading in large chunks of data, then you will be using a technique called buffering. However, ifstream already provides buffering, so my first step would be to see if you can get ifstream to do the job for you.
I would set a much larger buffer than the default in your ifstream. Something like:
#include <fstream>
#include <memory>

const int BUFSIZE = 65536;
std::unique_ptr<char[]> buffer(new char[BUFSIZE]); // note char[]: array delete
std::ifstream is;
is.rdbuf()->pubsetbuf(buffer.get(), BUFSIZE); // set our buffer before any I/O
is.open(filename.c_str());

const int LINESIZE = 256;
char line[LINESIZE];
if (is) {
    for (;;) {
        is.getline(line, LINESIZE);
        // check for errors and do other work here (and end the loop at some point!)
    }
}
is.close();
Make sure your buffer lives as long as the ifstream object that uses it.
If you find the speed of this is still insufficient, then you can try reading chunks of data with ifstream::read. There is no guarantee it will be faster; you'll have to time and compare the options. You would use ifstream::read something like this:
const int BUFSIZE = 65536;
std::unique_ptr<char[]> buffer(new char[BUFSIZE]);
is.read(buffer.get(), BUFSIZE);
You'll have to take care when writing the code that calls ifstream::read, dealing with the fact that a 'line' of input may get split across consecutive blocks (or even across more than two blocks, depending upon your data and buffer size). That's why modifying ifstream's buffer is the option to try first.
If and only if the text lines are all the same length, you could simply read the file in using std::istream::read().
The size of the block to read would be:
block_size = text_line_length * number_of_text_lines;
If you are brave enough to handle more complexity, or your text lines are not of equal length, you can read an arbitrary number of characters into a vector and process the text from the vector, as in the sketch below.
The complexity comes into play when a text line straddles a block boundary: think of handling the case where only part of a line is available at the end of a block.
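Here is a sketch of that block-wise approach, stitching together lines that straddle block boundaries (the buffer size and processLine() are illustrative):
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

void processLine(const std::string &line) {
    std::cout << line << '\n'; // stand-in for the real per-line work
}

int main() {
    std::ifstream is("file.txt", std::ios::binary);
    std::vector<char> buf(65536);
    std::string carry; // partial line left over from the previous block
    while (is) {
        is.read(buf.data(), buf.size());
        const std::streamsize n = is.gcount();
        std::streamsize start = 0;
        for (std::streamsize i = 0; i < n; ++i) {
            if (buf[i] == '\n') {
                carry.append(buf.data() + start, i - start);
                processLine(carry);
                carry.clear();
                start = i + 1;
            }
        }
        carry.append(buf.data() + start, n - start); // save the partial tail
    }
    if (!carry.empty())
        processLine(carry); // last line without a trailing newline
}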

C++ reading leftover data at the end of a file

I am taking input from a file in binary mode using C++; I read the data into unsigned ints, process them, and write them to another file. The problem is that sometimes, at the end of the file, there might be a little bit of data left that isn't large enough to fit into an unsigned int; in this case, I want to pad the end of the data with 0s, recording how much padding was needed, until it is large enough to fill an unsigned int.
Here is how I am reading from the file:
std::ifstream fin;
fin.open("filename.whatever", std::ios::in | std::ios::binary);
if(fin) {
unsigned int m;
while(fin >> m) {
//processing the data and writing to another file here
}
//TODO: read the remaining data and pad it here prior to processing
} else {
//output to error stream and exit with failure condition
}
The TODO in the code is where I'm having trouble. After the file input finishes and the loop exits, I need to read in the remaining data at the end of the file that was too small to fill an unsigned int. I then need to pad the end of that data with binary zeros, recording how much padding was done so that the data can be un-padded in the future.
How is this done, and is this already done automatically by C++?
NOTE: I cannot read the data into anything but an unsigned int, as I am processing the data as if it were an unsigned integer for encryption purposes.
EDIT: It was suggested that I simply read what remains into an array of chars. Am I correct in assuming that this will read in ALL remaining data from the file? It is important to note that I want this to work on any file that C++ can open for input and/or output in binary mode. Thanks for pointing out that I failed to include the detail of opening the file in binary mode.
EDIT: The files my code operates on are not created by anything I have written; they could be audio, video, or text. My goal is to make my code format-agnostic, so I can make no assumptions about the amount of data within a file.
EDIT: ok, so based on constructive comments, this is something of the approach I am seeing, documented in comments where the operations would take place:
std::ifstream fin;
fin.open("filename.whatever", std::ios::in | std::ios::binary);
if(fin) {
unsigned int m;
while(fin >> m) {
//processing the data and writing to another file here
}
//1: declare char array
//2: fill it with what remains in the file
//3: fill the rest of it until it's the same size as an unsigned int
} else {
//output to error stream and exit with failure condition
}
The question, at this point, is this: is this truly format-agnostic? In other words, are bytes used to measure file size as discrete units, or can a file be, say, 11.25 bytes in size? I should know this, I know, but I've got to ask it anyway.
Are bytes used to measure file size as discrete units, or can a file be, say, 11.25 bytes in size?
No data type can be smaller than a byte, and your file is represented as an array of char, meaning each character is one byte. Thus a file size is always a whole number of bytes.
Here is step one, two, and three as per your post:
while (fin >> m)
{
    // ...
}
fin.clear(); // clear the failbit left by the failed extraction above
std::ostringstream buffer;
buffer << fin.rdbuf(); // read everything that remains in the file
std::string contents = buffer.str();
// pad with zero bytes up to sizeof(unsigned int), recording how much
// padding was added (the remainder is assumed shorter than an unsigned int)
const std::size_t padding = sizeof(unsigned int) - contents.size();
contents.append(padding, '\0');
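As an aside, and plainly a different technique from the question's loop: operator>> does formatted text extraction, so for raw binary an unformatted read() loop is closer to the stated intent. A sketch with the padding bookkeeping included:
#include <cstring>
#include <fstream>
#include <iostream>

int main() {
    std::ifstream fin("filename.whatever", std::ios::in | std::ios::binary);
    if (!fin)
        return 1;
    unsigned int m;
    char chunk[sizeof m];
    std::streamsize padding = 0;
    while (fin.read(chunk, sizeof chunk) || fin.gcount() > 0) {
        const std::streamsize got = fin.gcount();
        padding = sizeof chunk - got;         // 0 except for the last chunk
        std::memset(chunk + got, 0, padding); // zero-pad a short tail
        std::memcpy(&m, chunk, sizeof m);
        // process m and write it to the other file here
        if (padding > 0)
            break; // record `padding` so the data can be un-padded later
    }
    std::cout << "padding bytes added: " << padding << '\n';
}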

How to get number of bytes read from QTextStream

The following code is what I am using to find the number of bytes read from a QFile. With some files it gives the correct file size, but with others it gives me a value that is approximately fileCSV.size()/2. I am sending two files that have the same number of characters in them but different file sizes. Should I use some other objects for reading the QFile?
QFile fileCSV("someFile.txt");
if (!fileCSV.open(QIODevice::ReadOnly | QIODevice::Text))
    emit errorOccurredReadingCSV(this);
QTextStream textStreamCSV(&fileCSV); // use a text stream
qint64 fileCSVSize = fileCSV.size();
qint64 reconstructedCSVFileSize = 0;
while (!textStreamCSV.atEnd())
{
    QString line = textStreamCSV.readLine(); // line of text excluding '\n'
    if (!line.isEmpty())
    {
        reconstructedCSVFileSize += line.size(); // this doesn't always work
        reconstructedCSVFileSize += 2;           // assumes "\r\n" line endings
    }
    else
        reconstructedCSVFileSize += 2;
}
I know that reading the size of the QString is wrong; give me some other solutions if you can.
Thank you.
I guess it is because QString::size() returns the number of characters. If your text file is in UTF-16 and, say, x bytes long, this will correspond to x/2 characters.
Edit: If you want to know the exact size of a read line, you can just use QFile::readLine(). This returns a QByteArray of which the number of bytes can be queried using size().
I made a solution with QByteArray. The solution is:
QFile fileCSV("someFile.txt");
if (!fileCSV.open(QIODevice::ReadOnly | QIODevice::Text))
    emit errorOccurredReadingCSV(this);
while (!fileCSV.atEnd())
{
    QByteArray arrayCSV = fileCSV.readLine();
    reconstructedCSVFileSize += arrayCSV.size();
    QTextStream textStreamCSV(arrayCSV);
    QString line = textStreamCSV.readLine();
}
But there is a problem. Look closely at the files I am sending (files2.zip).
When I read biggerFile.csv with this approach, the first line is properly read: the size of the string is 108, and the number of characters is also 108. The number returned by arrayCSV.size() is 221.
When I read the second line, the size of the string is 50, but the number of characters is 25. The number returned by arrayCSV.size() is 51. When I open the string in the debugger, the string is empty, although its size is 50. I guess this is because the first line is written with one encoding while the other is written with a different encoding, causing QTextStream to behave improperly.
When I read smallerFile.csv, everything is OK. The size of the string is 16, and the number of characters is also 16 (without the \n character). The number returned by arrayCSV.size() is 18.
The second line is also properly read. The size of the string is 25, and the number of characters is also 25. The number returned by arrayCSV.size() is 25.
The first code that I posted reads the strings properly from both files.
There is a similar question: QTextStream behavior searching for a string not as expected. You may check my answer to that one.
Briefly: to calculate correctly, you should mark the beginning of a line with pos() and the end of the line, after reading it, with pos() again. Like this:
qint64 newFileSize = 0;
while (!f.atEnd())
{
    const qint64 begin = f.pos();
    const QString line = f.readLine();
    const qint64 end = f.pos();
    // TODO: your per-line actions here
    // ...
    const qint64 realLengthOfLine = end - begin;
    newFileSize += realLengthOfLine;
}
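A self-contained version of the same idea, as a sketch (the file name is taken from the question; note that QTextStream::pos() may be expensive, since the stream has to reconcile its internal buffer with the device position):
#include <QDebug>
#include <QFile>
#include <QTextStream>

int main() {
    QFile file("someFile.txt");
    if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
        return 1;
    QTextStream f(&file);
    qint64 newFileSize = 0;
    while (!f.atEnd()) {
        const qint64 begin = f.pos();
        const QString line = f.readLine();
        const qint64 end = f.pos();
        newFileSize += end - begin; // bytes consumed by this line
        Q_UNUSED(line);
    }
    qDebug() << "reconstructed size:" << newFileSize;
    return 0;
}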