How to get number of bytes read from QTextStream

I am using the following code to count the bytes read from a QFile. With some files it gives the correct file size, but with others it gives a value that is approximately fileCSV.size()/2. I am attaching two files that contain the same number of characters but have different file sizes. Should I use some other class for reading the QFile?
QFile fileCSV("someFile.txt");
if (!fileCSV.open(QIODevice::ReadOnly | QIODevice::Text))
    emit errorOccurredReadingCSV(this);
QTextStream textStreamCSV(&fileCSV); // use a text stream
qint64 fileCSVSize = fileCSV.size(); // size on disk, in bytes
qint64 reconstructedCSVFileSize = 0;
while (!textStreamCSV.atEnd())
{
    QString line = textStreamCSV.readLine(); // line of text excluding '\n'
    if (!line.isEmpty())
    {
        reconstructedCSVFileSize += line.size(); // this doesn't always work
        reconstructedCSVFileSize += 2;
    }
    else
        reconstructedCSVFileSize += 2;
}
I know that using the size of the QString is wrong; please suggest another solution if you can.
Thank you.

I guess it is because QString::size() returns the number of characters, not bytes. If your text file is encoded as UTF-16 and is, say, x bytes long, that corresponds to about x/2 characters.
Edit: If you want to know the exact size in bytes of a line that was read, you can use QFile::readLine(). It returns a QByteArray, whose size() gives the number of bytes.
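The byte/character mismatch can be illustrated without Qt. The sketch below is plain standard C++, and decodeUtf16Le is a hypothetical helper, not a Qt function; it assumes little-endian UTF-16 input without a byte-order mark. QString stores UTF-16 code units, so its size() counts units in the same way the std::u16string here does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decodes little-endian UTF-16 bytes (no BOM handling)
// into 16-bit code units.
std::u16string decodeUtf16Le(const std::string& bytes) {
    assert(bytes.size() % 2 == 0);  // UTF-16 data always has an even byte count
    std::u16string units;
    units.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        const auto lo = static_cast<unsigned char>(bytes[i]);
        const auto hi = static_cast<unsigned char>(bytes[i + 1]);
        units.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return units;
}
```

Six bytes holding "abc" in UTF-16 decode to three code units, which is the factor of two seen in the question.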

I made a solution with QByteArray:
QFile fileCSV("someFile.txt");
if (!fileCSV.open(QIODevice::ReadOnly | QIODevice::Text))
    emit errorOccurredReadingCSV(this);
qint64 reconstructedCSVFileSize = 0;
while (!fileCSV.atEnd())
{
    QByteArray arrayCSV = fileCSV.readLine();
    reconstructedCSVFileSize += arrayCSV.size(); // bytes, including the line terminator
    QTextStream textStreamCSV(arrayCSV);
    QString line = textStreamCSV.readLine();
}
But there is a problem. Look closely at the files I am attaching (files2.zip).
When I read biggerFile.csv with this approach, the first line is read properly: the size of the string is 108, and the number of characters is also 108. The number returned by arrayCSV.size() is 221.
When I read the second line, the size of the string is 50, but the number of characters is 25. The number returned by arrayCSV.size() is 51. When I inspect the string in the debugger, it appears empty, although its size is 50. I guess this happens because the first line is written with one encoding while the second is written with a different one, which makes QTextStream misbehave.
When I read smallerFile.csv, everything is OK. The size of the string is 16, and the number of characters is also 16 (without the \n character). The number returned by arrayCSV.size() is 18.
The second line is also read properly. The size of the string is 25, and the number of characters is also 25. The number returned by arrayCSV.size() is 25.
The first code that I posted reads the strings properly from both files.

There is a similar question: "QTextStream behavior searching for a string not as expected". You may check my answer there.
Briefly: to calculate the size correctly, record pos() before reading the line and again after reading it, then take the difference. Like this:
qint64 newFileSize = 0;
while (!f.atEnd())
{
    const qint64 begin = f.pos();
    const QString line = f.readLine();
    const qint64 end = f.pos();
    // TODO: your own processing of the line
    // ...
    const qint64 realLengthOfLine = end - begin;
    newFileSize += realLengthOfLine;
}
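The same position-difference idea can be sketched with the standard library, where tellg() plays the role of pos(). This is an analogy rather than Qt code: countBytesByPosition is a hypothetical name, and a real file should be opened in binary mode so that stream positions correspond to byte offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Sums the byte length of every line, terminators included, by comparing the
// stream position before and after each read.
std::int64_t countBytesByPosition(std::istream& in) {
    assert(in.good());  // expects a freshly opened, readable stream
    std::int64_t total = 0;
    std::string line;
    for (;;) {
        const std::streamoff begin = in.tellg();
        if (!std::getline(in, line)) break;  // nothing left to read
        if (in.eof()) in.clear();  // last line had no '\n'; tellg() needs a good stream
        const std::streamoff end = in.tellg();
        total += end - begin;
    }
    return total;
}

// Convenience overload for in-memory data.
std::int64_t countBytesByPosition(const std::string& data) {
    std::istringstream in(data);
    return countBytesByPosition(in);
}
```

Because the count comes from positions rather than from the decoded text, it does not matter whether a line ends in "\n" or "\r\n", or how many bytes each character occupies.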

Related

How can I skip N lines of a QFile without temporarily storing them in QStrings?

Basically if I call QFile::readLine, the entire line of a QFile will be copied and pasted into a char* or a QByteArray. If I want to skip 999 lines to go straight to the line of interest (the 1,000th one), then I will be copying & pasting the first 999 lines for no reason whereas I just want to skip them.
I know that istream::ignore enables the user to skip any number of characters until the delimiter is found, so
std::ifstream file("file.txt");
for (auto i = 0u; i < 999u; ++i)
    file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
std::string str;
getline(file, str); // The 1,000th line is copied & pasted into str
would make you go straight to the 1,000th line without wasting any time copying and pasting. How can I do the same thing with QFile?

Qt has no API for seeking a file to the next occurrence of a specific byte without outputting the read data.
You can get quite close, though:
QFile has QIODevice::readLine(char *data, qint64 maxSize), which reads into a preallocated buffer and could be used like this:
QFile f("..."); f.open(...);
int maxSize = 1024; // guess that 1 KB will be enough per line
QByteArray lineData(maxSize, '\0');
int skipLines = 100;
while (f.readLine(lineData.data(), maxSize) > 0 && skipLines > 0) {
    --skipLines;
}
This call of readLine() reuses the preallocated buffer.
You can see that the critical part here is guessing which preallocation size is best. If a line is longer than the guessed size, you will skip fewer than 100 lines, because each longer line takes several reads.
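One way around the size guess is to count a line only once its terminating newline has actually been consumed. The sketch below shows that idea with std::istream instead of QFile, so it is an analogy and not Qt API; skipLines (the function) and the 16-byte buffer are illustrative choices. With QIODevice::readLine(char *, qint64) the corresponding check would be whether the bytes returned end in '\n'.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Skips up to `count` complete lines using a small fixed buffer.
// Returns the number of lines actually skipped.
int skipLines(std::istream& in, int count) {
    assert(count >= 0);
    char chunk[16];  // deliberately tiny so that long lines need several reads
    int skipped = 0;
    while (skipped < count) {
        in.getline(chunk, sizeof chunk);
        if (in.bad() || in.eof()) break;  // I/O error, or input ended before a '\n'
        if (in.fail()) {                  // buffer filled before the '\n' was reached
            in.clear();                   // same line continues; do not count it yet
            continue;
        }
        ++skipped;                        // the '\n' was consumed: one whole line skipped
    }
    return skipped;
}
```

A final line without a trailing newline is not counted here; whether that is the right choice depends on the file format.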
QTextStream uses an internal buffer size of 16kb:
from qtextstream.cpp:
static const int QTEXTSTREAM_BUFFERSIZE = 16384;
QIODevice uses the same buffer size:
from qiodevice_p.h:
#define QIODEVICE_BUFFERSIZE Q_INT64_C(16384)
Side note:
QTextStream also has readLineInto(QString *line, qint64 maxlen = 0), which reuses the capacity already allocated for line and only needs to reallocate when the line being read does not fit. But because of the decoding step, reading into a QString is generally slower than reading into a QByteArray.
A function like readLineInto(...) doesn't exist for QByteArray, though.
The solution using QIODevice::getChar(char *c) (proposed in the comments on the question) is suitable too: it uses the same internal read buffer as readLine and adds a little overhead per call, but the caller doesn't have to worry about lines longer than the chosen buffer size.

Getting a QByteArray from a QFile and writing it in the same file

I am going to edit the contents of a file. I am handling the file using QFile. Now, I want to read it in small chunks like 1024 bytes. So far, I did :
QFile file("~/samplefile");
long long sizeoffile = file.size();
long long size = sizeoffile / 1024; // number of loop iterations: each cycle reads 1024 bytes
QString contentsToBeErased = "sample";
QString eraser = contentsToBeErased;
eraser = eraser.fill('*');
int pos = 0; // position of contentsToBeErased within the current 1024-byte chunk
QByteArray myByteArray;
if (!file.open(QIODevice::ReadWrite | QIODevice::Text))
    return;
for (long long i = 0; i < size; i++)
{
    myByteArray = file.readLine(1025); // 1025 is used because readLine reads one byte less
    int sizeArray = 0;
    QTextCodec *byteArraytoString = QTextCodec::codecForName("UTF-8"); // converting byte array to string
    QString thisString = byteArraytoString->toUnicode(myByteArray);
    if (thisString.contains(contentsToBeErased, Qt::CaseInsensitive))
    {
        int occurrence = thisString.count(contentsToBeErased, Qt::CaseInsensitive);
        for (int ii = 0; ii < occurrence; ii++)
        {
            pos = thisString.indexOf(contentsToBeErased, pos, Qt::CaseInsensitive);
            thisString.replace(pos, contentsToBeErased.size(), eraser);
            pos = pos + contentsToBeErased.size();
        }
        myByteArray = thisString.toUtf8();
        sizeArray = myByteArray.length();
        QFile file1("~/samplefile");
        file1.open(QIODevice::WriteOnly);
        file1.write(myByteArray);
        file1.close();
    }
}
This works fine on the first iteration, but on the second one readLine(1025) fails to read the next 1024 bytes. It reads the first 1024 bytes again.
So my first problem is that I don't know how to increase the readLine(); position to get the next 1024 bytes.
And the 2nd problem is I don't know how to write() the 2nd byte array to file after writing first byte array, because if I only use write(), it will replace the previous byte array with next byte array. So how can I append the array at the end of the file?

Read documentation first.
QIODevice::readLine(qint64 maxSize = 0) reads until it encounters a line feed ("\n") or until maxSize bytes have been read, whichever comes first.
In this specific case, you need the peek and seek methods. You also need to open the QFile with the QIODevice::Append | QIODevice::ReadWrite flags.

Use seek() for accessing specific positions in a file.
Successive calls to readLine(n) should return consecutive chunks of at most n bytes. Opening the same file a second time is likely to interfere here (possibly depending on your OS). You should use read, write, and seek with ONE file object which you open in ReadWrite mode.
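A minimal sketch of the single-object approach, written against std::iostream rather than QFile so it can be tried without Qt. maskInPlace is a hypothetical helper; it matches case-sensitively (unlike the question's code) and relies on the replacement having the same length as the text it overwrites, so nothing after it has to move. Consecutive chunks overlap by needle.size() - 1 bytes so that a match straddling a chunk boundary is still found.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>

// Overwrites every occurrence of `needle` with '*' characters, in place,
// reading the stream in chunks of `chunkSize` bytes through one stream object.
void maskInPlace(std::iostream& io, const std::string& needle,
                 std::size_t chunkSize = 1024) {
    assert(!needle.empty() && chunkSize >= needle.size());
    const std::string mask(needle.size(), '*');
    std::string chunk(chunkSize, '\0');
    std::streamoff offset = 0;  // absolute position of the current chunk
    for (;;) {
        io.clear();
        io.seekg(offset);  // always seek before switching back to reading
        io.read(&chunk[0], static_cast<std::streamsize>(chunk.size()));
        const std::size_t got = static_cast<std::size_t>(io.gcount());
        io.clear();  // a short read at end of input sets eofbit and failbit
        if (got < needle.size()) break;
        std::size_t pos = 0;
        while ((pos = chunk.find(needle, pos)) != std::string::npos &&
               pos + needle.size() <= got) {  // ignore stale bytes past `got`
            io.seekp(offset + static_cast<std::streamoff>(pos));
            io.write(mask.data(), static_cast<std::streamsize>(mask.size()));
            pos += needle.size();
        }
        if (got < chunk.size()) break;  // that was the last chunk
        offset += static_cast<std::streamoff>(got - (needle.size() - 1));
    }
}
```

With a file, the stream would be a std::fstream opened with in, out, and binary flags; the Qt equivalent uses QFile::seek(), read(), and write() on one QFile opened in ReadWrite mode.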

c++: lseek giving different values compared to the original file

I'm trying to read a file that contains double-formatted numbers in an 82503x1200 matrix. I can read the file, but I can't find a way to specify the correct size of each number for the lseek offset. Why does it give me these numbers instead of the numbers in the file?
int fd;      // file descriptors are integers, not floats
off_t ret;   // return value of lseek()/read()
float b;
const size_t NUM_ELEMS = 11;
const size_t NUM_BYTES = NUM_ELEMS * sizeof(float);
fd = open("signal_80k.txt", O_RDONLY);
if (fd < 0) {
    perror("open");
    //exit(1);
}
ret = lseek(fd, seekCounter*NUM_BYTES, SEEK_SET);
ret = read(fd, &b, sizeof(float));
cout << "> " << seekCounter << ": " << b << endl;
seekCounter++;
close(fd);
it prints:
0: 1.02564e-08
1: 1.08604e-05
2: 0.000174702
3: 6.56482e-07
4: 2.57894e-09
but the first values are:
9.402433000000000e
8.459109000000000e
8.947654000000000e+03
9.021620000000000e
This is how it looks in matlab

In your comments you clarified that the file contains text data, and my answer is based on that. Now, let's take a look at the first number in the file:
1.02564e-08
How many characters are there? I count 11 characters. Then, there's a space after it, so the next value after this one will be twelve characters after the first one.
By casual inspection, it appears that your code sets
const size_t NUM_ELEMS = 11;
to be the number of values per row.
Then your code sets
const size_t NUM_BYTES = NUM_ELEMS * sizeof(float);
To calculate the number of characters taken up by each row. Now, it's possible that I missed the actual meaning of these constants, but in any case, you have a target value in the file, and you're attempting to seek to it directly, that's the bottom line. So, for the purpose of this answer I'll go with this interpretation, but the answer's still the same, in any case.
Pop quiz for you. What is sizeof(float)?
Answer: it's 4 bytes, on most implementations (so I'll assume that going forward). So, you compute that there's going to be 44 characters per row, and you use that to attempt to seek to the appropriate line in the file. That's, at least, how I parsed your code.
The problem, of course, is that, assuming that each value is represented in scientific notation, with 11 values per line, and each value taking up 12 characters (including either a trailing space or a newline), each line will actually take 11 * 12 or 132 characters, and not 44. Add one more character per line if you're on an OS that uses \r\n for a new line.
So, you need to make some adjustments there. And even after that, this whole house of cards depends on each value in the file always being represented in scientific notation, with the same precision and width.
Which is an assumption you can't really make. Furthermore, that's not the only problem here.
The second problem is that you are attempting to read() the contents of the file directly into float variables. Yes, each float will be four bytes, because that's how many bytes it takes to represent a float value in binary. The problem here is that the file does not contain raw binary data, but text data.
In conclusion, I don't see much choice here but to read the file from start to finish, instead of attempting to seek to the right spot, since you have no guarantees that each value in the file will occupy the same number of characters; and then read the file as text, and convert its contents, using operator>>, to float values.
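A minimal sketch of that conclusion, assuming whitespace-separated values with one matrix row per line. readMatrix and valueAt are hypothetical names. Once the text has been parsed, "seeking" to a value becomes plain indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Parses every line of text into a row of floats using operator>>.
Matrix readMatrix(std::istream& in) {
    Matrix rows;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<float> row;
        float value;
        while (fields >> value) row.push_back(value);  // stops at the first non-number
        if (!row.empty()) rows.push_back(row);
    }
    return rows;
}

// Random access after loading: no byte offsets involved.
float valueAt(const Matrix& m, std::size_t row, std::size_t col) {
    assert(row < m.size() && col < m[row].size());
    return m[row][col];
}
```

The width of each number in the file no longer matters, because operator>> consumes however many characters the value occupies.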

If the file were binary, would lseek be the suitable method?
I changed the approach to this:
ifstream inFile("signal_80k.txt");
string line;
int count = 0;
if (!inFile.is_open())
{
    cout << "\n Cannot open the signal_80k.txt file" << "\n";
}
else
{
    cout << "loading all data... " << "\n";
    while (getline(inFile, line)) {
        vector<string> numbers = ci::split(line, " ", false);
        for (size_t i = 0; i < numbers.size(); i++) {
            try {
                float thisNumber = std::stof(numbers.at(i));
                cout << "numbers at: " << i << " = " << thisNumber << "\n";
            }
            catch (...) {
                // skip tokens that are not valid numbers
            }
        }
        count++;
        cout << "done: " << count << "\n";
    }
    cout << "all data ready!" << "\n";
    inFile.close();
}
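On the binary question: when every value is stored as a fixed-size raw float, the offset of value i is simply i * sizeof(float), so seeking does work. The sketch below illustrates this with standard streams instead of lseek()/read(); writeRawFloats and readFloatAt are hypothetical helpers, and the data is assumed to have been written on the same platform (same float size and byte order).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Writes the values as raw bytes in the host's float representation.
void writeRawFloats(std::ostream& out, const std::vector<float>& values) {
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(float)));
}

// Reads the value with the given index by seeking straight to its offset.
float readFloatAt(std::istream& in, std::size_t index) {
    float value = 0.0f;
    in.clear();
    in.seekg(static_cast<std::streamoff>(index * sizeof(float)));
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    assert(in.gcount() == static_cast<std::streamsize>(sizeof value));  // index in range
    return value;
}
```

For a matrix, the index of element (row, col) would be row * columns + col, which is the fixed-width layout the original lseek code was implicitly assuming.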

Qt: c++: how to read a ".dat" file

I have a ".dat" file that contains "1"s and "-1"s as a sequence in a vertical representation (i.e.: each element is in a single line.).
I am trying to read the file as follow:
char buf[30];
QFile sequence("Sequences.dat");
sequence.open(QFile::ReadOnly);
for (int sym = 0; sym < 29; sym++) {
    char c = symbols[sym] = sequence.readLine(buf, sizeof(buf));
    symbols[sym] = c;
}
sequence.close();
However, the result is nothing like my sequence.
What did I do wrong?

Check the readLine API doc: the return value is the number of bytes read, while the line is read into the buf array, which is overwritten at each iteration. Note that the first symbol of the inspected array is a '\0' (empty string), probably because the last line of your file is empty.
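In other words, the value has to come from the text of each line, not from the return value of the read call. A standard-library sketch of that idea follows; readSymbols is a hypothetical name, and with QFile the equivalent step would be converting the contents of buf after each successful readLine().

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads up to `maxCount` integers, one per line, converting the text of each
// line instead of storing the read call's return value.
std::vector<int> readSymbols(std::istream& in, std::size_t maxCount) {
    assert(maxCount > 0);
    std::vector<int> symbols;
    std::string line;
    while (symbols.size() < maxCount && std::getline(in, line)) {
        std::istringstream field(line);
        int value;
        if (field >> value) symbols.push_back(value);  // blank lines yield nothing
    }
    return symbols;
}
```

A trailing empty line, like the one suspected in the answer, is simply skipped instead of producing a stray entry.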

Is it possible to mix QTextStream and QDataStream?

Needing to read a mixed text/binary file I thought using both QTextStream and QDataStream together would be the most effective.
The file looks like this:
Some line of text
Another line of text
42
<100 bytes of binary data>
12
<100 bytes of binary data>
...
etc. (an int in a line, then 100 bytes of binary data in a line, and so on)
Here is the initialization, variables etc.:
// a QFile named in is already opened successfully as binary (without QIODevice::Text)
QTextStream stream(&in);
QDataStream data(&in);
int nr;
int nr_bytes;
char buffer[200];
First I tested whether reading from one stream advances the other. If I read 10 bytes with data.readRawData() from the file, then stream.readLine() will read "of text", so it works!
However, if I do the following, from the beginning of the example file:
stream.readLine();
stream.readLine();
for (/*...*/)
{
stream >> nr;
stream.readLine();
nr_bytes = data.readRawData(buffer, 100);
stream.readLine();
}
it does not work, and the buffer remains empty. Strangely, the numbers (42, 12, etc.) are read correctly into nr, no matter how many bytes I read with data.readRawData(). It can be 1000 bytes, it still does not seem to read anything. The value in nr_bytes, however, indicates that the bytes are successfully read! Still strange, that the last readLine in the loop actually reads the binary data (at least until it encounters a zero, a line feed or other special characters). This means, that data.readRawData() did not read anything at all, but it still has the number of required bytes in its return value.
Does this mean I cannot use QTextStream and QDataStream together, or am I doing something else wrong?
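QTextStream reads from the device through its own internal buffer, so the device position seen by a second reader can already be well past the text that was actually returned. One way to sidestep the problem is to read both the text lines and the binary blocks through a single byte-oriented reader. The sketch below shows that with std::istream; Record and readRecords are hypothetical names, and the layout (some header lines, then an integer line followed by a fixed-size binary line, repeated) is taken from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Record {
    int number = 0;
    std::string payload;  // raw bytes; may contain '\0' and '\n'
};

// Reads `headerLines` text lines, then repeated (number line, binary line) pairs.
std::vector<Record> readRecords(std::istream& in, std::size_t headerLines,
                                std::size_t payloadSize) {
    assert(payloadSize > 0);
    std::string line;
    for (std::size_t i = 0; i < headerLines; ++i)
        if (!std::getline(in, line)) return {};
    std::vector<Record> records;
    while (std::getline(in, line)) {
        Record record;
        try {
            record.number = std::stoi(line);
        } catch (const std::exception&) {
            break;  // not a number line: stop instead of guessing
        }
        record.payload.resize(payloadSize);
        in.read(&record.payload[0], static_cast<std::streamsize>(payloadSize));
        if (static_cast<std::size_t>(in.gcount()) != payloadSize) break;
        in.ignore(1);  // the '\n' that ends the binary line
        records.push_back(std::move(record));
    }
    return records;
}
```

The binary block is read by length, never by searching for a line break, so payload bytes that happen to equal '\n' or '\0' do not confuse the parser. A file stream would need to be opened in binary mode for this to hold.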
