Ensure all the bytes are properly written to file - c++

Is it possible to check whether all the bytes are actually written to a QFile? Currently this is all I have:
QFile f(name);
if (f.open(QIODevice::WriteOnly)) {
    f.write(bytes);
}
bytes has a size of 1 MB, and there are times when the entire chunk is not written to the file, hence I end up with a corrupted file.

In Qt 5 you really should be using QSaveFile. It ensures two very important invariants:
partial/failed writes don't corrupt the existing file,
the file is flushed by the time the QSaveFile instance is destructed.
Since this is a proper C++ class, implementing RAII, you don't need to do anything special to ensure that it works, except for having to call commit(). The meaning of commit() is: you indicate that you will not be writing any more data to the file. At this point, the implementation is free to close the file, flush it to disk, and replace the old file with the new one.
/// When this function returns true, you can be certain that the file contains exactly "foo bar".
bool writeFooBar() {
    // The writable location is a directory, so a file name must be appended (the name here is illustrative).
    QSaveFile file(QStandardPaths::writableLocation(QStandardPaths::DocumentsLocation) + "/foobar.txt");
    if (!file.open(QIODevice::WriteOnly | QIODevice::Text))
        return false;
    if (file.write("foo bar") == -1)
        return false;
    return file.commit();
}

If you're worried about corrupted files being written, perhaps QSaveFile would be a better class to use, instead of QFile.
As the documentation states:
QSaveFile is an I/O device for writing text and binary files, without losing existing data if the writing operation fails.

What you are looking for is a checksum, which lets you verify the integrity of your data. Here you can use qChecksum like this:
QFile f(name);
if (f.open(QIODevice::WriteOnly)) {
    f.write(bytes);
    f.close();
}

quint16 fileCheckSum = qChecksum(bytes.constData(), bytes.length());

if (f.open(QIODevice::ReadOnly)) {
    QByteArray writtenBytes = f.readAll();
    quint16 writtenBytesCheckSum = qChecksum(writtenBytes.constData(), writtenBytes.length());
    if (fileCheckSum == writtenBytesCheckSum)
    {
        qDebug() << "File is valid.";
    }
    else
    {
        qDebug() << "File is corrupt.";
    }
    f.close();
}
I haven't compiled the code, but it should work. If it doesn't, I'll be more specific with an example.

To ensure that all the bytes are written properly to a file, you can maintain a digest (checksum) of all the bytes written to the file, then compare that running checksum to a checksum computed over the file itself.
Please research SHA-1 (Secure Hash Algorithm), MD5 and hash functions in general; also search for "c++ data integrity algorithm".
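A rough sketch of that idea using Qt's own QCryptographicHash (the function name and the choice of SHA-1 are illustrative, not part of the original suggestion):
#include <QCryptographicHash>
#include <QFile>

bool verifyWrite(const QString &name, const QByteArray &bytes)
{
    // Digest of the data we intended to write.
    QByteArray expected = QCryptographicHash::hash(bytes, QCryptographicHash::Sha1);

    // Digest of what actually ended up on disk.
    QFile f(name);
    if (!f.open(QIODevice::ReadOnly))
        return false;
    QCryptographicHash actual(QCryptographicHash::Sha1);
    actual.addData(&f); // reads and hashes the entire file
    return actual.result() == expected;
}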

QFile::flush or QFile::close should cause all buffered contents to be written. It's important to check the return values of all of the QFile calls.
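A minimal sketch of that advice (the function name is illustrative; note that close() returns void, so the final status has to come from error()):
#include <QFile>

bool writeAllBytes(const QString &name, const QByteArray &bytes)
{
    QFile f(name);
    if (!f.open(QIODevice::WriteOnly))
        return false;
    if (f.write(bytes) != bytes.size()) // write() can report fewer bytes than requested
        return false;
    if (!f.flush())                     // push Qt's internal buffer down to the OS
        return false;
    f.close();                          // returns void, so check error() afterwards
    return f.error() == QFileDevice::NoError;
}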

Related

Qt5 QFile::close() very slow for writing

I am using QFile as a file reader and a file writer to copy files to USB from inside my application. I have been trying to figure out why my file copies to USB (with a progress bar) are taking so long. I finally found out that when I close the QFile object used for writing, the close() operation can take well over the time taken for the actual write operation.

These are very large files, and I read/write blocks of 16384 bytes and then send a signal to the GUI to advance the progress bar viewed by the user. I ended up adding a call to flush() after each write, since I assumed the slowness was a result of the output stream not actually having been written to disk yet. That didn't make a difference. The close of the outgoing QFile object still takes much longer than what seems to be the write time (timing taken before and after the copy, and before and after each of the QFile::close() calls; the timing code has been removed for ease of reading, and I also confirmed it in the debugger). Of course, it doesn't help to simply not call close(), since the destruction of the QFile object causes it to be called anyway.
My code is as follows (minus error checking, destination space checking, etc):
void FileCopy::run()
{
    QByteArray bytes;
    int totalBytesWritten = 0;
    int inListSize = inList.size();
    for (int i = 0; !canceled && i < inListSize; i++)
    {
        QString inPath = inList.at(i).inPath;
        QString outPath = inList.at(i).outPath;
        QFile inFile(inPath);
        QFile outFile(outPath);
        int filesize = inFile.size();
        int bytesWritten = 0;
        if (!inFile.open(QIODevice::ReadOnly))
        {
            return;
        }
        if (!outFile.open(QIODevice::WriteOnly))
        {
            inFile.close();
            return;
        }
        // copy the FCS file with progress
        while (!canceled && bytesWritten < filesize)
        {
            bytes = inFile.read(MAXBYTES);
            qint64 outsize = outFile.write(bytes);
            outFile.flush();
            if (outsize != bytes.size())
            {
                break;
            }
            bytesWritten += outsize;
            totalBytesWritten += outsize;
            Q_EMIT signalBytesCopied(totalBytesWritten, i + 1, inListSize);
            QThread::usleep(100); // allow time for detecting a cancel
        }
        inFile.close();
        outFile.close();
    }
    // Other error checking done here
}
Can anyone see a way to get past this? I would actually prefer that the progress bar move more slowly, more accurately reflecting the state of the copy to the user, than to have it read 100% in less than half the time it takes for the copy and close to actually complete.
I have also tried using QSaveFile instead of QFile for the output, but QSaveFile::commit() has exactly the same problem, taking more time to commit than to finish the actual copy loop. I assume that this is because, underneath, it uses the same functionality as QFile, derived from QIODevice.
I have considered moving to standard streams, but would like to keep some consistency in how file handling is done in this application. It is a possibility, though, if QFile::close() is going to take this long to close. Or would a standard stream have the same issue?
I am working on a Win7 32-bit box with VS2010, using Qt 5.1.1 and the Qt VS add-in 1.2.2. Thanks for any suggestions.
While you are writing, the OS probably just caches the writes in memory (fast). But when you close the file, it has to flush all the data to disk (slow, especially if none of it has actually been written yet). So closing the file has to wait for the OS to actually put all the data onto the disk (the USB device), and at that point that may be all of the data.
Operating systems do this to speed up writes: they can often get away with flushing the data to disk in the background when nothing else is going on, so you don't notice the actual cost because it is amortized over idle time. But if you just write and then close at once, you are going to notice.
Note: the alternative would be slower write calls; you would still end up spending the same total time.
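If the goal is for the progress bar to reflect data that has actually reached the USB device, one option (not from the original answer, and Windows-specific) is to force each chunk out of the OS cache right after writing it. A sketch, where flushToDisk is an illustrative name:
#include <QFile>
#include <io.h>       // _get_osfhandle
#include <windows.h>  // FlushFileBuffers

// Push one chunk all the way to the device so the progress bar tracks real disk progress.
bool flushToDisk(QFile &file)
{
    if (!file.flush()) // move Qt's buffer to the OS first
        return false;
    HANDLE h = reinterpret_cast<HANDLE>(_get_osfhandle(file.handle()));
    if (h == INVALID_HANDLE_VALUE)
        return false;
    return FlushFileBuffers(h) != 0; // blocks until the data is on the device
}
Calling something like this after each outFile.write() spreads the wait over the copy loop instead of concentrating it in close(); as noted above, the total time stays roughly the same.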

Trouble saving data written to a file when I kill the app

My program is always writing data to a file, but when I close it before the program fully stops, the end result is that nothing is written to the file. I would really like to be able to close it without letting it finish fully, so how can I fix this so that the file is saved continuously?
ofstream outfile;
outfile.open("text.txt", std::ios::app);
bool done = false;
int info;
while (done == false) {
    cin >> info;
    outfile << info;
    cout << info << "Choose different info";
    if (info == 100) {
        done = true;
    }
}
outfile.close();
This is obviously just an example, but it is very similar to my actual code.
Edit: When I say closing, I mean killing it (hitting the red X at the top right of the console).
You likely need to flush your std::ofstream when you have done "enough" work.
"enough" work here is going to depend on your application.
Perhaps
...
outfile << info;
outfile.flush();
...
The operating system doesn't write to the file the moment you call the write function; to save time, it waits to see whether you want to write anything else, or for a moment that is "good" for writing. You write to a buffer, and the operating system writes this buffer to the file.
When you close the file, anything left in the buffer is written out. You can force your code to write to the file using the flush method. Just flush your file after every write and you will be fine.
flush: http://www.cplusplus.com/reference/ostream/ostream/flush/
outfile << n;
outfile.flush();

how to report progress of data read on a QuaGzipFile (QuaZIP library)

I am using QuaZIP 0.5.1 with Qt 5.1.1 for C++ on Ubuntu 12.04 x86_64.
My program reads a large gzipped binary file, usually 1 GB of uncompressed data or more, and performs some computations on it. It is not computationally intensive, and most of the time is spent on I/O. So if I can find a way to report how much of the file has been read, I can show it on a progress bar and even provide an ETA estimate.
I open the file with:
QuaGzipFile gzip(fileName);
if (!gzip.open(QIODevice::ReadOnly))
{
    // report error
    return;
}
But there is no functionality in QuaGzipFile to find either the file size or the current position.
I do not need the size and position of the uncompressed stream; the size and position of the compressed stream are fine, because a rough estimate of progress is enough.
Currently, I can find the size of the compressed file using QFile(fileName).size(). Also, I can easily find the current position in the uncompressed stream by keeping a sum of the return values of gzip.read(). But these two numbers do not match.
I can alter the QuaZIP library, and access internal zlib-related stuff, if it helps.
There is no reliable way to determine total size of uncompressed stream. See this answer for details and possible workarounds.
However, there is a way to get position in compressed stream:
QFile file(fileName);
file.open(QFile::ReadOnly);
QuaGzipFile gzip;
gzip.open(file.handle(), QuaGzipFile::ReadOnly);
while (true) {
    QByteArray buf = gzip.read(1000);
    // process buf
    if (buf.isEmpty()) { break; }
    QFile temp_file_object;
    temp_file_object.open(file.handle(), QFile::ReadOnly);
    double progress = 100.0 * temp_file_object.pos() / file.size();
    qDebug() << qRound(progress) << "%";
}
The idea is to open the file manually and use the file descriptor to get the position. QFile cannot track external position changes, so file.pos() will always be 0. So we create temp_file_object from the file descriptor, forcing QFile to query the file position. I could use a lower-level API (such as lseek()) to get the file position, but I think this way is more cross-platform.
Note that this method is not very accurate and can give progress values larger than the real ones. That's because zlib can internally read and decode more data than you have consumed so far.
In zlib 1.2.4 and greater you can use the gzoffset() function to get the current position in the compressed file. The current version of zlib is 1.2.8.
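A rough sketch of that, using zlib's gzFile API directly rather than QuaGzipFile (the function name is illustrative; compressedSize would come from QFile(fileName).size() as above):
#include <zlib.h>
#include <cstdio>

void readWithProgress(const char *fileName, long long compressedSize)
{
    gzFile gz = gzopen(fileName, "rb");
    if (!gz)
        return;
    char buf[16384];
    int n;
    while ((n = gzread(gz, buf, sizeof buf)) > 0) {
        // ... use buf ...
        double progress = 100.0 * gzoffset(gz) / compressedSize; // offset in the compressed file
        std::printf("%d%%\n", static_cast<int>(progress));
    }
    gzclose(gz);
}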
Using an ugly hack to zlib, I was able to find position in compressed stream.
First, I copied definition of gz_stream from gzio.c (from zlib-1.2.3.4 source), to the end of quagzipfile.cpp. Then I reimplemented the virtual function qint64 QIODevice::pos() const:
qint64 QuaGzipFile::pos() const
{
    gz_stream *s = (gz_stream *)d->gzd;
    return ftello64(s->file);
}
Since quagzipfile.cpp and quagzipfile.h seem to be independent from other QuaZIP library files, maybe it is better to copy the functionality I need from these files and avoid this hack?
The current version of the program is something like this:
QFile infile(fileName);
if (!infile.open(QIODevice::ReadOnly))
    return;
qint64 fileSize = infile.size();
infile.close();

QuaGzipFile gzip(fileName);
if (!gzip.open(QIODevice::ReadOnly))
    return;
qint64 nread;
char buffer[bufferSize];
while ((nread = gzip.read(buffer, bufferSize)) > 0)
{
    // use buffer
    int percent = 100.0 * gzip.pos() / fileSize;
    // report percent
}
gzip.close();

Protocol Buffers; saving data to disk & loading back issue

I have an issue with storing Protobuf data to disk.
The application I have uses Protocol Buffers to transfer data over a socket (which works fine), but when I try to store the data to disk it fails.
Actually, saving the data reports no issues, but I cannot seem to load it again properly.
Any tips would be gladly appreciated.
void writeToDisk(DataList & dList)
{
    // open streams
    int fd = open("serializedMessage.pb", O_WRONLY | O_CREAT);
    google::protobuf::io::ZeroCopyOutputStream* fileOutput = new google::protobuf::io::FileOutputStream(fd);
    google::protobuf::io::CodedOutputStream* codedOutput = new google::protobuf::io::CodedOutputStream(fileOutput);

    // save data
    codedOutput->WriteLittleEndian32(PROTOBUF_MESSAGE_ID_NUMBER); // store with message id
    codedOutput->WriteLittleEndian32(dList.ByteSize()); // the size of the data I will serialize
    dList.SerializeToCodedStream(codedOutput); // serialize the data

    // close streams
    delete codedOutput;
    delete fileOutput;
    close(fd);
}
I've verified the data inside this function; the dList contains the data I expect. The streams report that no errors occur, and that a reasonable number of bytes were written to disk (also, the file is of reasonable size).
But when I try to read back the data, it does not work. Moreover, what is really strange is that if I append more data to this file, I can read the first messages (but not the one at the end).
void readDataFromFile()
{
    // open streams
    int fd = open("serializedMessage.pb", O_RDONLY);
    google::protobuf::io::ZeroCopyInputStream* fileinput = new google::protobuf::io::FileInputStream(fd);
    google::protobuf::io::CodedInputStream* codedinput = new google::protobuf::io::CodedInputStream(fileinput);

    // read back
    uint32_t sizeToRead = 0, magicNumber = 0;
    string parsedStr = "";
    codedinput->ReadLittleEndian32(&magicNumber); // the message id-number I expect
    codedinput->ReadLittleEndian32(&sizeToRead);  // the reported data size, also what I expect
    codedinput->ReadString(&parsedStr, sizeToRead); // the size() of 'parsedStr' is much less than it should be (sizeToRead)

    DataList dl = DataList();
    if (dl.ParseFromString(parsedStr)) // fails
    {
        // work with data if all okay
    }

    // close streams
    delete codedinput;
    delete fileinput;
    close(fd);
}
Obviously I have omitted some of the code here to simplify everything.
As a side note, I have also tried to serialize the message to a string and save that string via CodedOutputStream. This does not work either. I have verified the contents of that string, though, so I guess the culprit must be the stream functions.
This is a Windows environment, C++ with Protocol Buffers and Qt.
Thank you for your time!
I solved this issue by switching from file descriptors to fstream, and from FileOutputStream to OstreamOutputStream.
Although I've seen examples using the former, it didn't work for me.
I found a nice code example hidden in the Google coded_stream header. link #1
Also, since I needed to serialize multiple messages to the same file using protocol buffers, this link was enlightening. link #2
For some reason, the output file is not 'complete' until I actually destroy the stream objects.
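Roughly, the fstream-based version of writeToDisk would look like this (a sketch based on the description above, not the poster's actual code):
#include <fstream>
#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <google/protobuf/io/coded_stream.h>

void writeToDisk(DataList &dList)
{
    std::ofstream out("serializedMessage.pb", std::ios::binary | std::ios::trunc);
    google::protobuf::io::OstreamOutputStream rawOutput(&out);
    google::protobuf::io::CodedOutputStream codedOutput(&rawOutput);

    codedOutput.WriteLittleEndian32(PROTOBUF_MESSAGE_ID_NUMBER); // message id, as before
    codedOutput.WriteLittleEndian32(dList.ByteSize());           // payload size
    dList.SerializeToCodedStream(&codedOutput);
    // The streams flush and complete the file when they go out of scope.
}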
The read failure was because the file was not opened for reading with O_BINARY - change file opening to this and it works:
int fd = open("serializedMessage.pb", O_RDONLY | O_BINARY);
The root cause is the same as here: "read() only reads a few bytes from file". You were very likely following an example in the protobuf documentation which opens the file in the same way, but it stops parsing on Windows when it hits a special character in the file.
Also, in more recent versions of the library, you can use protobuf::util::ParseDelimitedFromCodedStream to simplify reading size+payload pairs.
... the question may be ancient, but the issue still exists and this answer is almost certainly the fix to the original problem.
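A sketch of the delimited approach (assumes a protobuf version recent enough to ship util/delimited_message_util.h; the function names are illustrative):
#include <fstream>
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <google/protobuf/util/delimited_message_util.h>

bool saveDataList(const DataList &dList)
{
    std::ofstream out("serializedMessage.pb", std::ios::binary | std::ios::trunc);
    // Writes a varint length prefix followed by the serialized payload.
    return google::protobuf::util::SerializeDelimitedToOstream(dList, &out);
}

bool loadDataList(DataList *dList)
{
    std::ifstream in("serializedMessage.pb", std::ios::binary);
    google::protobuf::io::IstreamInputStream raw(&in);
    google::protobuf::io::CodedInputStream coded(&raw);
    bool cleanEof = false;
    // Reads one size+payload pair; call repeatedly to read several appended messages.
    return google::protobuf::util::ParseDelimitedFromCodedStream(dList, &coded, &cleanEof);
}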
Try to use
codedinput->readRawBytes instead of ReadString
and
dl.ParseFromArray instead of ParseFromString
Not very familiar with protocol buffers, but ReadString might only read a field of type string.

Copying contents of one file to another in C++

I am using the following program to try to copy the contents of a file, src, to another, dest, in C++. The simplified code is given below:
#include <fstream>
using namespace std;

int main()
{
    fstream src("c:\\tplat\test\\secClassMf19.txt", fstream::binary);
    ofstream dest("c:\\tplat\\test\\mf19b.txt", fstream::trunc | fstream::binary);
    dest << src.rdbuf();
    return 0;
}
When I built and executed the program using the Code::Blocks IDE with the GCC compiler on Windows, a new file named "....mf19.txt" was created, but no data was copied into it, and its file size was 0 KB. I am positive I have some data in "...secClassMf19.txt".
I experienced the same problem when I compiled the same program with Visual C++ 2008 on Windows.
Can anyone please help explain why I am getting this unexpected behaviour and, more importantly, how to solve the problem?
You need to check whether opening the files actually succeeds before using those streams. Also, it never hurts to check if everything went right afterwards. Change your code to this and report back:
#include <cstdlib>
#include <fstream>
#include <iostream>

int main()
{
    std::fstream src("c:\\tplat\test\\secClassMf19.txt", std::ios::binary);
    if (!src.good())
    {
        std::cerr << "error opening input file\n";
        std::exit(1);
    }
    std::ofstream dest("c:\\tplat\\test\\mf19b.txt", std::ios::trunc | std::ios::binary);
    if (!dest.good())
    {
        std::cerr << "error opening output file\n";
        std::exit(2);
    }
    dest << src.rdbuf();
    if (!src.eof())
        std::cerr << "reading from file failed\n";
    if (!dest.good())
        std::cerr << "writing to file failed\n";
    return 0;
}
I bet you will report that one of the first two checks hits.
If opening the input file fails, try opening it using std::ios::in|std::ios::binary instead of just std::ios::binary.
Do you have any reason to not use CopyFile function?
As it is written, your src instance is a regular fstream, and you are not specifying an open mode for input. The simple solution is to make src an instance of ifstream, and your code works. (Just by adding one byte!)
If you had tested the input stream (as sbi suggests), you would have found that it was not opened correctly, which is why your destination file was of zero size. It was opened in write mode (since it was an ofstream) with the truncation option to make it zero, but writing the result of rdbuf() simply failed, with nothing written.
Another thing to note is that while this works fine for small files, it would be very inefficient for large files. As is, you are reading the entire contents of the source file into memory, then writing it out again in one big block. This wastes a lot of memory. You are better off reading in chunks (say 1MB for example, a reasonable size for a disk cache) and writing a chunk at a time, with the last one being the remainder of the size. To determine the source's size, you can seek to the end and query the file offset, then you know how many bytes you are processing.
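A minimal sketch of that chunked approach (the function name and the 1 MB chunk size are illustrative):
#include <fstream>
#include <vector>

bool copyInChunks(const char *srcPath, const char *destPath)
{
    std::ifstream src(srcPath, std::ios::in | std::ios::binary);
    std::ofstream dest(destPath, std::ios::trunc | std::ios::binary);
    if (!src || !dest)
        return false;

    std::vector<char> buffer(1024 * 1024); // 1 MB chunks, a reasonable size for a disk cache
    while (src) {
        src.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        std::streamsize n = src.gcount();  // the last chunk is usually shorter
        if (n > 0 && !dest.write(buffer.data(), n))
            return false;
    }
    return src.eof() && dest.good();
}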
And you will probably find your OS is even more efficient at copying files if you use the native APIs, but then it becomes less portable. You may want to look at the Boost filesystem module for a portable solution.