Reopening a closed file stream - c++

Consider the following code,
auto fin = ifstream("address", ios::binary);
if (fin.is_open())
    fin.close();
for (auto i = 0; i < N; ++i) {
    fin.open(); // hypothetical: reopen without passing the path and mode again
    // ....
    // read (next) b bytes...
    // ....
    fin.close();
    // Some delay
}
The code above isn't valid in the C++ I know, but I'd like to know whether something like it is possible.
Here are my requirements:
When reopening the file, there would be no need to pass the parameters (path and mode) again.
When reopening the stream, it continues from the position it was at when it was closed.
Clarification
The files I work with are big, and at any point in time other threads from third-party libraries may decide to (re)move them. An open stream will prevent such actions.
Continuously reading a big file will slow down the system.

The need
Indeed, a file can't be deleted by another process as long as a stream keeps it open.
I suppose you have already asked yourself these questions, but for the record I have to suggest that you think about them:
Can't the file be read into (virtual) memory and discarded when no longer needed?
Can't the file processing be pipelined asynchronously, to read it at once and process it without unnecessary delays?
What to do if the file can no longer be opened because it was deleted by the other process? What to do if the position can't be found because the file was modified (e.g. shortened)?
Even if you had the perfect solution to your issue, what would happen if the other process tried to delete the file while it is open (only for a short time, but nevertheless open and blocking the deletion)?
The solution
Unfortunately, you can't achieve the desired behavior with standard streams. You could emulate it by keeping track of the filename and of the position (and more generally of the state):
auto mypos = ifs.tellg(); // save the position
// Should the flags be saved as well? And what about gcount?
ifs.close();
...
if (!ifs.is_open()) {
    ifs.open(myfilename, myflags); // open again!
    if (!ifs) {
        // ouch! file disappeared ==> process error
    }
    ifs.seekg(mypos); // restore position
    if (!ifs) {
        // ouch! position no longer reachable ==> process error
    }
}
Of course, you wouldn't want to repeat this code over and over. And it would not be so nice to suddenly have a lot of global variables keeping track of the stream's state. But you could very easily encapsulate it in a wrapper class that takes care of saving and restoring the stream's state using existing standard operations.
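A minimal sketch of such a wrapper, under stated assumptions: the class name ReopenableIfstream and its suspend()/resume() members are made up for illustration and are not part of the standard library. Only the read position is preserved; error flags and gcount are not.

```cpp
#include <fstream>
#include <ios>
#include <string>
#include <utility>

// Hypothetical wrapper (not a standard class): remembers the path, the open
// mode and the read position so the file can be released and reacquired.
class ReopenableIfstream {
public:
    ReopenableIfstream(std::string path, std::ios::openmode mode)
        : path_(std::move(path)), mode_(mode) {}

    // Save the current read position, then release the file.
    void suspend() {
        if (!stream_.is_open()) return;
        stream_.clear();  // tellg() reports -1 while an error flag is set
        const std::streampos here = stream_.tellg();
        if (here != std::streampos(-1)) pos_ = here;
        stream_.close();
    }

    // Reopen with the stored path and mode and seek back to the saved
    // position. Returns false if the open or the seek fails.
    bool resume() {
        if (stream_.is_open()) return true;
        stream_.clear();
        stream_.open(path_, mode_);
        if (!stream_) return false;
        stream_.seekg(pos_);
        return static_cast<bool>(stream_);
    }

    std::ifstream& stream() { return stream_; }

private:
    std::string path_;
    std::ios::openmode mode_;
    std::streampos pos_ = 0;
    std::ifstream stream_;
};
```

In the loop from the question, each iteration would call resume(), read the next b bytes through stream(), and call suspend() before the delay. Note that a seek past the end of a shortened file may still succeed on some systems, so the caller should also check how many bytes the following read actually delivered.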

Related

Qt5 QFile::close() very slow for writing

I am using QFile as a file reader and a file writer to copy files to USB from inside my application. I have been trying to figure out why my file copies to USB (with progress bar) are taking so long. I finally found out that when I close the QFile object that is used for writing, the close() operation can take well over the time taken for the actual write operation. These are very large files, and I read/write blocks of 16384 bytes, and then I send a signal to the GUI to increase the progress bar that is viewed by the user. I ended up adding a call to flush() after each write since I assume this is a result of the out stream not actually having yet been written to disk. That didn't make a difference. The close of the outgoing QFile object still takes much longer than what seems to have been the write time (timing taken before and after copy, and before and after each of the QFile::close() calls, the timing code has been removed for ease of reading, I also debugged and saw it happening). Of course, it doesn't help to not call the close() function, since the destruction of the QFile object causes it to be called.
My code is as follows (minus error checking, destination space checking, etc):
void FileCopy::run()
{
    QByteArray bytes;
    int totalBytesWritten = 0;
    int inListSize = inList.size();
    for (int i=0; !canceled && i<inListSize; i++)
    {
        QString inPath = inList.at(i).inPath;
        QString outPath = inList.at(i).outPath;
        QFile inFile(inPath);
        QFile outFile(outPath);
        int filesize = inFile.size();
        int bytesWritten = 0;
        if (!inFile.open(QIODevice::ReadOnly))
        {
            return;
        }
        if (!outFile.open(QIODevice::WriteOnly))
        {
            inFile.close();
            return;
        }
        // copy the FCS file with progress
        while (!canceled && bytesWritten < filesize)
        {
            bytes = inFile.read(MAXBYTES);
            qint64 outsize = outFile.write(bytes);
            outFile.flush();
            if (outsize != bytes.size())
            {
                break;
            }
            bytesWritten += outsize;
            totalBytesWritten += outsize;
            Q_EMIT signalBytesCopied(totalBytesWritten, i+1, inListSize);
            QThread::usleep(100); // allow time for detecting a cancel
        }
        inFile.close();
        outFile.close();
    }
    // Other error checking done here
}
Can anyone see a way to get past this? I would actually prefer that the progress bar move more slowly, more accurately displaying the state of the copy to the user, than to have the progress bar read 100% in less than half the time it takes for the copy and close to actually complete.
I have also tried using QSaveFile instead of QFile for the output, but QSaveFile::commit() has exactly the same problem, taking more time to commit than to finish the actual copy loop. I assume that this is because, underneath, it is using the same functionality as QFile, derived from QIODevice.
I have considered moving to using standard streams, but would like to keep some consistency in how file handling is done in this application. It is a possibility though, if QFile::close() is going to take this long to close. Or is it possible that the standard stream would have the same issue?
I am working on a Win7 32-bit box with VS2010 using Qt5.1.1 and the Qt 1.2.2 VS add-in. Thanks for any suggestions.
While you are writing, the OS probably just caches the writes in memory (fast). But when you close the file it has to flush all the data to disk (slow - especially if it has not actually written any of it yet). So closing the file has to wait for the OS actually putting all the data onto the disk (USB) and that may actually be all of it at that time.
The reason why operating systems do something like this is of course to speed up writes - and often they can then get away with flushing the data to disk in the background when nothing else is going on (so you don't really notice the actual cost, since it is amortized over time where nothing else is going on). But if you just write and then close at once you are going to notice.
Note: the alternative would be the write calls being slower - you would still end up spending the same actual time.

Clearing eof bit not working with while loop

I have a program I'm trying to write that constantly monitors a log file and outputs specific items into a new file.
I'm essentially using:
for (int i = 1; i < y; i++)
    getline(read, line); // skips to the last known end
while (getline(read, line))
{
    cout << line;
}
read.clear();
I also keep track of the line I'm on by incrementing a variable. At the end of the file I clear the eof bit and seek to the last line I was on. From using the debugger it seems that it works. I retrieve the next line in the file as it's being written, but when I go back to my while (getline(read, line)) it skips the while loop. Why is that?
The program reads the last line in the file.
It sleeps for 5 minutes.
The input file has had new lines added to it by a third party.
After the sleep it wakes up and goes back to the while loop.
It successfully retrieves the new lines from the input, but fails to enter the while loop again.
When using std::getline() at the end of a file both std::ios_base::eofbit and std::ios_base::failbit are set. In fact, it is std::ios_base::failbit which causes the loop to exit. You'll need to clear both of these flags prior to any seek.
For a system which needs to use IOStreams I would actually not bother re-reading the leading lines but merely wait a bit, clear the flags, and try again. The main issue is detecting whether a complete line has been read, which could be done by simply reading individual characters, e.g., using std::istreambuf_iterator<char>.
Otherwise I'd look for a system API which provides some sort of indication that new data is available on a file. Older systems generally don't provide such facilities but newer systems generally have some event-based interface which can be used to get hold of newly available data. The advantage is normally that the process doesn't poll for new data but idly waits until it gets notified about new data. I haven't used it myself but it seems libuv does this sort of operation in a somewhat platform-independent form.

Trouble saving data written to a file when I kill the app

My program is always writing data to a file but when I close it before the program fully stops, the end result is nothing being written to the file. I would really like to be able to close it without completing it fully, so how can I fix this to make it constantly saving the file?
ofstream outfile;
outfile.open("text.txt", std::ios::app);
bool done = false;
int info;
while (done == false) {
    cin >> info;
    outfile << info;
    cout << info << "Choose different info";
    if (info == 100) {
        done = true;
    }
}
outfile.close();
This is obviously just an example, but it is very similar to my actual code.
Edit: When I say closing I mean killing it (hitting the red X at the top right of the console).
You likely need to flush your std::ofstream when you have done "enough" work.
"enough" work here is going to depend on your application.
Perhaps
...
outfile<<info;
outfile.flush();
...
To save time, the data isn't written to the file the moment you call the write function; it waits to see whether you want to write anything else, or for a time which is "good" for writing. You write to a buffer, and that buffer is written to the file later.
When you close the file, anything left in the buffer is written to the file. You can force your code to write to the file using the flush method. Just flush your file every time you write and you will be OK.
flush: http://www.cplusplus.com/reference/ostream/ostream/flush/
outfile << n;
outfile.flush();
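As a small sketch of the same idea packaged as a function (the name append_and_flush is made up for this example), each value is handed to the operating system before the call returns, so it should survive the process being killed afterwards:

```cpp
#include <fstream>

// Hypothetical helper: appends one value on its own line and flushes at once.
bool append_and_flush(std::ofstream& out, int value) {
    out << value << '\n';
    out.flush();  // pass the buffered bytes to the operating system now
    return static_cast<bool>(out);
}
```

An alternative with the same effect is to write `outfile << std::unitbuf;` once after opening, which makes the stream flush after every output operation. Either way the data only reaches the OS cache; protecting against power loss would need a platform-specific sync on top.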

C++ continuous read file

I've a producer/consumer set-up: Our client is giving us data that our server processes, and our client is giving data to our server by constantly writing to a file. Our server uses inotify to look for any file modifications, and processes the new data.
Problem: The file reader in the server has a buffer of size 4096. I've a unit test that simulates the above situation. The test constantly writes to an open file, which the file reader constantly tries to read and process. But I noticed that after the first record is read, which is much smaller than 4096, an error flag is set in the ifstream object. This means that any new data arriving is not being processed. A simple workaround seems to be to call ifstream::clear after every read, and this does solve the issue. But what is going on? Is this the right solution?
First off, depending on your system it may or may not be possible to read a file another process writes to: On Windows the normal settings when opening a file make the access exclusive. I don't know enough about Windows to tell whether there are other settings. On POSIX systems a file with suitable permissions can be opened for reading and writing by different processes. From the sounds of it you are using Linux, i.e., something following the POSIX specification.
The approach to polling a file upon change isn't entirely ideal, though: As you noticed, you get an "error" every time you reach the end of the current file. Actually, reaching the end of a file isn't really an error but trying to decode something beyond end of file is an error. Also, reading beyond the end of file will still set std::ios_base::eofbit and, thus, the stream won't be good(). If you insist on using this approach there isn't much choice than reading up to the end of the file and dealing with the incomplete read somehow.
If you have control over creating the file, however, you can do a simple trick: Instead of having the file be a normal file, you can use mkfifo to create a named pipe under the file name the writing program will write to: When opening a file on a POSIX system it doesn't create a new file if there is already one but uses the existing file. Well, file or whatever else is addressed by the file name (in addition to files and named pipes you may see directories, character or block special devices, and possibly others).
Named pipes are curious beasts intended to let two processes communicate with each other: What is written to one end by one process is readable at the other end by another process! The named pipe itself doesn't have any content, i.e., if you need both the content of the file and the communication with another process you might need to replicate the content somewhere. A named pipe opened for reading will block whenever it has reached the current end of the data, i.e., initially the read would block until there is a writer. Similarly, writes to the named pipe will block until there is a reader. Once there are two processes communicating, the remaining end notices when the other process has exited: the reader sees end-of-file, and the writer gets an error.
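A rough POSIX-only sketch of the named pipe setup follows. The helper names ensure_fifo and consume_fifo are made up, and the permission bits 0600 are just an example choice.

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <cerrno>
#include <fstream>
#include <ostream>
#include <string>

// Hypothetical helper: makes sure `path` is a named pipe, creating it if needed.
// Returns false if creation fails or if something else (e.g. a regular file)
// already lives at that path.
bool ensure_fifo(const char* path) {
    if (::mkfifo(path, 0600) != 0 && errno != EEXIST) return false;
    struct stat info;
    return ::stat(path, &info) == 0 && S_ISFIFO(info.st_mode);
}

// Hypothetical helper: blocks until a writer opens the pipe, then forwards
// each line to `out` until every writer has closed its end.
void consume_fifo(const char* path, std::ostream& out) {
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) out << line << '\n';
}
```

The server would call ensure_fifo() before the client starts, and the client then opens the same path as if it were an ordinary file. With this setup no inotify watch or flag clearing is needed, at the price of the data not being stored anywhere.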
If you are fine with opening and closing the file again and again, the right solution to this problem is to store the last read position and start from there once the file is updated.
The exact algorithm is:
1. Set start_pos = 0 and end_pos = 0.
2. Seek to the end of the file and update end_pos = infile.tellg().
3. Move the get pointer to start_pos (use seekg()) and read the block of (end_pos - start_pos) bytes.
4. Update start_pos = end_pos and then close the file.
5. Sleep for some time and open the file again.
6. If the file stream is still not good, close the file and jump to step 5.
7. If the file stream is good, jump to step 2.
All c++ reference is present at http://www.cplusplus.com/reference/istream/istream/seekg/
you can literally utilize the sample code given here.
Exact code will be:
#include <fstream>
#include <iostream>
#include <vector>
#include <unistd.h> // sleep() (POSIX)

int main(int argc, char* argv[]) {
    if (argc != 2) {
        std::cout << "Please pass filename with full path\n";
        return -1;
    }
    std::streamoff start_pos = 0;
    const char* filePath = argv[1];
    std::ifstream is(filePath, std::ifstream::binary);
    while (true) {
        if (is) {
            is.seekg(0, is.end);
            const std::streamoff end_pos = is.tellg(); // always update end pointer to end of the file
            if (end_pos > start_pos) {
                is.seekg(start_pos, is.beg); // move read pointer to the new start position
                // read the new data as one block of (end_pos - start_pos) bytes
                std::vector<char> buffer(static_cast<std::size_t>(end_pos - start_pos));
                is.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
                // print content:
                std::cout.write(buffer.data(), is.gcount());
                std::cout.flush();
                start_pos += is.gcount(); // update start pointer
            }
        }
        is.close();
        // wait and restart with new data
        sleep(1);
        is.open(filePath, std::ifstream::binary);
    }
    return 0;
}

Delay in ofstream::open, possibly due to mixing with _iobuf?

I have a C++ program that creates an output file "A" with ofstream. This file is then read by some legacy C code that opens the file with _iobuf. The legacy code then creates its own output file "B" using _iobuf, and this file is then read by the C++ program using ifstream. This sequence is iterated many times, with the same file names for A and B in each iteration.
Occasionally, the C++ program cannot open the output file A for writing, and I must try several times before it succeeds. This occurs nondeterministically, and maybe once in a thousand iterations. Note that the C program never has to wait to open its input or output file, nor does the C++ program ever have to wait to open its input file. This informal observation is based on hundreds of thousands of iterations.
I'm wondering if this has something to do with mixing ofstream and _iobuf in the same program? Both the C++ code and the C code are linked into the same program. And the legacy C code is technically C++ code, but was written in a very C-like style. Is there anything I can do to eliminate this waiting to open the ofstream file? I do not want to change the legacy code if I can possibly avoid it.
Pseudo code (not compiled):
void someObject::someMethod()
{
    for (int count = 0; count < someLimit; ++count)
    {
        newerObject::firstMethod();
        olderObject::secondMethod();
        newerObject::thirdMethod();
    }
}
void newerObject::firstMethod()
{
    // do some processing first
    // then write the results of the processing to a file
    ofstream A;
    A.open("A", ofstream::out); // this sometimes must be tried multiple times
    // write data to file A
    A.close();
}
void olderObject::secondMethod()
{
    FILE* f;
    f = fopen("A", "rt"); // this always works the first time
    // read data from file A
    fclose(f);
    // do some processing
    f = fopen("B", "w");
    // write data to file B
    fclose(f);
}
void newerObject::thirdMethod()
{
    ifstream B;
    B.open("B"); // this always works the first time
    // read data from file B
    B.close();
    // do some processing
}
Currently, as a work around, I put the ofstream::open in a do-while loop. I would love to get rid of this awkwardness. Thanks in advance for any advice you can give.
First off, the problem is almost certainly not the use of different methods to access the files: under the hood, the C and C++ I/O functions use the same system I/O facilities. You seem to be using Windows (on other systems files typically can be open multiple times simultaneously) and I don't know much about the system but I would suspect that the file system hasn't been updated to reflect that the file is closed when you try to open it. This may have to do with the "t" open flag: I don't know what this is about.
On UNIXes you can force the I/O operations to wait until the actual change on disk completed. Something like this could help avoiding the problem but has the significant cost that operations become hideously slow. On UNIXes one approach would be to blow away the file system entry the moment the file was opened successfully (after all, at this point its name isn't used anymore):
if (FILE* fp = fopen("file", "r")) {
    remove("file");
    // do processing
}
However, if I recall correctly, on Windows you can neither remove the file nor rename it while it is open. Personally, in solving the problem I would proceed as follows:
Determine under which situations the file can't be opened, e.g. by keeping the file open and trying to open it. This is mainly intended to create a setup where the problem is reproducible so you can verify later that you indeed found a solution.
Once I found a way to reproduce the problem I would probably have a better idea of the actual root cause, and possibly googling would help. In any case this is the point where researching the root cause comes in.
Once the cause is understood it is hopefully easy to devise a solution. If not, trying to open the file multiple times until it succeeds may very well be the right solution.
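If retrying does turn out to be the answer, the do-while loop from the question can at least be tucked away in one place. The following is a sketch only: the function name open_with_retry, the attempt limit, and the delay are all assumptions chosen for illustration.

```cpp
#include <chrono>
#include <fstream>
#include <string>
#include <thread>

// Hypothetical helper: tries to open `path` for writing up to `max_attempts`
// times, pausing `delay` between failed attempts.
bool open_with_retry(std::ofstream& out, const std::string& path,
                     int max_attempts, std::chrono::milliseconds delay) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        out.clear();      // drop the failbit left by a failed open()
        out.open(path);
        if (out.is_open()) return true;
        if (attempt < max_attempts) std::this_thread::sleep_for(delay);
    }
    return false;
}
```

In firstMethod() this would replace the bare A.open("A", ofstream::out) call, e.g. open_with_retry(A, "A", 10, std::chrono::milliseconds(5)), with the two numbers tuned to whatever the reproduction experiments suggest.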