Use RAII for writing end of file marker? - c++

I'm creating a file format where I'd like to write an explicit message into the file indicating that the writer ran to completion. I've had problems in the past with generating files where the generating program crashed and the file was truncated without me realizing, since without an explicit marker there's no way for reading programs to detect a file is incomplete.
So I have a class that is used for writing these files. Now usually if you have an "open" operation and a "close" operation you want to use RAII, so I would put the code to write the end of file marker in the destructor. This way the user can't forget. But in a situation where writing doesn't complete because an exception is thrown the destructor will still be run -- in which case we don't want to write the message so readers will know the file is incomplete.
This seems like something that could happen any time there's a "commit" sort of operation. You want RAII so you can't forget to commit, but you also don't want to commit when an exception occurs. The temptation here is to use std::uncaught_exceptions, but I think that's a code smell.
What's the usual solution to this? Just require that people remember? I'm concerned this will be a stumbling block every time someone tries to use my API.

One way to tackle this problem is to implement a simple framing system where you can define a header that is only filled in completely at the end of the write. Include a SHA256 hash to make the header useful for verifying the contents of the file. This is usually a lot more convenient than having to read bytes at the end of a file.
In terms of implementation you write out a header with some fields deliberately zeroed out, write the contents of the payload while feeding that data through your hashing method, and then seek back to the header and re-write that with the final values. The file starts out in an obviously invalid state and ends up valid only if everything ran to completion.
You could wrap up all of this in a stream handle that handles the implementation details so as far as the calling code is concerned it's just opening a regular file. Your reading version would throw an exception if the header is incomplete or invalid.
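A minimal sketch of that layout, assuming a trivial stand-in checksum where you would really use SHA-256 (the struct and function names are just for illustration):
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

struct Header {
    std::uint64_t payloadSize = 0;   // zero means "writer never finished"
    std::uint64_t checksum    = 0;   // stand-in for a SHA-256 digest
};

void writeFramedFile(const std::string& path, const std::vector<char>& payload)
{
    std::ofstream out(path, std::ios::binary);

    Header h;                                            // deliberately zeroed
    out.write(reinterpret_cast<const char*>(&h), sizeof h);

    std::uint64_t sum = 0;                               // hash the payload as we write it
    for (char c : payload) sum = sum * 131 + static_cast<unsigned char>(c);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));

    h.payloadSize = payload.size();                      // now fill in the real values
    h.checksum    = sum;
    out.seekp(0);                                        // seek back over the header
    out.write(reinterpret_cast<const char*>(&h), sizeof h);
}
A reader that sees a zeroed size field (or a checksum mismatch) simply treats the file as incomplete.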

For your example, it seems like RAII would work fine if you add a commit method which the user of your class calls when they are done writing to a file.
#include <string>

class MyFileFormat {
public:
    MyFileFormat() : committed_(false) {}

    ~MyFileFormat() {
        if (committed_) {
            // write the completion footer (I hope this doesn't throw!)
        }
        // close the underlying stream...
    }

    bool open(const std::string& path) {
        committed_ = false;
        // open the underlying stream...
        return true; // report whether the open succeeded
    }

    void commit() {
        committed_ = true;
    }

private:
    bool committed_;
};
The onus is on the user to call commit when they're done, but at least you can be sure that resources get closed.
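Usage would then look roughly like this; if an exception propagates before commit(), the destructor still closes the stream but writes no footer:
MyFileFormat out;
if (out.open("records.dat")) {
    // ... write records ...
    out.commit();   // destructor will now write the completion footer
}                   // an exception thrown before commit() skips the footer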
For a more general pattern for cases like this, take a look at ScopeGuards.
ScopeGuards would move the responsibility for cleanup out of your class, and can be used to specify an arbitrary "cleanup" callback in the event that the ScopeGuard goes out of scope and is destroyed before being explicitly dismissed. In your case, you might extend the idea to support callbacks for both failure cleanup (e.g. close file handles) and success cleanup (e.g. write the completion footer and close file handles).
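A bare-bones sketch of the idea, assuming the writer is a plain std::ofstream (library ScopeGuard implementations are more elaborate and add success/failure variants):
#include <cstdio>
#include <fstream>
#include <functional>
#include <string>
#include <utility>

// Minimal scope guard: runs its callback unless dismissed on the success path.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> onFailure)
        : onFailure_(std::move(onFailure)) {}
    ~ScopeGuard() { if (armed_) onFailure_(); }
    void dismiss() { armed_ = false; }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
private:
    std::function<void()> onFailure_;
    bool armed_ = true;
};

void writeRecords(const std::string& path)
{
    std::ofstream out(path, std::ios::binary);
    ScopeGuard discard([&] { out.close(); std::remove(path.c_str()); });
    // ... write the records; an exception here leaves the guard armed ...
    out << "END-OF-FILE MARKER\n";   // success-side work: write the footer
    discard.dismiss();               // then disarm the failure cleanup
}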

I've handled situations like that by writing to a temporary file. Even if you're appending to a file, append to a temporary copy of the file.
In your destructor, you can check std::uncaught_exception() (or std::uncaught_exceptions() since C++17) to decide whether your temporary file should be moved to its intended location.
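A sketch of that approach, assuming C++17 and a hypothetical wrapper name; the temporary is only renamed into place when no exception is in flight:
#include <exception>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

class AtomicFileWriter {
public:
    explicit AtomicFileWriter(std::string path)
        : final_(std::move(path)), tmp_(final_ + ".tmp"), out_(tmp_, std::ios::binary) {}

    std::ofstream& stream() { return out_; }

    ~AtomicFileWriter()
    {
        out_.close();
        std::error_code ec;                              // don't throw from a destructor
        if (std::uncaught_exceptions() == 0)
            std::filesystem::rename(tmp_, final_, ec);   // commit: move into place
        else
            std::filesystem::remove(tmp_, ec);           // abandon the partial file
    }

private:
    std::string   final_;
    std::string   tmp_;
    std::ofstream out_;
};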

Related

Functions responsibility on data in C

Recently I ran into a problem at work where you have two functions; one opens a file descriptor (which is a local variable in the function), and passes it to another function where it is used for reading or writing. Now, when one of the operations read/write fails the function that was doing the read/write closes this file descriptor, and returns.
The question is, whose responsibility is it to close the file descriptor, or let's say do the cleanup:
the function which created the fd
the function which experienced the error while reading/writing
Is there a design rule for these kinds of cases, let's say creation and cleanup?
BTW, the problem was that both functions attempted to close the fd, which resulted in a crash on the second call to close.
There are two parts to this answer — the general design issue and the detailed mechanics for your situation.
General Design
Handling resources such as file descriptors correctly, and making sure they are released correctly, is an important design issue. There are multiple ways to manage the problem that work. There are some others that don't.
Your tags use C and C++; be aware that C++ has extra mechanisms available to it.
In C++, the RAII — Resource Acquisition Is Initialization — idiom is a great help. When you acquire a resource, you ensure that whatever acquires the resource initializes a value that will be properly destructed and will release the resource when destructed.
In both languages, it is generally best if the function responsible for allocating the resource also releases it. If a function opens a file, it should close it. If a function is given an open file, it should not close the file.
In the comments, I wrote:
Generally, the function that opened the file should close it; the function that experienced the error should report the error, but not close the file. However, you can work it how you like as long as the contract is documented and enforced — the calling code needs to know when the called code closed the file to avoid double closes.
It would generally be a bad design for the called function to close the file sometimes (on error), but not other times (no error). If you must go that way, then it is crucial that the calling function is informed that the file is closed; the called function must return an error indication that tells the calling code that the file is no longer valid and should neither be used nor closed. As long as the information is relayed and handled, there isn't a problem — but the functions are harder to use.
Note that if a function is designed to return an opened resource (it is a function that's responsible for opening a file and making it available to the function that called it), then the responsibility for closing the file falls on the code that calls the opening function. That is a legitimate design; you just have to make sure that there is a function that knows how to close it, and that the calling code does close it.
Similar comments apply to memory allocation. If a function allocates memory, you must know when the memory will be freed, and ensure that it is freed. If it was allocated for the purposes of the current function and functions it calls, then the memory should be released before return. If it was allocated for use by the calling functions, then the responsibility for release transfers to the calling functions.
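In C++, one way to make that transfer of responsibility explicit is to hand back a smart pointer, so the release happens exactly once, in whichever function ends up owning the allocation (a small illustrative sketch):
#include <cstddef>
#include <memory>
#include <vector>

// Ownership transfer made explicit: the factory allocates, the caller owns.
std::unique_ptr<std::vector<char>> makeBuffer(std::size_t n)
{
    return std::make_unique<std::vector<char>>(n);
}

void useBuffer()
{
    auto buf = makeBuffer(4096);   // responsibility for release now lives here
    // ... use *buf ...
}                                  // freed exactly once, by the owner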
Detailed mechanics
Are you sure you're using file descriptors and not FILE * (file streams)? It's unlikely that closing a file descriptor twice would cause a crash (error, yes, but not a crash). OTOH, calling fclose() on an already closed file stream could cause problems.
In general, in C, you pass file descriptors, which are small integers, by value, so there isn't a way to tell the calling function that the file descriptor is no longer valid. In C++, you could pass them by reference, though it is not conventional to do so. Similarly with FILE *; they're most usually passed by value, not by reference, so there isn't a way to tell the calling code that the file is not usable any more by modifying the value passed to the function.
You can invalidate a file descriptor by setting it to -1; that is never a valid file descriptor. Using 0 is a bad idea; it is equivalent to using standard input. You can invalidate a file stream by setting it to 0 (aka NULL). Passing the null pointer to functions that try to use the file stream will tend to cause crashes. Passing an invalid file descriptor typically won't cause crashes — the calls may fail with EBADF set in errno, but that's the limit of the damage, usually.
Using file descriptors, you will seldom get a crash because the file descriptor is no longer valid. Using file streams, all sorts of things can go wrong if you try using an invalid file stream pointer.
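A small sketch of the convention described above, using POSIX calls: the called function only reports, the opener closes, and the descriptor is invalidated with -1 afterwards (the function names are just for illustration):
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

// The called function reads but never closes; it only reports failure.
bool readAll(int fd)
{
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        // process buf[0 .. n)
    }
    return n == 0;   // false means a read error; the caller decides what to do
}

void process(const char* path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) { perror("open"); return; }
    if (!readAll(fd))
        perror("read");
    close(fd);       // closed exactly once, by the function that opened it
    fd = -1;         // invalidate it so it can't be closed again by mistake
}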

Why do C++ standard file streams not follow RAII conventions more closely?

Why do C++ Standard Library streams use open()/close() semantics decoupled from object lifetime? Closing on destruction might still technically make the classes RAII, but decoupling acquisition from release leaves holes in scopes where a handle can point to nothing, and catching that requires run-time checks.
Why did the library designers choose their approach over having opening only in constructors that throw on a failure?
void foo() {
    std::ofstream ofs;
    ofs << "Can't do this!\n"; // XXX
    ofs.open("foo.txt");
    // Safe access requires explicit checking after open().
    if (ofs) {
        // Other calls still need checks but must be shielded by an initial one.
    }
    ofs.close();
    ofs << "Whoops!\n"; // XXX
}

// This approach would seem better IMO:
void bar() {
    std_raii::ofstream ofs("foo.txt"); // throw on failure and catch wherever
    // do whatever, then close ofs on destruction ...
}
A better wording of the question might be why access to a non-opened fstream is ever worth having. Controlling open file duration via handle lifetime does not seem to me to be a burden at all, but actually a safety benefit.
Although the other answers are valid and useful, I think the real reason is simpler.
The iostreams design is much older than a lot of the Standard Library, and predates wide use of exceptions. I suspect that in order to be compatible with existing code, the use of exceptions was made optional, not the default for failure to open a file.
Also, your question is only really relevant to file streams, the other types of standard stream don't have open() or close() member functions, so their constructors don't throw if a file can't be opened :-)
For files, you may want to check that the close() call succeeded, so you know whether the data got written to disk. That's a good reason not to do it in the destructor: by the time the object is destroyed it is too late to do anything useful with the result, and you almost certainly don't want to throw an exception from the destructor. So a filebuf will call close() in its destructor, but you can also do it manually before destruction if you want to.
In any case, I don't agree that it doesn't follow RAII conventions...
Why did the library designers choose their approach over having opening only in constructors that throw on a failure?
N.B. RAII doesn't mean you can't have a separate open() member in addition to a resource-acquiring constructor, or you can't clean up the resource before destruction e.g. unique_ptr has a reset() member.
Also, RAII doesn't mean you must throw on failure, or an object can't be in an empty state e.g. unique_ptr can be constructed with a null pointer or default-constructed, and so can also point to nothing and so in some cases you need to check it before dereferencing.
File streams acquire a resource on construction and release it on destruction - that is RAII as far as I'm concerned. What you are objecting to is requiring a check, which smells of two-stage initialization, and I agree that is a bit smelly. It doesn't make it not RAII though.
In the past I have solved the smell with a CheckedFstream class, which is a simple wrapper that adds a single feature: throwing in the constructor if the stream couldn't be opened. In C++11 that's as simple as this:
#include <fstream>
#include <string>

struct CheckedFstream : std::fstream
{
    CheckedFstream() = default;

    CheckedFstream(std::string const& path,
                   std::ios::openmode m = std::ios::in | std::ios::out)
    : std::fstream(path, m)
    { if (!is_open()) throw std::ios::failure("Could not open " + path); }
};
This way you get more, and nothing less.
You get the same: you can still open the file via the constructor, and you still get RAII: the file is automatically closed when the object is destroyed.
You get more: you can reuse the same stream to open another file, and you can close the file when you want, without having to wait for the object to go out of scope or be destroyed (this is very important).
You get nothing less: the advantage you see is not real. You say that your way you don't have to check at each operation. This is false: the stream can fail at any time, even if it opened the file successfully.
As for error checking vs. throwing exceptions, see @PiotrS's answer. Conceptually I see no difference between having to check a return status and having to catch an error. The error is still there; the difference is how you detect it. But as @PiotrS points out, you can opt for both.
The library designers gave you an alternative:
std::ifstream file{};
file.exceptions(std::ifstream::failbit | std::ifstream::badbit);
try
{
    file.open(path); // now it will throw on failure
}
catch (const std::ifstream::failure& e)
{
    // report the error or rethrow
}
The standard library file streams do provide RAII, in the sense that calling the destructor on one will close any file which happens to be open. At least in the case of output, however, this is an emergency measure, which should only be used if you have encountered another error, and are not going to use the file which was being written anyway. (Good programming practice would be to delete it.) Generally, you need to check the status of the stream after you've closed it, and this is an operation which can fail, so shouldn't be done in the destructor.
For input, it's not so critical, since you'll have checked the status after the last input anyway, and most of the time, will have read until an input fails. But it does seem reasonable to have the same interface for both; from a programming point of view, however, you can usually just let the close in the destructor do its job on input.
With regards to open: you can just as easily do the open in the constructor, and for isolated uses like you show, this is probably the preferred solution. But there are cases where you might want to reuse an std::filebuf, opening it and closing it explicitly, and of course, in almost all cases, you will want to handle a failure to open the file immediately, rather than through some exception.
It depends on what you are doing, reading or writing.
You can encapsulate an input stream in an RAII way, but the same is not true for output streams. If the destination is a disk file or a network socket, NEVER, NEVER put fclose/close in a destructor, because you need to check the return value of fclose, and there is no way to report an error that occurs in a destructor. See How can I handle a destructor that fails.
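A small sketch of the explicit close-and-check that both answers recommend, using a std::ofstream; the destructor's close remains only a fallback:
#include <fstream>
#include <stdexcept>
#include <string>

void writeOut(const std::string& path, const std::string& data)
{
    std::ofstream out(path, std::ios::binary);
    out << data;
    out.close();                 // explicit close, while we can still react
    if (!out)                    // failbit is set if a write or the close failed
        throw std::runtime_error("failed to write " + path);
}   // the destructor's close is never relied on for error reporting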

sequencing events with visitor pattern

I have a visitor pattern implemented and it seems to be working fine but I don't see how to do some housekeeping work at the very start and the very end.
There is no guarantee of when the various overloaded visit() methods will be called so I can't tell who is the first one and who is the last one.
Basically I'm using a visitor to save/load settings to/from disk. The problem is that (on loading) I need to clear some stuff out before I do any of the other loading steps. I did put in a static variable and a method to initialize things and do this load; that should ensure that the clearing happens only once at the very start -- but a person could load things multiple times. So at the end of the reading I'd like to reset the static variable (so they can read in again without the old junk still being there). I can't simply put the reset into the destructor (or a method called by the destructor) because the concrete visitor objects are created/destroyed n times for each grouping of settings.
I guess I need to yoke it to another pattern, but I'm not seeing how.
Following up on my comment above.
You could have a class
class VisitorState {
public:
    VisitorState() {
        // stuff to be done on loading
    }
    ~VisitorState() {
        // stuff to be done when done.
    }
private:
    // state info you might want to keep around
};
and then modify your Visitor interface to have methods that include the VisitorState
someReturn visit(VisitorState& state, ...)
The VisitorState must be allocated (new'ed) when the file is requested to be loaded and kept around, associated with the file being visited. It must be deallocated (delete'd) when the processing of the file ends.
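Whether you new/delete the VisitorState as described or simply make it a local, its lifetime brackets exactly one load, which is what ties the start and end housekeeping together. A sketch using the class above (the surrounding settings machinery is only hinted at in comments):
void loadSettings(/* your settings source */)
{
    VisitorState state;   // start-of-load housekeeping runs exactly once, here
    // Create and destroy concrete visitors as often as needed, passing `state`
    // into each visit(state, ...) call while it is alive.
}                         // end-of-load housekeeping runs exactly once, here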

Copy file even when destination exists (in Qt)

In the QFile::copy documentation it says
If a file with the name newName already exists, copy() returns false
(i.e., QFile will not overwrite it).
But I need to copy a file even if the destination exists. Any workaround available in Qt for that?
Deleting the file is the obvious solution but it invites a race condition...
if (QFile::exists("/home/user/dst.txt"))
{
    QFile::remove("/home/user/dst.txt");
}
QFile::copy("/home/user/src.txt", "/home/user/dst.txt");
The obvious solution is of course to delete the file if it exists, before doing the copy.
Note however that doing so opens up the code to a classic race condition, since on a typical multitasking operating system a different process could re-create the file between your application's delete and copy calls. That would cause the copy to still fail, so you need to be prepared (and perhaps re-try the delete, but that might introduce the need for a retry count so you don't spend forever attempting, and so on).
The simplest retrying I can think of is:
while (!QFile::copy("/home/user/src.txt", "/home/user/dst.txt"))
{
    QFile::remove("/home/user/dst.txt");
}
But this still isn't a real solution, as some of the failure modes are caused by things that a remove doesn't prevent.
I'm currently hunting for a way to handle writing a web page as output without the auto refresh ever catching the file between the remove and the copy.
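If you do retry, bounding the number of attempts avoids looping forever; a hypothetical helper might look like this:
#include <QFile>
#include <QString>

// Hypothetical helper: overwrite-copy with a bounded number of attempts.
bool copyOverwrite(const QString& src, const QString& dst, int maxAttempts = 3)
{
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (QFile::copy(src, dst))
            return true;        // copy succeeded
        QFile::remove(dst);     // something re-created dst; remove and retry
    }
    return false;               // give up; let the caller report the failure
}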
Just call remove() before calling copy()

How to handle object destruction in error case vs. non-error case

I have a program that is responsible for reading data, formatting it and creating records, and outputting records to files. The important classes for this discussion are:
RecordGenerator - contains the thread that controls the main flow (get data, format, output)
FileManager - manages the output files. Records are sent to this class, which then puts them in a charging file.
OutputFile - abstraction of a file that contains records; has print(), close(), etc. These objects are owned by the FileManager.
During a normal process shutdown, the destructors for these classes all get called which causes all remaining records to get flushed out to the current output file and then it gets closed. This ensures we don't lose any data.
However, during an error case, we need to shutdown but we don't want to flush and close the file since the data is likely corrupt. Normally what happens is an exception will get thrown which gets caught in the RecordGenerator which then decides if this is a fatal error or not. If it is, it will initiate the application shutdown. It's at this point that the FileManager gets destructed, but needs to know whether there is an error. Likewise, when the FileManager gets destructed, this causes the OutputFile to get destructed which also needs to know whether there is an error.
My first reaction was to just add some public functions that set error flags for these classes, so RecordGenerator could call FileManager::setErrorFlag() which can then call OutputFile::setErrorFlag(). Adding a chain of these seems like a pretty bad smell to me, especially if you consider the object chain could be much longer than this.
Is there some better way of handling this sort of scenario?
This is a typical problem when people start using RAII in a way it's not meant to be used. Destructors should clean up resources and revert whatever they are responsible for; they should not commit changes. Typical exception-safe C++ code looks like this:
allocate resource
do something
commit changes
For example:
X& X::operator = (const X& x)
{
    X y(x);          // allocate
    this->swap(y);   // commit
    return *this;
}

void f()
{
    Transaction t(...);   // begin transaction
    // operate
    t.commit();           // commit transaction
}

void g()
{
    File f(...);   // open file
    // write to file
    f.flush();     // flush the buffers, this may throw but not f.~File()
}
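Mapped back to the classes in the question, that might look roughly like the following sketch (the names and the discard-on-error policy are assumptions, not your actual API): the normal shutdown path calls commit(), and the destructor only reverts.
#include <cstdio>
#include <stdexcept>
#include <string>

class OutputFile {
public:
    explicit OutputFile(std::string path)
        : path_(std::move(path)), f_(std::fopen(path_.c_str(), "wb"))
    { if (!f_) throw std::runtime_error("cannot open " + path_); }

    void print(const std::string& rec)
    { std::fwrite(rec.data(), 1, rec.size(), f_); }

    void commit()                           // normal shutdown path
    {
        if (std::fclose(f_) != 0) {         // flushes, and lets us report errors
            f_ = nullptr;
            throw std::runtime_error("error writing " + path_);
        }
        f_ = nullptr;
    }

    ~OutputFile()
    {
        if (f_) {                           // commit() never happened
            std::fclose(f_);
            std::remove(path_.c_str());     // discard the (likely corrupt) partial file
        }
    }

private:
    std::string path_;
    std::FILE*  f_;
};
With this shape there is no need for a chain of setErrorFlag() calls: the error path simply never reaches commit(), and each destructor reverts on its own.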