What happens if CFile::Write throws an exception?

Assume that the write operation throws an exception half-way through. Is any data written into the file, or is no data written at all?

Since you have no view of the internals of CFile (or shouldn't, if it's encapsulated properly), you need to rely on the 'contract' of the API. In other words, unless the documentation tells you specifically what happens in certain cases, you can't rely on it.
Even if you had the source code and could figure it out, the API specification is the contract, and anything not specified there can change at any time. This is one reason some software developers are wary of publishing internals: doing so can be seen to lock them into supporting that behaviour forever.
If you really want to ensure that your file will be in a known state after the exception, you will need to code around the behavior. This can be something like:
backing up the file at program start-up (simple) ; or
backing it up before each save operation (still relatively simple; sketched below); or
backing it up before any write operation (complex and slow).
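As a rough illustration of the second option, here is a minimal sketch of backing a file up before each save, assuming C++17 std::filesystem is available; the function and file names are just examples.
#include <filesystem>
#include <fstream>
#include <string>

// Copy the current file aside, then write; if the write throws, the
// backup still holds the last known-good contents.
void save_with_backup(const std::string& path, const std::string& data)
{
    namespace fs = std::filesystem;
    const fs::path target(path);
    const fs::path backup(path + ".bak");

    if (fs::exists(target))
        fs::copy_file(target, backup, fs::copy_options::overwrite_existing);

    std::ofstream out(target, std::ios::binary | std::ios::trunc);
    out.exceptions(std::ofstream::failbit | std::ofstream::badbit);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
}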

Short answer: Most likely some data will be written to the file, unless the disk is full at the start of the write operation.
Longer answer: It will depend on what CFileException is thrown from the Write call.
http://msdn.microsoft.com/en-us/library/as5cs056(VS.80).aspx

Related

Should exceptions ever be caught

No doubt exceptions are useful, as they show the programmer where functions are being used incorrectly or where something bad is happening in the environment, but is there a real need to catch them?
Uncaught exceptions terminate the program, but you can still see where the problem is. In well-designed libraries every "unexpected" situation actually has a workaround: for example, using map::find instead of map::at, or checking whether your int variable is smaller than vector::size before using the index operator.
Why would anyone need to catch them (excluding people using libraries that enforce it)? Basically, if you are writing a handler for a given exception, you could just as well write code that prevents it from happening in the first place.
Not all exceptions are fatal. They may be unusual and, therefore, "exceptions," but a point higher in the call stack can be implemented to either retry or move on. In this way, exceptions are used to unwind the stack and a nested series of function or method calls to a point in the program which can actually handle the cause of the exception -- even if only to clean up some resources, log an error, and continue on as before.
You can't always write code that prevents an exception. Just for an obvious example, consider concurrent code. Let's assume I attempt to verify that i is between (say) 0 and 20, then use i to index into some array. So, I check and i == 12, so I proceed to use it to index into the array. Unfortunately, in between the test and the indexing operation, some other thread added 20 to i, so by the time it's used as an index, it's not in range any more.
The concurrency has led to a race condition, so the attempt at assuring against an exceptional condition has failed. While it's possible to prevent this by (for example) wrapping each such test/use sequence in a critical section (or similar), it's often impractical to do so--first, getting the code correct will often be quite difficult, and second even if you do get it correct, the consequences on execution speed may be unacceptable.
Exceptions also decouple code that detects an exceptional condition from code that reacts to that exceptional condition. This is why exception handling is so popular with library writers. The code in the library doesn't have a clue of the correct way to react to a particular exceptional condition. Just for a really trivial example, let's assume it can't read from a file. Should it print a message to stderr, pop up a MessageBox, or write to a log?
In reality, it should do none of these. At least two (and possibly all three) will be wrong for any given program. So, what it should do is throw an exception, and let code at a higher level determine the appropriate way to respond. For one program it may make sense to log the error and continue with other work, but for another the file may be sufficiently critical that its only reasonable reaction is to abort execution entirely.
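To make the decoupling concrete, here is a small hedged sketch: a hypothetical library function that only detects the problem, and an application that chooses the policy (here, log and fall back to defaults).
#include <fstream>
#include <iostream>
#include <iterator>
#include <stdexcept>
#include <string>

// Library side: report the condition, don't decide how to react.
std::string load_config(const std::string& path)
{
    std::ifstream in(path);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Application side: this particular program logs and continues;
// another program might abort instead.
int main()
{
    try {
        std::string cfg = load_config("app.cfg");
        // ... use cfg ...
    } catch (const std::exception& e) {
        std::cerr << "warning: " << e.what() << ", using defaults\n";
    }
}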
Exceptions are very expensive, performance-wise; thus, whenever performance matters you will want to write exception-free code (using "plain C" techniques for error propagation).
However, if performance is not of immediate concern, then exceptions allow you to write less cluttered code, as error handling can be postponed (but then you will have to deal with non-local transfer of control, which may be confusing in itself).
I have used exceptions extensively as a method to transfer control to specific positions depending on event handling.
Exceptions can also be a method to transfer control to a "labeled" position along the tree of calling functions.
When an exception happens, the code can be thought of as backtracking one level at a time, checking whether that level has a handler for it, and executing the handler if so.
The real problem with exceptions is that you don't really know where they will happen.
The code where an exception occurs usually doesn't know why there is a problem, so quickly returning to a known state is a good course of action.
Let's make an example: you are in Venice, walking through small streets while looking at the map, and at some point you arrive somewhere you cannot find on the map.
Essentially you are confused and you don't understand where you are.
If you have Ariadne's thread (μίτος) you can go back to a known point and try again to get where you want.
I think you should treat error handling as a control structure that allows you to go back to whichever level is signaled (by the error-handling routine and the error code).

How safe are memory-mapped files for reading input files?

Mapping an input file into memory and then directly parsing data from the mapped memory pages can be a convenient and efficient way to read data from files.
However, this practice also seems fundamentally unsafe unless you can ensure that no other process writes to a mapped file, because even the data in private read-only mappings may change if the underlying file is written to by another process. (POSIX e.g. doesn't specify "whether modifications to the underlying object done after the MAP_PRIVATE mapping is established are visible through the MAP_PRIVATE mapping".)
If you wanted to make your code safe in the presence of external changes to the mapped file, you'd have to access the mapped memory only through volatile pointers and then be extremely careful about how you read and validate the input, which seems impractical for many use cases.
Is this analysis correct? The documentation for memory mapping APIs generally mentions this issue only in passing, if at all, so I wonder whether I'm missing something.
It is not really a problem.
Yes, another process may modify the file while you have it mapped, and yes, it is possible that you will see the modifications. It is even likely, since almost all operating systems have unified virtual memory systems: unless one requests unbuffered writes, there is no way of writing without going through the buffer cache, and no way for a change to go unseen by someone holding a mapping.
That isn't even a bad thing. Actually, it would be more disturbing if you couldn't see the changes. Since the file effectively becomes part of your address space when you map it, it makes perfect sense that you see changes to the file.
If you use conventional I/O (such as read), someone can still modify the file while you are reading it. Worded differently, copying file content to a memory buffer is not always safe in presence of modifications. It is "safe" insofar as read will not crash, but it does not guarantee that your data is consistent.
Unless you use readv, you have no guarantees about atomicity whatsoever (and even with readv you have no guarantee that what you have in memory is consistent with what is on disk or that it doesn't change between two calls to readv). Someone might modify the file between two read operations, or even while you are in the middle of it.
This isn't just something that isn't formally guaranteed but "probably still works" -- on the contrary, e.g. under Linux writes are demonstrably not atomic. Not even by accident.
The good news:
Usually, processes don't just open an arbitrary random file and start writing to it. When such a thing happens, it is usually either a well-known file that belongs to the process (e.g. log file), or a file that you explicitly told the process to write to (e.g. saving in a text editor), or the process creates a new file (e.g. compiler creating an object file), or the process merely appends to an existing file (e.g. db journals, and of course, log files). Or, a process might atomically replace a file with another one (or unlink it).
In every case, the whole scary problem boils down to "no issue" because either you are well aware of what will happen (so it's your responsibility), or it works seamlessly without interfering.
If you really don't like the possibility that another process could write to your file while you have it mapped, you can simply omit FILE_SHARE_WRITE under Windows when you create the file handle. POSIX makes it somewhat more complicated, since you need to fcntl the descriptor for a mandatory lock, which isn't necessarily supported or 100% reliable on every system (for example, under Linux).
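For instance, under Windows the "deny other writers" approach might look roughly like this (a sketch only, with error handling trimmed; the file name is arbitrary):
#include <windows.h>

int main()
{
    // Deliberately omit FILE_SHARE_WRITE: other writers get a sharing
    // violation for as long as this handle stays open.
    HANDLE file = CreateFileW(L"input.dat", GENERIC_READ, FILE_SHARE_READ,
                              nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE)
        return 1;

    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (mapping) {
        const void* view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
        // ... parse the read-only view ...
        if (view)
            UnmapViewOfFile(view);
        CloseHandle(mapping);
    }
    CloseHandle(file);
    return 0;
}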
In theory, you're probably in real trouble if someone does modify the file while you're reading it. In practice, you're reading characters and nothing else: no pointers, or anything else which could get you into trouble. Formally, I think it's still undefined behavior, but it's one which I don't think you have to worry about. Unless the modifications are very minor, you'll get a lot of compiler errors, but that's about the end of it.
The one case which might cause problems is if the file was shortened. I'm not sure what happens then, when you're reading beyond the end.
And finally: the system isn't arbitrarily going to open and modify the file. It's a source file; it will be some idiot programmer who does it, and he deserves what he gets. In no case will your undefined behavior corrupt the system or other people's files.
Note too that most editors work on a private copy; when they write back, they do so by renaming the original and creating a new file. Under Unix, once you've opened the file to mmap it, all that counts is the inode number, and when the editor renames or deletes the file, you still keep your copy; the modified file will get a new inode. The only thing you have to worry about is if someone opens the file for update and then goes around modifying it. Not many programs do this on text files, except for appending additional data to the end.
So while formally there's some risk, I don't think you have to worry about it. (If you're really paranoid, you could turn off write authorisation while you have the file mapped. And if there's really an enemy agent out to get you, he can turn it right back on.)

What circumstances ostream::write or ostream::operator<< would fail under?

In my C++ code, I am constantly writing different values into a file. My question is whether there are any circumstances under which write or << would fail, given that the file was opened successfully. Do I need to check every single call of write or << to make sure it was carried out correctly?
There are too many failure reasons to list them all. Possible ones would be:
the partition is finally full
the user exceeds his disk quota
the partition has been brutally unmounted
the partition has been damaged (filesystem bug)
the disk failed physically
...
Do I need to check every single call of write or << to make sure it was carried out correctly?
If you want your program to be resilient to failures then, definitely, yes. If you don't, it simply means the data you are writing may or may not be written, which amounts to saying you don't care about it.
Note: Rather than checking the stream state after every operation (which will soon be extremely tedious) you can set std::ostream::exceptions to your liking so that the stream will throw an exception when it fails (which shouldn't be a problem since such disk failures are quite exceptional by definition).
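A minimal sketch of that approach (the file name and loop are arbitrary):
#include <fstream>
#include <iostream>

int main()
{
    std::ofstream out("values.txt");
    // Ask the stream to throw instead of silently setting failbit/badbit.
    out.exceptions(std::ofstream::failbit | std::ofstream::badbit);

    try {
        for (int i = 0; i < 1000; ++i)
            out << i << '\n';            // a failed write now throws
        out.close();                      // close() failures are reported as well
    } catch (const std::ios_base::failure& e) {
        std::cerr << "write failed: " << e.what() << '\n';
        return 1;
    }
    return 0;
}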
There are any number of reasons why a write could fail. Off the top of my head here are a few:
The disk is full
The disk fails
The file is on an NFS mount and the network goes down
The stream you're writing to (remember that an ostream isn't always a file) happens to be a pipe that closes when the downstream reader crashes
The stream you're writing to is a TCP socket and the peer goes away
And so on.
EDIT: I know you've said that you're writing to a file, I just wanted to draw attention to the fact that your code should only care that it's writing to an ostream which could represent any kind of stream.
The others covered situations that might result in output failure.
But:
Do I need to check every single call of write or << to make sure it was carried out correctly?
To this, I would answer "no". You could conceivably just as well check
if the file was opened successfully, and
if the stream is still good() after you wrote your data.
This depends, of course, on the type of data written, and the possibility / relative complexity of recovering from partial writes vs. re-running the application.
If you need closer control on when exactly a write failed (e.g. in order to do a graceful recovery), the ostream exceptions syam linked to are the way to go. Polling stream state after each operation would bloat the code.
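In code, the "check once at the end" approach could look something like this sketch:
#include <fstream>
#include <iostream>

int main()
{
    std::ofstream out("report.txt");
    if (!out) {
        std::cerr << "could not open report.txt\n";
        return 1;
    }

    for (int i = 0; i < 100; ++i)
        out << "line " << i << '\n';      // no per-write checks

    out.close();
    if (!out) {                            // a single check after all writes
        std::cerr << "writing report.txt failed\n";
        return 1;
    }
    return 0;
}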

Is it not possible to make a C++ application "Crash Proof"?

Let's say we have an SDK in C++ that accepts some binary data (like a picture) and does something. Is it not possible to make this SDK "crash-proof"? By crash I primarily mean forceful termination by the OS upon memory access violation, due to invalid input passed by the user (like an abnormally short junk data).
I have no experience with C++, but when I googled, I found several approaches that sounded like a solution (use a vector instead of an array, configure the compiler so that automatic bounds checking is performed, etc.).
When I presented this to the developer, he said it is still not possible. Not that I don't believe him, but if so, how does a language like Java handle this? I thought the JVM performs a bounds check every time. If so, why can't one do the same thing in C++ manually?
UPDATE
By "Crash proof" I don't mean that the application does not terminate. I mean it should not abruptly terminate without information of what happened (I mean it will dump core etc., but is it not possible to display a message like "Argument x was not valid" etc.?)
You can check the bounds of an array in C++; std::vector::at does this automatically.
This doesn't make your app crash proof, you are still allowed to deliberately shoot yourself in the foot but nothing in C++ forces you to pull the trigger.
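For example (a trivial sketch):
#include <iostream>
#include <stdexcept>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3};

    try {
        int x = v.at(10);                 // bounds-checked: throws std::out_of_range
        std::cout << x << '\n';
    } catch (const std::out_of_range& e) {
        std::cerr << "bad index: " << e.what() << '\n';
    }

    // v[10] would be undefined behaviour: no exception, possibly a crash.
    return 0;
}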
No. Even assuming your code is bug-free. For one, I have looked at many crash reports submitted automatically, and I can assure you that the quality of the hardware out there is much below what most developers expect. Bit flips are all too common on commodity machines and cause random access violations. And even if you are prepared to handle access violations, there are certain exceptions where the OS has no choice but to terminate the process, for example failure to commit a stack guard page.
By crash I primarily mean forceful termination by the OS upon memory access violation, due to invalid input passed by the user (like an abnormally short junk data).
This is what usually happens. If you access some invalid memory, the OS usually aborts your program.
However, the question is what counts as invalid memory... You may freely fill all the memory in the heap and on the stack with garbage, and that is valid from the OS's point of view; it would not be valid from your point of view, because you created garbage.
Basically, you need to check the input data carefully and rely on that. No OS will do it for you.
If you check your input data carefully, you will likely manage the data OK.
I primarily mean forceful termination
by the OS upon memory access
violation, due to invalid input passed
by the user
Not sure who "the user" is.
You can write programs that won't crash due to invalid end-user input. On some systems, you can be forcefully terminated due to using too much memory (or because some other program is using too much memory). And as Remus says, there is no language which can fully protect you against hardware failures. But those things depend on factors other than the bytes of data provided by the user.
What you can't easily do in C++ is prove that your program won't crash due to invalid input, or go wrong in even worse ways, creating serious security flaws. So sometimes[*] you think that your code is safe against any input, but it turns out not to be. Your developer might mean this.
If your code is a function that takes for example a pointer to the image data, then there's nothing to stop the caller passing you some invalid pointer value:
char *image_data = static_cast<char *>(std::malloc(1));  // needs <cstdlib>
std::free(image_data);
image_processing_function(image_data);  // dangling pointer: undefined behaviour
So the function on its own can't be "crash-proof", it requires that the rest of the program doesn't do anything to make it crash. Your developer also might mean this, so perhaps you should ask him to clarify.
Java deals with this specific issue by making it impossible to create an invalid reference - you don't get to manually free memory in Java, so in particular you can't retain a reference to it after doing so. It deals with a lot of other specific issues in other ways, so that the situations which are "undefined behavior" in C++, and might well cause a crash, will do something different in Java (probably throw an exception).
[*] let's face it: in practice, in large software projects, "often".
I think this is a case of C++ code not being managed code.
Java and C# code is managed, that is, it is executed by a runtime that can perform bounds checking and detect crash conditions.
With C++, you need to perform bounds checking and other checks yourself. However, you have the luxury of using exception handling, which can prevent crashes during events beyond your control.
The bottom line is that C++ code itself is not crash-proof, but good design and development can make it so.
In general, you can't make a C++ API crash-proof, but there are techniques that can be used to make it more robust. Off the top of my head (and by no means exhaustive) for your particular example:
Sanity check input data where possible (a sketch follows this list)
Buffer limit checks in the data processing code
Edge and corner case testing
Fuzz testing
Putting problem inputs in the unit test for regression avoidance
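A rough sketch of the first two items, assuming a purely hypothetical image header layout (the struct, field names and limits are made up for illustration):
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical fixed-size header, for illustration only.
struct ImageHeader {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t bytes_per_pixel;
};

void process_image(const std::vector<std::uint8_t>& blob)
{
    // Sanity-check the input before touching any pixel data.
    if (blob.size() < sizeof(ImageHeader))
        throw std::invalid_argument("input too short for header");

    ImageHeader hdr;
    std::memcpy(&hdr, blob.data(), sizeof hdr);

    if (hdr.width == 0 || hdr.height == 0 ||
        hdr.bytes_per_pixel == 0 || hdr.bytes_per_pixel > 4)
        throw std::invalid_argument("implausible header values");

    const std::uint64_t expected =
        std::uint64_t(hdr.width) * hdr.height * hdr.bytes_per_pixel;
    if (blob.size() - sizeof(ImageHeader) < expected)
        throw std::invalid_argument("pixel data truncated");

    // ... every access below can now stay within blob's bounds ...
}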
If "crash proof" only mean that you want to ensure that you have enough information to investigate crash after it occurred solution can be simple. Most cases when debugging information is lost during crash resulted from corruption and/or loss of stack data due to illegal memory operation by code running in one of threads. If you have few places where you call library or SDK that you don't trust you can simply save the stack trace right before making call into that library at some memory location pointed to by global variable that will be included into partial or full memory dump generated by system when your application crashes. On windows such functionality provided by CrtDbg API.On Linux you can use backtrace API - just search doc on show_stackframe(). If you loose your stack information you can then instruct your debugger to use that location in memory as top of the stack after you loaded your dump file. Well it is not very simple after all, but if you haunted by memory dumps without any clue what happened it may help.
Another trick often used in embedded applications is cycled memory buffer for detailed logging. Logging to the buffer is very cheap since it is never saved, but you can get idea on what happen milliseconds before crash by looking at content of the buffer in your memory dump after the crash.
Actually, using bounds checking makes your application more likely to crash!
This is good design because it means that if your program is working, it's that much more likely to be working /correctly/, rather than working incorrectly.
That said, a given application can't be made "crash proof", strictly speaking, until the Halting Problem has been solved. Good luck!

How to guarantee files that are decrypted during run time are cleaned up?

Using C or C++ on Windows and Linux: after I decrypt a file to disk, how can I guarantee it is deleted if the application crashes or the system powers off and can't clean it up properly?
Unfortunately, there's no 100% foolproof way to ensure that the file will be deleted in case of a full system crash. Think about what happens if the user just pulls the plug while the file is on disk. No amount of exception handling will protect you from that (the worst) case.
The best thing you can do is not write the decrypted file to disk in the first place. If the file exists in both its encrypted and decrypted forms, that's a point of weakness in your security.
The next best thing you can do is use Brian's suggestion of structured exception handling to make sure the temporary file gets cleaned up. This won't protect you from all possibilities, but it will go a long way.
Finally, I suggest that you check for temporary decrypted files on start-up of your application. This will allow you to clean up after your application in case of a complete system crash. It's not ideal to have those files around for any amount of time, but at least this will let you get rid of them as quickly as possible.
Don't write the decrypted file to disk at all.
If the system is powered off, the file is still on disk, and the disk - and therefore the file - can be accessed.
The exception would be the use of an encrypted file system, but that is out of the control of your program.
I don't know if this works on Windows, but on Linux, assuming that you only need one process to access the decrypted file, you can open the file, and then call unlink() to delete the file. The file will continue to exist as long as the process keeps it open, but when it is closed, or the process dies, the file will no longer be accessible.
Of course the contents of the file are still on the disk, so you really need to do more than just delete it: you need to zero out the contents. Is there any reason the decrypted file needs to be on disk (size?)? It would be better just to keep the decrypted version in memory, preferably marked as unswappable, so it never hits the disk.
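A minimal POSIX sketch of the open-then-unlink trick (path and mode are arbitrary):
#include <fcntl.h>
#include <unistd.h>

// Create the scratch file, then immediately remove its name. The data stays
// reachable through 'fd' and is reclaimed by the kernel when the last
// descriptor goes away -- including when the process crashes.
int make_unlinked_scratch_file()
{
    int fd = open("/tmp/decrypted.tmp", O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd != -1)
        unlink("/tmp/decrypted.tmp");
    return fd;
}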
Try to avoid it completely:
If the file is sensitive, the best bet is to not have it written to disk in a decrypted format in the first place.
Protecting against crashes: Structured exception handling:
However, you could add structured exception handling to catch any crashes.
__try and __except
What if they pull the plug?:
There is a way to protect against this...
If you are on windows, you can use MoveFileEx and the option MOVEFILE_DELAY_UNTIL_REBOOT with a destination of NULL to delete the file on the next startup. This will protect against accidental computer shutdown with an undeleted file. You can also ensure that you have an exclusively opened handle to this file (specify no sharing rights such as FILE_SHARE_READ and use CreateFile to open it). That way no one will be able to read from it.
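A sketch of that MoveFileEx call (note that MOVEFILE_DELAY_UNTIL_REBOOT generally requires administrative rights):
#include <windows.h>

// Ask Windows to delete the plaintext temp file at the next reboot, as a
// fallback in case the process never gets a chance to delete it itself.
void schedule_delete_on_reboot(const wchar_t* path)
{
    MoveFileExW(path, nullptr, MOVEFILE_DELAY_UNTIL_REBOOT);
}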
Other ways to avoid the problem:
All of these are not excuses for having a decrypted file on disk, but:
You could also consider writing to a file whose path is longer than MAX_PATH, via the \\?\ path syntax. This will ensure that the file is not browsable by Windows Explorer.
You should set the file to have the temporary attribute
You should set the file to have the hidden attribute
In C (and so, I assume, in C++ too), as long as your program doesn't crash, you could register an atexit() handler to do the cleanup. Just avoid using _exit() or _Exit() since those bypass the atexit() handlers.
As others pointed out, though, it is better to avoid having the decrypted data written to disk. And simply using unlink() (or equivalent) is not sufficient; you need to rewrite some other data over the original data. And journalled file systems make that very difficult.
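A minimal sketch of the atexit() approach (the file name is illustrative):
#include <cstdio>
#include <cstdlib>

namespace {
    const char* g_plaintext_path = "decrypted.tmp";

    // Runs on normal termination and on exit(), but not on crashes,
    // _exit()/_Exit(), or abort().
    void remove_plaintext()
    {
        std::remove(g_plaintext_path);
    }
}

int main()
{
    std::atexit(remove_plaintext);
    // ... decrypt to g_plaintext_path and use it ...
    return 0;
}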
A process cannot protect or watch itself. Your only possibility is to start up a second process as a kind of watchdog, which regularly checks the health of the decrypting other process. If the other process crashes, the watchdog will notice and delete the file itself.
You can do that using heartbeats (regular polling of the other process to see whether it's still alive), or using notifications sent from the other process itself, whose absence will trigger a timeout if it has crashed.
You could use sockets to make the connection between the watchdog and your app work, for example.
It's becoming clear that you need some locking mechanism to prevent swapping to the page file / swap partition. On POSIX systems, this can be done with the mlock()/munlock() family of functions.
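On POSIX, pinning the plaintext buffer might look roughly like this sketch (mlock can fail if RLIMIT_MEMLOCK is too low, and a real implementation should scrub with a function the compiler cannot optimise away):
#include <sys/mman.h>
#include <vector>

void use_decrypted_buffer(std::vector<unsigned char>& buf)
{
    // Keep the decrypted bytes out of the swap partition while in use.
    mlock(buf.data(), buf.size());

    // ... work with the plaintext ...

    // Best-effort scrub before unlocking (illustrative only).
    for (auto& b : buf)
        b = 0;
    munlock(buf.data(), buf.size());
}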
There's a problem with deleting the file. It's not really gone.
When you delete files off your hard drive (not counting the recycle bin) the file isn't really gone. Just the pointer to the file is removed.
Ever see those spy movies where they overwrite the hard drive 6, 8, or 24 times, and that's how they know it's clean? Well, they do that for a reason.
I'd make every effort to not store the file's decrypted data. Or if you must, make it small amounts of data. Even, disjointed data.
If you must, then a try/catch should protect you a bit. Nothing can protect from a power outage, though.
Best of luck.
Check out tmpfile().
It is part of the standard C library (declared in <cstdio>), not just BSD UNIX.
It creates a temporary file that is removed automatically when it is closed or the program terminates normally (on POSIX systems it is typically unlinked as soon as it is created).
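For example (a minimal sketch):
#include <cstdio>

int main()
{
    std::FILE* f = std::tmpfile();   // removed automatically when closed or on normal exit
    if (f) {
        std::fputs("decrypted data...", f);
        std::rewind(f);
        // ... read it back ...
        std::fclose(f);
    }
    return 0;
}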
Writing to the file system (even temporarily) is insecure.
Do that only if you really have to.
Optionally you could create an in-memory file system.
Never used one myself so no recommendations but a quick google found a few.
In C++ you should use an RAII tactic:
// needs <string>
class Clean_Up_File {
    std::string filename_;
public:
    explicit Clean_Up_File(std::string filename) : filename_(filename) { /* ... open/create file ... */ }
    ~Clean_Up_File() { /* ... delete file ... */ }
};

int main()
{
    Clean_Up_File file_will_be_deleted_on_program_exit("my_file.txt");
}
RAII helps automate a lot of cleanup. You simply create an object on the stack, and have that object do clean up at the end of its lifetime (in the destructor which will be called when the object falls out of scope). ScopeGuard even makes it a little easier.
But, as others have mentioned, this only works in "normal" circumstances. If the user unplugs the computer you can't guarantee that the file will be deleted. And it may be possible to undelete the file (even on UNIX it's possible to "grep the hard drive").
Additionally, as pointed out in the comments, there are some cases where objects don't fall out of scope (for instance, the std::exit(int) function exits the program without leaving the current scope), so RAII doesn't work in those cases. Personally, I never call std::exit(int), and instead I either throw exceptions (which will unwind the stack and call destructors; which I consider an "abnormal exit") or return an error code from main() (which will call destructors and which I also consider an "abnormal exit"). IIRC, sending a SIGKILL also does not call destructors, and SIGKILL can't be caught, so there you're also out of luck.
This is a tricky topic. Generally, you don't want to write decrypted files to disk if you can avoid it. But keeping them in memory doesn't always guarantee that they won't be written to disk as part of a page file or otherwise.
I read articles about this a long time ago, and I remember there being some difference between Windows and Linux in that on one you could guarantee a memory page wouldn't be written to disk and on the other you couldn't; but I don't remember clearly.
If you want to do your due diligence, you can look that topic up and read about it. It all depends on your threat model and what you're willing to protect against. After all, you can use compressed air to chill RAM and pull the encryption key out of it (which was actually on the new Christian Slater spy show, My Own Worst Enemy - which I thought was the best use of cutting-edge, accurate computer-security techniques in media yet).
On Linux/Unix, use unlink as soon as you have created the file. The file will be removed as soon as your program closes the file descriptor or exits.
Better yet, the file will be removed even if the whole system crashes - because it is essentially removed as soon as you unlink it.
The data will not be physically deleted from the disk, of course, so it still may be available for hacking.
Remember that the computer could be powered down at any time. Then, somebody you don't like could boot up with a Linux live CD, and examine your disk in any level of detail desired without changing a thing. No system that writes plaintext to the disk can be secure against such attacks, and they aren't hard to do.
You could set up a function that overwrites the file with ones and zeros repeatedly, preferably injecting some randomness, and arrange for it to run at the end of the program, or at exit. This will work, provided there are no hardware or software glitches, power failures, or other interruptions, and provided the file system writes only to the sectors it claims to be using (journalling file systems, for example, may leave parts of the file elsewhere).
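A best-effort sketch of that overwrite-then-delete idea, assuming C++17 std::filesystem (with the caveats above about journalling and interruptions still applying):
#include <algorithm>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <system_error>
#include <vector>

void scrub_and_delete(const std::filesystem::path& p)
{
    std::error_code ec;
    const std::uintmax_t size = std::filesystem::file_size(p, ec);
    if (!ec) {
        // Open without truncation and overwrite the contents with zeros.
        std::ofstream f(p, std::ios::binary | std::ios::in | std::ios::out);
        std::vector<char> zeros(4096, 0);
        for (std::uintmax_t done = 0; done < size; done += zeros.size()) {
            const std::uintmax_t chunk = std::min<std::uintmax_t>(zeros.size(), size - done);
            f.write(zeros.data(), static_cast<std::streamsize>(chunk));
        }
        f.flush();
    }
    std::filesystem::remove(p, ec);
}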
Therefore, if you want security, you need to make sure no plaintext is written out, and that also means it cannot be written to swap space or the equivalent. Find out how to mark memory as unswappable on all platforms you're writing for. Make sure decryption keys and the like are treated the same way as plaintext: never written to the disk under any circumstances, and kept in unswappable memory.
Then, your system should be secure against attacks short of hostiles breaking in, interrupting you, and freezing your RAM chips before powering down, so they don't lose their contents before being transferred for examination. Or authorities demanding your key, legally (check your local laws here) or illegally.
Moral of the story: real security is hard.
The method that I am going to implement will be to stream the decryption, so that the only part in memory is the part that is decrypted during the read, as the data is being used.
This will be a streamed implementation, so the only data in memory is the data I am consuming in the application at any given point. This makes some things tricky, considering that a lot of traditional file tricks are no longer available, but since the implementation will be stream-based I will still be able to seek to different points of the file, which will be translated by the crypt stream into decrypting the corresponding sections.
Basically, it will encrypt the file in blocks, so if I seek to a certain point it will decrypt that block for reading. When I read past a block it decrypts the next block and releases the previous one (within the crypt stream).
This implementation does not require me to decrypt to a file or to memory and is compatible with other stream consumers and providers (fstream).
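As a hedged sketch of what a block-wise decrypting stream could look like (the XOR "cipher", block size, and class name are placeholders; a real implementation would also override seekoff/seekpos to support seeking):
#include <array>
#include <cstddef>
#include <fstream>
#include <istream>
#include <streambuf>

// Read-only streambuf that pulls one encrypted block at a time from the
// underlying file and exposes the decrypted bytes on demand.
class DecryptingBuf : public std::streambuf {
public:
    explicit DecryptingBuf(const char* path) : src_(path, std::ios::binary) {}

private:
    static constexpr std::size_t kBlock = 4096;

    int_type underflow() override
    {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());

        src_.read(buf_.data(), static_cast<std::streamsize>(buf_.size()));
        const std::streamsize n = src_.gcount();
        if (n <= 0)
            return traits_type::eof();

        for (std::streamsize i = 0; i < n; ++i)
            buf_[static_cast<std::size_t>(i)] ^= 0x5A;   // placeholder "decryption"

        setg(buf_.data(), buf_.data(), buf_.data() + n);
        return traits_type::to_int_type(*gptr());
    }

    std::ifstream src_;
    std::array<char, kBlock> buf_{};
};

// Usage sketch:
//   DecryptingBuf buf("file.enc");
//   std::istream in(&buf);
//   std::string line;
//   std::getline(in, line);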
This is my 'plan'. I have not done this type of work with fstream before, and I will likely be posting a question as soon as I am ready to work on it.
Thanks for all the other answers - they were very informative.