Is it necessary to close files after reading (only) in any programming language? - readfile

I read that a program should close files after writing to them in case there is still data in the write buffer not yet physically written to it. I also read that some languages such as Python automatically close all files that go out of scope, such as when the program ends.
But if I'm merely reading a file and not modifying it in any way (except perhaps for the OS updating its last-access date), is there ever a need to close it, even if the program never terminates, such as a daemon that monitors a log file?
(Why is it necessary to close a file after using it? asks about file access in general, not only for reading.)

In general, you should always close a file after you are done using it.
Reason number 1: file descriptors (or, on Windows, the conceptually similar handles) are a limited resource.
Every time you open a file, even just for reading, you consume one of the descriptors (or handles) available to your process and to the rest of the system.
Every time you close a handle, you release it and make it available again.
Now consider the consequences of a loop that opens a file, reads it, but doesn't close it...
http://en.wikipedia.org/wiki/File_descriptor
https://msdn.microsoft.com/en-us/library/windows/desktop/aa364225%28v=vs.85%29.aspx
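As a rough illustration of that loop (the path and iteration count below are placeholders), the first loop leaks a descriptor on every pass and will eventually make open() fail with "too many open files", while the RAII variant underneath never does:

#include <fcntl.h>    // open
#include <unistd.h>   // close
#include <cstdio>     // perror
#include <fstream>

int main()
{
    // Leaky version: every iteration consumes a descriptor and never releases it.
    // Sooner or later open() returns -1 ("too many open files").
    for (int i = 0; i < 100000; ++i) {
        int fd = open("/etc/hostname", O_RDONLY);   // placeholder path
        if (fd == -1) { std::perror("open"); break; }
        // ... read from fd ...
        // missing close(fd) -- this is the bug
    }

    // Correct version: std::ifstream closes its descriptor in the destructor,
    // so this loop can run forever without exhausting the descriptor table.
    for (int i = 0; i < 100000; ++i) {
        std::ifstream in("/etc/hostname");
        // ... read from in ...
    }   // descriptor released here, every iteration

    return 0;
}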
Reason number 2: if you are doing anything other than reading a file, there are race conditions when multiple processes or threads access the same file.
To avoid these, you may find file locks in place.
http://en.wikipedia.org/wiki/File_locking
If you are reading a file and not closing it afterward, other applications that try to obtain a file lock are denied access.
Oh, and the file can't be deleted by anyone who doesn't have the rights to kill your process.
Reason number 3: there is simply no reason to leave a file unclosed, in any language. That is why Python helps out lazy programmers and automatically closes a handle that drops out of scope, in case the programmer forgot.

Yes, it's better to close a file once reading is completed.
That's necessary because other software might request exclusive access to that file. If the file is still open, such a request will fail.

Not closing a file ties up system resources unnecessarily (file descriptors on Unix, handles on Windows). This matters especially when a bug keeps a loop running, or when the system is never restarted. Some languages clean up unclosed files themselves when, for example, they go out of scope; others don't, or only do so at some unpredictable later time (like the garbage collector in Java).
Imagine a system that needs to run forever, for example a server. Unclosed files then consume more and more resources, until ultimately none are left.
In order to read a file you have to open it, so regardless of what you do with it, resources are reserved for it. Beyond the resource argument, it is also important that you, as the programmer, know when an object (a file) can be closed because no further use of it will be required. I think it's bad practice not to be at least aware of unclosed files, and good practice to close files once they are no longer needed.
Some applications also require exclusive access to a file, which means no other application may have it open. For example, on Windows, when you try to empty your recycle bin or move a file that you still have open, Windows won't let you delete or move it (this is referred to as file locking). It's just one example of when it is annoying for a file to be open when it should not be; it happens to me daily.

Related

How Linux handles the case when multiple processes try to replace the same file at the same time?

I know this is a bit of a theoretical question, but I haven't got a satisfactory answer yet, so I thought I'd put it here.
I have multiple C++ processes (I'd also like to know the behaviour for threads) that contend to replace the same file at the same time. How safe is this on Linux (using Ubuntu 14.04 and CentOS 7)? Do I need to put locks in place?
Thanks in advance.
The filesystems of Unix-based OS's like Linux are designed around the notion of inodes, which are internal records describing various metadata about the file. Normally these aren't interacted with directly by users or programs, but their presence gives these filesystems a level of indirection that allows them to provide some useful semantics that other OS's (read: Windows) cannot.
filename --> inode --> data
In particular, when a file gets deleted, what's actually happening is the separation of the file's inode from its filename; not (necessarily) the deletion of the file's data itself. That is, the file and its contents can continue to exist (albeit invisibly, from the user's point of view) until all processes have closed their file-handles that were open on that file; once the inode is no longer accessible to any process, only then will the filesystem actually mark the file's data-blocks as free and available-for-reuse. In the meantime, the filename becomes available for another file's inode (and data) to be associated with, even though the old file's inode/data still technically exists.
The upshot of that is that under Linux it's perfectly valid to delete (or rename) a file at any time, even if other threads/processes are in the middle of using it; your delete will succeed, and any other programs that have that file open at that instant can simply continue reading/writing/using it, exactly as if it hadn't been deleted. The only differences are that the filename no longer appears in its directory, and that once the last of those programs calls fclose() (or close(), etc.) on the file, the file's data will finally go away.
Since doing mv new.txt old.txt is essentially the same as doing rm old.txt ; mv new.txt old.txt, there should be no problems with doing this from multiple threads without any synchronization. (Note that the slightly different situation of having multiple threads or processes open the same file simultaneously and write into it at the same time is a bit more perilous; nothing will crash, but they can easily overwrite each other's data and corrupt the file if they aren't careful.)
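A minimal sketch of that safe-replace pattern on POSIX (the names here are illustrative; a production version would also fsync and pick a unique temporary name): write the new contents to a temporary file in the same directory, then rename() it over the old name, so readers see either the old file or the new one, never a half-written mix.

#include <cstdio>    // std::rename, std::remove, std::perror
#include <fstream>
#include <string>

// Atomically replace 'path' with 'contents' (POSIX rename semantics assumed).
bool replace_file(const std::string& path, const std::string& contents)
{
    const std::string tmp = path + ".tmp";   // same directory, so same filesystem

    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out << contents;
        if (!out.flush()) return false;
    }   // temporary file closed here

    // rename() is atomic on POSIX: processes that already have the old file
    // open keep reading the old inode; new opens see the new contents.
    if (std::rename(tmp.c_str(), path.c_str()) != 0) {
        std::perror("rename");
        std::remove(tmp.c_str());
        return false;
    }
    return true;
}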
It depends a lot on exactly what you're going to be doing and how you're using the files. In general, in Unix/Posix systems like Linux, all file calls will succeed if multiple processes make them, and the general way the OS handles contention is "the last one to do something wins". Essentially, all modifications to the filesystem are serialized, so the filesystem is always in a consistent state. But otherwise it's a free-for-all.
There are a lot of details here though. There are flags used when opening a file, like O_EXCL, that can make the open fail if another process got there first (a sort of lock). There are advisory locking systems (advisory meaning nobody is forced by the OS to pay attention to them) like flock (try typing man 2 flock to learn more) for file contents. There are also Linux-specific mandatory locking systems.
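For example, a minimal flock() sketch might look like this (the filename is a placeholder; the lock only affects other processes that also call flock on the same file):

#include <sys/file.h>   // flock
#include <fcntl.h>      // open
#include <unistd.h>     // close
#include <cstdio>       // perror

int main()
{
    int fd = open("shared.dat", O_RDWR | O_CREAT, 0644);   // placeholder file
    if (fd == -1) { std::perror("open"); return 1; }

    // Block until we hold an exclusive advisory lock on the whole file.
    if (flock(fd, LOCK_EX) == -1) { std::perror("flock"); close(fd); return 1; }

    // ... read/modify/write, safe with respect to other processes
    //     that also use flock() on this file ...

    flock(fd, LOCK_UN);   // released automatically on close/exit anyway
    close(fd);
    return 0;
}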
And there are also details like "What happens if someone deleted a file I have open?" that the other answer explains correctly and well.
And lastly, there's a whole mess of detail surrounding whether it's guaranteed that any particular change to the filesystem is recorded for all eternity, or whether it has a chance of disappearing if someone flicks the power switch. And that's a mess-and-a-half once you really dive into it, between dodgy hardware that lies to the OS about things and the confusing morass of different Linux system calls covering different aspects of this problem, often entering Linux from different eras of Unix/Posix history and interacting with each other in strange and arcane ways.
So, an answer to your very general and open-ended question is going to have to necessarily be vague, abstract, and hand-wavey.

Linux non-persistent backing store for mmap()

First, a little motivating background info: I've got a C++-based server process that runs on an embedded ARM/Linux-based computer. It works pretty well, but as part of its operation it creates a fairly large fixed-size array (e.g. dozens to hundreds of megabytes) of temporary/non-persistent state information, which it currently keeps on the heap, and it accesses and/or updates that data from time to time.
I'm investigating how far I can scale things up, and one problem I'm running into is that eventually (as I stress-test the server by making its configuration larger and larger), this data structure gets big enough to cause out-of-memory problems, and then the OOM killer shows up, and general unhappiness ensues. Note that this embedded configuration of Linux doesn't have swap enabled, and I can't (easily) enable a swap partition.
One idea I have on how to ameliorate the issue is to allocate this large array on the computer's local flash partition, instead of directly in RAM, and then use mmap() to make it appear to the server process like it's still in RAM. That would reduce RAM usage considerably, and my hope is that Linux's filesystem-cache would mask most of the resulting performance cost.
My only real concern is file management -- in particular, I'd like to avoid any chance of filling up the flash drive with "orphan" backing-store files (i.e. old files whose processes don't exist any longer, but the file is still present because its creating process crashed or by some other mistake forgot to delete it on exit). I'd also like to be able to run multiple instances of the server simultaneously on the same computer, without the instances interfering with each other.
My question is, does Linux have any built-in facility for handling this sort of use-case? I'm particularly imagining some way to flag a file (or an mmap() handle or similar) so that when the process that created the file exits or crashes, the OS automagically deletes the file (similar to the way Linux already automagically recovers all of the RAM that was allocated by a process when the process exits or crashes).
Or, if Linux doesn't have any built-in auto-temp-file-cleanup feature, is there a "best practice" that people use to ensure that large temporary files don't end up filling up a drive due to unintentionally becoming persistent?
Note that AFAICT simply placing the file in /tmp won't help me, since /tmp is using a RAM-disk and therefore doesn't give me any RAM-usage advantage over simply allocating in-process heap storage.
Yes, and I do this all the time...
open the file, unlink it, use ftruncate or (better) posix_fallocate to make it the right size, then use mmap with MAP_SHARED to map it into your address space. You can then close the descriptor immediately if you want; the memory mapping itself will keep the file around.
For speed, you might find you want to help Linux manage its page cache. You can use posix_madvise with POSIX_MADV_WILLNEED to advise the kernel to page data in and POSIX_MADV_DONTNEED to advise the kernel to release the pages.
You might find that last does not work the way you want, especially for dirty pages. You can use sync_file_range to explicitly control flushing to disk. (Although in that case you will want to keep the file descriptor open.)
All of this is perfectly standard POSIX except for the Linux-specific sync_file_range.
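A minimal sketch of those steps (the path and size below are placeholders, and error handling is trimmed):

#include <fcntl.h>      // open, posix_fallocate
#include <unistd.h>     // unlink, close
#include <sys/mman.h>   // mmap, munmap
#include <cstdio>       // perror
#include <cstddef>      // size_t

int main()
{
    const std::size_t size = 256 * 1024 * 1024;   // e.g. 256 MB of backing store
    const char* path = "/var/myapp/backing.tmp";  // placeholder; mkstemp() would give each instance a unique name

    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd == -1) { std::perror("open"); return 1; }

    unlink(path);   // no name in the filesystem any more; the open descriptor keeps the file alive

    if (posix_fallocate(fd, 0, size) != 0) { close(fd); return 1; }   // reserve the space up front

    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);      // the mapping itself keeps the file around
    if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }

    // ... treat 'mem' as a large array backed by flash instead of RAM ...

    munmap(mem, size);   // last reference gone; the blocks are reclaimed
    return 0;
}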
Yes. You create/open the file, then you remove() it by its filename.
The file will still be open by your process and you can read/write it just like any opened file, and it will disappear when the process having the file opened exits.
I believe this behavior is mandated by POSIX, so it will work on any Unix-like system. Even after a hard reboot, the space will be reclaimed.
I believe this is filesystem-specific, but most Linux filesystems allow deletion of open files. The file will still exist until the last handle to it is closed. I would recommend that you open the file then delete it immediately and it will be automatically cleaned up when your process exits for any reason.
For further details, see this post: What happens to an open file handle on Linux if the pointed file gets moved or deleted

With what API do you perform a read-consistent file operation in OS X, analogous to Windows Volume Shadow Service

We're writing a C++/Objective C app, runnable on OSX from versions 10.7 to present (10.11).
Under Windows, there is the concept of a shadow file, which allows you to read a file as it existed at a certain point in time, without having to worry about other processes writing to that file in the interim.
However, I can't find any documentation or online articles discussing a similar feature in OS X. I know that OS X will not lock a file when it's being written to, so is it necessary to do something special to make sure I don't pick up a file that is in the middle of being modified?
Or does the Journaled Filesystem make any special handling unnecessary? I'm concerned that if I have one process that is creating or modifying files (within a single context of, say, an fopen call - obviously I can't be guaranteed of "completeness" if the writing process is opening and closing a file repeatedly during what should be an atomic operation), that a reading process will end up getting a "half-baked" file.
And if the journaled filesystem does guarantee that readers only see "whole" files, does this extend to FAT32 volumes that may be mounted as external drives?
A few things:
On Unix, once you open a file, if it is replaced (as opposed to modified), your file descriptor continues to access the file you opened, not its replacement.
Many apps will replace rather than modify files, using things like -[NSData writeToFile:atomically:] with YES for atomically:.
Cocoa and the other high-level frameworks do, in fact, lock files when they write to them, but that locking is advisory not mandatory, so other programs also have to opt in to the advisory locking system to be affected by that.
The modern approach is File Coordination. Again, this is a voluntary system that apps have to opt in to.
There is no feature quite like what you described on Windows. If the standard approaches aren't sufficient for your needs, you'll have to build something custom. For example, you could make a copy of the file that you're interested in and, after your copy is complete, compare it to the original to see if it was being modified as you were copying it. If the original has changed, you'll have to start over with a fresh copy operation (or give up). You can use File Coordination to at least minimize the possibility of contention from cooperating programs.
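As a rough sketch of that copy-and-compare idea (every name here is illustrative, and comparing size plus modification time is only a heuristic, not a guarantee):

#include <sys/stat.h>   // stat
#include <fstream>
#include <string>

// Copy 'src' to 'dst' and report whether 'src' appeared unchanged during the copy.
// Purely a heuristic: it compares size and modification time before and after.
bool copy_if_stable(const std::string& src, const std::string& dst)
{
    struct stat before {}, after {};
    if (stat(src.c_str(), &before) != 0) return false;

    {
        std::ifstream in(src, std::ios::binary);
        std::ofstream out(dst, std::ios::binary | std::ios::trunc);
        if (!in || !out) return false;
        out << in.rdbuf();
    }

    if (stat(src.c_str(), &after) != 0) return false;
    return before.st_size == after.st_size &&
           before.st_mtime == after.st_mtime;   // unchanged, as far as we can tell
}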

fopen/fclose performance

Most products have some kind of log-file mechanism implemented. What is the best practice for writing a debug log file, in terms of fopen/fclose performance? Is keeping the file pointer open (while the logger is enabled) the better option, or should the file be opened and closed every time a statement needs to be written to the log?
Ideally, you shouldn't close the log file until the end. Many products also have some "crash handling" mechanism that is called in any case before the application terminates; that is the best place to close the logfile.
For Windows, you can check SetUnhandledExceptionFilter.
Usually, it depends on what kind of file you're interacting with. Binary files must stay open while reading, of course. If the file is plain text, you can parse it into a string and close the file right away.
I'd like to say you should keep the file pointer open during the whole program lifetime, but if you experience a crash it will never get closed. So I suggest keeping it open during relatively safe and low-risk operations, but opening/closing it on each write when you're unsure of the application's stability. Nobody wants file pointers left open. :)
Based on the comments: you can use fstream. Just open an ofstream at the beginning and it will get closed when the program ends. Using fstream instead of C-style fopen/fclose makes a lot of things easier, as it follows the RAII principle, so you do not have to worry about closing the file. There is probably no reason to close the file manually and open it again every time you want to log something. Unless you need to write to the same file from another program at the same time, opening and closing the file repeatedly is just a waste of time and code.
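For instance, a minimal logger along those lines (the names and path are illustrative) keeps one std::ofstream open for the life of the program and flushes each line:

#include <fstream>
#include <string>

class Logger {
    std::ofstream out_;
public:
    explicit Logger(const std::string& path)
        : out_(path, std::ios::app) {}   // opened once; closed automatically by ~ofstream

    void log(const std::string& msg) {
        out_ << msg << '\n';
        out_.flush();   // flush per line, so a crash loses at most the line in progress
    }
};

int main()
{
    Logger log("debug.log");   // placeholder path
    log.log("application started");
    // ... log freely; no repeated fopen/fclose cost ...
}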
I usually log to memory (a variable), and only write it to file once, when the app finishes. Of course, this won't do much good if the app crashes, or you need to read the log while the program is running, but that is not usually a problem for me, and the performance is superior. Depends on your needs, I guess.

How to guarantee files that are decrypted during run time are cleaned up?

After I decrypt a file to disk, how can I guarantee it is deleted if the application crashes or the system powers off and can't clean up properly? Using C or C++, on Windows and Linux?
Unfortunately, there's no 100% foolproof way to ensure that the file will be deleted in case of a full system crash. Think about what happens if the user just pulls the plug while the file is on disk. No amount of exception handling will protect you from that (the worst) case.
The best thing you can do is not write the decrypted file to disk in the first place. If the file exists in both its encrypted and decrypted forms, that's a point of weakness in your security.
The next best thing you can do is use Brian's suggestion of structured exception handling to make sure the temporary file gets cleaned up. This won't protect you from all possibilities, but it will go a long way.
Finally, I suggest that you check for temporary decrypted files on start-up of your application. This will allow you to clean up after your application in case of a complete system crash. It's not ideal to have those files around for any amount of time, but at least this will let you get rid of them as quickly as possible.
Don't write the decrypted file to disk at all.
If the system is powered off, the file is still on disk, and the disk, and therefore the file, can be accessed.
The exception would be the use of an encrypted file system, but that is outside the control of your program.
I don't know if this works on Windows, but on Linux, assuming that you only need one process to access the decrypted file, you can open the file, and then call unlink() to delete the file. The file will continue to exist as long as the process keeps it open, but when it is closed, or the process dies, the file will no longer be accessible.
Of course the contents of the file are still on the disk, so really you need more than just to delete it; you need to zero out the contents. Is there any reason the decrypted file needs to be on disk (size?)? Better would be to just keep the decrypted version in memory, preferably marked as unswappable, so it never hits the disk.
Try to avoid it completely:
If the file is sensitive, the best bet is to not have it written to disk in a decrypted format in the first place.
Protecting against crashes: Structured exception handling:
However, you could add structured exception handling to catch any crashes.
__try and __except
What if they pull the plug?:
There is a way to protect against this...
If you are on Windows, you can use MoveFileEx with the option MOVEFILE_DELAY_UNTIL_REBOOT and a destination of NULL to delete the file on the next startup. This will protect against an accidental computer shutdown leaving an undeleted file behind. You can also ensure that you have an exclusively opened handle to this file (specify no sharing rights such as FILE_SHARE_READ and use CreateFile to open it). That way no one will be able to read from it.
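A hedged Windows-only sketch of that combination (the path is a placeholder; scheduling a delete-on-reboot writes to a machine-wide registry key and may need elevated rights):

#include <windows.h>

int main()
{
    const char* path = "C:\\temp\\decrypted.bin";   // placeholder path

    // Exclusive handle: no FILE_SHARE_* rights, so no other process can open the
    // file for reading while we hold it; temporary/hidden attributes as suggested below.
    HANDLE h = CreateFileA(path, GENERIC_READ | GENERIC_WRITE,
                           0, nullptr, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_TEMPORARY | FILE_ATTRIBUTE_HIDDEN,
                           nullptr);
    if (h == INVALID_HANDLE_VALUE) return 1;

    // Schedule deletion at the next reboot in case we never get to clean up ourselves.
    MoveFileExA(path, nullptr, MOVEFILE_DELAY_UNTIL_REBOOT);

    // ... write the decrypted data, use it ...

    CloseHandle(h);
    DeleteFileA(path);   // normal-path cleanup
    return 0;
}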
Other ways to avoid the problem:
All of these are not excuses for having a decrypted file on disk, but:
You could also consider writing to a file whose path is longer than MAX_PATH, via the \\?\ path syntax. This will ensure that the file is not browsable by Windows Explorer.
You should set the file to have the temporary attribute
You should set the file to have the hidden attribute
In C (and so, I assume, in C++ too), as long as your program doesn't crash, you could register an atexit() handler to do the cleanup. Just avoid using _exit() or _Exit() since those bypass the atexit() handlers.
As others pointed out, though, it is better to avoid having the decrypted data written to disk. And simply using unlink() (or equivalent) is not sufficient; you need to rewrite some other data over the original data. And journalled file systems make that very difficult.
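For the normal-exit path, the atexit() approach from the previous answer might look like this (the filename is a placeholder, and as noted this does not scrub the data from the disk):

#include <cstdlib>   // std::atexit
#include <cstdio>    // std::remove

static const char* g_tempfile = "decrypted.tmp";   // placeholder name

static void cleanup()
{
    std::remove(g_tempfile);   // best-effort delete; does not scrub the blocks on disk
}

int main()
{
    std::atexit(cleanup);   // runs on return from main() or std::exit(), not on a crash or _Exit()
    // ... create and use g_tempfile ...
    return 0;               // cleanup() runs here
}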
A process cannot protect or watch itself. Your only possibility is to start up a second process as a kind of watchdog, which regularly checks the health of the decrypting other process. If the other process crashes, the watchdog will notice and delete the file itself.
You can do that using heartbeats (regular polling of the other process to see whether it's still alive), or using interrupts sent from the other process itself, which will trigger a timeout if it has crashed.
You could use sockets to make the connection between the watchdog and your app work, for example.
It's becoming clear that you need some locking mechanism to prevent swapping to the pagefile / swap partition. On POSIX systems, this can be done with the m(un)lock* family of functions.
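A minimal POSIX sketch of keeping a sensitive buffer out of swap with mlock() (the buffer size is arbitrary; the call can fail if RLIMIT_MEMLOCK is too low for an unprivileged process):

#include <sys/mman.h>   // mlock, munlock
#include <cstring>      // std::memset
#include <vector>

int main()
{
    std::vector<unsigned char> plaintext(4096);   // arbitrary buffer size

    // Pin the pages holding the decrypted data so they are never swapped out.
    if (mlock(plaintext.data(), plaintext.size()) != 0) return 1;

    // ... decrypt into 'plaintext' and use it ...

    // Wipe before releasing (a real implementation would use explicit_bzero or
    // similar so the compiler cannot optimize the wipe away).
    std::memset(plaintext.data(), 0, plaintext.size());
    munlock(plaintext.data(), plaintext.size());
    return 0;
}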
There's a problem with deleting the file. It's not really gone.
When you delete files off your hard drive (not counting the recycle bin) the file isn't really gone. Just the pointer to the file is removed.
Ever see those spy movies where they overwrite the hard drive 6, 8, or 24 times, and that's how they know it's clean? Well, they do that for a reason.
I'd make every effort not to store the file's decrypted data, or, if you must, store only small amounts of it, even disjointed data.
If you must, then try/catch should protect you a bit. Nothing can protect from a power outage, though.
Best of luck.
Check out tmpfile().
It is part of the standard C library (and POSIX), not just BSD, so it is available pretty much everywhere.
But it creates a temporary file and automatically unlinks it so that it will be deleted on close.
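Usage is about as simple as it gets; this is the standard tmpfile() from <cstdio>:

#include <cstdio>

int main()
{
    std::FILE* f = std::tmpfile();   // created already unlinked; gone when closed or when the program ends
    if (!f) return 1;

    std::fputs("scratch data\n", f);
    std::rewind(f);
    // ... read it back, use it ...

    std::fclose(f);   // the file disappears here (or at normal program termination)
    return 0;
}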
Writing to the file system (even temporarily) is insecure.
Do that only if you really have to.
Optionally you could create an in-memory file system.
Never used one myself so no recommendations but a quick google found a few.
In C++ you should use an RAII approach:
#include <cstdio>    // std::remove
#include <string>
#include <utility>   // std::move

class Clean_Up_File {
    std::string filename_;
public:
    explicit Clean_Up_File(std::string filename)
        : filename_(std::move(filename)) { /* ... open/create the file ... */ }
    ~Clean_Up_File() { std::remove(filename_.c_str()); } // delete the file
};

int main()
{
    Clean_Up_File file_will_be_deleted_on_program_exit("my_file.txt");
}
RAII helps automate a lot of cleanup. You simply create an object on the stack, and have that object do clean up at the end of its lifetime (in the destructor which will be called when the object falls out of scope). ScopeGuard even makes it a little easier.
But, as others have mentioned, this only works in "normal" circumstances. If the user unplugs the computer you can't guarantee that the file will be deleted. And it may be possible to undelete the file (even on UNIX it's possible to "grep the harddrive").
Additionally, as pointed out in the comments, there are some cases where objects don't fall out of scope (for instance, the std::exit(int) function exits the program without leaving the current scope), so RAII doesn't work in those cases. Personally, I never call std::exit(int), and instead I either throw exceptions (which will unwind the stack and call destructors; which I consider an "abnormal exit") or return an error code from main() (which will call destructors and which I also consider an "abnormal exit"). IIRC, sending a SIGKILL also does not call destructors, and SIGKILL can't be caught, so there you're also out of luck.
This is a tricky topic. Generally, you don't want to write decrypted files to disk if you can avoid it. But keeping them in memory doesn't always guarantee that they won't be written to disk as part of a pagefile or otherwise.
I read articles about this a long time ago, and I remember there being some difference between Windows and Linux, in that on one you could guarantee a memory page wouldn't be written to disk and on the other you couldn't; but I don't remember clearly.
If you want to do your due diligence, you can look that topic up and read about it. It all depends on your threat model and what you're willing to protect against. After all, you can use compressed air to chill RAM and pull the encryption key out of that (which was actually on the new Christian Slater spy show, My Own Worst Enemy - which I thought was the best use of cutting edge, accurate, computer security techniques in media yet)
On Linux/Unix, use unlink as soon as you have created the file. The file will be removed as soon as your program closes the file descriptor or exits.
Better yet, the file will be removed even if the whole system crashes - because it is basically removed as soon as you unlink it.
The data will not be physically deleted from the disk, of course, so it still may be available for hacking.
Remember that the computer could be powered down at any time. Then, somebody you don't like could boot up with a Linux live CD, and examine your disk in any level of detail desired without changing a thing. No system that writes plaintext to the disk can be secure against such attacks, and they aren't hard to do.
You could set up a function that will overwrite the file with ones and zeros repeatedly, preferably injecting some randomness, and set it up to run at end of program, or at exit. This will work, provided there are no hardware or software glitches, power failures, or other interruptions, and provided the file system writes to only the sectors it claims to be using (journalling file systems, for example, may leave parts of the file elsewhere).
Therefore, if you want security, you need to make sure no plaintext is written out, and that also means it cannot be written to swap space or the equivalent. Find out how to mark memory as unswappable on all platforms you're writing for. Make sure decryption keys and the like are treated the same way as plaintext: never written to the disk under any circumstances, and kept in unswappable memory.
Then, your system should be secure against attacks short of hostiles breaking in, interrupting you, and freezing your RAM chips before powering down, so they don't lose their contents before being transferred for examination. Or authorities demanding your key, legally (check your local laws here) or illegally.
Moral of the story: real security is hard.
The method that I am going to implement will be to stream the decryption, so that the only part that is in memory is the part that is decrypted during the read, as the data is being used.
This will be a streamed implementation, so the only data in memory is the data I am consuming in the application at any given point. This makes some things tricky, considering a lot of traditional file tricks are no longer available, but since the implementation will be stream-based I will still be able to seek to different points of the file, which will be translated by the crypt stream into decrypting the corresponding sections.
Basically, it will be encrypting blocks of the file at a time - so then if I try to seek to a certain point it will decrypt that block to read. When I read past a block it decrypts the next block and releases the previous (within the crypt stream).
This implementation does not require me to decrypt to a file or to memory and is compatible with other stream consumers and providers (fstream).
This is my 'plan'. I have not done this type of work with fstream before and I will likely be posting a question as soon as I am ready to work on this.
Thanks for all the other answers- it was very informative.