I'm learning C/C++ right now and I am reading about file operations. Suppose a program A is working with an external file (say, a text file) and another program B is trying to move the file (or worse, delete it). Is it possible to tell the OS to inform program B that the file is in use, even though the file was not created by program A?
What you're trying to do is called file locking. Search for "file locking in C".
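For example, on POSIX systems one common mechanism is the advisory lock taken with flock(). A minimal sketch follows; the file name "shared.txt" is just a placeholder, and both programs have to use flock() for the lock to mean anything.

    // Minimal sketch of POSIX advisory locking with flock().
    // "shared.txt" is a hypothetical file name.
    #include <sys/file.h>   // flock
    #include <fcntl.h>      // open
    #include <unistd.h>     // close
    #include <stdio.h>      // perror

    int main() {
        int fd = open("shared.txt", O_RDWR);
        if (fd == -1) { perror("open"); return 1; }

        // Try to take an exclusive lock without blocking.
        if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
            perror("flock");   // another process already holds the lock
            close(fd);
            return 1;
        }

        // ... read and write the file while holding the lock ...

        flock(fd, LOCK_UN);    // release the lock
        close(fd);
        return 0;
    }

Note that these locks are advisory on most Unix-like systems: a process that never calls flock() can still move or delete the file.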
A file is a resource.
If you happen to open one in C/C++, or any other language for that matter, the OS "lends" this file to your program. While you have control of a file (resource), the OS may prevent other processes from taking control of it (e.g. moving the file, deleting the file, etc.), depending on the platform and how the file was opened.
This is why it's important to close a file after you're done working with it. This tells the OS that you no longer control this resource and other processes can fully access it.
When I open a file in C, I get a file descriptor. If I have not yet read its contents and someone then modifies the file, will I read the old contents or the new ones?
Let's say a file has lots of lines. What happens if someone edits the beginning of the file while I am reading it? Will this somehow corrupt what my program reads?
How do programs avoid getting corrupted data while the file is being read? Is it the OS that takes care of this problem? If I can still read the old data, where is that data stored?
The man page of open has some information about its internals, but it is not very clear to me.
The C language standard doesn't acknowledge the existence of other processes nor specify interaction between them and the program (nor does C++). The behaviour depends on the operating system and/or the file system.
Generally, it is safest to assume that file operations are not atomic, and therefore accessing a file while another process is editing it is an example of a race condition. Some systems may provide stricter guarantees.
A general approach for avoiding such problems is file locking. The standard C library does not have an API for file locking, but multitasking operating systems generally do.
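To illustrate, POSIX systems expose record (byte-range) locks through fcntl(). A rough sketch, using a placeholder file name "data.db"; like flock(), these locks are advisory, so every cooperating process must use them.

    // Sketch of POSIX record locking via fcntl(); "data.db" is a placeholder.
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <string.h>

    int main() {
        int fd = open("data.db", O_RDWR);
        if (fd == -1) { perror("open"); return 1; }

        struct flock fl;
        memset(&fl, 0, sizeof fl);
        fl.l_type   = F_WRLCK;   // exclusive (write) lock
        fl.l_whence = SEEK_SET;
        fl.l_start  = 0;
        fl.l_len    = 0;         // 0 means "to the end of the file"

        if (fcntl(fd, F_SETLK, &fl) == -1) {   // non-blocking attempt
            perror("fcntl(F_SETLK)");          // a conflicting lock is held elsewhere
            close(fd);
            return 1;
        }

        // ... work with the file ...

        fl.l_type = F_UNLCK;                   // release the lock
        fcntl(fd, F_SETLK, &fl);
        close(fd);
        return 0;
    }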
All of this depends heavily on the OS; it is not handled at the C++ level. On Windows, for example, opening the file with CreateFile lets you lock the file against subsequent access by other processes, but that is an OS facility, not a language feature.
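A minimal sketch of that idea, assuming a hypothetical file name "data.txt": passing 0 as the share mode to CreateFileA denies other processes read, write and delete access for as long as the handle stays open.

    // Windows-only sketch: deny all sharing while the handle is open.
    // "data.txt" is a made-up file name.
    #include <windows.h>
    #include <stdio.h>

    int main() {
        HANDLE h = CreateFileA("data.txt",
                               GENERIC_READ,
                               0,                     // dwShareMode = 0: no sharing
                               NULL,
                               OPEN_EXISTING,
                               FILE_ATTRIBUTE_NORMAL,
                               NULL);
        if (h == INVALID_HANDLE_VALUE) {
            printf("CreateFileA failed: %lu\n", GetLastError());
            return 1;
        }
        // While h is open, other processes cannot open, move or delete data.txt.
        CloseHandle(h);
        return 0;
    }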
You must decide based on the specific OS you work with. Don't rely on assumptions; it all depends on the documentation of that platform.
Generally, C++-level documentation is not very useful for such problems, because there can never be a full standard for something as low-level as file access (even the filesystem library was only added recently, in C++17), and there is little point in trying to write 'portable' code for it. Make it a habit to immerse yourself in the OS-specific documentation and libraries.
We're writing a C++/Objective C app, runnable on OSX from versions 10.7 to present (10.11).
Under Windows, there is the concept of a shadow file, which allows you to read a file as it exists at a certain point in time, without having to worry about other processes writing to that file in the interim.
However, I can't find any documentation or online articles discussing a similar feature in OS X. I know that OS X will not lock a file when it's being written to, so is it necessary to do something special to make sure I don't pick up a file that is in the middle of being modified?
Or does the Journaled Filesystem make any special handling unnecessary? I'm concerned that if I have one process creating or modifying files (within a single context of, say, an fopen call; obviously I can't be guaranteed 'completeness' if the writing process opens and closes a file repeatedly during what should be an atomic operation), a reading process will end up getting a "half-baked" file.
And if JFS does guarantee that readers only see "whole" files, does this extend to FAT32 volumes that may be mounted as external drives?
A few things:
On Unix, once you open a file, if it is replaced (as opposed to modified in place), your file descriptor continues to access the file you opened, not its replacement (a short sketch after this answer illustrates this).
Many apps will replace rather than modify files, using things like -[NSData writeToFile:atomically:] with YES for atomically:.
Cocoa and the other high-level frameworks do, in fact, lock files when they write to them, but that locking is advisory not mandatory, so other programs also have to opt in to the advisory locking system to be affected by that.
The modern approach is File Coordination. Again, this is a voluntary system that apps have to opt in to.
There is no feature quite like what you described on Windows. If the standard approaches aren't sufficient for your needs, you'll have to build something custom. For example, you could make a copy of the file that you're interested in and, after your copy is complete, compare it to the original to see if it was being modified as you were copying it. If the original has changed, you'll have to start over with a fresh copy operation (or give up). You can use File Coordination to at least minimize the possibility of contention from cooperating programs.
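To illustrate the first point above, here is a small self-contained sketch (file names are made up, error handling mostly omitted): a descriptor opened before an atomic replace keeps seeing the original contents.

    // Sketch: an fd opened before the file is replaced still reads the old data.
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>

    int main() {
        // Create the original file.
        FILE* f = fopen("demo.txt", "w");
        fputs("old contents\n", f);
        fclose(f);

        int fd = open("demo.txt", O_RDONLY);   // keep this descriptor open

        // Simulate another process doing an "atomic save": write a new file
        // and rename it over the original.
        FILE* g = fopen("demo.txt.new", "w");
        fputs("new contents\n", g);
        fclose(g);
        rename("demo.txt.new", "demo.txt");

        // The old descriptor still refers to the original data.
        char buf[64] = {0};
        if (read(fd, buf, sizeof buf - 1) > 0)
            printf("via old fd: %s", buf);     // prints "old contents"
        close(fd);
        return 0;
    }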
I read that a program should close files after writing to them, in case there is still data in the write buffer that has not yet been physically written out. I also read that some languages such as Python automatically close all files that go out of scope, such as when the program ends.
But if I'm merely reading a file and not modifying it in any way (except maybe for the OS updating its last-access date), is there ever a need to close it, even if the program never terminates, such as a daemon that monitors a log file?
(Why is it necessary to close a file after using it? asks about file access in general, not only for reading.)
In general, you should always close a file after you are done using it.
Reason number 1: File descriptors are a limited resource
(or, on Windows, the conceptually similar HANDLEs).
Every time you open a file, even just for reading, you reduce the number of descriptors (or handles) available to your process and, ultimately, to the system as a whole.
Every time you close a handle, you release it and make it available again.
Now consider the consequences of a loop that opens a file, reads it, but never closes it; a sketch of exactly that follows after the links below.
http://en.wikipedia.org/wiki/File_descriptor
https://msdn.microsoft.com/en-us/library/windows/desktop/aa364225%28v=vs.85%29.aspx
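Here is a deliberately broken sketch of such a loop (the path is arbitrary): every iteration opens the file and never closes it, so open() eventually fails with EMFILE once the per-process descriptor limit is reached.

    // Anti-example: opening without closing until the descriptor limit is hit.
    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    int main() {
        int count = 0;
        for (;;) {
            int fd = open("/dev/null", O_RDONLY);   // opened but never closed: the bug
            if (fd == -1) {
                printf("open() failed after %d descriptors: %s\n",
                       count, strerror(errno));
                return 1;
            }
            ++count;
        }
    }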
Reason number 2: If you are doing anything other than reading a file, there can be race conditions if multiple processes or threads access the same file.
To avoid this, file locks may be put in place.
http://en.wikipedia.org/wiki/File_locking
If you are reading a file and do not close it afterward, other applications that try to obtain an exclusive file lock on it can be denied access.
Oh, and (on Windows at least) the file can't be deleted by anyone who doesn't have the rights to kill your process.
Reason number 3: There is absolutely no reason to leave a file unclosed, in any language. That is why Python helps lazy programmers and automatically closes a handle that drops out of scope, in case the programmer forgot.
Yes, it's better to close the file after reading is completed.
That's necessary because other software might request exclusive access to that file. If the file is still open, such a request will fail.
Not closing a file results in resources being unnecessarily held from the system (file descriptors on Unix and handles on Windows). This matters especially when a bug occurs in some sort of loop, or on a system that is never turned off. Some languages close forgotten files themselves when, for example, they go out of scope; others don't, or do so only at some unpredictable later time (like the garbage collector in Java).
Imagine a system that needs to run forever, for example a server. Unclosed files then consume more and more resources, until ultimately all available descriptors are used up by unclosed files.
In order to read a file you have to open it, so regardless of what you do with the file, resources are reserved for it. So far I have tried to explain the importance of closing a file for resource reasons; it's also important that you, as a programmer, know when an object (a file) can be closed because no further use is required. I think it's bad practice not to be at least aware of unclosed files, and good practice to close files once no further use is required.
Some applications also require exclusive access to a file, i.e. they require that no other application has the file open. For example, when you try to empty your recycle bin or move a file that you still have open on Windows (this is referred to as file locking), Windows won't let you throw away or move the file. This is just an example of when it is annoying that a file is left open when it shouldn't be; it happens to me daily.
I have a C++ program which opens files in /tmp (on a *nix system) and reads their contents.
To do this, I am using:
ofstream dest;
dest.open(abs_path.c_str(), ios::app);
where abs_path is a string containing the absolute path to the file.
The problem is that some *nix programs create named pipes as files in /tmp. For example,
/tmp/vgdb-pipe-to-vgdb-from-23732-by-myusername-on-???
is a pipe created by a debugging utility I am using.
In the documentation for ofstream::open it says that the method sets an error bit when opening the file fails. However, in my tests it instead hangs indefinitely trying to open the file (which is actually a pipe). I assume this is because the file is locked by another program (probably the debugger).
So, how can I force ofstream::open to block for a finite amount of time, or not at all? It's easy enough to clean up gracefully if it fails, but it needs to actually fail first...
The simple answer is that you can't. filebuf::open (called by ofstream) basically delegates to the OS and assumes that the OS will do the right thing. The interface it supports is very, very limited; many important options to open (O_SYNC, O_NONBLOCK, etc.) aren't mapped and thus can't be used. The only solutions I've found are either to use a std::ostringstream and then write the string to the file using system-level calls, or to write my own streambuf that does what I want (much simpler than it sounds, since you typically only need part of what filebuf offers: you often don't need bidirectionality, seeking or code translation).
Neither of these solutions is portable, of course.
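A rough sketch of the first workaround (the path "/tmp/target-file" is a placeholder): format everything into a std::ostringstream, then push it out with POSIX calls, passing O_NONBLOCK so that opening a FIFO with no reader fails immediately instead of hanging. For a regular file, O_NONBLOCK changes nothing, so the same code path covers both cases.

    // Sketch: buffer in memory, then write via POSIX open()/write() with O_NONBLOCK.
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>
    #include <sstream>
    #include <string>

    int main() {
        std::ostringstream buffer;
        buffer << "log line " << 42 << '\n';   // format everything in memory first
        const std::string out = buffer.str();

        // For a FIFO with no reader, O_WRONLY | O_NONBLOCK makes open() fail
        // with ENXIO instead of blocking.
        int fd = open("/tmp/target-file",
                      O_WRONLY | O_APPEND | O_CREAT | O_NONBLOCK, 0600);
        if (fd == -1) {
            std::perror("open");
            return 1;
        }
        if (write(fd, out.data(), out.size()) == -1)
            std::perror("write");
        close(fd);
        return 0;
    }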
Finally, I'm not sure why you're writing into /tmp. By convention, anything you put into /tmp should contain the process id. And for security reasons, I'd always create a subdirectory, with the process id in its name and with very limited access rights, and create any temporary files in it.
AFAIK, there is no such thing as non-blocking input defined by the C++ language. (There is a member function std::streambuf::in_avail(), but it still can't help you here.)
You could consider using the C-level call
int file_descr = open("pipe_addr", O_RDONLY | O_NONBLOCK);   // declared in <fcntl.h>
instead of std::ofstream.
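Another option, sketched below with a hypothetical helper open_if_regular(), is to stat() the path first and refuse to open anything that is not a regular file; stat() does not block on FIFOs. This is only a mitigation, since the path could still be replaced between the stat() and the open() (a classic TOCTOU race).

    // Sketch: skip named pipes and other special files before opening with ofstream.
    #include <sys/stat.h>
    #include <fstream>
    #include <iostream>
    #include <string>

    // Hypothetical helper: open abs_path for appending only if it is a regular file.
    bool open_if_regular(const std::string& abs_path, std::ofstream& dest) {
        struct stat st;
        if (stat(abs_path.c_str(), &st) != 0)   // stat() does not block on FIFOs
            return false;
        if (!S_ISREG(st.st_mode))               // named pipe, socket, device, ...
            return false;
        dest.open(abs_path.c_str(), std::ios::app);
        return dest.is_open();
    }

    int main() {
        std::ofstream dest;
        if (!open_if_regular("/tmp/some-file", dest)) {   // placeholder path
            std::cerr << "skipped: not accessible or not a regular file\n";
            return 1;
        }
        dest << "appended line\n";
        return 0;
    }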
Both operations create an empty file and return the filename, but mkstemp leaves the file open in exclusive mode and gives you the handle. Is there a safety benefit to the C function? Does this imply that there is a safety hole in the command-line version?
As an aside, it is interesting that there are several related functions in the C API on Linux, and most of them say "Don't use this function" (or similar) in their man page.
One trick with temporary files in Unix-like operating systems (including Linux and Mac OS) is as soon as you open the file, delete it. You can still access the file, but nobody else can see it or do anything to it, and it will go away as soon as you close the file, even if your program dies a horrible death.
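A minimal sketch of that trick (the /tmp template is just an example): create the file with mkstemp(), unlink() it immediately, and keep working through the descriptor.

    // Sketch: anonymous scratch file via mkstemp() + immediate unlink().
    #include <unistd.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    int main() {
        char name[] = "/tmp/exampleXXXXXX";   // mkstemp() replaces the trailing X's
        int fd = mkstemp(name);
        if (fd == -1) { perror("mkstemp"); return 1; }

        unlink(name);   // the name disappears, but the open descriptor stays valid

        // The file is now invisible to other processes and the kernel reclaims it
        // as soon as fd is closed, even if the program crashes.
        const char msg[] = "scratch data\n";
        if (write(fd, msg, strlen(msg)) == -1)
            perror("write");
        close(fd);
        return 0;
    }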
As you can easily see from the mktemp(1) source code, it essentially does nothing but call mkstemp(3).
Exclusive mode on Linux means that the call will fail if the file already exists; it does not guarantee locking. Another process can delete the file, create it again and fill it with data, despite your process still holding an open descriptor to it (see open(2)).
There is no additional safety in the C function compared to the command-line utility.
The most obvious difference between the library function and the command-line utility is that the command-line utility is used by people or their shell scripts, while the library function is always called from a program.
It would be quite hard to hand a person a file handle.
Regarding "safety", you should think about race conditions: several instances of one program call mkstemp at the same time, but each one is guaranteed to have a unique temporary file. If the program shelled out and called the command line version, it'd be almost impossible to guarantee this.