What happens if someone overwrites a file after I open it? - c++

When I open a file in C, I get a file descriptor. If I haven't read its contents yet, and then someone modifies the file, will I read the old contents or the new contents?
Let's say a file has lots of lines. What happens if, while I'm reading the file, someone edits the beginning? Will this somehow corrupt how my program reads the file?
How do programs avoid getting corrupted data while the file is being read? Is it the OS that takes care of this problem? If I can still read the old data, where is that data being stored?
The man page of open has some information about the internals of open, but it is not very clear to me.

The C language standard doesn't acknowledge the existence of other processes nor specify interaction between them and the program (nor does C++). The behaviour depends on the operating system and / or the file system.
Generally, it is safest to assume that file operations are not atomic, and therefore that accessing a file while another process is editing it is an example of a race condition. Some systems may provide stricter guarantees.
A general approach to attempt avoiding problems is file locking. The standard C library does not have an API for file locking, but multitasking operating systems generally do.
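For example, on Linux and other POSIX systems you can use the advisory flock() lock. This is only a sketch (the file name is a placeholder), and it only helps if every process that touches the file agrees to take the lock:

#include <sys/file.h>   // flock
#include <fcntl.h>      // open
#include <unistd.h>     // close, read
#include <cstdio>       // perror

int main() {
    int fd = open("data.txt", O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }

    // Take a shared (read) lock; a cooperating writer would take LOCK_EX
    // before modifying the file. The lock is advisory: processes that
    // never call flock() are not affected by it at all.
    if (flock(fd, LOCK_SH) < 0) { std::perror("flock"); close(fd); return 1; }

    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        // ... process n bytes ...
    }

    flock(fd, LOCK_UN);  // release the lock
    close(fd);
}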

All this depends heavily on the OS; it is not handled at the C++ level. On Windows, for example, opening the file with CreateFile allows you to lock the file against subsequent access by other processes. But that is not something the language gives you.
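As a rough sketch of what that looks like on Windows (the path is a placeholder): passing 0 as the share-mode argument to CreateFile denies other processes access to the file for as long as the handle stays open.

#include <windows.h>
#include <cstdio>

int main() {
    // dwShareMode = 0: no other process may open the file for reading,
    // writing or deletion while this handle exists.
    HANDLE h = CreateFileA("C:\\data\\input.txt",
                           GENERIC_READ,
                           0,                    // share mode: exclusive
                           nullptr,              // default security
                           OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL,
                           nullptr);
    if (h == INVALID_HANDLE_VALUE) {
        std::printf("CreateFile failed: %lu\n", GetLastError());
        return 1;
    }

    // ... ReadFile(h, ...) as needed ...

    CloseHandle(h);   // other processes may open the file again
}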
You must decide based on the specific OS you work with. You cannot rely on general assumptions; it all depends on the documentation provided for that system.
Generally, C++-level documentation is not very useful for such problems, because there can never be a complete standard for something as low level as file access (even the filesystem library was only recently added to C++), and there is little point in trying to write 'portable' code for it. You must make it a habit to immerse yourself in the OS-specific documentation and libraries.

Related

How does Linux handle the case when multiple processes try to replace the same file at the same time?

I know this is a bit of a theoretical question, but I haven't got a satisfactory answer yet, so I thought I'd put the question here.
I have multiple C++ processes (I'd also like to know the thread behaviour) that contend to replace the same file at the same time. How safe is this to do on Linux (using Ubuntu 14.04 and CentOS 7)? Do I need to use locks?
Thanks in advance.
The filesystems of Unix-based OS's like Linux are designed around the notion of inodes, which are internal records describing various metadata about the file. Normally these aren't interacted with directly by users or programs, but their presence gives these filesystems a level of indirection that allows them to provide some useful semantics that other OS's (read: Windows) cannot.
filename --> inode --> data
In particular, when a file gets deleted, what's actually happening is the separation of the file's inode from its filename; not (necessarily) the deletion of the file's data itself. That is, the file and its contents can continue to exist (albeit invisibly, from the user's point of view) until all processes have closed their file-handles that were open on that file; once the inode is no longer accessible to any process, only then will the filesystem actually mark the file's data-blocks as free and available-for-reuse. In the meantime, the filename becomes available for another file's inode (and data) to be associated with, even though the old file's inode/data still technically exists.
The upshot of that is that under Linux it's perfectly valid to delete (or rename) a file at any time, even if other threads/processes are in the middle of using it; your delete will succeed, and any other programs that have that file open at that instant can simply continue reading/writing/using it, exactly as if it hadn't been deleted. The only thing that is different is that the filename will no longer appear in its directory, and when they call fclose() (or close() or etc) on the file, the file's data will go away.
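A small sketch of that behaviour on a POSIX system (the file name is made up): the descriptor keeps working even after the name is gone.

#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = open("victim.txt", O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }

    // Remove the directory entry. The inode and data blocks survive
    // because this process still holds an open descriptor on them.
    unlink("victim.txt");

    char buf[256];
    ssize_t n = read(fd, buf, sizeof buf);   // still reads the old data
    std::printf("read %zd bytes after unlink\n", n);

    close(fd);   // only now can the filesystem reclaim the blocks
}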
Since doing mv new.txt old.txt is essentially the same as doing a rm old.txt ; mv new.txt old.txt, there should be no problems with doing this from multiple threads without any synchronization. (note that the slightly different situation of having multiple threads or processes opening the same file simultaneously and writing into it at the same time is a bit more perilous; nothing will crash, but it would be easy for them to overwrite each other's data and corrupt the file, if they aren't careful)
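That is also the usual way for a writer to replace a file "atomically": write the new contents to a temporary file in the same directory, then rename() it over the old name, so readers see either the complete old file or the complete new one, never a half-written mixture. A sketch with made-up file names:

#include <cstdio>
#include <fstream>

int main() {
    // Write the new contents to a temporary file first.
    {
        std::ofstream tmp("old.txt.tmp");
        tmp << "complete new contents\n";
    }   // tmp is flushed and closed here

    // rename() atomically points the name "old.txt" at the new inode.
    // Processes that already have old.txt open keep seeing the old data.
    if (std::rename("old.txt.tmp", "old.txt") != 0) {
        std::perror("rename");
        return 1;
    }
}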
It depends a lot on exactly what you're going to be doing and how you're using the files. In general, in Unix/Posix systems like Linux, all file calls will succeed if multiple processes make them, and the general way the OS handles contention is "the last one to do something wins". Essentially, all modifications to the filesystem are serialized, so the filesystem is always in a consistent state. But otherwise it's a free-for-all.
There are a lot of details here though. There are flags used when opening a file, like O_EXCL, that can make the call fail if another process got there first (a sort of lock). There are advisory (i.e., nobody is forced by the OS to pay attention to them) locking systems like flock (try typing man 2 flock to learn more) for file contents. There are also more Linux-specific mandatory locking systems.
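For instance, O_CREAT | O_EXCL makes creation itself the arbitration point: exactly one of several racing processes succeeds and the rest get EEXIST, which is a common way to build a crude lock file (the name here is arbitrary):

#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>

int main() {
    // Atomically create the lock file; the call fails if it already exists.
    int fd = open("myapp.lock", O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0) {
        if (errno == EEXIST)
            std::fprintf(stderr, "another process holds the lock\n");
        else
            std::perror("open");
        return 1;
    }

    // ... do the work that must not run concurrently ...

    close(fd);
    unlink("myapp.lock");   // release the "lock"
}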
And there are also details like "What happens if someone deleted a file I have open?" that the other answer explains correctly and well.
And lastly, there's a whole mess of detail surrounding whether it's guaranteed that any particular change to the filesystem is recorded for all eternity, or whether it has a chance of disappearing if someone flicks the power switch. And that's a mess-and-a-half once you really dive into it, from dodgy hardware that lies to the OS about what it has written, to the confusing morass of different Linux system calls covering different aspects of this problem, often entering Linux from different eras of Unix/POSIX history and interacting with each other in strange and arcane ways.
So, an answer to your very general and open-ended question is going to have to necessarily be vague, abstract, and hand-wavey.

With what API do you perform a read-consistent file operation in OS X, analogous to the Windows Volume Shadow Copy Service?

We're writing a C++/Objective C app, runnable on OSX from versions 10.7 to present (10.11).
Under windows, there is the concept of a shadow file, which allows you read a file as it exists at a certain point in time, without having to worry about other processes writing to that file in the interim.
However, I can't find any documentation or online articles discussing a similar feature in OS X. I know that OS X will not lock a file when it's being written to, so is it necessary to do something special to make sure I don't pick up a file that is in the middle of being modified?
Or does the Journaled Filesystem make any special handling unnecessary? I'm concerned that if I have one process that is creating or modifying files (within a single context of, say, an fopen call - obviously I can't be guaranteed of "completeness" if the writing process is opening and closing a file repeatedly during what should be an atomic operation), that a reading process will end up getting a "half-baked" file.
And if JFS does guarantee that readers only see "whole" files, does this extend to Fat32 volumes that may be mounted as external drives?
A few things:
On Unix, once you open a file, if it is replaced (as opposed to modified), your file descriptor continues to access the file you opened, not its replacement.
Many apps will replace rather than modify files, using things like -[NSData writeToFile:atomically:] with YES for atomically:.
Cocoa and the other high-level frameworks do, in fact, lock files when they write to them, but that locking is advisory not mandatory, so other programs also have to opt in to the advisory locking system to be affected by that.
The modern approach is File Coordination. Again, this is a voluntary system that apps have to opt in to.
There is no feature quite like what you described on Windows. If the standard approaches aren't sufficient for your needs, you'll have to build something custom. For example, you could make a copy of the file that you're interested in and, after your copy is complete, compare it to the original to see if it was being modified as you were copying it. If the original has changed, you'll have to start over with a fresh copy operation (or give up). You can use File Coordination to at least minimize the possibility of contention from cooperating programs.
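A minimal sketch of that copy-and-verify idea using plain POSIX calls (the paths are placeholders, and using size plus modification time as the change indicator is an assumption; a content hash would be more robust):

#include <sys/stat.h>
#include <fstream>
#include <cstdio>

// Returns true if the copy completed without the source changing underneath us.
static bool copy_if_stable(const char* src, const char* dst) {
    struct stat before{}, after{};
    if (stat(src, &before) != 0) return false;

    {
        std::ifstream in(src, std::ios::binary);
        std::ofstream out(dst, std::ios::binary);
        out << in.rdbuf();                     // copy the bytes
        if (!in || !out) return false;
    }

    if (stat(src, &after) != 0) return false;
    // If the size or modification time changed during the copy, assume a
    // writer was active and report failure so the caller can retry.
    return before.st_size == after.st_size &&
           before.st_mtime == after.st_mtime;
}

int main() {
    for (int attempt = 0; attempt < 5; ++attempt) {
        if (copy_if_stable("/path/to/original.dat", "/tmp/snapshot.dat")) {
            std::puts("got a consistent snapshot");
            return 0;
        }
    }
    std::puts("file kept changing; giving up");
    return 1;
}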

Modifying and reading a big .txt file with MPI/C++?

I am using MPI together with C++. I want to read information from one file, modify it by some rule, and then write the modified content back to the same file. I am using a temporary file where I store the modified content, and at the end I overwrite the original with these commands:
temp_file.open("temporary.txt",ios::in);
ofstream output_file(output_name,ios::out);
output_file<<temp_file.rdbuf();
output_file.flush();
temp_file.close();
output_file.close();
remove("temporary.txt");
This function, which modifies the file, is executed by the MPI process with rank 0. After returning from the function, MPI_Barrier(MPI_COMM_WORLD); is called to ensure synchronization.
Then all MPI processes should read the modified file and perform some computations. The problem is that, since the file is big, the data is not completely written to the file when execution of the function has finished, and I get wrong results. I also tried adding a sleep() call, but sometimes it works and sometimes it doesn't (it depends on the node where I perform the computations). Is there a general way to solve this problem?
I put MPI as a tag, but I think this problem is inherently connected with the C++ standard and how storage is handled. How do I deal with this latency between writing into a buffer and the data actually reaching the file on the storage medium?
Fun topic. You are dealing with two or maybe three consistency semantics here.
POSIX consistency says essentially when a byte is written to a file, it's visible.
NFS consistency says "woah, that's way too hard. you write to this file and I'll make it visible whenever I feel like it. "
MPI-IO consistency semantics (which you aren't using, but are good to know) say that data is visible after specific synchronization events occur. Those two events are "close a file and reopen it" or "sync file, barrier, sync file again".
If you are using NFS, give up now. NFS is horrible. There are a lot of good parallel file systems you can use, several of which you can set up entirely in userspace (such as PVFS).
If you use MPI-IO here, you'll get more well-defined behavior, but the MPI-IO routines are more like C system calls than C++ iostream operators, so think more along the lines of open(2), read(2), write(2) and close(2). Text files are usually a headache to deal with, but in your case, where modifications are appended to the file, it shouldn't be too bad.
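As a rough sketch of the "flush, then synchronize" idea using plain POSIX calls plus an MPI barrier (the file name is arbitrary, and on NFS even this is not a hard guarantee):

#include <mpi.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstring>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        // Rank 0 rewrites the file, then forces the data out of its own
        // buffers before any other rank is allowed to look at it.
        int fd = open("shared.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        const char* text = "modified contents\n";
        write(fd, text, std::strlen(text));
        fsync(fd);      // flush to the storage / file server
        close(fd);
    }

    // No rank proceeds until rank 0 has finished writing and syncing.
    MPI_Barrier(MPI_COMM_WORLD);

    // Every rank (including 0) can now open and read the file.
    int fd = open("shared.txt", O_RDONLY);
    char buf[256];
    read(fd, buf, sizeof buf);
    close(fd);

    MPI_Finalize();
}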

Non-blocking call to ofstream::open?

I have a C++ program which opens files in /tmp (on a *nix system) and reads their contents.
To do this, I am using:
ofstream dest;
dest.open(abs_path.c_str(), ios::app);
where abs_path is a string containing the absolute path to the file.
The problem is that some *nix programs create named pipes as files in /tmp. For example,
/tmp/vgdb-pipe-to-vgdb-from-23732-by-myusername-on-???
Is a pipe created by a debugging utility I am using.
The documentation for ofstream's open method says that it sets an error bit when opening the file fails. However, in my tests it instead hangs indefinitely trying to open the file (which is actually a pipe). I assume this is because the file is locked by another program (probably the debugger).
So, how can I force ofstream::open to block for a finite amount of time, or not at all? It's easy enough to clean up gracefully if it fails, but it needs to actually fail first.
The simple answer is that you can't. filebuf::open (called by ofstream) basically delegates to the OS and assumes that the OS will do the right thing. And the interface it supports is very, very limited; many important options to open (O_SYNC, O_NONBLOCK, etc.) aren't mapped, and thus can't be used. The only solutions I've found to this are either to use std::ostringstream and then write the string to the file using system-level calls, or to write my own streambuf that does what I want (much simpler than it sounds, since you typically only need part of what filebuf offers; you often don't need bidirectionality, seeking or code translation).
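A sketch of the first suggestion (the file name and flags are just examples): format everything at the iostream level, then push it out with the system-level calls, where the non-standard open flags are available.

#include <sstream>
#include <string>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    // Build the output in memory with the usual iostream formatting.
    std::ostringstream oss;
    oss << "result: " << 42 << '\n';
    const std::string data = oss.str();

    // Then use open()/write(), where flags like O_NONBLOCK can be passed.
    int fd = open("output.txt", O_WRONLY | O_CREAT | O_NONBLOCK, 0600);
    if (fd < 0) { std::perror("open"); return 1; }

    if (write(fd, data.data(), data.size()) < 0)
        std::perror("write");

    close(fd);
}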
Neither of these solutions is portable, of course.
Finally, I'm not sure why you're writing into /tmp. By convention, anything you put into /tmp should contain the process id. And for security reasons, I'd always create a subdirectory with the process id in its name and with very limited access rights, and create any temporary files in it.
AFAIK, there is no such thing as non-blocking input defined by the C++ language. (There is a method std::streambuf::in_avail(), but it still can't help you.)
You could consider using the C-level call
int file_descr = open("pipe_addr", O_RDONLY | O_NONBLOCK);
instead of std::ofstream
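Roughly like this (the path is a placeholder, and it assumes you only need to read): opening a FIFO read-only with O_NONBLOCK returns immediately instead of waiting for a writer, so the call can fail or succeed without hanging.

#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    // O_NONBLOCK makes open() return immediately: for a FIFO with no
    // writer it succeeds at once instead of blocking, and later read()
    // calls return 0 or fail with EAGAIN instead of waiting.
    int fd = open("/tmp/somefile", O_RDONLY | O_NONBLOCK);
    if (fd < 0) {
        std::perror("open");   // report and move on instead of hanging
        return 1;
    }

    char buf[4096];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n >= 0)
        std::printf("read %zd bytes\n", n);

    close(fd);
}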

File in use C/C++

I'm learning C/C++ right now and I am reading about file operations. Suppose a program A is working with an external file (say, a text file) and another program B is, say, trying to move the file (or worse, delete it). Is it possible to tell the OS to inform program B that the file is in use, even though it was not created by program A?
What you're trying to do is called file locking. Search for "file locking in C".
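As an illustration of what such an API looks like on POSIX systems (the file name is made up, and the lock is advisory, so both programs have to use it), fcntl() record locks let a process lock all or part of a file:

#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = open("shared.dat", O_RDWR);
    if (fd < 0) { std::perror("open"); return 1; }

    struct flock lk{};
    lk.l_type   = F_WRLCK;   // exclusive (write) lock
    lk.l_whence = SEEK_SET;
    lk.l_start  = 0;
    lk.l_len    = 0;         // 0 means "to end of file": lock the whole file

    // F_SETLKW waits until the lock is granted; F_SETLK would fail instead.
    if (fcntl(fd, F_SETLKW, &lk) < 0) { std::perror("fcntl"); return 1; }

    // ... read/modify the file while holding the lock ...

    lk.l_type = F_UNLCK;     // release
    fcntl(fd, F_SETLK, &lk);
    close(fd);
}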
A file is a resource.
If you happen to open one in C/C++, or any other language for that matter, the OS "lends" this file to your program. On some operating systems (notably Windows), while you hold the file open the OS will prevent other processes from taking it away (e.g. moving or deleting it); on Unix-like systems this protection is not automatic, and you need explicit file locking to get similar behaviour.
This is why it's important to close a file after you're done working with it. This tells the OS that you no longer control the resource, and other processes can fully access it.