On Linux, a pwrite operation (effectively seek + write in one call) is atomic, so calling pwrite from multiple threads on one file descriptor is safe.
I want to duplicate the file descriptor using dup(). Given fd1 and fd2, will pwrite calls still work as expected, or is there a danger of a race condition?
File descriptors created through dup share the same file offset and file status flags (e.g. an lseek on one descriptor affects the other), because they refer to the same open file description, which means they are essentially interchangeable. The only thing they do not share is the file descriptor flags (e.g. FD_CLOEXEC).
From the man page:
After a successful return from dup()
or dup2(), the old and new file
descriptors may be used
interchangeably. They refer to the
same open file description (see
open(2)) and thus share file offset
and file status flags; for example, if
the file offset is modified by using
lseek(2) on one of the descriptors,
the offset is also changed for the
other.
Given that dup allows you to use the two file descriptors interchangeably (because they refer to the same open file description), I assume this implies that calling pwrite on one is the same as calling it on the other, and is thus atomic.
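To make the point above concrete, here is a minimal sketch, assuming a Linux/POSIX system. The function name dup_pwrite_demo and the scratch file path are hypothetical, chosen only for illustration. It writes through both descriptors and checks that each pwrite landed at its own offset while the shared file offset stayed where it was.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Writes through two descriptors that share one open file description.
 * Returns 0 if both pwrite calls landed at their requested offsets and
 * left the shared file offset untouched, -1 otherwise.
 * `path` is a hypothetical scratch file chosen by the caller. */
int dup_pwrite_demo(const char *path)
{
    int fd1 = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd1 < 0)
        return -1;
    int fd2 = dup(fd1);
    if (fd2 < 0) {
        close(fd1);
        unlink(path);
        return -1;
    }

    int rc = -1;
    char buf[8] = {0};

    /* Each pwrite carries its own offset, so neither call depends on
     * the file offset shared by fd1 and fd2. */
    if (pwrite(fd1, "AAAA", 4, 0) == 4 &&
        pwrite(fd2, "BBBB", 4, 4) == 4 &&
        /* pwrite does not move the shared offset: it is still 0. */
        lseek(fd1, 0, SEEK_CUR) == 0 &&
        lseek(fd2, 0, SEEK_CUR) == 0 &&
        pread(fd2, buf, 8, 0) == 8 &&
        memcmp(buf, "AAAABBBB", 8) == 0)
        rc = 0;

    close(fd2);
    close(fd1);
    unlink(path);
    return rc;
}
```

Calling dup_pwrite_demo("scratch.tmp") from a test program should return 0. The sketch is single-threaded; it only shows that the two descriptors behave as one where pwrite is concerned.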
Note that the PIPE_BUF atomicity guarantee in the POSIX programmer's manual covers write() calls of at most PIPE_BUF bytes on pipes and FIFOs. It does not apply to pwrite, which fails with ESPIPE on a pipe because pipes are not seekable.
Related
Thread 1 (T1) creates the file using
FILE *MyFile = tmpfile();
Thread 2 (T2) then starts writing to the file. While thread 2 is writing, thread 1 occasionally reads from the file.
I set it up so that T2 is temporarily suspended while T1 is reading. But since T1 only ever reads a part of the file that T2 won't be writing to (the file is written sequentially), I'm wondering whether suspending T2 is necessary. I know this would be OK if the FILE were replaced by a fixed-size array / vector. Just wondering how disk differs from memory.
Edit.
The writes are done using fseek and fwrite. The reads are done using fseek and fread. I assumed that was a given, but judging from some of the comments maybe not. I suppose if T1 fseeks to position X at the same time as T2 fseeks to position Y, then who knows where the next read or write will start from. Will take a look at pipes. Thanks for the help.
Mixing reads and writes on a FILE is not even safe when dealing with a single thread. From the manpage of fopen:
Reads and writes may be intermixed on read/write streams in any order. Note that ANSI C
requires that a file positioning function intervene between output and input, unless an input
operation encounters end-of-file. (If this condition is not met, then a read is allowed to
return the result of writes other than the most recent.) Therefore it is good practice (and
indeed sometimes necessary under Linux) to put an fseek(3) or fgetpos(3) operation between
write and read operations on such a stream. This operation may be an apparent no-op (as in
fseek(..., 0L, SEEK_CUR) called for its synchronizing side effect).
So don't assume reads and writes are magically synchronized for you; protect access to the FILE with a mutex.
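One way to follow that advice is sketched below, assuming pthreads. The struct shared_file and both function names are hypothetical helpers, not part of any library. Each helper holds the mutex across the fseek and the following fwrite or fread, so another thread cannot move the stream position in between. The fseek before every transfer also supplies the positioning call that the man page asks for between output and input.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>

/* Hypothetical wrapper: one stream plus the mutex that guards it. */
struct shared_file {
    FILE *fp;
    pthread_mutex_t lock;
};

/* Seek and write as one unit; returns 0 on success, -1 on error. */
int shared_write_at(struct shared_file *sf, long pos,
                    const void *buf, size_t len)
{
    int rc = -1;
    pthread_mutex_lock(&sf->lock);
    if (fseek(sf->fp, pos, SEEK_SET) == 0 &&
        fwrite(buf, 1, len, sf->fp) == len &&
        fflush(sf->fp) == 0)
        rc = 0;
    pthread_mutex_unlock(&sf->lock);
    return rc;
}

/* Seek and read as one unit; returns the number of bytes read. */
size_t shared_read_at(struct shared_file *sf, long pos,
                      void *buf, size_t len)
{
    size_t n = 0;
    pthread_mutex_lock(&sf->lock);
    if (fseek(sf->fp, pos, SEEK_SET) == 0)
        n = fread(buf, 1, len, sf->fp);
    pthread_mutex_unlock(&sf->lock);
    return n;
}
```

A caller would initialize the wrapper with something like struct shared_file sf = { tmpfile(), PTHREAD_MUTEX_INITIALIZER }; and pass &sf to both threads.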
I'm trying to do random writes (a benchmark test) to a file using multiple threads (pthread). If I comment out the mutex lock, the created file is smaller than expected, as if some writes are getting lost (always by some multiple of the chunk size). If I keep the mutex, the size is always exact.
Does my code have a problem somewhere else, so that the mutex is not really required (as suggested by @evan), or is the mutex necessary here?
void *DiskWorker(void *threadarg) {
    FILE *theFile = fopen(fileToWrite, "a+");
    ....
    for (long i = 0; i < noOfWrites; ++i) {
        //pthread_mutex_lock (&mutexsum);
        // For Random access
        fseek ( theFile , randomArray[i] * chunkSize , SEEK_SET );
        fputs ( data , theFile );
        // Or for sequential access (in this case above 2 lines would not be here)
        fprintf(theFile, "%s", data);
        // sequential access end
        fflush (theFile);
        //pthread_mutex_unlock(&mutexsum);
    }
    .....
}
You are opening a file using "append mode". According to C11:
Opening a file with append mode ('a' as the first character in the
mode argument) causes all subsequent writes to the file to be forced
to the then current end-of-file, regardless of intervening calls to
the fseek function.
The C standard does not specify exactly how this should be implemented, but on POSIX systems it is usually implemented with the O_APPEND flag of the open function, while flushing data is done with the write function. Note that the fseek call in your code should have no effect.
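The claim that fseek has no effect on writes in append mode can be checked with a small sketch like the following. The function name and the scratch path are hypothetical. It seeks to the start of a file opened with "a" and then verifies that the data still ended up at the end.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if a write issued after fseek(0) still landed at the end of
 * the file, 0 if it did not, -1 on I/O error.
 * `path` is a hypothetical scratch file. */
int append_ignores_fseek(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (!fp)
        return -1;
    fputs("AAAA", fp);
    fclose(fp);

    fp = fopen(path, "a");
    if (!fp)
        return -1;
    fseek(fp, 0L, SEEK_SET);   /* request the start of the file */
    fputs("BB", fp);           /* append mode still writes at end-of-file */
    fclose(fp);

    char buf[16] = {0};
    fp = fopen(path, "r");
    if (!fp)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    remove(path);

    return (n == 6 && strcmp(buf, "AAAABB") == 0) ? 1 : 0;
}
```

This is why the random-access variant in the question cannot place chunks at randomArray[i] * chunkSize while the file is opened with "a+".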
I think POSIX requires this, as it describes how redirecting output in append mode (>>) is done by the shell:
Appended output redirection shall cause the file whose name results
from the expansion of word to be opened for output on the designated
file descriptor. The file is opened as if the open() function as
defined in the System Interfaces volume of POSIX.1-2008 was called
with the O_APPEND flag. If the file does not exist, it shall be
created.
And since most programs use the FILE interface to send data to stdout, this probably requires fopen to use open with O_APPEND, and to use write (not functions like pwrite) when writing data.
So if on your system fopen with 'a' mode uses O_APPEND, flushing is done using write, and your kernel and filesystem correctly implement the O_APPEND flag, then using a mutex should have no effect, as the writes do not interleave:
If the O_APPEND flag of the file status flags is set, the file
offset shall be set to the end of the file prior to each write and no
intervening file modification operation shall occur between changing
the file offset and the write operation.
Note that not all filesystems support this behavior. Check this answer.
As for my answer to your previous question, my suggestion was to remove mutex as it should have no effect on the size of a file (and it didn't have any effect on my machine).
Personally, I never really used O_APPEND and would be hesitant to do so, as its behavior might not be supported at some level, plus its behavior is weird on Linux (see "bugs" section of pwrite).
You definitely need a mutex because you are issuing several different file commands. The underlying file subsystem can't possibly know how many file commands you are going to call to complete your whole operation.
So you need the mutex.
In your situation you may find you get better performance putting the mutex outside the loop. The reason being that, otherwise, switching between threads may cause excessive skipping between different parts of the disk. Hard disks take about 10ms to move the read/write head so that could potentially slow things down a lot.
So it might be a good idea to benchmark that.
I want to know whether file IO operations by multiple processes/threads are guaranteed to be sequentially consistent on Linux. And if not (as this thread says), how should I write my code to make sure they are sequentially consistent? Consider the following example, where FILE_1 and FILE_2 are two distinct file names with absolute paths, to which both Process A and Process B have read-write access.
Process A first creates FILE_1 and then creates FILE_2:
FILE* fp1A = fopen(FILE_1, "w");
fclose(fp1A);
FILE* fp2A = fopen(FILE_2, "w");
fclose(fp2A);
Process B first reads FILE_2 and if success, reads FILE_1:
FILE* fp2B = fopen(FILE_2, "r");
if (fp2B != NULL) {
    FILE* fp1B = fopen(FILE_1, "r");
    // QUESTION: is fp1B guaranteed to be not NULL here?
}
The question is stated in the comment above. In other words, if one process performs file IO operations in the order specified by its source code, will all other processes see the effects of these operations on the system in the same order? Is this guaranteed by some standard (POSIX etc.), or is it implementation-defined?
What if I change "file IO" to other operations which have some visible effect on the system in a broader sense (e.g. changing a kernel parameter)?
BACKGROUND: I have been studying memory ordering in the C++11 thread model. But those concepts only concern memory, not OS functionality such as file IO. I understand this is because it is a language standard independent of the OS. So I want to know whether any other standards provide similar concepts for the OS.
Is there a way to dump a buffer to file atomically?
By "atomically" I mean: if for example someone terminates my application during writing, I'd like to have file in either before- or after-writing state, but not in a corrupted intermediate state.
If the answer is "no", could it perhaps be done with a really small buffer?
For example, can I dump 2 consecutive int32_t variables with a single 8-byte fwrite (on an x64 platform) and be sure that either both of those int32s are written or neither of them, but never just one?
I recommend writing to a temporary file and then doing a rename(2) on it.
ofstream o("file.tmp"); //Write to a temporary file
o << "my data";
o.close();
//Perform an atomic move operation... needed so readers can't open a partially written file
rename("file.tmp", "file.real");
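The same pattern can be sketched in C for readers using stdio, assuming a POSIX system. The function name is hypothetical, and the fsync call is an addition that the snippet above does not make: it is there so the data reaches storage before the rename. Note that rename is only atomic when the temporary file and the destination are on the same filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Writes `len` bytes to `tmp_path`, forces them to storage, then renames
 * the temporary file over `final_path`.
 * Returns 0 on success, -1 on error. */
int replace_file_atomically(const char *tmp_path, const char *final_path,
                            const void *data, size_t len)
{
    FILE *fp = fopen(tmp_path, "wb");
    if (!fp)
        return -1;

    int ok = fwrite(data, 1, len, fp) == len
          && fflush(fp) == 0            /* stdio buffer -> kernel */
          && fsync(fileno(fp)) == 0;    /* kernel -> storage device */
    if (fclose(fp) != 0)
        ok = 0;

    /* Readers see either the old file or the complete new one. */
    if (!ok || rename(tmp_path, final_path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

For example, replace_file_atomically("file.tmp", "file.real", "my data", 7) mirrors the snippet above.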
When I run my small-scale parallel codes, I typically output N files (N being number of processors) in the form fileout.dat.xxx where xxx is the processor number (using I3.3) and then just cat them into a single fileout.dat file after the code is finished.
My question is can I use ACCESS='append' or POSITION='append' in the OPEN statement and have all processors write to the same file?
In practice, no. POSITION='append' merely says that the file pointer will be at the end of file after the open statement is executed. It is, however, possible to change the file position, e.g. with the BACKSPACE, REWIND or such statements. Thus, Fortran POSITION='append' does not correspond to the POSIX O_APPEND, and hence a POSIX OS cannot ensure that all writes only append to the file and do not overwrite older data.
Furthermore, if you run your code on a cluster, be aware that O_APPEND does not work on many networked file systems such as NFS.
In order to do parallel I/O with several processes/threads writing to a single file, use ACCESS='direct' or ACCESS='stream' and have the processes agree on which records/byte ranges to write to.
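The idea of agreeing on byte ranges is not specific to Fortran. As a rough C analogue, assuming a POSIX system, fixed-size records, and a hypothetical write_record helper where rank stands for the processor number, each writer could place its record at an offset derived from its rank:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

enum { RECORD_SIZE = 16 };

/* Writer number `rank` owns the byte range
 * [rank * RECORD_SIZE, (rank + 1) * RECORD_SIZE).
 * Returns 0 on success, -1 on error. */
int write_record(const char *path, int rank, const char *text)
{
    /* Build a space-padded, newline-terminated record. */
    char record[RECORD_SIZE];
    memset(record, ' ', sizeof record);
    size_t n = strlen(text);
    if (n > sizeof record - 1)
        n = sizeof record - 1;
    memcpy(record, text, n);
    record[sizeof record - 1] = '\n';

    /* No O_APPEND and no O_TRUNC: every writer opens the same file and
     * touches only its own byte range. */
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    ssize_t w = pwrite(fd, record, sizeof record,
                       (off_t)rank * RECORD_SIZE);
    int rc = (w == (ssize_t)sizeof record) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Because the ranges never overlap, the writers need no lock among themselves, and the order in which they run does not change the resulting file.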