Do we need mutex to perform multithreading file IO - c++

I'm trying to do random writes (a benchmark test) to a file using multiple threads (pthreads). If I comment out the mutex lock, the created file is smaller than expected, as if some writes are getting lost (always by some multiple of the chunk size). But if I keep the mutex, the file is always exactly the expected size.
Does my code have a problem somewhere else, so that the mutex is not really required (as suggested by #evan), or is the mutex necessary here?
void *DiskWorker(void *threadarg) {
    FILE *theFile = fopen(fileToWrite, "a+");
    ....
    for (long i = 0; i < noOfWrites; ++i) {
        //pthread_mutex_lock (&mutexsum);
        // For Random access
        fseek ( theFile , randomArray[i] * chunkSize , SEEK_SET );
        fputs ( data , theFile );
        //Or for sequential access (in this case above 2 lines would not be here)
        fprintf(theFile, "%s", data);
        //sequential access end
        fflush (theFile);
        //pthread_mutex_unlock(&mutexsum);
    }
    .....
}

You are opening a file using "append mode". According to C11:
Opening a file with append mode ('a' as the first character in the
mode argument) causes all subsequent writes to the file to be forced
to the then current end-of-file, regardless of intervening calls to
the fseek function.
The C standard does not specify how exactly this should be implemented, but on POSIX systems it is usually implemented using the O_APPEND flag of the open function, while data is flushed using the write function. Note that the fseek call in your code should have no effect on where the data is written.
I think POSIX requires this, as it describes how redirecting output in append mode (>>) is done by the shell:
Appended output redirection shall cause the file whose name results
from the expansion of word to be opened for output on the designated
file descriptor. The file is opened as if the open() function as
defined in the System Interfaces volume of POSIX.1-2008 was called
with the O_APPEND flag. If the file does not exist, it shall be
created.
And since most programs use FILE interface to send data to stdout, this probably requires fopen to use open with O_APPEND and write (and not functions like pwrite) when writing data.
So if, on your system, fopen with 'a' mode uses O_APPEND, flushing is done using write, and your kernel and filesystem correctly implement the O_APPEND flag, then using a mutex should have no effect, as the writes cannot interleave:
If the O_APPEND flag of the file status flags is set, the file
offset shall be set to the end of the file prior to each write and no
intervening file modification operation shall occur between changing
the file offset and the write operation.
Note that not all filesystems support this behavior. Check this answer.
As for my answer to your previous question, my suggestion was to remove mutex as it should have no effect on the size of a file (and it didn't have any effect on my machine).
Personally, I never really used O_APPEND and would be hesitant to do so, as its behavior might not be supported at some level, plus its behavior is weird on Linux (see "bugs" section of pwrite).
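The append-mode rule quoted above can be checked with a small single-threaded sketch. The helper name append_ignores_seek and the file path passed to it are hypothetical, and the sketch assumes a hosted C environment where the file can be created; it is only meant to illustrate that fseek does not redirect writes on a stream opened with "a".

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "AAAA" to a fresh file, reopens it in append
 * mode, seeks to offset 0 and writes "BB". The resulting file contents are
 * copied into out (NUL-terminated). Returns 0 on success, -1 on error. */
int append_ignores_seek(const char *path, char *out, size_t out_size)
{
    assert(path != NULL && out != NULL && out_size > 0);

    FILE *f = fopen(path, "w");      /* start from an empty file */
    if (!f) return -1;
    fputs("AAAA", f);
    if (fclose(f) != 0) return -1;

    f = fopen(path, "a");            /* append mode, as in the question */
    if (!f) return -1;
    fseek(f, 0L, SEEK_SET);          /* ask for a write at the start ... */
    fputs("BB", f);                  /* ... but the data still goes to the end */
    if (fclose(f) != 0) return -1;

    f = fopen(path, "r");
    if (!f) return -1;
    size_t n = fread(out, 1, out_size - 1, f);
    out[n] = '\0';
    fclose(f);
    return 0;
}
```

Calling it with a scratch path should leave "AAAABB" in the buffer rather than "BBAA", which is why the random-access variant in the question cannot produce random writes while the file is opened with "a+".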

You definitely need a mutex because you are issuing several different file commands. The underlying file subsystem can't possibly know how many file commands you are going to call to complete your whole operation.
So you need the mutex.
In your situation you may find you get better performance putting the mutex outside the loop. The reason being that, otherwise, switching between threads may cause excessive skipping between different parts of the disk. Hard disks take about 10ms to move the read/write head so that could potentially slow things down a lot.
So it might be a good idea to benchmark that.
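A minimal sketch of the "seek plus write as one locked transaction" idea, assuming POSIX threads. Unlike the code in the question, it shares a single FILE* between the threads and opens it with "w+" so that fseek actually takes effect. All names (run_locked_writers, locked_writer, the constants) are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { NUM_THREADS = 4, CHUNKS_PER_THREAD = 8, CHUNK_SIZE = 16 };

static pthread_mutex_t file_mutex = PTHREAD_MUTEX_INITIALIZER;
static FILE *shared_file;   /* one stream shared by all threads */

/* Thread id fills the chunks id, id + NUM_THREADS, id + 2 * NUM_THREADS, ... */
static void *locked_writer(void *arg)
{
    long id = (long)(intptr_t)arg;
    char chunk[CHUNK_SIZE];
    memset(chunk, 'a' + (int)id, sizeof chunk);

    for (long k = 0; k < CHUNKS_PER_THREAD; ++k) {
        long index = k * NUM_THREADS + id;
        pthread_mutex_lock(&file_mutex);
        /* The seek and the write form one transaction under the lock. */
        fseek(shared_file, index * CHUNK_SIZE, SEEK_SET);
        fwrite(chunk, 1, sizeof chunk, shared_file);
        pthread_mutex_unlock(&file_mutex);
    }
    return NULL;
}

/* Runs the writers and returns the final file size, or -1 on error. */
long run_locked_writers(const char *path)
{
    assert(path != NULL);
    shared_file = fopen(path, "w+");   /* not "a+": fseek must take effect */
    if (!shared_file) return -1;

    pthread_t threads[NUM_THREADS];
    int started = 0;
    for (; started < NUM_THREADS; ++started) {
        if (pthread_create(&threads[started], NULL, locked_writer,
                           (void *)(intptr_t)started) != 0)
            break;
    }
    for (int t = 0; t < started; ++t)
        pthread_join(threads[t], NULL);

    long size = -1;
    if (started == NUM_THREADS && fseek(shared_file, 0L, SEEK_END) == 0)
        size = ftell(shared_file);
    fclose(shared_file);
    return size;
}
```

With these constants every chunk index from 0 to 31 is written exactly once, so the returned size should be 4 * 8 * 16 = 512 bytes. Moving the lock outside the loop, as suggested above, would be a one-line change worth timing separately.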

Related

Thread 1 reads from file as thread 2 writes to same file

Thread 1 (T1) creates the file using
FILE *MyFile = tmpfile();
Thread 2 (T2) then starts writing to the file. While thread 2 is writing, thread 1 occasionally reads from the file.
I set it up such that T2 is temporarily suspended when T1 is reading but, as T1 is only ever reading part of the file T2 won't be writing to (the file is written sequentially), I'm wondering if suspending T2 is necessary. I know this would be OK if FILE was replaced by fixed size array / vector. Just wondering how disc differs from memory.
Edit.
The writes are done using fseek and fwrite. The reads are done using fseek and fread. I assumed that was a given but maybe not from some of the comments. I suppose if T1 fseeks to position X at the same time as T2 fseeks to position Y then who knows where the next read or write will start from. Will take a look at pipes, Thanks for the help.
Mixing reads and writes on a FILE is not even safe when dealing with a single thread. From the manpage of fopen:
Reads and writes may be intermixed on read/write streams in any order. Note that ANSI C
requires that a file positioning function intervene between output and input, unless an input
operation encounters end-of-file. (If this condition is not met, then a read is allowed to
return the result of writes other than the most recent.) Therefore it is good practice (and
indeed sometimes necessary under Linux) to put an fseek(3) or fgetpos(3) operation between
write and read operations on such a stream. This operation may be an apparent no-op (as in
fseek(..., 0L, SEEK_CUR) called for its synchronizing side effect).
So don't assume reads and writes are magically synchronized for you and protect access to the FILE with a mutex.
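The positioning rule from the man page excerpt can be sketched in a single thread as follows. The helper name write_then_read is hypothetical; in a two-thread setup the same fseek plus fwrite and fseek plus fread pairs would additionally need to sit under a mutex, which this sketch does not show.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes len bytes of msg to a temporary file, then
 * reads them back into out through the same stream.
 * Returns 0 on success, -1 on error. */
int write_then_read(const char *msg, size_t len, char *out)
{
    assert(msg != NULL && out != NULL);

    FILE *f = tmpfile();             /* opened in "w+b" mode */
    if (!f) return -1;

    if (fwrite(msg, 1, len, f) != len) { fclose(f); return -1; }

    /* A file positioning call must come between output and input. */
    if (fseek(f, 0L, SEEK_SET) != 0) { fclose(f); return -1; }

    size_t n = fread(out, 1, len, f);
    fclose(f);                       /* the temporary file is removed here */
    return n == len ? 0 : -1;
}
```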

WriteFile overlapped and fwrite equivalent

On Windows, the WriteFile() function has a parameter called lpOverlapped which lets you specify an offset at which to write to the file.
I was wondering, is there a cross-platform fwrite() equivalent of that?
I see that if the file is opened with the rb+ flag, I might be able to use fseek() to write to a particular offset. My question is - will this approach be equivalent to the overlapped WriteFile(), and will it produce the same behaviour on all platforms?
Background
The reason I need this is because I am writing blocked compressed data streams to a file, and I want to be able to load a specific block from the file and be able to decompress it. So, basically if I keep track of where the block begins in a file, I can load the block and decompress it in a more efficient manner. I know that there are probably better ways to do this, but I need this solution for some backwards compatibility.
Assuming you are okay with using POSIX functions and not just things from the C or C++ standard libraries, the solution is pwrite (aka: positioned write).
ssize_t rc = pwrite(file_handle, data_ptr, data_size, destination_offset);
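As a rough sketch of how pwrite and its counterpart pread could serve the block-storage use case from the question: each block is written at an offset the program records, and a single block can later be loaded from that offset without touching the rest. The function name block_roundtrip, the block contents and the offsets are invented for illustration; a real index of block offsets would come from the application.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Stores two blocks at caller-chosen offsets, then loads only the second.
 * Returns the number of bytes read into out, or -1 on error. */
ssize_t block_roundtrip(const char *path, char *out, size_t out_size)
{
    static const char block0[] = "first-block";
    static const char block1[] = "second";
    const off_t offset0 = 0;    /* hypothetical index entries recording */
    const off_t offset1 = 64;   /* where each block starts in the file  */

    assert(path != NULL && out != NULL && out_size >= sizeof block1);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;

    ssize_t n = -1;
    /* Write out of order: each call carries its own offset, no lseek needed. */
    if (pwrite(fd, block1, sizeof block1, offset1) == (ssize_t)sizeof block1 &&
        pwrite(fd, block0, sizeof block0, offset0) == (ssize_t)sizeof block0) {
        /* Load just the second block from its recorded offset. */
        n = pread(fd, out, sizeof block1, offset1);
    }
    close(fd);
    return n;
}
```

Because the offset travels with each call, neither pwrite nor pread depends on or changes the descriptor's current file position.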
I think you are confusing "overlapped" with "overwrite"/"offset." I haven't studied why Microsoft documents the offset parameter as part of overlapped writes (though it makes sense, as I describe below). In general, when Microsoft talks about "overlapped" I/O, they mean how to synchronize events such as starting a write, receiving notification that the write completed, and starting another write to the file that might or might not overlap with a previous one. In this last case, by "overlap" I mean what you would expect, i.e. overlapping regions within the contents of the file, whereas Microsoft means that the write overlaps in time with your running thread. Note that this gets very complicated if more than one thread can write to the same file.
If possible, and surely if you want portable code, you want to avoid all this nonsense and just do the simplest write possible in each context, which means avoid Microsoft optimizations like "overlapped IO" unless you really need performance. (And if you need absolutely optimal performance, you might want to cache the file yourself and manage the overlaps, then write it once from start to finish.)
While pwrite is probably the best solution, there is an alternative that sticks with stdio functions. Unfortunately, to make it thread-safe, you're using non-standard "stdio" to take direct control of the FILE*'s internal lock, and the names aren't portable. Specifically, POSIX defines one set of "take/release file lock" names and Windows defines another set (_lock_file/_unlock_file).
That said, you could use these semi-portable constructs to use stdio functions to ensure no buffering conflicts (pwrite to fileno(some_FILE_star) could cause problems if the FILE* buffer overlaps the pwrite location, since pwrite won't fix up the buffer):
#include <stdio.h>

// Error checking omitted; you should actually check returns in real code
size_t pfwrite(const void *ptr, size_t size, size_t n,
               size_t offset, FILE *stream) {
    // Take FILE*'s lock and hold it for entire transaction
    flockfile(stream); // _lock_file on Windows
    // Record position
    long origpos = ftell(stream);
    // Seek to desired offset and write (fseek takes a long offset)
    fseek(stream, (long)offset, SEEK_SET); // Possibly offset * size, not just offset?
    size_t written = fwrite(ptr, size, n, stream);
    // Seek back to original position
    fseek(stream, origpos, SEEK_SET);
    // Release FILE*'s lock now that transaction complete
    funlockfile(stream); // _unlock_file on Windows
    return written;
}

Atomic writing to file on linux

Is there a way to dump a buffer to file atomically?
By "atomically" I mean: if for example someone terminates my application during writing, I'd like to have file in either before- or after-writing state, but not in a corrupted intermediate state.
If the answer is "no", could it at least be done with really small buffers?
For example, can I dump 2 consecutive int32_t variables with a single 8-byte fwrite (on an x64 platform) and be sure that either both of those int32s are written or neither of them, but never just one?
I recommend writing to a temporary file and then doing a rename(2) on it.
ofstream o("file.tmp"); //Write to a temporary file
o << "my data";
o.close();
//Perform an atomic move operation... needed so readers can't open a partially written file
rename("file.tmp", "file.real");
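The same write-then-rename idea can be sketched with POSIX calls in C. This variant also calls fsync before the rename so the data has been handed to the disk before the new name becomes visible; that step is an addition not present in the snippet above. The function name atomic_replace is hypothetical, both paths are assumed to be on the same filesystem, and retrying on EINTR is left out for brevity.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Writes len bytes to tmp_path, flushes them to disk, then renames the
 * temporary file over final_path. Returns 0 on success, -1 on error. */
int atomic_replace(const char *tmp_path, const char *final_path,
                   const void *data, size_t len)
{
    assert(tmp_path != NULL && final_path != NULL && data != NULL);

    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {               /* write() may be partial, so loop */
        ssize_t n = write(fd, p, left);
        if (n < 0) { close(fd); unlink(tmp_path); return -1; }
        p += n;
        left -= (size_t)n;
    }

    /* Push the data to stable storage before exposing the new name. */
    if (fsync(fd) != 0) { close(fd); unlink(tmp_path); return -1; }
    if (close(fd) != 0) { unlink(tmp_path); return -1; }

    /* Readers see either the old file or the complete new one. */
    if (rename(tmp_path, final_path) != 0) { unlink(tmp_path); return -1; }
    return 0;
}
```

If the process is killed before the rename, final_path still holds the previous contents and only the temporary file is left behind.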

Parallel IO & Append

When I run my small-scale parallel codes, I typically output N files (N being number of processors) in the form fileout.dat.xxx where xxx is the processor number (using I3.3) and then just cat them into a single fileout.dat file after the code is finished.
My question is can I use ACCESS='append' or POSITION='append' in the OPEN statement and have all processors write to the same file?
In practice, no. POSITION='append' merely says that the file pointer will be at the end of file after the open statement is executed. It is, however, possible to change the file position, e.g. with the BACKSPACE, REWIND or such statements. Thus, Fortran POSITION='append' does not correspond to the POSIX O_APPEND, and hence a POSIX OS cannot ensure that all writes only append to the file and do not overwrite older data.
Furthermore, in case you run your code on a cluster, be aware that O_APPEND does not work on many networked file systems such as NFS.
In order to do parallel I/O with several processes/threads writing to a single file, use ACCESS='direct' or ACCESS='stream' and have the processes agree on which records/byte ranges to write to.

is pwrite after dup race safe?

On Linux, the pwrite operation (effectively seek+write) is atomic, meaning that calling pwrite from multiple threads on one file descriptor is safe.
I want to duplicate the file descriptor using dup(). Now, having fd1 and fd2, will pwrite calls work as expected, or is there a danger of a race condition?
File descriptor pairs created through dup share the same file status (e.g. an lseek operation on one file descriptor will affect the other), because they refer to the same open file description, which means they are essentially indistinguishable. The only thing they do not have in common is the file descriptor flags (e.g. FD_CLOEXEC).
From the man page:
After a successful return from dup()
or dup2(), the old and new file
descriptors may be used
interchangeably. They refer to the
same open file description (see
open(2)) and thus share file offset
and file status flags; for example, if
the file offset is modified by using
lseek(2) on one of the descriptors,
the offset is also changed for the
other.
Given that dup allows you to use the two file descriptors interchangeably (because they refer to the same open file description), I assume this implies that calling pwrite on one is the same as calling it on the other, and is thus just as atomic.
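The sharing described in the man page excerpt can be observed with a short sketch. The function name dup_shares_offset and the scratch path are hypothetical; the checks only cover file offsets and sizes and say nothing about concurrent behaviour.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 0 if dup'ed descriptors behave as the man page describes,
 * -1 on error or on an unexpected result. */
int dup_shares_offset(const char *path)
{
    assert(path != NULL);

    int fd1 = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd1 < 0) return -1;
    int fd2 = dup(fd1);
    if (fd2 < 0) { close(fd1); return -1; }

    int ok = 1;
    /* An lseek through fd1 is visible through fd2: the offset is shared. */
    ok &= lseek(fd1, 100, SEEK_SET) == 100;
    ok &= lseek(fd2, 0, SEEK_CUR) == 100;

    /* pwrite through either descriptor takes an explicit offset ... */
    ok &= pwrite(fd1, "abcd", 4, 0) == 4;
    ok &= pwrite(fd2, "efgh", 4, 4) == 4;
    /* ... and leaves the shared file offset where it was. */
    ok &= lseek(fd1, 0, SEEK_CUR) == 100;

    /* Both writes landed in the same file: 8 bytes in total. */
    ok &= lseek(fd2, 0, SEEK_END) == 8;

    close(fd2);
    close(fd1);
    return ok ? 0 : -1;
}
```

Since pwrite neither reads nor updates the shared offset, the one piece of state that dup'ed descriptors have in common is not involved in the call at all.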
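I don't think the PIPE_BUF limit from the POSIX programmer's manual applies here: it guarantees atomicity for write() calls of at most PIPE_BUF bytes to a pipe or FIFO, and pwrite cannot be used on a pipe at all (it fails with ESPIPE), so that limit says nothing about pwrite on a regular file.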