Thread 1 reads from a file as thread 2 writes to the same file - C++

Thread 1 (T1) creates the file using
FILE *MyFile = tmpfile();
Thread 2 (T2) then starts writing to the file. While thread 2 is writing, thread 1 occasionally reads from the file.
I set it up so that T2 is temporarily suspended while T1 is reading. But since T1 only ever reads parts of the file that T2 won't be writing to (the file is written sequentially), I'm wondering whether suspending T2 is necessary at all. I know this would be fine if the FILE were replaced by a fixed-size array or vector; I'm just wondering how disk differs from memory.
Edit:
The writes are done using fseek and fwrite; the reads are done using fseek and fread. I assumed that was a given, but maybe not, judging from some of the comments. I suppose if T1 fseeks to position X at the same time as T2 fseeks to position Y, then who knows where the next read or write will start from. I will take a look at pipes. Thanks for the help.

Mixing reads and writes on a FILE is not even safe when dealing with a single thread. From the manpage of fopen:
Reads and writes may be intermixed on read/write streams in any order. Note that ANSI C
requires that a file positioning function intervene between output and input, unless an input
operation encounters end-of-file. (If this condition is not met, then a read is allowed to
return the result of writes other than the most recent.) Therefore it is good practice (and
indeed sometimes necessary under Linux) to put an fseek(3) or fgetpos(3) operation between
write and read operations on such a stream. This operation may be an apparent no-op (as in
fseek(..., 0L, SEEK_CUR) called for its synchronizing side effect).
So don't assume reads and writes are magically synchronized for you; protect access to the FILE with a mutex.
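A minimal sketch of that advice (the helper names are invented; the tmpfile() setup is taken from the question): every access takes the same mutex, and the fseek() inside each helper doubles as the positioning call the manpage requires between writes and reads.

```cpp
#include <cstdio>
#include <cstring>
#include <mutex>

std::mutex file_mutex;  // guards all access to the shared FILE*

void locked_write(FILE *f, long pos, const char *data, size_t len) {
    std::lock_guard<std::mutex> lock(file_mutex);
    fseek(f, pos, SEEK_SET);      // position; also the required sync between I/O directions
    fwrite(data, 1, len, f);
    fflush(f);
}

size_t locked_read(FILE *f, long pos, char *out, size_t len) {
    std::lock_guard<std::mutex> lock(file_mutex);
    fseek(f, pos, SEEK_SET);      // mandatory positioning call between write and read
    return fread(out, 1, len, f);
}

// Single-threaded smoke test of the helpers; returns 0 on success.
int demo() {
    FILE *f = tmpfile();
    if (!f) return 1;
    locked_write(f, 0, "hello", 5);
    char buf[6] = {0};
    size_t n = locked_read(f, 0, buf, 5);
    fclose(f);
    return (n == 5 && std::strcmp(buf, "hello") == 0) ? 0 : 1;
}
```

With this in place, T2 no longer needs to be explicitly suspended; it simply blocks on the mutex while T1 reads.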

Related

Does use of Linux pread avoid "unavailability of data for reading written by a different thread"?

Please assume the scenario below (OS = Red Hat Linux).
Option A:
Writer Thread: Writes to a file using FD=1. Sets the last written position and size in a std::atomic<int64_t> variable.
Edit for more clarity: the write is done using the write C function call.
https://www.man7.org/linux/man-pages/man2/write.2.html
Reader Thread: Reads the above file using a different FD=2 at the value saved in the above std::atomic<int64_t> variable.
Then I presume it is possible that the reader thread is NOT able to read all the data written by the writer thread (i.e. a read call on FD=2 could return a smaller number of bytes), since there could be buffering at the FD level.
======================================================================================
Option B:
Writer Thread: Writes to a file using FD=1. Sets the last written position and size in a std::atomic<int64_t> variable.
Edit for more clarity: only appends are done (no overwrites take place). The write is done using the write C function call.
https://www.man7.org/linux/man-pages/man2/write.2.html
Reader Thread: Reads (using pread) the above file using the same FD=1 at the value saved in the above std::atomic<int64_t> variable.
https://man7.org/linux/man-pages/man2/pwrite.2.html
Now, is it guaranteed that ALL the data written by the writer thread is read by the reader thread?
Buffering happens at the libc level, which keeps the data around before handing it over to the kernel. pread is a syscall; it will only give you the data that has already been handed to the kernel.
So no. pread saves you the extra calls for seek+read; it does not solve any buffering issues.
How can you ensure that the kernel gets to see your data? You haven't shown your writer code, but usually calling fflush should do it.
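A small sketch of the buffering issue (assuming a POSIX system; tmpfile() stands in for the real file): data written through a buffered FILE* is invisible to pread() on the underlying descriptor until fflush() hands it to the kernel.

```cpp
#include <cstdio>
#include <unistd.h>

// Returns 0 when the expected behavior is observed:
// pread() sees nothing before fflush(), everything after.
int buffering_demo() {
    FILE *f = tmpfile();
    if (!f) return -1;
    setvbuf(f, nullptr, _IOFBF, BUFSIZ);   // force full buffering for a deterministic demo
    int fd = fileno(f);

    fwrite("payload", 1, 7, f);            // still sitting in the stdio buffer
    char buf[8] = {0};
    ssize_t before = pread(fd, buf, 7, 0); // the kernel has not seen the data yet

    fflush(f);                             // hand the buffer over to the kernel
    ssize_t after = pread(fd, buf, 7, 0);  // now the data is visible

    fclose(f);
    return (before == 0 && after == 7) ? 0 : 1;
}
```

If the writer uses the raw write() syscall instead, as in the question, there is no libc buffer in the way and this particular problem does not arise.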

Do we need a mutex to perform multithreaded file I/O?

I'm trying to do random writes (a benchmark test) to a file using multiple threads (pthreads). It looks like if I comment out the mutex lock, the created file is smaller than the expected size, as if some writes were getting lost (always some multiple of the chunk size). But if I keep the mutex, the file is always the exact size.
Does my code have a problem somewhere else, so that the mutex is not really required (as suggested by @evan), or is the mutex necessary here?
void *DiskWorker(void *threadarg) {
    FILE *theFile = fopen(fileToWrite, "a+");
    ....
    for (long i = 0; i < noOfWrites; ++i) {
        //pthread_mutex_lock(&mutexsum);
        // For random access
        fseek(theFile, randomArray[i] * chunkSize, SEEK_SET);
        fputs(data, theFile);
        // Or for sequential access (in this case the two lines above would not be here)
        fprintf(theFile, "%s", data);
        // sequential access end
        fflush(theFile);
        //pthread_mutex_unlock(&mutexsum);
    }
    .....
}
You are opening a file using "append mode". According to C11:
Opening a file with append mode ('a' as the first character in the
mode argument) causes all subsequent writes to the file to be forced
to the then current end-of-file, regardless of intervening calls to
the fseek function.
The C standard does not specify how exactly this should be implemented, but on POSIX systems it is usually implemented using the O_APPEND flag of the open function, while flushing the data is done using the write function. Note that the fseek call in your code should have no effect.
I think POSIX requires this, as it describes how appended output redirection (>>) is done by the shell:
Appended output redirection shall cause the file whose name results
from the expansion of word to be opened for output on the designated
file descriptor. The file is opened as if the open() function as
defined in the System Interfaces volume of POSIX.1-2008 was called
with the O_APPEND flag. If the file does not exist, it shall be
created.
And since most programs use the FILE interface to send data to stdout, this probably requires fopen to use open with O_APPEND and write (and not functions like pwrite) when writing data.
So if, on your system, fopen with the 'a' mode uses O_APPEND, flushing is done using write, and your kernel and filesystem correctly implement the O_APPEND flag, then using a mutex should have no effect, as the writes do not interleave:
If the O_APPEND flag of the file status flags is set, the file
offset shall be set to the end of the file prior to each write and no
intervening file modification operation shall occur between changing
the file offset and the write operation.
Note that not all filesystems support this behavior; check this answer.
As for my answer to your previous question, my suggestion was to remove the mutex, as it should have no effect on the size of the file (and it didn't have any effect on my machine).
Personally, I have never really used O_APPEND and would be hesitant to do so, as its behavior might not be supported at some level; besides, its behavior is weird on Linux (see the "bugs" section of pwrite).
You definitely need a mutex because you are issuing several different file commands. The underlying file subsystem can't possibly know how many file commands you are going to call to complete your whole operation.
So you need the mutex.
In your situation you may find you get better performance putting the mutex outside the loop. Otherwise, switching between threads may cause excessive seeking between different parts of the disk. Hard disks take about 10 ms to move the read/write head, so that could potentially slow things down a lot.
So it might be a good idea to benchmark that.
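To illustrate the O_APPEND argument above, here is a hedged sketch (the mkstemp path, record size and write counts are arbitrary choices for the demo): two threads append fixed-size records through the same descriptor with no mutex, and on a filesystem that implements O_APPEND correctly the final size shows that no write was lost.

```cpp
#include <cstdlib>
#include <fcntl.h>
#include <sys/stat.h>
#include <thread>
#include <unistd.h>

// Two threads append fixed-size records via O_APPEND with no mutex;
// returns 0 if the final file size shows no write was lost.
int append_demo() {
    char path[] = "/tmp/append_demo_XXXXXX";
    int tmp = mkstemp(path);
    if (tmp < 0) return -1;
    close(tmp);
    int fd = open(path, O_WRONLY | O_APPEND);   // every write seeks to EOF atomically
    if (fd < 0) { unlink(path); return -1; }

    const int kWrites = 1000;
    const char record[16] = "0123456789abcde";  // 16-byte fixed-size record
    auto writer = [&] {
        for (int i = 0; i < kWrites; ++i)
            write(fd, record, sizeof(record));
    };
    std::thread t1(writer), t2(writer);
    t1.join();
    t2.join();

    struct stat st;
    fstat(fd, &st);
    close(fd);
    unlink(path);
    return (long long)st.st_size == 2LL * kWrites * (long long)sizeof(record) ? 0 : 1;
}
```

Note that this demonstrates only that appends are not lost; the records from the two threads are still interleaved in an unspecified order.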

Is reading from an anonymous pipe atomic, in the sense of atomic content?

I am writing a process on Linux with two threads. They communicate using an anonymous pipe, created with the pipe() call.
One end is copying a C structure into the pipe:
struct EventStruct e;
[...]
ssize_t n = write(pipefd[1], &e, sizeof(e));
The other end reads it from the pipe:
struct EventStruct e;
ssize_t n = read(pipefd[0], &e, sizeof(e));
if(n != -1 && n != 0 && n < sizeof(e))
{
// Is a partial read possible here??
}
Can partial reads occur with the anonymous pipe?
The man page (man 7 pipe) stipulates that any write under PIPE_BUF bytes is atomic. But what they mean is atomic with respect to other writer threads... I am not concerned with multiple-writer issues; I have only one writer thread and only one reader thread.
As a side note, my structure is 56 bytes long, well below the PIPE_BUF size, which is at least 4096 bytes on Linux. It looks like it's even higher on most recent kernels.
Put otherwise: on the reading end, do I have to deal with partial reads and accumulate them until I receive a full structure instance?
As long as you are dealing with fixed size units, there isn't a problem. If you write a unit of N bytes on the pipe and the reader requests a unit of N bytes from the pipe, then there will be no issue. If you can't read all the data in one fell swoop (you don't know the size until after you've read its length, for example), then life gets trickier. However, as shown, you should be fine.
That said, you should still detect short reads. There's a catastrophe pending if you get a short read but assume it is full length. However, you should not expect to actually see short reads, so code coverage will be a problem. I'd simply test n < (ssize_t)sizeof(e) and treat anything detected as an error or EOF. Note the cast; without it, the signed value will be converted to unsigned and -1 won't be spotted properly.
For specification, you'll need to read the POSIX specifications for:
read()
write()
pipe()
and possibly trace links from those pages. For example, for write(), the specification says:
Write requests to a pipe or FIFO shall be handled in the same way as a regular file with the following exceptions:
There is no file offset associated with a pipe, hence each write request shall append to the end of the pipe.
Write requests of {PIPE_BUF} bytes or less shall not be interleaved with data from other processes doing writes on the same pipe. Writes of greater than {PIPE_BUF} bytes may have data interleaved, on arbitrary boundaries, with writes by other processes, whether or not the O_NONBLOCK flag of the file status flags is set.
Or from the specification of read():
Upon successful completion, where nbyte is greater than 0, read() shall mark for update the last data access timestamp of the file, and shall return the number of bytes read. This number shall never be greater than nbyte. The value returned may be less than nbyte if the number of bytes left in the file is less than nbyte, if the read() request was interrupted by a signal, or if the file is a pipe or FIFO or special file and has fewer than nbyte bytes immediately available for reading. For example, a read() from a file associated with a terminal may return one typed line of data.
So, the write() will write atomic units; the read() will only read atomic units because that's what was written. There won't be a problem, which is what I said at the start.
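A compact sketch of that fixed-size-unit protocol (the struct layout is invented to match the 56 bytes mentioned in the question): one write of sizeof(e), one read of sizeof(e), and the short-read check with the (ssize_t) cast from the answer.

```cpp
#include <cstring>
#include <unistd.h>

struct EventStruct {            // 56 bytes, well under PIPE_BUF
    int id;
    char name[52];
};

// Returns 0 when one full structure goes through the pipe intact.
int pipe_demo() {
    int pipefd[2];
    if (pipe(pipefd) != 0) return -1;

    EventStruct out = {};
    out.id = 42;
    std::strcpy(out.name, "boot");
    if (write(pipefd[1], &out, sizeof(out)) != (ssize_t)sizeof(out)) return -1;

    EventStruct in;
    ssize_t n = read(pipefd[0], &in, sizeof(in));
    close(pipefd[0]);
    close(pipefd[1]);
    if (n < (ssize_t)sizeof(in))          // one test catches -1, 0 and short reads
        return 1;
    return (in.id == 42 && std::strcmp(in.name, "boot") == 0) ? 0 : 1;
}
```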

Atomic writing to file on linux

Is there a way to dump a buffer to a file atomically?
By "atomically" I mean: if, for example, someone terminates my application during the write, I'd like to have the file in either the before-writing or the after-writing state, but not in a corrupted intermediate state.
If the answer is "no", then could it perhaps be done with really small buffers?
For example, can I dump 2 consecutive int32_t variables with a single 8-byte fwrite (on an x64 platform) and be sure that either both of those int32s are dumped or neither of them, but never just one?
I recommend writing to a temporary file and then doing a rename(2) on it.
#include <cstdio>   // for rename()
#include <fstream>

std::ofstream o("file.tmp"); // Write to a temporary file
o << "my data";
o.close();
// Perform an atomic move operation... needed so readers can't open a partially written file
std::rename("file.tmp", "file.real");

Is ofstream thread safe?

I am working on a program which uses multiple std::ifstreams for reading a binary file, one std::ifstream per thread. Now I need to know whether std::ofstream is thread-safe on Windows and Linux for writing to the same file. I am using only one std::ofstream, shared by multiple threads.
I am reading a different block in each thread and writing those blocks to the output file using seekp() and write(). Currently it is working for me, but will it become problematic for big files?
Is std::ofstream thread safe?
If I haven't misunderstood you: no, nothing in the standard library is thread-safe (except the std::thread-specific things, of course, from C++11 on). You need additional synchronization.
Even more: if there are several processes reading from or writing to these files, you need to lock the files to synchronize the access.
From the C++ standard (Input/Output Library Thread Safety):
27.1.3 Thread safety [iostreams.thread-safety]
Concurrent access to a stream object [string.streams, file.streams], stream buffer object [stream.buffers], or C Library stream [c.files] by multiple threads may result in a data race [intro.multithread] unless otherwise specified [iostream.objects]. [Note: Data races result in undefined behavior [intro.multithread]. -- end note]
I have written a little program to verify the thread-safety of std::ifstream and std::ofstream: https://github.com/rmspacefish/rpi-file-concurrency . I have tested it on a Linux desktop host and on a Raspberry Pi. The program starts two threads, a writer thread and a reader thread, and it has a textual mode and a binary mode.
In the textual mode, the writer thread writes two lines and the reader thread attempts to read two lines. For the textual mode, I get the following output:
Concurrent file access test
Write Open Fail Count: 0
Write Count: 191090
Write Fail Count: 0
Read Open Fail Count: 0
Read Count: 93253
Read One Line Count: 93253
Read Both Lines Count: 93253
Faulty Read Count: 0
EOF Count: 0
Fail Count: 0
Finished.
So this appears to be thread-safe on Linux. For the binary mode, I write a binary block in the form of a struct consisting of multiple fields: char arrays, integers of various sizes, and so on. There are two states, which are written in alternating cycles. In the reader thread, I check the consistency of the data (inconsistent states or, worse, wrong values). Here I get the following results:
Concurrent file access test
Write Open Fail Count: 0
Write Count: 0
Write Fail Count: 0
Read Open Fail Count: 0
Read Count: 0
Blob in state one read: 25491
Blob in state two read: 24702
Blob in invalid state: 0
Faulty Read Count: 0
EOF Count: 91295
Fail Count: 91295
Finished.
I checked the error flags after calling read (and this is important). If there are no error flags, the state is read in a consistent manner. It looks thread-safe to me.
The thread-safety might still be implementation-dependent, but at least for Linux/GCC, file access seems to be thread-safe. I will still test this with MSVC on Windows, but Microsoft specifies that it should be thread-safe there as well.
Yes, it is.
For Windows:
It is safe to write to an fstream from multiple threads on Windows. Please see the MSDN document Thread Safety in the C++ Standard Library.
For Linux:
In short, it is. From the libstdc++ documentation: "if your platform's C library is threadsafe, then your fstream I/O operations will be threadsafe at the lowest level". Is your platform's C library thread-safe? Yes: POSIX requires that C stdio FILE* operations (such as fread/fwrite) lock the stream around each call, and glibc does so.
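Even so, the seekp()+write() pair from the question is two separate calls, so it needs a lock to stay atomic as a unit regardless of which answer applies to your platform. A hedged sketch (file path and block layout are invented for illustration):

```cpp
#include <cstring>
#include <fstream>
#include <mutex>
#include <thread>
#include <vector>

std::mutex stream_mutex;   // makes seekp()+write() one atomic unit

void write_block(std::ofstream &out, long pos, const char *data, std::streamsize len) {
    std::lock_guard<std::mutex> lock(stream_mutex);
    out.seekp(pos);
    out.write(data, len);
}

// Four threads each write a 64-byte block to their own offset;
// returns 0 if a spot check of the result reads back as expected.
int ofstream_demo() {
    const char *path = "/tmp/ofstream_demo.bin";   // placeholder path
    {
        std::ofstream out(path, std::ios::binary);
        if (!out) return -1;
        std::vector<std::thread> threads;
        for (int t = 0; t < 4; ++t)
            threads.emplace_back([&out, t] {
                char block[64];
                std::memset(block, 'A' + t, sizeof(block));
                write_block(out, t * 64L, block, sizeof(block));
            });
        for (auto &th : threads)
            th.join();
    }
    std::ifstream in(path, std::ios::binary);
    in.seekg(128);                // first byte of thread 2's block
    char c = 0;
    in.get(c);
    return c == 'C' ? 0 : 1;
}
```

Each thread owns a disjoint 64-byte region, but without the mutex another thread could move the put pointer between the seekp() and the write().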