Is ofstream thread safe? - c++

I am working on a program that uses multiple std::ifstreams to read a binary file, one std::ifstream per thread. Now I need to know whether std::ofstream is thread-safe on Windows and Linux for writing to the same file. I am using only one std::ofstream, shared by multiple threads.
Each thread reads a different block and writes that block to the output file using seekp() and write(). Currently this works for me, but I wonder whether it becomes problematic for big files.
Is std::ofstream thread safe?

If I haven't misunderstood you - no, nothing in the standard library is thread-safe (except the std::thread-specific things, of course, from C++11 onward). You need additional synchronization.
Even more - if there are several processes, reading from/writing to these files, you need to lock the files, to sync the access.

From C++ standards (Input/Output Library Thread Safety):
27.1.3 Thread safety [iostreams.thread-safety]
Concurrent access to a stream object [string.streams, file.streams], stream buffer object [stream.buffers], or C Library stream [c.files] by multiple threads may result in a data race [intro.multithread] unless otherwise specified [iostream.objects]. [Note: Data races result in undefined behavior [intro.multithread]. — end note]

I have written a little program to verify the thread-safety of std::ifstream and std::ofstream: https://github.com/rmspacefish/rpi-file-concurrency . I tested it on a Linux desktop host and on a Raspberry Pi. The program starts two threads, one writer thread and one reader thread, and there is a textual mode and a binary mode.
In the textual mode, I am writing two lines in the writer thread, and the reader thread attempts to read two lines. For the textual mode, I get the following output:
Concurrent file access test
Write Open Fail Count: 0
Write Count: 191090
Write Fail Count: 0
Read Open Fail Count: 0
Read Count: 93253
Read One Line Count: 93253
Read Both Lines Count: 93253
Faulty Read Count: 0
EOF Count: 0
Fail Count: 0
Finished.
So this appears to be thread-safe for Linux. For the binary mode, I am writing a binary block in the form of a struct consisting of multiple fields like char arrays, integers of various sizes, etc. I have two states which are written in alternating cycles. In the reader thread, I check the consistency of the data (inconsistent states or, worse, wrong values). Here I get the following results:
Concurrent file access test
Write Open Fail Count: 0
Write Count: 0
Write Fail Count: 0
Read Open Fail Count: 0
Read Count: 0
Blob in state one read: 25491
Blob in state two read: 24702
Blob in invalid state: 0
Faulty Read Count: 0
EOF Count: 91295
Fail Count: 91295
Finished.
I checked the error flags after calling read (and this is important): if there are no error flags, the state was read in a consistent manner. It looks thread-safe to me.
The thread-safety might still be implementation-dependent, but at least for Linux/GCC, file access appears to be thread-safe. I still need to test this with MSVC on Windows, but Microsoft specifies that it should be thread-safe as well.

Yes. It is.
For Windows:
It is safe to write to an fstream from multiple threads on Windows. Please see the MSDN document: Thread Safety in the C++ Standard Library.
For Linux:
In short, it is. From the libstdc++ documentation: "if your platform's C library is threadsafe, then your fstream I/O operations will be threadsafe at the lowest level". Is your platform's C library thread-safe? Yes: the POSIX standard requires that C stdio FILE* operations (such as fread/fwrite) are atomic, and glibc implements this.

Related

Does use of Linux pread avoid "unavailability of data for reading written by a different thread"?

Please assume the scenario below (OS = Red Hat Linux):
Option A :
Writer Thread : writes to a file using FD=1, then sets the last written position and size in a std::atomic<int64_t> variable.
Edit for more clarity : writes are done using the write C function call.
https://www.man7.org/linux/man-pages/man2/write.2.html
Reader Thread : reads the above file using a different FD=2, at the value saved in the above std::atomic<int64_t> variable.
Then I presume it is possible that the reader thread is NOT able to read all the data written by the writer thread (i.e. a read call on FD=2 could return fewer bytes), since there could be buffering at the FD level.
======================================================================================
Option B:
Writer Thread : writes to a file using FD=1, then sets the last written position and size in a std::atomic<int64_t> variable.
Edit for more clarity : only appends are done (no overwrite takes place). Writes are done using the write C function call.
https://www.man7.org/linux/man-pages/man2/write.2.html
Reader Thread : reads (using pread) the above file using the same FD=1, at the value saved in the above std::atomic<int64_t> variable.
https://man7.org/linux/man-pages/man2/pwrite.2.html
Now, is it guaranteed that ALL data written by Writer thread is read by Reader Thread ?
Buffering happens at the libc level, keeping the data around before handing it over to the kernel. pread is a syscall; it will only give you the data that has already been shown to the kernel.
So no. pread saves you the extra calls for seek+read; it does not solve any buffering issues.
How can you ensure that the kernel gets to see your data? You haven't shown your writer code, but usually calling fflush should do it.

Thread 1 reads from file as thread 2 writes to same file

Thread 1 (T1) creates the file using
FILE *MyFile = tmpfile();
Thread 2 (T2) then starts writing to the file. While thread 2 is writing, thread 1 occasionally reads from the file.
I set it up such that T2 is temporarily suspended while T1 is reading but, since T1 only ever reads parts of the file that T2 won't be writing to (the file is written sequentially), I'm wondering if suspending T2 is necessary. I know this would be OK if the FILE were replaced by a fixed-size array / vector. Just wondering how disk differs from memory.
Edit.
The writes are done using fseek and fwrite. The reads are done using fseek and fread. I assumed that was a given, but maybe not, judging from some of the comments. I suppose if T1 fseeks to position X at the same time as T2 fseeks to position Y, then who knows where the next read or write will start from. Will take a look at pipes. Thanks for the help.
Mixing reads and writes on a FILE is not even safe when dealing with a single thread. From the manpage of fopen:
Reads and writes may be intermixed on read/write streams in any order. Note that ANSI C
requires that a file positioning function intervene between output and input, unless an input
operation encounters end-of-file. (If this condition is not met, then a read is allowed to
return the result of writes other than the most recent.) Therefore it is good practice (and
indeed sometimes necessary under Linux) to put an fseek(3) or fgetpos(3) operation between
write and read operations on such a stream. This operation may be an apparent no-op (as in
fseek(..., 0L, SEEK_CUR) called for its synchronizing side effect).
So don't assume reads and writes are magically synchronized for you and protect access to the FILE with a mutex.
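A sketch of that advice (a mutex around every access, plus a positioning call at each switch between output and input; the names `file_mutex`, `write_at`, and `read_at` are illustrative):

```cpp
#include <cstdio>
#include <mutex>

std::mutex file_mutex;  // shared by all threads touching the FILE*

// fseek before fwrite satisfies ANSI C's requirement of a file
// positioning function between input and output operations.
void write_at(FILE* f, long pos, const void* data, size_t n) {
    std::lock_guard<std::mutex> lock(file_mutex);
    fseek(f, pos, SEEK_SET);
    fwrite(data, 1, n, f);
    fflush(f);  // push through the stdio buffer
}

size_t read_at(FILE* f, long pos, void* buf, size_t n) {
    std::lock_guard<std::mutex> lock(file_mutex);
    fseek(f, pos, SEEK_SET);  // positioning call before input
    return fread(buf, 1, n, f);
}
```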

Do we need mutex to perform multithreading file IO

I'm trying to do random writes (a benchmark test) to a file using multiple threads (pthreads). It looks like if I comment out the mutex lock, the created file is smaller than expected, as if some writes were getting lost (always some multiple of the chunk size). But if I keep the mutex, the size is always exact.
Does my code have a problem somewhere else, so that the mutex is not really required (as suggested by #evan), or is the mutex necessary here?
void *DiskWorker(void *threadarg) {
    FILE *theFile = fopen(fileToWrite, "a+");
    ....
    for (long i = 0; i < noOfWrites; ++i) {
        //pthread_mutex_lock (&mutexsum);
        // For random access
        fseek(theFile, randomArray[i] * chunkSize, SEEK_SET);
        fputs(data, theFile);
        // Or for sequential access (in this case the above 2 lines would not be here)
        fprintf(theFile, "%s", data);
        // sequential access end
        fflush(theFile);
        //pthread_mutex_unlock(&mutexsum);
    }
    .....
}
You are opening a file using "append mode". According to C11:
Opening a file with append mode ('a' as the first character in the
mode argument) causes all subsequent writes to the file to be forced
to the then current end-of-file, regardless of intervening calls to
the fseek function.
The C standard does not specify how exactly this should be implemented, but on POSIX systems it is usually implemented using the O_APPEND flag of the open function, while flushing data is done using the write function. Note that the fseek call in your code should have no effect.
I think POSIX requires this, as it describes how output redirection in append mode (>>) is done by the shell:
Appended output redirection shall cause the file whose name results
from the expansion of word to be opened for output on the designated
file descriptor. The file is opened as if the open() function as
defined in the System Interfaces volume of POSIX.1-2008 was called
with the O_APPEND flag. If the file does not exist, it shall be
created.
And since most programs use the FILE interface to send data to stdout, this probably requires fopen to use open with O_APPEND, and to use write (and not functions like pwrite) when writing data.
So if, on your system, fopen with 'a' mode uses O_APPEND, flushing is done using write, and your kernel and filesystem correctly implement the O_APPEND flag, then using a mutex should have no effect, as the writes do not interleave:
If the O_APPEND flag of the file status flags is set, the file
offset shall be set to the end of the file prior to each write and no
intervening file modification operation shall occur between changing
the file offset and the write operation.
Note that not all filesystems support this behavior. Check this answer.
As for my answer to your previous question, my suggestion was to remove mutex as it should have no effect on the size of a file (and it didn't have any effect on my machine).
Personally, I have never really used O_APPEND and would be hesitant to do so, as its behavior might not be supported at some level; plus, its behavior is weird on Linux (see the "bugs" section of pwrite).
You definitely need a mutex because you are issuing several different file commands. The underlying file subsystem can't possibly know how many file commands you are going to call to complete your whole operation.
So you need the mutex.
In your situation you may find you get better performance by putting the mutex outside the loop. The reason is that, otherwise, switching between threads may cause excessive seeking between different parts of the disk. Hard disks take about 10 ms to move the read/write head, so that could potentially slow things down a lot.
So it might be a good idea to benchmark that.

Atomic writing to file on linux

Is there a way to dump a buffer to file atomically?
By "atomically" I mean: if for example someone terminates my application during writing, I'd like to have file in either before- or after-writing state, but not in a corrupted intermediate state.
If the answer is "no", then probably it could be done with a really small buffers?
For example, can I dump 2 consecutive int32_t variables with a single 8-byte fwrite (on an x64 platform) and be sure that either both of those int32s are dumped or neither of them, but never just one?
I recommend writing to a temporary file and then doing a rename(2) on it.
#include <cstdio>   // std::rename
#include <fstream>

std::ofstream o("file.tmp"); // Write to a temporary file
o << "my data";
o.close();
// Perform an atomic move operation... needed so readers can't open a partially written file
std::rename("file.tmp", "file.real");

C++ how to flush std::stringbuf?

I need to put the standard output of a process (binary data) to a string buffer and consume it in another thread.
Here is the producer:
while (ReadFile(ffmpeg_OUT_Rd, cbBuffer, sizeof(cbBuffer), &byteRead, NULL)) {
    tByte += byteRead; // total bytes
    sb->sputn(cbBuffer, byteRead);
}
m_bIsFinished = true;
printf("%d bytes are generated.\n", tByte);
Here is the consumer:
while (!PCS_DISPATCHER_INSTANCE->IsFinished())
    Sleep(200);
Sleep(5000);
Mystringbuf* sb = PCS_DISPATCHER_INSTANCE->sb;
printf("Avail: %d\n", sb->in_avail());
It turns out that the consumer cannot get all the bytes produced by the producer (tByte != sb->in_avail()).
Is this a kind of internal buffering problem? If yes, how do I force the stringbuf to flush its internal buffer?
A streambuf has nothing like flush: writes go directly into the buffer. There is a pubsync() member that could help if you were using a derived object such as a filebuf, but this does not apply to your case.
Your issue almost certainly comes from a data race on sputn() or in_avail(). Either protect access to the streambuf via a mutex, or use an atomic. If m_bIsFinished is not atomic, then depending on your implementation of IsFinished(), the synchronisation between the threads might not be guaranteed (for example: the producer could write to memory, but the consumer still obtains an outdated value from the CPU memory cache), which could lead to such a data race.
Edit:
If you experienced the issue within a single thread, thus eliminating any potential race condition, it may come from the implementation of streambuf. I observed this with MSVC13 in a single-threaded application:
tracing showed that the number of bytes read was accurate, but the in_avail() result was always smaller than or equal to tByte through the whole loop.
when reading the streambuf, the correct total number of bytes was read (thus more than indicated by in_avail()).
This behaviour is compliant. According to the C++ standard, in_avail() shall return egptr() - gptr() if a read position is available, and otherwise showmanyc(). The latter is defined as returning an estimate of the number of characters available in the sequence. The only guarantee given is that you can read at least in_avail() bytes without encountering eof.
Workaround: use sb->pubseekoff(0, ios::end) - sb->pubseekoff(0, ios::beg) to count the number of bytes available, and make sure you are repositioned at sb->pubseekoff(0, ios::beg) before you read your streambuf.
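A sketch of that workaround, wrapped in a small helper (the name `buffered_size` is hypothetical), which also leaves the get pointer repositioned at the beginning, ready for reading:

```cpp
#include <sstream>

// Measure a stringbuf by seeking the get pointer to the end and
// back, instead of trusting the in_avail() estimate.
std::streamsize buffered_size(std::stringbuf& sb) {
    std::streampos end = sb.pubseekoff(0, std::ios::end, std::ios::in);
    std::streampos beg = sb.pubseekoff(0, std::ios::beg, std::ios::in);
    return end - beg;  // get pointer is now back at the beginning
}
```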