Parallel Read Write to C++ stringstream - c++

I am using std::stringstream for reading/writing binary data.
std::stringstream strm(std::stringstream::binary|std::stringstream::in|std::stringstream::out);
strm.write(...) //happens in one thread
strm.read(...) //happens in another thread
Does the C++ standard guarantee that parallel reads and writes to a stringstream work, or not?

My fstream.h file at /usr/local/pgi/linux86-64/13.10/include/CC/fstream.h contains no mention of mutex locks. Further, in programs I have written, output to a stringstream using the << operator can become interleaved when two threads write at the 'same' time.
Since you're reading from and writing to the same file, I imagine the line order is important?
As such, I think you want a global mutex lock between threads.
Something like:
#include <pthread.h>
#include <sstream>

pthread_mutex_t FileMutex = PTHREAD_MUTEX_INITIALIZER;
std::stringstream strm(std::stringstream::binary|...);

void *function(void *voidPtrToArguments);

int main()
{
    // blah blah: set up voidPtrToArguments and anything else you need
    pthread_t threads;
    pthread_create(&threads, NULL, function, voidPtrToArguments);
    // blah blah: join the threads, etc.
}

void *function(void *voidPtrToArguments)
{
    // blah blah some more
    pthread_mutex_lock(&FileMutex);
    strm.write(...);
    pthread_mutex_unlock(&FileMutex);
    return NULL;
}
and then the same for a function to read.

This is covered in [iostreams.threadsafety] in the Standard. The C++17 text reads:
Concurrent access to a stream object, stream buffer object, or C Library stream by multiple threads may result in a data race unless otherwise specified. [Note: Data races result in undefined behavior. — end note]
There is no "otherwise specified" for std::stringstream so your case would be undefined behaviour.

Related

Reading a variable from reader thread without holding up a writer thread

I've got two threads, a reader thread and a writer thread. The writer thread writes a string and the reader thread reads the string. The writer thread is extremely fast and I do not want to hold it up. The reader thread is much slower (a factor of a million or more) and it is not important if the string it reads is a couple of cycles behind. The only important thing for the reader thread is that, when it reads the string, the string is not in an undefined state.
Is there a way to be thread safe for reading the string without holding up the writing thread?
I've also looked at making the variable atomic, but I read that this might be a performance bottleneck as well for the writing thread.
I'm not sure if it works, but I've come up with an idea:
Assume you have two string buffers, Buffer_0 & Buffer_1, each of which can hold a single string of multiple characters up to a predefined max length.
The writer thread alternates between the two buffers, but it first checks a mutex. The writer doesn't block on the mutex; it just writes to the other buffer if the mutex is not available. This means it stops alternating and writes into the same buffer multiple times while the reader slowly reads the mutex-protected buffer.
Buffer choice for the reader probably doesn't matter much. It can always try to read Buffer_0. It may simply block on the mutex and wait until the writer starts writing Buffer_1. While it reads from Buffer_0, the writer writes to Buffer_1 over and over as it fails to acquire the mutex.
Of course, checking the availability of the mutex and acquiring and releasing it introduces some run-time cost. Using an atomic variable that indicates the buffer index the writer is currently writing into may work faster than a mutex, but I'm not sure if it works.
Update: I realized that in the above scenario, Buffer_1 is mostly useless, as the reader only reads from Buffer_0. If it's not acceptable for the reader to block, it can alternate too and read Buffer_1 instead of waiting. Or the writer can just skip the whole writing operation (writing to Buffer_1) if it's unable to acquire the mutex.
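A rough sketch of the idea, assuming C++11 (names and the use of std::string are only illustrative; the reader here simply blocks on the protected buffer, as described above):
#include <mutex>
#include <string>

std::string buffer_0;          // mutex-protected buffer the reader looks at
std::string buffer_1;          // spare buffer the writer falls back to
std::mutex buffer_0_mutex;

void write_string(const std::string& s)
{
    if (buffer_0_mutex.try_lock())     // never blocks the writer
    {
        buffer_0 = s;
        buffer_0_mutex.unlock();
    }
    else
    {
        buffer_1 = s;   // reader is busy: use the other buffer (or just skip)
    }
}

std::string read_string()
{
    std::lock_guard<std::mutex> lock(buffer_0_mutex);   // the reader may block
    return buffer_0;
}
The writer never blocks: if the try_lock fails it falls back to Buffer_1, or it could skip the write entirely, as in the update above.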
Are you OK with the reader reading a recent value and not necessarily the most recent one? If so, you can use atomics:
#include <thread>
#include <atomic>
#include <string>
#include <iostream>
std::string spots[4];
std::atomic<int> canWrite{0};    // next index the writer is allowed to fill
std::atomic<int> readyIndex{-1}; // last index the writer has finished

void writer()
{
    while (true) // for demonstration, will be your real writer loop
    {
        if (readyIndex != canWrite)
        {
            spots[canWrite] = "foo";      // write here what the writer wants to write
            readyIndex = canWrite.load(); // marks that spot as ready
        }
    }
}

void reader()
{
    while (true) // for demonstration, will be your real reader loop
    {
        if (readyIndex == canWrite)
        {
            std::cout << spots[readyIndex] << std::endl;
            canWrite = (canWrite + 1) % 4; // allow the writer to start writing at the next location
        }
    }
}

int main()
{
    std::thread t1(writer);
    std::thread r1(reader);
    t1.join();
    r1.join();
    return 0;
}
The reader only writes to canWrite, telling the writer where it can write. The writer only writes to readyIndex, telling the reader where it can read.
If the reader has not yet read the latest string, the writer just skips writing and goes on its merry way.

Thread-safe log buffer in C++?

I'm implementing my own logging system for performance purposes (and because I basically just need a buffer). What I currently have is something like this:
#include <sstream>
#include <string>
#include <unordered_map>

// category => messages
static std::unordered_map<std::string, std::ostringstream> log;

void dump_logs();

int main() {
    while (true) {
        log["info"] << "Whatever";
        log["192.168.0.1"] << "This is a dynamic entry";
        dump_logs();
    }
}

void dump_logs() {
    // I do something like this for each category, but they have different save strategies
    if (log["info"].tellp() > 1000000) {
        // save the ostringstream to a file
        // clear the log
        log["info"].str("");
    }
}
It works perfectly. However, I've just added threads and I'm not sure if this code is thread-safe. Any tips?
You can make this thread safe by declaring your map thread_local. If you are going to use it across translation units then make it extern and define it in one translation unit, otherwise static is fine.
You will still need to synchronize writing the logs to disk. A mutex should fix that:
#include <mutex>
#include <sstream>
#include <string>
#include <unordered_map>

// category => messages (one per thread)
thread_local static std::unordered_map<std::string, std::ostringstream> log;

void dump_logs();

int main() {
    while (true) {
        log["info"] << "Whatever";
        log["192.168.0.1"] << "This is a dynamic entry";
        dump_logs();
    }
}

void dump_logs() {
    static std::mutex mtx; // mutex shared between threads

    // I do something like this for each category, but they have different save strategies
    if (log["info"].tellp() > 1000000) {
        // now I need to care about threads
        // the braces create a scope so the lock is released at its end
        {
            std::lock_guard<std::mutex> lock(mtx); // synchronized access
            // save the ostringstream to a file
        }
        // clear the log
        log["info"].str("");
    }
}
On a POSIX system, if you're always writing data to the end of the file, the fastest way for multiple threads to write data to a file is to use low-level C-style open() in append mode, and just call write(), because the POSIX standard for write() states:
On a regular file or other file capable of seeking, the actual writing
of data shall proceed from the position in the file indicated by the
file offset associated with fildes. Before successful return from
write(), the file offset shall be incremented by the number of bytes
actually written. On a regular file, if the position of the last byte
written is greater than or equal to the length of the file, the length
of the file shall be set to this position plus one.
...
If the O_APPEND flag of the file status flags is set, the file offset
shall be set to the end of the file prior to each write and no
intervening file modification operation shall occur between changing
the file offset and the write operation.
So, all write() calls from within a process to a file opened in append mode are atomic.
No mutexes are needed.
Almost. The only issue you have to be concerned with is
If write() is interrupted by a signal after it successfully writes
some data, it shall return the number of bytes written.
If you have enough control over your environment that you can be sure that your calls to write() will not be interrupted by a signal after only a portion of the data is written, this is the fastest way to write data to a file from multiple threads - you're using the OS-provided lock on the file descriptor that ensures adherence to the POSIX-specified behavior, and as long as you generate the data to be written without any locking, that file descriptor lock is the only one in the entire data path. And that lock will be in your data path no matter what you do in your code.
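For illustration, a minimal sketch of that approach (the file name and message are placeholders); each thread writes one complete message per write() call on a descriptor opened with O_APPEND:
#include <fcntl.h>
#include <unistd.h>
#include <cstring>

int open_log()
{
    // "app.log" is just a placeholder name
    return open("app.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
}

void log_line(int fd, const char* msg)
{
    // One write() per complete message; O_APPEND makes each write land at the
    // end of the file without interleaving with other writers.
    ssize_t n = write(fd, msg, std::strlen(msg));
    (void)n; // in real code, check for errors and for signal-shortened writes
}

int main()
{
    int fd = open_log();
    if (fd != -1) {
        log_line(fd, "message from some thread\n");   // callable from any thread
        close(fd);
    }
}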

Implementing File class for both read and write operations on the file

I need to implement a class which holds a regular text file that will be valid for both read and write operations from multiple threads (say, "reader" threads and "writers").
I am working on Visual Studio 2010 and can use only the libraries available with it (VS 2010), so I chose to use the std::fstream class for the file operations and the CreateThread function & CRITICAL_SECTION object from the <windows.h> header.
I might start by saying that, at the beginning, I am looking for a simple solution - just so it works... :)
My idea is as follows:
I created a File class that will hold the file and a "mutex" (CRITICAL_SECTION object) as private members.
In addition, this class (File class) provides a "public interface" to the "reader/writer" threads in order to perform a synchronized access to the file for both read and write operations.
See the header file of File class:
class File {
private:
    std::fstream iofile;
    int size;
    CRITICAL_SECTION critical;

public:
    File(std::string fileName = " ");
    ~File();
    int getSize();

    // the public interface:
    void read();
    void write(std::string str);
};
Also see the source file:
#include "File.h"
File :: File(std::string fileName)
{
// create & open file for read write and append
// and write the first line of the file
iofile.open(fileName, std::fstream::in | std::fstream::out | std::fstream::app); // **1)**
if(!iofile.is_open()) {
std::cout << "fileName: " << fileName << " failed to open! " << std::endl;
}
// initialize class member variables
this->size = 0;
InitializeCriticalSection(&critical);
}
File :: ~File()
{
DeleteCriticalSection(&critical);
iofile.close(); // **2)**
}
void File :: read()
{
// lock "mutex" and move the file pointer to beginning of file
EnterCriticalSection(&critical);
iofile.seekg(0, std::ios::beg);
// read it line by line
while (iofile)
{
std::string str;
getline(iofile, str);
std::cout << str << std::endl;
}
// unlock mutex
LeaveCriticalSection(&critical);
// move the file pointer back to the beginning of file
iofile.seekg(0, std::ios::beg); // **3)**
}
void File :: write(std::string str)
{
// lock "mutex"
EnterCriticalSection(&critical);
// move the file pointer to the end of file
// and write the string str into the end of the file
iofile.seekg(0, std::ios::end); // **4)**
iofile << str;
// unlock mutex
LeaveCriticalSection(&critical);
}
So my questions are (see the numbers regarding the questions within the code):
1) Do I need to specify anything else for the read and write operations I wish to perform?
2) Is there anything else I need to add in the destructor?
3) What do I need to add here so that EVERY read operation necessarily starts from the beginning of the file?
4) What do I need to modify/add here so that each write takes place at the end of the file (meaning I wish to append the str string to the end of the file)?
5) Any further comments would be great: another way to implement it, pros & cons of my implementation, points to watch out for, etc.
Thanks a lot in advance,
Guy.
You must handle exceptions (and errors in general).
No, your destructor even has superfluous things, like closing the underlying fstream, which the object takes care of itself in its own destructor.
If you always want to start reading at the beginning of the file, just open it for reading and you automatically are at the beginning. Otherwise, you could seek to the beginning and start reading from there.
You already opened the file with ios::app, which causes every write operation to append to the end (including that it ignores seek operations that set the write position, IIRC).
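For illustration (not part of the original question), a tiny sketch of the ios::app behaviour described above: the seekp() call has no effect on where the write lands, and both lines end up appended. The file name is just a placeholder:
#include <fstream>
#include <iostream>

int main()
{
    std::ofstream out("append_demo.txt", std::ios::out | std::ios::app);
    out << "first line\n";
    out.seekp(0, std::ios::beg);   // attempt to move the write position
    out << "second line\n";        // still appended at the end of the file
    out.close();

    std::ifstream in("append_demo.txt");
    std::cout << in.rdbuf();       // prints both lines in append order
    // (note: the file keeps growing across runs, since it is never truncated)
}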
There is a bunch that isn't going to work like you want it to...
Most importantly, you need to define what you need the class to behave like, i.e. what the public interface is. This includes guarantees about the content of the file on disk. For example, after creating an object without passing a filename, what should it write to? Should that really be a file whose name is a single space? Further, what if a thread wants to write two buffers that each contain 100 chars? The only chance to not get interrupted is to first create a buffer combining the data, otherwise it could get interrupted by a different thread. It gets even more complicated concerning the guarantees that your class should fulfill while reading.
Why are you not using references when passing strings? Your tutorial should mention them.
You are invoking the code to enter and leave the critical section at the beginning and end of a function scope. This operation should be bound to the ctor and dtor of a class; check out the RAII idiom in C++ (see the sketch after these comments).
When you are using a mutex, you should document what it is supposed to protect. In this case, I guess it's the iofile, right? You are accessing it outside the mutex-protected boundaries though...
What is getSize() supposed to do? What would a negative size indicate? In case you want to signal errors with that, that's what exceptions are for! Also, after opening an existing, possibly non-empty file, the size is zero, which sounds weird to me.
read() doesn't return any data, what is it supposed to do?
Using a while-loop to read something always has to have the form "while try-to-read { use data}", yours has the form "while success { try-to-read; use data; }", i.e. it will use data after failing to read it.
Streams have a state, and that state is sticky. Once the failbit is set, it remains set until you explicitly call clear().
BTW: This looks like logging code or a file-backed message queue. Both can be created in a thread-friendly way, but in order to make suggestions, you would have to tell us what you are actually trying to do. This is also what you should put into a comment section on top of your class, so that any reader can understand the intention (and, more importantly now, so that YOU make up your mind about what it's supposed to be).
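Putting the RAII and read-loop comments together, a rough sketch (assuming the File class from the question and the Windows <windows.h> API; the rewritten read() is only an illustration, not a fix for everything listed above):
#include <windows.h>
#include "File.h"
#include <iostream>
#include <string>

class CriticalSectionLock {
    CRITICAL_SECTION& cs_;
    CriticalSectionLock(const CriticalSectionLock&);            // non-copyable,
    CriticalSectionLock& operator=(const CriticalSectionLock&); // pre-C++11 style
public:
    explicit CriticalSectionLock(CRITICAL_SECTION& cs) : cs_(cs) { EnterCriticalSection(&cs_); }
    ~CriticalSectionLock() { LeaveCriticalSection(&cs_); }
};

// Hypothetical rewrite of File::read() using the guard, a
// "while try-to-read { use data }" loop, and clear() for the sticky state:
void File::read()
{
    CriticalSectionLock lock(critical);   // released automatically, even on early return

    iofile.seekg(0, std::ios::beg);
    std::string line;
    while (std::getline(iofile, line))    // only use data that was actually read
    {
        std::cout << line << std::endl;
    }
    iofile.clear();                       // reset eofbit/failbit so the stream stays usable
}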

What can go wrong if cout.rdbuf() is used to switch buffer and never set it back?

The author presented this code under the title A bus error on my platform
#include <fstream>
#include <iostream>
int main()
{
std::ofstream log("oops.log");
std::cout.rdbuf(log.rdbuf());
std::cout << "Oops!\n";
return 0;
}
The string "Oops!\n" is printed to the file "oops.log". The code doesn't restore cout's streambuf, but VS2010 didn't report a runtime error.
Since log and std::cout share a buffer, that buffer will probably be freed twice (once when log goes out of scope, then once more when the program terminates).
This results in undefined behavior, so it's hard to tell the exact reason why it triggers a bus error on his machine but silently fails on yours.
Since the other answers don't mention what to do about this I'll provide that here. You need to save and restore the buffer that cout is supposed to be managing. For example:
#include <fstream>
#include <iostream>

// RAII method of restoring a buffer
struct buffer_restorer {
    std::ios &m_s;
    std::streambuf *m_buf;

    buffer_restorer(std::ios &s, std::streambuf *buf) : m_s(s), m_buf(buf) {}
    ~buffer_restorer() { m_s.rdbuf(m_buf); }
};

int main()
{
    std::ofstream log("oops.log");

    buffer_restorer r(std::cout, std::cout.rdbuf(log.rdbuf()));

    std::cout << "Oops!\n";
    return 0;
}
Now cout's buffer is restored before cout is destroyed at the end of the program, so when cout destroys its buffer the correct thing happens.
For simply redirecting standard IO, the environment generally already has the ability to do that for you (e.g., IO redirection in the shell). Rather than the above code, I'd probably simply run the program as:
yourprogram > oops.log
Also, one thing to remember is that std::cout is a global variable, with all the same downsides as other global variables. Instead of modifying it or even using it, you may prefer to use the usual techniques to avoid global variables altogether. For example, you might pass a std::ostream &log_output parameter around and use that instead of having code use cout directly.
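For example, a small sketch of that approach (run() is a hypothetical function standing in for your real code):
#include <fstream>
#include <iostream>

void run(std::ostream& log_output)
{
    log_output << "Oops!\n";   // the caller decides where this goes
}

int main()
{
    std::ofstream log("oops.log");
    run(log);         // write to the file...
    run(std::cout);   // ...or to standard output, with no rdbuf() juggling
}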
Your program has Undefined Behavior.
The destructor of the global cout object will delete the stream buffer when going out of scope, and the same is true of log, which also owns that very same stream buffer. Thus, you are deleting the same object twice.
When a program has Undefined Behavior, anything could happen, from formatting your hard drive to terminating without any error.
On my platform, for instance, the program enters an infinite loop after returning from main().

Basic thread locking in C++11

How do I lock my thread so that my output isn't something like this: hello...hello...hheelhllelolo.l..o......
std::size_t const nthreads = 5;
std::vector<std::thread> my_threads(nthreads);
for (unsigned i = 0; i < nthreads; i++)
{
    my_threads[i] = std::thread([]() { std::cout << "hello..."; });
}
The standard says:
Concurrent access to a synchronized (27.5.3.4) standard iostream object’s formatted and unformatted input (27.7.2.1) and output (27.7.3.1) functions or a standard C stream by multiple threads shall not result in a data race (1.10). [Note: Users must still synchronize concurrent use of these objects and streams by multiple threads if they wish to avoid interleaved characters. — end note] — [iostream.objects.overview] 27.4.1 p4
Notice that the requirement not to produce a data race applies only to the standard iostream objects (cout, cin, cerr, clog, wcout, wcin, wcerr, and wclog) and only when they are synchronized (which they are by default and which can be disabled using the sync_with_stdio member function).
Unfortunately I've noticed two phenomena; implementations either provide stricter guarantees than required (e.g., thread synchronization for all stream objects no matter what, giving poor performance) or fewer (e.g., standard stream objects that are sync_with_stdio produce data races). MSVC seems to lean toward the former while libc++ leans toward the latter.
Anyway, as the note indicates, you have to provide mutual exclusion yourself if you want to avoid interleaved characters. Here's one way to do it:
std::mutex m;

struct lockostream {
    std::lock_guard<std::mutex> l;
    lockostream() : l(m) {}
};

std::ostream &operator<<(std::ostream &os, lockostream const &l) {
    return os;
}

std::cout << lockostream() << "Hello, World!\n";
This way a lock guard is created and lives for the duration of the full expression using std::cout. You can templatize the lockostream object to work for any basic_*stream, and even on the address of the stream so that you have a separate mutex for each one.
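For example, a rough sketch of that idea (an illustration only, not an existing facility): a class template deduced from the stream's character type, holding a lock on a mutex looked up by the stream's address:
#include <iostream>
#include <map>
#include <mutex>
#include <ostream>

template <class CharT, class Traits>
struct basic_lockostream {
    std::unique_lock<std::mutex> lock;

    explicit basic_lockostream(std::basic_ostream<CharT, Traits>& os)
        : lock(mutex_for(&os)) {}

    // One mutex per stream object, keyed by the stream's address.
    static std::mutex& mutex_for(const void* stream)
    {
        static std::mutex registry_mutex;
        static std::map<const void*, std::mutex> registry;
        std::lock_guard<std::mutex> guard(registry_mutex);
        return registry[stream];
    }
};

template <class CharT, class Traits>
std::basic_ostream<CharT, Traits>&
operator<<(std::basic_ostream<CharT, Traits>& os,
           const basic_lockostream<CharT, Traits>&)
{
    return os; // the temporary's lock is held until the full expression ends
}

// Helper so the stream type is deduced at the call site.
template <class CharT, class Traits>
basic_lockostream<CharT, Traits> lockstream(std::basic_ostream<CharT, Traits>& os)
{
    return basic_lockostream<CharT, Traits>(os);
}

int main()
{
    std::cout  << lockstream(std::cout)  << "Hello, World!\n";   // locks cout's mutex
    std::wcout << lockstream(std::wcout) << L"Hello, World!\n";  // locks wcout's mutex
}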
Of course the standard stream objects are global variables, so you might want to avoid them the same way all global variables should be avoided. They're handy for learning C++ and toy programs, but you might want to arrange something better for real programs.
You have to use the normal locking techniques, as you would with any other shared resource; otherwise the characters from different threads can end up interleaved (and for streams other than the standard synchronized ones, concurrent access is undefined behaviour).
std::mutex m;
std::lock_guard<std::mutex> lock(m);
std::cout << "hello hello";
or alternatively you can use printf, which is thread-safe (on POSIX):
printf("hello hello");