Output stream locking for multiprocess synchronisation? - c++

Having multiple processes all writing to the same output stream (e.g. with std::cout), is there a way to lock the stream so that, when a process starts writing its own message, it can write it through to the end (e.g. up to std::endl)?
I need a portable way of doing it.

It's not clear if it would fit the parameters of your situation, but you could potentially funnel all data to a separate worker process that aggregates the data (with its own internal locking) before dumping it to stdout.
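One way to wire that up on POSIX systems (an assumption; the answer doesn't prescribe a mechanism) is to rely on the fact that writes of at most PIPE_BUF bytes to a pipe or FIFO are atomic, so each producer sends whole messages and the aggregator reads them back intact:
#include <limits.h>   // PIPE_BUF
#include <unistd.h>   // ::write
#include <string>

// Hypothetical sketch: producers send whole messages down a shared pipe;
// writes of at most PIPE_BUF bytes to a pipe are atomic, so the aggregator
// process can read and reprint them without interleaving.
bool send_message(int pipe_fd, const std::string& msg) {
    if (msg.size() > PIPE_BUF)
        return false;  // would no longer be atomic; split or reject
    return ::write(pipe_fd, msg.data(), msg.size())
               == static_cast<ssize_t>(msg.size());
}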

You are out of luck. You will have to use whatever your target OS provides. This means using global/system-wide mutexes or lockf()-like functions. You could use a 3rd-party library such as Boost.Interprocess to satisfy the portability requirement.
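For instance, a minimal sketch of the Boost.Interprocess approach (the mutex name "stdout_lock" is an arbitrary choice): a named mutex is visible to every process, so each one can hold it for the duration of a complete message.
#include <boost/interprocess/sync/named_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <iostream>

int main() {
    namespace bip = boost::interprocess;
    // Every cooperating process opens the same system-wide mutex by name.
    bip::named_mutex mutex(bip::open_or_create, "stdout_lock");
    {
        bip::scoped_lock<bip::named_mutex> lock(mutex);
        std::cout << "one complete message" << std::endl;  // flush while locked
    }
    // Note: named mutexes persist; call bip::named_mutex::remove("stdout_lock")
    // somewhere when the whole group of processes is done.
}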

If you are on a UNIX-like OS, then you may be able to mimic the behavior you want with a stringstream adapter. This may not be the best way to accomplish it, but the idea is to trigger a single write call whenever std::endl is encountered.
// Assume fd is in blocking mode
#include <sstream>
#include <string>
#include <unistd.h>  // ::write

class fdostream : public std::ostringstream {
    struct fdbuf : public std::stringbuf {
        int fd_;
        fdbuf (int fd) : fd_(fd) {}
        int sync () {
            // Triggered by std::endl / std::flush: emit the whole
            // accumulated message in a single write() call, then reset.
            if (str().empty()) return 0;
            ssize_t r = ::write(fd_, str().data(), str().size());
            str(std::string());
            return (r > 0) ? 0 : -1;
        }
    } buf_;
    std::ostream & os () { return *this; }
public:
    fdostream (int fd) : buf_(fd) { os().rdbuf(&buf_); }
};

fdostream my_cout(1);  // 1 is the stdout file descriptor

my_cout << "Hello," << " world!" << std::endl;
This should achieve the effect of synchronized writes, at the cost of buffering the output in a stringstream and clearing the internal string after each flush.
For greater portability, you could modify the code to use fwrite and request unbuffered writes with setvbuf. But then the atomicity of fwrite depends on the C library's implementation of that function.
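A rough sketch of that variant might look like this:
#include <cstdio>

int main() {
    // Disable stdio buffering so each message goes out in one fwrite call.
    // Whether that call is atomic across processes still depends on the
    // platform's C library.
    std::setvbuf(stdout, nullptr, _IONBF, 0);
    const char msg[] = "Hello, world!\n";
    std::fwrite(msg, 1, sizeof msg - 1, stdout);  // single call per message
}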

Related

Asynchronously writing to a file in c++ unix

I have a long loop that needs to write some data to a file on every iteration. The problem is that writing to a file can be slow, so I would like to reduce the time this takes by doing the writing asynchronously.
Does anyone know a good way to do this? Should I be creating a thread that consumes whatever is put into its buffer by writing it out (in this case, a single producer, single consumer)?
I am interested mostly in solutions that don't involve anything but the standard library (C++11).
Before going into asynchronous writing, if you are using IOStreams you might want to try to avoid flushing the stream accidentally, e.g., by not using std::endl but rather using '\n' instead. Since writing to IOStreams is buffered this can improve performance quite a bit.
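For instance, the only difference between the two lines below (out being any std::ostream) is the implicit flush:
out << line << std::endl;  // inserts '\n' and flushes the stream every time
out << line << '\n';       // inserts '\n' only; the buffer flushes when full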
If that's not sufficient, the next question is how the data is written. If there is a lot of formatting going on, there is a chance that the actual formatting takes most of the time. You might be able to push the formatting off into a separate thread but that's quite different from merely passing off writing a couple of bytes to another thread: you'd need to pass on a suitable data structure holding the data to be formatted. What is suitable depends on what you are actually writing, though.
Finally, if writing the buffers to a file is really the bottleneck and you want to stick with the standard C++ library, it may be reasonable to have a writer thread which listens on a queue filled with buffers from a suitable stream buffer and writes the buffers to an std::ofstream: the producer interface would be an std::ostream which would send off probably fixed-size buffers, either when the buffer is full or when the stream is flushed (for which I'd use std::flush explicitly), to a queue on which the other thread listens. Below is a quick implementation of that idea using only standard library facilities:
#include <condition_variable>
#include <cstddef>
#include <fstream>
#include <mutex>
#include <queue>
#include <streambuf>
#include <string>
#include <thread>
#include <utility>
#include <vector>

struct async_buf
    : std::streambuf
{
    std::ofstream                 out;
    std::mutex                    mutex;
    std::condition_variable       condition;
    std::queue<std::vector<char>> queue;
    std::vector<char>             buffer;
    bool                          done;
    std::thread                   thread;

    // Writer thread: pop buffers off the queue and write them out.
    void worker() {
        bool local_done(false);
        std::vector<char> buf;
        while (!local_done) {
            {
                std::unique_lock<std::mutex> guard(this->mutex);
                this->condition.wait(guard,
                                     [this](){ return !this->queue.empty()
                                                   || this->done; });
                if (!this->queue.empty()) {
                    buf.swap(queue.front());
                    queue.pop();
                }
                local_done = this->queue.empty() && this->done;
            }
            if (!buf.empty()) {
                out.write(buf.data(), std::streamsize(buf.size()));
                buf.clear();
            }
        }
        out.flush();
    }

public:
    async_buf(std::string const& name)
        : out(name)
        , buffer(128)
        , done(false)
        , thread(&async_buf::worker, this) {
        // Leave one slot spare so overflow() can store the overflowing char.
        this->setp(this->buffer.data(),
                   this->buffer.data() + this->buffer.size() - 1);
    }
    ~async_buf() {
        // Signal the worker to drain the queue and stop, then join it.
        {
            std::unique_lock<std::mutex> guard(this->mutex);
            this->done = true;
        }
        this->condition.notify_one();
        this->thread.join();
    }
    int overflow(int c) {
        if (c != std::char_traits<char>::eof()) {
            *this->pptr() = std::char_traits<char>::to_char_type(c);
            this->pbump(1);
        }
        return this->sync() != -1
            ? std::char_traits<char>::not_eof(c)
            : std::char_traits<char>::eof();
    }
    int sync() {
        // Hand the current buffer to the worker and start a fresh one.
        if (this->pbase() != this->pptr()) {
            this->buffer.resize(std::size_t(this->pptr() - this->pbase()));
            {
                std::unique_lock<std::mutex> guard(this->mutex);
                this->queue.push(std::move(this->buffer));
            }
            this->condition.notify_one();
            this->buffer = std::vector<char>(128);
            this->setp(this->buffer.data(),
                       this->buffer.data() + this->buffer.size() - 1);
        }
        return 0;
    }
};

int main()
{
    async_buf     sbuf("async.out");
    std::ostream  astream(&sbuf);
    std::ifstream in("async_stream.cpp");
    for (std::string line; std::getline(in, line); ) {
        astream << line << '\n' << std::flush;
    }
}
Search the web for "double buffering."
In general, one thread will write to one or more buffers. Another thread reads from the buffers, "chasing" the writing thread.
This may not make your program more efficient. Efficiency with files is achieved by writing in huge blocks so that the drive doesn't get a chance to spin down. One write of many bytes is more efficient than many writes of a few bytes.
This could be achieved by having the writing thread only write when the buffer content has exceeded some threshold like 1k.
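As a rough sketch of that threshold idea (the class name and details are illustrative; a real double-buffered version would move flush_pending onto the writer thread):
#include <cstddef>
#include <fstream>
#include <string>

// Accumulate output and hand it to the file in large chunks, so the
// drive sees a few big writes instead of many small ones. The 1k
// threshold is the figure suggested above.
class chunked_writer {
    std::ofstream out_;
    std::string pending_;
    static constexpr std::size_t threshold = 1024;
public:
    explicit chunked_writer(const char* name) : out_(name) {}
    void append(const std::string& s) {
        pending_ += s;
        if (pending_.size() >= threshold)
            flush_pending();
    }
    void flush_pending() {
        out_.write(pending_.data(), std::streamsize(pending_.size()));
        pending_.clear();
    }
    ~chunked_writer() { flush_pending(); }  // don't lose the tail
};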
Also research the topic of "spooling" or "print spooling".
You'll need to use C++11 since previous versions don't have threading support in the standard library. I don't know why you limit yourself, since Boost has some good stuff in it.

How to read for X seconds maximum in C++?

I want my program to wait for something to read in a FIFO, but if the read (I use std::fstream) lasts more than 5 seconds, I want it to exit.
Is it possible, or do I absolutely have to use alarm?
Thank you.
I do not believe there is a clean way to accomplish this with a portable, C++-only solution. Your best option is to use poll or select on *nix based systems and WaitForSingleObject or WaitForMultipleObjects on Windows.
You can do this transparently by creating a proxy streambuffer class that forwards calls to a real streambuffer object. This will allow you to call the appropriate wait function before doing the actual read. It might look something like this...
#include <fstream>
#include <streambuf>
#include <windows.h>  // WaitForSingleObject

class MyStreamBuffer : public std::basic_streambuf<char>
{
public:
    MyStreamBuffer(std::fstream& streamBuffer, int timeoutValue)
        : timeoutValue_(timeoutValue),
          streamBuffer_(streamBuffer)
    {
    }

protected:
    virtual std::streamsize xsgetn(char_type* s, std::streamsize count)
    {
        if (!wait(timeoutValue_))
        {
            return 0;  // timed out: report "nothing read"
        }
        // Forward to the real stream's buffer (sgetn is the public
        // entry point for the protected xsgetn).
        return streamBuffer_.rdbuf()->sgetn(s, count);
    }

private:
    bool wait(int timeoutValue) const
    {
        // Not entirely complete but you get the idea
        return (WAIT_OBJECT_0 == WaitForSingleObject(...));
    }

    const int timeoutValue_;
    std::fstream& streamBuffer_;
};
You would need to do this for every call you want to forward. It might get a little tedious, but it provides a transparent way to add timeouts even where they might not be explicitly supported in client code.
For anyone interested in how I resolved my problem, here is the function that reads from my stream. In the end I couldn't use std::fstream, so I replaced it with C system calls.
#include <stdexcept>
#include <string>
#include <sys/select.h>

std::string
NamedPipe::readForSeconds(int seconds)
{
    fd_set readfs;
    struct timeval t = { seconds, 0 };

    FD_ZERO(&readfs);
    FD_SET(this->_stream, &readfs);
    // Wait until the descriptor is readable or the timeout expires.
    if (select(this->_stream + 1, &readfs, NULL, NULL, &t) < 0)
        throw std::runtime_error("Invalid select");
    if (FD_ISSET(this->_stream, &readfs))
        return this->read();
    throw NamedPipe::timeoutException();
}

How to open file in exclusive mode in C++

I am implementing a file system in C++. Up to now I have been using fstream, but I realized that it is impossible to open a file in exclusive mode with it. Since there are many threads, I want to allow multiple simultaneous reads, but when a file is opened in writing mode I want it opened in exclusive mode.
What is the best way to do it? I think Boost offers some features, but is there any other possibility? I would also like to see a simple example. If it is not easy / good to do in C++, I could write it in C as well.
I am using Windows.
On many operating systems, it's simply impossible, so C++ doesn't support it. You'll have to write your own streambuf. If the only platform you're worried about is Windows, you can possibly use the exclusive mode for opening that it offers. More likely, however, you would want to use some sort of file locking, which is more precise, and is available on most, if not all, platforms (but not portably: you'll need LockFileEx under Windows, fcntl under Unix).
Under Posix, you could also use pthread_rwlock. Butenhof gives an implementation of this using classical mutexes and condition variables, which are present in C++11, so you could actually implement a portable version (provided all of the readers and writers are in the same process; the Posix requests will work across process boundaries, but this is not true for the C++ threading primitives).
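For reference, a minimal sketch of whole-file locking with fcntl under Unix (error handling omitted; LockFileEx plays the same role under Windows):
#include <fcntl.h>
#include <unistd.h>

// Take an advisory lock on the whole file, blocking until it is granted.
bool lock_whole_file(int fd, bool exclusive) {
    struct flock fl = {};
    fl.l_type   = exclusive ? F_WRLCK : F_RDLCK;  // exclusive vs shared
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;                 // 0 means "to end of file"
    return ::fcntl(fd, F_SETLKW, &fl) == 0;  // F_SETLKW blocks
}

void unlock_whole_file(int fd) {
    struct flock fl = {};
    fl.l_type   = F_UNLCK;
    fl.l_whence = SEEK_SET;
    ::fcntl(fd, F_SETLK, &fl);
}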
If your app only works on Windows, the Win32 API function CreateFile() is your choice.
For example:
HANDLE hFile = ::CreateFileW(lpszFileFullPathName, GENERIC_WRITE, FILE_SHARE_READ, NULL, OPEN_EXISTING, NULL, NULL);
If you are open to using Boost, then I would suggest you use the file_lock class. This means you will want to keep the filenames of the files you open/close, because fstream does not do so for you.
It has two modes: lock(), which you can use for writing (i.e. only one such lock at a time; the sharable lock prevents this lock too), and lock_sharable(), which you can use for reading (i.e. any number of threads can obtain such a lock).
Note that you will eventually find it complicated to manage both read and write this way. That is, if there is always someone reading, the sharable lock may never get released, and the exclusive lock will never get a chance to take hold....
// add the lock in your class
#include <boost/interprocess/sync/file_lock.hpp>

using boost::interprocess::file_lock;

class my_files
{
    ...
private:
    ...
    file_lock m_lock;
};
Now when you want to access a file, you can lock it one way or the other. If the threads are in charge of deciding when to do that, you could add functions giving the user access to the lock. If your implementations of the read and write functions in my_files are in charge, you want a stack-based object that locks and unlocks for you (RAII):
class safe_exclusive_lock
{
public:
    safe_exclusive_lock(file_lock & lock)
        : m_lock_ref(lock)
    {
        m_lock_ref.lock();
    }
    ~safe_exclusive_lock()
    {
        m_lock_ref.unlock();
    }

private:
    file_lock & m_lock_ref;
};
Now you can safely lock the file (i.e. you lock, do things that may throw, and you always unlock before exiting your current {}-block):
ssize_t my_files::read(char *buf, size_t len)
{
    safe_exclusive_lock guard(m_lock);
    ...your read code here...
    return len;
}   // <- here we get the unlock()

ssize_t my_files::write(char const *buf, size_t len)
{
    safe_exclusive_lock guard(m_lock);
    ...your write code here...
    return len;
}   // <- here we get the unlock()
The file_lock uses a file, so you will want to have the fstream file already created whenever the file_lock is created. If the fstream file may not be created in your constructor, you probably will want to turn the m_lock variable into a unique pointer:
private:
    std::unique_ptr<file_lock> m_lock;
And when you reference it, you now need an asterisk:
safe_exclusive_lock guard(*m_lock);
Note that, for safety, you should check whether the pointer is indeed allocated; if it is not, the file is not open yet, so I would suggest you throw:
if (m_lock)
{
    safe_exclusive_lock guard(*m_lock);
    ...do work here...
}
else
{
    throw file_not_open();
}
// here the lock was released so you cannot touch the file anymore
In the open, you create the lock:
bool open(std::string const & filename)
{
    m_stream.open(...);
    ...make sure it worked...
    m_lock.reset(new file_lock(filename.c_str()));  // file_lock wants a C string
    // TODO: you may want a try/catch around the m_lock and
    //       close the m_stream if it fails, or use a local
    //       variable and swap() on success...
    return true;
}
And do not forget to release the lock object in your close:
void close()
{
    m_lock.reset();
}
Well you can manually prevent yourself from opening a file if it has been opened in write mode already. Just keep track internally of which files you've opened in write mode.
Perhaps you could hash the filename and store it in a table upon open with write access. This would allow fast lookup to see if a file has been opened or not.
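A minimal sketch of that bookkeeping (the names are made up for illustration; std::unordered_set does the filename hashing for you):
#include <mutex>
#include <string>
#include <unordered_set>

// Process-local registry of files currently open for writing.
class write_registry {
    std::mutex mutex_;
    std::unordered_set<std::string> open_for_write_;
public:
    // Returns false if this process already has the file open for writing.
    bool try_acquire(const std::string& filename) {
        std::lock_guard<std::mutex> guard(mutex_);
        return open_for_write_.insert(filename).second;
    }
    void release(const std::string& filename) {
        std::lock_guard<std::mutex> guard(mutex_);
        open_for_write_.erase(filename);
    }
};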
You could rename the file, update it under the new name, and rename it back. I've done it, but it's a little heavy.
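A sketch of that rename dance using std::filesystem (C++17); the ".editing" suffix is an arbitrary choice, and it only blocks new openers, not anyone who already holds a handle:
#include <filesystem>
#include <fstream>

// Move the file aside so other openers fail, update it, move it back.
void update_exclusively(const std::filesystem::path& p) {
    std::filesystem::path hidden = p;
    hidden += ".editing";                 // assumed temporary suffix
    std::filesystem::rename(p, hidden);   // throws if p is missing
    {
        std::ofstream out(hidden, std::ios::app);
        out << "new data\n";
    }
    std::filesystem::rename(hidden, p);
}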
Since C++17 there are two options:
In C++23, by using the openmode std::ios::noreplace.
In C++17, by using the std::fopen mode x (exclusive).
Note: The x mode was added to C in C11.
C++23 and later:
#include <cerrno>
#include <cstring>
#include <fstream>
#include <iostream>

int main() {
    std::ofstream ofs("the_file", std::ios::noreplace);
    if (ofs) {
        std::cout << "success\n";
    } else {
        std::cerr << "Error: " << std::strerror(errno) << '\n';
    }
}
C++17 and later:
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fstream>
#include <iostream>
#include <memory>

struct FILE_closer {
    void operator()(std::FILE* fp) const { std::fclose(fp); }
};

// you may want overloads for `std::filesystem::path`, `std::string` etc too:
std::ofstream open_exclusively(const char* filename) {
    // Try to create the file exclusively; "x" makes fopen fail if it exists.
    bool excl = [filename] {
        std::unique_ptr<std::FILE, FILE_closer> fp(std::fopen(filename, "wx"));
        return !!fp;
    }();
    auto saveerr = errno;

    std::ofstream stream;
    if (excl) {
        stream.open(filename);
    } else {
        stream.setstate(std::ios::failbit);
        errno = saveerr;
    }
    return stream;
}

int main() {
    std::ofstream ofs = open_exclusively("the_file");
    if (ofs) {
        std::cout << "success\n";
    } else {
        std::cout << "Error: " << std::strerror(errno) << '\n';
    }
}

Capture the output of a process in multi-threaded c++

My requirements are simple: start a process, wait for it to finish, then capture and process its output.
For the longest time I've been using the following:
#include <cstdio>
#include <ext/stdio_filebuf.h>  // GNU extension
#include <istream>
#include <iterator>
#include <string>
#include <vector>

// Reads whole lines, so istream_iterator iterates line by line.
struct line : public std::string {
    friend std::istream& operator>>(std::istream& is, line& l) {
        return std::getline(is, l);
    }
};

void capture(std::vector<std::string>& output, const char* command)
{
    output.clear();
    FILE* f = popen(command, "r");
    if (f) {
        __gnu_cxx::stdio_filebuf<char> fb(f, std::ios::in);
        std::istream fs(&fb);
        std::istream_iterator<line> start(fs), end;
        output.insert(output.end(), start, end);
        pclose(f);
    }
}
And it works really well on single threaded programs.
However, if I call this function from inside a thread, sometimes the popen() call hangs and never returns.
So, as a proof of concept, I replaced the function with this ugly hack:
#include <fstream>
#include <unistd.h>  // unlink

void capture(std::vector<std::string>& output, const char* command)
{
    output.clear();
    std::string c = std::string(command) + " > /tmp/out.txt";
    ::system(c.c_str());
    std::ifstream fs("/tmp/out.txt", std::ios::in);
    output.insert(output.end(),
                  std::istream_iterator<line>(fs), std::istream_iterator<line>());
    unlink("/tmp/out.txt");
}
It's ugly, but it works. However, it kept me wondering what the proper way to capture a process's output in a multi-threaded program would be.
The program runs on Linux, on an embedded PowerQUICC II processor.
See this: popen - locks or not thread safe? That and other references are not conclusive about whether popen() needs to be thread-safe, so perhaps, since you are using a less-popular platform, your implementation is not. Any chance you can view the source code of the implementation for your platform?
Otherwise, consider creating a new process and waiting upon it. Or hey, stick with the silly system() hack, but do handle its return code!
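If you take the new-process route, a sketch of the classic pipe/fork/exec pattern (POSIX; most error handling omitted) might look like this:
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Run a command and capture its stdout, without going through popen().
// argv is built in the usual execvp style, e.g. {"ls", "-l", nullptr}.
int capture_output(std::vector<char>& output, const char* path, char* const argv[])
{
    int fds[2];
    if (::pipe(fds) != 0) return -1;
    pid_t pid = ::fork();
    if (pid == 0) {                     // child: write end becomes stdout
        ::dup2(fds[1], 1);
        ::close(fds[0]);
        ::close(fds[1]);
        ::execvp(path, argv);
        ::_exit(127);                   // exec failed
    }
    ::close(fds[1]);                    // parent: read until EOF
    char buf[4096];
    ssize_t n;
    while ((n = ::read(fds[0], buf, sizeof buf)) > 0)
        output.insert(output.end(), buf, buf + n);
    ::close(fds[0]);
    int status = 0;
    ::waitpid(pid, &status, 0);
    return status;                      // caller should inspect the status!
}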

iostream thread safety, must cout and cerr be locked separately?

I understand that, to avoid output intermixing, access to cout and cerr by multiple threads must be synchronized. In a program that uses both cout and cerr, is it sufficient to lock them separately? Or is it still unsafe to write to cout and cerr simultaneously?
Edit clarification: I understand that cout and cerr are "thread safe" in C++11. My question is whether or not a write to cout and a write to cerr by different threads simultaneously can interfere with each other (resulting in interleaved output and such) in the way that two writes to cout can.
If you execute this function:
void f() {
    std::cout << "Hello, " << "world!\n";
}
from multiple threads you'll get a more-or-less random interleaving of the two strings, "Hello, " and "world!\n". That's because there are two function calls, just as if you had written the code like this:
void f() {
    std::cout << "Hello, ";
    std::cout << "world!\n";
}
To prevent that interleaving, you have to add a lock:
std::mutex mtx;

void f() {
    std::lock_guard<std::mutex> lock(mtx);
    std::cout << "Hello, " << "world!\n";
}
That is, the problem of interleaving has nothing to do with cout. It's about the code that uses it: there are two separate function calls inserting text, so unless you prevent multiple threads from executing the same code at the same time, there's a potential for a thread switch between the function calls, which is what gives you the interleaving.
Note that a mutex does not prevent thread switches. In the preceding code snippet, it prevents executing the contents of f() simultaneously from two threads; one of the threads has to wait until the other finishes.
If you're also writing to cerr, you have the same issue, and you'll get interleaved output unless you ensure that you never have two threads making these inserter function calls at the same time, and that means that both functions must use the same mutex:
std::mutex mtx;

void f() {
    std::lock_guard<std::mutex> lock(mtx);
    std::cout << "Hello, " << "world!\n";
}

void g() {
    std::lock_guard<std::mutex> lock(mtx);
    std::cerr << "Hello, " << "world!\n";
}
In C++11, unlike in C++03, the insertion to and extraction from global stream objects (cout, cin, cerr, and clog) are thread-safe. There is no need to provide manual synchronization. It is possible, however, that characters inserted by different threads will interleave unpredictably while being output; similarly, when multiple threads are reading from the standard input, it is unpredictable which thread will read which token.
Thread-safety of the global stream objects is active by default, but it can be turned off by calling the std::ios_base::sync_with_stdio function and passing false as the argument. In that case, you would have to handle the synchronization manually.
It may be unsafe to write to cout and cerr simultaneously!
It depends on whether cerr is tied to cout or not. See std::ios::tie.
"The tied stream is an output stream object which is flushed before
each i/o operation in this stream object."
This means that cout.flush() may get called unintentionally by the thread which writes to cerr. I spent some time figuring out that this was the reason for randomly missing line endings in cout's output in one of my projects :(
With C++98, cout should not be tied to cerr. But despite the standard, it is tied when using MSVC 2008 (my experience). When using the following code, everything works well:
std::ostream *cerr_tied_to = cerr.tie();
if (cerr_tied_to) {
    if (cerr_tied_to == &cout) {
        cerr << "DBG: cerr is tied to cout! -- untying ..." << endl;
        cerr.tie(0);
    }
}
See also: why cerr flushes the buffer of cout
There are already several answers here. I'll summarize and also address interactions between them.
Typically, std::cout and std::cerr are funneled into a single stream of text, so locking them in common results in the most usable program.
If you ignore the issue, cout and cerr by default alias their stdio counterparts, which are thread-safe as in POSIX, up to the standard I/O functions (C++14 §27.4.1/4, a stronger guarantee than C alone). If you stick to this selection of functions, you get garbage I/O, but not undefined behavior (which is what a language lawyer might associate with "thread safety," irrespective of usefulness).
However, note that while standard formatted I/O functions (such as reading and writing numbers) are thread-safe, the manipulators to change the format (such as std::hex for hexadecimal or std::setw for limiting an input string size) are not. So, one can't generally assume that omitting locks is safe at all.
If you choose to lock them separately, things are more complicated.
Separate locking
For performance, lock contention may be reduced by locking cout and cerr separately. They're separately buffered (or unbuffered), and they may flush to separate files.
By default, cerr flushes cout before each operation, because they are "tied." This would defeat both separation and locking, so remember to call cerr.tie( nullptr ) before doing anything with it. (The same applies to cin, but not to clog.)
Decoupling from stdio
The standard says that operations on cout and cerr do not introduce races, but that can't be exactly what it means. The stream objects aren't special; their underlying streambuf buffers are.
Moreover, the call std::ios_base::sync_with_stdio is intended to remove the special aspects of the standard streams — to allow them to be buffered as other streams are. Although the standard doesn't mention any impact of sync_with_stdio on data races, a quick look inside the libstdc++ and libc++ (GCC and Clang) std::basic_streambuf classes shows that they do not use atomic variables, so they may create race conditions when used for buffering. (On the other hand, libc++ sync_with_stdio effectively does nothing, so it doesn't matter if you call it.)
If you want extra performance regardless of locking, sync_with_stdio(false) is a good idea. However, after doing so, locking is necessary, along with cerr.tie( nullptr ) if the locks are separate.
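Putting those two points together, a typical setup would be (the mutexes guarding each stream are then up to you):
#include <iostream>

int main() {
    std::ios_base::sync_with_stdio(false);  // decouple from the stdio buffers
    std::cerr.tie(nullptr);                 // stop cerr flushing cout
    std::cin.tie(nullptr);                  // likewise for cin, if it is used
    // ...from here on, guard cout and cerr with your own (separate) mutexes...
}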
This may be useful ;)
#include <cstdarg>
#include <cstdio>
#include <mutex>

// Note: the format parameter must be a plain pointer; calling va_start on
// a reference or class-type parameter (such as std::string) is undefined.
inline void log(const char* format, ...) {
    static std::mutex locker;
    std::lock_guard<std::mutex> guard(locker);  // a named guard; an unnamed
                                                // temporary would unlock at once
    va_list list;
    va_start(list, format);
    vfprintf(stderr, format, list);
    va_end(list);
}
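Usage is printf-style, e.g. (worker_id and elapsed being whatever you want to report):
log("worker %d finished in %.2f seconds\n", worker_id, elapsed);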
I use something like this:
#include <iostream>
#include <mutex>

// Wrap a mutex around cerr so multiple threads don't overlap output
// USAGE:
//   LockedLog() << a << b << c;
//
class LockedLog {
public:
    // The temporary lives until the end of the full statement, so the
    // mutex is held across every << in the chain.
    LockedLog() { m_mutex.lock(); }
    ~LockedLog() { *m_ostr << std::endl; m_mutex.unlock(); }

    template <class T>
    LockedLog &operator<<(const T &msg)
    {
        *m_ostr << msg;
        return *this;
    }

private:
    static std::ostream *m_ostr;
    static std::mutex m_mutex;
};

std::mutex LockedLog::m_mutex;
std::ostream* LockedLog::m_ostr = &std::cerr;