Thread-safe logging - C++

I want to implement a simple class for logging from multiple threads. The idea is that each object that wants to log stuff receives an ostream object that it can write messages to using the usual operators. The desired behaviour is that a message is added to the log when the stream is flushed; this way, messages will not get interrupted by messages from other threads. I want to avoid using a temporary stringstream to store the message, as that would turn most log statements into at least two lines. As I see it, the standard way of achieving this would be to implement my own stream buffer, but this seems very cumbersome and error-prone. Is there a simpler way to do this? If not, do you know a good article/howto/guide on custom streambufs?
Thanks in advance,
Space_C0wbo0y
UPDATE:
Since it seems to work, I have added my own answer.

Take a look at log4cpp; it has multi-threading support. It may save you some time.
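From memory, basic log4cpp setup looks roughly like this; treat the exact header names and signatures as assumptions and check the log4cpp documentation before relying on them:
#include <log4cpp/Category.hh>
#include <log4cpp/FileAppender.hh>
#include <log4cpp/BasicLayout.hh>
#include <log4cpp/Priority.hh>

int main()
{
    // One appender writing to a file; "program.log" is just a placeholder name.
    log4cpp::Appender* appender = new log4cpp::FileAppender("default", "program.log");
    appender->setLayout(new log4cpp::BasicLayout());

    log4cpp::Category& root = log4cpp::Category::getRoot();
    root.setPriority(log4cpp::Priority::INFO);
    root.addAppender(appender);    // the category takes ownership of the appender

    root.info("message from a worker thread");    // one call per log message
    return 0;
}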

So, I took a look at Boost.IOstreams and here's what I've come up with:
// Requires <boost/iostreams/concepts.hpp> for boost::iostreams::sink.
class TestSink : public boost::iostreams::sink {
public:
    std::streamsize write( const char* s, std::streamsize n ) {
        std::string message( s, n );
        /* This would add a message to the log instead of cout.
           The log implementation is threadsafe. */
        std::cout << message << std::endl;
        return n;
    }
};
TestSink can be used to create a stream buffer (see the stream_buffer template). Every thread will receive its own instance of TestSink, but all TestSinks will write to the same log. TestSink is used as follows:
TestSink sink;
boost::iostreams::stream_buffer< TestSink > testbuf( sink, 50000 );
std::ostream out( &testbuf );
for ( int i = 0; i < 10000; i++ )
    out << "test" << i;
out << std::endl;
The important fact here is that TestSink::write is only called when the stream is flushed (std::endl or std::flush), or when the internal buffer of the stream_buffer instance is full (the default buffer size cannot hold 40000 chars, so I initialize it to 50000). In this program, TestSink::write is called exactly once (the output is too long to post here). This way I can write log messages using normal formatted stream IO without any temporary variables and be sure that the message is posted to the log in one piece when I flush the stream.
I will leave the question open another day, in case there are different suggestions/problems I have not considered.

You think log4cpp is too heavy and you reach for Boost.IOStreams instead? Huh?
You may wish to consider logog. It's thread-safe for POSIX, Win32 and Win64.

Re. your own response: if you are using this for error logging and your program crashes before flushing the stream, then your logging is a bit useless, isn't it?

Related

How do I write to cin, the input stream, in C++

So I have a program (a game) designed to take input from a human via a keyboard. It is desirable, however, that at certain points I take control away from the user and make certain decisions for them. While it would be possible to write special-case code for when events force the effects of user input to be emulated, I would much prefer to override the input stream (cin in this case) so that the program responds no differently to forced decisions than it would had the user made the same decision of their own free will.
I have tried writing to it like I would an output stream (cin << 'z', for example), but the << operator isn't defined for cin and I don't know how to define it.
Would it be better to write to the keyboard buffer? If so, how would I do that in a system agnostic manner?
Writing to the input is quite a hack. It would be a much cleaner design to put an abstraction layer between the actual input (like cin) and the game acting on that input. Then you could just reconfigure this abstraction layer to respond to procedurally generated commands instead of to cin whenever necessary.
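For illustration, a minimal sketch of such a layer (the names CommandSource, KeyboardSource and ScriptedSource are made up for this example):
#include <deque>
#include <iostream>
#include <queue>
#include <string>

// The game only ever talks to a CommandSource.
struct CommandSource {
    virtual ~CommandSource() {}
    virtual char nextCommand() = 0;
};

// Normal play: commands come from the keyboard via std::cin.
struct KeyboardSource : CommandSource {
    char nextCommand() override {
        char c;
        std::cin >> c;
        return c;
    }
};

// Forced decisions: commands come from a pre-filled queue.
struct ScriptedSource : CommandSource {
    std::queue<char> commands;
    explicit ScriptedSource(const std::string& script)
        : commands(std::deque<char>(script.begin(), script.end())) {}
    char nextCommand() override {
        char c = commands.front();
        commands.pop();
        return c;
    }
};
The game loop only ever calls nextCommand(), so switching from the keyboard source to a scripted one changes nothing else in the game code.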
You can insert a streambuf into std::cin, something like:
class InjectedData : public std::streambuf
{
    std::istream*   myOwner;
    std::streambuf* mySavedStreambuf;
    std::string     myData;
public:
    InjectedData( std::istream& stream, std::string const& data )
        : myOwner( &stream )
        , mySavedStreambuf( stream.rdbuf() )
        , myData( data )
    {
        // setg needs char*, and std::string::data() is const char* before C++17.
        setg( &myData[0], &myData[0], &myData[0] + myData.size() );
        // Install ourselves as the stream's buffer.
        myOwner->rdbuf( this );
    }
    ~InjectedData()
    {
        myOwner->rdbuf( mySavedStreambuf );
    }
    int underflow() override
    {
        // Injected data exhausted: restore the original buffer and
        // continue reading from it.
        myOwner->rdbuf( mySavedStreambuf );
        return mySavedStreambuf->sgetc();
    }
};
(I've not tested this, so there may be errors, but the basic principle should work.)
Constructing an instance of this with std::cin as argument will return characters from data until the instance is destructed, or all of the characters have been consumed.
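A usage sketch in the same untested spirit (the injected string "z\n" is just an example):
#include <iostream>
#include <string>

int main()
{
    InjectedData forced( std::cin, "z\n" );  // the game will now read 'z' first
    char move;
    std::cin >> move;                        // reads the injected 'z'
    std::cout << "move: " << move << '\n';
}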
I would believe that writing to cin is the symptom of a badly designed program.
You probably in fact want to have some event loop, which of course is operating-system specific. On POSIX and Linux, you could build it above the poll(2) syscall, and you might want to set up a pipe(7) from your own process to itself.
Several libraries provide an event loop: libevent, libev, and also frameworks like Qt, Gtk, POCO, libsdl... (some of them are ported to several operating systems, so they give you an abstraction above the OS).
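For illustration only, a rough POSIX sketch of the self-pipe idea (untested, error handling omitted):
#include <poll.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    int selfpipe[2];
    if (pipe(selfpipe) != 0)
        return 1;

    // Pretend the program decided to force a command.
    write(selfpipe[1], "z", 1);

    pollfd fds[2];
    fds[0].fd = 0;            // stdin: real user input
    fds[0].events = POLLIN;
    fds[1].fd = selfpipe[0];  // self-pipe: forced input
    fds[1].events = POLLIN;

    if (poll(fds, 2, -1) > 0) {
        char c;
        if (fds[1].revents & POLLIN) {          // forced command
            read(selfpipe[0], &c, 1);
            std::printf("forced command: %c\n", c);
        } else if (fds[0].revents & POLLIN) {   // user command
            read(0, &c, 1);
            std::printf("user command: %c\n", c);
        }
    }
    return 0;
}
The game then treats a byte arriving on the pipe exactly like a byte typed by the user.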

boost::mpi throws MPI_ERR_TRUNCATE on multiple isend/irecv transfers with same tag

I'm seeing an MPI_ERR_TRUNCATE error with boost::mpi when performing multiple isend/irecv transfers with the same tag using serialized data. These are not concurrent transfers, i.e. no threading is involved. There is just more than one transfer outstanding at the same time. Here's a short test program that exhibits the failure:
#include <iostream>
#include <string>
#include <vector>
#include <boost/mpi.hpp>
#include <boost/serialization/string.hpp>
static const size_t N = 2;
int main() {
    boost::mpi::environment env;
    boost::mpi::communicator world;

#if 1
    // Serialized types fail.
    typedef std::string DataType;
#define SEND_VALUE "how now brown cow"
#else
    // Native MPI types succeed.
    typedef int DataType;
#define SEND_VALUE 42
#endif

    DataType out(SEND_VALUE);
    std::vector<DataType> in(N);
    std::vector<boost::mpi::request> sends;
    std::vector<boost::mpi::request> recvs;
    sends.reserve(N);
    recvs.reserve(N);

    std::cout << "Multiple transfers with different tags\n";
    sends.clear();
    recvs.clear();
    for (size_t i = 0; i < N; ++i) {
        sends.push_back(world.isend(0, i, out));
        recvs.push_back(world.irecv(0, i, in[i]));
    }
    boost::mpi::wait_all(sends.begin(), sends.end());
    boost::mpi::wait_all(recvs.begin(), recvs.end());

    std::cout << "Multiple transfers with same tags\n";
    sends.clear();
    recvs.clear();
    for (size_t i = 0; i < N; ++i) {
        sends.push_back(world.isend(0, 0, out));
        recvs.push_back(world.irecv(0, 0, in[i]));
    }
    boost::mpi::wait_all(sends.begin(), sends.end());
    boost::mpi::wait_all(recvs.begin(), recvs.end());

    return 0;
}
In this program I first do 2 transfers on different tags, which works fine. Then I attempt 2 transfers on the same tag, which fails with:
libc++abi.dylib: terminating with uncaught exception of type boost::exception_detail::clone_impl<...>: MPI_Unpack: MPI_ERR_TRUNCATE: message truncated
If I use a native MPI data type so that serialization is not invoked, things seem to work. I get the same error on MacPorts boost 1.55 with OpenMPI 1.7.3, and Debian boost 1.49 with OpenMPI 1.4.5. I tried multiple transfers with the same tag directly with the C MPI interface and that appeared to work, though of course I can only transfer native MPI data types.
My question is whether having multiple outstanding transfers on the same tag is a valid operation with boost::mpi, and if so is there a bug in my program or a bug in boost::mpi?
At the current version of boost, 1.55, boost::mpi does not guarantee non-overtaking messages. This is in contrast to the underlying MPI API, which does:
Order: Messages are non-overtaking: If a sender sends two messages in succession to the same destination, and both match the same receive, then this operation cannot receive the second message if the first one is still pending. If a receiver posts two receives in succession, and both match the same message, then the second receive operation cannot be satisfied by this message, if the first one is still pending. This requirement facilitates matching of sends to receives. It guarantees that message-passing code is deterministic, if processes are single-threaded and the wildcard MPI_ANY_SOURCE is not used in receives.
The reason boost::mpi does not guarantee non-overtaking is that serialized data types are transferred as two MPI messages, one for the size and one for the payload, and the irecv for the second message cannot be posted until the first message has been examined.
A proposal to guarantee non-overtaking in boost::mpi is being considered. Further discussion can be found on the boost::mpi mailing list beginning here.
The problem could be that you're waiting for all of your sends to complete and then for all of your receives. MPI expects your sends and receives to match in time as well as in number; what I mean is that you can't finish all of your send calls without your receive calls also making progress.
The way MPI usually handles sending a message is that when you call send, it returns as soon as the message is handled by the library. This could mean that the message has been copied to an internal buffer or that the message was actually transferred to the remote process and received. Either way, the message has to go somewhere. If you don't have a receive buffer already waiting, the message has to be buffered internally. Eventually the implementation runs out of those buffers and starts to do bad things (like return errors to the user), which is probably what you are seeing here.
The solution is to pre-post your receive buffers. In your case, you can just push all of the isend and irecv calls into the same vector and let MPI handle everything. That gives MPI access to all of the receive buffers so your messages have somewhere to go.
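A minimal sketch of that idea, reusing the names from the question:
std::vector<boost::mpi::request> reqs;
reqs.reserve(2 * N);
for (size_t i = 0; i < N; ++i) {
    // Post the receive before (or together with) the matching send,
    // so the incoming message always has a buffer to land in.
    reqs.push_back(world.irecv(0, 0, in[i]));
    reqs.push_back(world.isend(0, 0, out));
}
// Wait on sends and receives together so both can make progress.
boost::mpi::wait_all(reqs.begin(), reqs.end());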

std::stringstream with direct output buffer / string result access, avoiding copy?

Is there a canonical / public / free implementation of a std::stringstream variant where I don't pay for a full string copy each time I call str()? (Possibly by providing a direct c_str() member in the ostream class?)
I've found two questions here:
C++ stl stringstream direct buffer access (Yeah, it's basically the same title, but note that its accepted answer doesn't fit this question at all.)
Stream from std::string without making a copy? (Again, the accepted answer doesn't match this question.)
And "of course" the deprecated std::strstream class does allow for direct buffer access, although it's interface is really quirky (apart from it being deprecated).
It also seems one can find several code samples that explain how to customize std::streambuf to allow direct access to the buffer -- I haven't tried it in practice, but it seems quite easy to implement.
My question here is really two fold:
Is there any deeper reason why std::[o]stringstream (or, rather, basic_stringbuf) does not allow direct buffer access, but only access through an (expensive) copy of the whole buffer?
Given that it seems easy, but not trivial, to implement this, is there any variant available via boost or other sources that packages this functionality?
Note: The performance hit of the copy that str() makes is very measurable(*), so it seems weird to have to pay for it when the use cases I have seen so far never really need a copy returned from the stringstream. (And if I needed a copy I could always make it at the "client side".)
(*): With our platform (VS 2005), the results I measure in the release version are:
// tested in a tight loop:
// variant stream: run time : 100%
std::stringstream msg;
msg << "Error " << GetDetailedErrorMsg() << " while testing!";
DoLogErrorMsg(msg.str().c_str());
// variant string: run time: *** 60% ***
std::string msg;
((msg += "Error ") += GetDetailedErrorMsg()) += " while testing!";
DoLogErrorMsg(msg.c_str());
So using a std::string with += (which obviously only works when I don't need custom/number formatting) is 40% faster than the stream version, and as far as I can tell this is only due to the completely superfluous copy that str() makes.
I will try to provide an answer to my first bullet,
Is there any deeper reason why std::ostringstream does not allow direct buffer access
Looking at how a streambuf / stringbuf is defined, we can see that the buffer character sequence is not NULL terminated.
As far as I can see, a (hypothetical) const char* std::ostringstream::c_str() const; function, providing direct read-only buffer access, can only make sense when the valid buffer range would always be NULL terminated -- i.e. (I think) when sputc would always make sure that it inserts a terminating NULL after the character it inserts.
I wouldn't think that this is a technical hindrance per se, but given the complexity of the basic_streambuf interface, I'm totally not sure whether it is correct in all cases.
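To illustrate what such a buffer could look like (my own sketch, not standard library behaviour): a streambuf that appends straight into a std::string can expose a null-terminated pointer without any copy, because std::string::c_str() already guarantees the terminator.
#include <streambuf>
#include <string>

// Hypothetical example: a streambuf that writes directly into a std::string.
class string_backed_buf : public std::streambuf {
    std::string buffer_;
protected:
    int_type overflow(int_type ch) override {
        if (ch != traits_type::eof())
            buffer_.push_back(traits_type::to_char_type(ch));
        return ch;
    }
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        buffer_.append(s, static_cast<std::size_t>(n));
        return n;
    }
public:
    const char* c_str() const { return buffer_.c_str(); }  // no copy made
};
Hooked into a plain std::ostream (std::ostream out(&buf);), the formatted output lands in the string and buf.c_str() reads it back without copying.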
As for the second bullet
Given that it seems easy, but not trivial, to implement this, is there any variant available via boost or other sources that packages this functionality?
There is Boost.Iostreams and it even contains an example of how to implement an (o)stream Sink with a string.
I came up with a little test implementation to measure it:
#include <string>
#include <boost/iostreams/stream.hpp>
#include <libs/iostreams/example/container_device.hpp> // container_sink
namespace io = boost::iostreams;
namespace ex = boost::iostreams::example;
typedef ex::container_sink<std::wstring> wstring_sink;
struct my_boost_ostr : public io::stream<wstring_sink> {
    typedef io::stream<wstring_sink> BaseT;
    std::wstring result;

    my_boost_ostr() : BaseT(result)
    { }

    // Note: This is non-const because of flush().
    // Suboptimal, but OK for this test.
    const wchar_t* c_str() {
        flush();
        return result.c_str();
    }
};
In the tests I did, using this with its c_str() helper ran slightly faster than a normal ostringstream with its copying str().c_str() version.
I do not include the measuring code. Performance in this area is very brittle; make sure to measure your use case yourself! (For example, the constructor overhead of a string stream is non-negligible.)

Internal "Tee" setup

I have inherited some really old VC6.0 code that I am upgrading to VS2008 for building a 64-bit app. One required feature, implemented long, long ago, is overriding std::cout so its output goes simultaneously to a console window and to a file. The implementation depended on the then-current VC98 library implementation of ostream and, of course, is now irretrievably broken with VS2008.
It would be reasonable to accumulate all the output until program termination and then dump it to a file. I got part of the way home by using freopen(), setvbuf(), and ios::sync_with_stdio(), but to my dismay, the internal library does not treat its buffer as a ring buffer; instead, when it flushes to the output device it restarts at the beginning, so every flush wipes out all my accumulated output.
Converting to a more standard logging function is not desirable, as there are over 1600 usages of "std::cout << " scattered throughout almost 60 files. I have considered overriding ostream's operator<< function, but I'm not sure that will cover me, since there are global operator<< functions that can't be overridden. (Or can they?)
Any ideas on how to accomplish this?
You could write a custom stream buffer and attach it to cout with cout.rdbuf(). Your custom stream buffer would then tee the data to cout's original stream buffer and to a stream buffer stolen from an appropriate ofstream.
ofstream flog("log.txt");
teebuf tb(flog.rdbuf(), cout.rdbuf());
cout.rdbuf(&tb);
For your teeing stream buffer you can get inspiration from this page.
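For illustration, a minimal unbuffered teebuf along those lines (my own sketch, not the code from the linked page), matching the constructor used above:
#include <streambuf>

class teebuf : public std::streambuf {
    std::streambuf* sb1_;
    std::streambuf* sb2_;
public:
    teebuf(std::streambuf* sb1, std::streambuf* sb2) : sb1_(sb1), sb2_(sb2) {}
protected:
    int_type overflow(int_type c) override {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);
        // Forward each character to both underlying buffers.
        int_type const r1 = sb1_->sputc(traits_type::to_char_type(c));
        int_type const r2 = sb2_->sputc(traits_type::to_char_type(c));
        return (traits_type::eq_int_type(r1, traits_type::eof()) ||
                traits_type::eq_int_type(r2, traits_type::eof()))
                   ? traits_type::eof()
                   : c;
    }
    int sync() override {
        // Flush both targets; report failure if either fails.
        int const r1 = sb1_->pubsync();
        int const r2 = sb2_->pubsync();
        return (r1 == 0 && r2 == 0) ? 0 : -1;
    }
};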
You could use the pre-processor:
#define cout MyLogger
to inject new code.

Non-threadsafe file I/O in C/C++

While troubleshooting some performance problems in our apps, I found out that C's stdio.h functions (and, at least for our vendor, C++'s fstream classes) are threadsafe. As a result, every time I do something as simple as fgetc, the RTL has to acquire a lock, read a byte, and release the lock.
This is not good for performance.
What's the best way to get non-threadsafe file I/O in C and C++, so that I can manage locking myself and get better performance?
MSVC provides _fputc_nolock, and GCC provides unlocked_stdio and flockfile, but I can't find any similar functions in my compiler (CodeGear C++Builder).
I could use the raw Windows API, but that's not portable and I assume would be slower than an unlocked fgetc for character-at-a-time I/O.
I could switch to something like the Apache Portable Runtime, but that could potentially be a lot of work.
How do others approach this?
Edit: Since a few people wondered, I had tested this before posting. fgetc doesn't do system calls if it can satisfy reads from its buffer, but it does still do locking, so locking ends up taking an enormous percentage of time (hundreds of locks to acquire and release for a single block of data read from disk). Not doing character-at-a-time I/O would be a solution, but C++Builder's fstream classes unfortunately use fgetc (so if I want to use iostream classes, I'm stuck with it), and I have a lot of legacy code that uses fgetc and friends to read fields out of record-style files (which would be reasonable if it weren't for locking issues).
I'd simply not do I/O a char at a time if performance matters.
fgetc is almost certainly not reading a byte each time you call it (where by 'reading' I mean invoking a system call to perform I/O). Look somewhere else for your performance bottleneck, as this is probably not the problem, and using unsafe functions is certainly not the solution. Any lock handling you do will probably be less efficient than the handling done by the standard routines.
The easiest way would be to read the entire file into memory, and then provide your own fgetc-like interface to that buffer.
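A rough sketch of that idea (the MemFile name is made up for this example):
#include <cstdio>
#include <vector>

// Read the whole file into memory once, then serve characters without locking.
class MemFile {
    std::vector<char> data_;
    std::size_t pos_;
public:
    explicit MemFile(const char* path) : pos_(0) {
        if (FILE* f = std::fopen(path, "rb")) {
            std::fseek(f, 0, SEEK_END);
            long size = std::ftell(f);
            std::fseek(f, 0, SEEK_SET);
            if (size > 0) {
                data_.resize(static_cast<std::size_t>(size));
                std::fread(&data_[0], 1, data_.size(), f);  // one locked read in total
            }
            std::fclose(f);
        }
    }
    // fgetc-like: returns EOF at the end of the buffer, no locking involved.
    int getc() {
        return pos_ < data_.size()
            ? static_cast<unsigned char>(data_[pos_++])
            : EOF;
    }
};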
Why not just memory-map the file? Memory mapping is portable (except on Windows Vista, which requires you to jump through hoops to use it now, the dumbasses). Anyhow, map your file into memory, and do your own locking/not-locking on the resulting memory location.
The OS handles all the locking required to actually read from the disk -- you'll NEVER be able to eliminate this overhead. But your processing overhead, on the other hand, won't be affected by extraneous locking other than what you do yourself.
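As a sketch only, using POSIX calls (on Windows the equivalents are CreateFileMapping/MapViewOfFile); "data.bin" is just a placeholder file name:
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0)
        return 1;

    struct stat st;
    fstat(fd, &st);

    // Map the whole file read-only; afterwards it is just memory.
    void* p = mmap(0, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    const char* bytes = static_cast<const char*>(p);
    long lines = 0;
    for (off_t i = 0; i < st.st_size; ++i)   // char-at-a-time, no locks
        if (bytes[i] == '\n')
            ++lines;
    std::printf("%ld lines\n", lines);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}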
The multi-platform approach is pretty simple: avoid functions or operators where the standard specifies that they should use a sentry. sentry is an inner class of the iostream classes which ensures stream consistency, and in a multi-threaded environment it locks the stream-related mutex for each character being output. This avoids race conditions at a low level but still makes the output unreadable, since strings from two threads might be output concurrently, as the following example shows:
thread 1 should write: abc
thread 2 should write: def
The output might look like adebcf instead of abcdef or defabc, because sentry is implemented to lock and unlock per character.
The standard defines this for all functions and operators dealing with istream or ostream. The only way to avoid it is to use stream buffers and your own locking (per string, for example).
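For example, a small sketch of per-string locking around a shared stream buffer (using boost::mutex since boost is already in play here; the function name write_line is made up):
#include <iostream>
#include <string>
#include <boost/thread/mutex.hpp>

boost::mutex log_mutex;

// Lock once per message, not once per character, and hand the whole
// string to the stream buffer in one call.
void write_line(std::ostream& os, const std::string& line)
{
    boost::mutex::scoped_lock lock(log_mutex);
    os.rdbuf()->sputn(line.data(), static_cast<std::streamsize>(line.size()));
    os.rdbuf()->sputc('\n');
}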
I have written an app which outputs some data to a file and measures the speed. If you add a function here that outputs using the fstream directly, without using the buffer and flush, you will see the speed difference. It uses boost, but I hope that is not a problem for you. Try to remove all the stream buffers and see the difference with and without them. In my case the performance drawback was a factor of 2-3 or so.
The following article by N. Myers explains how locales and sentry in C++ IOStreams work. And you should certainly look up in the ISO C++ Standard document which functions use a sentry.
Good Luck,
Ovanes
#include <vector>
#include <fstream>
#include <iterator>
#include <algorithm>
#include <iostream>
#include <cassert>
#include <cstdlib>
#include <boost/progress.hpp>
#include <boost/shared_ptr.hpp>
double do_copy_via_streambuf()
{
    const size_t len = 1024*2048;
    const size_t factor = 5;
    ::std::vector<char> data(len, 1);
    std::vector<char> buffer(len*factor, 0);
    ::std::ofstream
        ofs("test.dat", ::std::ios_base::binary|::std::ios_base::out);
    noskipws(ofs);
    std::streambuf* rdbuf = ofs.rdbuf()->pubsetbuf(&buffer[0], buffer.size());
    ::std::ostreambuf_iterator<char> oi(rdbuf);
    boost::progress_timer pt;
    for(size_t i=1; i<=250; ++i)
    {
        ::std::copy(data.begin(), data.end(), oi);
        if(0==i%factor)
            rdbuf->pubsync();
    }
    ofs.flush();
    double rate = 500 / pt.elapsed();
    std::cout << rate << std::endl;
    return rate;
}

void count_average(const char* op_name, double (*fct)())
{
    double av_rate=0;
    const size_t repeat = 1;
    std::cout << "doing " << op_name << std::endl;
    for(size_t i=0; i<repeat; ++i)
        av_rate+=fct();
    std::cout << "average rate for " << op_name << ": " << av_rate/repeat
              << "\n\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n"
              << std::endl;
}

int main()
{
    count_average("copy via streambuf iterator", do_copy_via_streambuf);
    return 0;
}
One thing to consider is to build a custom runtime. Most compilers provide the source to the runtime library (I'd be surprised if it weren't in the C++ Builder package).
This could end up being a lot of work, but maybe they've localized the thread support to make something like this easy. For example, with the embedded system compiler I'm using, it's designed for this - they have documented hooks to add the lock routines. However, it's possible that this could be a maintenance headache, even if it turns out to be relatively easy initially.
Another similar route would be to talk to someone like Dinkumware about using a 3rd party runtime that provides the capabilities you need.