Easy way to avoid clog collision from different threads? - c++

I have a multi-threaded program where two separate threads are sending debug output to std::clog and the outputs are interspersed. I would like an easy way to force the output to be kept separate except at line feeds, so the debug output can be more readily interpreted. In some places I've inserted a sleep(1) before the output, and I gather the output into a string before sending it to clog to reduce the chance of collision, but I'd prefer a more robust and surefire solution.
Is there an easy way to ensure that each thread writes a whole line at a time to std::clog before the other thread can get in and write its own line of output?

There's no particularly easy way of doing this, and there's an extended discussion about it here: http://www.cplusplus.com/forum/general/27760/
The problem is somewhat solved there with the creation of a new AtomicStream that writes an entire line atomically, before anything else is streamed (this is done with buffering tricks). You'll need to come up with a similar solution. Sorry for the non-easy answer -- thread synchronization will somehow have to make it into your solution.
This may be derivative, but if your std::clog redirects to a file, you could also have multiple files for the multiple threads.
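The buffering trick mentioned above can be sketched in a few lines. This is an illustrative implementation, not the one from the linked thread: each statement builds its whole line in a thread-local std::ostringstream, and a shared mutex guards the single write to std::clog.

```cpp
#include <iostream>
#include <mutex>
#include <sstream>

// One global mutex guards every write to std::clog.
inline std::mutex& clog_mutex() {
    static std::mutex m;
    return m;
}

// Builds a full line in a per-statement buffer, then emits it to std::clog
// in a single locked write when the temporary is destroyed.
class AtomicLog {
public:
    template <typename T>
    AtomicLog& operator<<(const T& value) {
        buffer_ << value;                      // no lock needed here
        return *this;
    }
    ~AtomicLog() {
        std::lock_guard<std::mutex> lock(clog_mutex());
        std::clog << buffer_.str() << '\n';    // one locked write per line
    }
private:
    std::ostringstream buffer_;
};
```

A statement like `AtomicLog() << "thread " << id << " finished";` then emits exactly one uninterrupted line.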

You ... can't. You're writing to the same stream at the same time. The buffering in clog will help a little, but there are still no guarantees.
Unless you want to synchronize your threads' logging (kinda expensive for what you're doing) maybe you should look at using a logging facility instead (this would allow you to log to say, different files for different things).

Yeah, you're looking for a cross-thread synchronization method. These are usually available in an operating system's API, and you can also find one in Boost.

Related

What is the most efficient way of logging very small amount of data?

Suppose I am logging only 1 integer when a function is called in multi-threaded environment, then what is the best design to implement this mechanism ? Example:
void foo1 () {
log(1);
...
}
void foo2 () {
log(2);
...
}
Following are the possible ways:
Simply log into the file using fprintf(). Problem: isn't it an expensive operation to call a function just to log 1 integer? Correct me if I am wrong.
Store the logged integers into an array buffer and flush it periodically into a file. Problem: if a thread crashes, the process would stop all the threads, so I may lose a lot of the most recent log info.
Any more suggestions for an efficient logging mechanism?
Well, "simple" logging isn't. fprintf makes a jump into the kernel (a context switch), then back to the program (another context switch). That isn't fast, if speed is what you need. You'd probably also need a very, very expensive sync() to make sure the logging data actually makes it to the disk in case of power failure. You really don't want to go there :)
I'd say the buffered method is actually the fastest and the most reasonable tradeoff between speed and reliability. What I'd do is have that buffer, synchronized so it can be safely written by multiple threads. Concurrently I'd run a disk-writer thread that flushes the data to disk once in a while (how often depends a lot on the kind of data you have). I'd use very basic language features, staying closer to plain C land, just because some features (exception handling, multiple inheritance, ...) are too prone to break in special circumstances.
One thing you may not know is that programs do have a say when they crash. You can install handlers for the fatal signals (most signals can be caught and handled by the program, though SIGKILL cannot). While you're in the signal handler, you can flush the log buffer one last time and save more data. And there is also atexit().
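Here is a rough sketch of that last-chance flushing, assuming a POSIX system. The pending log lives in a plain char array so the signal handler only needs async-signal-safe calls such as write() and _exit(); the buffer size, names, and choice of signals are all illustrative:

```cpp
#include <csignal>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

static char g_pending[4096];
static volatile std::size_t g_pending_len = 0;
static int g_log_fd = 2;            // stderr by default

// Call from normal (non-handler) code to buffer a log message.
void log_append(const char* msg) {
    std::size_t n = std::strlen(msg);
    if (g_pending_len + n <= sizeof(g_pending)) {
        std::memcpy(g_pending + g_pending_len, msg, n);
        g_pending_len += n;
    }
}

extern "C" void flush_pending() {
    if (g_pending_len > 0) {
        write(g_log_fd, g_pending, g_pending_len);  // async-signal-safe
        g_pending_len = 0;
    }
}

extern "C" void on_fatal_signal(int sig) {
    flush_pending();                // last-chance flush
    _exit(128 + sig);               // async-signal-safe exit
}

void install_crash_flush() {
    std::atexit(flush_pending);             // normal exit path
    std::signal(SIGSEGV, on_fatal_signal);  // crashes (SIGKILL can't be caught)
    std::signal(SIGTERM, on_fatal_signal);
}
```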
You could either take a look at a logging library like Boost.Log, or look at wrapping std::cout, cerr, cin (or the file you log to) with mutexes. Because the streams are buffered, this shouldn't mean continuously writing small amounts to the file.

Threading the writing in a log file

I'm setting up a log system for my (2d) game engine, and it should be able to write lines to a file.
The point is, writing to the disk is not instantaneous. If the file writing (basically, the file.flush()) is done in the thread that is calling Trace.Write(), will that thread hang while the file is being written?
If it is the case, then it would be interesting to create a thread used only to write the log lines to the log file, while the processing thread would continue what it is doing.
Same question with the console (while I'm here...).
The question is :
"Is it interesting in a calculation intensive program, to thread the console and/or file writing ?"
Thank you.
Yes, your thread may be suspended while it is in an IOWAIT state. This is a classical suspend situation.
If it is a good idea to create a thread only responsible for writing logfile entries depends on your code. Is it I/O bound? Then it might be a good idea. Is your code CPU bound? Then it won't help much. Is it neither? Then it doesn't matter.
The best way to figure this out is to analyze your code and benchmark the two versions.
If you queue off the log writes to a dedicated logging thread, there are many advantages. The big disadvantage is that the logging will almost certainly not have happened when your log call returns. If the problem you are trying to catch is a disastrous crash, the log entry that identifies the bug may not get written at all.
Is it interesting in a calculation intensive program, to thread the console and/or file writing ?
In general, given the caveat above, probably yes.
If the file writing (basically, the file.flush()) is done in the thread that is calling Trace.Write(), will it hang while the file is being written?
Yes. This is because the flush() call is designed to ensure the data hits the disk.
If it is the case, then it would be interesting to create a thread used only to write the log lines to the log file, while the processing thread would continue what it is doing.
Why not just stop calling flush()? If you're not interested in making absolutely sure that, by a certain part of the program, all the data written so far is on the disk, just stop calling flush() manually, and it'll get buffered and written out in the usual efficient manner.
Ultimately there might be some small benefit of having the log writes in another thread, if the disk writing system requires periodic syncs that hang the thread (which I'm not confident is the case), but I would expect that you lose far more than you gain by having to implement synchronisation on however you pass your loggable strings to the background thread. Then you start getting into wondering whether you can use a lock-free queue or some other complex system when really you probably just needed to do it the simple way in the first place - write whenever you like, only flush when absolutely necessary.

console out in multi-threaded applications

When developing applications I usually print to the console to get useful debugging/tracing information. The application I am working on now is multi-threaded, and sometimes I see my printf outputs overlapping each other.
I tried to synchronize the screen using a mutex, but I end up slowing down and blocking the app. How do I solve this issue?
I am aware of MT logging libraries, but since I log a lot, using them slows my app down a bit.
I was thinking of the following idea: instead of logging within my application, why not log outside it? I would like to send the logging information via a socket to a second process that actually prints it out on the screen.
Are you aware of any library already doing this?
I use Linux/gcc.
thanks
afg
You have 3 options. In increasing order of complexity:
Just use a simple mutex within each thread. The mutex is shared by all threads.
Send all the output to a single thread that does nothing but the logging.
Send all the output to a separate logging application.
Under most circumstances, I would go with #2. #1 is fine as a starting point, but in all but the most trivial applications you can run into problems serializing the application. #2 is still very simple, and simple is a good thing, but it is also quite scalable. You still end up doing the processing in the main application, but for the vast majority of applications you gain nothing by spinning this off to its own dedicated application.
Number 3 is what you're going to do in performance-critical server-type applications, but the minimal performance gain you get with this approach is 1: very difficult to achieve, 2: very easy to screw up, and 3: not the only or even the most compelling reason people generally take this approach. Rather, people typically take this approach when they need the logging service to be separated from the applications using it.
Which OS are you using?
Not sure about specific libraries, but one of the classical approaches to this sort of problem is to use a logging queue, which is serviced by a writer thread whose job is purely to write the log file.
You need to be aware, with either the threaded approach or the multi-process approach, that the write queue may back up, meaning it needs to be managed, either by discarding entries or by slowing down your application (which is obviously easier with the threaded approach).
It's also common to have some way of categorising your logging output, so that you can have one section of your code logging at a high level, whilst another section of your code logs at a much lower level. This makes it much easier to manage the amount of output that's being written to files and offers you the option of releasing the code with the logging in it, but turned off so that it can be used for fault diagnosis when installed.
As far as I know, a critical section is lighter-weight than a mutex; see the MSDN documentation on critical section objects and how to use them.
If you use gcc, you could use atomic accesses (see the gcc atomic built-ins documentation).
Frankly, a mutex is the only way you really want to do this, so it's always going to be slow in your case because you're using so many print statements. To solve your problem, then: don't use so many printf statements; that's your problem to begin with.
Okay, is your solution using a mutex around the print itself? Perhaps you should instead have a mutex around a message queue that another thread processes and prints; that has a potential hang-up, but I think it will be faster. So, use an active logging thread that spins waiting for incoming messages to print. The networking solution could work too, but that requires more work; try this first.
What you can do is to have one queue per thread, and have the logging thread routinely go through each of these and post the message somewhere.
This is fairly easy to set up and the amount of contention can be very low (just a pointer swap or two, which can be done w/o locking anything).

Asynchronous thread-safe logging in C++

I'm looking for a way to do asynchronous and thread-safe logging in my C++ project, if possible to one file. I'm currently using cerr and clog for the task, but since they are synchronous, execution pauses briefly every time something is logged. It's a relatively graphics-heavy app, so this kind of thing is quite annoying.
The new logger should use asynchronous I/O to get rid of these pauses. Thread-safety would also be desirable as I intend to add some basic multithreading soon.
I considered a one-file-per-thread approach, but that seemed like it would make managing the logs a nightmare. Any suggestions?
I noticed this 1 year+ old thread. Maybe the asynchronous logger I wrote could be of interest.
http://www.codeproject.com/KB/library/g2log.aspx
G2log uses a protected message queue to forward log entries to a background worker that does the slow disk accesses.
I have tried it with a lock-free queue, which increased the average time for a LOG call but decreased the worst-case time; however, I am using the protected queue now as it is cross-platform. It's tested on Windows/Visual Studio 2010 and Ubuntu 11.10/gcc4.6.
It's released as public domain so you can do with it what you want with no strings attached.
This is VERY possible and practical. How do I know? I wrote exactly that at my last job. Unfortunately (for us), they now own the code. :-) Sadly, they don't even use it.
I intend on writing an open source version in the near future. Meanwhile, I can give you some hints.
I/O manipulators are really just function names. You can implement them for your own logging class so that your logger is cout/cin compatible.
Your manipulator functions can tokenize the operations and store them into a queue.
A thread can be blocked on that queue waiting for chunks of log to come flying through. It then processes the string operations and generates the actual log.
This is intrinsically thread compatible since you are using a queue. However, you still would want to put some mutex-like protection around writing to the queue so that a given log << "stuff" << "more stuff"; type operation remains line-atomic.
Have fun!
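In the absence of that open-source version, here is one way the hints above might look in code: operator<< tokenizes into a local buffer, and the destructor enqueues the whole line in one shot, which keeps a `log << "stuff" << "more stuff";` operation line-atomic. All names are my own, not from the original logger:

```cpp
#include <deque>
#include <mutex>
#include <sstream>
#include <string>

// A shared queue that a background thread would drain to the real log.
struct LogQueue {
    std::mutex mutex;
    std::deque<std::string> lines;
    void push(std::string line) {
        std::lock_guard<std::mutex> lock(mutex);
        lines.push_back(std::move(line));
    }
};

// cout-compatible front end: operator<< accumulates into a local buffer,
// and the whole line is enqueued once when the temporary is destroyed.
class LogLine {
public:
    explicit LogLine(LogQueue& q) : queue_(q) {}
    template <typename T>
    LogLine& operator<<(const T& v) { buffer_ << v; return *this; }
    ~LogLine() { queue_.push(buffer_.str()); }   // line-atomic hand-off
private:
    LogQueue& queue_;
    std::ostringstream buffer_;
};
```

Each `LogLine(q) << ...;` statement contributes exactly one entry to the queue, no matter how many `<<` operations it chains.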
I think the proper approach is not one-file-per-thread, but one-thread-per-file. If any one file (or resource in general) in your system is only ever accessed by one thread, thread-safe programming becomes so much easier.
So why not make the Logger a dedicated thread (or several threads, one per file, if you're logging different things in different files)? In all other threads, writing to the log would place the message on the input queue of the appropriate Logger thread, which would get to it after it's done writing the previous message. All it takes is a mutex to protect the queue from adding an event while the Logger is reading an event, and a condvar for the Logger to wait on when its queue is empty.
Have you considered using a logging library?
There are several available; I discovered Pantheios recently and it really seems to be quite incredible.
It's more of a logging front end: you can customize which back end is used. It can interact with ACE or log4cxx, for example, and it seems really easy to use and configure. Its main advantage is that it uses type-safe operators, which is always great.
If you just want a barebone logging library:
ACE
log4c*
Boost.Log
Pick any :)
I should note that it's possible to implement lock-free queues in C++ and that they are great for logging.
I had the same issue and I believe I have found the perfect solution. I present to you, a single-header library called loguru: https://github.com/emilk/loguru
It's simple to use, portable, configurable, and macro-based, and by default it doesn't #include anything (for those sweet, sweet compilation times).

On MacOSX, in a C++ program, what guarantees can I have on file IO

I am on MacOSX.
I am writing a multi threaded program.
One thread does logging.
The non-logging threads may crash at any time.
What conventions should I adopt in the logger / what guarantees can I have?
I would prefer a solution where, even if I crash during part of a write, previous writes still go to disk, and when reading back the log I can figure out "ah, I wrote 100 complete entries, then I crashed on the 101st".
Thanks!
I program on Linux, not MacOSX, but probably it's the same there.
If only one thread in your program logs, it means that you buffer the logging data in this logging thread and then write it to a file, probably in larger portions, to avoid too many I/O operations and make the logging process faster.
The bad thing is that if one thread segfaults, the whole process is destroyed along with the buffered data.
The solutions (for Linux) I know of are:
Transfer the logging data through a socket, without using a buffering logging thread (syslog, for example). In this case the OS will probably take care of the data written to the socket, and even if your application crashes, the data should be received on the other end and logged successfully.
Don't use a logging thread; every thread can log synchronously to a file. In this case the loss of log data after a crash should be very small or none. It's slower, though.
I don't know of better solutions to this problem yet; it would be interesting to learn of some, though.
As Dmitry says, there are only a few options to ensure you actually capture the logging output. Do you really need to write your own? And does it really need to be on another thread? That may introduce a timing window in which a crash misses logs, when you normally want to log synchronously.
The syslog facility on Unix is the standard means for reliable logging for system services. It essentially solves these sorts of problems you describe (ie. logs are processed out-of-process, so if you crash, your logs still get saved).
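For reference, using syslog from C++ takes only a few calls; the program tag "mygame" and the helper names are arbitrary examples:

```cpp
#include <syslog.h>

// Messages go to the syslog daemon out-of-process, so they survive a
// crash of this program.
void init_logging() {
    openlog("mygame", LOG_PID, LOG_USER);   // tag, options, facility
}

void log_event(int id) {
    syslog(LOG_INFO, "event %d", id);       // printf-style formatting
}

void shutdown_logging() {
    closelog();
}
```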
If your application is aimed only at Mac OS X, you should have a look at the Apple System Log facility (ASL). It provides a more sophisticated API than syslog and a superset of its functionality.