Threading the writing of a log file - C++

I'm setting up a log system for my (2D) game engine, and it should be able to write lines to a file.
The thing is, writing to disk is not instantaneous. If the file writing (basically, the file.flush()) is done in the thread that is calling Trace.Write(), will that thread hang while the file is being written?
If that is the case, it would be worthwhile to create a thread used only to write the log lines to the log file, while the processing thread continues with whatever it is doing.
Same question for the console (while I'm here...).
The question is:
"Is it worthwhile, in a calculation-intensive program, to thread the console and/or file writing?"
Thank you.

Yes, your thread may be suspended while it is in an IOWAIT state. This is a classic reason for a thread to be suspended.
Whether it is a good idea to create a thread responsible only for writing log-file entries depends on your code. Is it I/O-bound? Then it might be a good idea. Is it CPU-bound? Then it won't help much. Is it neither? Then it doesn't matter.
The best way to figure this out is to profile your code and benchmark the two versions.
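To illustrate the benchmarking advice, here's a minimal sketch (assuming a C++11 compiler; the file name and iteration count are arbitrary). It times unflushed, buffered writes; uncommenting the flush() line gives the per-line-flush variant for comparison.

#include <chrono>
#include <fstream>
#include <iostream>

int main() {
    std::ofstream f("bench.log");              // arbitrary test file
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < 100000; ++i) {
        f << "log line " << i << '\n';         // buffered write, no flush
        // f.flush();                          // uncomment to time per-line flushing
    }
    auto t1 = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
              << " ms\n";
}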

If you queue the log writes off to a dedicated logging thread, there are many advantages. The big disadvantage is that the logging will almost certainly not have happened yet when your log call returns. If the problem you are trying to catch is a disastrous crash, the log entry that identifies the bug may not get written at all.
Is it worthwhile, in a calculation-intensive program, to thread the console and/or file writing?
In general, given the caveat above, probably yes.
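As a rough illustration of the queued approach, here's a minimal C++11 sketch (the class and member names are made up for the example): callers enqueue a line and return immediately, and a dedicated worker thread owns the file. The caveat above still applies: lines that are queued but not yet written are lost in a hard crash, though the destructor drains the queue on a clean shutdown.

#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class AsyncLogger {
    std::queue<std::string> queue_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool done_ = false;
    std::thread worker_;            // declared last so the other members exist first
public:
    explicit AsyncLogger(const std::string& path)
        : worker_(&AsyncLogger::run, this, path) {}

    ~AsyncLogger() {
        { std::lock_guard<std::mutex> lock(mutex_); done_ = true; }
        cv_.notify_one();
        worker_.join();             // drains whatever is still queued
    }

    void write(std::string line) {  // called from any thread; returns immediately
        { std::lock_guard<std::mutex> lock(mutex_); queue_.push(std::move(line)); }
        cv_.notify_one();
    }

private:
    void run(const std::string& path) {
        std::ofstream file(path.c_str());
        std::unique_lock<std::mutex> lock(mutex_);
        while (!done_ || !queue_.empty()) {
            cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
            while (!queue_.empty()) {
                std::string line = std::move(queue_.front());
                queue_.pop();
                lock.unlock();      // do the slow I/O without holding the lock
                file << line << '\n';
                lock.lock();
            }
        }
    }
};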

If the file writing (basically, the file.flush()) is done in the thread that is calling Trace.Write(), will that thread hang while the file is being written?
Yes. This is because the flush() call is designed to push everything out of the buffer, and it does not return until the data has at least been handed off to the OS on its way to the disk.
If that is the case, it would be worthwhile to create a thread used only to write the log lines to the log file, while the processing thread continues with whatever it is doing.
Why not just stop calling flush()? If you're not interested in making absolutely sure that, by a certain point in the program, all the data written so far is on the disk, just stop calling flush() manually, and it'll get buffered and written out in the usual efficient manner.
Ultimately there might be some small benefit to having the log writes in another thread, if the disk-writing system requires periodic syncs that hang the thread (which I'm not confident is the case), but I would expect you to lose far more than you gain once you have to implement synchronisation on whatever mechanism passes your loggable strings to the background thread. Then you start wondering whether you can use a lock-free queue or some other complex system, when really you probably just needed to do it the simple way in the first place: write whenever you like, and only flush when absolutely necessary.
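A minimal sketch of that simple way (the class and file names are illustrative): write lines with '\n' rather than std::endl (which forces a flush), and expose an explicit checkpoint for the rare places where durability actually matters.

#include <fstream>
#include <string>

class Log {
    std::ofstream file_;
public:
    explicit Log(const std::string& path) : file_(path.c_str()) {}

    void write(const std::string& line) {
        file_ << line << '\n';      // buffered; no flush, so no wait for the disk
    }

    void checkpoint() {
        file_.flush();              // call only where durability really matters
    }
};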

Related

How to write off an expensive file I/O in the middle of a C++ program

I am working on some code that is extremely demanding performance-wise (I am using microsecond timers!). The thing is, it has a server<->client architecture where a lot of data is shared at high speed. To keep the client and the server in sync, a simple "sequence number" based approach is followed, such that if the client's program crashes, the client can "resume" communication by sending the server the last sequence number, and they can "resume operations" without missing anything.
The issue with this is that I am forced to write sequence numbers to disk. Sadly this has to be done on every "transaction". This file write causes huge time costs (as we would expect).
So I thought I would use threads to get around the problem. However, if I create a regular thread, I would have to wait until the file write finishes anyway, and if I use a detached thread, I am doing something risky: the thread might not finish when my actual process is killed (say), and thus the sequence number gets messed up.
What are my options here? Kindly note that, sadly, I do not have access to C++11. I am using pthreads (-lpthread) on Linux.
You can just add the data to a queue and have a secondary thread dequeue it, write it, and signal when it's done.
You can also take some inspiration from log-structured file systems. They get around this problem by having the main thread first write a small record to a log file and return control immediately to the rest of the program. Meanwhile, secondary threads carry out the actual data write and signal completion by also writing to the log file. This helps you maintain throughput by deferring writes to when more system resources are available, and it doesn't block the main thread.
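Since C++11 is not available, here is a minimal pthread sketch of the queued approach (the file and function names are made up). It exploits the fact that only the latest sequence number matters, so the "queue" can be a single slot that newer numbers simply overwrite:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t mtx  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static long pending = -1;          /* -1 is a sentinel for "nothing queued" */
static int  shutting_down = 0;

void enqueue_seq(long seq) {       /* called on every transaction; returns at once */
    pthread_mutex_lock(&mtx);
    pending = seq;                 /* newer numbers supersede older ones */
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mtx);
}

void* writer_main(void* arg) {
    (void)arg;
    FILE* f = fopen("seq.dat", "w");
    pthread_mutex_lock(&mtx);
    while (!shutting_down || pending != -1) {
        while (pending == -1 && !shutting_down)
            pthread_cond_wait(&cond, &mtx);
        if (pending != -1) {
            long seq = pending;
            pending = -1;
            pthread_mutex_unlock(&mtx);
            rewind(f);             /* sequence numbers only grow, so no stale tail */
            fprintf(f, "%ld\n", seq);
            fflush(f);             /* the slow part happens outside the lock */
            pthread_mutex_lock(&mtx);
        }
    }
    pthread_mutex_unlock(&mtx);
    fclose(f);
    return NULL;
}

int main(void) {
    pthread_t writer;
    pthread_create(&writer, NULL, writer_main, NULL);
    enqueue_seq(42);               /* one call per transaction */
    pthread_mutex_lock(&mtx);      /* orderly shutdown: flag, wake, join */
    shutting_down = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mtx);
    pthread_join(writer, NULL);
    return 0;
}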

What is the most efficient way of logging a very small amount of data?

Suppose I am logging only one integer when a function is called in a multi-threaded environment; what is the best design for implementing this mechanism? Example:
void foo1() {
    log(1);
    // ...
}
void foo2() {
    log(2);
    // ...
}
Following are the possible ways:
1. Simply log into the file using fprintf(). Problem: isn't it an expensive operation to call a function just to log one integer? Correct me if I am wrong.
2. Store the logged integers in an array buffer and flush it periodically to a file. Problem: if a thread crashes, the process would stop all the threads, so I may lose a lot of the most recent log info.
Any more suggestions for an efficient logging mechanism?
Well, "simple" logging isn't. fprintf will make a jump to kernel (context switch), then back to program (also context switch). Ain't fast, if speed is what you need. You'd probably also need a very, very expensive sync() to make sure the logging data actually makes it to the disk in case of power failure. You really don't want to go there :)
I'd say that the buffered method is actually the fastest and most reasonable tradeoff between speed and reliability. What I'd do is have that buffer, synchronized to be safely written by multiple threads. Concurrently I'd run a disk-writer thread that would flush data to disk once in a while (depends a lot on kind of data you have). I'd use very basic language feature, going more into the plain C land, just because some features (exception handling, multiple inheritance..) are just too prone to break in special circumstances.
One thing you maybe don't know, is that programs do have a say when they crash. You can subscribe to program killing signals (some signals can be cancelled by program, but killing signal isn't one of them). While you're in signal handling, you can flush the log buffer one last time and save more data. And there is also atexit().
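A minimal sketch of that last-chance flush (the names g_log_fd, g_buf and g_len and the file crash.log are made up for the example). Note that only async-signal-safe functions may be called from a signal handler: write(2) is safe, fprintf/fflush are not.

#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static int g_log_fd = -1;              /* opened once at startup */
static char g_buf[1 << 16];            /* in-memory log buffer */
static volatile size_t g_len = 0;      /* bytes currently buffered */

static void fatal_handler(int sig) {
    if (g_log_fd != -1 && g_len > 0)
        write(g_log_fd, g_buf, g_len); /* last-chance flush; write(2) is safe here */
    signal(sig, SIG_DFL);              /* restore and re-raise for the default action */
    raise(sig);
}

int main(void) {
    g_log_fd = open("crash.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    signal(SIGSEGV, fatal_handler);    /* SIGKILL cannot be caught */
    signal(SIGABRT, fatal_handler);
    /* ... normal operation appends entries to g_buf and bumps g_len ... */
    return 0;
}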
You could either take a look at a logging library like Boost.Log, or alternatively look at wrapping std::cout, std::cerr and std::cin (or the file you log to) with mutexes; because they are buffered, it shouldn't continuously be writing small amounts to the file.

Easy way to avoid clog collision from different threads?

I have a multi-threaded program where two separate threads send debug output to std::clog, and the outputs are interspersed. I would like to find an easy way to force the output to be kept separate except at line feeds, so that the debug output can be more readily interpreted. In some places I've inserted a sleep(1) before the output, and I gather the output into a string before sending it to clog, to reduce the chance of a collision, but I'd prefer a more robust and surefire solution.
Is there an easy way to ensure that each thread writes a whole line at a time to std::clog before the other thread can get in and write its own line of output?
There's no particularly easy way of doing this, and there's an extended discussion about it here: http://www.cplusplus.com/forum/general/27760/
The problem is somewhat solved there with the creation of a new AtomicStream that writes an entire line atomically, before anything else is streamed (this is done with buffering tricks). You'll need to come up with a similar solution. Sorry for the non-easy answer -- thread synchronization will somehow have to make it into your solution.
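As an illustration of that idea, here's a minimal sketch (assuming C++11's std::mutex; a Boost or pthread mutex plays the same role on older compilers): each statement builds its whole line in a private buffer, and the destructor emits it to std::clog in one piece under a lock.

#include <iostream>
#include <mutex>
#include <sstream>

class LineLog {
    std::ostringstream buf_;                 // private, per-statement buffer
    static std::mutex& mutex() { static std::mutex m; return m; }
public:
    template <typename T>
    LineLog& operator<<(const T& v) { buf_ << v; return *this; }

    ~LineLog() {                             // runs at the end of the full expression
        std::lock_guard<std::mutex> lock(mutex());
        std::clog << buf_.str() << '\n';     // the whole line goes out atomically
    }
};

// Usage: LineLog() << "thread " << id << " reached step " << n;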
This may be a bit of a sidestep, but if your std::clog redirects to a file, you could also have multiple files for the multiple threads.
You ... can't. You're writing to the same stream at the same time. The buffering in clog will help a little, but there are still no guarantees.
Unless you want to synchronize your threads' logging (kinda expensive for what you're doing) maybe you should look at using a logging facility instead (this would allow you to log to say, different files for different things).
Yeah, you're looking for a cross-thread synchronization method. These are usually available in an operating system's API; you can also find one in Boost.

Is there a way to abort an SQLite call?

I'm using SQLite3 in a Windows application. I have the source code (so-called SQLite amalgamation).
Sometimes I have to execute heavy queries. That is, I call sqlite3_step on a prepared statement, and it takes a long time to complete (due to the heavy I/O load).
I wonder if there's a way to abort such a call. I would also be glad if there were a way to do some background processing in the middle of the call within the same thread (since most of the time is spent waiting for the I/O to complete).
I thought about modifying the SQLite code myself. In the simplest scenario I could check some condition (like an abort event handle, for instance) before every invocation of ReadFile/WriteFile, and return an error code appropriately. And to allow the background processing, the file should be opened in overlapped mode (which enables asynchronous ReadFile/WriteFile).
Is there a chance that interrupting WriteFile may, in some circumstances, leave the database in an inconsistent state, even with the journal enabled? I guess not, since the whole idea of the journal file is to be prepared for an error of any kind. But I'd like to hear more opinions on this.
Also, has anyone tried something similar?
Thanks in advance.
EDIT:
Thanks to ereOn. I wasn't aware of the existence of sqlite3_interrupt. This probably answers my question.
Now, for all of you who wonder how (and why) one would do background processing during the I/O within the same thread:
Unfortunately, not many people are familiar with so-called "overlapped I/O".
http://en.wikipedia.org/wiki/Overlapped_I/O
With it, one issues an I/O operation asynchronously, and the calling thread is not blocked. One then receives the I/O completion status through one of the completion mechanisms: a waitable event, a completion routine queued as an APC, or a completion port.
Using this technique one doesn't have to create extra threads. Actually, the only real justification for creating threads is when your bottleneck is computation time (i.e. CPU load) and the machine has several CPUs (or cores).
Creating a thread just to let it be blocked by the OS most of the time makes no sense. It leads to an unjustified waste of OS resources and complicates the program (with the need for synchronization, etc.).
Unfortunately, not all libraries/APIs allow an asynchronous mode of operation, which makes creating extra threads a necessary evil.
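For readers unfamiliar with the technique, here's a minimal Win32 sketch (the file name and the polling loop are illustrative only): the read is issued asynchronously, and the same thread keeps doing useful work until the operation completes.

#include <windows.h>
#include <stdio.h>

int main(void) {
    HANDLE h = CreateFileA("data.bin", GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    OVERLAPPED ov = {0};
    ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);   /* manual-reset event */

    char buf[4096];
    if (!ReadFile(h, buf, sizeof buf, NULL, &ov) &&
        GetLastError() == ERROR_IO_PENDING) {
        while (WaitForSingleObject(ov.hEvent, 0) == WAIT_TIMEOUT) {
            /* ... background processing on the same thread ... */
        }
    }
    DWORD bytes = 0;
    GetOverlappedResult(h, &ov, &bytes, FALSE);         /* collect the result */
    printf("read %lu bytes\n", (unsigned long)bytes);
    CloseHandle(ov.hEvent);
    CloseHandle(h);
    return 0;
}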
EDIT2:
I've already found the solution, thanks to ereOn.
For all those who nevertheless insist that it's not worth doing things "in the background" while "waiting" for the I/O to complete using overlapped I/O: I disagree, and I think there's no point arguing about it. At least it's not related to the subject.
I'm a Windows programmer (as you may have noticed), and I have very extensive experience with all kinds of multitasking. I'm also a driver writer, so I know how things work "behind the scenes".
I know it's "common practice" to create several threads to do several things "in parallel". But that doesn't mean it's good practice. Please allow me not to follow the "common practice".
I don't understand why you want the interruption to come from the same thread, and I don't even understand how that would be possible: if the current thread is blocked, waiting for some I/O, you can't execute any other code. (Yeah, that's what "blocked" means.)
Perhaps if you give us more hints about why you want this, we might help further.
Usually, I use sqlite3_interrupt() to cancel calls. But this, obviously, requires that the call be made from another thread.
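A minimal sketch of that pattern (std::thread assumes C++11; a Win32 thread works the same way, and the five-second budget is arbitrary): a small watchdog thread calls sqlite3_interrupt() while this thread sits in sqlite3_step().

#include <sqlite3.h>
#include <atomic>
#include <chrono>
#include <thread>

void run_with_watchdog(sqlite3* db, sqlite3_stmt* stmt) {
    std::atomic<bool> done(false);
    std::thread watchdog([&] {
        for (int i = 0; i < 50 && !done; ++i)      // poll so we can stop early
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        if (!done)
            sqlite3_interrupt(db);   // makes the pending step return SQLITE_INTERRUPT
    });

    int rc;
    while ((rc = sqlite3_step(stmt)) == SQLITE_ROW) {
        // ... consume the row ...
    }
    // rc is SQLITE_DONE on success, SQLITE_INTERRUPT if the watchdog fired

    done = true;
    watchdog.join();
}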
By default, SQLite is thread-safe. It sounds to me like the easiest thing to do would be to start the SQLite command on a background thread, and let SQLite do the necessary locking to make that work.
From your perspective, the SQLite call then looks like an asynchronous bit of I/O, and you can continue normal processing on this thread, such as running a loop with an interruptible sleep and a bit of occasional background processing (e.g. to update a liveness indicator). When the SQLite statement completes, the background thread should set a state variable to indicate this, wake the main thread (if necessary), and terminate.

On MacOSX, in a C++ program, what guarantees can I have on file IO

I am on MacOSX.
I am writing a multi threaded program.
One thread does logging.
The non-logging threads may crash at any time.
What conventions should I adopt in the logger / what guarantees can I have?
I would prefer a solution where, even if I crash during part of a write, previous writes still go to disk, and when reading back the log I can figure out: "ah, I wrote 100 complete entries, then I crashed on the 101st".
Thanks!
I program on Linux, not MacOSX, but it's probably the same there.
If only one thread in your program logs, it means you buffer the logging data in that logging thread and then write it to a file, probably in fairly large chunks, to avoid too many I/O operations and make the logging process faster.
The bad thing is that if one thread segfaults, the whole process is destroyed along with the buffered data.
The solutions (for Linux) I know of are:
1. Transfer the logging data through a socket, without a buffering logging thread (syslog, for example). In this case the OS will take care of the data written to the socket, and even if your application crashes, the data should be received on the other end and logged successfully.
2. Don't use a logging thread; let every thread log synchronously to a file (see the sketch after this answer). In this case the loss of log data after a crash should be very small or none. It's slower, though.
I don't know of better solutions to this problem yet; it would be interesting to learn of some.
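A minimal sketch of the second option (the file name and buffer size are illustrative): each record goes out as one write(2) call on an O_APPEND descriptor, so nothing lingers in a user-space buffer when a thread crashes, and on local filesystems small appended writes from different threads do not interleave mid-record in practice.

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static int log_fd = -1;

void log_line(const char* msg) {
    char rec[256];
    size_t n = strlen(msg);
    if (n > sizeof rec - 1) n = sizeof rec - 1;
    memcpy(rec, msg, n);
    rec[n] = '\n';                    /* a complete, newline-terminated record */
    write(log_fd, rec, n + 1);        /* one syscall per record, no buffering */
}

int main(void) {
    log_fd = open("app.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    log_line("wrote entry 100");
    close(log_fd);
    return 0;
}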
As Dmitry says, there are only a few options for ensuring you actually capture the logging output. Do you really need to write your own? And does it really need to be on another thread? That may introduce a timing window in which a crash misses logs, when you normally want to log synchronously.
The syslog facility on Unix is the standard means of reliable logging for system services. It essentially solves the sort of problem you describe (i.e. logs are processed out-of-process, so if you crash, your logs still get saved).
If your application is aimed only at Mac OS X, you should have a look at the Apple System Log facility (ASL). It provides a more sophisticated API than syslog and a superset of its functionality.
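For completeness, a minimal sketch of the plain syslog route (the identifier "mylogger" is made up): the message is handed off to the syslog daemon, so it survives a crash of the process that produced it.

#include <syslog.h>

int main(void) {
    openlog("mylogger", LOG_PID, LOG_USER);    /* tag each entry with our name and PID */
    syslog(LOG_INFO, "wrote entry %d", 101);   /* delivered out-of-process by syslogd */
    closelog();
    return 0;
}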