UPDATE 14 June 2011
A quick update... Most respondents have focused on the dodgy method for handling the queue of messages to be logged. While there is certainly a lack of optimisation there, it's not the root of the problem. We switched the Yield over to a short sleep (yes, the Yield did result in 100% CPU once the system went quiet), but the system still can't keep up with the logging even when it's going nowhere near that sleep. From what I can see, the Send is just not very efficient. One respondent suggested we batch the Send() calls together into one send, which seems the most appropriate solution to the larger underlying issue, and that's why I have marked it as the answer to the original question. I certainly agree the queue model is very flawed, though, so thanks for the feedback on that; I have up-voted all answers that contributed to the discussion.
However, this exercise got us to review why we're logging over a socket to an external process at all. While it may well have made sense previously, when the logging server did lots of processing on the log entries, it no longer does any of that, so we have opted to remove that entire module and go for a direct-to-file approach via a pre-existing logging framework. This should eliminate the problem entirely, as well as remove unnecessary complexity from the system.
Thanks again for all the feedback.
ORIGINAL QUESTION
In our system we have two components important to this problem - one is developed in Visual C++ and the other is Java (don't ask, historic reasons).
The C++ component is the main service and generates log entries. These log entries are sent via a CSocket::Send out to a Java logging service.
The problem
Performance of sending data seems very low. If we queue on the C++ side then the queue gets backed up progressively on busier systems.
If I hit the Java Logging Server with a simple C# application then I can hammer it way faster than I will ever need to from the C++ tool and it keeps up beautifully.
In the C++ world, the function that adds messages to the queue is:
void MyLogger::Log(const CString& buffer)
{
    struct _timeb timebuffer;
    _ftime64_s( &timebuffer );

    CString message;
    message.Format("%d%03d,%04d,%s\r\n", (int)timebuffer.time, (int)timebuffer.millitm, GetCurrentThreadId(), (LPCTSTR)buffer);

    CString* queuedMessage = new CString(message);
    sendMessageQueue.push(queuedMessage);
}
The function that sends to the socket, run in a separate thread, is:
void MyLogger::ProcessQueue()
{
    CString* queuedMessage = NULL;
    while(!sendMessageQueue.try_pop(queuedMessage))
    {
        if (!running)
        {
            break;
        }
        Concurrency::Context::Yield();
    }

    if (queuedMessage == NULL)
    {
        return;
    }
    else
    {
        socket.Send((LPCTSTR)*queuedMessage, queuedMessage->GetLength());
        delete queuedMessage;
    }
}
Note that ProcessQueue is run repeatedly by the thread's outer loop, which (excluding a bunch of nonsense preamble) is:
while(parent->running)
{
    try
    {
        logger->ProcessQueue();
    }
    catch(...)
    {
    }
}
The queue is:
Concurrency::concurrent_queue<CString*> sendMessageQueue;
So the effect we're seeing is that the queue is just getting bigger and bigger, log entries are being sent out to the socket but at a much lower rate than they're going in.
Is this a limitation of CSocket::Send that makes it less than useful for us? A mis-use of it? Or an entire red-herring and the problem lies elsewhere?
Your advice is much appreciated.
Kind Regards
Matt Peddlesden
Well, you could start by using a blocking producer-consumer queue and getting rid of the 'Yield'. I'm not surprised that messages get backed up - when one is posted, the logger thread is typically, on a busy system, ready but not running. This introduces a lot of avoidable latency before any message on the queue can be processed. The background thread then has a quantum to try and get rid of all the messages that have accumulated on the queue. If there are a lot of ready threads on a busy system, it could well be that the thread just does not get sufficient time to handle the messages, especially if a lot have built up and the socket.Send blocks.
Also, almost completely wasting one CPU core on queue polling cannot be good for overall performance.
Rgds,
Martin
In my opinion, you're definitely not looking at the most efficient solution. You should call Send() once, for all messages: concatenate all the messages in the queue on the user side, send them all at once with Send(), then yield.
In addition, this really isn't how you're meant to do it. The PPL contains constructs explicitly intended for asynchronous callbacks, like the call object. You should use that instead of hand-rolling your own.
Things here that might be slowing you up:
The queue you are using. I think this is a classic example of premature optimization. There's no reason here to use the Concurrency::concurrent_queue class rather than a regular message queue with a blocking pop() method. If I understand correctly, the Concurrency classes use non-blocking algorithms, whereas in this case you do want to block while the queue is empty and release the CPU for other threads to use.
The use of new and delete for each message, and the inner allocations of the CString class. You should try recycling the messages and strings (using a pool) and see if it helps performance, for two reasons: 1. you avoid the allocation and deallocation of the message and string objects themselves, and 2. the allocations done inside the strings can maybe be avoided if the string class internally recycles its buffers.
Have you tried profiling to see where your application is having trouble? Is it only when logging that there are issues with the sender? Is it CPU bound or blocking?
The only thing I can see is that you don't protect the message queue with any sort of locking so the container state could get weird causing all sorts of unexpected behavior.
Related
Recently I ran into a rather common problem about using cout in a multithreading application, but with a little twist. I've got several callback functions which get called by external hardware via a driver. The main objective of the callback functions is to receive some data, store it in a queue and signal a processing task as soon as a certain amount of datasets has been collected. The callback function needs to run as fast as possible in order to respond to the hardware in soft real time.
My problem is this: from time to time my queue gets full and I have to handle this case by printing a warning to the console (hard requirement). As I work with several threads, I've created a wrapper function which uses a mutex to synchronise cout. Unfortunately, in some cases waiting for access to cout can take so much time that my callback function doesn't finish fast enough to respond to the hardware before a timeout. My solution was to use an atomic variable for each possible error to count the number of occurrences, plus a further task that checks these variables periodically and prints out the messages afterwards, but I'm pretty sure that this is not the best approach to solve my performance problems.
Are there any general approaches for this type of problem?
Any recommendations how I could improve or simplify my solution?
Thank you in advance
Don't write output in the hot path.
Instead, queue up the stuff you want to log (preferably raw data rather than a fully formatted string). Have another out-of-band thread running which picks this stuff up and logs it.
I have an application with main thread and additional (detached) process created in it.
In that process we are running network server which sends logs from queue through the network.
The question is: is it possible to do something in the segfault handler to wait for the sending of that log queue to finish? I want almost 100% delivery of that queue.
While it is possible to write a segfault handler, I highly recommend against it. First off, it's very easy to get your program into a "won't terminate" state due to a segfault in the segfault handler.
Second, as dan3 mentions, the memory of the process is likely in a corrupt state, making it hard to know what will and won't work.
Finally, you lose the opportunity to use the coredump from the process to help track down the problem.
While it's not recommended, it is possible.
My recommendation is to write a small program that avoids memory allocation and the use of pointers as much as possible. Perhaps create buffers as global arrays and only ever access them with limited code that can be reviewed by several skilled developers and tested thoroughly (stress testing is great here). Keep in mind, though, that the message could still get lost by the sender or receiver if they crash, so it may not be worth the effort.
By the way - when Netscape first wrote a version of their browser for Linux, I ran it and it kept getting into a locked-up state. Using the strace program, I quickly found that it was in an infinite segfault loop. Very frustrating, and leading to almost 100% cpu wasted.
You can wait() for a process and pthread_join() for a thread to finish (you didn't specify clearly which one you use).
Remember that if you are in segfault handler, your memory is messed up (avoid malloc() and free()) and your FILE * could also be borked.
When developing applications I usually print to the console in order to get useful debugging/tracing information. The application I am working on now is multi-threaded, and sometimes I see my printf outputs overlapping each other.
I tried to synchronize the screen using a mutex, but I ended up slowing down and blocking the app. How can I solve this issue?
I am aware of MT logging libraries, but since I log a lot, using them slows my app down (a bit).
I was thinking of the following idea: instead of logging within my application, why not log outside it? I would like to send logging information via a socket to a second application process that actually prints it out on the screen.
Are you aware of any library already doing this?
I use Linux/gcc.
thanks
afg
You have 3 options. In increasing order of complexity:
Just use a simple mutex within each thread. The mutex is shared by all threads.
Send all the output to a single thread that does nothing but the logging.
Send all the output to a separate logging application.
Under most circumstances, I would go with #2. #1 is fine as a starting point, but in all but the most trivial applications you can run into problems serializing the application. #2 is still very simple, and simple is a good thing, but it is also quite scalable. You still end up doing the processing in the main application, but for the vast majority of applications you gain nothing by spinning this off to its own, dedicated application.
Number 3 is what you're going to do in performance-critical server-type applications, but the minimal performance gain you get with this approach is 1: very difficult to achieve, 2: very easy to screw up, and 3: not the only or even most compelling reason people generally take this approach. Rather, people typically take this approach when they need the logging service to be separated from the applications using it.
Which OS are you using?
Not sure about specific libraries, but one of the classical approaches to this sort of problem is to use a logging queue, worked by a writer thread whose job is purely to write the log file.
You need to be aware, with either a threaded or a multi-process approach, that the write queue may back up, meaning it needs to be managed, either by discarding entries or by slowing down your application (which is obviously easier with the threaded approach).
It's also common to have some way of categorising your logging output, so that you can have one section of your code logging at a high level, whilst another section of your code logs at a much lower level. This makes it much easier to manage the amount of output that's being written to files and offers you the option of releasing the code with the logging in it, but turned off so that it can be used for fault diagnosis when installed.
As far as I know, a critical section is lighter-weight than a mutex; see the MSDN documentation on critical section objects and on using critical section objects.
If you use gcc, you could use its atomic access builtins.
Frankly, a mutex is the only way you really want to do that, so it's always going to be slow in your case because you're using so many print statements. So, to solve your problem: don't use so many printf statements; that's your problem to begin with.
Okay, is your solution using a mutex to print? Perhaps you should instead have a mutex around a message queue which another thread processes to print; that has a potential hang-up, but I think it will be faster. So, use an active logging thread that spins waiting for incoming messages to print. The networking solution could work too, but that requires more work; try this first.
What you can do is to have one queue per thread, and have the logging thread routinely go through each of these and post the message somewhere.
This is fairly easy to set up and the amount of contention can be very low (just a pointer swap or two, which can be done w/o locking anything).
I'm looking for a way to do asynchronous, thread-safe logging in my C++ project, if possible to one file. I'm currently using cerr and clog for the task, but since they are synchronous, execution pauses briefly every time something is logged. It's a relatively graphics-heavy app, so this kind of thing is quite annoying.
The new logger should use asynchronous I/O to get rid of these pauses. Thread-safety would also be desirable as I intend to add some basic multithreading soon.
I considered a one-file-per-thread approach, but that seemed like it would make managing the logs a nightmare. Any suggestions?
I noticed this 1 year+ old thread. Maybe the asynchronous logger I wrote could be of interest.
http://www.codeproject.com/KB/library/g2log.aspx
G2log uses a protected message queue to forward log entries to a background worker that does the slow disk accesses.
I have tried it with a lock-free queue which increased the average time for a LOG call but decreased the worst case time, however I am using the protected queue now as it is cross-platform. It's tested on Windows/Visual Studio 2010 and Ubuntu 11.10/gcc4.6.
It's released as public domain so you can do with it what you want with no strings attached.
This is VERY possible and practical. How do I know? I wrote exactly that at my last job. Unfortunately (for us), they now own the code. :-) Sadly, they don't even use it.
I intend on writing an open source version in the near future. Meanwhile, I can give you some hints.
I/O manipulators are really just function names. You can implement them for your own logging class so that your logger is cout/cin compatible.
Your manipulator functions can tokenize the operations and store them into a queue.
A thread can be blocked on that queue waiting for chunks of log to come flying through. It then processes the string operations and generates the actual log.
This is intrinsically thread compatible since you are using a queue. However, you still would want to put some mutex-like protection around writing to the queue so that a given log << "stuff" << "more stuff"; type operation remains line-atomic.
Have fun!
I think the proper approach is not one-file-per-thread, but one-thread-per-file. If any one file (or resource in general) in your system is only ever accessed by one thread, thread-safe programming becomes so much easier.
So why not make Logger a dedicated thread (or several threads, one per file, if you're logging different things in different files), and in all other threads, writing to log would place the message on the input queue in the appropriate Logger thread, which would get to it after it's done writing the previous message. All it takes is a mutex to protect the queue from adding an event while Logger is reading an event, and a condvar for Logger to wait on when its queue is empty.
Have you considered using a logging library?
There are several available, I discovered Pantheios recently and it really seems to be quite incredible.
It's more of a front-end logger; you can customize which back-end system is used. It can interact with ACE or log4cxx, for example, and it seems really easy to use and configure. The main advantage is that it uses typesafe operators, which is always great.
If you just want a barebone logging library:
ACE
log4c*
Boost.Log
Pick any :)
I should note that it's possible to implement lock-free queues in C++ and that they are great for logging.
I had the same issue and I believe I have found the perfect solution. I present to you, a single-header library called loguru: https://github.com/emilk/loguru
It's simple to use, portable, configurable, macro-based, and by default doesn't #include anything (for those sweet, sweet compile times).
This is a fairly straightforward question, I'm basically looking for a 'best practice' approach to what I'm trying to do.
I have a Win32 GUI application which starts up a worker thread to do a bunch of blocking calls. I want this thread to send string messages back to the GUI so they can be displayed to the user.
Currently I'm thinking the use of SendMessage with WM_COPYDATA would be a good approach. Is this on the right track? I did originally have a thread-safe queue class which sent simple notification messages back to the GUI thread, which then popped the string off the queue. However, I soon took a step back and realised I didn't need the queue; I could just send the string directly.
Any tips? Thanks!
Edit: And for completeness, I'm using C++.
WM_COPYDATA would work fine, but I think it's better to simply define your own private window message. Allocate the string on the worker thread and free it on the GUI thread when you're done. Use PostMessage instead of SendMessage so that you don't block your worker unnecessarily.
As several others have pointed out a custom window message may be the best approach here.
Another thing to consider is the actual memory being used. By passing the string between threads, I'm guessing that you are also passing ownership of the string between threads. This can cause a few issues you should be aware of, including:
Memory leaks: what happens if Thread A posts the message (and hence gives away ownership) but the window is destroyed before Thread B processes the message?
Can the memory manager backing your string safely allocate and free memory on different threads?
The first issue is probably the one that has the biggest chance of impacting your application. I think this is a good reason to re-consider your original approach. As long as the queue itself is properly managed you can eliminate the memory leak as the queue can serve as a temporary owner of the memory.
WM_COPYDATA is for sending between processes, not between threads of one process. You certainly can use it for interthread communication but the overhead could be bigger because Windows would presumably need to do some more work to copy the data to a temporary buffer and pass it to the receiving application.
If your concern is simplicity of the application - stick to the way which is easier to implement. If the concern is performance - do profile and choose the variant which is really faster.
Visual Studio 2010 is making these kinds of scenarios significantly easier with the Asynchronous Agents Library. You can take a look at the walkthroughs in documentation here, but here's some not so pseudo-code:
//somewhere stateful, e.g. main
unbounded_buffer<stringtype> buff;

//thread 1
{
    //send a message asynchronously to the buffer
    asend(&buff, stringtype("hello world"));
}

//thread 2
{
    //get the message out of the buffer
    //if this is a UI thread, receive is blocking so use try_receive, which isn't
    stringtype message = receive(&buff);
}
If I was doing this with today's toolset, I would use a threadsafe queue.
In similar situations I have always put the strings in a resource file, and used one of the params of a user message to send the resource identifier. If you need to send dynamic information, I would create a threadsafe buffer, allocated by the UI thread, and pass a pointer to the buffer to the worker thread.
A lot depends on the way you want the information to flow.
The fastest way to share the information might be a shared variable with some form of sentinel to prevent race conditions. If there are multiple strings, you could have a queue of some sort.
However that model can be hard to get correct if you don't have experience in data synchronisation. Sending a custom Windows message with the data attached might prove a simpler (less buggy) model.