This is a fairly straightforward question; I'm basically looking for a 'best practice' approach to what I'm trying to do.
I have a Win32 GUI application which starts up a worker thread to do a bunch of blocking calls. I want this thread to send string messages back to the GUI so they can be displayed to the user.
Currently I'm thinking SendMessage with WM_COPYDATA would be a good approach. Is this on the right track? I did originally have a thread-safe queue class which sent simple notification messages back to the GUI thread, which then popped the string off the queue. However, I soon took a step back and realised I didn't need the queue; I could just send the string directly.
Any tips? Thanks!
Edit: And for completeness, I'm using C++.
WM_COPYDATA would work fine, but I think it's better to simply define your own private window message. Allocate the string on the worker thread and free it on the GUI thread when you're done. Use PostMessage instead of SendMessage so that you don't block your worker unnecessarily.
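For example, here's a minimal sketch of that approach; the message ID, the string type, and the AppendToLogControl helper are placeholder names for illustration, not anything from your code:
// needs <windows.h>, <string>, <memory>
// A private message in the WM_APP range (the exact ID is up to you).
const UINT WM_WORKER_STATUS = WM_APP + 1;
// Worker thread: allocate the string on the heap and hand ownership to the GUI thread.
void ReportStatus(HWND hGui, const std::wstring& text)
{
    std::wstring* msg = new std::wstring(text);
    if (!PostMessageW(hGui, WM_WORKER_STATUS, 0, reinterpret_cast<LPARAM>(msg)))
        delete msg; // the post failed (e.g. the window is gone), so free it here
}
// GUI thread: inside the window procedure, take ownership back and free the string.
case WM_WORKER_STATUS:
{
    std::unique_ptr<std::wstring> msg(reinterpret_cast<std::wstring*>(lParam));
    AppendToLogControl(*msg); // hypothetical helper that updates the display
    return 0;
}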
As several others have pointed out, a custom window message may be the best approach here.
Another thing to consider is the actual memory being used. By passing the string between threads, I'm guessing that you are also passing ownership of the string between threads. This can cause a few issues you should be aware of, including:
Memory leaks: what happens if Thread A posts the message (and hence gives away ownership) but the window is destroyed before Thread B processes it?
Allocator thread safety: can the memory manager backing your string safely allocate memory on one thread and free it on another?
The first issue is probably the one that has the biggest chance of impacting your application. I think this is a good reason to reconsider your original approach. As long as the queue itself is properly managed, you can eliminate the memory leak because the queue can serve as a temporary owner of the memory.
WM_COPYDATA is intended for sending data between processes, not between threads of one process. You can certainly use it for inter-thread communication, but the overhead could be higher because Windows presumably needs to do extra work to copy the data into a temporary buffer and hand it to the receiving window.
If your concern is simplicity of the application, stick with whichever approach is easier to implement. If the concern is performance, profile and choose the variant that is actually faster.
Visual Studio 2010 makes these kinds of scenarios significantly easier with the Asynchronous Agents Library. You can take a look at the walkthroughs in the documentation, but here's some not-so-pseudo code:
//somewhere stateful, e.g. main (requires <agents.h> and the Concurrency namespace)
unbounded_buffer<stringtype> buff;

//thread 1
{
    //send a message asynchronously to the buffer
    asend(&buff, stringtype("hello world"));
}

//thread 2
{
    //get the message out of the buffer
    //if this is a UI thread, receive is blocking, so use try_receive, which isn't
    stringtype message = receive(&buff);
}
If I were doing this with today's toolset, I would use a thread-safe queue.
In similar situations I have always put the strings in a resource file, and used one of the params of a user message to send the resource identifier. If you need to send dynamic information, I would create a threadsafe buffer, allocated by the UI thread, and pass a pointer to the buffer to the worker thread.
A lot depends on the way you want the information to flow.
The fastest way to share the information might be a shared variable with some form of sentinel to prevent race conditions. If there are multiple strings, you could have a queue of some sort.
However that model can be hard to get correct if you don't have experience in data synchronisation. Sending a custom Windows message with the data attached might prove a simpler (less buggy) model.
I've got a ROUTER/DEALER setup where both ends need to be able to receive and send data asynchronously, as soon as it's available. The model is pretty much 0MQ's async C++ server: http://zguide.zeromq.org/cpp:asyncsrv
Both the client and the server workers poll, when there's data available they call a callback. While this happens, from another thread (!) I'm putting data in a std::deque. In each poll-forever thread, I check the deque (under lock), and if there are items there, I send them out to the specified DEALER id (the id is placed in the queue).
But I can't help thinking that this is not idiomatic 0MQ. The mutex is possibly a design problem. Plus, memory consumption can probably get quite high if enough time passes between polls (and data accumulates in the deque).
The only alternative I can think of is having another DEALER thread connect to an inproc each time I want to send out data, and just have it send it and exit. However, this implies a connect per item of data sent + construction and destruction of a socket, and it's probably not ideal.
Is there an idiomatic 0MQ way to do this, and if so, what is it?
I don't fully understand your design, but I do understand your concern about using locks.
In most cases you can redesign your code to remove the use of locks using zeromq PAIR sockets and inproc.
Do you really need a std::deque? If not, you could just use a ZeroMQ queue, as it's just a queue that you can read from and write to from different threads using sockets.
If you really need the deque, then encapsulate it in its own thread (a class would be nice) and make its API (push etc.) accessible via inproc sockets.
So, like I said before, I may be on the wrong track, but in 99% of the cases I have come across, you can remove the locks completely with some ZMQ_PAIR/inproc if you need signalling.
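To illustrate, here's a rough sketch of the PAIR-over-inproc pattern using the classic cppzmq (zmq.hpp) API; the endpoint name and function names are made up, and you'd fold the receiving side into your existing poll loop alongside the ROUTER/DEALER sockets:
#include <zmq.hpp>
#include <cstring>
#include <string>

// One context shared by both threads; inproc endpoints only exist within a single context.
zmq::context_t ctx(1);

// Any thread that has data to send: push it over a PAIR socket instead of into a locked deque.
void feeder()
{
    zmq::socket_t out(ctx, ZMQ_PAIR);
    out.connect("inproc://outgoing"); // endpoint name is arbitrary
    std::string payload = "data destined for a DEALER";
    zmq::message_t msg(payload.size());
    memcpy(msg.data(), payload.data(), payload.size());
    out.send(msg);
}

// The existing poll-forever thread: add the PAIR socket to the poll set.
void poll_loop()
{
    zmq::socket_t in(ctx, ZMQ_PAIR);
    in.bind("inproc://outgoing"); // bind the inproc side before the feeder connects

    zmq::pollitem_t items[] = {
        { static_cast<void*>(in), 0, ZMQ_POLLIN, 0 },
        // ...your ROUTER/DEALER sockets would be polled here too...
    };
    while (true)
    {
        zmq::poll(items, 1, -1);
        if (items[0].revents & ZMQ_POLLIN)
        {
            zmq::message_t msg;
            in.recv(&msg);
            // forward msg out on the DEALER here, no mutex required
        }
    }
}
This way the poll loop learns about new outgoing data through the socket itself rather than by checking a shared deque under a lock.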
A 0MQ queue has a limited buffer size, and that size can be controlled, so memory consumption will only grow to a point and then data will start being dropped. For that reason you may consider using the conflate option, which keeps only the most recent data in the queue.
For a single server communicating with many threads on a single machine, I suggest a publish/subscribe model: with the conflate option you will receive the newest data whenever you read the buffer, you won't have to worry about memory, and it removes the blocked-queue problem.
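As a rough sketch (using the classic cppzmq API, with an invented endpoint name), the subscriber side of that would look something like this; note that ZMQ_CONFLATE has to be set before connecting:
#include <zmq.hpp>

void read_latest(zmq::context_t& ctx)
{
    zmq::socket_t sub(ctx, ZMQ_SUB);
    int conflate = 1;
    sub.setsockopt(ZMQ_CONFLATE, &conflate, sizeof(conflate)); // keep only the newest message
    sub.setsockopt(ZMQ_SUBSCRIBE, "", 0);                      // subscribe to everything
    sub.connect("inproc://updates");                           // endpoint name is arbitrary

    zmq::message_t latest;
    if (sub.recv(&latest, ZMQ_DONTWAIT)) // non-blocking: either the freshest data or nothing
    {
        // process the single most recent update
    }
}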
As for your implementation, you are quite right that it is not the best design, but it is hard to avoid. I suggest checking the question "Access std::deque from 3 threads"; while it answers your problem, it may not be the best approach.
I read an article about multithreaded program design, http://drdobbs.com/architecture-and-design/215900465, which says it is a best practice to "replace shared data with asynchronous messages. As much as possible, prefer to keep each thread’s data isolated (unshared), and let threads instead communicate via asynchronous messages that pass copies of data".
What confuses me is that I don't see the difference between using shared data and using message queues. I am currently working on a non-GUI project on Windows, so let's use Windows message queues, and take the traditional producer-consumer problem as an example.
Using shared data, there would be a shared container and a lock guarding it between the producer thread and the consumer thread. When the producer outputs a product, it first waits for the lock, then writes something to the container, then releases the lock.
Using a message queue, the producer could simply call PostThreadMessage without blocking, and that is the asynchronous message's advantage. But I think there must be some lock guarding the message queue between the two threads, otherwise the data would definitely become corrupted. The PostThreadMessage call just hides the details. I don't know whether my guess is right, but if it is, the advantage seems to disappear, since both methods do the same thing and the only difference is that the system hides the details when using message queues.
PS: maybe the message queue uses a non-blocking container, but I could use a concurrent container in the former approach too. I want to know how the message queue is implemented and whether there is any performance difference between the two ways.
updated:
I still don't get the concept of asynchronous messages if the message queue operations still block somewhere else. Correct me if my guess is wrong: when we use shared containers and locks we block in our own thread, but when using message queues, my own thread returns immediately and leaves the blocking work to some system thread.
Message passing is useful for exchanging smaller amounts of data, because no conflicts need be avoided. It is much easier to implement than shared memory for inter-computer communication. Also, as you've already noticed, message passing has the advantage that application developers don't need to worry about the details of protection the way they do with shared memory.
Shared memory allows maximum speed and convenience of communication, as it can be done at memory speeds within a computer. Shared memory is usually faster than message passing, as message passing is typically implemented using system calls and thus requires the more time-consuming work of kernel intervention. In contrast, in shared-memory systems, system calls are required only to establish the shared-memory regions. Once established, all accesses are treated as normal memory accesses without extra assistance from the kernel.
Edit: One case in which you might want to implement your own queue is when there are lots of messages to be produced and consumed, e.g., a logging system. With the implementation of PostThreadMessage, the queue capacity is fixed. Messages will most likely get lost if that capacity is exceeded.
Imagine you have 1 thread producing data,and 4 threads processing that data (presumably to make use of a multi core machine). If you have a big global pool of data you are likely to have to lock it when any of the threads needs access, potentially blocking 3 other threads. As you add more processing threads you increase the chance of a lock having to wait and increase how many things might have to wait. Eventually adding more threads achieves nothing because all you do is spend more time blocking.
If instead you have one thread sending messages into message queues, one for each consumer thread, then the consumers can't block each other. You still have to lock the queue between the producer and consumer threads, but as you have a separate queue for each thread you have a separate lock, and no single thread can block all the others waiting for data.
If you suddenly get a 32 core machine you can add 20 more processing threads (and queues) and expect that performance will scale fairly linearly unlike the first case where the new threads will just run into each other all the time.
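As a sketch of that layout (the BlockingQueue class and the item type are made up for illustration, not any particular library):
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <vector>

// A minimal blocking queue: pop() sleeps until an item arrives instead of spinning.
template <typename T>
class BlockingQueue
{
public:
    void push(T item)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(item));
        }
        cond_.notify_one();
    }

    T pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return !queue_.empty(); });
        T item = std::move(queue_.front());
        queue_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::queue<T> queue_;
};

// One queue per consumer thread: the producer only ever contends with one consumer at a time.
std::vector<BlockingQueue<std::string>> queues(4);

void produce(const std::string& work, std::size_t next)
{
    queues[next % queues.size()].push(work); // simple round-robin dispatch
}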
I have used a shared memory model where the pointers to the shared memory are managed in a message queue with careful locking. In a sense, this is a hybrid between a message queue and shared memory. It is very useful when large quantities of data must be passed between threads while retaining the safety of the message queue.
The entire queue can be packaged in a single C++ class with appropriate locking and the like. The key is that the queue owns the shared storage and takes care of the locking. Producers acquire a lock for input to the queue and receive a pointer to the next available storage chunk (usually an object of some sort), populate it and release it. The consumer blocks until the next shared object has been released by the producer. It can then acquire a lock to the storage, process the data and release it back to the pool. A suitably designed queue can perform multiple-producer/multiple-consumer operations with great efficiency. Think Java thread-safe (java.util.concurrent.BlockingQueue) semantics, but for pointers to storage.
Of course there is "shared data" when you pass messages; after all, the message itself is some sort of data. However, the important distinction is that when you pass a message, the consumer receives a copy.
the PostThreadMessage call just hides the details
Yes, it does, but being a WINAPI call, you can be reasonably sure that it does it right.
I still don't get the concept of asynchronous messages if the message queue operations still block somewhere else.
The advantage is more safety. You have a locking mechanism that is systematically enforced when you are passing a message. You don't even need to think about it, you can't forget to lock. Given that multi-thread bugs are some of the nastiest ones (think of race conditions), this is very important. Message passing is a higher level of abstraction built on locks.
The disadvantage is that passing large amounts of data would probably be slow. In that case, you need to use shared memory.
For passing state (i.e. worker thread reporting progress to the GUI) the messages are the way to go.
It's quite simple (I'm amazed others wrote such lengthy responses!):
Using a message queue system instead of 'raw' shared data means that you have to get the synchronization (locking/unlocking of resources) right only once, in a central place.
With a message-based system, you can think in higher terms of "messages" without having to worry about synchronization issues anymore. For what it's worth, it's perfectly possible that a message queue is implemented using shared data internally.
I think this is the key piece of info there: "As much as possible, prefer to keep each thread’s data isolated (unshared), and let threads instead communicate via asynchronous messages that pass copies of data". I.e. use producer-consumer :)
You can do your own message passing or use something provided by the OS. That's an implementation detail (needs to be done right ofc). The key is to avoid shared data, as in having the same region of memory modified by multiple threads. This can cause hard to find bugs, and even if the code is perfect it will eat performance because of all the locking.
I had exactly the same question. After reading the answers, I feel:
In the most typical use cases, queue = async, shared memory (locks) = sync. Indeed, you can do an async version of shared memory, but that's more code, similar to reinventing the message-passing wheel.
Less code = fewer bugs and more time to focus on other stuff.
The pros and cons are already mentioned by previous answers so I will not repeat.
UPDATE 14 June 2011
A quick update... Most respondents have focused on the dodgy method for handling the queue of messages to be logged. While there is certainly a lack of optimisation there, it's not the root of the problem. We switched the Yield over to a short sleep (yes, the Yield did result in 100% CPU once the system went quiet), but the system still can't keep up with the logging even when it's going nowhere near that sleep. From what I can see, the Send is just not very efficient. One respondent commented that we should batch the Send() calls up into one send, and that seems like the most appropriate solution to the larger underlying issue, which is why I have marked it as the answer to the original question. I certainly agree the queue model is very flawed, though, so thanks for the feedback on that; I have up-voted all answers that contributed to the discussion.
However, this exercise has got us to review why we're using external logging over a socket like we are, and while it may well have made sense previously, when the logging server did lots of processing on the log entries... it no longer does any of that, so we have opted to remove that entire module and go for a direct-to-file approach via a pre-existing logging framework. This should eliminate the problem entirely, as well as remove unnecessary complexity from the system.
Thanks again for all the feedback.
ORIGINAL QUESTION
In our system we have two components important to this problem - one is developed in Visual C++ and the other is Java (don't ask, historic reasons).
The C++ component is the main service and generates log entries. These log entries are sent via a CSocket::Send out to a Java logging service.
The problem
Performance of sending data seems very low. If we queue on the C++ side then the queue gets backed up progressively on busier systems.
If I hit the Java logging server with a simple C# application, I can hammer it way faster than I will ever need to from the C++ tool, and it keeps up beautifully.
In the C++ world, the function that adds messages to the queue is:
void MyLogger::Log(const CString& buffer)
{
struct _timeb timebuffer;
_ftime64_s( &timebuffer );
CString message;
message.Format("%d%03d,%04d,%s\r\n", (int)timebuffer.time, (int)timebuffer.millitm, GetCurrentThreadId(), (LPCTSTR)buffer);
CString* queuedMessage = new CString(message);
sendMessageQueue.push(queuedMessage);
}
The function that runs in a separate thread and sends to the socket is:
void MyLogger::ProcessQueue()
{
CString* queuedMessage = NULL;
while(!sendMessageQueue.try_pop(queuedMessage))
{
if (!running)
{
break;
}
Concurrency::Context::Yield();
}
if (queuedMessage == NULL)
{
return;
}
else
{
socket.Send((LPCTSTR)*queuedMessage, queuedMessage->GetLength());
delete queuedMessage;
}
}
Note that ProcessQueue is run repeatedly by the outer loop of the thread itself, which, excluding a bunch of nonsense preamble, is:
while(parent->running)
{
try
{
logger->ProcessQueue();
}
catch(...)
{
}
}
The queue is:
Concurrency::concurrent_queue<CString*> sendMessageQueue;
So the effect we're seeing is that the queue is just getting bigger and bigger, log entries are being sent out to the socket but at a much lower rate than they're going in.
Is this a limitation of CSocket::Send that makes it less than useful for us? A misuse of it? Or is it an entire red herring, with the problem lying elsewhere?
Your advice is much appreciated.
Kind Regards
Matt Peddlesden
Well, you could start by using a blocking producer-consumer queue and getting rid of the 'Yield'. I'm not surprised that messages get backed up: when one is posted, the logger thread is typically, on a busy system, ready but not running. This introduces a lot of avoidable latency before any message on the queue can be processed. The background thread then only has a quantum to try to get rid of all the messages that have accumulated on the queue. If there are a lot of ready threads on a busy system, it could well be that the thread just does not get sufficient time to handle the messages, especially if a lot have built up and the socket Send blocks.
Also, almost completely wasting one CPU core on queue polling cannot be good for overall performance.
Rgds,
Martin
In my opinion, you're definitely not looking at the most efficient solution. You should call Send() once, for all the messages: concatenate all the messages in the queue on the user side, send them all at once with a single Send(), then yield.
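A rough sketch of that batching idea against the code in the question (assuming an MBCS build where LPCTSTR is const char*, as the Format call suggests; otherwise the conversion needs adjusting):
void MyLogger::ProcessQueue()
{
    std::string batch;
    CString* queuedMessage = NULL;

    // Drain everything currently queued into one contiguous buffer...
    while (sendMessageQueue.try_pop(queuedMessage))
    {
        batch.append((LPCTSTR)*queuedMessage, queuedMessage->GetLength());
        delete queuedMessage;
    }

    // ...and push it to the socket with a single Send call.
    if (!batch.empty())
    {
        socket.Send(batch.data(), static_cast<int>(batch.size()));
    }
}
The outer loop would still want a short sleep (or, better, a blocking wait) when the batch comes back empty, so it doesn't spin.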
In addition, this really isn't how you're meant to do it. The PPL contains constructs explicitly intended for asynchronous callbacks, like the call object. You should use that instead of hand-rolling your own.
Things here that might be slowing you up:
The queue you are using. I think this is a classic example of premature optimization. There's no reason here to use the Concurrency::concurrent_queue class rather than a regular message queue with a blocking pop() method. If I understand correctly, the Concurrency classes use non-blocking algorithms, whereas in this case you do want to block while the queue is empty and release the CPU for other threads to use.
The use of new and delete for each message, and the internal allocations of the CString class. You should try recycling the messages and strings (using a pool) and see if it helps performance, for two reasons: 1. It avoids the allocation and deallocation of the message and string objects themselves. 2. The allocations and deallocations done inside the strings can perhaps be avoided if the string class internally recycles its buffers.
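For instance, a minimal sketch of recycling the CString objects through a small free list (the StringPool class is invented for illustration; whether the internal buffers actually get reused depends on how CString manages its storage):
#include <mutex>
#include <vector>

// A tiny free list so Log()/ProcessQueue() stop calling new/delete for every message.
class StringPool
{
public:
    CString* acquire()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_.empty())
            return new CString();
        CString* s = free_.back();
        free_.pop_back();
        return s;
    }

    void release(CString* s)
    {
        // Contents are simply overwritten by the next Format/assignment.
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(s);
    }

private:
    std::mutex mutex_;
    std::vector<CString*> free_;
};
Log() would then call pool.acquire() instead of new CString(message), and ProcessQueue() would call pool.release(queuedMessage) instead of delete.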
Have you tried profiling to see where your application is having trouble? Is it only when logging that there are issues with the sender? Is it CPU bound or blocking?
The only thing I can see is that you don't protect the message queue with any sort of locking so the container state could get weird causing all sorts of unexpected behavior.
I'm quite bewildered by the use of message queues in a realtime OS. The code that was given seems to use message queues down to the bone: even passing variables to another class object is done through an MQ. I have always thought of MQs as something used for IPC. The question is: what is a proper use of a message queue?
In realtime OS environments you often face the problem that you have to guarantee execution of code on a fixed schedule. E.g. you may have a function that gets called exactly every 10 milliseconds. Not earlier, not later.
To guarantee such hard timing constraints you have to write code that must not block the time critical code under any circumstances.
The POSIX thread synchronization primitives cannot be used here.
You must never lock a mutex or acquire a semaphore from time-critical code, because a different process/thread may already have it locked. However, you are often allowed to unblock some other thread from time-critical code (e.g. releasing a semaphore is okay).
In such environments message queues are a nice choice to exchange data because they offer a clean way to pass data from one thread to another without ever blocking.
Using queues to just set variables may sound like overkill, but it is very good software design. If you do it that way you have a well-defined interface to your time critical code.
Also it helps to write deterministic code because you'll never run into the problem of race-conditions. If you set variables via message-queues you can be sure that the time critical code sees the messages in the same order as they have been sent. When mixing direct memory access and messages you can't guarantee this.
Message queues are predominantly used as an IPC mechanism, whenever there needs to be an exchange of data between two different processes. However, sometimes message queues are also used for thread context switching. For example:
You register a callback with a software layer that sits on top of a driver. The callback is invoked in the context of the driver, on a thread spawned by the driver. You cannot hog this driver thread by doing a lot of processing in it, so one may put the data returned in the callback into a message queue, on which application threads are blocked waiting to perform the processing on the data.
I don't see why one should use message queues to replace plain function calls.
I've been playing with a DataBus-type design for a hobby project, and I ran into an issue. Back-end components need to notify the UI that something has happened. My implementation of the bus delivers the messages synchronously with respect to the sender. In other words, when you call Send(), the method blocks until all the handlers have called. (This allows callers to use stack memory management for event objects.)
However, consider the case where an event handler updates the GUI in response to an event. If the handler is called, and the message sender lives on another thread, then the handler cannot update the GUI due to Win32's GUI elements having thread affinity. More dynamic platforms such as .NET allow you to handle this by calling a special Invoke() method to move the method call (and the arguments) to the UI thread. I'm guessing they use the .NET parking window or the like for these sorts of things.
A morbid curiosity was born: can we do this in C++, even if we limit the scope of the problem? Can we make it nicer than existing solutions? I know Qt does something similar with the moveToThread() function.
By nicer, I'll mention that I'm specifically trying to avoid code of the following form:
if(! this->IsUIThread())
{
Invoke(MainWindowPresenter::OnTracksAdded, e);
return;
}
being at the top of every UI method. This dance was common in WinForms when dealing with this issue. I think this sort of concern should be isolated from the domain-specific code and a wrapper object made to deal with it.
My implementation consists of:
DeferredFunction - functor that stores the target method in a FastDelegate, and deep copies the single event argument. This is the object that is sent across thread boundaries.
UIEventHandler - responsible for dispatching a single event from the bus. When the Execute() method is called, it checks the thread ID. If it does not match the UI thread ID (set at construction time), a DeferredFunction is allocated on the heap with the instance, method, and event argument. A pointer to it is sent to the UI thread via PostThreadMessage().
Finally, a hook function for the thread's message pump is used to call the DeferredFunction and de-allocate it. Alternatively, I can use a message loop filter, since my UI framework (WTL) supports them.
Ultimately, is this a good idea? The whole message hooking thing makes me leery. The intent is certainly noble, but are there are any pitfalls I should know about? Or is there an easier way to do this?
I have been out of the Win32 game for a long time now, but the way we used to achieve this was by using PostMessage to post a Windows message back to the UI thread and then handle the call from there, passing the additional info you need in wParam/lParam.
In fact I wouldn't be surprised if that is how .NET handles this in Control.Invoke.
Update: I was curious, so I checked with Reflector, and this is what I found.
Control.Invoke calls MarshaledInvoke, which does a bunch of checks etc., but the interesting calls are to RegisterWindowMessage and PostMessage. So things have not changed that much :)
A little bit of follow-up info:
There are a few ways you can do this, each of which has advantages and disadvantages:
The easiest way is probably the QueueUserAPC() call. APCs are a bit too in-depth to explain here, but the drawback is that they may run when you're not ready for them if the thread gets put into an alertable wait state accidentally. Because of this, I avoided them. For short applications, this is probably OK.
The second way involves using PostThreadMessage(), as previously mentioned. This is better than QueueUserAPC() in that your callbacks aren't sensitive to the UI thread being in an alertable wait state, but using this API has the problem that your callbacks may not be run at all. See Raymond Chen's discussion on this. To get around it, you need to put a hook on the thread's message queue.
The third way involves setting up an invisible, message-only window whose WndProc calls the deferred call, and using PostMessage() for your callback data. Because it is directed at a specific window, the messages won't get eaten in modal UI situations. Also, message-only windows are immune to system message broadcasts (thus preventing message ID collisions). The downside is it requires more code than the other options.
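To make the third option concrete, here is a hedged sketch of a message-only window that executes posted callables on the UI thread; the message ID, class name, and function names are all invented for the example:
#include <windows.h>
#include <functional>

// Private message carrying a heap-allocated std::function in its LPARAM.
const UINT WM_INVOKE = WM_APP + 0x100;

LRESULT CALLBACK InvokeWndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    if (msg == WM_INVOKE)
    {
        std::function<void()>* fn = reinterpret_cast<std::function<void()>*>(lParam);
        (*fn)();   // runs on the UI thread that created this window
        delete fn;
        return 0;
    }
    return DefWindowProc(hwnd, msg, wParam, lParam);
}

// Created once, on the UI thread; HWND_MESSAGE makes it a message-only window.
HWND CreateInvokeWindow(HINSTANCE hInst)
{
    WNDCLASS wc = {};
    wc.lpfnWndProc = InvokeWndProc;
    wc.hInstance = hInst;
    wc.lpszClassName = TEXT("UiInvokeWindow"); // invented class name
    RegisterClass(&wc);
    return CreateWindow(wc.lpszClassName, NULL, 0, 0, 0, 0, 0,
                        HWND_MESSAGE, NULL, hInst, NULL);
}

// Callable from any thread: marshals fn onto the UI thread.
void InvokeOnUiThread(HWND invokeWnd, std::function<void()> fn)
{
    std::function<void()>* heapFn = new std::function<void()>(std::move(fn));
    if (!PostMessage(invokeWnd, WM_INVOKE, 0, reinterpret_cast<LPARAM>(heapFn)))
        delete heapFn; // window already destroyed or queue full; avoid the leak
}
Because the message is addressed to a specific window, modal message loops will still dispatch it, and no message-pump hook is needed.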