Controlled application shut-down strategy - c++

Our (Windows native C++) app is composed of threaded objects and managers. It is pretty well written, with a design that sees Manager objects controlling the lifecycle of their minions. Various objects dispatch and receive events; some events come from Windows, some are home-grown.
In general, we have to be very aware of thread interoperability so we use hand-rolled synchronization techniques using Win32 critical sections, semaphores and the like. However, occasionally we suffer thread deadlock during shut-down due to things like event handler re-entrancy.
Now I wonder if there is a decent app shut-down strategy we could implement to make this easier to develop for - something like every object registering for a shutdown event from a central controller and changing its execution behaviour accordingly? Is this too naive or brittle?
I would prefer strategies that don't stipulate rewriting the entire app to use Microsoft's Parallel Patterns Library or similar. ;-)
Thanks.
EDIT:
I guess I am asking for an approach to controlling object life cycles in a complex app where many threads and events are firing all the time. Giovanni's suggestion is the obvious one (hand-roll our own), but I am convinced there must be various off-the-shelf strategies or frameworks for cleanly shutting down active objects in the correct order. For example, if you want to base your C++ app on an IoC paradigm you might use PocoCapsule instead of trying to develop your own container. Is there something similar for controlling object lifecycles in an app?

This seems like a special case of the more general question, "how do I avoid deadlocks in my multithreaded application?"
And the answer to that is, as always: make sure that any time your threads have to acquire more than one lock at a time, they all acquire the locks in the same order, and make sure all threads release their locks in a finite amount of time. This rule applies just as much at shutdown as at any other time. Nothing less is good enough; nothing more is necessary. (See here for a relevant discussion)
As for how to best do this... the best way (if possible) is to simplify your program as much as you can, and avoid holding more than one lock at a time if you can possibly help it.
If you absolutely must hold more than one lock at a time, you must verify your program to be sure that every thread that holds multiple locks locks them in the same order. Tools like Helgrind or Intel Thread Checker can help with this, but it often comes down to simply eyeballing the code until you've proved to yourself that it satisfies this constraint. Also, if you are able to reproduce the deadlocks easily, you can examine (using a debugger) the stack trace of each deadlocked thread, which will show where each thread is blocked forever, and with that information you can then start to figure out where the lock-ordering inconsistencies are in your code. Yes, it's a major pain, but I don't think there is any good way around it (other than avoiding holding multiple locks at once). :(
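A minimal sketch of the lock-ordering rule, assuming a hypothetical Account type that owns its own mutex: whenever a thread needs both locks, it takes them in one global order (here, by object address), so two threads transferring in opposite directions cannot deadlock. C++11's std::lock is a different technique (deadlock avoidance rather than a fixed order) that can be used when a global order is hard to define.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical shared resource guarded by its own lock.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Acquire both locks in one global order (by address), so every thread that
// needs both of them takes them in the same sequence.
void transfer(Account& from, Account& to, long amount) {
    if (&from == &to) return;  // same account: never lock one mutex twice
    Account* first = std::less<Account*>()(&from, &to) ? &from : &to;
    Account* second = (first == &from) ? &to : &from;
    std::lock_guard<std::mutex> lk1(first->m);
    std::lock_guard<std::mutex> lk2(second->m);
    from.balance -= amount;
    to.balance += amount;
}
```

If transfer() instead locked from.m and then to.m, the two opposite-direction threads below could each grab one lock and wait forever for the other.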

One possible general strategy would be to send an "I am shutting down" event to every manager, which would cause the managers to do one of three things (depending on how long-running your event handlers are, and how much latency you want between the user initiating shutdown and the app actually exiting).
1) Stop accepting new events, and run the handlers for all events received before the "I am shutting down" event. To avoid deadlocks you may need to accept events that are critical to the completion of other event handlers. These could be signaled by a flag in the event or the type of the event (for example). If you have such events then you should also consider restructuring your code so that those actions are not performed through event handlers (as dependent events would be prone to deadlocks in ordinary operation too.)
2) Stop accepting new events, and discard all events that were received after the event that the handler is currently running. Similar comments about dependent events apply in this case too.
3) Interrupt the currently running event (with a function similar to boost::thread::interrupt()), and run no further events. This requires your handler code to be exception safe (which it should already be, if you care about resource leaks), and to enter interruption points at fairly regular intervals, but it leads to the minimum latency.
Of course you could mix these three strategies together, depending on the particular latency and data corruption requirements of each of your managers.
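A minimal sketch of strategies 1) and 2), assuming a hypothetical Manager that owns one worker thread and a queue of event handlers (strategy 3 would additionally need interruption points inside the handlers, which is not shown). After shutdown begins, post() refuses new events; Drain runs everything already queued, Discard drops it. Handlers are never run while the queue lock is held, which avoids one common source of re-entrancy deadlocks.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical manager owning one worker thread and an event queue.
class Manager {
public:
    enum class Mode { Drain, Discard };  // strategies 1) and 2)

    Manager() : worker_([this] { run(); }) {}
    ~Manager() { shutdown(Mode::Drain); }

    // Returns false once shutdown has begun: no new events are accepted.
    bool post(std::function<void()> ev) {
        std::lock_guard<std::mutex> lk(m_);
        if (stopping_) return false;
        q_.push_back(std::move(ev));
        cv_.notify_one();
        return true;
    }

    // Intended to be called from a single controlling thread.
    void shutdown(Mode mode) {
        {
            std::lock_guard<std::mutex> lk(m_);
            stopping_ = true;
            if (mode == Mode::Discard) q_.clear();  // drop pending events
        }
        cv_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] { return stopping_ || !q_.empty(); });
            if (q_.empty()) return;  // stopping and nothing left to drain
            auto ev = std::move(q_.front());
            q_.pop_front();
            lk.unlock();  // never run a handler while holding the queue lock
            ev();
            lk.lock();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> q_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the members above
};
```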

As a general method, use an atomic boolean to indicate "I am shutting down", then every thread checks this boolean before acquiring each lock, handling each event, etc. Can't give a more detailed answer unless you give us a more detailed question.
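A minimal sketch of the atomic-flag approach, assuming a hypothetical process-wide flag named g_shutting_down and a worker whose "event handling" is a placeholder increment. The worker checks the flag before each unit of work, so it stops picking up new work shortly after shutdown is requested.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Process-wide "I am shutting down" flag (hypothetical name).
std::atomic<bool> g_shutting_down{false};

// Checks the flag before each unit of work; returns how many units it did.
int worker_loop() {
    int units = 0;
    while (!g_shutting_down.load(std::memory_order_acquire)) {
        ++units;  // stand-in for "handle one event"
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return units;
}
```

The flag only tells threads to stop starting new work; it does not by itself release locks a thread already holds, so the lock-ordering rule from the previous answer still applies.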

Related

Cancelling arbitrary jobs running in a thread_pool

Is there a way for a thread-pool to cancel a task underway? Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Extra
Bonus if the cancellation is immediate, but it's acceptable if the cancellation has some time constraint 'guarantees' (say cancellation within 0.1 execution seconds of the thread in question for example)
More details
I am not restricted to using Boost.Thread.thread_pool or any specific library. The only limitation is compatibility with C++14, and ability to work on at least BSD and Linux based OS.
The tasks are usually data-processing related, pre-compiled and loaded dynamically using C-API (extern "C") and thus are opaque entities. The aim is to perform compute intensive tasks with an option to cancel them when the user sends interrupts.
While launching, the thread_id for a specific task is known, and thus some API can be used to find more details if required.
Disclaimer
I know using native thread handles to cancel/exit threads is not recommended and is a sign of bad design. I also can't modify the functions using boost::this_thread::interrupt_point, but can wrap them in lambdas/other constructs if that helps. I feel like this is a rock and hard place situation, so alternate suggestions are welcome, but they need to be minimally intrusive in existing functionality, and can be dramatic in their scope for the feature-set being discussed.
EDIT:
Clarification
I guess this should have gone in the 'More Details' section, but I want it to remain separate to show that the existing 2 answers are based on limited information. After reading the answers, I went back to the drawing board and came up with the following "constraints", since the question I posed was overly generic. If I should post a new question, please let me know.
My interface promises a "const" input (functional programming style non-mutable input) by using mutexes/copy-by-value as needed and passing by const& (and expecting thread to behave well).
I also mis-used the term "arbitrary" since the jobs aren't arbitrary (empirically speaking) and have the following constraints:
some, which download from the "internet", already use a "condition variable"
not violate const correctness
can spawn other threads, but they must not outlast the parent
can use mutex, but those can't exist outside the function body
output is via atomic<shared_ptr> passed as argument
pure functions (no shared state with outside) **
** can be a lambda binding a functor, in which case the function needs to make sure its data structures aren't corrupted (which is the case, as the state is usually one or two atomic<inbuilt-type> values). Usually the internal state is queried from an external db (an architecture similar to cookie + web server, where the tab/browser can be closed at any time)
These constraints aren't written down as a contract or anything, but rather I generalized based on the "modules" currently in use. The jobs are arbitrary in terms of what they can do: GPU/CPU/internet all are fair play.
It is infeasible to insert a periodic check because of heavy library usage. The libraries (not owned by us) haven't been designed to periodically check a condition variable since it'd incur a performance penalty for the general case and rewriting the libraries is not possible.
Is there a way for a thread-pool to cancel a task underway?
Not at that level of generality, no, and also not if the task running in the thread is implemented natively and arbitrarily in C or C++. You cannot terminate a running task prior to its completion without terminating its whole thread, except with the cooperation of the task.
Better yet, is there a safe alternative for on-demand cancelling opaque function calls in thread_pools?
No. The only way to get (approximately) on-demand preemption of a specific thread is to deliver a signal to it (one that it is not blocking or ignoring) via pthread_kill(). If such a signal terminates the thread but not the whole process, then it does not automatically make any provision for freeing allocated objects or managing the state of mutexes or other synchronization objects. If the signal does not terminate the thread, then the interruption can produce surprising and unwanted effects in code not designed to accommodate such signal usage.
Killing the entire process is a bad idea and using native handle to perform pthread_cancel or similar API is a last resort only.
Note that pthread_cancel() can be blocked by the thread, and that even when not blocked, its effects may be deferred indefinitely. When the effects do occur, they do not necessarily include memory or synchronization-object cleanup. You need the thread to cooperate with its own cancellation to achieve these.
Just what a thread's cooperation with cancellation looks like depends in part on the details of the cancellation mechanism you choose.
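A minimal sketch of what cooperation can look like, assuming a hypothetical shared cancellation token (a shared_ptr to an atomic bool) that the pool hands to each task. The task polls the token between bounded chunks of work, so the worst-case cancellation latency is roughly the duration of one chunk; opaque library calls that never return to the loop cannot be interrupted this way.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <memory>

// Hypothetical cancellation token shared between the pool and one task.
using CancelToken = std::shared_ptr<std::atomic<bool>>;

// A task that cooperates: it polls the token between chunks of work.
// Returns the number of chunks completed before stopping.
long cooperative_sum(long chunks, CancelToken cancel) {
    long done = 0;
    for (long i = 0; i < chunks; ++i) {
        if (cancel->load(std::memory_order_relaxed)) break;  // cancellation point
        ++done;  // stand-in for one bounded chunk of real work
    }
    return done;
}
```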
Cancelling a non cooperative, not designed to be cancelled component is only possible if that component has limited, constrained, managed interactions with the rest of the system:
the resources owned by the components should be managed externally (the system knows which component uses what resources)
all accesses should be indirect
the modifications of shared resources should be safe and reversible until completion
That would allow the system to clean up resource, stop operations, cancel incomplete changes...
None of these properties are cheap; all the properties of threads are the exact opposite of these properties.
Threads only have an implied concept of ownership apparent in the running thread: for a deleted thread, determining what was owned by the thread is not possible.
Threads access shared objects directly. A thread can start modifications of shared objects; if the thread is cancelled in the middle of an operation, such modifications would be partial, ineffective, or incoherent.
Cancelled threads could leave locked mutexes around. At least subsequent accesses to these mutexes by other threads trying to access the shared object would deadlock.
Or they might find some data structure in a bad state.
Providing safe cancellation for arbitrary non cooperative threads is not doable even with very large scale changes to thread synchronization objects. Not even by a complete redesign of the thread primitives.
You would have to make threads almost like full processes to be able to do that; but it wouldn't be called a thread then!
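Taking that last point literally, one option on BSD and Linux is to run each opaque job in a child process, which the kernel can clean up completely when it is killed. The sketch below is an assumption-laden illustration, not a drop-in replacement for a thread pool: start_isolated and cancel_isolated are hypothetical helpers, results would have to come back over a pipe or shared memory (not shown), and in a multithreaded parent the child should ideally exec a helper program (for example one that loads the extern "C" module) rather than run arbitrary code directly after fork().

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run an opaque job in a child process (POSIX only). Returns the child's pid,
// or -1 if fork() failed.
pid_t start_isolated(void (*job)()) {
    pid_t pid = fork();
    if (pid == 0) {  // child: run the job, then exit without parent-side cleanup
        job();
        _exit(0);
    }
    return pid;
}

// Cancel: SIGKILL cannot be caught or blocked, and the kernel reclaims the
// child's memory and file descriptors. Locks held by the child die with it.
bool cancel_isolated(pid_t pid) {
    if (pid <= 0) return false;  // never call kill() with 0 or -1 here
    if (kill(pid, SIGKILL) != 0) return false;
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```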

Design and Technical issue in Multi Threaded Application

I want to discuss the design and technical issues/challenges related to multithreaded applications.
Issues I faced
1. I came across a situation where multiple threads using a shared function/variable crashed the application, so a proper guard is required in that case.
2. State machines and multithreading.
There are several points one should remember before delving into a multithreaded application.
There can be issues related to 1. memory, 2. handles, 3. sockets, etc.
Please share your experience on the following points:
What are the common mistakes people make in multithreaded applications?
Any specific issues related to multithreading?
Should we pass data by value or by reference to the thread function?
Well, there are so many...
1) Shared functions/procedures - they are just code and, unless the code modifies itself, there can be no problem. Local variables are no problem because each thread runs on its own stack (almost by definition :). Any other data can be an issue and may need protection. 99.99% of all household API calls on multitasking OSes are thread-safe, again, almost by definition. Another poster has already warned about thread-local storage...
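A small sketch of the distinction, using hypothetical names: the running sum in add_range() is a local variable on each thread's own stack and needs no lock, while the global total is shared by every thread and is only updated under a mutex.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Shared data: visible to every thread, so it needs protection.
long g_total = 0;
std::mutex g_total_mutex;

void add_range(int n) {
    long local = 0;  // local variable: private to this thread's stack
    for (int i = 1; i <= n; ++i) local += i;
    std::lock_guard<std::mutex> lk(g_total_mutex);  // guard the shared write
    g_total += local;
}
```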
2) State machines. Can be a little awkward. You can easily lock all the events firing into the SM, so ensuring the integrity of the state, but you must not make blocking calls from inside the SM while it is locked (might seem obvious, but I have done this.. once :).
I occasionally run state-machines from one thread only, queueing event objects to it. This moves the locking to the input queue and means that the SM is somewhat easier to debug. It also means that the thread running the SM can implement timeouts on an internal delta queue and so itself fire timeout calls to the objects on the delta queue, (classic example: TCP server sockets with connection timeouts - thousands of socket objects that each need an independent timeout).
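A minimal sketch of running a state machine from one thread with queued events, assuming a hypothetical connection-like machine (Idle, Connected, Closed). Only the input queue is locked; the transitions themselves are touched by exactly one thread. The delta-queue timeouts mentioned above are not shown.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

enum class Event { Connect, Data, Close, Quit };
enum class State { Idle, Connected, Closed };

class QueuedStateMachine {
public:
    // Any thread may post; this is the only place a lock is taken.
    void post(Event e) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(e); }
        cv_.notify_one();
    }
    // Runs on the dedicated SM thread until Quit is received.
    void run() {
        for (;;) {
            Event e = Event::Quit;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return !q_.empty(); });
                e = q_.front();
                q_.pop();
            }
            if (e == Event::Quit) return;
            handle(e);  // runs without any lock held
        }
    }
    // Read these only after run() has returned (e.g. after join()).
    State state() const { return state_; }
    int data_count() const { return data_events_; }

private:
    void handle(Event e) {
        switch (state_) {
        case State::Idle:
            if (e == Event::Connect) state_ = State::Connected;
            break;
        case State::Connected:
            if (e == Event::Data) ++data_events_;
            else if (e == Event::Close) state_ = State::Closed;
            break;
        case State::Closed:
            break;  // ignore everything after Close
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Event> q_;
    State state_ = State::Idle;
    int data_events_ = 0;
};
```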
3) 'Should we pass data by value or by reference in the thread function.' Not sure what you mean here. Most OSes allow one pointer to be passed on thread creation - do with it what you will. You could pass it an event it should signal on work completion, or a queue object upon which it is to wait for work requests. After creation, you need some form of inter-thread comms to send requests and get results (unless you are going to use the direct 'read/write/waitForExit' mechanism - AV/deadlock/noClose generator).
I usually use a simple semaphore/CS producer-consumer queue to send/receive comms objects between worker threads, and the PostMessage API to send them to a UI thread. Apart from the locking in the queue, I don't often need any more locking. You have to try quite hard to deadlock a threaded system based on message-passing and things like thread pools become trivial - just make [no. of CPU] threads and pass each one the same queue to wait on.
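A portable C++ sketch of the producer-consumer queue plus "[no. of CPU] threads waiting on the same queue", using a mutex and condition variable in place of the Win32 semaphore/critical section mentioned above. The names WorkQueue, start_pool and stop_pool are hypothetical; an empty job acts as a "poison pill" that tells one worker to exit.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Producer-consumer queue guarded by one mutex: the only locking in the design.
class WorkQueue {
public:
    void push(std::function<void()> job) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(job)); }
        cv_.notify_one();
    }
    // Blocks until a job is available; an empty function means "exit".
    std::function<void()> pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        auto job = std::move(q_.front());
        q_.pop();
        return job;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> q_;
};

// Start one worker per CPU, all waiting on the same queue.
std::vector<std::thread> start_pool(WorkQueue& q) {
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i)
        workers.emplace_back([&q] {
            while (auto job = q.pop()) job();  // empty job ends the loop
        });
    return workers;
}

// Shutdown: one poison pill per worker, queued behind all real jobs, then join.
void stop_pool(WorkQueue& q, std::vector<std::thread>& workers) {
    for (std::size_t i = 0; i < workers.size(); ++i) q.push(nullptr);
    for (auto& t : workers) t.join();
}
```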
Common mistakes. See the other posters for many, to which I would add:
a) Reading/writing directly to thread fields to pass parameters and return results (esp. between UI threads and 'worker' threads), ie 'Create thread suspended, load parameters into thread fields, resume thread, wait on thread handle for exit, read results from thread fields, free thread object'. This causes a performance hit from continually creating/terminating/destroying threads and often forces the developer to ensure that threads are terminated when exiting an app to prevent AV/216/217 exceptions on close. This can be very tricky, in some cases impossible, because a few APIs block with no way of unblocking them. If developers would stop this nasty practice, there would be far fewer app close problems.
b) Trying to build multithreaded apps in a procedural fashion, eg. trying to wait for results from a work thread in a UI event handler. Much safer to build a thread request object, load it with parameters, queue it to a work thread and exit the event handler. The thread can get the object, do the work, put the results back into the object and (on Windows, anyway) PostMessage the object back. A UI message handler can deal with the results and dispose of the object (or recycle, reuse :). This approach means that, since the UI and worker are always operating on different data that can outlive them both, there is no locking and (usually) no need to ensure that the work thread is freed when closing the app (problems with this are legendary).
Rgds,
Martin
The biggest issue people face in multi threading applications are race conditions, deadlocks and not using semaphores of some sort to protect globally accessible variables.
These are the problems you face when using thread locks:
Deadlock
Priority Inversion
Convoying
“Async-signal-safety”
Kill-tolerant availability
Preemption tolerance
Overall performance
If you want to look at more advanced threading techniques, you can look at lock-free programming, where many threads can work on the same problem instead of waiting on locks.
Deadlocks, memory corruption (of shared resources) due to lack of proper synchronization, buffer overflows (which can also result from memory corruption), and improper usage of thread-local storage are the most common problems.
It also depends on the platform and technology you're using to implement the threads. For example, in Microsoft Windows, if you use MFC objects, several MFC objects are not really shareable across threads because they rely heavily on thread-local storage (e.g. the CSocket and CWnd classes).
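A small sketch of why thread-local state makes objects non-shareable across threads: every thread sees its own copy of a thread_local variable, so state set up in one thread is simply absent in another. The names here are hypothetical and this is plain C++11, not MFC.

```cpp
#include <cassert>
#include <thread>

// Each thread gets its own instance of this variable.
thread_local int tls_calls = 0;

int record_call() { return ++tls_calls; }

// Returns the count seen by a brand-new thread after one call.
int calls_seen_in_new_thread() {
    int seen = 0;
    std::thread t([&seen] { seen = record_call(); });
    t.join();
    return seen;
}
```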

Using asynchronous method vs thread wait

I have 2 versions of a function which are available in a C++ library which do the same task. One is a synchronous function, and another is of asynchronous type which allows a callback function to be registered.
Which of the strategies below is preferable for memory usage and performance?
Call the synchronous function in a worker thread, and use mutex synchronization to wait until I get the result
Do not create a thread, but call the asynchronous version and get the result in callback
I am aware that worker thread creation in option 1 will cause more overhead. I want to know the issues related to the overhead caused by thread synchronization objects, and how it compares to the overhead caused by an asynchronous call. Does the asynchronous version of a function internally spin off a thread and use a synchronization object, or does it use some other technique, like talking directly to the kernel?
"Profile, don't speculate." (DJB)
The answer to this question depends on too many things, and there is no general answer. The role of the developer is to be able to make these decisions. If you don't know, try the options and measure. In many cases, the difference won't matter and non-performance concerns will dominate.
"Premature optimisation is the root of all evil, say 97% of the time" (DEK)
Update in response to the question edit:
C++ libraries, in general, don't get to use magic to avoid synchronisation primitives. The asynchronous vs. synchronous interfaces are likely to be wrappers around things you would do anyway. Processing must happen in a context, and if completion is to be signalled to another context, a synchronisation primitive will be necessary to do that.
Of course, there might be other considerations. If your C++ library is talking to some piece of hardware that can do processing, things might be different. But you haven't told us about anything like that.
The answer to this question depends on context you haven't given us, including information about the library interface and the structure of your code.
Use the asynchronous function, because it will probably do what you would otherwise do manually with the synchronous one, but in a less error-prone way.
Asynchronous: Will create a thread, do work, when done -> call callback
Synchronous: Create an event to wait for, create a thread for the work, wait for the event; on the thread, call the sync version, transfer the result, signal the event.
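A sketch of the two options side by side, assuming hypothetical stand-ins compute_sync() and compute_async() for the library's two entry points. Here compute_async() just runs the work on a detached thread; a real library may use overlapped I/O, a completion port or an internal pool instead. In both options a synchronisation primitive (hidden inside std::future) is what hands the result back to the waiting context.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <thread>

// Hypothetical synchronous library call.
int compute_sync(int x) { return x * x; }

// Hypothetical callback-style library call.
void compute_async(int x, std::function<void(int)> on_done) {
    std::thread([x, on_done] { on_done(compute_sync(x)); }).detach();
}

// Option 1: run the synchronous call on a worker thread and block for the result.
int option1(int x) {
    auto fut = std::async(std::launch::async, compute_sync, x);
    return fut.get();  // the future hides the mutex/condition-variable wait
}

// Option 2: call the asynchronous version and receive the result via a promise.
int option2(int x) {
    // shared_ptr keeps the promise alive until the callback has finished with it
    auto p = std::make_shared<std::promise<int>>();
    std::future<int> fut = p->get_future();
    compute_async(x, [p](int r) { p->set_value(r); });
    return fut.get();
}
```

Which one is cheaper depends on how the real library implements its asynchronous path, so measuring both is the only reliable way to decide.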
You might consider that threads each have their own environment so they use more memory than a non threaded solution when all other things are equal.
Depending on your threading library there can also be significant overhead to starting and stopping threads.
If you need interprocess synchronization there can also be a lot of pain debugging threaded code.
If you're comfortable writing non threaded code (i.e. you won't burn a lot of time writing and debugging it) then that might be the best choice.

Is there a way to abort an SQLite call?

I'm using SQLite3 in a Windows application. I have the source code (so-called SQLite amalgamation).
Sometimes I have to execute heavy queries. That is, I call sqlite3_step on a prepared statement, and it takes a lot of time to complete (due to the heavy I/O load).
I wonder if there's a possibility to abort such a call. I would also be glad if there was an ability to do some background processing in the middle of the call within the same thread (since most of the time is spent in waiting for the I/O to complete).
I thought about modifying the SQLite code myself. In the simplest scenario I could check some condition (like an abort event handle for instance) before every invocation of either ReadFile/WriteFile, and return an error code appropriately. And in order to allow the background processing the file should be opened in the overlapped mode (this enables asynchronous ReadFile/WriteFile).
Is there a chance that interruption of WriteFile may in some circumstances leave the database in the inconsistent state, even with the journal enabled? I guess not, since the whole idea of the journal file is to be prepared for any error of any kind. But I'd like to hear more opinions about this.
Also, has anyone tried something similar?
Thanks in advance.
EDIT:
Thanks to ereOn. I wasn't aware of the existence of sqlite3_interrupt. This probably answers my question.
Now, for all of you who wonder how (and why) one expects to do some background processing during the I/O within the same thread.
Unfortunately not many people are familiar with so-called "Overlapped I/O".
http://en.wikipedia.org/wiki/Overlapped_I/O
Using it, one issues an I/O operation asynchronously, and the calling thread is not blocked. Then one receives the I/O completion status using one of the completion mechanisms: a waitable event, a completion routine queued as an APC, or an I/O completion port.
Using this technique one doesn't have to create extra threads. Actually, the only real justification for creating threads is when your bottleneck is computation time (i.e. CPU load) and the machine has several CPUs (or cores).
And creating a thread just to have it blocked by the OS most of the time doesn't make sense. It wastes OS resources and complicates the program (need for synchronization, etc.).
Unfortunately not all libraries/APIs allow an asynchronous mode of operation, thus making extra threads a necessary evil.
EDIT2:
I've already found the solution, thanks to ereOn.
For all those who nevertheless insist that it's not worth doing things "in background" while "waiting" for the I/O to complete using overlapped I/O: I disagree, and I think there's no point arguing about this. At least, it is not related to the subject.
I'm a Windows programmer (as you may have noticed), and I have very extensive experience in all kinds of multitasking. Plus, I'm also a driver writer, so I also know how things work "behind the scenes".
I know that it's a "common practice" to create several threads to do several things "in parallel". But this doesn't mean that this is a good practice. Please allow me not to follow the "common practice".
I don't understand why you want the interruption to come from the same thread, and I don't even understand how that would be possible: if the current thread is blocked waiting for some I/O, you can't execute any other code. (Yeah, that's what "blocked" means)
Perhaps if you give us more hints about why you want this, we might help further.
Usually, I use sqlite3_interrupt() to cancel calls. But this obviously requires that sqlite3_interrupt() be called from another thread.
By default, SQLite is threadsafe. It sounds to me like the easiest thing to do would be to start the SQLite command on a background thread, and let SQLite do the necessary locking to make that work.
From your perspective then, the sqlite call looks like an asynchronous bit of I/O, and you can continue normal processing on this thread, such as e.g. using a loop including interruptible sleep and a bit of occasional background processing (e.g. to update a liveness indicator). When the SQLite statement completes, the background thread should set a state variable to indicate this, wake the main thread (if necessary), and terminate.
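A minimal sketch of the pattern just described, with the SQLite call replaced by a hypothetical blocking_call so the example does not depend on the library. The background thread runs the blocking work, then sets a state variable and wakes the caller; meanwhile the caller does small units of background work between interruptible waits. With the real library, the controlling thread could also call sqlite3_interrupt(db) to make the pending sqlite3_step() return SQLITE_INTERRUPT.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Runs blocking_call on a background thread while the calling thread keeps
// doing background_work. Returns how many times background_work ran.
int run_with_background_work(std::function<void()> blocking_call,
                             std::function<void()> background_work) {
    std::mutex m;
    std::condition_variable cv;
    bool finished = false;

    std::thread worker([&] {
        blocking_call();  // e.g. a loop of sqlite3_step() calls
        { std::lock_guard<std::mutex> lk(m); finished = true; }  // state variable
        cv.notify_one();  // wake the caller
    });

    int ticks = 0;
    std::unique_lock<std::mutex> lk(m);
    // Interruptible sleep: returns early as soon as the worker finishes.
    while (!cv.wait_for(lk, std::chrono::milliseconds(10), [&] { return finished; })) {
        lk.unlock();
        background_work();  // e.g. update a liveness indicator
        ++ticks;
        lk.lock();
    }
    lk.unlock();
    worker.join();
    return ticks;
}
```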

More threads, better performance?

When I write a message-driven app, much like a standard Windows app except that it uses messaging extensively for internal operations, what would be the best approach regarding threading?
As I see it, there are basically three approaches (if you have any other setup in mind, please share):
Having a single thread process all of the messages.
Having separate threads for separate message types (General, UI, Networking, etc...)
Having multiple threads that share and process a single message queue.
So, would there be any significant performance differences between the three?
Here are some general thoughts:
Obviously, the last two options benefit from a situation where there's more than one processor. Plus, if any thread is waiting for an external event, other threads can still process unrelated messages. But ignoring that, seems that multiple threads only add overhead (Thread switches, not to mention more complicated sync situations).
And another question: Would you recommend to implement such a system upon the standard Windows messaging system, or to implement a separate queue mechanism, and why?
The specific choice of threading model should be driven by the nature of the problem you are trying to solve. There isn't necessarily a single "correct" approach to designing the threading model for such an application. However, if we adopt the following assumptions:
messages arrive frequently
messages are independent and don't rely too heavily on shared resources
it is desirable to respond to an arriving message as quickly as possible
you want the app to scale well across processing architectures (i.e. multicore/multi-CPU systems)
scalability is the key design requirement (e.g. more messages at a faster rate)
resilience to thread failure / long operations is desirable
In my experience, the most effective threading architecture would be to employ a thread pool. All messages arrive on a single queue, multiple threads wait on the queue and process messages as they arrive. A thread pool implementation can model all three thread-distribution examples you have.
#1 Single thread processes all messages => thread pool with only one thread
#2 Thread per N message types => thread pool with N threads, each thread peeks at the queue to find appropriate message types
#3 Multiple threads for all messages => thread pool with multiple threads
The benefit of this design is that you can scale the number of threads in the pool in proportion to the processing environment or the message load. The number of threads can even scale at runtime to adapt to the real-time message load being experienced.
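A minimal sketch of that idea, assuming a hypothetical MessagePool class: one shared queue, with the thread count as a constructor parameter. MessagePool(1) behaves like option #1 (messages handled strictly in order), and MessagePool(n) with n > 1 behaves like option #3. Runtime resizing and per-type filtering (option #2) are not shown.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// One queue, N workers: N == 1 gives option #1, N > 1 gives option #3.
class MessagePool {
public:
    explicit MessagePool(std::size_t threads) {
        for (std::size_t i = 0; i < threads; ++i)
            workers_.emplace_back([this] { work(); });
    }
    ~MessagePool() {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();  // workers drain the queue first
    }
    void post(std::function<void()> msg) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(msg)); }
        cv_.notify_one();
    }
private:
    void work() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] { return stop_ || !q_.empty(); });
            if (q_.empty()) return;  // stopping and nothing left
            auto msg = std::move(q_.front());
            q_.pop();
            lk.unlock();
            msg();  // handle the message outside the lock
            lk.lock();
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> q_;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};
```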
There are many good thread pooling libraries available for most platforms, including .NET, C++/STL, Java, etc.
As to your second question, whether to use the standard Windows message dispatch mechanism: this mechanism comes with significant overhead and is really only intended for pumping messages through a Windows application's UI loop. Unless this is the problem you are trying to solve, I would advise against using it as a general message dispatching solution. Furthermore, Windows messages carry very little data; it is not an object-based model. Each Windows message has a code and two pointer-sized parameters (wParam and lParam). This may not be enough to base a clean messaging model on. Finally, the Windows message queue is not designed to handle cases like queue saturation, thread starvation, or message re-queuing; these are cases that often arise in implementing a decent message queuing solution.
We can't tell you much for sure without knowing the workload (ie, the statistical distribution of events over time) but in general
single queue with multiple servers is at least as fast, and usually faster, so 1,3 would be preferable to 2.
multiple threads in most languages add complexity because of the need to avoid contention and multiple-writer problems
long duration processes can block processing for other things that could get done quicker.
So horseback guess is that having a single event queue, with several server threads taking events off the queue, might be a little faster.
Make sure you use a thread-safe data structure for the queue.
It all depends.
For example:
Events in a GUI queue are best handled by a single thread, as there is an implied order in the events and thus they need to be processed serially. This is why most GUI apps have a single thread to handle events, though potentially multiple threads to create them (and it does not preclude the event thread from creating a job and handing it off to a worker pool (see below)).
Events on a socket can potentially be handled in parallel (assuming HTTP), as each request is stateless and can thus be handled independently (OK, I know that is oversimplifying HTTP).
Work jobs, where each job is independent and placed on a queue. This is the classic case for using a set of worker threads. Each thread does a potentially long operation independently of the other threads, and on completion comes back to the queue for another job.
In general, don't worry about the overhead of threads. It's not going to be an issue if you're talking about merely a handful of them. Race conditions, deadlocks, and contention are a bigger concern, and if you don't know what I'm talking about, you have a lot of reading to do before you tackle this.
I'd go with option 3, using whatever abstractions my language of choice offers.
Note that there are two different performance goals, and you haven't stated which you are targeting: throughput and responsiveness.
If you're writing a GUI app, the UI needs to be responsive. You don't care how many clicks per second you can process, but you do care about showing some response within a 10th of a second or so (ideally less). This is one of the reasons it's best to have a single thread devoted to handling the GUI (other reasons have been mentioned in other answers). The GUI thread needs to basically convert Windows messages into work items and let your worker queue handle the heavy work. Once the worker is done, it notifies the GUI thread, which then updates the display to reflect any changes. It does things like painting a window, but not rendering the data to be displayed. This gives the app a quick "snappiness" that is what most users want when they talk about performance. They don't care if it takes 15 seconds to do something hard, as long as when they click on a button or a menu, it reacts instantly.
The other performance characteristic is throughput. This is the number of jobs you can process in a specific amount of time. Usually this type of performance tuning is only needed on server-type applications, or other heavy-duty processing. This measures how many webpages can be served up in an hour, or how long it takes to render a DVD. For these sorts of jobs, you want to have 1 active thread per CPU. Fewer than that, and you're going to be wasting idle clock cycles. More than that, and the threads will be competing for CPU time and tripping over each other. Take a look at the second graph in this DDJ article for the trade-off you're dealing with. Note that the ideal thread count is higher than the number of available CPUs due to things like blocking and locking. The key is the number of active threads.
A good place to start is to ask yourself why you need multiple threads.
The well-thought-out answer to this question will lead you to the best answer to the subsequent question, "how should I use multiple threads in my application?"
And that must be a subsequent question, not a primary question. The first question must be why, not how.
I think it depends on how long each thread will be running. Does each message take the same amount of time to process? Or will certain messages take a few seconds, for example? If I knew that message A was going to take 10 seconds to complete, I would definitely use a new thread, because why would I want to hold up the queue for a long-running task...
My 2 cents.
I think option 2 is the best. Having each thread doing independent tasks would give you the best results. The 3rd approach can cause more delays if multiple threads are doing I/O operations like disk reads, reading common sockets and so on.
Whether to use the Windows messaging framework for processing requests depends on the workload each thread would have. I think Windows restricts the number of messages that can be queued to at most 10000. For most cases this should not be an issue. But if you have lots of messages to be queued, this might be something to take into consideration.
A separate queue gives better control, in the sense that you can reorder it the way you want (perhaps depending on priority).
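A minimal sketch of such a reorderable queue, assuming a hypothetical Message type with an integer priority. Higher priorities are served first, and a sequence number keeps messages of equal priority in arrival order. This version is single-threaded; for use between threads it would be wrapped in a mutex and condition variable like any other shared queue.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical message type.
struct Message {
    int priority;
    unsigned long seq;  // arrival order
    std::string body;
};

// "Less than" means "served later": lower priority, or same priority but newer.
struct ByPriority {
    bool operator()(const Message& a, const Message& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.seq > b.seq;  // earlier arrival wins among equals
    }
};

class PriorityMessageQueue {
public:
    void push(int priority, std::string body) {
        q_.push(Message{priority, next_seq_++, std::move(body)});
    }
    bool empty() const { return q_.empty(); }
    Message pop() {
        Message m = q_.top();
        q_.pop();
        return m;
    }
private:
    std::priority_queue<Message, std::vector<Message>, ByPriority> q_;
    unsigned long next_seq_ = 0;
};
```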
Yes, there will be performance differences between your choices.
(1) introduces a bottle-neck for message processing
(3) introduces locking contention because you'll need to synchronize access to your shared queue.
(2) is starting to go in the right direction... though a queue for each message type is a little extreme. I'd probably recommend starting with a queue for each model in your app and adding queues where it makes sense to do so for improved performance.
If you like option #2, it sounds like you would be interested in implementing a SEDA architecture. It is going to take some reading to understand what is going on, but I think the architecture fits well with your line of thinking.
BTW, Yield is a good C++/Python hybrid implementation.
I'd have a thread pool servicing the message queue, and make the number of threads in the pool easily configurable (perhaps even at runtime). Then test it out with expected load.
That way you can see what the actual correlation is - and if your initial assumptions change, you can easily change your approach.
A more sophisticated approach would be for the system to introspect its own performance traits and adapt its use of resources, threads in particular, as it goes. Probably overkill for most custom application code, but I'm sure there are products out there that do that.
As for the windows events question - I think that's probably an application specific question that there is no right or wrong answer to in the general case. That said, I usually implement my own queue as I can tailor it to the specific characteristics of the task at hand. Sometimes that might involve routing events via the windows message queue.